Showing content from https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep010/proposal.html below:


n_features_in_ attribute — Scikit-learn enhancement proposals documentation

SLEP010: n_features_in_ attribute
Author: Nicolas Hug

Status: Final

Type: Standards Track

Created: 2019-11-23

Implemented with v0.23.

Abstract

This SLEP proposes the introduction of a public n_features_in_ attribute for most estimators (where relevant).

Motivation

Knowing the number of features that an estimator expects is useful for inspection purposes. It is also useful for implementing feature names propagation (SLEP 8). For example, any of the scalers can easily create feature names if they know n_features_in_.

Solution

The proposed solution is to replace most calls to check_array() or check_X_y() with calls to a newly created private method:

def _validate_data(self, X, y=None, reset=True, **check_array_params):
    ...

The _validate_data() method will call either the check_array() or the check_X_y() function, depending on the y parameter.

If the reset parameter is True (default), the method will set the n_features_in_ attribute of the estimator, regardless of its potential previous value. This should typically be used in fit(), or in the first partial_fit() call. Passing reset=False will not set the attribute but instead check against it, and potentially raise an error. This should typically be used in predict() or transform(), or on subsequent calls to partial_fit().
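The reset behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not scikit-learn's implementation: `_check_2d` is a hypothetical plain-Python stand-in for check_array() (the check_X_y() branch is omitted), and `ToyEstimator` and the error message are invented for the example.

```python
def _check_2d(X):
    """Hypothetical stand-in for check_array(): require a non-empty 2D list of rows."""
    rows = [list(row) for row in X]
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("X must be a non-empty 2D array-like with equal-length rows")
    return rows


class ToyEstimator:
    """Invented estimator used only to illustrate the reset logic."""

    def _validate_data(self, X, reset=True):
        X = _check_2d(X)
        n_features = len(X[0])
        if reset:
            # fit() or first partial_fit(): overwrite any previous value
            self.n_features_in_ = n_features
        elif n_features != self.n_features_in_:
            # predict() / transform(): compare against the stored value
            raise ValueError(
                f"X has {n_features} features, but this estimator "
                f"expects {self.n_features_in_} features"
            )
        return X

    def fit(self, X):
        X = self._validate_data(X, reset=True)
        # Column means, just so that transform() has something to do
        self.means_ = [sum(col) / len(col) for col in zip(*X)]
        return self

    def transform(self, X):
        X = self._validate_data(X, reset=False)
        return [[v - m for v, m in zip(row, self.means_)] for row in X]
```

In this sketch, calling transform() with a different number of columns than was seen in fit() raises a ValueError, while calling fit() again simply overwrites n_features_in_.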

In most cases, the n_features_in_ attribute exists only once fit has been called, but there are exceptions (see below).

A new common check is added: it makes sure that for most estimators, the n_features_in_ attribute does not exist until fit is called, and that its value is correct. Instead of raising an exception, this check will raise a warning for the next two releases. This will give downstream packages some time to adjust (see considerations below).
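A check of this kind might look roughly like the sketch below. The function name `check_n_features_in`, the warning category, and the two toy classes are assumptions made for illustration; the actual common check in scikit-learn may differ.

```python
import warnings


def check_n_features_in(estimator_cls, X):
    """Hypothetical check: warn (rather than fail) when n_features_in_ is wrong."""
    problems = []
    est = estimator_cls()
    if hasattr(est, "n_features_in_"):
        problems.append("n_features_in_ exists before fit")
    est.fit(X)
    expected = len(X[0])
    if getattr(est, "n_features_in_", None) != expected:
        problems.append(f"n_features_in_ should be {expected} after fit")
    # Emit warnings instead of raising, to give downstream code time to adjust
    for msg in problems:
        warnings.warn(f"{estimator_cls.__name__}: {msg}", FutureWarning)
    return problems


class Compliant:
    """Invented estimator that sets the attribute in fit()."""

    def fit(self, X):
        self.n_features_in_ = len(X[0])
        return self


class Legacy:
    """Invented estimator that predates the attribute."""

    def fit(self, X):
        return self
```

Here `check_n_features_in(Compliant, X)` returns an empty list, whereas `check_n_features_in(Legacy, X)` emits a warning and returns the list of problems it found.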

Since the introduced method is private, third party libraries are recommended not to rely on it.

The logic that is proposed here (calling a stateful method instead of a stateless function) is a prerequisite to fixing the dataframe column ordering issue: with a stateless check_array, there is no way to raise an error if the column ordering of a dataframe was changed between fit and predict. This is however out of scope for this SLEP, which only focuses on the introduction of the n_features_in_ attribute.
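To illustrate why statefulness matters for the column ordering issue, here is a minimal sketch. It is not part of this proposal: a dict of columns stands in for a dataframe, and `stateless_check`, `ColumnOrderAware`, and the `columns_seen_` attribute are all hypothetical names.

```python
def stateless_check(X):
    """Stand-in for a stateless validator: it only ever sees the current input."""
    return [list(X[name]) for name in X]  # no memory of earlier calls


class ColumnOrderAware:
    """Invented class; `columns_seen_` is a hypothetical attribute name."""

    def _validate_data(self, X, reset=True):
        columns = list(X)  # dicts preserve insertion order
        if reset:
            # Remember what was seen at fit time
            self.columns_seen_ = columns
            self.n_features_in_ = len(columns)
        elif columns != self.columns_seen_:
            raise ValueError(
                f"column order changed: {self.columns_seen_} -> {columns}"
            )
        return [list(X[name]) for name in columns]
```

The stateless function silently accepts reordered columns because it has nothing to compare against, while the stateful method can raise an error when called with reset=False.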

Considerations

The main consideration is that the addition of the common test means that existing estimators in downstream libraries will not pass our test suite, unless the estimators also have the n_features_in_ attribute.

The newly introduced checks will only raise a warning instead of an exception for the next 2 releases, so this will give more time for downstream packages to adjust.

There are other minor considerations:

References and Footnotes

Copyright

This document has been placed in the public domain. [1]

