class CatBoostClassifier(iterations=None,
learning_rate=None,
depth=None,
l2_leaf_reg=None,
model_size_reg=None,
rsm=None,
loss_function=None,
border_count=None,
feature_border_type=None,
per_float_feature_quantization=None,
input_borders=None,
output_borders=None,
fold_permutation_block=None,
od_pval=None,
od_wait=None,
od_type=None,
nan_mode=None,
counter_calc_method=None,
leaf_estimation_iterations=None,
leaf_estimation_method=None,
thread_count=None,
random_seed=None,
use_best_model=None,
verbose=None,
logging_level=None,
metric_period=None,
ctr_leaf_count_limit=None,
store_all_simple_ctr=None,
max_ctr_complexity=None,
has_time=None,
allow_const_label=None,
classes_count=None,
class_weights=None,
auto_class_weights=None,
one_hot_max_size=None,
random_strength=None,
name=None,
ignored_features=None,
train_dir=None,
custom_loss=None,
custom_metric=None,
eval_metric=None,
bagging_temperature=None,
save_snapshot=None,
snapshot_file=None,
snapshot_interval=None,
fold_len_multiplier=None,
used_ram_limit=None,
gpu_ram_part=None,
allow_writing_files=None,
final_ctr_computation_mode=None,
approx_on_full_history=None,
boosting_type=None,
simple_ctr=None,
combinations_ctr=None,
per_feature_ctr=None,
task_type=None,
device_config=None,
devices=None,
bootstrap_type=None,
subsample=None,
sampling_unit=None,
dev_score_calc_obj_block_size=None,
max_depth=None,
n_estimators=None,
num_boost_round=None,
num_trees=None,
colsample_bylevel=None,
random_state=None,
reg_lambda=None,
objective=None,
eta=None,
max_bin=None,
scale_pos_weight=None,
gpu_cat_features_storage=None,
data_partition=None,
metadata=None,
early_stopping_rounds=None,
cat_features=None,
grow_policy=None,
min_data_in_leaf=None,
min_child_samples=None,
max_leaves=None,
num_leaves=None,
score_function=None,
leaf_estimation_backtracking=None,
ctr_history_unit=None,
monotone_constraints=None,
feature_weights=None,
penalties_coefficient=None,
first_feature_use_penalties=None,
model_shrink_rate=None,
model_shrink_mode=None,
langevin=None,
diffusion_temperature=None,
posterior_sampling=None,
boost_from_average=None,
text_features=None,
tokenizers=None,
dictionaries=None,
feature_calcers=None,
text_processing=None,
fixed_binary_splits=None)
Purpose
Training and applying models for classification problems. Provides compatibility with the scikit-learn tools.
The default optimized objective depends on various conditions:
The target_border parameter is not None.
The border_count parameter is None.
Parameters
metadata
Description
The key-value string pairs to store in the model's metadata storage after the training.
Default value
None
cat_features
Description
A one-dimensional array of categorical column indices (specified as integers) or names (specified as strings).
This array can contain both indices and names for different elements.
If any features in the cat_features parameter are specified as names instead of indices, feature names must be provided for the training dataset. Therefore, the type of the X parameter in future calls of the fit function must be either catboost.Pool with defined feature names or pandas.DataFrame with defined column names.
Note
If this parameter is not None and the training dataset passed as the value of the X parameter to the fit function of this class has the catboost.Pool type, CatBoost checks the equivalence of the categorical features indices specification in this object and the one in the catboost.Pool object.
If this parameter is not None, passing objects of the catboost.FeaturesData type as the X parameter to the fit function of this class is prohibited.
Default value
None (all features are either considered numerical or of other types if specified precisely)
text_features
Description
A one-dimensional array of text column indices (specified as integers) or names (specified as strings).
Use only if the data parameter is a two-dimensional feature matrix (has one of the following types: list, numpy.ndarray, pandas.DataFrame, pandas.Series).
If any elements in this array are specified as names instead of indices, names for all columns must be provided. To do this, either use the feature_names parameter of this constructor to explicitly specify them or pass a pandas.DataFrame with column names specified in the data parameter.
Default value
None (all features are either considered numerical or of other types if specified precisely)
See Python package training parameters for the full list of parameters.
Note
Some parameters duplicate the ones specified for the fit method. In these cases the values specified for the fit method take precedence.
Attributes
tree_count_
Return the number of trees in the model.
This number can differ from the value specified in the --iterations training parameter in the following cases:
The --use-best-model training parameter is set to True.
feature_importances_
Return the calculated feature importances. The output data depends on the type of the model's loss function.
random_seed_
The random seed used for training.
learning_rate_
The learning rate used for training.
feature_names_
The names of features in the dataset.
evals_result_
Return the values of metrics calculated during the training.
best_score_
Return the best result for each metric calculated on each validation dataset.
best_iteration_
Return the identifier of the iteration with the best result of the evaluation metric or loss function on the last validation set.
classes_
Return the names of classes for classification models. An empty list is returned for all other models.
The order of classes in this list corresponds to the order of classes in resulting predictions.
Methods
fit
Train a model.
predict
Apply the model to the given dataset.
predict_proba
Apply the model to the given dataset to predict the probability that the object belongs to the given classes.
calc_leaf_indexes
Return the indexes of the leaves to which the model's trees map objects from the pool.
calc_feature_statistics
Calculate and plot a set of statistics for the chosen feature.
compare
Draw train and evaluation metrics in Jupyter Notebook for two trained models.
copy
Copy the CatBoost object.
eval_metrics
Calculate the specified metrics for the specified dataset.
get_all_params
Return the values of all training parameters (including the ones that are not explicitly specified by users).
get_best_iteration
Return the identifier of the iteration with the best result of the evaluation metric or loss function on the last validation set.
get_best_score
Return the best result for each metric calculated on each validation dataset.
get_borders
Return the list of borders for numerical features.
get_evals_result
Return the values of metrics calculated during the training.
get_feature_importance
Calculate and return the feature importances.
get_metadata
Return a proxy object with metadata from the model's internal key-value string storage.
get_object_importance
Calculate the effect of objects from the train dataset on the optimized metric values for the objects from the input dataset.
get_param
Return the value of the given parameter if it is explicitly specified by the user before starting the training. If this parameter is used with the default value, this function returns None.
get_params
Return the values of training parameters that are explicitly specified by the user. If all parameters are used with their default values, this function returns an empty dict.
get_probability_threshold
Get the threshold for class separation in a binary classification task for a trained model.
get_scale_and_bias
Return the scale and bias of the model.
These values affect the results of applying the model, since the model prediction results are calculated as follows:
\sum leaf\_values \cdot scale + bias
get_test_eval
Return the formula values that were calculated for the objects from the validation dataset provided for training.
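The get_scale_and_bias formula above can be made concrete with a small pure-Python sketch. It does not call CatBoost. The function name apply_scale_and_bias and the leaf values are hypothetical and serve only to show the arithmetic for a single object.

```python
def apply_scale_and_bias(leaf_values, scale=1.0, bias=0.0):
    """Combine the leaf values that one object receives from each tree.

    Mirrors the formula sum(leaf_values) * scale + bias for a single object.
    """
    return sum(leaf_values) * scale + bias


# Hypothetical leaf values from three trees for one object.
leaf_values = [0.25, -0.5, 1.0]

raw_default = apply_scale_and_bias(leaf_values)
raw_adjusted = apply_scale_and_bias(leaf_values, scale=2.0, bias=0.5)

print(raw_default)   # 0.75
print(raw_adjusted)  # 2.0
```

With scale 1 and bias 0 the result is the plain sum of leaf values, which is why changing these two numbers shifts every prediction of the model.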
grid_search
A simple grid search over specified parameter values for a model.
is_fitted
Check whether the model is trained.
load_model
Load the model from a file.
plot_predictions
Sequentially vary the value of the specified features to put them into all buckets and calculate predictions for the input objects accordingly.
plot_tree
Visualize the CatBoost decision trees.
randomized_search
A simple randomized search on hyperparameters.
save_borders
Save the model borders to a file.
save_model
Save the model to a file.
score
Calculate the Accuracy metric for the objects in the given dataset.
select_features
Select the best features from the dataset using the Recursive Feature Elimination algorithm.
set_feature_names
Set names for all features in the model.
set_params
Set the training parameters.
set_probability_threshold
Set the threshold for class separation in a binary classification task for a trained model.
set_scale_and_bias
Set the scale and bias.
shrink
Shrink the model. Only trees with indices from the range [ntree_start, ntree_end) are kept.
staged_predict
Apply the model to the given dataset and calculate the results taking into consideration only the trees in the range [0; i).
staged_predict_proba
Apply the model to the given dataset to predict the probability that the object belongs to the class and calculate the results taking into consideration only the trees in the range [0; i).
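The half-open tree ranges used by shrink and the staged prediction methods can be illustrated without CatBoost. The sketch below is plain Python over a hypothetical list of per-tree leaf values for one object. The helper names staged_raw_values and keep_tree_range are invented for this example and are not part of the library.

```python
from itertools import accumulate


def staged_raw_values(leaf_values):
    """Yield the running sum that uses only trees in [0; i), for i = 1..n."""
    yield from accumulate(leaf_values)


def keep_tree_range(leaf_values, ntree_start, ntree_end):
    """Keep only the trees with indices in [ntree_start, ntree_end)."""
    return leaf_values[ntree_start:ntree_end]


# Hypothetical leaf values that four trees assign to one object.
leaf_values = [0.5, 0.25, -0.125, 1.0]

stages = list(staged_raw_values(leaf_values))
kept = keep_tree_range(leaf_values, 1, 3)

print(stages)  # [0.5, 0.75, 0.625, 1.625]
print(kept)    # [0.25, -0.125]
```

Both ranges exclude their upper bound, so stage i never includes the tree with index i, and a range of [1, 3) keeps exactly two trees.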