Time limit in seconds for the search of appropriate models. By increasing this value, auto-sklearn has a higher chance of finding better models.
Time limit for a single call to the machine learning model. Model fitting will be terminated if the machine learning algorithm runs over the time limit. Set this value high enough so that typical machine learning algorithms can be fit on the training data.
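For instance, a hedged sketch of setting both limits together (the parameter names are auto-sklearn's; the values are illustrative, not recommendations):

    import autosklearn.classification

    # Search for 10 minutes overall, but cap any single model fit at
    # 60 seconds so one slow configuration cannot eat the whole budget.
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=600,
        per_run_time_limit=60,
    )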
Initialize the hyperparameter optimization algorithm with this many configurations which worked well on previously seen datasets. Disable if the hyperparameter optimization algorithm should start from scratch.
Number of models added to the ensemble built by Ensemble selection from libraries of models. Models are drawn with replacement. If set to 0, no ensemble is fit.
Deprecated - will be removed in Auto-sklearn 0.16. Please pass this argument via ensemble_kwargs={"ensemble_size": int} if you want to change the ensemble size for ensemble selection.
Class implementing the post-hoc ensemble algorithm. Set to None to disable ensemble building, or use SingleBest to obtain only the single best model instead of an ensemble. If set to "default" it will use EnsembleSelection for single-objective problems and MultiObjectiveDummyEnsemble for multi-objective problems.
Keyword arguments that are passed to the ensemble class upon initialization.
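As a hedged sketch, ensemble behaviour might be configured through these two parameters together (the EnsembleSelection import path is an assumption about the package layout):

    import autosklearn.classification
    # Assumed import path for the ensemble selection class.
    from autosklearn.ensembles.ensemble_selection import EnsembleSelection

    automl = autosklearn.classification.AutoSklearnClassifier(
        ensemble_class=EnsembleSelection,
        # Replaces the deprecated top-level ensemble_size argument.
        ensemble_kwargs={"ensemble_size": 10},
    )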
Only consider the ensemble_nbest models when building an ensemble. This is inspired by a concept called library pruning introduced in Getting the Most Out of Ensemble Selection. This is independent of the ensemble_class argument, and this pruning step is done prior to constructing an ensemble.
Defines the maximum number of models that are kept on disk. Any additional models are permanently deleted. Due to the nature of this variable, it sets the upper limit on how many models can be used for an ensemble. It must be an integer greater than or equal to 1. If set to None, all models are kept on disk.
Used to seed SMAC. Will determine the output file names.
Memory limit in MB for the machine learning algorithm. auto-sklearn will stop fitting the machine learning algorithm if it tries to allocate more than memory_limit MB.
Important notes:
- If None is provided, no memory limit is set.
- In case of multi-processing, memory_limit will be per job, so the total usage is n_jobs x memory_limit (see the sketch below).
- The memory limit also applies to the ensemble creation process.
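A brief sketch of how the per-job limit adds up (the values are illustrative):

    import autosklearn.classification

    # Each of the 4 workers may allocate up to 3072 MB, so worst-case
    # total usage is 4 x 3072 MB = 12 GB.
    automl = autosklearn.classification.AutoSklearnClassifier(
        memory_limit=3072,
        n_jobs=4,
    )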
If None, all possible algorithms are used.
Otherwise, specifies a step and the components that are included in search. See /pipeline/components/<step>/* for available components. Incompatible with parameter exclude.
Possible Steps:
"data_preprocessor"
"balancing"
"feature_preprocessor"
"classifier" - Only when using AutoSklearnClassifier
"regressor" - Only when using AutoSklearnRegressor
Example:

    include = {
        'classifier': ["random_forest"],
        'feature_preprocessor': ["no_preprocessing"]
    }
If None, all possible algorithms are used.
Otherwise, specifies a step and the components that are excluded from search. See /pipeline/components/<step>/* for available components. Incompatible with parameter include.
Possible Steps:
"data_preprocessor"
"balancing"
"feature_preprocessor"
"classifier" - Only when using AutoSklearnClassifier
"regressor" - Only when using AutoSklearnRegressor
Example:

    exclude = {
        'classifier': ["random_forest"],
        'feature_preprocessor': ["no_preprocessing"]
    }
How to handle overfitting; you might need to use resampling_strategy_arguments if using a "cv" based method or a Splitter object.
"holdout"
- Use a 67:33 (train:test) split
"cv"
: perform cross validation, requires “folds” in resampling_strategy_arguments
"holdout-iterative-fit"
- Same as “holdout” but iterative fit where possible
"cv-iterative-fit"
: Same as “cv” but iterative fit where possible
"partial-cv"
: Same as “cv” but uses intensification.
BaseCrossValidator
- any BaseCrossValidator subclass (found in scikit-learn model_selection module)
_RepeatedSplits
- any _RepeatedSplits subclass (found in scikit-learn model_selection module)
BaseShuffleSplit
- any BaseShuffleSplit subclass (found in scikit-learn model_selection module)
If using a Splitter object that relies on the dataset retaining its current size and order, you will need to look at the dataset_compression argument and ensure that "subsample" is not included in the applied compression "methods", or disable it entirely with False, as shown in the sketch below.
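A minimal sketch of the splitter case, assuming a classification task and scikit-learn's PredefinedSplit:

    import numpy as np
    from sklearn.model_selection import PredefinedSplit
    import autosklearn.classification

    # Rows marked -1 always stay in training; rows marked 0 form the
    # held-out fold. This assumes a 1000-row dataset.
    test_fold = np.array([-1] * 800 + [0] * 200)

    automl = autosklearn.classification.AutoSklearnClassifier(
        resampling_strategy=PredefinedSplit(test_fold=test_fold),
        # Disable compression so row count and order are preserved.
        dataset_compression=False,
    )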
Additional arguments for resampling_strategy; this is required if using a "cv" based strategy. The default arguments if left as None are:

    {
        "train_size": 0.67,  # The size of the training set
        "shuffle": True,     # Whether to shuffle before splitting data
        "folds": 5           # Used in 'cv' based resampling strategies
    }
If using a custom splitter class which takes n_splits, such as PredefinedSplit, the value of "folds" will be used.
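For example, a hedged sketch of a "cv" based setup (X_train and y_train are assumed to exist; refit() retrains the found models on the full training set, since cross-validation alone leaves no single final model):

    import autosklearn.classification

    automl = autosklearn.classification.AutoSklearnClassifier(
        resampling_strategy="cv",
        resampling_strategy_arguments={"folds": 5},
    )
    automl.fit(X_train, y_train)
    automl.refit(X_train, y_train)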
Folder to store configuration output and log files; if None, automatically use /tmp/autosklearn_tmp_$pid_$random_number.
Remove tmp_folder when finished. If tmp_folder is None, tmp_dir will always be deleted.
The number of jobs to run in parallel for fit(). -1 means using all processors.
Important notes:
- By default, Auto-sklearn uses one core.
- Ensemble building is not affected by n_jobs but can be controlled by the number of models in the ensemble.
- predict() is not affected by n_jobs (in contrast to most scikit-learn models).
- If dask_client is None, a new dask client is created.
User-created dask client, which can be used to start a dask cluster and then attach auto-sklearn to it.
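A hedged sketch of attaching your own client (the LocalCluster setup is an assumption about the environment):

    from dask.distributed import Client, LocalCluster
    import autosklearn.classification

    # A local 4-worker cluster; any reachable dask cluster would do.
    cluster = LocalCluster(n_workers=4, threads_per_worker=1)
    client = Client(cluster)

    automl = autosklearn.classification.AutoSklearnClassifier(
        n_jobs=4,
        dask_client=client,
    )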
If True, disable model and prediction output. Cannot be used together with ensemble building; predict() cannot be used when setting this to True. Can also be used as a list to pass more fine-grained information on what to save. Allowed elements in the list are:
- 'y_optimization' - do not save the predictions for the optimization set, which would later on be used to build an ensemble.
- 'model' - do not save any model files.
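A brief sketch of the list form; since the optimization-set predictions feed ensemble building, this assumes ensembles are disabled as well:

    import autosklearn.classification

    automl = autosklearn.classification.AutoSklearnClassifier(
        # Skip saving optimization-set predictions...
        disable_evaluator_output=['y_optimization'],
        # ...so ensemble building, which needs them, is turned off.
        ensemble_class=None,
    )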
Additional arguments inserted into the scenario of SMAC. See the SMAC documentation for a list of available arguments.
Callback function to create an object of class smac.optimizer.smbo.SMBO. The function must accept the arguments scenario_dict, instances, num_params, runhistory, seed and ta. This is an advanced feature. Use only if you are familiar with SMAC.
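A heavily hedged sketch of such a callback; the SMAC4AC facade and Scenario import paths refer to the SMAC 1.x series that auto-sklearn builds on and should be treated as assumptions:

    from smac.facade.smac_ac_facade import SMAC4AC
    from smac.scenario.scenario import Scenario

    def get_smac_object(scenario_dict, instances, num_params,
                        runhistory, seed, ta):
        # Build a Scenario from the dictionary auto-sklearn prepared,
        # then hand the run history and target algorithm back to SMAC.
        scenario = Scenario(scenario_dict)
        return SMAC4AC(
            scenario=scenario,
            rng=seed,
            runhistory=runhistory,
            tae_runner=ta,
        )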
Dictionary object specifying the logger configuration. If None, the default logging.yaml file is used, which can be found in the directory util/logging.yaml relative to the installation.
Path to the metadata directory. If None, the default directory (autosklearn.metalearning.files) is used.
An instance of autosklearn.metrics.Scorer as created by autosklearn.metrics.make_scorer(). These are the Built-in Metrics. If None is provided, a default metric is selected depending on the task.
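For instance, a minimal sketch selecting one of the built-in metrics (roc_auc is among the scorers shipped in autosklearn.metrics):

    import autosklearn.classification
    import autosklearn.metrics

    automl = autosklearn.classification.AutoSklearnClassifier(
        metric=autosklearn.metrics.roc_auc,
    )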
List of scorers which will be calculated for each pipeline; results will be available via cv_results_.
Whether to load the models after fitting Auto-sklearn.
A callable with the following definition:

    (smac.SMBO, smac.RunInfo, smac.RunValue, time_left: float) -> bool | None

This will be called after SMAC, the underlying optimizer for auto-sklearn, finishes training each run. You can use this to record your own information about the optimization process. You can also use this to enable early stopping based on some criteria. See the example: Early Stopping And Callbacks.
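A minimal sketch of an early-stopping callback, assuming result.cost holds the validation loss of the finished run and 0.05 is an arbitrary target:

    import autosklearn.classification

    def callback(smbo, run_info, result, time_left):
        # Returning False stops the optimization early; returning None
        # (implicitly) lets the search continue.
        if result.cost <= 0.05:
            return False

    automl = autosklearn.classification.AutoSklearnClassifier(
        get_trials_callback=callback,
    )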
We compress datasets so that they fit into some predefined amount of memory. Currently this does not apply to dataframes or sparse arrays, only to raw numpy arrays.
NOTE - If using a custom resampling_strategy that relies on specific size or ordering of data, this must be disabled to preserve these properties.
You can disable this entirely by passing False, or leave it as the default True for the configuration below:

    {
        "memory_allocation": 0.1,
        "methods": ["precision", "subsample"]
    }
You can also pass your own configuration with the same keys, choosing from the available "methods" (see the sketch after the method descriptions below).
The available options are described here:
By default, we attempt to fit the dataset into 0.1 * memory_limit. This float value can be set with "memory_allocation": 0.1. We also allow for specifying absolute memory in MB, e.g. 10MB is "memory_allocation": 10.
The memory used by the dataset is checked after each reduction method is performed. If the dataset fits into the allocated memory, any further methods listed in "methods" will not be performed. For example, if methods: ["precision", "subsample"] and the "precision" reduction step was enough to make the dataset fit into memory, then the "subsample" reduction step will not be performed.
We provide the following methods for reducing the dataset size. These can be provided in a list and are performed in the order as given.
"precision"
- We reduce floating point precision as follows: * np.float128 -> np.float64
* np.float96 -> np.float64
* np.float64 -> np.float32
"subsample" - We subsample data such that it fits directly into the memory allocation memory_allocation * memory_limit. Therefore, this should likely be the last method listed in "methods". Subsampling takes into account classification labels and stratifies accordingly. We guarantee that at least one occurrence of each label is included in the sampled set.
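A brief configuration sketch, keeping only the "precision" method so rows are never subsampled (the allocation value is illustrative):

    import autosklearn.classification

    automl = autosklearn.classification.AutoSklearnClassifier(
        dataset_compression={
            # Let the dataset use up to 20% of memory_limit.
            "memory_allocation": 0.2,
            "methods": ["precision"],
        },
    )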
Whether auto-sklearn should process string features. By default, text preprocessing is enabled.