
Showing content from https://cloud.google.com/python/docs/reference/bigframes/2.5.0/bigframes.ml.model_selection below:

Module model_selection (2.5.0) | Python client library



Functions for test/train split and model tuning. This module is styled after scikit-learn's model_selection module: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection.

Classes

KFold

KFold(n_splits: int = 5, *, random_state: typing.Optional[int] = None)

K-Fold cross-validator.

Splits data into train/test sets by dividing the dataset into k consecutive folds.

Each fold is then used once as the validation set while the k - 1 remaining folds form the training set.
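
The fold rotation described above can be sketched in plain Python. This is only an illustration of the idea, not the bigframes implementation: `kfold_indices` is a hypothetical helper, it works on row positions rather than DataFrames, it does not shuffle, and spreading any remainder rows over the first folds is an assumption borrowed from common practice.

```python
def kfold_indices(n_rows, n_splits):
    """Yield (train_idx, test_idx) position lists for consecutive folds.

    Hypothetical helper for illustration; not part of bigframes.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    if n_splits > n_rows:
        raise ValueError("n_splits cannot exceed the number of rows")
    # Assumption: remainder rows go to the first folds, so fold sizes
    # differ by at most one.
    base, extra = divmod(n_rows, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test_idx = list(range(start, stop))
        # Every row outside the current fold belongs to the training set.
        train_idx = list(range(0, start)) + list(range(stop, n_rows))
        yield train_idx, test_idx
        start = stop


folds = list(kfold_indices(n_rows=3, n_splits=3))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"Fold {i}: train={train_idx} test={test_idx}")
# Fold 0: train=[1, 2] test=[0]
# Fold 1: train=[0, 2] test=[1]
# Fold 2: train=[0, 1] test=[2]
```

Each row position appears in exactly one test fold and in the training set of every other fold.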

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.model_selection import KFold
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({"feat0": [1, 3, 5], "feat1": [2, 4, 6]})
>>> y = bpd.DataFrame({"label": [1, 2, 3]})
>>> kf = KFold(n_splits=3, random_state=42)
>>> for i, (X_train, X_test, y_train, y_test) in enumerate(kf.split(X, y)):
...     print(f"Fold {i}:")
...     print(f"  X_train: {X_train}")
...     print(f"  X_test: {X_test}")
...     print(f"  y_train: {y_train}")
...     print(f"  y_test: {y_test}")
...
Fold 0:
  X_train:    feat0  feat1
1      3      4
2      5      6
<BLANKLINE>
[2 rows x 2 columns]
  X_test:    feat0  feat1
0      1      2
<BLANKLINE>
[1 rows x 2 columns]
  y_train:    label
1      2
2      3
<BLANKLINE>
[2 rows x 1 columns]
  y_test:    label
0      1
<BLANKLINE>
[1 rows x 1 columns]
Fold 1:
  X_train:    feat0  feat1
0      1      2
2      5      6
<BLANKLINE>
[2 rows x 2 columns]
  X_test:    feat0  feat1
1      3      4
<BLANKLINE>
[1 rows x 2 columns]
  y_train:    label
0      1
2      3
<BLANKLINE>
[2 rows x 1 columns]
  y_test:    label
1      2
<BLANKLINE>
[1 rows x 1 columns]
Fold 2:
  X_train:    feat0  feat1
0      1      2
1      3      4
<BLANKLINE>
[2 rows x 2 columns]
  X_test:    feat0  feat1
2      5      6
<BLANKLINE>
[1 rows x 2 columns]
  y_train:    label
0      1
1      2
<BLANKLINE>
[2 rows x 1 columns]
  y_test:    label
2      3
<BLANKLINE>
[1 rows x 1 columns]

Parameters

n_splits (int): Number of folds. Must be at least 2. Defaults to 5.

random_state (Optional[int]): A seed to use for randomly choosing the rows of the split. If not set, a different random split is generated each time. Defaults to None.
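
The effect of a fixed seed can be illustrated with the standard library alone. The source does not describe how bigframes randomizes rows internally, so this is only a sketch of the general idea: `shuffled_rows` is a hypothetical helper, and Python's `random.Random` stands in for whatever mechanism the library actually uses.

```python
import random


def shuffled_rows(n_rows, random_state=None):
    """Return row positions in random order (hypothetical helper).

    A fixed random_state repeats the same order on every call.
    """
    # random.Random(None) seeds itself from system entropy, so leaving
    # random_state unset is not reproducible between runs.
    rng = random.Random(random_state)
    order = list(range(n_rows))
    rng.shuffle(order)
    return order


first = shuffled_rows(10, random_state=42)
second = shuffled_rows(10, random_state=42)
print(first == second)  # True: same seed, same order
```

Fixing the seed is what makes a split repeatable across runs, which matters when comparing models on the same folds.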

Functions

cross_validate

cross_validate(
    estimator,
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
    *,
    cv: typing.Optional[typing.Union[int, bigframes.ml.model_selection.KFold]] = None
) -> dict[str, list]

Evaluate metric(s) by cross-validation and also record fit/score times.

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.model_selection import cross_validate, KFold
>>> from bigframes.ml.linear_model import LinearRegression
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({"feat0": [1, 3, 5], "feat1": [2, 4, 6]})
>>> y = bpd.DataFrame({"label": [1, 2, 3]})
>>> model = LinearRegression()
>>> scores = cross_validate(model, X, y, cv=3) # doctest: +SKIP
>>> for score in scores["test_score"]: # doctest: +SKIP
...   print(score["mean_squared_error"][0])
...
5.218167286047954e-19
2.726229944928669e-18
1.6197635612324266e-17
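
To make the shape of the result concrete, here is a plain-Python sketch of a cross-validation loop that fills a dictionary with the `test_score`, `fit_time`, and `score_time` keys. It is not the bigframes implementation: `MeanRegressor` and `cross_validate_sketch` are hypothetical names, the inputs are Python lists, and each score is a single float, whereas the doctest above indexes each bigframes score by metric name.

```python
import time


class MeanRegressor:
    """Toy estimator (hypothetical): predicts the mean training label."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def score(self, X, y):
        # Mean squared error of the constant prediction.
        return sum((yi - self.mean_) ** 2 for yi in y) / len(y)


def cross_validate_sketch(estimator, X, y, n_splits):
    """Fit and score once per fold, recording scores and timings."""
    results = {"test_score": [], "fit_time": [], "score_time": []}
    fold_size = len(X) // n_splits  # assumes len(X) divides evenly
    for fold in range(n_splits):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        X_train, y_train = X[:start] + X[stop:], y[:start] + y[stop:]
        X_test, y_test = X[start:stop], y[start:stop]

        t0 = time.perf_counter()
        estimator.fit(X_train, y_train)
        t1 = time.perf_counter()
        score = estimator.score(X_test, y_test)
        t2 = time.perf_counter()

        results["test_score"].append(score)
        results["fit_time"].append(t1 - t0)
        results["score_time"].append(t2 - t1)
    return results


X = [[1, 2], [3, 4], [5, 6]]
y = [1, 2, 3]
scores = cross_validate_sketch(MeanRegressor(), X, y, n_splits=3)
print(scores["test_score"])  # [2.25, 0.0, 2.25]
```

Every list in the result has one entry per fold, so the entries at the same position describe the same split.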

Returns

Dict[str, List]: A dict of arrays containing the score/time arrays for each scorer. The keys of this dict are:

test_score: The score array for test scores on each cv split.

fit_time: The time for fitting the estimator on the train set for each cv split.

score_time: The time for scoring the estimator on the test set for each cv split.

train_test_split

train_test_split(
    *arrays: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    test_size: typing.Optional[float] = None,
    train_size: typing.Optional[float] = None,
    random_state: typing.Optional[int] = None,
    stratify: typing.Optional[bigframes.series.Series] = None
) -> typing.List[typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]]

Splits dataframes or series into random train and test subsets.

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.model_selection import train_test_split
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({"feat0": [0, 2, 4, 6, 8], "feat1": [1, 3, 5, 7, 9]})
>>> y = bpd.DataFrame({"label": [0, 1, 2, 3, 4]})
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
>>> X_train
    feat0  feat1
0      0      1
1      2      3
4      8      9
<BLANKLINE>
[3 rows x 2 columns]
>>> y_train
    label
0      0
1      1
4      4
<BLANKLINE>
[3 rows x 1 columns]
>>> X_test
    feat0  feat1
2      4      5
3      6      7
<BLANKLINE>
[2 rows x 2 columns]
>>> y_test
    label
2      2
3      3
<BLANKLINE>
[2 rows x 1 columns]

Parameters

*arrays (bigframes.dataframe.DataFrame or bigframes.series.Series): A sequence of BigQuery DataFrames or Series that can be joined on their indexes.

test_size (default None): The proportion of the dataset to include in the test split. If None, it defaults to the complement of train_size. If both are None, it is set to 0.25.

train_size (default None): The proportion of the dataset to include in the train split. If None, it defaults to the complement of test_size.

random_state (default None): A seed to use for randomly choosing the rows of the split. If not set, a different random split is generated each time.
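
The defaulting rules for test_size and train_size can be written out as a small function. This is a sketch of the documented rules only: `resolve_split_sizes` is a hypothetical helper, and the range check at the end is an assumption, since the source does not say how bigframes validates these arguments or rounds fractions to row counts.

```python
def resolve_split_sizes(test_size=None, train_size=None):
    """Return (train_size, test_size) after applying the documented defaults.

    Hypothetical helper for illustration; not part of bigframes.
    """
    if test_size is None and train_size is None:
        test_size = 0.25  # documented default when neither is given
    if test_size is None:
        test_size = 1.0 - train_size  # complement of train_size
    if train_size is None:
        train_size = 1.0 - test_size  # complement of test_size
    # Assumption: both values are fractions and may not sum to more than 1.
    if not (0.0 <= test_size <= 1.0 and 0.0 <= train_size <= 1.0):
        raise ValueError("sizes must be fractions between 0 and 1")
    if train_size + test_size > 1.0 + 1e-9:
        raise ValueError("train_size + test_size must not exceed 1")
    return train_size, test_size


print(resolve_split_sizes())                # (0.75, 0.25)
print(resolve_split_sizes(train_size=0.5))  # (0.5, 0.5)
print(resolve_split_sizes(test_size=0.2))   # (0.8, 0.2)
```

When both sizes are passed explicitly they are returned unchanged, as long as they fit within the whole dataset.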

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-12 UTC.

