DataFrame(
data=None,
index: vendored_pandas_typing.Axes | None = None,
columns: vendored_pandas_typing.Axes | None = None,
dtype: typing.Optional[
bigframes.dtypes.DtypeString | bigframes.dtypes.Dtype
] = None,
copy: typing.Optional[bool] = None,
*,
session: typing.Optional[bigframes.session.Session] = None
)
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
Properties
T
The transpose of the DataFrame.
All columns must be the same dtype (numerics can be coerced to a common supertype).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
col1 col2
0 1 3
1 2 4
<BLANKLINE>
[2 rows x 2 columns]
>>> df.T
0 1
col1 1 2
col2 3 4
<BLANKLINE>
[2 rows x 2 columns]
Returns
bigframes.pandas.DataFrame: The transposed DataFrame.
ai
Returns the accessor for AI operators.
at
Access a single value for a row/column label pair.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
... index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> df
A B C
4 0 2 3
5 0 4 1
6 10 20 30
<BLANKLINE>
[3 rows x 3 columns]
Get value at specified row/column pair
>>> df.at[4, 'B']
np.int64(2)
Get value within a series
>>> df.loc[5].at['B']
np.int64(4)
bqclient
The BigQuery REST API client that the DataFrame uses for operations.
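For illustration, a minimal sketch (the client is a google.cloud.bigquery.Client; it requires an authenticated session, and the project name shown is hypothetical):
>>> import bigframes.pandas as bpd
>>> df = bpd.DataFrame({'a': [1]})  # doctest: +SKIP
>>> df.bqclient.project  # doctest: +SKIP
'my-project'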
columns
The column labels of the DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can access the column labels of a DataFrame via the columns property.
>>> df = bpd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
... 'Age': [25, 30, 35],
... 'Location': ['Seattle', 'New York', 'Kona']},
... index=([10, 20, 30]))
>>> df
Name Age Location
10 Alice 25 Seattle
20 Bob 30 New York
30 Aritra 35 Kona
<BLANKLINE>
[3 rows x 3 columns]
>>> df.columns
Index(['Name', 'Age', 'Location'], dtype='object')
You can also set new labels for columns.
>>> df.columns = ["NewName", "NewAge", "NewLocation"]
>>> df
NewName NewAge NewLocation
10 Alice 25 Seattle
20 Bob 30 New York
30 Aritra 35 Kona
<BLANKLINE>
[3 rows x 3 columns]
>>> df.columns
Index(['NewName', 'NewAge', 'NewLocation'], dtype='object')
dtypes
Return the dtypes in the DataFrame.
This returns a Series with the data type of each column. The result's index is the original DataFrame's columns. Columns with mixed types aren't supported yet in BigQuery DataFrames.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'float': [1.0], 'int': [1], 'string': ['foo']})
>>> df.dtypes
float Float64
int Int64
string string[pyarrow]
dtype: object
Returns
pandas.Series: A pandas Series with the data type of each column.
empty
Indicates whether Series/DataFrame is empty.
True if Series/DataFrame is entirely empty (no items), meaning any of the axes are of length 0.
Note: If Series/DataFrame contains only NA values, it is still not considered empty.
Returns
bool: If Series/DataFrame is empty, return True, if not return False.
iat
Access a single value for a row/column pair by integer position.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
... columns=['A', 'B', 'C'])
>>> df
A B C
0 0 2 3
1 0 4 1
2 10 20 30
<BLANKLINE>
[3 rows x 3 columns]
Get value at specified row/column pair
>>> df.iat[1, 2]
np.int64(1)
Get value within a series
>>> df.loc[0].iat[1]
np.int64(2)
iloc
Purely integer-location based indexing for selection by position.
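A minimal sketch of position-based selection, assuming pandas-like semantics (outputs omitted):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> df.iloc[0]       # first row, as a Series  # doctest: +SKIP
>>> df.iloc[1:3]     # rows at positions 1 and 2  # doctest: +SKIP
>>> df.iloc[[0, 2]]  # rows at positions 0 and 2  # doctest: +SKIP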
index
The index (row labels) of the DataFrame.
The index of a DataFrame is a series of labels that identify each row. The labels can be integers, strings, or any other hashable type. The index is used for label-based access and alignment, and can be accessed or modified using this attribute.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can access the index of a DataFrame via the index property.
>>> df = bpd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
... 'Age': [25, 30, 35],
... 'Location': ['Seattle', 'New York', 'Kona']},
... index=([10, 20, 30]))
>>> df
Name Age Location
10 Alice 25 Seattle
20 Bob 30 New York
30 Aritra 35 Kona
<BLANKLINE>
[3 rows x 3 columns]
>>> df.index # doctest: +ELLIPSIS
Index([10, 20, 30], dtype='Int64')
>>> df.index.values
array([10, 20, 30])
Let's try setting a new index for the DataFrame and see that reflected via the index property.
>>> df1 = df.set_index(["Name", "Location"])
>>> df1
Age
Name Location
Alice Seattle 25
Bob New York 30
Aritra Kona 35
<BLANKLINE>
[3 rows x 1 columns]
>>> df1.index # doctest: +ELLIPSIS
MultiIndex([( 'Alice', 'Seattle'),
( 'Bob', 'New York'),
('Aritra', 'Kona')],
names=['Name', 'Location'])
>>> df1.index.values
array([('Alice', 'Seattle'), ('Bob', 'New York'), ('Aritra', 'Kona')],
dtype=object)
Returns
Index: The index object of the DataFrame.
loc
Access a group of rows and columns by label(s) or a boolean array.
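A minimal sketch of label-based selection (outputs omitted; semantics mirror pandas):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])
>>> df.loc['x']          # row with label 'x'  # doctest: +SKIP
>>> df.loc[df['A'] > 1]  # rows where the boolean Series is True  # doctest: +SKIP
>>> df.loc[:, 'B']       # column 'B' as a Series  # doctest: +SKIP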
ndim
Return an int representing the number of axes / array dimensions.
Returns
int: Return 1 if Series. Otherwise return 2 if DataFrame.
plot
Make plots of DataFrames.
query_job
BigQuery job metadata for the most recent query.
Returns
None or google.cloud.bigquery.QueryJob: The most recent QueryJob.
semantics
API documentation for the semantics property.
shape
Return a tuple representing the dimensionality of the DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2, 3],
... 'col2': [4, 5, 6]})
>>> df.shape
(3, 2)
Returns
Tuple[int, int]: Tuple of array dimensions.
size
Return an int representing the number of elements in this object.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.size
3
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.size
4
Returns
int: Return the number of rows if Series. Otherwise return the number of rows times number of columns if DataFrame.
sql
Compiles this DataFrame's expression tree to SQL.
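For example (a sketch; the generated SQL depends on the expression tree and session, so the output is not shown):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a': [1, 2]})
>>> print(df.sql)  # doctest: +SKIP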
Returns
str: A string representing the compiled SQL.
struct
API documentation for the struct property.
values
Return the values of DataFrame in the form of a NumPy array.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.values
array([[1, 3],
[2, 4]], dtype=object)
Returns
numpy.ndarray: The values of the DataFrame.
Methods
__add__
__add__(other) -> bigframes.dataframe.DataFrame
Get addition of DataFrame and other, column-wise, using arithmetic operator +.
Equivalent to DataFrame.add(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'height': [1.5, 2.6],
... 'weight': [500, 800]
... },
... index=['elk', 'moose'])
>>> df
height weight
elk 1.5 500
moose 2.6 800
<BLANKLINE>
[2 rows x 2 columns]
Adding a scalar affects all rows and columns.
>>> df + 1.5
height weight
elk 3.0 501.5
moose 4.1 801.5
<BLANKLINE>
[2 rows x 2 columns]
You can add another DataFrame with index and columns aligned.
>>> delta = bpd.DataFrame({
... 'height': [0.5, 0.9],
... 'weight': [50, 80]
... },
... index=['elk', 'moose'])
>>> df + delta
height weight
elk 2.0 550
moose 3.5 880
<BLANKLINE>
[2 rows x 2 columns]
Adding any mis-aligned index and columns will result in invalid values.
>>> delta = bpd.DataFrame({
... 'depth': [0.5, 0.9, 1.0],
... 'weight': [50, 80, 100]
... },
... index=['elk', 'moose', 'bison'])
>>> df + delta
depth height weight
elk <NA> <NA> 550
moose <NA> <NA> 880
bison <NA> <NA> <NA>
<BLANKLINE>
[3 rows x 3 columns]
Parameters
other (scalar or DataFrame): Object to be added to the DataFrame.
Returns
bigframes.pandas.DataFrame: The result of adding other to the DataFrame.
__and__
__and__(
other: bool | int | bigframes.series.Series,
) -> bigframes.dataframe.DataFrame
Get bitwise AND of DataFrame and other, element-wise, using operator &.
Parameters
other (scalar, Series or DataFrame): Object to bitwise AND with the DataFrame.
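For illustration, a minimal sketch with boolean columns, assuming the element-wise semantics described above (output omitted):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a': [True, False], 'b': [True, True]})
>>> df & True  # AND with a boolean scalar  # doctest: +SKIP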
__array__
__array__(dtype=None, copy: typing.Optional[bool] = None) -> numpy.ndarray
Returns the rows as NumPy array.
Equivalent to DataFrame.to_numpy(dtype).
Users should not call this directly. Rather, it is invoked by numpy.array and numpy.asarray.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [11, 22, 33]})
>>> np.array(df)
array([[1, 11],
[2, 22],
[3, 33]], dtype=object)
>>> np.asarray(df)
array([[1, 11],
[2, 22],
[3, 33]], dtype=object)
Parameters
dtype (str or numpy.dtype, optional): The dtype to use for the resulting NumPy array. By default, the dtype is inferred from the data.
copy (bool or None, optional): Whether to copy the data. False is not supported.
Returns
numpy.ndarray: The rows in the DataFrame converted to a numpy.ndarray with the specified dtype.
__array_ufunc__
__array_ufunc__(
ufunc: numpy.ufunc, method: str, *inputs, **kwargs
) -> bigframes.dataframe.DataFrame
Used to support numpy ufuncs. See: https://numpy.org/doc/stable/reference/ufuncs.html
__eq__
__eq__(other) -> bigframes.dataframe.DataFrame
Check equality of DataFrame and other, element-wise, using logical operator ==.
Equivalent to DataFrame.eq(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'a': [0, 3, 4],
... 'b': [360, 0, 180]
... })
>>> df == 0
a b
0 True False
1 False True
2 False False
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to be compared to the DataFrame for equality.
Returns
bigframes.pandas.DataFrame: The result of comparing other to the DataFrame.
__floordiv__
Get integer division of DataFrame by other, using arithmetic operator //.
Equivalent to DataFrame.floordiv(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can divide by a scalar:
>>> df = bpd.DataFrame({"a": [15, 15, 15], "b": [30, 30, 30]})
>>> df // 2
a b
0 7 15
1 7 15
2 7 15
<BLANKLINE>
[3 rows x 2 columns]
You can also divide by another DataFrame with index and column labels aligned:
>>> divisor = bpd.DataFrame({"a": [2, 3, 4], "b": [5, 6, 7]})
>>> df // divisor
a b
0 7 6
1 5 5
2 3 4
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to divide the DataFrame by.
Returns
bigframes.pandas.DataFrame: The result of the integer division.
__ge__
__ge__(other) -> bigframes.dataframe.DataFrame
Check whether DataFrame is greater than or equal to other, element-wise, using logical operator >=.
Equivalent to DataFrame.ge(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'a': [0, -1, 1],
... 'b': [1, 0, -1]
... })
>>> df >= 0
a b
0 True True
1 False True
2 True False
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to be compared to the DataFrame.
Returns
bigframes.pandas.DataFrame: The result of comparing other to the DataFrame.
__getitem__
__getitem__(
key: typing.Union[
typing.Hashable,
typing.Sequence[typing.Hashable],
pandas.core.indexes.base.Index,
bigframes.series.Series,
],
)
Gets the specified column(s) from the DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... "name" : ["alpha", "beta", "gamma"],
... "age": [20, 30, 40],
... "location": ["WA", "NY", "CA"]
... })
>>> df
name age location
0 alpha 20 WA
1 beta 30 NY
2 gamma 40 CA
<BLANKLINE>
[3 rows x 3 columns]
You can specify a column label to retrieve the corresponding Series.
>>> df["name"]
0 alpha
1 beta
2 gamma
Name: name, dtype: string
You can specify a list of column labels to retrieve a Dataframe.
>>> df[["name", "age"]]
name age
0 alpha 20
1 beta 30
2 gamma 40
<BLANKLINE>
[3 rows x 2 columns]
You can specify a condition as a series of booleans to retrieve matching rows.
>>> df[df["age"] > 25]
name age location
1 beta 30 NY
2 gamma 40 CA
<BLANKLINE>
[2 rows x 3 columns]
You can specify a pandas Index with desired column labels.
>>> import pandas as pd
>>> df[pd.Index(["age", "location"])]
age location
0 20 WA
1 30 NY
2 40 CA
<BLANKLINE>
[3 rows x 2 columns]
Parameters
key (index): Index or list of indices. It can be a column label, a list of column labels, a Series of booleans, or a pandas Index of desired column labels.
__gt__
__gt__(other) -> bigframes.dataframe.DataFrame
Check whether DataFrame is greater than other, element-wise, using logical operator >.
Equivalent to DataFrame.gt(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'a': [0, -1, 1],
... 'b': [1, 0, -1]
... })
>>> df > 0
a b
0 False True
1 False False
2 True False
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to be compared to the DataFrame.
Returns
bigframes.pandas.DataFrame: The result of comparing other to the DataFrame.
__invert__
__invert__() -> bigframes.dataframe.DataFrame
Returns the bitwise inversion of the DataFrame, element-wise, using operator ~.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a':[True, False, True], 'b':[-1, 0, 1]})
>>> ~df
a b
0 False 0
1 True -1
2 False -2
<BLANKLINE>
[3 rows x 2 columns]
Returns
bigframes.pandas.DataFrame: The result of inverting elements in the input.
__le__
__le__(other) -> bigframes.dataframe.DataFrame
Check whether DataFrame is less than or equal to other, element-wise, using logical operator <=.
Equivalent to DataFrame.le(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'a': [0, -1, 1],
... 'b': [1, 0, -1]
... })
>>> df <= 0
a b
0 True False
1 True True
2 False True
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to be compared to the DataFrame.
Returns
bigframes.pandas.DataFrame: The result of comparing other to the DataFrame.
__len__
Returns the number of rows in the DataFrame; serves the len operator.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'a': [0, 1, 2],
... 'b': [3, 4, 5]
... })
>>> len(df)
3
__lt__
__lt__(other) -> bigframes.dataframe.DataFrame
Check whether DataFrame is less than other, element-wise, using logical operator <.
Equivalent to DataFrame.lt(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'a': [0, -1, 1],
... 'b': [1, 0, -1]
... })
>>> df < 0
a b
0 False False
1 True False
2 False True
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to be compared to the DataFrame.
Returns
bigframes.pandas.DataFrame: The result of comparing other to the DataFrame.
__matmul__
__matmul__(other) -> bigframes.dataframe.DataFrame
Compute the matrix multiplication between the DataFrame and other, using operator @.
Equivalent to DataFrame.dot(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> left = bpd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> left
0 1 2 3
0 0 1 -2 -1
1 1 1 1 1
<BLANKLINE>
[2 rows x 4 columns]
>>> right = bpd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> right
0 1
0 0 1
1 1 2
2 -1 -1
3 2 0
<BLANKLINE>
[4 rows x 2 columns]
>>> left @ right
0 1
0 1 4
1 2 2
<BLANKLINE>
[2 rows x 2 columns]
The operand can be a Series, in which case the result will also be a Series:
>>> right = bpd.Series([1, 2, -1,0])
>>> left @ right
0 4
1 2
dtype: Int64
Parameters
other (DataFrame or Series): Object to be matrix multiplied with the DataFrame.
Returns
DataFrame or Series: The result of the matrix multiplication.
__mod__
Get modulo of DataFrame with other, element-wise, using operator %.
Equivalent to DataFrame.mod(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can modulo with a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df % 3
a b
0 1 1
1 2 2
2 0 0
<BLANKLINE>
[3 rows x 2 columns]
You can also modulo with another DataFrame with index and column labels aligned:
>>> modulo = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df % modulo
a b
0 1 1
1 0 2
2 1 0
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to modulo the DataFrame by.
Returns
bigframes.pandas.DataFrame: The result of the modulo.
__mul__
Get multiplication of DataFrame with other, element-wise, using operator *.
Equivalent to DataFrame.mul(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can multiply with a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df * 3
a b
0 3 12
1 6 15
2 9 18
<BLANKLINE>
[3 rows x 2 columns]
You can also multiply with another DataFrame with index and column labels aligned:
>>> df1 = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df * df1
a b
0 2 12
1 4 15
2 6 18
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to multiply with the DataFrame.
Returns
bigframes.pandas.DataFrame: The result of the multiplication.
__ne__
__ne__(other) -> bigframes.dataframe.DataFrame
Check inequality of DataFrame and other, element-wise, using logical operator !=.
Equivalent to DataFrame.ne(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'a': [0, 3, 4],
... 'b': [360, 0, 180]
... })
>>> df != 0
a b
0 False True
1 True False
2 True True
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to be compared to the DataFrame for inequality.
Returns
bigframes.pandas.DataFrame: The result of comparing other to the DataFrame.
__or__
__or__(
other: bool | int | bigframes.series.Series,
) -> bigframes.dataframe.DataFrame
Get bitwise OR of DataFrame and other, element-wise, using operator |.
Parameters
other (scalar, Series or DataFrame): Object to bitwise OR with the DataFrame.
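For illustration, a minimal sketch with boolean columns (output omitted):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a': [True, False], 'b': [False, False]})
>>> df | True  # OR with a boolean scalar  # doctest: +SKIP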
__pow__
Get exponentiation of DataFrame with other, element-wise, using operator **.
Equivalent to DataFrame.pow(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can exponentiate with a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df ** 2
a b
0 1 16
1 4 25
2 9 36
<BLANKLINE>
[3 rows x 2 columns]
You can also exponentiate with another DataFrame with index and column labels aligned:
>>> exponent = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df ** exponent
a b
0 1 64
1 4 125
2 9 216
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to exponentiate the DataFrame with.
Returns
bigframes.pandas.DataFrame: The result of the exponentiation.
__radd__
__radd__(other) -> bigframes.dataframe.DataFrame
Get addition of DataFrame and other, column-wise, using arithmetic operator +.
Equivalent to DataFrame.add(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'height': [1.5, 2.6],
... 'weight': [500, 800]
... },
... index=['elk', 'moose'])
>>> df
height weight
elk 1.5 500
moose 2.6 800
<BLANKLINE>
[2 rows x 2 columns]
Adding a scalar affects all rows and columns.
>>> df + 1.5
height weight
elk 3.0 501.5
moose 4.1 801.5
<BLANKLINE>
[2 rows x 2 columns]
You can add another DataFrame with index and columns aligned.
>>> delta = bpd.DataFrame({
... 'height': [0.5, 0.9],
... 'weight': [50, 80]
... },
... index=['elk', 'moose'])
>>> df + delta
height weight
elk 2.0 550
moose 3.5 880
<BLANKLINE>
[2 rows x 2 columns]
Adding any mis-aligned index and columns will result in invalid values.
>>> delta = bpd.DataFrame({
... 'depth': [0.5, 0.9, 1.0],
... 'weight': [50, 80, 100]
... },
... index=['elk', 'moose', 'bison'])
>>> df + delta
depth height weight
elk <NA> <NA> 550
moose <NA> <NA> 880
bison <NA> <NA> <NA>
<BLANKLINE>
[3 rows x 3 columns]
Parameters
other (scalar or DataFrame): Object to be added to the DataFrame.
Returns
bigframes.pandas.DataFrame: The result of adding other to the DataFrame.
__rand__
__rand__(
other: bool | int | bigframes.series.Series,
) -> bigframes.dataframe.DataFrame
Get bitwise AND of DataFrame and other, element-wise, using operator &.
Parameters
other (scalar, Series or DataFrame): Object to bitwise AND with the DataFrame.
__repr__
Converts a DataFrame to a string. Calls to_pandas.
Only represents the first bigframes.options.display.max_rows rows.
__rfloordiv__
Get integer division of other by DataFrame.
Equivalent to DataFrame.rfloordiv(other).
Parameters
other (scalar or DataFrame): Object to divide by the DataFrame.
Returns
bigframes.pandas.DataFrame: The result of the integer division.
__rmod__
Get modulo of other by DataFrame.
Equivalent to DataFrame.rmod(other).
Parameters
other (scalar or DataFrame): Object to modulo by the DataFrame.
Returns
bigframes.pandas.DataFrame: The result of the modulo.
__rmul__
Get multiplication of DataFrame with other, element-wise, using operator *.
Equivalent to DataFrame.rmul(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can multiply with a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df * 3
a b
0 3 12
1 6 15
2 9 18
<BLANKLINE>
[3 rows x 2 columns]
You can also multiply with another DataFrame with index and column labels aligned:
>>> df1 = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df * df1
a b
0 2 12
1 4 15
2 6 18
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to multiply the DataFrame with.
Returns
bigframes.pandas.DataFrame: The result of the multiplication.
__ror__
__ror__(
other: bool | int | bigframes.series.Series,
) -> bigframes.dataframe.DataFrame
Get bitwise OR of DataFrame and other, element-wise, using operator |.
Parameters
other (scalar, Series or DataFrame): Object to bitwise OR with the DataFrame.
__rpow__
Get exponentiation of other with DataFrame, element-wise, using operator **.
Equivalent to DataFrame.rpow(other).
Parameters
other (scalar or DataFrame): Object to exponentiate with the DataFrame.
Returns
DataFrame: The result of the exponentiation.
__rsub__
Get subtraction of DataFrame from other, element-wise, using operator -.
Equivalent to DataFrame.rsub(other).
Parameters
other (scalar or DataFrame): Object to subtract the DataFrame from.
Returns
bigframes.pandas.DataFrame: The result of the subtraction.
__rtruediv__
Get division of other by DataFrame, element-wise, using operator /.
Equivalent to DataFrame.rtruediv(other).
Parameters
other (scalar or DataFrame): Object to divide by the DataFrame.
Returns
bigframes.pandas.DataFrame: The result of the division.
__rxor__
__rxor__(
other: bool | int | bigframes.series.Series,
) -> bigframes.dataframe.DataFrame
Get bitwise XOR of DataFrame and other, element-wise, using operator ^.
Parameters
other (scalar, Series or DataFrame): Object to bitwise XOR with the DataFrame.
__setitem__
__setitem__(key: str, value: SingleItemValue)
Modify or insert a column into the DataFrame.
Note: This does not modify the original table the DataFrame was derived from.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... "name" : ["alpha", "beta", "gamma"],
... "age": [20, 30, 40],
... "location": ["WA", "NY", "CA"]
... })
>>> df
name age location
0 alpha 20 WA
1 beta 30 NY
2 gamma 40 CA
<BLANKLINE>
[3 rows x 3 columns]
You can assign a constant to a new column.
>>> df["country"] = "USA"
>>> df
name age location country
0 alpha 20 WA USA
1 beta 30 NY USA
2 gamma 40 CA USA
<BLANKLINE>
[3 rows x 4 columns]
You can assign a Series to a new column.
>>> df["new_age"] = df["age"] + 5
>>> df
name age location country new_age
0 alpha 20 WA USA 25
1 beta 30 NY USA 35
2 gamma 40 CA USA 45
<BLANKLINE>
[3 rows x 5 columns]
You can assign a Series to an existing column.
>>> df["new_age"] = bpd.Series([29, 39, 19], index=[1, 2, 0])
>>> df
name age location country new_age
0 alpha 20 WA USA 19
1 beta 30 NY USA 29
2 gamma 40 CA USA 39
<BLANKLINE>
[3 rows x 5 columns]
Parameters
key (column index): It can be a new column to be inserted, or an existing column to be modified.
value (scalar or Series): Value to be assigned to the column.
__sub__
Get subtraction of other from DataFrame, element-wise, using operator -.
Equivalent to DataFrame.sub(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can subtract a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df - 2
a b
0 -1 2
1 0 3
2 1 4
<BLANKLINE>
[3 rows x 2 columns]
You can also subtract another DataFrame with index and column labels aligned:
>>> df1 = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df - df1
a b
0 -1 1
1 0 2
2 1 3
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to subtract from the DataFrame.
Returns
bigframes.pandas.DataFrame: The result of the subtraction.
__truediv__
Get division of DataFrame by other, element-wise, using operator /.
Equivalent to DataFrame.truediv(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can divide by a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df / 2
a b
0 0.5 2.0
1 1.0 2.5
2 1.5 3.0
<BLANKLINE>
[3 rows x 2 columns]
You can also divide by another DataFrame with index and column labels aligned:
>>> denominator = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df / denominator
a b
0 0.5 1.333333
1 1.0 1.666667
2 1.5 2.0
<BLANKLINE>
[3 rows x 2 columns]
Parameters
other (scalar or DataFrame): Object to divide the DataFrame by.
Returns
bigframes.pandas.DataFrame: The result of the division.
__xor__
__xor__(
other: bool | int | bigframes.series.Series,
) -> bigframes.dataframe.DataFrame
Get bitwise XOR of DataFrame and other, element-wise, using operator ^.
Parameters
other (scalar, Series or DataFrame): Object to bitwise XOR with the DataFrame.
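For illustration, a minimal sketch with boolean columns (output omitted):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a': [True, False], 'b': [True, True]})
>>> df ^ True  # XOR with a boolean scalar  # doctest: +SKIP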
abs
abs() -> bigframes.dataframe.DataFrame
Return a Series/DataFrame with absolute numeric value of each element.
This function only applies to elements that are all numeric.
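For example (a sketch in the style of the other examples; output omitted):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a': [-1, 2, -3], 'b': [-4.5, 5.5, -6.5]})
>>> df.abs()  # doctest: +SKIP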
Returns
bigframes.pandas.DataFrame or bigframes.pandas.Series: A Series or DataFrame containing the absolute value of each element.
add
add(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get addition of DataFrame and other, element-wise (binary operator +).
Equivalent to dataframe + other. With reverse version, radd.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].add(df['B'])
0 5
1 7
2 9
dtype: Int64
You can also use the arithmetic operator +:
>>> df['A'] + df['B']
0 5
1 7
2 9
dtype: Int64
Parameters
other (float, int, or Series): Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}): Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns
bigframes.pandas.DataFrame: DataFrame result of the arithmetic operation.
add_prefix
add_prefix(
prefix: str, axis: int | str | None = None
) -> bigframes.dataframe.DataFrame
Prefix labels with string prefix.
For Series, the row labels are prefixed. For DataFrame, the column labels are prefixed.
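For example (a sketch; output omitted):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> df.add_prefix('col_')  # column labels become 'col_A' and 'col_B'  # doctest: +SKIP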
Parameters
prefix (str): The string to add before each label.
axis (int or str or None, default None): {0 or 'index', 1 or 'columns', None}, default None. Axis to add prefix on.
Returns
bigframes.pandas.DataFrame or bigframes.pandas.Series: New Series or DataFrame with updated labels.
add_suffix
add_suffix(
suffix: str, axis: int | str | None = None
) -> bigframes.dataframe.DataFrame
Suffix labels with string suffix.
For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.
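For example (a sketch; output omitted):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> df.add_suffix('_right')  # column labels become 'A_right' and 'B_right'  # doctest: +SKIP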
Returns
bigframes.pandas.DataFrame or bigframes.pandas.Series: New Series or DataFrame with updated labels.
agg
agg(
func: typing.Union[
str,
typing.Sequence[str],
typing.Mapping[typing.Hashable, typing.Union[typing.Sequence[str], str]],
],
) -> bigframes.dataframe.DataFrame | bigframes.series.Series
Aggregate using one or more operations over columns.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
A B
0 3 1
1 1 2
2 2 3
<BLANKLINE>
[3 rows x 2 columns]
Using a single function:
>>> df.agg('sum')
A 6
B 6
dtype: Int64
Using a list of functions:
>>> df.agg(['sum', 'mean'])
A B
sum 6.0 6.0
mean 2.0 2.0
<BLANKLINE>
[2 rows x 2 columns]
Parameters
func (function): Function to use for aggregating the data. Accepted combinations are: string function name, list of function names, e.g. ['sum', 'mean'].
aggregate
aggregate(
func: typing.Union[
str,
typing.Sequence[str],
typing.Mapping[typing.Hashable, typing.Union[typing.Sequence[str], str]],
],
) -> bigframes.dataframe.DataFrame | bigframes.series.Series
Aggregate using one or more operations over columns.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
A B
0 3 1
1 1 2
2 2 3
<BLANKLINE>
[3 rows x 2 columns]
Using a single function:
>>> df.agg('sum')
A 6
B 6
dtype: Int64
Using a list of functions:
>>> df.agg(['sum', 'mean'])
A B
sum 6.0 6.0
mean 2.0 2.0
<BLANKLINE>
[2 rows x 2 columns]
Parameters
func (function): Function to use for aggregating the data. Accepted combinations are: string function name, list of function names, e.g. ['sum', 'mean'].
align
align(
other: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
join: str = "outer",
axis: typing.Optional[typing.Union[str, int]] = None,
) -> typing.Tuple[
typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
]
Align two objects on their axes with the specified join method.
Join method is specified for each axis Index.
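For example, a minimal sketch of an outer alignment on the index (outputs omitted):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> left = bpd.DataFrame({'a': [1, 2]}, index=[0, 1])
>>> right = bpd.DataFrame({'a': [3, 4]}, index=[1, 2])
>>> left_aligned, right_aligned = left.align(right, join='outer')  # doctest: +SKIP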
Parameters
join ({'outer', 'inner', 'left', 'right'}, default 'outer'): Type of alignment to be performed. left: use only keys from left frame, preserve key order. right: use only keys from right frame, preserve key order. outer: use union of keys from both frames, sort keys lexicographically. inner: use intersection of keys from both frames, preserve the order of the left keys.
axis (allowed axis of the other object, default None): Align on index (0), columns (1), or both (None).
Returns
Tuple[bigframes.pandas.DataFrame or bigframes.pandas.Series, type of other]: Aligned objects.
all
all(
axis: typing.Union[str, int] = 0, *, bool_only: bool = False
) -> bigframes.series.Series
Return whether all elements are True, potentially over an axis.
Returns True unless there is at least one element within a Series or along a DataFrame axis that is False or equivalent (e.g. zero or empty).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [True, True], "B": [False, False]})
>>> df
A B
0 True False
1 True False
<BLANKLINE>
[2 rows x 2 columns]
Checking if all values in each column are True (the default behavior without an explicit axis parameter):
>>> df.all()
A True
B False
dtype: boolean
Checking across rows to see if all values are True:
>>> df.all(axis=1)
0 False
1 False
dtype: boolean
Parameters
axis ({index (0), columns (1)}): Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
bool_only (bool, default False): Include only boolean columns.
any
any(
*, axis: typing.Union[str, int] = 0, bool_only: bool = False
) -> bigframes.series.Series
Return whether any element is True, potentially over an axis.
Returns False unless there is at least one element within a Series or along a DataFrame axis that is True or equivalent (e.g. non-zero or non-empty).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [True, True], "B": [False, False]})
>>> df
A B
0 True False
1 True False
<BLANKLINE>
[2 rows x 2 columns]
Checking if each column contains at least one True element (the default behavior without an explicit axis parameter):
>>> df.any()
A True
B False
dtype: boolean
Checking if each row contains at least one True element:
>>> df.any(axis=1)
0 True
1 True
dtype: boolean
Parameters
axis ({index (0), columns (1)}): Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
bool_only (bool, default False): Include only boolean columns.
apply
apply(func, *, axis=0, args: typing.Tuple = (), **kwargs)
Apply a function along an axis of the DataFrame.
Objects passed to the function are Series objects whose index is the DataFrame's index (axis=0) or the DataFrame's columns (axis=1). The final return type is inferred from the return type of the applied function.
The axis=1 scenario is in preview.
Examples:
>>> import bigframes.pandas as bpd
>>> import pandas as pd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
col1 col2
0 1 3
1 2 4
<BLANKLINE>
[2 rows x 2 columns]
>>> def square(x):
... return x * x
>>> df.apply(square)
col1 col2
0 1 9
1 4 16
<BLANKLINE>
[2 rows x 2 columns]
You could apply a user defined function to every row of the DataFrame by creating a remote function out of it, and using it with axis=1. Within the function, each row is passed as a pandas.Series. It is recommended to select only the necessary columns before calling apply(). Note: This feature is currently in preview.
>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def foo(row: pd.Series) -> int:
... result = 1
... result += row["col1"]
... result += row["col2"]*row["col2"]
... return result
>>> df[["col1", "col2"]].apply(foo, axis=1)
0 11
1 19
dtype: Int64
You could return an array output for every input row from the remote function.
>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def marks_analyzer(marks: pd.Series) -> list[float]:
... import statistics
... average = marks.mean()
... median = marks.median()
...     geometric_mean = statistics.geometric_mean(marks.values)
... harmonic_mean = statistics.harmonic_mean(marks.values)
... return [
... round(stat, 2) for stat in
...         (average, median, geometric_mean, harmonic_mean)
... ]
>>> df = bpd.DataFrame({
... "physics": [67, 80, 75],
... "chemistry": [88, 56, 72],
... "algebra": [78, 91, 79]
... }, index=["Alice", "Bob", "Charlie"])
>>> stats = df.apply(marks_analyzer, axis=1)
>>> stats
Alice [77.67 78. 77.19 76.71]
Bob [75.67 80. 74.15 72.56]
Charlie [75.33 75. 75.28 75.22]
dtype: list<item: double>[pyarrow]
You could also apply a remote function which accepts multiple parameters to every row of a DataFrame by using it with axis=1 if the DataFrame has a matching number of columns and data types. Note: This feature is currently in preview.
>>> df = bpd.DataFrame({
... 'col1': [1, 2],
... 'col2': [3, 4],
... 'col3': [5, 5]
... })
>>> df
col1 col2 col3
0 1 3 5
1 2 4 5
<BLANKLINE>
[2 rows x 3 columns]
>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def foo(x: int, y: int, z: int) -> float:
... result = 1
... result += x
... result += y/z
... return result
>>> df.apply(foo, axis=1)
0 2.6
1 3.8
dtype: Float64
Parameters
args (tuple): Positional arguments to pass to func in addition to the array/series.
func (function): Function to apply to each column or row. To apply to each row (i.e. when axis=1 is specified) the function can be one of two types: (1) it accepts a single input parameter of type Series, in which case each row is delivered to the function as a pandas Series; (2) it accepts one or more parameters, in which case column values are delivered to the function as separate arguments (mapping to those parameters) for each row. For this to work the DataFrame must have the same number of columns and matching data types.
axis ({index (0), columns (1)}): Axis along which the function is applied. Specify 0 or 'index' to apply function to each column. Specify 1 or 'columns' to apply function to each row.
Exceptions
ValueError: If a remote function is not provided when axis=1 is specified.
ValueError: If the number of input params in the remote function is not the same as the number of columns in the dataframe.
ValueError: If the dtypes of the columns in the dataframe are not compatible with the data types of the remote function input params.
Returns
bigframes.pandas.DataFrame or bigframes.pandas.Series: Result of applying func along the given axis of the DataFrame.
applymap
applymap(
func, na_action: typing.Optional[str] = None
) -> bigframes.dataframe.DataFrame
Apply a function to a Dataframe elementwise.
This method applies a function that accepts and returns a scalar to every element of a DataFrame.
Note: In pandas 2.1.0, DataFrame.applymap is deprecated and renamed to DataFrame.map.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Let's use the reuse=False flag to make sure a new remote_function is created every time we run the following code, but you can skip it to potentially reuse a previously deployed remote_function from the same user defined function.
>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def minutes_to_hours(x: int) -> float:
... return x/60
>>> df_minutes = bpd.DataFrame(
... {"system_minutes" : [0, 30, 60, 90, 120],
... "user_minutes" : [0, 15, 75, 90, 6]})
>>> df_minutes
system_minutes user_minutes
0 0 0
1 30 15
2 60 75
3 90 90
4 120 6
<BLANKLINE>
[5 rows x 2 columns]
>>> df_hours = df_minutes.map(minutes_to_hours)
>>> df_hours
system_minutes user_minutes
0 0.0 0.0
1 0.5 0.25
2 1.0 1.25
3 1.5 1.5
4 2.0 0.1
<BLANKLINE>
[5 rows x 2 columns]
If there are NA/None values in the data, you can ignore applying the remote function on such values by specifying na_action='ignore'.
>>> df_minutes = bpd.DataFrame(
... {
... "system_minutes" : [0, 30, 60, None, 90, 120, bpd.NA],
... "user_minutes" : [0, 15, 75, 90, 6, None, bpd.NA]
... }, dtype="Int64")
>>> df_hours = df_minutes.map(minutes_to_hours, na_action='ignore')
>>> df_hours
system_minutes user_minutes
0 0.0 0.0
1 0.5 0.25
2 1.0 1.25
3 <NA> 1.5
4 1.5 0.1
5 2.0 <NA>
6 <NA> <NA>
<BLANKLINE>
[7 rows x 2 columns]
Parameters
func (function): Python function wrapped by the remote_function decorator; returns a single value from a single value.
na_action (Optional[str], default None): {None, 'ignore'}, default None. If 'ignore', propagate NaN values without passing them to func.
Exceptions
TypeError: If the value provided for func is not callable.
ValueError: If the value provided for na_action is not None or 'ignore'.
Returns
bigframes.pandas.DataFrame: Transformed DataFrame.
area
area(
x: typing.Optional[typing.Hashable] = None,
y: typing.Optional[typing.Hashable] = None,
stacked: bool = True,
**kwargs
)
Draw a stacked area plot. An area plot displays quantitative data visually.
This function calls pandas.plot to generate a plot with a random sample of items. For consistent results, the random sampling is reproducible. Use the sampling_random_state parameter to modify the sampling seed.
Examples:
Draw an area plot based on basic business metrics:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame(
... {
... 'sales': [3, 2, 3, 9, 10, 6],
... 'signups': [5, 5, 6, 12, 14, 13],
... 'visits': [20, 42, 28, 62, 81, 50],
... },
... index=["01-31", "02-28", "03-31", "04-30", "05-31", "06-30"]
... )
>>> ax = df.plot.area()
Area plots are stacked by default. To produce an unstacked plot, pass stacked=False:
>>> ax = df.plot.area(stacked=False)
Draw an area plot for a single column:
>>> ax = df.plot.area(y='sales')
Draw with a different x:
>>> df = bpd.DataFrame({
... 'sales': [3, 2, 3],
... 'visits': [20, 42, 28],
... 'day': [1, 2, 3],
... })
>>> ax = df.plot.area(x='day')
Parameters
x (label or position, optional): Coordinates for the X axis. By default uses the index.
y (label or position, optional): Column to plot. By default uses all columns.
stacked (bool, default True): Area plots are stacked by default. Set to False to create an unstacked plot.
sampling_n (int, default 100): Number of random items for plotting.
sampling_random_state (int, default 0): Seed for random number generator.
Returns
matplotlib.axes.Axes or numpy.ndarray: Area plot, or array of area plots if subplots is True.
assign
assign(**kwargs) -> bigframes.dataframe.DataFrame
Assign new columns to a DataFrame.
Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.
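For example (a sketch; output omitted):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'temp_c': [17.0, 25.0]})
>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)  # doctest: +SKIP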
Note: Assigning multiple columns within the same assign is possible. Later items in '**kwargs' may refer to newly created or modified columns in 'df'; items are computed and assigned into 'df' in order.
Returns
bigframes.pandas.DataFrame: A new DataFrame with the new columns in addition to all the existing columns.
astype
astype(
dtype: typing.Union[
typing.Literal[
"boolean",
"Float64",
"Int64",
"int64[pyarrow]",
"string",
"string[pyarrow]",
"timestamp[us, tz=UTC][pyarrow]",
"timestamp[us][pyarrow]",
"date32[day][pyarrow]",
"time64[us][pyarrow]",
"decimal128(38, 9)[pyarrow]",
"decimal256(76, 38)[pyarrow]",
"binary[pyarrow]",
],
pandas.core.arrays.boolean.BooleanDtype,
pandas.core.arrays.floating.Float64Dtype,
pandas.core.arrays.integer.Int64Dtype,
pandas.core.arrays.string_.StringDtype,
pandas.core.dtypes.dtypes.ArrowDtype,
geopandas.array.GeometryDtype,
type,
dict[
str,
typing.Union[
typing.Literal[
"boolean",
"Float64",
"Int64",
"int64[pyarrow]",
"string",
"string[pyarrow]",
"timestamp[us, tz=UTC][pyarrow]",
"timestamp[us][pyarrow]",
"date32[day][pyarrow]",
"time64[us][pyarrow]",
"decimal128(38, 9)[pyarrow]",
"decimal256(76, 38)[pyarrow]",
"binary[pyarrow]",
],
pandas.core.arrays.boolean.BooleanDtype,
pandas.core.arrays.floating.Float64Dtype,
pandas.core.arrays.integer.Int64Dtype,
pandas.core.arrays.string_.StringDtype,
pandas.core.dtypes.dtypes.ArrowDtype,
geopandas.array.GeometryDtype,
],
],
],
*,
errors: typing.Literal["raise", "null"] = "raise"
) -> bigframes.dataframe.DataFrame
Cast a pandas object to a specified dtype.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Create a DataFrame:
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = bpd.DataFrame(data=d)
>>> df.dtypes
col1 Int64
col2 Int64
dtype: object
Cast all columns to Float64:
>>> df.astype('Float64').dtypes
col1 Float64
col2 Float64
dtype: object
Create a series of type Int64:
>>> ser = bpd.Series([2023010000246789, 1624123244123101, 1054834234120101], dtype='Int64')
>>> ser
0 2023010000246789
1 1624123244123101
2 1054834234120101
dtype: Int64
Convert to Float64 type:
>>> ser.astype('Float64')
0 2023010000246789.0
1 1624123244123101.0
2 1054834234120101.0
dtype: Float64
Convert to pd.ArrowDtype(pa.timestamp("us", tz="UTC")) type:
>>> ser.astype("timestamp[us, tz=UTC][pyarrow]")
0 2034-02-08 11:13:20.246789+00:00
1 2021-06-19 17:20:44.123101+00:00
2 2003-06-05 17:30:34.120101+00:00
dtype: timestamp[us, tz=UTC][pyarrow]
Note that this is equivalent to using to_datetime with unit='us':
>>> bpd.to_datetime(ser, unit='us', utc=True)
0 2034-02-08 11:13:20.246789+00:00
1 2021-06-19 17:20:44.123101+00:00
2 2003-06-05 17:30:34.120101+00:00
dtype: timestamp[us, tz=UTC][pyarrow]
Convert pd.ArrowDtype(pa.timestamp("us", tz="UTC")) type to Int64 type:
>>> timestamp_ser = ser.astype("timestamp[us, tz=UTC][pyarrow]")
>>> timestamp_ser.astype('Int64')
0 2023010000246789
1 1624123244123101
2 1054834234120101
dtype: Int64
Parameters
dtype (str, data type or pandas.ExtensionDtype): Dtypes supported by BigQuery DataFrames include 'boolean', 'Float64', 'Int64', 'int64[pyarrow]', 'string', 'string[pyarrow]', 'timestamp[us, tz=UTC][pyarrow]', 'timestamp[us][pyarrow]', 'date32[day][pyarrow]', 'time64[us][pyarrow]'. Supported pandas.ExtensionDtype values include pandas.BooleanDtype(), pandas.Float64Dtype(), pandas.Int64Dtype(), pandas.StringDtype(storage="pyarrow"), pd.ArrowDtype(pa.date32()), pd.ArrowDtype(pa.time64("us")), pd.ArrowDtype(pa.timestamp("us")), pd.ArrowDtype(pa.timestamp("us", tz="UTC")).
errors ({'raise', 'null'}, default 'raise'): Control raising of exceptions on invalid data for the provided dtype. If 'raise', allow exceptions to be raised if any value fails to cast. If 'null', assign a null value if a value fails to cast.
Returns
bigframes.pandas.DataFrame: A BigQuery DataFrame.
bar
bar(
x: typing.Optional[typing.Hashable] = None,
y: typing.Optional[typing.Hashable] = None,
**kwargs
)
Draw a vertical bar plot.
This function calls pandas.plot to generate a plot with a random sample of items. For consistent results, the random sampling is reproducible. Use the sampling_random_state parameter to modify the sampling seed.
Examples:
Basic plot.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'lab':['A', 'B', 'C'], 'val':[10, 30, 20]})
>>> ax = df.plot.bar(x='lab', y='val', rot=0)
Plot a whole dataframe to a bar plot. Each column is assigned a distinct color, and each row is nested in a group along the horizontal axis.
>>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
>>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
>>> index = ['snail', 'pig', 'elephant',
... 'rabbit', 'giraffe', 'coyote', 'horse']
>>> df = bpd.DataFrame({'speed': speed, 'lifespan': lifespan}, index=index)
>>> ax = df.plot.bar(rot=0)
Plot stacked bar charts for the DataFrame.
>>> ax = df.plot.bar(stacked=True)
If you don’t like the default colours, you can specify how you’d like each column to be colored.
>>> axes = df.plot.bar(
... rot=0, subplots=True, color={"speed": "red", "lifespan": "green"}
... )
Parameters
x (label or position, optional): Allows plotting of one column versus another. If not specified, the index of the DataFrame is used.
y (label or position, optional): Allows plotting of one column versus another. If not specified, all numerical columns are used.
Returns
matplotlib.axes.Axes or numpy.ndarray: Bar plot, or array of bar plots if subplots is True.
bfill
bfill(*, limit: typing.Optional[int] = None) -> bigframes.dataframe.DataFrame
Fill NA/NaN values by using the next valid observation to fill the gap.
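For example (a sketch; outputs omitted):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [None, 2, None], 'B': [1, None, 3]})
>>> df.bfill()         # fill each gap from the next valid row  # doctest: +SKIP
>>> df.bfill(limit=1)  # fill at most one consecutive NA per gap  # doctest: +SKIP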
Returns
bigframes.pandas.DataFrame or bigframes.pandas.Series or None: Object with missing values filled.
cache
Materializes the DataFrame to a temporary table.
Useful if the dataframe will be used multiple times, as this will avoid recomputing the shared intermediate value.
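For example (a sketch; cache executes a query, so it requires an active session):
>>> df = bpd.DataFrame({'a': [1, 2, 3]})  # doctest: +SKIP
>>> cached_df = df.cache()  # materialized once; later operations reuse the temporary table  # doctest: +SKIP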
Returns
bigframes.pandas.DataFrame: DataFrame.
combine
combine(
other: bigframes.dataframe.DataFrame,
func: typing.Callable[
[bigframes.series.Series, bigframes.series.Series], bigframes.series.Series
],
fill_value=None,
overwrite: bool = True,
*,
how: str = "outer"
) -> bigframes.dataframe.DataFrame
Perform column-wise combine with another DataFrame.
Combines a DataFrame with other DataFrame using func to element-wise combine columns. The row and column indexes of the resulting DataFrame will be the union of the two.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df1 = bpd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = bpd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
A B
0 0 3
1 0 3
<BLANKLINE>
[2 rows x 2 columns]
Parameters
other (DataFrame): The DataFrame to merge column-wise.
func (function): Function that takes two series as inputs and returns a Series or a scalar. Used to merge the two dataframes column by column.
fill_value (scalar value, default None): The value to fill NaNs with prior to passing any column to the merge func.
overwrite (bool, default True): If True, columns in self that do not exist in other will be overwritten with NaNs.
Exceptions
ValueError: If func return value is not Series.
Returns
bigframes.pandas.DataFrame: Combination of the provided DataFrames.
combine_first
combine_first(other: bigframes.dataframe.DataFrame)
Update null elements with value in the same location in other.
Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two. The resulting dataframe contains the 'first' dataframe values and overrides the second one values where both first.loc[index, col] and second.loc[index, col] are not missing values, upon calling first.combine_first(second).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df1 = bpd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = bpd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
A B
0 1.0 3.0
1 0.0 4.0
<BLANKLINE>
[2 rows x 2 columns]
Parameters
other (DataFrame): Provided DataFrame to use to fill null values.
Returns
bigframes.pandas.DataFrame: The result of combining the provided DataFrame with the other object.
copy
copy() -> bigframes.dataframe.DataFrame
Make a copy of this object's indices and data.
A new object will be created with a copy of the calling object's data and indices. Modifications to the data or indices of the copy will not be reflected in the original object.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Modification in the original Series will not affect the copy Series:
>>> s = bpd.Series([1, 2], index=["a", "b"])
>>> s
a 1
b 2
dtype: Int64
>>> s_copy = s.copy()
>>> s_copy
a 1
b 2
dtype: Int64
>>> s.loc['b'] = 22
>>> s
a 1
b 22
dtype: Int64
>>> s_copy
a 1
b 2
dtype: Int64
Modification in the original DataFrame will not affect the copy DataFrame:
>>> df = bpd.DataFrame({'a': [1, 3], 'b': [2, 4]})
>>> df
a b
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
>>> df_copy = df.copy()
>>> df_copy
a b
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
>>> df.loc[df["b"] == 2, "b"] = 22
>>> df
a b
0 1 22
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
>>> df_copy
a b
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
corr
corr(
method="pearson", min_periods=None, numeric_only=False
) -> bigframes.dataframe.DataFrame
Compute pairwise correlation of columns, excluding NA/null values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3],
... 'B': [400, 500, 600],
... 'C': [0.8, 0.4, 0.9]})
>>> df.corr(numeric_only=True)
A B C
A 1.0 1.0 0.188982
B 1.0 1.0 0.188982
C 0.188982 0.188982 1.0
<BLANKLINE>
[3 rows x 3 columns]
Parameters
method (string, default "pearson"): Correlation method to use; currently only "pearson" is supported.
min_periods (int, default None): The minimum number of observations needed to return a result. Non-default values are not yet supported, so a result will be returned for at least two observations.
numeric_only (bool, default False): Include only float, int, boolean, decimal data.
Returns
bigframes.pandas.DataFrame: Correlation matrix.
corrwith
corrwith(
other: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
*,
numeric_only: bool = False
)
Compute pairwise correlation.
Pairwise correlation is computed between rows or columns of DataFrame with rows or columns of Series or DataFrame. DataFrames are first aligned along both axes before computing the correlations.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> index = ["a", "b", "c", "d", "e"]
>>> columns = ["one", "two", "three", "four"]
>>> df1 = bpd.DataFrame(np.arange(20).reshape(5, 4), index=index, columns=columns)
>>> df2 = bpd.DataFrame(np.arange(16).reshape(4, 4), index=index[:4], columns=columns)
>>> df1.corrwith(df2)
one 1.0
two 1.0
three 1.0
four 1.0
dtype: Float64
Parameters
other (DataFrame, Series): Object with which to compute correlations.
numeric_only (bool, default False): Include only float, int or boolean data.
count
count(*, numeric_only: bool = False) -> bigframes.series.Series
Count non-NA cells for each column.
The values None, NaN, NaT, and optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, None, 3, 4, 5],
... "B": [1, 2, 3, 4, 5],
... "C": [None, 3.5, None, 4.5, 5.0]})
>>> df
A B C
0 1.0 1 <NA>
1 <NA> 2 3.5
2 3.0 3 <NA>
3 4.0 4 4.5
4 5.0 5 5.0
<BLANKLINE>
[5 rows x 3 columns]
Counting non-NA values for each column:
>>> df.count()
A 4
B 5
C 3
dtype: Int64
Parameters
numeric_only (bool, default False): Include only float, int or boolean data.
Returns
bigframes.pandas.Series: For each column/row the number of non-NA/null entries. If level is specified, returns a DataFrame.
cov
cov(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame
Compute pairwise covariance of columns, excluding NA/null values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3],
... 'B': [400, 500, 600],
... 'C': [0.8, 0.4, 0.9]})
>>> df.cov(numeric_only=True)
A B C
A 1.0 100.0 0.05
B 100.0 10000.0 5.0
C 0.05 5.0 0.07
<BLANKLINE>
[3 rows x 3 columns]
Parameters
numeric_only (bool, default False): Include only float, int, boolean, decimal data.
Returns
bigframes.pandas.DataFrame: The covariance matrix of the series of the DataFrame.
cummax
cummax() -> bigframes.dataframe.DataFrame
Return cumulative maximum over columns.
Returns a DataFrame of the same size containing the cumulative maximum.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
A B
0 3 1
1 1 2
2 2 3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.cummax()
A B
0 3 1
1 3 2
2 3 3
<BLANKLINE>
[3 rows x 2 columns]
Returns
bigframes.pandas.DataFrame: Return cumulative maximum of DataFrame.
cummin
cummin() -> bigframes.dataframe.DataFrame
Return cumulative minimum over columns.
Returns a DataFrame of the same size containing the cumulative minimum.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
A B
0 3 1
1 1 2
2 2 3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.cummin()
A B
0 3 1
1 1 1
2 1 1
<BLANKLINE>
[3 rows x 2 columns]
Returns
bigframes.pandas.DataFrame: Return cumulative minimum of DataFrame.
cumprod
cumprod() -> bigframes.dataframe.DataFrame
Return cumulative product over columns.
Returns a DataFrame of the same size containing the cumulative product.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
A B
0 3 1
1 1 2
2 2 3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.cumprod()
A B
0 3.0 1.0
1 3.0 2.0
2 6.0 6.0
<BLANKLINE>
[3 rows x 2 columns]
Exceptions
ValueError: If values are not of numeric type.
Returns
bigframes.pandas.DataFrame: Return cumulative product of DataFrame.
cumsum
Return cumulative sum over columns.
Returns a DataFrame of the same size containing the cumulative sum.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
A B
0 3 1
1 1 2
2 2 3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.cumsum()
A B
0 3 1
1 4 3
2 6 6
<BLANKLINE>
[3 rows x 2 columns]
Exceptions Type Description ValueError
If values are not of numeric type. Returns Type Description bigframes.pandas.DataFrame
Return cumulative sum of DataFrame. describe
describe(
include: typing.Union[None, typing.Literal["all"]] = None,
) -> bigframes.dataframe.DataFrame
Generate descriptive statistics.
Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.
Note: Percentile values are approximates only.
Note: For numeric data, the result's index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [0, 2, 8], "C": ["cat", "cat", "dog"]})
>>> df
A B C
0 3 0 cat
1 1 2 cat
2 2 8 dog
<BLANKLINE>
[3 rows x 3 columns]
Parameter Name Description include
"all" or None, optional
If "all": All columns of the input will be included in the output. If None: The result will include all numeric columns.
Exceptions Type Description ValueError
If unsupported include type is provided. Returns Type Description bigframes.pandas.DataFrame
Summary statistics of the Series or DataFrame provided. diff
diff(periods: int = 1) -> bigframes.dataframe.DataFrame
First discrete difference of element.
Calculates the difference of a DataFrame element compared with another element in the DataFrame (default is element in previous row).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
A B
0 3 1
1 1 2
2 2 3
<BLANKLINE>
[3 rows x 2 columns]
Calculating difference with default periods=1:
>>> df.diff()
A B
0 <NA> <NA>
1 -2 1
2 1 1
<BLANKLINE>
[3 rows x 2 columns]
Calculating difference with periods=-1:
>>> df.diff(periods=-1)
A B
0 2 -1
1 -1 -1
2 <NA> <NA>
<BLANKLINE>
[3 rows x 2 columns]
Parameter Name Description periods
int, default 1
Periods to shift for calculating difference, accepts negative values.
Returns Type Description bigframes.pandas.DataFrame
First differences of the DataFrame. div
div(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get floating division of DataFrame and other, element-wise (binary operator /).
Equivalent to dataframe / other. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].truediv(df['B'])
0 0.25
1 0.4
2 0.5
dtype: Float64
You can also use arithmetic operator /:
>>> df['A'] / (df['B'])
0 0.25
1 0.4
2 0.5
dtype: Float64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. divide
divide(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get floating division of DataFrame and other, element-wise (binary operator /).
Equivalent to dataframe / other. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].truediv(df['B'])
0 0.25
1 0.4
2 0.5
dtype: Float64
You can also use arithmetic operator /:
>>> df['A'] / (df['B'])
0 0.25
1 0.4
2 0.5
dtype: Float64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. dot
dot(other: _DataFrameOrSeries) -> _DataFrameOrSeries
Compute the matrix multiplication between the DataFrame and other.
This method computes the matrix product between the DataFrame and the values of an other Series or DataFrame.
It can also be called using self @ other.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> left = bpd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> left
0 1 2 3
0 0 1 -2 -1
1 1 1 1 1
<BLANKLINE>
[2 rows x 4 columns]
>>> right = bpd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> right
0 1
0 0 1
1 1 2
2 -1 -1
3 2 0
<BLANKLINE>
[4 rows x 2 columns]
>>> left.dot(right)
0 1
0 1 4
1 2 2
<BLANKLINE>
[2 rows x 2 columns]
You can also use the operator @ for the dot product:
>>> left @ right
0 1
0 1 4
1 2 2
<BLANKLINE>
[2 rows x 2 columns]
The right input can be a Series, in which case the result will also be a Series:
>>> right = bpd.Series([1, 2, -1, 0])
>>> left @ right
0 4
1 2
dtype: Int64
Any user defined index of the left matrix and columns of the right matrix will be reflected in the result.
>>> left = bpd.DataFrame([[1, 2, 3], [2, 5, 7]], index=["alpha", "beta"])
>>> left
0 1 2
alpha 1 2 3
beta 2 5 7
<BLANKLINE>
[2 rows x 3 columns]
>>> right = bpd.DataFrame([[2, 4, 8], [1, 5, 10], [3, 6, 9]], columns=["red", "green", "blue"])
>>> right
red green blue
0 2 4 8
1 1 5 10
2 3 6 9
<BLANKLINE>
[3 rows x 3 columns]
>>> left.dot(right)
red green blue
alpha 13 32 55
beta 30 75 129
<BLANKLINE>
[2 rows x 3 columns]
Parameter Name Description other
Series or DataFrame
The other object to compute the matrix product with.
Exceptions Type Description RuntimeError
If unable to construct all columns. Returns Type Description bigframes.pandas.DataFrame or bigframes.pandas.Series
If other is a Series, return the matrix product between self and other as a Series. If other is a DataFrame, return the matrix product of self and other in a DataFrame. drop
drop(
labels: typing.Any = None,
*,
axis: typing.Union[int, str] = 0,
index: typing.Any = None,
columns: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
level: typing.Optional[typing.Hashable] = None
) -> bigframes.dataframe.DataFrame
Drop specified labels from columns.
Remove columns by directly specifying column names.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame(np.arange(12).reshape(3, 4),
... columns=['A', 'B', 'C', 'D'])
>>> df
A B C D
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
<BLANKLINE>
[3 rows x 4 columns]
Drop columns:
>>> df.drop(['B', 'C'], axis=1)
A D
0 0 3
1 4 7
2 8 11
<BLANKLINE>
[3 rows x 2 columns]
>>> df.drop(columns=['B', 'C'])
A D
0 0 3
1 4 7
2 8 11
<BLANKLINE>
[3 rows x 2 columns]
Drop a row by index:
>>> df.drop([0, 1])
A B C D
2 8 9 10 11
<BLANKLINE>
[1 rows x 4 columns]
Drop columns and/or rows of MultiIndex DataFrame:
>>> import pandas as pd
>>> midx = pd.MultiIndex(levels=[['llama', 'cow', 'falcon'],
... ['speed', 'weight', 'length']],
... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
... [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> df = bpd.DataFrame(index=midx, columns=['big', 'small'],
... data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
... [250, 150], [1.5, 0.8], [320, 250],
... [1, 0.8], [0.3, 0.2]])
>>> df
big small
llama speed 45.0 30.0
weight 200.0 100.0
length 1.5 1.0
cow speed 30.0 20.0
weight 250.0 150.0
length 1.5 0.8
falcon speed 320.0 250.0
weight 1.0 0.8
length 0.3 0.2
<BLANKLINE>
[9 rows x 2 columns]
Drop a specific index and column combination from the MultiIndex DataFrame, i.e., drop the index 'cow' and column 'small':
>>> df.drop(index='cow', columns='small')
big
llama speed 45.0
weight 200.0
length 1.5
falcon speed 320.0
weight 1.0
length 0.3
<BLANKLINE>
[6 rows x 1 columns]
>>> df.drop(index='length', level=1)
big small
llama speed 45.0 30.0
weight 200.0 100.0
cow speed 30.0 20.0
weight 250.0 150.0
falcon speed 320.0 250.0
weight 1.0 0.8
<BLANKLINE>
[6 rows x 2 columns]
Exceptions Type Description KeyError
If any of the labels is not found in the selected axis. ValueError
If values for both labels and index/columns are provided. ValueError
If a multi-index tuple is provided as level. ValueError
If neither labels nor index/columns is provided. Returns Type Description bigframes.pandas.DataFrame
DataFrame without the removed column labels. drop_duplicates
drop_duplicates(
subset: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
*,
keep: str = "first"
) -> bigframes.dataframe.DataFrame
Return DataFrame with duplicate rows removed.
Considering certain columns is optional. Indexes, including time indexes, are ignored.
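A minimal illustrative example (added here for clarity, not from the upstream reference; values follow standard pandas drop_duplicates semantics and the rendering mirrors the other examples on this page):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a': [1, 1, 2], 'b': [1, 1, 3]})
>>> df.drop_duplicates()
a b
0 1 1
2 2 3
<BLANKLINE>
[2 rows x 2 columns]
>>> df.drop_duplicates(keep=False)
a b
2 2 3
<BLANKLINE>
[1 rows x 2 columns]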
Parameters Name Description subset
column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns.
keep
{'first', 'last', False}, default 'first'
Determines which duplicates (if any) to keep. - 'first' : Drop duplicates except for the first occurrence. - 'last' : Drop duplicates except for the last occurrence. - False : Drop all duplicates.
Returns Type Description bigframes.pandas.DataFrame
DataFrame with duplicates removed. droplevel
droplevel(
level: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
axis: int | str = 0,
)
Return DataFrame with requested index / column level(s) removed.
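An illustrative sketch (not from the upstream reference) of dropping a MultiIndex level, following the pd.MultiIndex pattern used in the drop example above; exact rendering of index names may differ:
>>> import bigframes.pandas as bpd
>>> import pandas as pd
>>> bpd.options.display.progress_bar = None
>>> midx = pd.MultiIndex.from_tuples([(0, 'a'), (1, 'b')], names=['l0', 'l1'])
>>> df = bpd.DataFrame({'x': [10, 20]}, index=midx)
>>> df.droplevel('l1')
x
l0
0 10
1 20
<BLANKLINE>
[2 rows x 1 columns]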
Parameters Name Description level
int, str, or list-like
If a string is given, it must be the name of a level. If list-like, elements must be names or positional indexes of levels.
axis
{0 or 'index', 1 or 'columns'}, default 0
Axis along which the level(s) is removed: * 0 or 'index': remove level(s) from the index. * 1 or 'columns': remove level(s) from the columns.
Exceptions Type Description ValueError
If columns are not multi-index. Returns Type Description bigframes.pandas.DataFrame
DataFrame with requested index / column level(s) removed. dropna
dropna(
*,
axis: int | str = 0,
how: str = "any",
subset: typing.Union[
None, typing.Hashable, typing.Sequence[typing.Hashable]
] = None,
inplace: bool = False,
ignore_index=False
) -> bigframes.dataframe.DataFrame
Remove missing values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
... "toy": [np.nan, 'Batmobile', 'Bullwhip'],
... "born": [bpd.NA, "1940-04-25", bpd.NA]})
>>> df
name toy born
0 Alfred <NA> <NA>
1 Batman Batmobile 1940-04-25
2 Catwoman Bullwhip <NA>
<BLANKLINE>
[3 rows x 3 columns]
Drop the rows where at least one element is missing:
>>> df.dropna()
name toy born
1 Batman Batmobile 1940-04-25
<BLANKLINE>
[1 rows x 3 columns]
Drop the columns where at least one element is missing.
>>> df.dropna(axis='columns')
name
0 Alfred
1 Batman
2 Catwoman
<BLANKLINE>
[3 rows x 1 columns]
Drop the rows where all elements are missing:
>>> df.dropna(how='all')
name toy born
0 Alfred <NA> <NA>
1 Batman Batmobile 1940-04-25
2 Catwoman Bullwhip <NA>
<BLANKLINE>
[3 rows x 3 columns]
Define in which columns to look for missing values.
>>> df.dropna(subset=['name', 'toy'])
name toy born
1 Batman Batmobile 1940-04-25
2 Catwoman Bullwhip <NA>
<BLANKLINE>
[2 rows x 3 columns]
Parameters Name Description axis
{0 or 'index', 1 or 'columns'}, default 0
Determine if rows or columns which contain missing values are removed. * 0, or 'index' : Drop rows which contain missing values. * 1, or 'columns' : Drop columns which contain missing value.
how
{'any', 'all'}, default 'any'
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA. * 'any' : If any NA values are present, drop that row or column. * 'all' : If all values are NA, drop that row or column.
subset
column label or sequence of labels, optional
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include. Only supports axis=0.
inplace
bool, default False
Not supported.
ignore_index
bool, default False
If True
, the resulting axis will be labeled 0, 1, …, n - 1.
Exceptions Type Description ValueError
If how is not one of 'any' or 'all'. Returns Type Description bigframes.pandas.DataFrame
DataFrame with NA entries dropped from it. duplicated
duplicated(subset=None, keep: str = "first") -> bigframes.series.Series
Return boolean Series denoting duplicate rows.
Considering certain columns is optional.
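A minimal illustrative example (added for clarity; behavior follows standard pandas duplicated semantics):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a': [1, 1, 2], 'b': [1, 1, 3]})
>>> df.duplicated()
0 False
1 True
2 False
dtype: boolean
>>> df.duplicated(keep='last')
0 True
1 False
2 False
dtype: boolean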
Parameters Name Description subset
column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns.
keep
{'first', 'last', False}, default 'first'
Determines which duplicates (if any) to mark. - 'first' : Mark duplicates as True except for the first occurrence. - 'last' : Mark duplicates as True except for the last occurrence. - False : Mark all duplicates as True.
Returns Type Description bigframes.pandas.Series
Boolean series indicating duplicated rows. eq
eq(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get equal to of DataFrame and other, element-wise (binary operator eq).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can use method name:
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].eq(360)
circle True
triangle False
rectangle True
Name: degrees, dtype: boolean
You can also use logical operator ==:
>>> df["degrees"] == 360
circle True
triangle False
rectangle True
Name: degrees, dtype: boolean
Parameters Name Description other
scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns (1 or 'columns').
Returns Type Description bigframes.pandas.DataFrame
Result of the comparison. equals
equals(
other: typing.Union[bigframes.series.Series, bigframes.dataframe.DataFrame],
) -> bool
Test whether two objects contain the same elements.
This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.
The row/column index does not need to have the same type, as long as the values are considered equal. Corresponding columns must be of the same dtype.
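A minimal illustrative example (added for clarity; semantics follow standard pandas equals, and the method returns a Python bool):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> same = bpd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df.equals(same)
True
>>> different = bpd.DataFrame({'a': [1, 2], 'b': [3, 5]})
>>> df.equals(different)
False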
Parameter Name Description other
Series or DataFrame
The other Series or DataFrame to be compared with the first.
Returns Type Description bool
True if all elements are the same in both objects, False otherwise. eval
eval(expr: str) -> bigframes.dataframe.DataFrame
Evaluate a string describing operations on DataFrame columns.
Operates on columns only, not specific rows or elements. This allows eval to run arbitrary code, which can make you vulnerable to code injection if you pass user input to this function.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
>>> df
A B
0 1 10
1 2 8
2 3 6
3 4 4
4 5 2
<BLANKLINE>
[5 rows x 2 columns]
>>> df.eval('A + B')
0 11
1 10
2 9
3 8
4 7
dtype: Int64
Assignment is allowed, though by default the original DataFrame is not modified.
>>> df.eval('C = A + B')
A B C
0 1 10 11
1 2 8 10
2 3 6 9
3 4 4 8
4 5 2 7
<BLANKLINE>
[5 rows x 3 columns]
>>> df
A B
0 1 10
1 2 8
2 3 6
3 4 4
4 5 2
<BLANKLINE>
[5 rows x 2 columns]
Multiple columns can be assigned to using multi-line expressions:
>>> df.eval(
... '''
... C = A + B
... D = A - B
... '''
... )
A B C D
0 1 10 11 -9
1 2 8 10 -6
2 3 6 9 -3
3 4 4 8 0
4 5 2 7 3
<BLANKLINE>
[5 rows x 4 columns]
Parameter Name Description expr
str
The expression string to evaluate.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result after the operation. expanding
expanding(min_periods: int = 1) -> bigframes.core.window.rolling.Window
Provide expanding window calculations.
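An illustrative sketch (not from the upstream reference; assumes an expanding sum over an Int64 column, with NA rendering as in the other examples here; the result dtype may differ in practice):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3]})
>>> df.expanding(min_periods=2).sum()
A
0 <NA>
1 3
2 6
<BLANKLINE>
[3 rows x 1 columns]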
Parameter Name Description min_periods
int, default 1
Minimum number of observations in window required to have a value; otherwise, result is np.nan.
explode
explode(
column: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
*,
ignore_index: typing.Optional[bool] = False
) -> bigframes.dataframe.DataFrame
Transform each element of an array to a row, replicating index values.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [[0, 1, 2], [], [], [3, 4]],
... 'B': 1,
... 'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})
>>> df.explode('A')
A B C
0 0 1 ['a' 'b' 'c']
0 1 1 ['a' 'b' 'c']
0 2 1 ['a' 'b' 'c']
1 <NA> 1 []
2 <NA> 1 []
3 3 1 ['d' 'e']
3 4 1 ['d' 'e']
<BLANKLINE>
[7 rows x 3 columns]
>>> df.explode(list('AC'))
A B C
0 0 1 a
0 1 1 b
0 2 1 c
1 <NA> 1 <NA>
2 <NA> 1 <NA>
3 3 1 d
3 4 1 e
<BLANKLINE>
[7 rows x 3 columns]
Parameters Name Description column
str, Sequence[str]
Column(s) to explode. For multiple columns, specify a non-empty list in which each element is a str or tuple; all specified columns must have list-like data of matching length within each row of the frame.
ignore_index
bool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.
Exceptions Type Description ValueError
* If columns of the frame are not unique. * If the specified columns to explode are an empty list. * If the specified columns to explode have mismatched element counts within a row. KeyError
If incorrect column names are provided. Returns Type Description bigframes.pandas.DataFrame
Exploded lists to rows of the subset columns; index will be duplicated for these rows. ffill
ffill(*, limit: typing.Optional[int] = None) -> bigframes.dataframe.DataFrame
Fill NA/NaN values by propagating the last valid observation forward to the next valid one.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([[np.nan, 2, np.nan, 0],
... [3, 4, np.nan, 1],
... [np.nan, np.nan, np.nan, np.nan],
... [np.nan, 3, np.nan, 4]],
... columns=list("ABCD")).astype("Float64")
>>> df
A B C D
0 <NA> 2.0 <NA> 0.0
1 3.0 4.0 <NA> 1.0
2 <NA> <NA> <NA> <NA>
3 <NA> 3.0 <NA> 4.0
<BLANKLINE>
[4 rows x 4 columns]
Fill NA/NaN values in DataFrames:
>>> df.ffill()
A B C D
0 <NA> 2.0 <NA> 0.0
1 3.0 4.0 <NA> 1.0
2 3.0 4.0 <NA> 1.0
3 3.0 3.0 <NA> 4.0
<BLANKLINE>
[4 rows x 4 columns]
Fill NA/NaN values in Series:
>>> series = bpd.Series([1, np.nan, 2, 3])
>>> series.ffill()
0 1.0
1 1.0
2 2.0
3 3.0
dtype: Float64
Returns Type Description bigframes.pandas.DataFrame or bigframes.pandas.Series or None
Object with missing values filled. fillna
fillna(value=None) -> bigframes.dataframe.DataFrame
Fill NA/NaN values using the specified method.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame([[np.nan, 2, np.nan, 0],
... [3, 4, np.nan, 1],
... [np.nan, np.nan, np.nan, np.nan],
... [np.nan, 3, np.nan, 4]],
... columns=list("ABCD")).astype("Float64")
>>> df
A B C D
0 <NA> 2.0 <NA> 0.0
1 3.0 4.0 <NA> 1.0
2 <NA> <NA> <NA> <NA>
3 <NA> 3.0 <NA> 4.0
<BLANKLINE>
[4 rows x 4 columns]
Replace all NA elements with 0s.
>>> df.fillna(0)
A B C D
0 0.0 2.0 0.0 0.0
1 3.0 4.0 0.0 1.0
2 0.0 0.0 0.0 0.0
3 0.0 3.0 0.0 4.0
<BLANKLINE>
[4 rows x 4 columns]
You can use fill values from another DataFrame:
>>> df_fill = bpd.DataFrame(np.arange(12).reshape(3, 4),
... columns=['A', 'B', 'C', 'D'])
>>> df_fill
A B C D
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
<BLANKLINE>
[3 rows x 4 columns]
>>> df.fillna(df_fill)
A B C D
0 0.0 2.0 2.0 0.0
1 3.0 4.0 6.0 1.0
2 8.0 9.0 10.0 11.0
3 <NA> 3.0 <NA> 4.0
<BLANKLINE>
[4 rows x 4 columns]
Parameter Name Description value
scalar, Series
Value to use to fill holes (e.g. 0), alternately a Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the Series will not be filled. This value cannot be a list.
Returns Type Description bigframes.pandas.DataFrame
Object with missing values filled. filter
filter(
items: typing.Optional[typing.Iterable] = None,
like: typing.Optional[str] = None,
regex: typing.Optional[str] = None,
axis: int | str | None = None,
) -> bigframes.dataframe.DataFrame
Subset the dataframe rows or columns according to the specified index labels.
Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.
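A minimal illustrative example (added for clarity; follows standard pandas filter semantics):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})
>>> df.filter(items=['one', 'three'])
one three
0 1 5
1 2 6
<BLANKLINE>
[2 rows x 2 columns]
>>> df.filter(regex='e$')
one three
0 1 5
1 2 6
<BLANKLINE>
[2 rows x 2 columns]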
Parameters Name Description items
list-like
Keep labels from axis which are in items.
like
str
Keep labels from axis for which "like in label == True".
regex
str (regular expression)
Keep labels from axis for which re.search(regex, label) == True.
axis
{0 or 'index', 1 or 'columns', None}, default None
The axis to filter on, expressed either as an index (int) or axis name (str). By default this is the info axis, 'columns' for DataFrame. For Series this parameter is unused and defaults to None.
Exceptions Type Description ValueError
If value provided is not exactly one of items, like, or regex. first_valid_index
API documentation for first_valid_index method. floordiv
floordiv(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get integer division of DataFrame and other, element-wise (binary operator //).
Equivalent to dataframe // other. With reverse version, rfloordiv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].floordiv(df['B'])
0 0
1 0
2 0
dtype: Int64
You can also use arithmetic operator //:
>>> df['A'] // (df['B'])
0 0
1 0
2 0
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. from_dict
from_dict(
data: dict, orient: str = "columns", dtype=None, columns=None
) -> bigframes.dataframe.DataFrame
Construct DataFrame from dict of array-like or dicts.
Creates DataFrame object from dictionary by columns or by index allowing dtype specification.
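A minimal illustrative example (added for clarity; mirrors the standard pandas from_dict usage with the default 'columns' orientation):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> data = {'col_1': [3, 2, 1], 'col_2': ['a', 'b', 'c']}
>>> bpd.DataFrame.from_dict(data)
col_1 col_2
0 3 a
1 2 b
2 1 c
<BLANKLINE>
[3 rows x 2 columns]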
Parameters Name Description data
dict
Of the form {field : array-like} or {field : dict}.
orient
{'columns', 'index', 'tight'}, default 'columns'
The "orientation" of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass 'columns' (default). Otherwise if the keys should be rows, pass 'index'. If 'tight', assume a dict with keys ['index', 'columns', 'data', 'index_names', 'column_names'].
dtype
dtype, default None
Data type to force after DataFrame construction, otherwise infer.
columns
list, default None
Column labels to use when orient='index'.
Exceptions Type Description ValueError
If columns is used with orient='columns' or orient='tight'. Returns Type Description bigframes.pandas.DataFrame
DataFrame. from_records
from_records(
data,
index=None,
exclude=None,
columns=None,
coerce_float: bool = False,
nrows: typing.Optional[int] = None,
) -> bigframes.dataframe.DataFrame
Convert structured or record ndarray to DataFrame.
Creates a DataFrame object from a structured ndarray, sequence of tuples or dicts, or DataFrame.
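A minimal illustrative example (added for clarity; assumes the classmethod accepts a sequence of tuples, as in pandas):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> records = [(1, 'a'), (2, 'b')]
>>> bpd.DataFrame.from_records(records, columns=['id', 'letter'])
id letter
0 1 a
1 2 b
<BLANKLINE>
[2 rows x 2 columns]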
Parameters Name Description data
structured ndarray, sequence of tuples or dicts
Structured input data.
index
str, list of fields, array-like
Field of array to use as the index, alternately a specific set of input labels to use.
exclude
sequence, default None
Columns or fields to exclude.
columns
sequence, default None
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns).
coerce_float
bool, default False
Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.
nrows
int, default None
Number of rows to read if data is an iterator.
Returns Type Description bigframes.pandas.DataFrame
DataFrame. ge
ge(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get 'greater than or equal to' of DataFrame and other, element-wise (binary operator >=).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
NaN values in floating point columns are considered different (i.e. NaN != NaN).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can use method name:
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].ge(360)
circle True
triangle False
rectangle True
Name: degrees, dtype: boolean
You can also use arithmetic operator >=:
>>> df["degrees"] >= 360
circle True
triangle False
rectangle True
Name: degrees, dtype: boolean
Parameters Name Description other
scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns (1 or 'columns').
Returns Type Description bigframes.pandas.DataFrame
DataFrame of bool. The result of the comparison. groupby
groupby(
by: typing.Optional[
typing.Union[
typing.Hashable,
bigframes.series.Series,
typing.Sequence[typing.Union[typing.Hashable, bigframes.series.Series]],
]
] = None,
*,
level: typing.Optional[
typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
] = None,
as_index: bool = True,
dropna: bool = True
) -> bigframes.core.groupby.dataframe_group_by.DataFrameGroupBy
Group DataFrame by columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'Animal': ['Falcon', 'Falcon',
... 'Parrot', 'Parrot'],
... 'Max Speed': [380., 370., 24., 26.]})
>>> df
Animal Max Speed
0 Falcon 380.0
1 Falcon 370.0
2 Parrot 24.0
3 Parrot 26.0
<BLANKLINE>
[4 rows x 2 columns]
>>> df.groupby(['Animal'])['Max Speed'].mean()
Animal
Falcon 375.0
Parrot 25.0
Name: Max Speed, dtype: Float64
We can also choose to include NA in group keys or not by setting dropna:
>>> df = bpd.DataFrame([[1, 2, 3],[1, None, 4], [2, 1, 3], [1, 2, 2]],
... columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum()
a c
b
1.0 2 3
2.0 2 5
<BLANKLINE>
[2 rows x 2 columns]
>>> df.groupby(by=["b"], dropna=False).sum()
a c
b
1.0 2 3
2.0 2 5
<NA> 1 4
<BLANKLINE>
[3 rows x 2 columns]
We can also choose to return object with group labels or not by setting as_index:
>>> df.groupby(by=["b"], as_index=False).sum()
b a c
0 1.0 2 3
1 2.0 2 5
<BLANKLINE>
[2 rows x 3 columns]
Parameters Name Description by
str, Sequence[str]
A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
level
int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
as_index
bool, default True
Default True. Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively "SQL-style" grouped output. This argument has no effect on filtrations such as head(), tail(), nth() and in transformations.
dropna
bool, default True
Default True. If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
Exceptions Type Description ValueError
If both by and level are specified. TypeError
If neither by nor level is specified. gt
gt(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get 'greater than' of DataFrame and other, element-wise (binary operator >).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
NaN values in floating point columns are considered different (i.e. NaN != NaN).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].gt(360)
circle False
triangle False
rectangle False
Name: degrees, dtype: boolean
You can also use arithmetic operator >:
>>> df["degrees"] > 360
circle False
triangle False
rectangle False
Name: degrees, dtype: boolean
Parameters Name Description other
scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns (1 or 'columns').
Returns Type Description bigframes.pandas.DataFrame
DataFrame of bool: The result of the comparison. head
head(n: int = 5) -> bigframes.dataframe.DataFrame
Return the first n rows.
This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.
For negative values of n, this function returns all rows except the last |n| rows, equivalent to df[:n].
If n is larger than the number of rows, this function returns all rows.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
<BLANKLINE>
[9 rows x 1 columns]
Viewing the first 5 lines:
>>> df.head()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
<BLANKLINE>
[5 rows x 1 columns]
Viewing the first n lines (three in this case):
>>> df.head(3)
animal
0 alligator
1 bee
2 falcon
<BLANKLINE>
[3 rows x 1 columns]
For negative values of n:
>>> df.head(-3)
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
<BLANKLINE>
[6 rows x 1 columns]
Parameter Name Description n
int, default 5
Default 5. Number of rows to select.
Returns Type Description bigframes.pandas.DataFrame or bigframes.pandas.Series
The first n rows of the caller object. hist
hist(by: typing.Optional[typing.Sequence[str]] = None, bins: int = 10, **kwargs)
Draw one histogram of the DataFrame’s columns.
A histogram is a representation of the distribution of data. This function groups the values of all given Series in the DataFrame into bins and draws all bins in one matplotlib.axes.Axes. This is useful when the DataFrame's Series are in a similar scale.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame(np.random.randint(1, 7, 6000), columns=['one'])
>>> df['two'] = np.random.randint(1, 7, 6000) + np.random.randint(1, 7, 6000)
>>> ax = df.plot.hist(bins=12, alpha=0.5)
Parameters Name Description by
str or sequence, optional
Column in the DataFrame to group by. It is not supported yet.
bins
int, default 10
Number of histogram bins to be used.
Returns Type Description matplotlib.AxesSubplot
A histogram plot. idxmax
idxmax() -> bigframes.series.Series
Return index of first occurrence of maximum over columns.
NA/null values are excluded.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
A B
0 3 1
1 1 2
2 2 3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.idxmax()
A 0
B 2
dtype: Int64
idxmin
idxmin() -> bigframes.series.Series
Return index of first occurrence of minimum over columns.
NA/null values are excluded.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
A B
0 3 1
1 1 2
2 2 3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.idxmin()
A 1
B 0
dtype: Int64
info
info(
verbose: typing.Optional[bool] = None,
buf=None,
max_cols: typing.Optional[int] = None,
memory_usage: typing.Optional[bool] = None,
show_counts: typing.Optional[bool] = None,
)
Print a concise summary of a DataFrame.
This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.
Parameters Name Description verbose
bool, optional
Whether to print the full summary. By default, the setting in pandas.options.display.max_info_columns is followed.
buf
writable buffer, defaults to sys.stdout
Where to send the output. By default, the output is printed to sys.stdout. Pass a writable buffer if you need to further process the output.
max_cols
int, optional
When to switch from the verbose to the truncated output. If the DataFrame has more than max_cols columns, the truncated output is used. By default, the setting in pandas.options.display.max_info_columns is used.
memory_usage
bool, optional
Specifies whether total memory usage of the DataFrame elements (including the index) should be displayed. By default, this follows the pandas.options.display.memory_usage setting. True always shows memory usage; False never shows it. The memory estimate is based on the column dtypes and the number of rows, assuming values consume the same amount of memory for corresponding dtypes.
show_counts
bool, optional
Whether to show the non-null counts. By default, this is shown only if the DataFrame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns. A value of True always shows the counts, and False never shows the counts.
Returns Type Description None
This method prints a summary of a DataFrame and returns None. insert
insert(
loc: int,
column: blocks.Label,
value: SingleItemValue,
allow_duplicates: bool = False,
)
Insert column into DataFrame at specified location.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
Insert a new column named 'col3' between 'col1' and 'col2' with all entries set to 5.
>>> df.insert(1, 'col3', 5)
>>> df
col1 col3 col2
0 1 5 3
1 2 5 4
<BLANKLINE>
[2 rows x 3 columns]
Insert another column named 'col2' at the beginning of the DataFrame with values [5, 6]:
>>> df.insert(0, 'col2', [5, 6], allow_duplicates=True)
>>> df
col2 col1 col3 col2
0 5 1 5 3
1 6 2 5 4
<BLANKLINE>
[2 rows x 4 columns]
Parameters Name Description loc
int
Insertion index. Must verify 0 <= loc <= len(columns).
column
str, number, or hashable object
Label of the inserted column.
value
Scalar, Series, or array-like
Content of the inserted column.
allow_duplicates
bool, default False
Allow duplicate column labels to be created.
Exceptions Type Description IndexError
If column index is out of bounds with the total count of columns. ValueError
If column is already contained in the DataFrame, unless allow_duplicates is set to True. interpolate
interpolate(method: str = "linear") -> bigframes.dataframe.DataFrame
Fill NaN values using an interpolation method.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3, None, None, 6],
... 'B': [None, 6, None, 2, None, 3],
... }, index=[0, 0.1, 0.3, 0.7, 0.9, 1.0])
>>> df.interpolate()
A B
0.0 1.0 <NA>
0.1 2.0 6.0
0.3 3.0 4.0
0.7 4.0 2.0
0.9 5.0 2.5
1.0 6.0 3.0
<BLANKLINE>
[6 rows x 2 columns]
>>> df.interpolate(method="values")
A B
0.0 1.0 <NA>
0.1 2.0 6.0
0.3 3.0 4.666667
0.7 4.714286 2.0
0.9 5.571429 2.666667
1.0 6.0 3.0
<BLANKLINE>
[6 rows x 2 columns]
Parameter Name Description method
str, default 'linear'
Interpolation technique to use. 'linear': Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes. 'index', 'values': use the actual numerical values of the index. 'pad': Fill in NaNs using existing values. 'nearest', 'zero', 'slinear': Emulates scipy.interpolate.interp1d.
Returns Type Description bigframes.pandas.DataFrame
Returns the same object type as the caller, interpolated at some or all NaN values. isin
isin(values) -> bigframes.dataframe.DataFrame
Whether each element in the DataFrame is contained in values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
... index=['falcon', 'dog'])
>>> df
num_legs num_wings
falcon 2 2
dog 4 0
<BLANKLINE>
[2 rows x 2 columns]
When values is a list, check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings).
>>> df.isin([0, 2])
num_legs num_wings
falcon True True
dog False True
<BLANKLINE>
[2 rows x 2 columns]
When values is a dict, we can pass it to check for each column separately:
>>> df.isin({'num_wings': [0, 3]})
num_legs num_wings
falcon False False
dog False True
<BLANKLINE>
[2 rows x 2 columns]
Parameter Name Description values
iterable, or dict
The result will only be true at a location if all the labels match. If values is a dict, the keys must be the column names, which must match.
Exceptions Type Description TypeError
If values provided are not list-like objects. Returns Type Description bigframes.pandas.DataFrame
DataFrame of booleans showing whether each element in the DataFrame is contained in values. isna
isna() -> bigframes.dataframe.DataFrame
Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame(dict(
... age=[5, 6, np.nan],
... born=[bpd.NA, "1940-04-25", "1940-04-25"],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker'],
... ))
>>> df
age born name toy
0 5.0 <NA> Alfred <NA>
1 6.0 1940-04-25 Batman Batmobile
2 <NA> 1940-04-25 Joker
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a DataFrame are NA:
>>> df.isna()
age born name toy
0 False True False True
1 False False False False
2 True False False False
<BLANKLINE>
[3 rows x 4 columns]
>>> df.isnull()
age born name toy
0 False True False True
1 False False False False
2 True False False False
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a Series are NA:
>>> ser = bpd.Series([5, None, 6, np.nan, bpd.NA])
>>> ser
0 5
1 <NA>
2 6
3 <NA>
4 <NA>
dtype: Int64
>>> ser.isna()
0 False
1 True
2 False
3 True
4 True
dtype: boolean
>>> ser.isnull()
0 False
1 True
2 False
3 True
4 True
dtype: boolean
Returns Type Description bigframes.pandas.DataFrame or bigframes.pandas.Series
Mask of bool values for each element that indicates whether an element is an NA value. isnull
isnull() -> bigframes.dataframe.DataFrame
Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame(dict(
... age=[5, 6, np.nan],
... born=[bpd.NA, "1940-04-25", "1940-04-25"],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker'],
... ))
>>> df
age born name toy
0 5.0 <NA> Alfred <NA>
1 6.0 1940-04-25 Batman Batmobile
2 <NA> 1940-04-25 Joker
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a DataFrame are NA:
>>> df.isna()
age born name toy
0 False True False True
1 False False False False
2 True False False False
<BLANKLINE>
[3 rows x 4 columns]
>>> df.isnull()
age born name toy
0 False True False True
1 False False False False
2 True False False False
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a Series are NA:
>>> ser = bpd.Series([5, None, 6, np.nan, bpd.NA])
>>> ser
0 5
1 <NA>
2 6
3 <NA>
4 <NA>
dtype: Int64
>>> ser.isna()
0 False
1 True
2 False
3 True
4 True
dtype: boolean
>>> ser.isnull()
0 False
1 True
2 False
3 True
4 True
dtype: boolean
Returns Type Description bigframes.pandas.DataFrame or bigframes.pandas.Series
Mask of bool values for each element that indicates whether an element is an NA value. items
Iterate over (column name, Series) pairs.
Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
... 'population': [1864, 22000, 80000]},
... index=['panda', 'polar', 'koala'])
>>> df
species population
panda bear 1864
polar bear 22000
koala marsupial 80000
<BLANKLINE>
[3 rows x 2 columns]
>>> for label, content in df.items():
... print(f'--> label: {label}')
... print(f'--> content:\n{content}')
...
--> label: species
--> content:
panda bear
polar bear
koala marsupial
Name: species, dtype: string
--> label: population
--> content:
panda 1864
polar 22000
koala 80000
Name: population, dtype: Int64
Returns Type Description Iterator
Iterator of label, Series for each column. iterrows
iterrows() -> typing.Iterable[tuple[typing.Any, pandas.core.series.Series]]
Iterate over DataFrame rows as (index, Series) pairs.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
>>> index, row = next(df.iterrows())
>>> index
np.int64(0)
>>> row
A 1
B 4
Name: 0, dtype: object
Returns Type Description Iterable[Tuple]
An iterable of (index, Series) pairs, where the Series holds the values of each row. itertuples
itertuples(
index: bool = True, name: typing.Optional[str] = "Pandas"
) -> typing.Iterable[tuple[typing.Any, ...]]
Iterate over DataFrame rows as namedtuples.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
>>> next(df.itertuples(name="Pair"))
Pair(Index=np.int64(0), A=np.int64(1), B=np.int64(4))
Parameters Name Description index
bool, default True
If True, return the index as the first element of the tuple.
name
str or None, default "Pandas"
The name of the returned namedtuples or None to return regular tuples.
Returns Type Description Iterable[Tuple]
An object to iterate over namedtuples for each row in the DataFrame with the first field possibly being the index and following fields being the column values. join
join(
other: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
*,
on: typing.Optional[str] = None,
how: str = "left"
) -> bigframes.dataframe.DataFrame
Join columns of another DataFrame.
Join columns with other DataFrame on index.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Join two DataFrames by specifying how to handle the operation:
>>> df1 = bpd.DataFrame({'col1': ['foo', 'bar'], 'col2': [1, 2]}, index=[10, 11])
>>> df1
col1 col2
10 foo 1
11 bar 2
<BLANKLINE>
[2 rows x 2 columns]
>>> df2 = bpd.DataFrame({'col3': ['foo', 'baz'], 'col4': [3, 4]}, index=[11, 22])
>>> df2
col3 col4
11 foo 3
22 baz 4
<BLANKLINE>
[2 rows x 2 columns]
>>> df1.join(df2)
col1 col2 col3 col4
10 foo 1 <NA> <NA>
11 bar 2 foo 3
<BLANKLINE>
[2 rows x 4 columns]
>>> df1.join(df2, how="left")
col1 col2 col3 col4
10 foo 1 <NA> <NA>
11 bar 2 foo 3
<BLANKLINE>
[2 rows x 4 columns]
>>> df1.join(df2, how="right")
col1 col2 col3 col4
11 bar 2 foo 3
22 <NA> <NA> baz 4
<BLANKLINE>
[2 rows x 4 columns]
>>> df1.join(df2, how="outer")
col1 col2 col3 col4
10 foo 1 <NA> <NA>
11 bar 2 foo 3
22 <NA> <NA> baz 4
<BLANKLINE>
[3 rows x 4 columns]
>>> df1.join(df2, how="inner")
col1 col2 col3 col4
11 bar 2 foo 3
<BLANKLINE>
[1 rows x 4 columns]
Another option to join using the key columns is to use the on parameter:
>>> df1.join(df2, on="col1", how="right")
col1 col2 col3 col4
<NA> 11 <NA> foo 3
<NA> 22 <NA> baz 4
<BLANKLINE>
[2 rows x 4 columns]
Parameter Name Description how
{'left', 'right', 'outer', 'inner'}, default 'left'
How to handle the operation of the two objects. left: use calling frame's index (or column if on is specified). right: use other's index. outer: form union of calling frame's index (or column if on is specified) with other's index, and sort it lexicographically. inner: form intersection of calling frame's index (or column if on is specified) with other's index, preserving the order of the calling's one. cross: creates the cartesian product from both frames, preserves the order of the left keys.
Exceptions Type Description ValueError
If value for on is specified for cross join. ValueError
If join on columns does not match the index level of the other DataFrame. Join on columns with multi-index is not supported. ValueError
If left index to join on does not have the same number of levels as the right index. Returns Type Description bigframes.pandas.DataFrame
A dataframe containing columns from both the caller and other. keys
keys() -> pandas.core.indexes.base.Index
Get the 'info axis'.
This is index for Series, columns for DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
>>> df.keys()
Index(['A', 'B'], dtype='object')
Returns Type Description pandas.Index
Info axis. kurt
kurt(*, numeric_only: bool = False)
Return unbiased kurtosis over columns.
Kurtosis obtained using Fisher's definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3, 4, 5],
... "B": [3, 4, 3, 2, 1],
... "C": [2, 2, 3, 2, 2]})
>>> df
A B C
0 1 3 2
1 2 4 2
2 3 3 3
3 4 2 2
4 5 1 2
<BLANKLINE>
[5 rows x 3 columns]
Calculating the kurtosis value of each column:
>>> df.kurt()
A -1.2
B -0.177515
C 5.0
dtype: Float64
Parameter Name Description numeric_only
bool, default False
Include only float, int, boolean columns.
kurtosis
kurtosis(*, numeric_only: bool = False)
Return unbiased kurtosis over columns.
Kurtosis obtained using Fisher's definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3, 4, 5],
... "B": [3, 4, 3, 2, 1],
... "C": [2, 2, 3, 2, 2]})
>>> df
A B C
0 1 3 2
1 2 4 2
2 3 3 3
3 4 2 2
4 5 1 2
<BLANKLINE>
[5 rows x 3 columns]
Calculating the kurtosis value of each column:
>>> df.kurt()
A -1.2
B -0.177515
C 5.0
dtype: Float64
Parameter Name Description numeric_only
bool, default False
Include only float, int, boolean columns.
le
le(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get 'less than or equal to' of DataFrame and other, element-wise (binary operator <=).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
NaN values in floating point columns are considered different (i.e. NaN != NaN).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can use method name:
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].le(180)
circle False
triangle True
rectangle False
Name: degrees, dtype: boolean
You can also use arithmetic operator <=:
>>> df["degrees"] <= 180
circle False
triangle True
rectangle False
Name: degrees, dtype: boolean
Parameters Name Description other
scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns (1 or 'columns').
Returns Type Description bigframes.pandas.DataFrame
DataFrame of bool. The result of the comparison. line
line(
x: typing.Optional[typing.Hashable] = None,
y: typing.Optional[typing.Hashable] = None,
**kwargs
)
Plot Series or DataFrame as lines. This function is useful to plot lines using DataFrame's values as coordinates.
This function calls pandas.plot to generate a plot with a random sample of items. For consistent results, the random sampling is reproducible. Use the sampling_random_state parameter to modify the sampling seed.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame(
... {
... 'one': [1, 2, 3, 4],
... 'three': [3, 6, 9, 12],
... 'reverse_ten': [40, 30, 20, 10],
... }
... )
>>> ax = df.plot.line(x='one')
Parameters Name Description x
label or position, optional
Allows plotting of one column versus another. If not specified, the index of the DataFrame is used.
y
label or position, optional
Allows plotting of one column versus another. If not specified, all numerical columns are used.
color
str, array-like, or dict, optional
The color for each of the DataFrame's columns. Possible values are: - A single color string referred to by name, RGB or RGBA code, for instance 'red' or '#a98d19'. - A sequence of color strings referred to by name, RGB or RGBA code, which will be used for each column recursively. For instance ['green','yellow'] will fill each column's line in green or yellow, alternatively. If there is only a single column to be plotted, then only the first color from the color list will be used. - A dict of the form {column name : color}, so that each column will be colored accordingly. For example, if your columns are called a and b, then passing {'a': 'green', 'b': 'red'} will color lines for column a in green and lines for column b in red.
sampling_n
int, default 100
Number of random items for plotting.
sampling_random_state
int, default 0
Seed for random number generator.
Returns Type Description matplotlib.axes.Axes or np.ndarray of them
An ndarray is returned with one matplotlib.axes.Axes per column when subplots=True. lt
lt(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get 'less than' of DataFrame and other, element-wise (binary operator <).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
NaN values in floating point columns are considered different (i.e. NaN != NaN).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can use method name:
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].lt(180)
circle False
triangle False
rectangle False
Name: degrees, dtype: boolean
You can also use arithmetic operator <:
>>> df["degrees"] < 180
circle False
triangle False
rectangle False
Name: degrees, dtype: boolean
Parameters Name Description other
scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns (1 or 'columns').
Returns Type Description bigframes.pandas.DataFrame
DataFrame of bool. The result of the comparison. map
map(func, na_action: typing.Optional[str] = None) -> bigframes.dataframe.DataFrame
Apply a function to a Dataframe elementwise.
This method applies a function that accepts and returns a scalar to every element of a DataFrame.
Note: In pandas 2.1.0, DataFrame.applymap is deprecated and renamed to DataFrame.map.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Let's use the reuse=False flag to make sure a new remote_function is created every time we run the following code, but you can skip it to potentially reuse a previously deployed remote_function from the same user defined function.
>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def minutes_to_hours(x: int) -> float:
... return x/60
>>> df_minutes = bpd.DataFrame(
... {"system_minutes" : [0, 30, 60, 90, 120],
... "user_minutes" : [0, 15, 75, 90, 6]})
>>> df_minutes
system_minutes user_minutes
0 0 0
1 30 15
2 60 75
3 90 90
4 120 6
<BLANKLINE>
[5 rows x 2 columns]
>>> df_hours = df_minutes.map(minutes_to_hours)
>>> df_hours
system_minutes user_minutes
0 0.0 0.0
1 0.5 0.25
2 1.0 1.25
3 1.5 1.5
4 2.0 0.1
<BLANKLINE>
[5 rows x 2 columns]
If there are NA/None values in the data, you can ignore applying the remote function on such values by specifying na_action='ignore'.
>>> df_minutes = bpd.DataFrame(
... {
... "system_minutes" : [0, 30, 60, None, 90, 120, bpd.NA],
... "user_minutes" : [0, 15, 75, 90, 6, None, bpd.NA]
... }, dtype="Int64")
>>> df_hours = df_minutes.map(minutes_to_hours, na_action='ignore')
>>> df_hours
system_minutes user_minutes
0 0.0 0.0
1 0.5 0.25
2 1.0 1.25
3 <NA> 1.5
4 1.5 0.1
5 2.0 <NA>
6 <NA> <NA>
<BLANKLINE>
[7 rows x 2 columns]
Parameters Name Description func
function
Python function wrapped by remote_function
decorator, returns a single value from a single value.
na_action
Optional[str], default None
{None, 'ignore'}, default None. If 'ignore', propagate NaN values, without passing them to func.
Exceptions Type Description TypeError
If value provided for func is not callable. ValueError
If value provided for na_action is not None or 'ignore'. Returns Type Description bigframes.pandas.DataFrame
Transformed DataFrame. mask
Replace values where the condition is False.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a': [20, 10, 0], 'b': [0, 10, 20]})
>>> df
a b
0 20 0
1 10 10
2 0 20
<BLANKLINE>
[3 rows x 2 columns]
You can filter the values in the dataframe based on a condition. The values matching the condition would be kept, and not matching would be replaced. The default replacement value is NA. For example, when the condition is a dataframe:
>>> df.mask(df > 0)
a b
0 <NA> 0
1 <NA> <NA>
2 0 <NA>
<BLANKLINE>
[3 rows x 2 columns]
You can specify a custom replacement value for non-matching values.
>>> df.mask(df > 0, -1)
a b
0 -1 0
1 -1 -1
2 0 -1
<BLANKLINE>
[3 rows x 2 columns]
Besides dataframe, the condition can be a series too. For example:
>>> df.mask(df['a'] > 10, -1)
a b
0 -1 -1
1 10 10
2 0 20
<BLANKLINE>
[3 rows x 2 columns]
As for the replacement, it can be a dataframe too. For example:
>>> df.mask(df > 10, -df)
a b
0 -20 0
1 10 10
2 0 -20
<BLANKLINE>
[3 rows x 2 columns]
>>> df.mask(df['a'] > 10, -df)
a b
0 -20 0
1 10 10
2 0 20
<BLANKLINE>
[3 rows x 2 columns]
Please note, replacement doesn't support Series for now. In pandas, when specifying a Series as replacement, the axis value should be specified at the same time, which is not supported in bigframes DataFrame.
Parameters Name Description cond
bool Series/DataFrame, array-like, or callable
Where cond is False, keep the original value. Where True, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and returns boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).
other
scalar, DataFrame, or callable
Entries where cond is True are replaced with corresponding value from other. If other is callable, it is computed on the DataFrame and returns scalar or DataFrame. The callable must not change input DataFrame (though pandas doesn’t check it). If not specified, entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes).
Returns Type Description DataFrame
DataFrame after the replacement. max
max(
axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the maximum of the values over the requested axis.
If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
A B
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
Finding the maximum value in each column (the default behavior without an explicit axis parameter).
>>> df.max()
A 3
B 4
dtype: Int64
Finding the maximum value in each row.
>>> df.max(axis=1)
0 2
1 4
dtype: Int64
Parameters Name Description axis
{index (0), columns (1)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
numeric_only
bool, default False
Include only float, int, boolean columns.
mean
mean(
axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the mean of the values over the requested axis.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
A B
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
Calculating the mean of each column (the default behavior without an explicit axis parameter).
>>> df.mean()
A 2.0
B 3.0
dtype: Float64
Calculating the mean of each row.
>>> df.mean(axis=1)
0 1.5
1 3.5
dtype: Float64
Parameters Name Description axis
{index (0), columns (1)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
numeric_only
bool, default False
Include only float, int, boolean columns.
median
median(
*, numeric_only: bool = False, exact: bool = True
) -> bigframes.series.Series
Return the median of the values over columns.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
A B
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
Finding the median value of each column.
>>> df.median()
A 2.0
B 3.0
dtype: Float64
Parameters Name Description numeric_only
bool, default False
Include only float, int, boolean columns.
exact
bool, default True
Get the exact median instead of an approximate one.
melt
melt(
id_vars: typing.Optional[typing.Iterable[typing.Hashable]] = None,
value_vars: typing.Optional[typing.Iterable[typing.Hashable]] = None,
var_name: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
value_name: typing.Hashable = "value",
)
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are "unpivoted" to the row axis, leaving just two non-identifier columns, 'variable' and 'value'.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, None, 3, 4, 5],
... "B": [1, 2, 3, 4, 5],
... "C": [None, 3.5, None, 4.5, 5.0]})
>>> df
A B C
0 1.0 1 <NA>
1 <NA> 2 3.5
2 3.0 3 <NA>
3 4.0 4 4.5
4 5.0 5 5.0
<BLANKLINE>
[5 rows x 3 columns]
Using melt without optional arguments:
>>> df.melt()
variable value
0 A 1.0
1 A <NA>
2 A 3.0
3 A 4.0
4 A 5.0
5 B 1.0
6 B 2.0
7 B 3.0
8 B 4.0
9 B 5.0
10 C <NA>
11 C 3.5
12 C <NA>
13 C 4.5
14 C 5.0
<BLANKLINE>
[15 rows x 2 columns]
Using melt with id_vars and value_vars:
>>> df.melt(id_vars='A', value_vars=['B', 'C'])
A variable value
0 1.0 B 1.0
1 <NA> B 2.0
2 3.0 B 3.0
3 4.0 B 4.0
4 5.0 B 5.0
5 1.0 C <NA>
6 <NA> C 3.5
7 3.0 C <NA>
8 4.0 C 4.5
9 5.0 C 5.0
<BLANKLINE>
[10 rows x 3 columns]
Parameters Name Description id_vars
tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars
tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
var_name
scalar
Name to use for the 'variable' column. If None it uses frame.columns.name or 'variable'.
value_name
scalar, default 'value'
Name to use for the 'value' column.
Returns Type Description bigframes.pandas.DataFrame
Unpivoted DataFrame. memory_usage
memory_usage(index: bool = True)
Return the memory usage of each column in bytes.
The memory usage can optionally include the contribution of the index and elements of object dtype.
This value is displayed in DataFrame.info by default. This can be suppressed by setting pandas.options.display.memory_usage to False.
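Examples:
A minimal sketch (byte counts depend on the data, so no output is shown here):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
>>> usage = df.memory_usage()        # per-column byte counts; index usage is the first item
>>> usage_no_index = df.memory_usage(index=False)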
Parameter Name Description index
bool, default True
Specifies whether to include the memory usage of the DataFrame's index in the returned Series. If index=True, the memory usage of the index is the first item in the output.
Returns Type Description bigframes.pandas.Series
A Series whose index is the original column names and whose values are the memory usage of each column in bytes. merge
merge(
right: bigframes.dataframe.DataFrame,
how: typing.Literal["inner", "left", "outer", "right", "cross"] = "inner",
on: typing.Optional[
typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
] = None,
*,
left_on: typing.Optional[
typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
] = None,
right_on: typing.Optional[
typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
] = None,
sort: bool = False,
suffixes: tuple[str, str] = ("_x", "_y")
) -> bigframes.dataframe.DataFrame
Merge DataFrame objects with a database-style join.
The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.
Warning: If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Merge DataFrames df1 and df2 by specifying type of merge:
>>> df1 = bpd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df1
a b
0 foo 1
1 bar 2
<BLANKLINE>
[2 rows x 2 columns]
>>> df2 = bpd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df2
a c
0 foo 3
1 baz 4
<BLANKLINE>
[2 rows x 2 columns]
>>> df1.merge(df2, how="inner", on="a")
a b c
0 foo 1 3
<BLANKLINE>
[1 rows x 3 columns]
>>> df1.merge(df2, how='left', on='a')
a b c
0 foo 1 3
1 bar 2 <NA>
<BLANKLINE>
[2 rows x 3 columns]
Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.
>>> df1 = bpd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
... 'value': [1, 2, 3, 5]})
>>> df1
lkey value
0 foo 1
1 bar 2
2 baz 3
3 foo 5
<BLANKLINE>
[4 rows x 2 columns]
>>> df2 = bpd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
... 'value': [5, 6, 7, 8]})
>>> df2
rkey value
0 foo 5
1 bar 6
2 baz 7
3 foo 8
<BLANKLINE>
[4 rows x 2 columns]
>>> df1.merge(df2, left_on='lkey', right_on='rkey')
lkey value_x rkey value_y
0 foo 1 foo 5
1 foo 1 foo 8
2 bar 2 bar 6
3 baz 3 baz 7
4 foo 5 foo 5
5 foo 5 foo 8
<BLANKLINE>
[6 rows x 4 columns]
Parameters Name Description on
label or list of labels
Columns to join on. It must be found in both DataFrames. Either on or left_on + right_on must be passed in.
left_on
label or list of labels
Columns to join on in the left DataFrame. Either on or left_on + right_on must be passed in.
right_on
label or list of labels
Columns to join on in the right DataFrame. Either on or left_on + right_on must be passed in.
Exceptions Type Description ValueError
If a value for on is specified for a cross join. ValueError
If on or left_on + right_on are not specified when on is None. ValueError
If on and left_on + right_on are specified when on is not None. ValueError
If no column with the provided label is found in self for left join. ValueError
If no column with the provided label is found in right for right join. Returns Type Description bigframes.pandas.DataFrame
A DataFrame of the two merged objects. min
min(
axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the minimum of the values over the requested axis.
If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
A B
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
Finding the minimum value in each column (the default behavior without an explicit axis parameter).
>>> df.min()
A 1
B 2
dtype: Int64
Finding the minimum value in each row.
>>> df.min(axis=1)
0 1
1 3
dtype: Int64
Parameters Name Description axis
{index (0), columns (1)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
numeric_only
bool, default False
Include only float, int, boolean columns.
mod
mod(
other: int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get modulo of DataFrame and other, element-wise (binary operator %).
Equivalent to dataframe % other. With reverse version, rmod.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].mod(df['B'])
0 1
1 2
2 3
dtype: Int64
You can also use arithmetic operator %:
>>> df['A'] % (df['B'])
0 1
1 2
2 3
dtype: Int64
Parameter Name Description axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. mul
mul(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get multiplication of DataFrame and other, element-wise (binary operator *).
Equivalent to dataframe * other. With reverse version, rmul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].mul(df['B'])
0 4
1 10
2 18
dtype: Int64
You can also use arithmetic operator *:
>>> df['A'] * (df['B'])
0 4
1 10
2 18
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. multiply
multiply(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get multiplication of DataFrame and other, element-wise (binary operator *).
Equivalent to dataframe * other. With reverse version, rmul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].mul(df['B'])
0 4
1 10
2 18
dtype: Int64
You can also use arithmetic operator *:
>>> df['A'] * (df['B'])
0 4
1 10
2 18
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. ne
ne(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get not equal to of DataFrame and other, element-wise (binary operator ne).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can use method name:
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].ne(360)
circle False
triangle True
rectangle False
Name: degrees, dtype: boolean
You can also use the comparison operator !=:
>>> df["degrees"] != 360
circle False
triangle True
rectangle False
Name: degrees, dtype: boolean
Parameters Name Description other
scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns (1 or 'columns').
Returns Type Description bigframes.pandas.DataFrame
Result of the comparison. nlargest
nlargest(
n: int,
columns: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
keep: str = "first",
) -> bigframes.dataframe.DataFrame
Return the first n rows ordered by columns in descending order.
Return the first n rows with the largest values in columns, in descending order. The columns that are not specified are returned as well, but not used for ordering.
This method is equivalent to df.sort_values(columns, ascending=False).head(n), but more performant.
If any of the specified columns are of object or category dtypes, a TypeError is raised.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 1, 3, 3, 5, 5],
... "B": [5, 6, 3, 4, 1, 2],
... "C": ['a', 'b', 'a', 'b', 'a', 'b']})
>>> df
A B C
0 1 5 a
1 1 6 b
2 3 3 a
3 3 4 b
4 5 1 a
5 5 2 b
<BLANKLINE>
[6 rows x 3 columns]
Returns rows with the largest value in 'A', including all ties:
>>> df.nlargest(1, 'A', keep = "all")
A B C
4 5 1 a
5 5 2 b
<BLANKLINE>
[2 rows x 3 columns]
Returns the first row with the largest value in 'A', default behavior in case of ties:
>>> df.nlargest(1, 'A')
A B C
4 5 1 a
<BLANKLINE>
[1 rows x 3 columns]
Returns the last row with the largest value in 'A' in case of ties:
>>> df.nlargest(1, 'A', keep = "last")
A B C
5 5 2 b
<BLANKLINE>
[1 rows x 3 columns]
Returns the row with the largest combined values in both 'A' and 'C':
>>> df.nlargest(1, ['A', 'C'])
A B C
5 5 2 b
<BLANKLINE>
[1 rows x 3 columns]
Parameters Name Description n
int
Number of rows to return.
columns
label or list of labels
Column label(s) to order by.
keep
{'first', 'last', 'all'}, default 'first'
Where there are duplicate values: - first: prioritize the first occurrence(s) - last: prioritize the last occurrence(s) - all: do not drop any duplicates, even if it means selecting more than n items.
Exceptions Type Description ValueError
If the value of keep is not first, last, or all. Returns Type Description bigframes.pandas.DataFrame
The first n rows ordered by the given columns in descending order. notna
notna() -> bigframes.dataframe.DataFrame
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values. NA values get mapped to False values.
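Examples:
A minimal sketch (display formatting is illustrative):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, None, 3], 'B': ['x', None, 'z']})
>>> result = df.notna()    # True where a value is present, False where it is NA
>>> result['A']
0     True
1    False
2     True
Name: A, dtype: boolean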
Returns Type Description NDFrame
Mask of bool values for each element that indicates whether an element is not an NA value. notnull
notnull() -> bigframes.dataframe.DataFrame
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values. NA values get mapped to False values.
Returns Type Description NDFrame
Mask of bool values for each element that indicates whether an element is not an NA value. nsmallest
nsmallest(
n: int,
columns: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
keep: str = "first",
) -> bigframes.dataframe.DataFrame
Return the first n rows ordered by columns in ascending order.
Return the first n rows with the smallest values in columns, in ascending order. The columns that are not specified are returned as well, but not used for ordering.
This method is equivalent to df.sort_values(columns, ascending=True).head(n), but more performant.
If any of the specified columns are of object or category dtypes, a TypeError is raised.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 1, 3, 3, 5, 5],
... "B": [5, 6, 3, 4, 1, 2],
... "C": ['a', 'b', 'a', 'b', 'a', 'b']})
>>> df
A B C
0 1 5 a
1 1 6 b
2 3 3 a
3 3 4 b
4 5 1 a
5 5 2 b
<BLANKLINE>
[6 rows x 3 columns]
Returns rows with the smallest value in 'A', including all ties:
>>> df.nsmallest(1, 'A', keep = "all")
A B C
0 1 5 a
1 1 6 b
<BLANKLINE>
[2 rows x 3 columns]
Returns the first row with the smallest value in 'A', default behavior in case of ties:
>>> df.nsmallest(1, 'A')
A B C
0 1 5 a
<BLANKLINE>
[1 rows x 3 columns]
Returns the last row with the smallest value in 'A' in case of ties:
>>> df.nsmallest(1, 'A', keep = "last")
A B C
1 1 6 b
<BLANKLINE>
[1 rows x 3 columns]
Returns the row with the smallest combined values in both 'A' and 'C':
>>> df.nsmallest(1, ['A', 'C'])
A B C
0 1 5 a
<BLANKLINE>
[1 rows x 3 columns]
Parameters Name Description n
int
Number of rows to return.
columns
label or list of labels
Column label(s) to order by.
keep
{'first', 'last', 'all'}, default 'first'
Where there are duplicate values: - first: prioritize the first occurrence(s) - last: prioritize the last occurrence(s) - all: do not drop any duplicates, even if it means selecting more than n items.
Exceptions Type Description ValueError
If the value of keep is not first, last, or all. Returns Type Description bigframes.pandas.DataFrame
The first n rows ordered by the given columns in ascending order. nunique
nunique() -> bigframes.series.Series
Count number of distinct elements in each column.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 2]})
>>> df
A B
0 3 1
1 1 2
2 2 2
<BLANKLINE>
[3 rows x 2 columns]
>>> df.nunique()
A 3
B 2
dtype: Int64
pct_change
pct_change(periods: int = 1) -> bigframes.dataframe.DataFrame
Fractional change between the current and a prior element.
Computes the fractional change from the immediately previous row by default. This is useful in comparing the fraction of change in a time series of elements.
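Examples:
A minimal sketch (output is illustrative):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 4]})
>>> df.pct_change()     # each row: (current - previous) / previous
      A
0  <NA>
1   1.0
2   1.0
<BLANKLINE>
[3 rows x 1 columns]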
Note: Despite the name of this method, it calculates fractional change (also known as per unit change or relative change) and not percentage change. If you need the percentage change, multiply these values by 100.
Parameter Name Description periods
int, default 1
Periods to shift for forming percent change.
Returns Type Descriptionbigframes.pandas.DataFrame or bigframes.pandas.Series
The same type as the calling object. peek
peek(
n: int = 5, *, force: bool = True, allow_large_results=None
) -> pandas.core.frame.DataFrame
Preview n arbitrary rows from the dataframe. No guarantees about row selection or ordering. DataFrame.peek(force=False) will always be very fast, but will not succeed if data requires full data scanning. Using force=True will always succeed, but may perform queries. Query results will be cached so that future steps will benefit from these queries.
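Examples:
A minimal sketch; which rows are returned is arbitrary, so only the row count is checked:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3, 4, 5, 6]})
>>> preview = df.peek(3)    # local pandas DataFrame with 3 arbitrary rows
>>> len(preview)
3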
Parameters Name Description n
int, default 5
The number of rows to select from the dataframe. Which N rows are returned is non-deterministic.
force
bool, default True
If the data cannot be peeked efficiently, the dataframe will instead be fully materialized as part of the operation if force=True. If force=False, the operation will throw a ValueError.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Exceptions Type Description ValueError
If force=False and data cannot be efficiently peeked. Returns Type Description pandas.DataFrame
A pandas DataFrame with n rows. pivot
pivot(
*,
columns: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
index: typing.Optional[
typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
] = None,
values: typing.Optional[
typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
] = None
) -> bigframes.dataframe.DataFrame
Return reshaped DataFrame organized by given index / column values.
Reshape data (produce a "pivot" table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame. This function does not support data aggregation; multiple values will result in a MultiIndex in the columns.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... "foo": ["one", "one", "one", "two", "two"],
... "bar": ["A", "B", "C", "A", "B"],
... "baz": [1, 2, 3, 4, 5],
... "zoo": ['x', 'y', 'z', 'q', 'w']
... })
>>> df
foo bar baz zoo
0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
<BLANKLINE>
[5 rows x 4 columns]
Using pivot without optional arguments:
>>> df.pivot(columns='foo')
bar baz zoo
foo one two one two one two
0 A <NA> 1 <NA> x <NA>
1 B <NA> 2 <NA> y <NA>
2 C <NA> 3 <NA> z <NA>
3 <NA> A <NA> 4 <NA> q
4 <NA> B <NA> 5 <NA> w
<BLANKLINE>
[5 rows x 6 columns]
Using pivot with index and values:
>>> df.pivot(columns='foo', index='bar', values='baz')
foo one two
bar
A 1 4
B 2 5
C 3 <NA>
<BLANKLINE>
[3 rows x 2 columns]
Parameters Name Description columns
str or object or a list of str
Column to use to make new frame's columns.
index
str or object or a list of str, optional
Column to use to make new frame's index. If not given, uses existing index.
values
str, object or a list of the previous, optional
Column(s) to use for populating new frame's values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.
Returns Type Description bigframes.pandas.DataFrame
Returns reshaped DataFrame. pivot_table
pivot_table(
values: typing.Optional[
typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
] = None,
index: typing.Optional[
typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
] = None,
columns: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
aggfunc: str = "mean",
) -> bigframes.dataframe.DataFrame
Create a spreadsheet-style pivot table as a DataFrame.
The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'Product': ['Product A', 'Product B', 'Product A', 'Product B', 'Product A', 'Product B'],
... 'Region': ['East', 'West', 'East', 'West', 'West', 'East'],
... 'Sales': [100, 200, 150, 100, 200, 150],
... 'Rating': [3, 5, 4, 3, 3, 5]
... })
>>> df
Product Region Sales Rating
0 Product A East 100 3
1 Product B West 200 5
2 Product A East 150 4
3 Product B West 100 3
4 Product A West 200 3
5 Product B East 150 5
<BLANKLINE>
[6 rows x 4 columns]
Using pivot_table with default aggfunc "mean":
>>> pivot_table = df.pivot_table(
... values=['Sales', 'Rating'],
... index='Product',
... columns='Region'
... )
>>> pivot_table
Rating Sales
Region East West East West
Product
Product A 3.5 3.0 125.0 200.0
Product B 5.0 4.0 150.0 150.0
<BLANKLINE>
[2 rows x 4 columns]
Using pivot_table with specified aggfunc "max":
>>> pivot_table = df.pivot_table(
... values=['Sales', 'Rating'],
... index='Product',
... columns='Region',
... aggfunc="max"
... )
>>> pivot_table
Rating Sales
Region East West East West
Product
Product A 4 3 150 200
Product B 5 5 150 200
<BLANKLINE>
[2 rows x 4 columns]
Parameters Name Description values
str, object or a list of the previous, optional
Column(s) to use for populating new frame's values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.
index
str or object or a list of str, optional
Column to use to make new frame's index. If not given, uses existing index.
columns
str or object or a list of str
Column to use to make new frame's columns.
aggfunc
str, default "mean"
Aggregation function name to compute summary statistics (e.g., 'sum', 'mean').
Returns Type Description bigframes.pandas.DataFrame
An Excel style pivot table. pow
pow(
other: int | bigframes.series.Series, axis: str | int = "columns"
) -> bigframes.dataframe.DataFrame
Get Exponential power of dataframe and other, element-wise (binary operator **).
Equivalent to dataframe ** other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rpow.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].pow(df['B'])
0 1
1 32
2 729
dtype: Int64
You can also use arithmetic operator **:
>>> df['A'] ** (df['B'])
0 1
1 32
2 729
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. prod
prod(
axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the product of the values over the requested axis.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3], "B": [4.5, 5.5, 6.5]})
>>> df
A B
0 1 4.5
1 2 5.5
2 3 6.5
<BLANKLINE>
[3 rows x 2 columns]
Calculating the product of each column (the default behavior without an explicit axis parameter):
>>> df.prod()
A 6.0
B 160.875
dtype: Float64
Calculating the product of each row:
>>> df.prod(axis=1)
0 4.5
1 11.0
2 19.5
dtype: Float64
Parameters Name Description axis
{index (0), columns (1)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
numeric_only
bool, default False
Include only float, int, boolean columns.
product
product(
axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the product of the values over the requested axis.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3], "B": [4.5, 5.5, 6.5]})
>>> df
A B
0 1 4.5
1 2 5.5
2 3 6.5
<BLANKLINE>
[3 rows x 2 columns]
Calculating the product of each column (the default behavior without an explicit axis parameter):
>>> df.prod()
A 6.0
B 160.875
dtype: Float64
Calculating the product of each row:
>>> df.prod(axis=1)
0 4.5
1 11.0
2 19.5
dtype: Float64
Parameters Name Description axis
{index (0), columns (1)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
numeric_only
bool, default False
Include only float, int, boolean columns.
quantile
quantile(
q: typing.Union[float, typing.Sequence[float]] = 0.5, *, numeric_only: bool = False
)
Return values at the given quantile over requested axis.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
... columns=['a', 'b'])
>>> df.quantile(.1)
a 1.3
b 3.7
Name: 0.1, dtype: Float64
>>> df.quantile([.1, .5])
a b
0.1 1.3 3.7
0.5 2.5 55.0
<BLANKLINE>
[2 rows x 2 columns]
Parameters Name Description q
float or array-like, default 0.5 (50% quantile)
Value between 0 <= q <= 1, the quantile(s) to compute.
numeric_only
bool, default False
Include only float, int or boolean data.
Returns Type Description bigframes.pandas.DataFrame or bigframes.pandas.Series
If q is an array, a DataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles. If q is a float, a Series will be returned where the index is the columns of self and the values are the quantiles. query
query(expr: str) -> bigframes.dataframe.DataFrame
Query the columns of a DataFrame with a boolean expression.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': range(1, 6),
... 'B': range(10, 0, -2),
... 'C C': range(10, 5, -1)})
>>> df
A B C C
0 1 10 10
1 2 8 9
2 3 6 8
3 4 4 7
4 5 2 6
<BLANKLINE>
[5 rows x 3 columns]
>>> df.query('A > B')
A B C C
4 5 2 6
<BLANKLINE>
[1 rows x 3 columns]
The previous expression is equivalent to
>>> df[df.A > df.B]
A B C C
4 5 2 6
<BLANKLINE>
[1 rows x 3 columns]
For columns with spaces in their name, you can use backtick quoting.
>>> df.query('B == `C C`')
A B C C
0 1 10 10
<BLANKLINE>
[1 rows x 3 columns]
The previous expression is equivalent to
>>> df[df.B == df['C C']]
A B C C
0 1 10 10
<BLANKLINE>
[1 rows x 3 columns]
Parameter Name Description expr
str
The query string to evaluate. You can refer to variables in the environment by prefixing them with an '@' character like @a + b. You can refer to column names that are not valid Python variable names by surrounding them in backticks. Thus, column names containing spaces or punctuation (besides underscores) or starting with digits must be surrounded by backticks. (For example, a column named "Area (cm^2)" would be referenced as `Area (cm^2)`.) Column names which are Python keywords (like "list", "for", "import", etc.) cannot be used. For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.
Returns Type Description None or bigframes.pandas.DataFrame
DataFrame result after the query operation, otherwise None. radd
radd(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get addition of DataFrame and other, element-wise (binary operator +).
Equivalent to other + dataframe. With reverse version, add.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].radd(df['B'])
0 5
1 7
2 9
dtype: Int64
You can also use arithmetic operator +:
>>> df['A'] + df['B']
0 5
1 7
2 9
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. rank
rank(
axis=0,
method: str = "average",
numeric_only=False,
na_option: str = "keep",
ascending=True,
) -> bigframes.dataframe.DataFrame
Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the ranks of those values.
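Examples:
A minimal sketch (output is illustrative):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [3, 1, 2, 2]})
>>> df.rank()    # ties (the two 2s) share the average of ranks 2 and 3
     A
0  4.0
1  1.0
2  2.5
3  2.5
<BLANKLINE>
[4 rows x 1 columns]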
Parameters Name Description method
{'average', 'min', 'max', 'first', 'dense'}, default 'average'
How to rank the group of records that have the same value (i.e. ties): average: average rank of the group, min: lowest rank in the group, max: highest rank in the group, first: ranks assigned in the order they appear in the array, dense: like 'min', but rank always increases by 1 between groups.
numeric_only
bool, default False
For DataFrame objects, rank only numeric columns if set to True.
na_option
{'keep', 'top', 'bottom'}, default 'keep'
How to rank NaN values: keep: assign NaN rank to NaN values, top: assign lowest rank to NaN values, bottom: assign highest rank to NaN values.
ascending
bool, default True
Whether or not the elements should be ranked in ascending order.
Returns Type Description bigframes.pandas.DataFrame or bigframes.pandas.Series
Return a Series or DataFrame with data ranks as values. rdiv
rdiv(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get floating division of DataFrame and other, element-wise (binary operator /).
Equivalent to other / dataframe. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
>>> df['A'].rtruediv(df['B'])
0 4.0
1 2.5
2 2.0
dtype: Float64
It's equivalent to using arithmetic operator /:
>>> df['B'] / (df['A'])
0 4.0
1 2.5
2 2.0
dtype: Float64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. reindex
reindex(
labels=None,
*,
index=None,
columns=None,
axis: typing.Optional[typing.Union[str, int]] = None,
validate: typing.Optional[bool] = None
)
Conform DataFrame to new index with optional filling logic.
Places NA in locations having no value in the previous index. A new object is produced.
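Examples:
A minimal sketch (output is illustrative): labels present in the new index but absent from the old one are filled with NA.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2]}, index=[0, 1])
>>> df.reindex(index=[0, 1, 2])
      A
0     1
1     2
2  <NA>
<BLANKLINE>
[3 rows x 1 columns]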
Parameters Name Description labels
array-like, optional
New labels / index to conform the axis specified by 'axis' to.
index
array-like, optional
New labels for the index. Preferably an Index object to avoid duplicating data.
columns
array-like, optional
New labels for the columns. Preferably an Index object to avoid duplicating data.
axis
int or str, optional
Axis to target. Can be either the axis name ('index', 'columns') or number (0, 1).
Returns Type Description bigframes.pandas.DataFrame
DataFrame with changed index. reindex_like
reindex_like(
other: bigframes.dataframe.DataFrame, *, validate: typing.Optional[bool] = None
)
Return an object with matching indices as other object.
Conform the object to the same index on all axes. Optional filling logic, placing Null in locations having no value in the previous index.
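Examples:
A minimal sketch (output is illustrative): the caller is conformed to the row labels of other.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])
>>> other = bpd.DataFrame({'A': [10, 20]}, index=[0, 2])
>>> df.reindex_like(other)
   A
0  1
2  3
<BLANKLINE>
[2 rows x 1 columns]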
Parameter Name Description other
Object of the same data type
Its row and column indices are used to define the new indices of this object.
Returns Type Description bigframes.pandas.DataFrame or bigframes.pandas.Series
Same type as caller, but with changed indices on each axis. rename
Rename columns.
Dict values must be unique (1-to-1). Labels not contained in a dict will be left as-is. Extra labels listed don't throw an error.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df
A B
0 1 4
1 2 5
2 3 6
<BLANKLINE>
[3 rows x 2 columns]
Rename columns using a mapping:
>>> df.rename(columns={"A": "col1", "B": "col2"})
col1 col2
0 1 4
1 2 5
2 3 6
<BLANKLINE>
[3 rows x 2 columns]
Parameters Name Description columns
Mapping
Dict-like from old column labels to new column labels.
inplace
bool
Default False. Whether to modify the DataFrame rather than creating a new one.
Exceptions Type Description KeyError
If any of the labels is not found. Returns Type Description bigframes.pandas.DataFrame or None
DataFrame with the renamed axis labels or None if inplace=True. rename_axis
Set the name of the axis for the index.
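Examples:
A minimal sketch:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
>>> df.rename_axis('letters').index.name
'letters'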
Note: Currently only accepts a single string parameter (the new name of the index).
Parameters Name Description mapper
str
Value to set the axis name attribute.
inplace
bool
Default False. Modifies the object directly, instead of creating a new Series or DataFrame.
Returns Type Description bigframes.pandas.DataFrame or None
DataFrame with the new index name or None if inplace=True. reorder_levels
reorder_levels(
order: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
axis: int | str = 0,
)
Rearrange index levels using input order. May not drop or duplicate levels.
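Examples:
A minimal sketch (no output is shown; the reordered frame has index levels ('L2', 'L1')):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import pandas as pd
>>> idx = pd.MultiIndex.from_tuples([('a', 1), ('b', 2)], names=['L1', 'L2'])
>>> df = bpd.DataFrame({'x': [10, 20]}, index=idx)
>>> swapped = df.reorder_levels(['L2', 'L1'])   # levels referenced by label; [1, 0] would also work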
Parameters Name Description order
list of int or list of str
List representing new level order. Reference level by number (position) or by key (label).
axis
{0 or 'index', 1 or 'columns'}, default 0
Where to reorder levels.
Exceptions Type Description ValueError
If columns are not multi-index. Returns Type Description bigframes.pandas.DataFrame
DataFrame of rearranged index. replace
replace(to_replace: typing.Any, value: typing.Any = None, *, regex: bool = False)
Replace values given in to_replace with value.
Values of the Series/DataFrame are replaced with other values dynamically. This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'int_col': [1, 1, 2, 3],
... 'string_col': ["a", "b", "c", "b"],
... })
Using scalar to_replace and value:
>>> df.replace("b", "e")
int_col string_col
0 1 a
1 1 e
2 2 c
3 3 e
<BLANKLINE>
[4 rows x 2 columns]
Using dictionary:
>>> df.replace({"a": "e", 2: 5})
int_col string_col
0 1 e
1 1 b
2 5 c
3 3 b
<BLANKLINE>
[4 rows x 2 columns]
Using regex:
>>> df.replace("[ab]", "e", regex=True)
int_col string_col
0 1 e
1 1 e
2 2 c
3 3 e
<BLANKLINE>
[4 rows x 2 columns]
Parameters Name Description to_replace
str, regex, list, int, float or None
How to find the values that will be replaced. numeric: numeric values equal to to_replace will be replaced with value. str: strings exactly matching to_replace will be replaced with value. regex: regexes matching to_replace will be replaced with value. list of str, regex, or numeric: First, if to_replace and value are both lists, they must be the same length. Second, if regex=True then all of the strings in both lists will be interpreted as regexes, otherwise they will match directly. This doesn't matter much for value since there are only a few possible substitution regexes you can use. str, regex and numeric rules apply as above.
value
scalar, default None
Value to replace any values matching to_replace with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
regex
bool, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string.
reset_index
reset_index(*, drop: bool = False) -> bigframes.dataframe.DataFrame
Reset the index.
Reset the index of the DataFrame, and use the default one instead.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame([('bird', 389.0),
... ('bird', 24.0),
... ('mammal', 80.5),
... ('mammal', np.nan)],
... index=['falcon', 'parrot', 'lion', 'monkey'],
... columns=('class', 'max_speed'))
>>> df
class max_speed
falcon bird 389.0
parrot bird 24.0
lion mammal 80.5
monkey mammal <NA>
<BLANKLINE>
[4 rows x 2 columns]
When we reset the index, the old index is added as a column, and a new sequential index is used:
>>> df.reset_index()
index class max_speed
0 falcon bird 389.0
1 parrot bird 24.0
2 lion mammal 80.5
3 monkey mammal <NA>
<BLANKLINE>
[4 rows x 3 columns]
We can use the drop parameter to avoid the old index being added as a column:
>>> df.reset_index(drop=True)
class max_speed
0 bird 389.0
1 bird 24.0
2 mammal 80.5
3 mammal <NA>
<BLANKLINE>
[4 rows x 2 columns]
You can also use reset_index with MultiIndex.
>>> import pandas as pd
>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
... ('bird', 'parrot'),
... ('mammal', 'lion'),
... ('mammal', 'monkey')],
... names=['class', 'name'])
>>> columns = ['speed', 'max']
>>> df = bpd.DataFrame([(389.0, 'fly'),
... (24.0, 'fly'),
... (80.5, 'run'),
... (np.nan, 'jump')],
... index=index,
... columns=columns)
>>> df
speed max
class name
bird falcon 389.0 fly
parrot 24.0 fly
mammal lion 80.5 run
monkey <NA> jump
<BLANKLINE>
[4 rows x 2 columns]
>>> df.reset_index()
class name speed max
0 bird falcon 389.0 fly
1 bird parrot 24.0 fly
2 mammal lion 80.5 run
3 mammal monkey <NA> jump
<BLANKLINE>
[4 rows x 4 columns]
>>> df.reset_index(drop=True)
speed max
0 389.0 fly
1 24.0 fly
2 80.5 run
3 <NA> jump
<BLANKLINE>
[4 rows x 2 columns]
Parameter Name Description drop
bool, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.
Returns Type Description bigframes.pandas.DataFrame
DataFrame with the new index. rfloordiv
rfloordiv(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get integer division of DataFrame and other, element-wise (binary operator //).
Equivalent to other // dataframe. With reverse version, floordiv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
>>> df['A'].rfloordiv(df['B'])
0 4
1 2
2 2
dtype: Int64
It's equivalent to using arithmetic operator //:
>>> df['B'] // (df['A'])
0 4
1 2
2 2
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. rmod
rmod(
other: int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get modulo of DataFrame and other, element-wise (binary operator %).
Equivalent to other % dataframe. With reverse version, mod.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
>>> df['A'].rmod(df['B'])
0 0
1 1
2 0
dtype: Int64
It's equivalent to using arithmetic operator %:
>>> df['B'] % (df['A'])
0 0
1 1
2 0
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. rmul
rmul(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get multiplication of DataFrame and other, element-wise (binary operator *).
Equivalent to other * dataframe. With reverse version, mul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].rmul(df['B'])
0 4
1 10
2 18
dtype: Int64
You can also use arithmetic operator *:
>>> df['A'] * (df['B'])
0 4
1 10
2 18
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. rolling
rolling(
window: (
int
| pandas._libs.tslibs.timedeltas.Timedelta
| numpy.timedelta64
| datetime.timedelta
| str
),
min_periods=None,
on: str | None = None,
closed: typing.Literal["right", "left", "both", "neither"] = "right",
) -> bigframes.core.window.rolling.Window
Provide rolling window calculations.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([0,1,2,3,4])
>>> s.rolling(window=3).min()
0 <NA>
1 <NA>
2 0
3 1
4 2
dtype: Int64
>>> df = bpd.DataFrame({'A': [0,1,2,3], 'B': [0,2,4,6]})
>>> df.rolling(window=2, on='A', closed='both').sum()
A B
0 0 <NA>
1 1 2
2 2 6
3 3 12
<BLANKLINE>
[4 rows x 2 columns]
Parameters Name Description window
int, pandas.Timedelta, numpy.timedelta64, datetime.timedelta, str
Size of the moving window. If an integer, the fixed number of observations used for each window. If a string, it is the string representation of a timedelta and must be parsable by pandas.Timedelta(). Otherwise, the time range for each window.
min_periods
int, default None
Minimum number of observations in window required to have a value; otherwise, result is np.nan. For a window that is specified by an integer, min_periods will default to the size of the window. For a window that is not specified by an integer, min_periods will default to 1.
on
str, optional
For a DataFrame, a column label on which to calculate the rolling window, rather than the DataFrame’s index.
closed
str, default 'right'
If 'right', the first point in the window is excluded from calculations. If 'left', the last point in the window is excluded from calculations. If 'both', no points in the window are excluded from calculations. If 'neither', the first and last points in the window are excluded from calculations.
round
round(
decimals: typing.Union[int, dict[typing.Hashable, int]] = 0,
) -> bigframes.dataframe.DataFrame
Round a DataFrame to a variable number of decimal places.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
... columns=['dogs', 'cats'])
>>> df
dogs cats
0 0.21 0.32
1 0.01 0.67
2 0.66 0.03
3 0.21 0.18
<BLANKLINE>
[4 rows x 2 columns]
By providing an integer each column is rounded to the same number
of decimal places
>>> df.round(1)
dogs cats
0 0.2 0.3
1 0.0 0.7
2 0.7 0.0
3 0.2 0.2
<BLANKLINE>
[4 rows x 2 columns]
With a dict, the number of places for specific columns can be
specified with the column names as key and the number of decimal
places as value
>>> df.round({'dogs': 1, 'cats': 0})
dogs cats
0 0.2 0.0
1 0.0 1.0
2 0.7 0.0
3 0.2 0.0
<BLANKLINE>
[4 rows x 2 columns]
Using a Series, the number of places for specific columns can be
specified with the column names as index and the number of
decimal places as value
>>> import pandas as pd
>>> decimals = pd.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
dogs cats
0 0.2 0.0
1 0.0 1.0
2 0.7 0.0
3 0.2 0.0
<BLANKLINE>
[4 rows x 2 columns]
Parameter Name Description decimals
int, dict, Series
Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals
is a dict-like, or in the index if decimals
is a Series. Any columns not included in decimals
will be left as is. Elements of decimals
which are not columns of the input will be ignored.
Returns Type Description bigframes.pandas.DataFrame
A DataFrame with the affected columns rounded to the specified number of decimal places. rpow
rpow(
other: int | bigframes.series.Series, axis: str | int = "columns"
) -> bigframes.dataframe.DataFrame
Get Exponential power of dataframe and other, element-wise (binary operator rpow).
Equivalent to other ** dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, pow.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
>>> df['A'].rpow(df['B'])
0 4
1 25
2 216
dtype: Int64
It's equivalent to using arithmetic operator **:
>>> df['B'] ** (df['A'])
0 4
1 25
2 216
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. rsub
rsub(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get subtraction of DataFrame and other, element-wise (binary operator -).
Equivalent to other - dataframe. With reverse version, sub.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
>>> df['A'].rsub(df['B'])
0 3
1 3
2 3
dtype: Int64
It's equivalent to using arithmetic operator -:
>>> df['B'] - (df['A'])
0 3
1 3
2 3
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. rtruediv
rtruediv(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get floating division of DataFrame and other, element-wise (binary operator /).
Equivalent to other / dataframe. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
>>> df['A'].rtruediv(df['B'])
0 4.0
1 2.5
2 2.0
dtype: Float64
It's equivalent to using arithmetic operator /:
>>> df['B'] / (df['A'])
0 4.0
1 2.5
2 2.0
dtype: Float64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation. sample
sample(
n: typing.Optional[int] = None,
frac: typing.Optional[float] = None,
*,
random_state: typing.Optional[int] = None,
sort: typing.Optional[typing.Union[bool, typing.Literal["random"]]] = "random"
) -> bigframes.dataframe.DataFrame
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'num_legs': [2, 4, 8, 0],
... 'num_wings': [2, 0, 0, 0],
... 'num_specimen_seen': [10, 2, 1, 8]},
... index=['falcon', 'dog', 'spider', 'fish'])
>>> df
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
<BLANKLINE>
[4 rows x 3 columns]
Fetch one random row from the DataFrame (note that we use random_state to ensure reproducibility of the examples):
>>> df.sample(random_state=1)
num_legs num_wings num_specimen_seen
dog 4 0 2
<BLANKLINE>
[1 rows x 3 columns]
A random 50% sample of the DataFrame:
>>> df.sample(frac=0.5, random_state=1)
num_legs num_wings num_specimen_seen
dog 4 0 2
fish 0 0 8
<BLANKLINE>
[2 rows x 3 columns]
Extract 3 random elements from the Series df['num_legs']:
>>> s = df['num_legs']
>>> s.sample(n=3, random_state=1)
dog 4
fish 0
spider 8
Name: num_legs, dtype: Int64
Parameters Name Description n
Optional[int], default None
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
frac
Optional[float], default None
Fraction of axis items to return. Cannot be used with n.
random_state
Optional[int], default None
Seed for random number generator.
sort
Optional[bool|Literal["random"]], default "random"
Exceptions Type Description ValueError
If both n and frac are specified. Returns Type Description bigframes.pandas.DataFrame or bigframes.pandas.Series
A new object of same type as caller containing n items randomly sampled from the caller object. scatter
scatter(
x: typing.Optional[typing.Hashable] = None,
y: typing.Optional[typing.Hashable] = None,
s: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
c: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
**kwargs
)
Create a scatter plot with varying marker point size and color.
This function calls pandas.plot to generate a plot with a random sample of items. For consistent results, the random sampling is reproducible. Use the sampling_random_state parameter to modify the sampling seed.
Examples:
Let's see how to draw a scatter plot using coordinates from the values in a DataFrame's columns.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
... [6.4, 3.2, 1], [5.9, 3.0, 2]],
... columns=['length', 'width', 'species'])
>>> ax1 = df.plot.scatter(x='length',
... y='width',
... c='DarkBlue')
And now with the color determined by a column as well.
>>> ax2 = df.plot.scatter(x='length',
... y='width',
... c='species',
... colormap='viridis')
Parameters Name Description x
int or str
The column name or column position to be used as horizontal coordinates for each point.
y
int or str
The column name or column position to be used as vertical coordinates for each point.
s
str, scalar or array-like, optional
The size of each point. Possible values are: - A string with the name of the column to be used for marker's size. - A single scalar so all points have the same size.
c
str, int or array-like, optional
The color of each point. Possible values are: - A single color string referred to by name, RGB or RGBA code, for instance 'red' or '#a98d19'. - A column name or position whose values will be used to color the marker points according to a colormap.
sampling_n
int, default 100
Number of random items for plotting.
sampling_random_state
int, default 0
Seed for random number generator.
Returns Type Description matplotlib.axes.Axes or np.ndarray of them
An ndarray is returned with one matplotlib.axes.Axes per column when subplots=True. select_dtypes
select_dtypes(include=None, exclude=None) -> bigframes.dataframe.DataFrame
Return a subset of the DataFrame's columns based on the column dtypes.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': ["hello", "world"], 'col3': [True, False]})
>>> df.select_dtypes(include=['Int64'])
col1
0 1
1 2
<BLANKLINE>
[2 rows x 1 columns]
>>> df.select_dtypes(exclude=['Int64'])
col2 col3
0 hello True
1 world False
<BLANKLINE>
[2 rows x 2 columns]
Parameters Name Description include
scalar or list-like
A selection of dtypes or strings to be included.
exclude
scalar or list-like
A selection of dtypes or strings to be excluded.
Returns Type Description bigframes.pandas.DataFrame
The subset of the frame including the dtypes in include and excluding the dtypes in exclude. set_index
set_index(
keys: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
append: bool = False,
drop: bool = True,
) -> bigframes.dataframe.DataFrame
Set the DataFrame index using existing columns.
Set the DataFrame index (row labels) using one or more existing columns. The index can replace the existing index.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'month': [1, 4, 7, 10],
... 'year': [2012, 2014, 2013, 2014],
... 'sale': [55, 40, 84, 31]})
>>> df
month year sale
0 1 2012 55
1 4 2014 40
2 7 2013 84
3 10 2014 31
<BLANKLINE>
[4 rows x 3 columns]
Set the 'month' column to become the index:
>>> df.set_index('month')
year sale
month
1 2012 55
4 2014 40
7 2013 84
10 2014 31
<BLANKLINE>
[4 rows x 2 columns]
Create a MultiIndex using columns 'year' and 'month':
>>> df.set_index(['year', 'month'])
sale
year month
2012 1 55
2014 4 40
2013 7 84
2014 10 31
<BLANKLINE>
[4 rows x 1 columns]
Exceptions Type Description KeyError
If key(s) are not in the columns.
Returns Type Description bigframes.pandas.DataFrame
Changed row labels.
shift
shift(periods: int = 1) -> bigframes.dataframe.DataFrame
Shift index by desired number of periods.
Shifts the index without realigning the data.
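Examples:
The following is an illustrative sketch added for clarity (it is not from the original reference); the printed output assumes bigframes' default Int64 rendering:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [10, 20, 30]})
>>> df.shift(periods=1)  # doctest: +SKIP
      A
0  <NA>
1    10
2    20
<BLANKLINE>
[3 rows x 1 columns]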
Parameter Name Description periods
int, default 1
Number of periods to shift.
Returns Type Description bigframes.pandas.DataFrame
Copy of input object, shifted.
skew
skew(*, numeric_only: bool = False)
Return unbiased skew over columns.
Normalized by N-1.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3, 4, 5],
... 'B': [5, 4, 3, 2, 1],
... 'C': [2, 2, 3, 2, 2]})
>>> df
A B C
0 1 5 2
1 2 4 2
2 3 3 3
3 4 2 2
4 5 1 2
<BLANKLINE>
[5 rows x 3 columns]
Calculating the skewness of each column.
>>> df.skew()
A 0.0
B 0.0
C 2.236068
dtype: Float64
Parameter Name Description numeric_only
bool, default False
Include only float, int, boolean columns.
sort_index
Sort object by labels (along an axis).
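Examples:
An illustrative sketch added for clarity (not from the original reference); output assumes the default rendering:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'val': [10, 20, 30]}, index=[3, 1, 2])
>>> df.sort_index()  # doctest: +SKIP
   val
1   20
2   30
3   10
<BLANKLINE>
[3 rows x 1 columns]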
Parameters Name Description ascending
bool, default True
Sort ascending vs. descending.
inplace
bool, default False
Whether to modify the DataFrame rather than creating a new one.
na_position
{'first', 'last'}, default 'last'
Puts NaNs at the beginning if 'first'; 'last' puts NaNs at the end. Not implemented for MultiIndex.
Exceptions Type Description ValueError
If the value of na_position is not one of 'first' or 'last'.
ValueError
If the length of ascending does not equal the length of by.
Returns Type Description bigframes.pandas.DataFrame or None
DataFrame with sorted values or None if inplace=True.
sort_values
Sort by the values along the row axis.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'col1': ['A', 'A', 'B', bpd.NA, 'D', 'C'],
... 'col2': [2, 1, 9, 8, 7, 4],
... 'col3': [0, 1, 9, 4, 2, 3],
... 'col4': ['a', 'B', 'c', 'D', 'e', 'F']
... })
>>> df
col1 col2 col3 col4
0 A 2 0 a
1 A 1 1 B
2 B 9 9 c
3 <NA> 8 4 D
4 D 7 2 e
5 C 4 3 F
<BLANKLINE>
[6 rows x 4 columns]
Sort by col1:
>>> df.sort_values(by=['col1'])
col1 col2 col3 col4
0 A 2 0 a
1 A 1 1 B
2 B 9 9 c
5 C 4 3 F
4 D 7 2 e
3 <NA> 8 4 D
<BLANKLINE>
[6 rows x 4 columns]
Sort by multiple columns:
>>> df.sort_values(by=['col1', 'col2'])
col1 col2 col3 col4
1 A 1 1 B
0 A 2 0 a
2 B 9 9 c
5 C 4 3 F
4 D 7 2 e
3 <NA> 8 4 D
<BLANKLINE>
[6 rows x 4 columns]
Sort Descending:
>>> df.sort_values(by='col1', ascending=False)
col1 col2 col3 col4
4 D 7 2 e
5 C 4 3 F
2 B 9 9 c
0 A 2 0 a
1 A 1 1 B
3 <NA> 8 4 D
<BLANKLINE>
[6 rows x 4 columns]
Putting NAs first:
>>> df.sort_values(by='col1', ascending=False, na_position='first')
col1 col2 col3 col4
3 <NA> 8 4 D
4 D 7 2 e
5 C 4 3 F
2 B 9 9 c
0 A 2 0 a
1 A 1 1 B
<BLANKLINE>
[6 rows x 4 columns]
Parameters Name Description by
str or Sequence[str]
Name or list of names to sort by.
ascending
bool or Sequence[bool], default True
Sort ascending vs. descending. Specify a list for multiple sort orders. If this is a list of bools, it must match the length of by.
inplace
bool, default False
If True, perform operation in-place.
kind
str, default 'quicksort'
Choice of sorting algorithm. Accepts 'quicksort', 'mergesort', 'heapsort', 'stable'. Ignored except when determining whether to sort stably. 'mergesort' or 'stable' will result in stable reorder.
na_position
{'first', 'last'}, default 'last'
Puts NaNs at the beginning if 'first'; 'last' puts NaNs at the end.
Exceptions Type Description ValueError
If the value of na_position is not one of 'first' or 'last'.
Returns Type Description bigframes.pandas.DataFrame or None
DataFrame with sorted values or None if inplace=True.
stack
stack(level: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = -1)
Stack the prescribed level(s) from columns to index.
Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 3], 'B': [2, 4]}, index=['foo', 'bar'])
>>> df
A B
foo 1 2
bar 3 4
<BLANKLINE>
[2 rows x 2 columns]
>>> df.stack()
foo A 1
B 2
bar A 3
B 4
dtype: Int64
Parameter Name Description level
int, str, or list of these, default -1 (last level)
Level(s) to stack from the column axis onto the index axis.
std
std(
axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the sample standard deviation over the requested axis.
Normalized by N-1 by default.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3, 4, 5],
... "B": [3, 4, 3, 2, 1],
... "C": [2, 2, 3, 2, 2]})
>>> df
A B C
0 1 3 2
1 2 4 2
2 3 3 3
3 4 2 2
4 5 1 2
<BLANKLINE>
[5 rows x 3 columns]
Calculating the standard deviation of each column:
>>> df.std()
A 1.581139
B 1.140175
C 0.447214
dtype: Float64
Parameters Name Description axis
{index (0), columns (1)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
numeric_only
bool, default False
Include only float, int, boolean columns.
sub
sub(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get subtraction of DataFrame and other, element-wise (binary operator -).
Equivalent to dataframe - other. With reverse version, rsub.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].sub(df['B'])
0 -3
1 -3
2 -3
dtype: Int64
You can also use the arithmetic operator -:
>>> df['A'] - (df['B'])
0 -3
1 -3
2 -3
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation.
subtract
subtract(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get subtraction of DataFrame and other, element-wise (binary operator -).
Equivalent to dataframe - other. With reverse version, rsub.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].sub(df['B'])
0 -3
1 -3
2 -3
dtype: Int64
You can also use the arithmetic operator -:
>>> df['A'] - (df['B'])
0 -3
1 -3
2 -3
dtype: Int64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation.
sum
sum(
axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the sum of the values over the requested axis.
This is equivalent to the method numpy.sum.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
A B
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
Calculating the sum of each column (the default behavior without an explicit axis parameter).
>>> df.sum()
A 4
B 6
dtype: Int64
Calculating the sum of each row.
>>> df.sum(axis=1)
0 3
1 7
dtype: Int64
Parameters Name Description axis
{index (0), columns (1)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
numeric_only
bool, default False
Include only float, int, boolean columns.
swaplevel
swaplevel(i: int = -2, j: int = -1, axis: int | str = 0)
Swap levels i and j in a MultiIndex.
Default is to swap the two innermost levels of the index.
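Examples:
An illustrative sketch added for clarity (not from the original reference); it assumes a two-level MultiIndex built with set_index, and the exact rendering may differ:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'year': [2012, 2014], 'month': [1, 4], 'sale': [55, 40]})
>>> df = df.set_index(['year', 'month'])
>>> df.swaplevel()  # doctest: +SKIP
            sale
month year
1     2012    55
4     2014    40
<BLANKLINE>
[2 rows x 1 columns]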
Parameters Name Description i
int or str
Levels of the indices to be swapped. Can pass level name as string.
j
int or str
Levels of the indices to be swapped. Can pass level name as string.
axis
{0 or 'index', 1 or 'columns'}, default 0
The axis to swap levels on. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.
Exceptions Type Description ValueError
If columns are not multi-index.
Returns Type Description bigframes.pandas.DataFrame
DataFrame with levels swapped in MultiIndex.
tail
tail(n: int = 5) -> bigframes.dataframe.DataFrame
Return the last n rows.
This function returns the last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.
For negative values of n, this function returns all rows except the first |n| rows, equivalent to df[|n|:].
If n is larger than the number of rows, this function returns all rows.
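Examples:
An illustrative sketch added for clarity (not from the original reference):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'animal': ['falcon', 'lion', 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df.tail(3)  # doctest: +SKIP
  animal
4  shark
5  whale
6  zebra
<BLANKLINE>
[3 rows x 1 columns]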
Parameter Name Description n
int, default 5
Number of rows to select.
Returns Type Description bigframes.pandas.DataFrame
The last n rows of the caller object.
take
take(
indices: typing.Sequence[int], axis: int | str | None = 0, **kwargs
) -> bigframes.dataframe.DataFrame
Return the elements in the given positional indices along an axis.
This means that we are not indexing according to actual values in the index attribute of the object. We are indexing according to the actual position of the element in the object.
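Examples:
An illustrative sketch added for clarity (not from the original reference):
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'name': ['falcon', 'parrot', 'lion', 'monkey'],
...                     'class': ['bird', 'bird', 'mammal', 'mammal']})
>>> df.take([0, 3])  # doctest: +SKIP
     name   class
0  falcon    bird
3  monkey  mammal
<BLANKLINE>
[2 rows x 2 columns]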
Parameters Name Description indices
list-like
An array of ints indicating which positions to take.
axis
{0 or 'index', 1 or 'columns', None}, default 0
The axis on which to select elements. 0 means that we are selecting rows, 1 means that we are selecting columns. For Series this parameter is unused and defaults to 0.
to_arrow
to_arrow(
*, ordered: bool = True, allow_large_results: typing.Optional[bool] = None
) -> pyarrow.lib.Table
Write DataFrame to an Arrow table / record batch.
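Examples:
An illustrative sketch added for clarity (not from the original reference); the exact pyarrow Table repr may vary, so the output is skipped:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> tbl = df.to_arrow()  # doctest: +SKIP
>>> tbl.num_rows  # doctest: +SKIP
2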
Parameters Name Description ordered
bool, default True
Determines whether the resulting Arrow table will be ordered. In some cases, unordered may result in a faster-executing query.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Returns Type Description pyarrow.Table
A pyarrow Table with all rows and columns of this DataFrame.
to_csv
to_csv(
path_or_buf=None,
sep=",",
*,
header: bool = True,
index: bool = True,
allow_large_results: typing.Optional[bool] = None
) -> typing.Optional[str]
Write object to a comma-separated values (csv) file on Cloud Storage.
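Examples:
An illustrative sketch added for clarity (not from the original reference; the bucket path is hypothetical), mirroring the to_parquet example later on this page:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> gcs_bucket = "gs://bigframes-dev-testing/sample_csv*.csv"
>>> df.to_csv(path_or_buf=gcs_bucket)  # doctest: +SKIP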
Parameters Name Description path_or_buf
str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string. If a non-binary file object is passed, it should be opened with newline='', disabling universal newlines. If a binary file object is passed, mode might need to contain a 'b'. Alternatively, a destination URI of Cloud Storage file(s) to store the extracted dataframe, in the format gs://<bucket_name>/<object_name_or_glob>. If the data size is more than 1GB, you must use a wildcard to export the data into multiple files; the size of the files varies. None, file-like objects, and local file paths are not yet supported.
index
bool, default True
If True, write row names (index).
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB. This parameter has no effect when results are saved to Google Cloud Storage (GCS).
Returns Type Description None or str
If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.
to_dict
to_dict(orient: typing.Literal['dict', 'list', 'series', 'split', 'tight', 'records', 'index'] = 'dict', into: type[dict] = <class 'dict'>, *, allow_large_results: typing.Optional[bool] = None, **kwargs) -> dict | list[dict]
Convert the DataFrame to a dictionary.
The type of the key-value pairs can be customized with the parameters (see below).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_dict()
{'col1': {np.int64(0): 1, np.int64(1): 2}, 'col2': {np.int64(0): 3, np.int64(1): 4}}
You can specify the return orientation.
>>> df.to_dict('series')
{'col1': 0 1
1 2
Name: col1, dtype: Int64,
'col2': 0 3
1 4
Name: col2, dtype: Int64}
>>> df.to_dict('split')
{'index': [0, 1], 'columns': ['col1', 'col2'], 'data': [[1, 3], [2, 4]]}
>>> df.to_dict("tight")
{'index': [0, 1],
'columns': ['col1', 'col2'],
'data': [[1, 3], [2, 4]],
'index_names': [None],
'column_names': [None]}
Parameters Name Description orient
str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}
Determines the type of the values of the dictionary. 'dict' (default) : dict like {column -> {index -> value}}. 'list' : dict like {column -> [values]}. 'series' : dict like {column -> Series(values)}. 'split' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}. 'tight' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values], 'index_names' -> [index.names], 'column_names' -> [column.names]}. 'records' : list like [{column -> value}, ... , {column -> value}]. 'index' : dict like {index -> {column -> value}}.
into
class, default dict
The collections.abc.Mapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.
index
bool, default True
Whether to include the index item (and index_names item if orient is 'tight') in the returned dictionary. Can only be False when orient is 'split' or 'tight'.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Returns Type Description dict or list of dict
Return a collections.abc.Mapping object representing the DataFrame. The resulting transformation depends on the orient parameter.
to_excel
to_excel(
excel_writer,
sheet_name: str = "Sheet1",
*,
allow_large_results: typing.Optional[bool] = None,
**kwargs
) -> None
Write DataFrame to an Excel sheet.
To write a single DataFrame to an Excel .xlsx file it is only necessary to specify a target file name. To write to multiple sheets it is necessary to create an ExcelWriter object with a target file name, and specify a sheet in the file to write to.
Multiple sheets may be written to by specifying a unique sheet_name. With all data written to the file it is necessary to save the changes. Note that creating an ExcelWriter object with a file name that already exists will result in the contents of the existing file being erased.
Examples:
>>> import bigframes.pandas as bpd
>>> import tempfile
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_excel(tempfile.TemporaryFile())
Parameters Name Description excel_writer
path-like, file-like, or ExcelWriter object
File path or existing ExcelWriter.
sheet_name
str, default 'Sheet1'
Name of sheet which will contain DataFrame.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
to_gbq
to_gbq(
destination_table: typing.Optional[str] = None,
*,
if_exists: typing.Optional[typing.Literal["fail", "replace", "append"]] = None,
index: bool = True,
ordering_id: typing.Optional[str] = None,
clustering_columns: typing.Union[
pandas.core.indexes.base.Index, typing.Iterable[typing.Hashable]
] = (),
labels: dict[str, str] = {}
) -> str
Write a DataFrame to a BigQuery table.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Write a DataFrame to a BigQuery table.
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> # destination_table = PROJECT_ID + "." + DATASET_ID + "." + TABLE_NAME
>>> df.to_gbq("bigframes-dev.birds.test-numbers", if_exists="replace")
'bigframes-dev.birds.test-numbers'
Write a DataFrame to a temporary BigQuery table in the anonymous dataset.
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> destination = df.to_gbq(ordering_id="ordering_id")
>>> # The table created can be read outside of the current session.
>>> bpd.close_session() # Optional, to demonstrate a new session.
>>> bpd.read_gbq(destination, index_col="ordering_id")
col1 col2
ordering_id
0 1 3
1 2 4
<BLANKLINE>
[2 rows x 2 columns]
Write a DataFrame to a BigQuery table with clustering columns:
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]})
>>> clustering_cols = ['col1', 'col3']
>>> df.to_gbq(
... "bigframes-dev.birds.test-clusters",
... if_exists="replace",
... clustering_columns=clustering_cols,
... )
'bigframes-dev.birds.test-clusters'
Parameters Name Description destination_table
Optional[str]
Name of table to be written, in the form dataset.tablename or project.dataset.tablename. If no destination_table is set, a new temporary table is created in the BigQuery anonymous dataset.
if_exists
Optional[str]
Behavior when the destination table exists. When destination_table is set, this defaults to 'fail'. When destination_table is not set, this field is not applicable: a new table is always created. Value can be one of: 'fail': if the table exists, raise pandas_gbq.gbq.TableCreationError. 'replace': if the table exists, drop it, recreate it, and insert data. 'append': if the table exists, insert data; create it if it does not exist.
index
bool, default True
Whether to write row names (index) or not.
ordering_id
Optional[str], default None
If set, write the ordering of the DataFrame as a column in the result table with this name.
clustering_columns
Union[pd.Index, Iterable[Hashable]], default ()
Specifies the columns for clustering in the BigQuery table. The order of columns in this list is significant for the clustering hierarchy. Index columns may be included in clustering if the index parameter is set to True, and their names are specified in this list. These index columns, if included, precede DataFrame columns in the clustering order. The clustering order within the Index/DataFrame columns follows the order specified in clustering_columns.
labels
dict[str, str], default {}
Specifies table labels within BigQuery.
Exceptions Type Description ValueError
If an invalid value is provided for if_exists when destination_table is None. None or 'replace' are the only valid values for if_exists in that case.
ValueError
If an invalid value is provided for destination_table that is not one of datasetId.tableId or projectId.datasetId.tableId.
ValueError
If an invalid value is provided for if_exists that is not one of 'fail', 'replace', or 'append'.
Returns Type Description str
The fully-qualified ID for the written table, in the form project.dataset.tablename.
to_html
to_html(
buf=None,
columns: typing.Optional[typing.Sequence[str]] = None,
col_space=None,
header: bool = True,
index: bool = True,
na_rep: str = "NaN",
formatters=None,
float_format=None,
sparsify: bool | None = None,
index_names: bool = True,
justify: str | None = None,
max_rows: int | None = None,
max_cols: int | None = None,
show_dimensions: bool = False,
decimal: str = ".",
bold_rows: bool = True,
classes: str | list | tuple | None = None,
escape: bool = True,
notebook: bool = False,
border: int | None = None,
table_id: str | None = None,
render_links: bool = False,
encoding: str | None = None,
*,
allow_large_results: bool | None = None
) -> str
Render a DataFrame as an HTML table.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> print(df.to_html())
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>col1</th>
<th>col2</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>3</td>
</tr>
<tr>
<th>1</th>
<td>2</td>
<td>4</td>
</tr>
</tbody>
</table>
Parameters Name Description buf
str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns
sequence, optional, default None
The subset of columns to write. Writes all columns by default.
col_space
str or int, list or dict of int or str, optional
The minimum width of each column in CSS length units. An int is assumed to be px units.
header
bool, optional
Whether to print column labels, default True.
index
bool, optional, default True
Whether to print index (row) labels.
na_rep
str, optional, default 'NaN'
String representation of NAN to use.
formatters
list, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns' elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.
float_format
one-parameter function, optional, default None
Formatter function to apply to columns' elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.
sparsify
bool, optional, default True
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.
index_names
bool, optional, default True
Prints the names of the indexes.
justify
str, default None
How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), 'right' out of the box. Valid values are, 'left', 'right', 'center', 'justify', 'justify-all', 'start', 'end', 'inherit', 'match-parent', 'initial', 'unset'.
max_rows
int, optional
Maximum number of rows to display in the console.
max_cols
int, optional
Maximum number of columns to display in the console.
show_dimensions
bool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimal
str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
bold_rows
bool, default True
Make the row labels bold in the output.
classes
str or list or tuple, default None
CSS class(es) to apply to the resulting html table.
escape
bool, default True
Convert the characters <, >, and & to HTML-safe sequences.
notebook
bool, default False
Whether the generated HTML is for IPython Notebook.
border
int
A border=border attribute is included in the opening <table> tag. Default pd.options.display.html.border.
table_id
str, optional
A css id is included in the opening <table> tag if specified.
render_links
bool, default False
Convert URLs to HTML links.
encoding
str, default "utf-8"
Set character encoding.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Returns Type Description str or None
If buf is None, returns the result as a string. Otherwise returns None.
to_json
to_json(
path_or_buf=None,
orient: typing.Optional[
typing.Literal["split", "records", "index", "columns", "values", "table"]
] = None,
*,
lines: bool = False,
index: bool = True,
allow_large_results: typing.Optional[bool] = None
) -> typing.Optional[str]
Convert the object to a JSON string, written to Cloud Storage.
Note: NaN's and None will be converted to null, and datetime objects will be converted to UNIX timestamps.
Note: Only orient='records' with lines=True is supported so far.
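Examples:
An illustrative sketch added for clarity (not from the original reference; the bucket path is hypothetical). Per the note above, only orient='records' with lines=True is supported:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> gcs_bucket = "gs://bigframes-dev-testing/sample_json*.jsonl"
>>> df.to_json(gcs_bucket, orient='records', lines=True)  # doctest: +SKIP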
Parameters Name Description path_or_buf
str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string. Can be a destination URI of Cloud Storage file(s) to store the extracted dataframe, in the format gs://<bucket_name>/<object_name_or_glob>. Must contain a wildcard * character. If the data size is more than 1GB, you must use a wildcard to export the data into multiple files; the size of the files varies.
orient
{'split', 'records', 'index', 'columns', 'values', 'table'}, default 'columns'
Indication of expected JSON string format. * Series: default is 'index'; allowed values are {'split', 'records', 'index', 'table'}. * DataFrame: default is 'columns'; allowed values are {'split', 'records', 'index', 'columns', 'values', 'table'}. * The format of the JSON string: 'split' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}; 'records' : list like [{column -> value}, ... , {column -> value}]; 'index' : dict like {index -> {column -> value}}; 'columns' : dict like {column -> {index -> value}}; 'values' : just the values array; 'table' : dict like {'schema': {schema}, 'data': {data}}, describing the data, where the data component is like orient='records'.
index
bool, default True
If True, write row names (index).
lines
bool, default False
If 'orient' is 'records' write out line-delimited json format. Will throw ValueError if incorrect 'orient' since others are not list-like.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB. This parameter has no effect when results are saved to Google Cloud Storage (GCS).
Exceptions Type Description ValueError
If lines is True but 'records' is not provided as the value for orient.
Returns Type Description None or str
If path_or_buf is None, returns the resulting json format as a string. Otherwise returns None.
to_latex
to_latex(
buf=None,
columns: typing.Optional[typing.Sequence] = None,
header: typing.Union[bool, typing.Sequence[str]] = True,
index: bool = True,
*,
allow_large_results: typing.Optional[bool] = None,
**kwargs
) -> str | None
Render object to a LaTeX tabular, longtable, or nested table.
Requires \usepackage{booktabs}. The output can be copy/pasted into a main LaTeX document or read from an external file with \input{table.tex}.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> print(df.to_latex())
\begin{tabular}{lrr}
\toprule
& col1 & col2 \\
\midrule
0 & 1 & 3 \\
1 & 2 & 4 \\
\bottomrule
\end{tabular}
<BLANKLINE>
Parameters Name Description buf
str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns
list of label, optional
The subset of columns to write. Writes all columns by default.
header
bool or list of str, default True
Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.
index
bool, default True
Write row names (index).
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Returns Type Description str or None
If buf is None, returns the result as a string. Otherwise returns None.
to_markdown
to_markdown(
buf=None,
mode: str = "wt",
index: bool = True,
*,
allow_large_results: typing.Optional[bool] = None,
**kwargs
) -> str | None
Print DataFrame in Markdown-friendly format.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> print(df.to_markdown())
| | col1 | col2 |
|---:|-------:|-------:|
| 0 | 1 | 3 |
| 1 | 2 | 4 |
Parameters Name Description buf
str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
mode
str, optional
Mode in which file is opened.
index
bool, optional, default True
Add index (row) labels.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Returns Type Description str
DataFrame in Markdown-friendly format.
to_numpy
to_numpy(
dtype=None,
copy=False,
na_value=_NoDefault.no_default,
*,
allow_large_results=None,
**kwargs
) -> numpy.ndarray
Convert the DataFrame to a NumPy array.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_numpy()
array([[1, 3],
[2, 4]], dtype=object)
Parameters Name Description dtype
str or numpy.dtype, optional
The dtype to pass to numpy.asarray().
copy
bool, default False
Whether to ensure that the returned value is not a view on another array.
na_value
Any, optional
The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Returns Type Description numpy.ndarray
The converted NumPy array.
to_orc
to_orc(path=None, *, allow_large_results=None, **kwargs) -> bytes | None
Write a DataFrame to the ORC format.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> import tempfile
>>> df.to_orc(tempfile.TemporaryFile())
Parameters Name Description path
str, file-like object or None, default None
If a string, it will be used as Root Directory path when writing a partitioned dataset. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via builtin open function). If path is None, a bytes object is returned.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Returns Type Description bytes or None
If path is None, returns the result as bytes. Otherwise returns None.
to_pandas
Convert the DataFrame to an in-memory pandas DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col': [4, 2, 2]})
Download the data from BigQuery and convert it into an in-memory pandas DataFrame.
>>> df.to_pandas()
col
0 4
1 2
2 2
Estimate job statistics without processing or downloading data by using dry_run=True
.
>>> df.to_pandas(dry_run=True) # doctest: +SKIP
columnCount 1
columnDtypes {'col': Int64}
indexLevel 1
indexDtypes [Int64]
projectId bigframes-dev
location US
jobType QUERY
destinationTable {'projectId': 'bigframes-dev', 'datasetId': '_...
useLegacySql False
referencedTables None
totalBytesProcessed 0
cacheHit False
statementType SELECT
creationTime 2025-04-02 20:17:12.038000+00:00
dtype: object
Parameters Name Description max_download_size
int, default None
.. deprecated:: 2.0.0 The max_download_size parameter is deprecated. Please use the to_pandas_batches() method instead. Download size threshold in MB. If max_download_size is exceeded when downloading data, the data will be downsampled if bigframes.options.sampling.enable_downsampling is True; otherwise, an error will be raised. If set to a value other than None, this will supersede the global config.
sampling_method
str, default None
.. deprecated:: 2.0.0 The sampling_method parameter is deprecated. Please use the sample() method instead. Downsampling algorithms to be chosen from, the choices are: "head": This algorithm returns a portion of the data from the beginning. It is fast and requires minimal computations to perform the downsampling; "uniform": This algorithm returns uniform random samples of the data. If set to a value other than None, this will supersede the global config.
random_state
int, default None
.. deprecated:: 2.0.0 The random_state parameter is deprecated. Please use the sample() method instead. The seed for the uniform downsampling algorithm. If provided, the uniform method may take longer to execute and require more computation. If set to a value other than None, this will supersede the global config.
ordered
bool, default True
Determines whether the resulting pandas dataframe will be ordered. In some cases, unordered may result in a faster-executing query.
dry_run
bool, default False
If this argument is true, this method will not process the data. Instead, it returns a pandas Series containing dry run statistics.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Returns Type Description pandas.DataFrame
A pandas DataFrame with all rows and columns of this DataFrame if the data_sampling_threshold_mb is not exceeded; otherwise, a pandas DataFrame with downsampled rows and all columns of this DataFrame. If dry_run is set, a pandas Series containing dry run statistics will be returned.
to_pandas_batches
to_pandas_batches(
page_size: typing.Optional[int] = None,
max_results: typing.Optional[int] = None,
*,
allow_large_results: typing.Optional[bool] = None
) -> typing.Iterable[pandas.core.frame.DataFrame]
Stream DataFrame results to an iterable of pandas DataFrame.
page_size and max_results determine the size and number of batches, see https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJob#google_cloud_bigquery_job_QueryJob_result
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col': [4, 3, 2, 2, 3]})
Iterate through the results in batches, limiting the total rows yielded across all batches via max_results
:
>>> for df_batch in df.to_pandas_batches(max_results=3):
... print(df_batch)
col
0 4
1 3
2 2
Alternatively, control the approximate size of each batch using page_size
and fetch batches manually using next()
:
>>> it = df.to_pandas_batches(page_size=2)
>>> next(it)
col
0 4
1 3
>>> next(it)
col
2 2
3 2
Parameters Name Description page_size
int, default None
The maximum number of rows of each batch. Non-positive values are ignored.
max_results
int, default None
The maximum total number of rows of all batches.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Returns Type Description Iterable[pandas.DataFrame]
An iterable of smaller dataframes which combine to form the original dataframe. Results stream from BigQuery; see https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.table.RowIterator#google_cloud_bigquery_table_RowIterator_to_arrow_iterable
to_parquet
to_parquet(
path=None,
*,
compression: typing.Optional[typing.Literal["snappy", "gzip"]] = "snappy",
index: bool = True,
allow_large_results: typing.Optional[bool] = None
) -> typing.Optional[bytes]
Write a DataFrame to the binary Parquet format.
This function writes the dataframe as a parquet file (https://parquet.apache.org/) to Cloud Storage.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> gcs_bucket = "gs://bigframes-dev-testing/sample_parquet*.parquet"
>>> df.to_parquet(path=gcs_bucket)
Parameters Name Description path
str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function. If None, the result is returned as bytes. If a string or path, it will be used as Root Directory path when writing a partitioned dataset. Destination URI(s) of Cloud Storage file(s) to store the extracted dataframe should be formatted gs://<bucket_name>/<object_name_or_glob>. If the data size is more than 1GB, you must use a wildcard to export the data into multiple files; the size of the files varies.
compression
str, default 'snappy'
Name of the compression to use. Use None for no compression. Supported options: 'gzip', 'snappy'.
index
bool, default True
If True, include the dataframe's index(es) in the file output. If False, they will not be written to the file.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB. This parameter has no effect when results are saved to Google Cloud Storage (GCS).
Exceptions Type Description ValueError
If an invalid value is provided for compression that is not one of None, 'snappy', or 'gzip'.
Returns Type Description None or bytes
bytes if no path argument is provided, else None.
to_pickle
to_pickle(path, *, allow_large_results=None, **kwargs) -> None
Pickle (serialize) object to file.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> gcs_bucket = "gs://bigframes-dev-testing/sample_pickle_gcs.pkl"
>>> df.to_pickle(path=gcs_bucket)
Parameters Name Description path
str
File path where the pickled object will be stored.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
to_records
to_records(
index: bool = True,
column_dtypes=None,
index_dtypes=None,
*,
allow_large_results=None
) -> numpy.rec.recarray
Convert DataFrame to a NumPy record array.
Index will be included as the first field of the record array if requested.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_records()
rec.array([(0, 1, 3), (1, 2, 4)],
dtype=[('index', '<i8'), ('col1', '<i8'), ('col2', '<i8')])
Parameters Name Description index
bool, default True
Include index in resulting record array, stored in 'index' field or using the index label, if set.
column_dtypes
str, type, dict, default None
If a string or type, the data type to store all columns. If a dictionary, a mapping of column names and indices (zero-indexed) to specific data types.
index_dtypes
str, type, dict, default None
If a string or type, the data type to store all index levels. If a dictionary, a mapping of index level names and indices (zero-indexed) to specific data types. This mapping is applied only if index=True.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Returns Type Description np.recarray
NumPy ndarray with the DataFrame labels as fields and each row of the DataFrame as entries.
to_string
to_string(
buf=None,
columns: typing.Optional[typing.Sequence[str]] = None,
col_space=None,
header: typing.Union[bool, typing.Sequence[str]] = True,
index: bool = True,
na_rep: str = "NaN",
formatters=None,
float_format=None,
sparsify: bool | None = None,
index_names: bool = True,
justify: str | None = None,
max_rows: int | None = None,
max_cols: int | None = None,
show_dimensions: bool = False,
decimal: str = ".",
line_width: int | None = None,
min_rows: int | None = None,
max_colwidth: int | None = None,
encoding: str | None = None,
*,
allow_large_results: typing.Optional[bool] = None
) -> str | None
Render a DataFrame to a console-friendly tabular output.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> print(df.to_string())
col1 col2
0 1 3
1 2 4
Parameters Name Description buf
str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns
sequence, optional, default None
The subset of columns to write. Writes all columns by default.
col_space
int, list or dict of int, optional
The minimum width of each column.
header
bool or sequence, optional
Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.
index
bool, optional, default True
Whether to print index (row) labels.
na_rep
str, optional, default 'NaN'
String representation of NAN to use.
formatters
list, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns' elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.
float_format
one-parameter function, optional, default None
Formatter function to apply to columns' elements if they are floats. The result of this function must be a unicode string.
sparsify
bool, optional, default True
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.
index_names
bool, optional, default True
Prints the names of the indexes.
justify
str, default None
How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), 'right' out of the box. Valid values are, 'left', 'right', 'center', 'justify', 'justify-all', 'start', 'end', 'inherit', 'match-parent', 'initial', 'unset'.
max_rows
int, optional
Maximum number of rows to display in the console.
min_rows
int, optional
The number of rows to display in the console in a truncated repr (when the number of rows is above max_rows).
max_cols
int, optional
Maximum number of columns to display in the console.
show_dimensions
bool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimal
str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
line_width
int, optional
Width to wrap a line in characters.
max_colwidth
int, optional
Max width to truncate each column in characters. By default, no limit.
encoding
str, default "utf-8"
Set character encoding.
allow_large_results
bool, default None
If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB.
Returns Type Description str or None
If buf is None, returns the result as a string. Otherwise returns None.
transpose
transpose() -> bigframes.dataframe.DataFrame
Transpose index and columns.
Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property .T is an accessor to the method transpose.
All columns must be the same dtype (numerics can be coerced to a common supertype).
Examples:
**Square DataFrame with homogeneous dtype**
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = bpd.DataFrame(data=d1)
>>> df1
col1 col2
0 1 3
1 2 4
<BLANKLINE>
[2 rows x 2 columns]
>>> df1_transposed = df1.T # or df1.transpose()
>>> df1_transposed
0 1
col1 1 2
col2 3 4
<BLANKLINE>
[2 rows x 2 columns]
When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:
>>> df1.dtypes
col1 Int64
col2 Int64
dtype: object
>>> df1_transposed.dtypes
0 Int64
1 Int64
dtype: object
Returns Type Description bigframes.pandas.DataFrame
The transposed DataFrame.
truediv
truediv(
other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get floating division of DataFrame and other, element-wise (binary operator /).
Equivalent to dataframe / other. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3],
... 'B': [4, 5, 6],
... })
You can use method name:
>>> df['A'].truediv(df['B'])
0 0.25
1 0.4
2 0.5
dtype: Float64
You can also use the arithmetic operator /:
>>> df['A'] / (df['B'])
0 0.25
1 0.4
2 0.5
dtype: Float64
Parameters Name Description other
float, int, or Series
Any single or multiple element data structure, or list-like object.
axis
{0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
Returns Type Description bigframes.pandas.DataFrame
DataFrame result of the arithmetic operation.
unstack
unstack(
level: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = -1,
)
Pivot a level of the (necessarily hierarchical) index labels.
Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 3], 'B': [2, 4]}, index=['foo', 'bar'])
>>> df
A B
foo 1 2
bar 3 4
<BLANKLINE>
[2 rows x 2 columns]
>>> df.unstack()
A foo 1
bar 3
B foo 2
bar 4
dtype: Int64
Parameter Name Description level
int, str, or list of these, default -1 (last level)
Level(s) of index to unstack, can pass level name.
update
update(other, join: str = "left", overwrite=True, filter_func=None)
Modify in place using non-NA values from another DataFrame.
Aligns on indices. There is no return value.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3],
... 'B': [400, 500, 600]})
>>> new_df = bpd.DataFrame({'B': [4, 5, 6],
... 'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
A B
0 1 4
1 2 5
2 3 6
<BLANKLINE>
[3 rows x 2 columns]
Parameters Name Description other
DataFrame, or object coercible into a DataFrame
Should have at least one matching index/column label with the original DataFrame. If a Series is passed, its name attribute must be set, and that will be used as the column name to align with the original DataFrame.
join
{'left'}, default 'left'
Only left join is implemented, keeping the index and columns of the original object.
overwrite
bool, default True
How to handle non-NA values for overlapping keys: True: overwrite original DataFrame's values with values from other. False: only update values that are NA in the original DataFrame.
filter_func
callable(1d-array) -> bool 1d-array, optional
Can choose to replace values other than NA. Return True for values that should be updated.
Exceptions Type Description ValueError
If a type of join other than 'left' is provided as an argument.
Returns Type Description None
This method directly changes the calling object.
value_counts
value_counts(
subset: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
normalize: bool = False,
sort: bool = True,
ascending: bool = False,
dropna: bool = True,
)
Return a Series containing counts of unique rows in the DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'num_legs': [2, 4, 4, 6, 7],
... 'num_wings': [2, 0, 0, 0, bpd.NA]},
... index=['falcon', 'dog', 'cat', 'ant', 'octopus'],
... dtype='Int64')
>>> df
num_legs num_wings
falcon 2 2
dog 4 0
cat 4 0
ant 6 0
octopus 7 <NA>
<BLANKLINE>
[5 rows x 2 columns]
value_counts sorts the result by counts in descending order by default:
>>> df.value_counts()
num_legs num_wings
4 0 2
2 2 1
6 0 1
Name: count, dtype: Int64
You can normalize the counts to return relative frequencies by setting normalize=True:
>>> df.value_counts(normalize=True)
num_legs num_wings
4 0 0.5
2 2 0.25
6 0 0.25
Name: proportion, dtype: Float64
You can get the rows in ascending order of the counts by setting ascending=True:
>>> df.value_counts(ascending=True)
num_legs num_wings
2 2 1
6 0 1
4 0 2
Name: count, dtype: Int64
You can include the counts of the rows with NA values by setting dropna=False:
>>> df.value_counts(dropna=False)
num_legs num_wings
4 0 2
2 2 1
6 0 1
7 <NA> 1
Name: count, dtype: Int64
Parameters Name Description subset
label or list of labels, optional
Columns to use when counting unique combinations.
normalize
bool, default False
Return proportions rather than frequencies.
sort
bool, default True
Sort by frequencies.
ascending
bool, default False
Sort in ascending order.
dropna
bool, default True
Don’t include counts of rows that contain NA values.
var
var(
axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return unbiased variance over requested axis.
Normalized by N-1 by default.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
A B
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
Calculating the variance of each column (the default behavior without an explicit axis parameter).
>>> df.var()
A 2.0
B 2.0
dtype: Float64
Calculating the variance of each row.
>>> df.var(axis=1)
0 0.5
1 0.5
dtype: Float64
Parameters Name Description axis
{index (0), columns (1)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
numeric_only
bool, default False
Include only float, int, boolean columns.
where
Replace values where the condition is False.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'a': [20, 10, 0], 'b': [0, 10, 20]})
>>> df
a b
0 20 0
1 10 10
2 0 20
<BLANKLINE>
[3 rows x 2 columns]
You can filter the values in the dataframe based on a condition. The values matching the condition are kept, and those not matching are replaced. The default replacement value is NA. For example, when the condition is a dataframe:
>>> df.where(df > 0)
a b
0 20 <NA>
1 10 10
2 <NA> 20
<BLANKLINE>
[3 rows x 2 columns]
You can specify a custom replacement value for non-matching values.
>>> df.where(df > 0, -1)
a b
0 20 -1
1 10 10
2 -1 20
<BLANKLINE>
[3 rows x 2 columns]
Besides dataframe, the condition can be a series too. For example:
>>> df.where(df['a'] > 10, -1)
a b
0 20 0
1 -1 -1
2 -1 -1
<BLANKLINE>
[3 rows x 2 columns]
As for the replacement, it can be a dataframe too. For example:
>>> df.where(df > 10, -df)
a b
0 20 0
1 -10 -10
2 0 20
<BLANKLINE>
[3 rows x 2 columns]
>>> df.where(df['a'] > 10, -df)
a b
0 20 0
1 -10 -10
2 0 -20
<BLANKLINE>
[3 rows x 2 columns]
Note that a Series is not yet supported as the replacement. In pandas, when specifying a Series as the replacement, the axis value must be specified at the same time, which is not supported in bigframes DataFrame.
Parameters Name Description cond
bool Series/DataFrame, array-like, or callable
Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and returns boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).
other
scalar, DataFrame, or callable
Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the DataFrame and returns scalar or DataFrame. The callable must not change input DataFrame (though pandas doesn’t check it). If not specified, entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes).
Returns Type Description bigframes.pandas.DataFrame
DataFrame after the replacement.