For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.
For some data types, pandas extends NumPy's type system. String aliases for these types can be found at dtypes.
pandas and third-party libraries can extend NumPy's type system (see Extension types). The top-level array() method can be used to create a new array, which may be stored in a Series, Index, or as a column in a DataFrame.
array(data[, dtype, copy]): Create an array.
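As a rough illustration of the paragraph above, the sketch below creates two extension arrays with array() and stores them in a Series and a DataFrame. It assumes a recent pandas release in which integer input containing a missing value is inferred as the nullable Int64 dtype; the variable names are illustrative only.

```python
import pandas as pd

# Integers with a missing value: pandas infers the nullable Int64 extension dtype.
ints = pd.array([1, 2, None])

# An explicit dtype may be passed as a string alias.
flags = pd.array([True, None, False], dtype="boolean")

# The resulting arrays can be stored in a Series or as a DataFrame column.
ser = pd.Series(ints)
df = pd.DataFrame({"flag": flags})

print(ints.dtype, ser.dtype, df["flag"].dtype)
```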
PyArrow
Warning
This feature is experimental, and the API can change in a future release without warning.
The arrays.ArrowExtensionArray is backed by a pyarrow.ChunkedArray with a pyarrow.DataType instead of a NumPy array and data type. The .dtype of an arrays.ArrowExtensionArray is an ArrowDtype.
PyArrow provides array and data type support similar to NumPy's, including first-class nullability support for all data types, immutability, and more.
The table below shows the equivalent pyarrow-backed (pa), pandas extension, and numpy (np) types that are recognized by pandas. Pyarrow-backed types below need to be passed into ArrowDtype to be recognized by pandas, e.g. pd.ArrowDtype(pa.bool_()).
Note
Pyarrow-backed string support is provided by both pd.StringDtype("pyarrow") and pd.ArrowDtype(pa.string()). pd.StringDtype("pyarrow") is described below in the string section and will be returned if the string alias "string[pyarrow]" is specified. pd.ArrowDtype(pa.string()) generally has better interoperability with ArrowDtype of different types.
While individual values in an arrays.ArrowExtensionArray are stored as PyArrow objects, scalars are returned as Python scalars corresponding to the data type, e.g. a PyArrow int64 will be returned as a Python int, or NA for missing values.
ArrowDtype(pyarrow_dtype): An ExtensionDtype for PyArrow data types.
For more information, please see the PyArrow user guide.
Datetimes
NumPy cannot natively represent timezone-aware datetimes. pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.
Timestamp, a subclass of datetime.datetime, is pandas' scalar type for timezone-naive or timezone-aware datetime data. NaT is the missing value for datetime data.
Timestamp([ts_input, year, month, day, ...]): Pandas replacement for python datetime.datetime object.
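A small sketch of the scalar type described above, showing a timezone-naive and a timezone-aware Timestamp and how a missing entry surfaces as NaT. The dates and variable names are arbitrary choices for illustration.

```python
import datetime as dt
import pandas as pd

naive = pd.Timestamp("2024-03-10 12:00")            # no timezone attached
aware = pd.Timestamp("2024-03-10 12:00", tz="UTC")  # timezone-aware

# Timestamp is a subclass of datetime.datetime.
print(isinstance(naive, dt.datetime))
print(naive.tzinfo, aware.tzinfo)

# A missing entry in datetime data is represented by NaT.
idx = pd.to_datetime(["2024-03-10", None])
print(idx[1] is pd.NaT)
```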
Properties
Methods
Timestamp.as_unit(unit[, round_ok]): Convert the underlying int64 representation to the given unit.
Convert timezone-aware Timestamp to another time zone.
Timestamp.ceil(freq[, ambiguous, nonexistent]): Return a new Timestamp ceiled to this resolution.
Timestamp.combine(date, time): Combine date, time into datetime with same date and time fields.
Return ctime() style string.
Return date object with same year, month and day.
Timestamp.day_name([locale]): Return the day name of the Timestamp with specified locale.
Return the daylight saving time (DST) adjustment.
Timestamp.floor(freq[, ambiguous, nonexistent]): Return a new Timestamp floored to this resolution.
Timestamp.fromordinal(ordinal[, tz]): Construct a timestamp from a proleptic Gregorian ordinal.
Transform timestamp[, tz] to tz's local time from POSIX timestamp.
Return a named tuple containing ISO year, week number, and weekday.
Timestamp.isoformat([sep, timespec]): Return the time formatted according to ISO 8601.
Return the day of the week represented by the date.
Timestamp.month_name([locale]): Return the month name of the Timestamp with specified locale.
Normalize Timestamp to midnight, preserving tz information.
Timestamp.now([tz]): Return new Timestamp object representing current time local to tz.
Timestamp.replace([year, month, day, hour, ...]): Implements datetime.replace, handles nanoseconds.
Timestamp.round(freq[, ambiguous, nonexistent]): Round the Timestamp to the specified resolution.
Timestamp.strftime(format): Return a formatted string of the Timestamp.
Timestamp.strptime(string, format): Function is not implemented.
Return time object with same time but with tzinfo=None.
Return POSIX timestamp as float.
Return time tuple, compatible with time.localtime().
Return time object with same time and tzinfo.
Return a numpy.datetime64 object with same precision.
Timestamp.to_numpy([dtype, copy]): Convert the Timestamp to a NumPy datetime64.
Convert Timestamp to a Julian Date.
Timestamp.to_period([freq]): Return a period of which this timestamp is an observation.
Timestamp.to_pydatetime([warn]): Convert a Timestamp object to a native Python datetime object.
Timestamp.today([tz]): Return the current time in the local timezone.
Return proleptic Gregorian ordinal.
Convert timezone-aware Timestamp to another time zone.
Timestamp.tz_localize(tz[, ambiguous, ...]): Localize the Timestamp to a timezone.
Return time zone name.
Timestamp.utcfromtimestamp(ts): Construct a timezone-aware UTC datetime from a POSIX timestamp.
Return a new Timestamp representing UTC day and time.
Return utc offset.
Return UTC time tuple, compatible with time.localtime().
Return the day of the week represented by the date.
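To make a few of the listed methods concrete, here is a short sketch that exercises floor, ceil, round, day_name, isoformat, tz_localize, and tz_convert on one Timestamp. The date, the "h" and "min" frequency aliases, and the Asia/Tokyo zone are illustrative choices, and an English default locale is assumed for day_name.

```python
import pandas as pd

ts = pd.Timestamp("2024-03-10 14:37:52")

floored = ts.floor("h")    # drop everything below the hour
ceiled = ts.ceil("h")      # move up to the next full hour
rounded = ts.round("min")  # round to the nearest minute

print(floored, ceiled, rounded)
print(ts.day_name(), ts.isoformat())

# Attach a timezone to the naive value, then express it in another zone.
utc = ts.tz_localize("UTC")
tokyo = utc.tz_convert("Asia/Tokyo")
print(utc, tokyo)
```

Here tz_localize attaches a zone without shifting the wall-clock time, while tz_convert keeps the instant fixed and changes the wall-clock representation.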
A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of an arrays.DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.
If the data are timezone-aware, then every value in the array must have the same timezone.
arrays.DatetimeArray(values[, dtype, freq, copy]): Pandas ExtensionArray for tz-naive or tz-aware datetime data.
DatetimeTZDtype([unit, tz]): An ExtensionDtype for timezone-aware datetime data.
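The difference between the two dtypes can be seen in a sketch like the one below, which obtains a DatetimeArray from a DatetimeIndex before and after localizing it. The timestamps and the Europe/Berlin zone are arbitrary example values.

```python
import pandas as pd

naive_idx = pd.to_datetime(["2024-01-01 09:00", "2024-01-02 09:00"])

# Timezone-naive data: the array carries a NumPy datetime64 dtype.
naive = naive_idx.array

# Timezone-aware data: the array carries a DatetimeTZDtype with a single zone.
aware = naive_idx.tz_localize("Europe/Berlin").array

print(type(naive).__name__, naive.dtype)
print(type(aware).__name__, aware.dtype)

# The dtype can also be constructed directly from a unit and a timezone.
print(pd.DatetimeTZDtype(unit="ns", tz="UTC"))
```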
Timedeltas
NumPy can natively represent timedeltas. pandas provides Timedelta for symmetry with Timestamp. NaT is the missing value for timedelta data.
Timedelta([value, unit]): Represents a duration, the difference between two dates or times.
Properties
Methods
A collection of Timedelta objects may be stored in a TimedeltaArray.
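A brief sketch of Timedelta as a scalar and inside a TimedeltaArray, including NaT as the missing value. The durations are made-up examples, and going through a Series to obtain the array is just one convenient route.

```python
import pandas as pd

td = pd.Timedelta(days=1, hours=6)
parsed = pd.Timedelta("1 days 06:00:00")  # the same duration, parsed from a string

# Subtracting two Timestamps yields a Timedelta.
diff = pd.Timestamp("2024-01-03") - pd.Timestamp("2024-01-01")

# A Series of timedelta dtype is backed by a TimedeltaArray; None becomes NaT.
ser = pd.Series([td, None], dtype="timedelta64[ns]")
arr = ser.array

print(td, td.total_seconds(), diff)
print(type(arr).__name__, arr[1])
```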
pandas represents spans of times as Period objects.
Period([value, freq, ordinal, year, month, ...]): Represents a period of time.
Properties
Methods
A collection of Period objects may be stored in an arrays.PeriodArray. Every period in an arrays.PeriodArray must have the same freq.
arrays.PeriodArray(values[, dtype, freq, copy]): Pandas ExtensionArray for storing Period data.
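As an illustration, the sketch below builds a monthly Period, inspects the span it covers, and collects several periods with a shared freq into a PeriodArray. The month values are arbitrary, and period_range is used here only as a convenient way to obtain the array.

```python
import pandas as pd

p = pd.Period("2024-03", freq="M")

# A Period covers a span of time rather than a single instant.
print(p.start_time, p.end_time)

# Arithmetic moves in units of the period's freq.
next_month = p + 1
print(next_month)

# All elements of a PeriodArray share one freq, which is part of the dtype.
arr = pd.period_range("2024-01", periods=3, freq="M").array
print(type(arr).__name__, arr.dtype)
```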
Intervals
Arbitrary intervals can be represented as Interval objects.
Immutable object implementing an Interval, a bounded slice-like interval.
Properties
A collection of intervals may be stored in an arrays.IntervalArray.
arrays.IntervalArray(data[, closed, dtype, ...]): Pandas array for interval data that are closed on the same side.
IntervalDtype([subtype, closed]): An ExtensionDtype for Interval data.
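A minimal sketch of a scalar Interval and an IntervalArray whose elements are all closed on the same side. The bounds are arbitrary, and from_breaks is only one of several constructors that could be used.

```python
import pandas as pd

iv = pd.Interval(0, 5, closed="right")  # the interval (0, 5]

# Membership respects which side is closed.
print(0 in iv, 2 in iv, 5 in iv)
print(iv.length)

# Build (0, 1], (1, 2], (2, 3] from consecutive break points.
arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])
print(arr.dtype, arr.closed)

# Element-wise test of which intervals contain a value.
hits = arr.contains(1.5)
print(hits)
```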
Nullable integer
numpy.ndarray cannot natively represent integer data with missing values. pandas provides this through arrays.IntegerArray.
An ExtensionDtype for int8 integer data.
An ExtensionDtype for int16 integer data.
An ExtensionDtype for int32 integer data.
An ExtensionDtype for int64 integer data.
An ExtensionDtype for uint8 integer data.
An ExtensionDtype for uint16 integer data.
An ExtensionDtype for uint32 integer data.
An ExtensionDtype for uint64 integer data.
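The sketch below contrasts the default NumPy-backed result with a nullable integer dtype. The "Int64" string alias and pd.Int8Dtype() are used as examples of the dtypes listed above; the data is made up.

```python
import numpy as np
import pandas as pd

# Without an extension dtype, a missing value forces a cast to float64.
plain = pd.Series([1, 2, None])

# With a nullable integer dtype, the values stay integers and the gap is NA.
nullable = pd.Series([1, 2, None], dtype="Int64")

# Smaller widths work the same way, here via the dtype class.
small = pd.array([1, None], dtype=pd.Int8Dtype())

print(plain.dtype, nullable.dtype, small.dtype)
print(nullable.sum())  # missing values are skipped by default
```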
Nullable float
Categoricals
pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a CategoricalDtype.
CategoricalDtype([categories, ordered]): Type for categorical data with the categories and orderedness.
Categorical data can be stored in a pandas.Categorical.
Categorical(values[, categories, ordered, ...]): Represent a categorical variable in classic R / S-plus fashion.
The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:
Categorical.from_codes(codes[, categories, ...]): Make a Categorical type from codes and categories or dtype.
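For example, from_codes might be used as sketched below, either with explicit categories or with a CategoricalDtype. The category labels and codes are invented for illustration; a code of -1 marks a missing value.

```python
import pandas as pd

codes = [0, 1, 0, -1]  # -1 denotes a missing value
categories = ["low", "mid", "high"]

cat = pd.Categorical.from_codes(codes, categories=categories, ordered=True)
print(cat)

# The same result, passing the categories and order through a dtype instead.
dtype = pd.CategoricalDtype(categories=categories, ordered=True)
via_dtype = pd.Categorical.from_codes(codes, dtype=dtype)
print(via_dtype.dtype == cat.dtype)
```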
The dtype information is available on the Categorical.
np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so categories and order information are not preserved!
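The loss of category information can be seen in a small sketch like this one, where an unused category and the ordering disappear after a round trip through NumPy. The values are arbitrary.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"], ordered=True)

# Converting to NumPy keeps only the values.
arr = np.asarray(cat)
print(type(arr).__name__, arr.tolist())

# Rebuilding from the plain array re-infers the categories from the data,
# so the unused category "c" and the ordering are gone.
round_trip = pd.Categorical(arr)
print(list(round_trip.categories), round_trip.ordered)
```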
A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype) where dtype is either:
the string 'category', or
an instance of CategoricalDtype.
If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
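Both ways of creating a categorical Series, plus one use of the Series.cat accessor, are sketched below. The data and category names are illustrative, and add_categories is just one of the accessor methods that could be shown.

```python
import pandas as pd

s = pd.Series(["a", "b", "a"])

# Option 1: the string alias infers the categories from the data.
by_alias = s.astype("category")

# Option 2: a CategoricalDtype fixes the categories and their order up front.
dtype = pd.CategoricalDtype(categories=["a", "b", "c"], ordered=True)
by_dtype = pd.Series(["a", "b", "a"], dtype=dtype)

print(list(by_alias.cat.categories), list(by_dtype.cat.categories))
print(by_dtype.cat.codes.tolist())

# The accessor returns a new Series with the categorical data changed.
extended = by_alias.cat.add_categories(["z"])
print(list(extended.cat.categories))
```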
Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as an arrays.SparseArray.
arrays.SparseArray(data[, sparse_index, ...]): An ExtensionArray for storing sparse data.
SparseDtype([dtype, fill_value]): Dtype for data stored in SparseArray.
The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor and the user guide for more.
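A short sketch of a SparseArray with 0 as the repeated fill value, and of a few attributes reachable through the Series.sparse accessor. The data is made up for illustration.

```python
import pandas as pd

dense = [0, 0, 1, 0, 0, 0, 2, 0]

# Only the values that differ from fill_value are stored explicitly.
arr = pd.arrays.SparseArray(dense, fill_value=0)
s = pd.Series(arr)
print(s.dtype)

# Sparse-specific attributes are exposed through the accessor.
print(s.sparse.npoints)     # number of explicitly stored values
print(s.sparse.density)     # fraction of values that are stored
print(s.sparse.fill_value)

# Convert back to an ordinary dense Series when needed.
restored = s.sparse.to_dense()
print(restored.tolist())
```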
When working with text data, where each valid element is a string or missing, we recommend using StringDtype (with the alias "string").
StringDtype([storage, na_value]): Extension dtype for string data.
The Series.str accessor is available for Series backed by an arrays.StringArray. See String handling for more.
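As a sketch of the recommendation above, the example below creates a Series with the "string" alias and applies two Series.str methods. The sample words are arbitrary; the storage backend pandas picks may vary between installations.

```python
import pandas as pd

s = pd.Series(["apple", None, "Cherry"], dtype="string")
print(s.dtype)

# String methods propagate the missing value instead of failing on it.
upper = s.str.upper()
lengths = s.str.len()

print(upper.tolist())
print(lengths.tolist())
```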
The boolean dtype (with the alias "boolean") provides support for storing boolean data (True, False) with missing values, which is not possible with a bool numpy.ndarray.
arrays.BooleanArray(values, mask[, copy]): Array of boolean (True/False) data with missing values.
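The sketch below shows two ways a BooleanArray might be obtained: through the "boolean" alias, and directly from a values array plus a mask in which True marks a missing entry. The plain NumPy array is included for contrast; all data is illustrative.

```python
import numpy as np
import pandas as pd

# Via the string alias: None becomes NA.
arr = pd.array([True, False, None], dtype="boolean")

# Via the constructor: the mask marks which positions are missing.
built = pd.arrays.BooleanArray(
    values=np.array([True, False, False]),
    mask=np.array([False, False, True]),
)

# NumPy falls back to object dtype when a missing value is mixed in.
plain = np.array([True, False, None])

print(arr)
print(built.equals(arr), plain.dtype)
```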