agg(*exprs)
Aggregate on the entire DataFrame without groups (shorthand for df.groupBy().agg()).
alias(alias)
Returns a new DataFrame with an alias set.

approxQuantile(col, probabilities, relativeError)
Calculates the approximate quantiles of numerical columns of a DataFrame.

asTable()
Converts the DataFrame into a table_arg.TableArg object, which can be used as a table argument in a TVF (table-valued function), including a UDTF (user-defined table function).

cache()
Persists the DataFrame with the default storage level (MEMORY_AND_DISK_DESER).

checkpoint([eager])
Returns a checkpointed version of this DataFrame.

coalesce(numPartitions)
Returns a new DataFrame that has exactly numPartitions partitions.

colRegex(colName)
Selects a column based on the column name specified as a regex and returns it as a Column.

collect()
Returns all the records in the DataFrame as a list of Row.

corr(col1, col2[, method])
Calculates the correlation of two columns of a DataFrame as a double value.

count()
Returns the number of rows in this DataFrame.

cov(col1, col2)
Calculates the sample covariance for the given columns, specified by their names, as a double value.

createGlobalTempView(name)
Creates a global temporary view with this DataFrame.

createOrReplaceGlobalTempView(name)
Creates or replaces a global temporary view using the given name.

createOrReplaceTempView(name)
Creates or replaces a local temporary view with this DataFrame.

createTempView(name)
Creates a local temporary view with this DataFrame.

crossJoin(other)
Returns the Cartesian product with another DataFrame.

crosstab(col1, col2)
Computes a pair-wise frequency table of the given columns.

cube(*cols)
Creates a multi-dimensional cube for the current DataFrame using the specified columns, allowing aggregations to be performed on them.

describe(*cols)
Computes basic statistics for numeric and string columns.

distinct()
Returns a new DataFrame containing the distinct rows in this DataFrame.

drop(*cols)
Returns a new DataFrame without the specified columns.

dropDuplicates([subset])
Returns a new DataFrame with duplicate rows removed, optionally only considering certain columns.

dropDuplicatesWithinWatermark([subset])
Returns a new DataFrame with duplicate rows removed within the event-time watermark, optionally only considering certain columns.

drop_duplicates([subset])
drop_duplicates() is an alias for dropDuplicates().

dropna([how, thresh, subset])
Returns a new DataFrame omitting rows with null or NaN values.
exceptAll(other)
Returns a new DataFrame containing rows in this DataFrame but not in another DataFrame, while preserving duplicates.

exists()
Returns a Column object for an EXISTS subquery.

explain([extended, mode])
Prints the (logical and physical) plans to the console for debugging purposes.

fillna(value[, subset])
Returns a new DataFrame in which null values are replaced with the specified value.

filter(condition)
Filters rows using the given condition.

first()
Returns the first row as a Row.

foreach(f)
Applies the f function to each Row of this DataFrame.

foreachPartition(f)
Applies the f function to each partition of this DataFrame.

freqItems(cols[, support])
Finds frequent items for columns, possibly with false positives.

groupBy(*cols)
Groups the DataFrame by the specified columns so that aggregation can be performed on them.

groupby(*cols)
groupby() is an alias for groupBy().
groupingSets(groupingSets, *cols)
Creates a multi-dimensional aggregation for the current DataFrame using the specified grouping sets, so that aggregation can be run on them.

head([n])
Returns the first n rows.

hint(name, *parameters)
Specifies some hint on the current DataFrame.

inputFiles()
Returns a best-effort snapshot of the files that compose this DataFrame.

intersect(other)
Returns a new DataFrame containing rows only in both this DataFrame and another DataFrame.

intersectAll(other)
Returns a new DataFrame containing rows in both this DataFrame and another DataFrame, while preserving duplicates.

isEmpty()
Checks whether the DataFrame is empty and returns a boolean value.

isLocal()
Returns True if the collect() and take() methods can be run locally (without any Spark executors).

join(other[, on, how])
Joins with another DataFrame, using the given join expression.
lateralJoin(other[, on, how])
Lateral joins with another DataFrame, using the given join expression.

limit(num)
Limits the result count to the number specified.

localCheckpoint([eager, storageLevel])
Returns a locally checkpointed version of this DataFrame.

mapInArrow(func, schema[, barrier, profile])
Maps an iterator of batches in the current DataFrame using a Python native function that takes pyarrow.RecordBatch objects as both input and output, and returns the result as a DataFrame.

mapInPandas(func, schema[, barrier, profile])
Maps an iterator of batches in the current DataFrame using a Python native function that takes pandas DataFrames as both input and output, and returns the result as a DataFrame.

melt(ids, values, variableColumnName, ...)
Unpivots a DataFrame from wide format to long format, optionally leaving identifier columns set.

mergeInto(table, condition)
Merges a set of updates, insertions, and deletions based on a source table into a target table.

metadataColumn(colName)
Selects a metadata column based on its logical column name and returns it as a Column.

observe(observation, *exprs)
Defines (named) metrics to observe on the DataFrame.

offset(num)
Returns a new DataFrame by skipping the first num rows.

orderBy(*cols, **kwargs)
Returns a new DataFrame sorted by the specified column(s).

pandas_api([index_col])
Converts the existing DataFrame into a pandas-on-Spark DataFrame.

persist([storageLevel])
Sets the storage level to persist the contents of the DataFrame across operations after the first time it is computed.

printSchema([level])
Prints out the schema in tree format.

randomSplit(weights[, seed])
Randomly splits this DataFrame with the provided weights.

registerTempTable(name)
Registers this DataFrame as a temporary table using the given name.

repartition(numPartitions, *cols)
Returns a new DataFrame partitioned by the given partitioning expressions.

repartitionByRange(numPartitions, *cols)
Returns a new DataFrame range-partitioned by the given partitioning expressions.

replace(to_replace[, value, subset])
Returns a new DataFrame replacing a value with another value.

rollup(*cols)
Creates a multi-dimensional rollup for the current DataFrame using the specified columns, allowing for aggregation on them.

sameSemantics(other)
Returns True when the logical query plans inside both DataFrames are equal and therefore return the same results.

sample([withReplacement, fraction, seed])
Returns a sampled subset of this DataFrame.

sampleBy(col, fractions[, seed])
Returns a stratified sample without replacement, based on the fraction given for each stratum.

scalar()
Returns a Column object for a SCALAR subquery containing exactly one row and one column.

select(*cols)
Projects a set of expressions and returns a new DataFrame.

selectExpr(*expr)
Projects a set of SQL expressions and returns a new DataFrame.
semanticHash()
Returns a hash code of the logical query plan of this DataFrame.

show([n, truncate, vertical])
Prints the first n rows of the DataFrame to the console.

sort(*cols, **kwargs)
Returns a new DataFrame sorted by the specified column(s).

sortWithinPartitions(*cols, **kwargs)
Returns a new DataFrame with each partition sorted by the specified column(s).

subtract(other)
Returns a new DataFrame containing rows in this DataFrame but not in another DataFrame.

summary(*statistics)
Computes the specified statistics for numeric and string columns.

tail(num)
Returns the last num rows as a list of Row.

take(num)
Returns the first num rows as a list of Row.

to(schema)
Returns a new DataFrame where each row is reconciled to match the specified schema.

toArrow()
Returns the contents of this DataFrame as a PyArrow pyarrow.Table.

toDF(*cols)
Returns a new DataFrame with the new specified column names.

toJSON([use_unicode])
Converts a DataFrame into an RDD of strings.

toLocalIterator([prefetchPartitions])
Returns an iterator that contains all of the rows in this DataFrame.

toPandas()
Returns the contents of this DataFrame as a pandas pandas.DataFrame.

transform(func, *args, **kwargs)
Returns a new DataFrame by applying func to this DataFrame; concise syntax for chaining custom transformations.

transpose([indexColumn])
Transposes a DataFrame such that the values in the specified index column become the new columns of the DataFrame.

union(other)
Returns a new DataFrame containing the union of rows in this and another DataFrame.

unionAll(other)
Returns a new DataFrame containing the union of rows in this and another DataFrame.

unionByName(other[, allowMissingColumns])
Returns a new DataFrame containing the union of rows in this and another DataFrame, matching columns by name.
unpersist([blocking])
Marks the DataFrame as non-persistent, and removes all blocks for it from memory and disk.

unpivot(ids, values, variableColumnName, ...)
Unpivots a DataFrame from wide format to long format, optionally leaving identifier columns set.

where(condition)
where() is an alias for filter().

withColumn(colName, col)
Returns a new DataFrame by adding a column or replacing the existing column that has the same name.
withColumnRenamed(existing, new)
Returns a new DataFrame by renaming an existing column.

withColumns(*colsMap)
Returns a new DataFrame by adding multiple columns or replacing the existing columns that have the same names.

withColumnsRenamed(colsMap)
Returns a new DataFrame by renaming multiple columns.

withMetadata(columnName, metadata)
Returns a new DataFrame by updating an existing column with metadata.

withWatermark(eventTime, delayThreshold)
Defines an event time watermark for this DataFrame.

writeTo(table)
Creates a write configuration builder for v2 sources.