agg(*exprs)
Aggregate on the entire DataFrame without groups (shorthand for df.groupBy().agg()).
alias(alias)
Returns a new DataFrame with an alias set.

approxQuantile(col, probabilities, relativeError)
Calculates the approximate quantiles of numerical columns of a DataFrame.

asTable()
Converts the DataFrame into a table_arg.TableArg object, which can be used as a table argument in a TVF (table-valued function), including a UDTF (user-defined table function).

cache()
Persists the DataFrame with the default storage level (MEMORY_AND_DISK_DESER).

checkpoint([eager])
Returns a checkpointed version of this DataFrame.

coalesce(numPartitions)
Returns a new DataFrame that has exactly numPartitions partitions.

colRegex(colName)
Selects a column based on the column name specified as a regex and returns it as a Column.

collect()
Returns all the records in the DataFrame as a list of Row.

corr(col1, col2[, method])
Calculates the correlation of two columns of a DataFrame as a double value.

count()
Returns the number of rows in this DataFrame.

cov(col1, col2)
Calculates the sample covariance for the given columns, specified by their names, as a double value.

createGlobalTempView(name)
Creates a global temporary view with this DataFrame.

createOrReplaceGlobalTempView(name)
Creates or replaces a global temporary view using the given name.

createOrReplaceTempView(name)
Creates or replaces a local temporary view with this DataFrame.

createTempView(name)
Creates a local temporary view with this DataFrame.

crossJoin(other)
Returns the Cartesian product with another DataFrame.

crosstab(col1, col2)
Computes a pair-wise frequency table of the given columns.

cube(*cols)
Creates a multi-dimensional cube for the current DataFrame using the specified columns, allowing aggregations to be performed on them.

describe(*cols)
Computes basic statistics for numeric and string columns.

distinct()
Returns a new DataFrame containing the distinct rows in this DataFrame.

drop(*cols)
Returns a new DataFrame without the specified columns.

dropDuplicates([subset])
Returns a new DataFrame with duplicate rows removed, optionally only considering certain columns.

dropDuplicatesWithinWatermark([subset])
Returns a new DataFrame with duplicate rows removed within the event-time watermark, optionally only considering certain columns.

drop_duplicates([subset])
drop_duplicates() is an alias for dropDuplicates().

dropna([how, thresh, subset])
Returns a new DataFrame omitting rows with null or NaN values.
exceptAll(other)
Returns a new DataFrame containing rows in this DataFrame but not in another DataFrame, while preserving duplicates.

exists()
Returns a Column object for an EXISTS subquery.

explain([extended, mode])
Prints the (logical and physical) plans to the console for debugging purposes.

fillna(value[, subset])
Returns a new DataFrame in which null values are replaced with the specified value.

filter(condition)
Filters rows using the given condition.

first()
Returns the first row as a Row.

foreach(f)
Applies the f function to each Row of this DataFrame.

foreachPartition(f)
Applies the f function to each partition of this DataFrame.

freqItems(cols[, support])
Finds frequent items for columns, possibly with false positives.

groupBy(*cols)
Groups the DataFrame by the specified columns so that aggregation can be performed on them.

groupby(*cols)
groupby() is an alias for groupBy().
groupingSets(groupingSets, *cols)
Creates a multi-dimensional aggregation for the current DataFrame using the specified grouping sets, so that aggregation can be run on them.

head([n])
Returns the first n rows.

hint(name, *parameters)
Specifies some hint on the current DataFrame.

inputFiles()
Returns a best-effort snapshot of the files that compose this DataFrame.

intersect(other)
Returns a new DataFrame containing rows only in both this DataFrame and another DataFrame.

intersectAll(other)
Returns a new DataFrame containing rows in both this DataFrame and another DataFrame, while preserving duplicates.

isEmpty()
Checks whether the DataFrame is empty and returns a boolean value.

isLocal()
Returns True if the collect() and take() methods can be run locally (without any Spark executors).

join(other[, on, how])
Joins with another DataFrame, using the given join expression.
lateralJoin(other[, on, how])
Lateral joins with another DataFrame, using the given join expression.

limit(num)
Limits the result count to the number specified.

localCheckpoint([eager, storageLevel])
Returns a locally checkpointed version of this DataFrame.

mapInArrow(func, schema[, barrier, profile])
Maps an iterator of batches in the current DataFrame using a Python native function that takes pyarrow.RecordBatch objects as both input and output, and returns the result as a DataFrame.

mapInPandas(func, schema[, barrier, profile])
Maps an iterator of batches in the current DataFrame using a Python native function that takes pandas DataFrames as both input and output, and returns the result as a DataFrame.

melt(ids, values, variableColumnName, ...)
Unpivots a DataFrame from wide format to long format, optionally leaving identifier columns set.

mergeInto(table, condition)
Merges a set of updates, insertions, and deletions based on a source table into a target table.

metadataColumn(colName)
Selects a metadata column based on its logical column name and returns it as a Column.

observe(observation, *exprs)
Defines (named) metrics to observe on the DataFrame.

offset(num)
Returns a new DataFrame by skipping the first num rows.

orderBy(*cols, **kwargs)
Returns a new DataFrame sorted by the specified column(s).

pandas_api([index_col])
Converts the existing DataFrame into a pandas-on-Spark DataFrame.

persist([storageLevel])
Sets the storage level to persist the contents of the DataFrame across operations after the first time it is computed.

printSchema([level])
Prints out the schema in tree format.

randomSplit(weights[, seed])
Randomly splits this DataFrame with the provided weights.

registerTempTable(name)
Registers this DataFrame as a temporary table using the given name.

repartition(numPartitions, *cols)
Returns a new DataFrame partitioned by the given partitioning expressions.

repartitionByRange(numPartitions, *cols)
Returns a new DataFrame range-partitioned by the given partitioning expressions.

replace(to_replace[, value, subset])
Returns a new DataFrame replacing a value with another value.

rollup(*cols)
Creates a multi-dimensional rollup for the current DataFrame using the specified columns, allowing for aggregation on them.

sameSemantics(other)
Returns True when the logical query plans inside both DataFrames are equal and therefore return the same results.

sample([withReplacement, fraction, seed])
Returns a sampled subset of this DataFrame.

sampleBy(col, fractions[, seed])
Returns a stratified sample without replacement, based on the fraction given for each stratum.

scalar()
Returns a Column object for a SCALAR subquery containing exactly one row and one column.

select(*cols)
Projects a set of expressions and returns a new DataFrame.

selectExpr(*expr)
Projects a set of SQL expressions and returns a new DataFrame.
semanticHash()
Returns a hash code of the logical query plan of this DataFrame.

show([n, truncate, vertical])
Prints the first n rows of the DataFrame to the console.

sort(*cols, **kwargs)
Returns a new DataFrame sorted by the specified column(s).

sortWithinPartitions(*cols, **kwargs)
Returns a new DataFrame with each partition sorted by the specified column(s).

subtract(other)
Returns a new DataFrame containing rows in this DataFrame but not in another DataFrame.

summary(*statistics)
Computes the specified statistics for numeric and string columns.

tail(num)
Returns the last num rows as a list of Row.

take(num)
Returns the first num rows as a list of Row.

to(schema)
Returns a new DataFrame where each row is reconciled to match the specified schema.

toArrow()
Returns the contents of this DataFrame as a PyArrow pyarrow.Table.

toDF(*cols)
Returns a new DataFrame with the new specified column names.

toJSON([use_unicode])
Converts a DataFrame into an RDD of strings.

toLocalIterator([prefetchPartitions])
Returns an iterator that contains all of the rows in this DataFrame.

toPandas()
Returns the contents of this DataFrame as a pandas pandas.DataFrame.

transform(func, *args, **kwargs)
Returns a new DataFrame by applying func to this DataFrame; concise syntax for chaining custom transformations.

transpose([indexColumn])
Transposes a DataFrame such that the values in the specified index column become the new columns of the DataFrame.

union(other)
Returns a new DataFrame containing the union of rows in this and another DataFrame.

unionAll(other)
Returns a new DataFrame containing the union of rows in this and another DataFrame.

unionByName(other[, allowMissingColumns])
Returns a new DataFrame containing the union of rows in this and another DataFrame, matching columns by name.
unpersist([blocking])
Marks the DataFrame as non-persistent, and removes all blocks for it from memory and disk.

unpivot(ids, values, variableColumnName, ...)
Unpivots a DataFrame from wide format to long format, optionally leaving identifier columns set.

where(condition)
where() is an alias for filter().

withColumn(colName, col)
Returns a new DataFrame by adding a column or replacing the existing column that has the same name.
withColumnRenamed(existing, new)
Returns a new DataFrame by renaming an existing column.

withColumns(*colsMap)
Returns a new DataFrame by adding multiple columns or replacing the existing columns that have the same names.

withColumnsRenamed(colsMap)
Returns a new DataFrame by renaming multiple columns.

withMetadata(columnName, metadata)
Returns a new DataFrame by updating an existing column with metadata.

withWatermark(eventTime, delayThreshold)
Defines an event time watermark for this DataFrame.

writeTo(table)
Creates a write configuration builder for v2 sources.