Collection of equal-length arrays matching a particular Schema.
A record batch is a table-like data structure that is semantically a sequence of fields, each a contiguous Arrow array.
Public Functions
Convert record batch to struct array.
Create a struct array whose child arrays are the record batch's columns. Note that the record batch's top-level field metadata cannot be reflected in the resulting struct array.
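A minimal sketch of typical usage, assuming `batch` is an existing std::shared_ptr<arrow::RecordBatch> and the enclosing function returns arrow::Status:

  // Sketch: each child of `as_struct` corresponds to one column of `batch`.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::StructArray> as_struct,
                        batch->ToStructArray());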
Convert record batch with one data type to Tensor.
Create a Tensor object with shape (number of rows, number of columns) and strides (type size in bytes, type size in bytes * number of rows). The generated Tensor will have a column-major layout.
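A hedged sketch, assuming an Arrow version that provides RecordBatch::ToTensor and a `batch` whose columns all share one fixed-width numeric type:

  // Sketch: convert a homogeneous numeric batch into a Tensor.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Tensor> tensor, batch->ToTensor());
  // tensor->shape() is {batch->num_rows(), batch->num_columns()}.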
Determine if two record batches are exactly equal.
other – [in] the RecordBatch to compare with
check_metadata – [in] if true, check that Schema metadata is the same
opts – [in] the options for equality comparisons
true if batches are equal
Determine if two record batches are approximately equal.
other – [in] the RecordBatch to compare with
opts – [in] the options for equality comparisons
true if batches are approximately equal
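For illustration, a sketch in which `a` and `b` are assumed to be existing std::shared_ptr<arrow::RecordBatch> values:

  // Sketch: exact comparison including schema metadata, then approximate.
  bool exact = a->Equals(*b, /*check_metadata=*/true);
  bool approx = a->ApproxEquals(*b);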
the record batch's schema
Replace the schema with another schema with the same types, but potentially different field names and/or metadata.
Retrieve all columns at once.
Retrieve an array from the record batch.
i – [in] field index, does not boundscheck
an Array object
Retrieve an array from the record batch.
name – [in] field name
an Array or null if no field was found
Retrieve an array's internal data from the record batch.
i – [in] field index, does not boundscheck
an internal ArrayData object
Retrieve all arrays' internal data from the record batch.
Add column to the record batch, producing a new RecordBatch.
i – [in] field index, which will be boundschecked
field – [in] field to be added
column – [in] column to be added
Add new nullable column to the record batch, producing a new RecordBatch.
For non-nullable columns, use the Field-based version of this method.
i – [in] field index, which will be boundschecked
field_name – [in] name of field to be added
column – [in] column to be added
Replace a column in the record batch, producing a new RecordBatch.
i – [in] field index, does boundscheck
field – [in] the replacement field
column – [in] the replacement column
Remove column from the record batch, producing a new RecordBatch.
i – [in] field index, does boundscheck
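These methods return a new RecordBatch rather than mutating in place. A sketch, assuming an existing `batch` and an int64 `array` whose length matches batch->num_rows() (the field names here are illustrative):

  // Sketch: each call produces a new batch; `batch` itself is unchanged.
  ARROW_ASSIGN_OR_RAISE(
      auto added, batch->AddColumn(0, arrow::field("extra", arrow::int64()), array));
  ARROW_ASSIGN_OR_RAISE(
      auto replaced, added->SetColumn(0, arrow::field("extra2", arrow::int64()), array));
  ARROW_ASSIGN_OR_RAISE(auto removed, replaced->RemoveColumn(0));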
Return the name of the i-th column.
the number of columns in the record batch
the number of rows (the corresponding length of each column)
Copy the entire RecordBatch to destination MemoryManager.
This uses Array::CopyTo on each column of the record batch to create a new record batch where all underlying buffers for the columns have been copied to the destination MemoryManager. This uses MemoryManager::CopyBuffer under the hood.
View or Copy the entire RecordBatch to destination MemoryManager.
This uses Array::ViewOrCopyTo on each column of the record batch to create a new record batch where all underlying buffers for the columns have been zero-copy viewed on the destination MemoryManager, falling back to performing a copy if it can't be viewed as a zero-copy buffer. This uses Buffer::ViewOrCopy under the hood.
Slice each of the arrays in the record batch.
offset – [in] the starting offset to slice, through end of batch
new record batch
Slice each of the arrays in the record batch.
offset – [in] the starting offset to slice
length – [in] the number of elements to slice from offset
new record batch
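Slicing is zero-copy. A sketch, assuming `batch` has at least 10 rows:

  // Sketch: both slices share buffers with the original batch.
  std::shared_ptr<arrow::RecordBatch> tail = batch->Slice(5);      // rows 5..end
  std::shared_ptr<arrow::RecordBatch> middle = batch->Slice(2, 3); // rows 2, 3, 4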
PrettyPrint representation suitable for debugging
Return names of all columns.
Rename columns with provided names.
Return new record batch with specified columns.
Perform cheap validation checks to determine obvious inconsistencies within the record batch's schema and internal data.
This is O(k) where k is the total number of fields and array descendants.
Perform extensive validation checks to determine inconsistencies within the record batch's schema and internal data.
This is potentially O(k*n) where n is the number of rows.
EXPERIMENTAL: Return a top-level sync event object for this record batch.
If all of the data for this record batch is in CPU memory, then this will return null. If the data for this batch is on a device, then if synchronization is needed before accessing the data the returned sync event will allow for it.
null or a Device::SyncEvent
Create a statistics array of this record batch.
The created array follows the C data interface statistics specification. See https://arrow.apache.org/docs/format/StatisticsSchema.html for details.
pool – [in] the memory pool to allocate memory from
the statistics array of this record batch
Public Static Functions
schema – [in] The record batch schema
num_rows – [in] length of fields in the record batch. Each array should have the same length as num_rows
columns – [in] the record batch fields as vector of arrays
sync_event – [in] optional synchronization event for non-CPU device memory used by buffers
Construct record batch from vector of internal data structures.
This class is intended for internal use, or advanced users.
Since: 0.5.0
schema – the record batch schema
num_rows – the number of semantic rows in the record batch. This should be equal to the length of each field
columns – the data for the batch's columns
device_type – the type of the device that the Arrow columns are allocated on
sync_event – optional synchronization event for non-CPU device memory used by buffers
Create an empty RecordBatch of a given schema.
The output RecordBatch will be created with DataTypes from the given schema.
schema – [in] the schema of the empty RecordBatch
pool – [in] the memory pool to allocate memory from
the resulting RecordBatch
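A sketch, assuming `schema` was built with arrow::schema(...):

  // Sketch: zero rows, but columns fully typed per the schema.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::RecordBatch> empty,
                        arrow::RecordBatch::MakeEmpty(schema));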
Construct record batch from struct array.
This constructs a record batch using the child arrays of the given array, which must be a struct array.
This operation will usually be zero-copy. However, if the struct array has an offset or a validity bitmap then these will need to be pushed into the child arrays. Pushing the offset is zero-copy but pushing the validity bitmap is not.
array – [in] the source array, must be a StructArray
pool – [in] the memory pool to allocate new validity bitmaps
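A sketch, assuming `struct_array` is a std::shared_ptr<arrow::Array> whose type is a struct:

  // Sketch: the struct's child arrays become the batch's columns.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::RecordBatch> batch,
                        arrow::RecordBatch::FromStructArray(struct_array));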
Abstract interface for reading a stream of record batches.
Subclassed by arrow::TableBatchReader, arrow::csv::StreamingReader, arrow::flight::sql::example::SqliteStatementBatchReader, arrow::flight::sql::example::SqliteTablesWithSchemaBatchReader, arrow::ipc::RecordBatchStreamReader, arrow::json::StreamingReader
Public Functions
the shared schema of the record batches in the stream
Read the next record batch in the stream.
Return null for batch when reaching end of stream
Example:
while (true) {
  std::shared_ptr<RecordBatch> batch;
  ARROW_RETURN_NOT_OK(reader->ReadNext(&batch));
  if (!batch) {
    break;
  }
  // handle the `batch`; note that `batch->num_rows()`
  // might be 0.
}
batch – [out] the next loaded batch, null at end of stream. Returning an empty batch doesn't signal end of stream, since an empty batch is valid data.
Iterator interface.
Finalize reader.
EXPERIMENTAL: Get the device type for record batches this reader produces.
The default implementation returns DeviceAllocationType::kCPU.
Return an iterator to the first record batch in the stream.
Return an iterator to the end of the stream.
Consume entire stream as a vector of record batches.
Read all batches and concatenate as arrow::Table.
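A sketch of draining a stream either way; `reader` and `other_reader` are assumed to be existing RecordBatchReaders (each call consumes its stream, hence the two readers):

  // Sketch: consume the stream as a vector of batches...
  ARROW_ASSIGN_OR_RAISE(auto batches, reader->ToRecordBatches());
  // ...or concatenate everything into a single Table.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Table> table, other_reader->ToTable());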
Public Static Functions
Create a RecordBatchReader from a vector of RecordBatch.
batches – [in] the vector of RecordBatch to read from
schema – [in] schema to conform to. Will be inferred from the first element if not provided.
device_type – [in] the type of device that the batches are allocated on
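A sketch, assuming `batches` is a non-empty std::vector<std::shared_ptr<arrow::RecordBatch>>:

  // Sketch: expose an in-memory vector of batches as a stream;
  // the schema is inferred from the first batch.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::RecordBatchReader> reader,
                        arrow::RecordBatchReader::Make(batches));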
Create a RecordBatchReader from an Iterator of RecordBatch.
batches – [in] an iterator of RecordBatch to read from.
schema – [in] schema that each record batch in iterator will conform to.
device_type – [in] the type of device that the batches are allocated on
Compute a stream of record batches from a (possibly chunked) Table.
The conversion is zero-copy: each record batch is a view over a slice of the table's columns.
The table is expected to be valid prior to using it with the batch reader.
Public Functions
Construct a TableBatchReader for the given table.
the shared schema of the record batches in the stream
Read the next record batch in the stream.
Return null for batch when reaching end of stream
Example:
while (true) {
  std::shared_ptr<RecordBatch> batch;
  ARROW_RETURN_NOT_OK(reader->ReadNext(&batch));
  if (!batch) {
    break;
  }
  // handle the `batch`; note that `batch->num_rows()`
  // might be 0.
}
batch – [out] the next loaded batch, null at end of stream. Returning an empty batch doesn't signal end of stream, since an empty batch is valid data.
Set the desired maximum number of rows for record batches.
The actual number of rows in each record batch may be smaller, depending on actual chunking characteristics of each table column.
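Putting the pieces together, a sketch that streams an existing `table` (a std::shared_ptr<arrow::Table>) in batches of at most 64 * 1024 rows; the chunk size here is illustrative:

  arrow::TableBatchReader table_reader(*table);
  table_reader.set_chunksize(64 * 1024);
  std::shared_ptr<arrow::RecordBatch> batch;
  while (true) {
    ARROW_RETURN_NOT_OK(table_reader.ReadNext(&batch));
    if (!batch) break;  // end of stream
    // process `batch` (it may hold fewer rows than the chunk size)
  }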
Logical table as sequence of chunked arrays.
Public Functions
Return the table schema.
Return a column by index.
Return vector of all columns for table.
Return a column's field by index.
Return vector of all fields for table.
Construct a zero-copy slice of the table with the indicated offset and length.
offset – [in] the index of the first row in the constructed slice
length – [in] the number of rows of the slice. If there are not enough rows in the table, the length will be adjusted accordingly
a new object wrapped in std::shared_ptr<Table>
Slice from first row at offset until end of the table.
Return a column by name.
name – [in] field name
a ChunkedArray or null if no field was found
Remove column from the table, producing a new Table.
Add column to the table, producing a new Table.
Replace a column in the table, producing a new Table.
Return names of all columns.
Rename columns with provided names.
Return new table with specified columns.
Replace schema key-value metadata with new metadata.
Since: 0.5.0
metadata – [in] new KeyValueMetadata
new Table
Flatten the table, producing a new Table.
Any column with a struct type will be flattened into multiple columns.
pool – [in] The pool for buffer allocations, if any
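A sketch, assuming `table` has a struct-typed column `s` with child fields `a` and `b`:

  // Sketch: `s` is replaced by top-level columns derived from its
  // children (named along the lines of `s.a` and `s.b`).
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Table> flat, table->Flatten());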
PrettyPrint representation suitable for debugging
Perform cheap validation checks to determine obvious inconsistencies within the table's schema and internal data.
This is O(k*m) where k is the total number of field descendants, and m is the number of chunks.
Perform extensive validation checks to determine inconsistencies within the table's schema and internal data.
This is O(k*n) where k is the total number of field descendants, and n is the number of rows.
Return the number of columns in the table.
Return the number of rows (equal to each column's logical length).
Determine if tables are equal.
Two tables can be equal only if they have equal schemas. However, they may be equal even if they have different chunkings.
Make a new table by combining the chunks this table has.
All the underlying chunks in the ChunkedArray of each column are concatenated into zero or one chunk.
pool – [in] The pool for buffer allocations
Make a new record batch by combining the chunks this table has.
All the underlying chunks in the ChunkedArray of each column are concatenated into a single chunk.
pool – [in] The pool for buffer allocations
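A sketch, assuming an existing, possibly fragmented `table`:

  // Sketch: defragment each column into at most one chunk.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Table> combined, table->CombineChunks());
  // Or materialize the whole table as a single RecordBatch.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::RecordBatch> one_batch,
                        table->CombineChunksToBatch());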
Public Static Functions
Construct a Table from schema and columns.
If columns is zero-length, the table's number of rows is zero.
schema – [in] The table schema (column types)
columns – [in] The table's columns as chunked arrays
num_rows – [in] number of rows in table, -1 (default) to infer from columns
Construct a Table from schema and arrays.
schema – [in] The table schema (column types)
arrays – [in] The table's columns as arrays
num_rows – [in] number of rows in table, -1 (default) to infer from columns
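A sketch building a one-column table from scratch (the field name "x" is illustrative):

  arrow::Int64Builder builder;
  ARROW_RETURN_NOT_OK(builder.AppendValues({1, 2, 3}));
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Array> col, builder.Finish());
  auto schema = arrow::schema({arrow::field("x", arrow::int64())});
  // num_rows (3) is inferred from the arrays when the default -1 is used.
  std::shared_ptr<arrow::Table> table = arrow::Table::Make(schema, {col});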
Create an empty Table of a given schema.
The output Table will be created with a single empty chunk per column.
Construct a Table from a RecordBatchReader.
reader – [in] the arrow::RecordBatchReader that produces batches
Construct a Table from RecordBatches, using schema supplied by the first RecordBatch.
batches – [in] a std::vector of record batches
Construct a Table from RecordBatches, using supplied schema.
There may be zero record batches.
schema – [in] the arrow::Schema for each batch
batches – [in] a std::vector of record batches
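A sketch, assuming `schema` and a (possibly empty) vector `batches` that all conform to it:

  // Sketch: the batches become chunks of the resulting table's columns.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Table> table,
                        arrow::Table::FromRecordBatches(schema, batches));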
Construct a Table from a chunked StructArray.
One column will be produced for each field of the StructArray.
array – [in] a chunked StructArray
Construct a new table from multiple input tables.
The new table is assembled from existing column chunks without copying, if schemas are identical. If schemas do not match exactly and unify_schemas is enabled in options (off by default), an attempt is made to unify them, and then column chunks are converted to their respective unified datatype, which will probably incur a copy. arrow::PromoteTableToSchema is used to unify schemas.
Tables are concatenated in order they are provided in and the order of rows within tables will be preserved.
tables – [in] a std::vector of Tables to be concatenated
options – [in] specify how to unify schema of input tables
memory_pool – [in] MemoryPool to be used if null-filled arrays need to be created or if existing column chunks need to undergo type conversion
new Table
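A sketch, assuming `t1` and `t2` are existing std::shared_ptr<arrow::Table> values whose schemas may differ slightly:

  // Sketch: opt into schema unification, which is off by default.
  arrow::ConcatenateTablesOptions options;
  options.unify_schemas = true;
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Table> all,
                        arrow::ConcatenateTables({t1, t2}, options));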