Take() currently concatenates the chunks of a ChunkedArray before taking. This breaks down when calling Take() on a ChunkedArray or Table where concatenating the chunks would produce an array that is too large (i.e. whose offsets would overflow). While inconvenient to implement, it would be useful if this case were handled.
This could perhaps be done as a higher-level wrapper around Take().
Example in Python:
>>> import pyarrow as pa
>>> pa.__version__
'1.0.0'
>>> rb1 = pa.RecordBatch.from_arrays([["a" * 2**30]], names=["a"])
>>> rb2 = pa.RecordBatch.from_arrays([["b" * 2**30]], names=["a"])
>>> table = pa.Table.from_batches([rb1, rb2], schema=rb1.schema)
>>> table.take([1, 0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 1145, in pyarrow.lib.Table.take
  File "/home/lidavidm/Code/twosigma/arrow/venv/lib/python3.8/site-packages/pyarrow/compute.py", line 268, in take
    return call_function('take', [data, indices], options)
  File "pyarrow/_compute.pyx", line 298, in pyarrow._compute.call_function
  File "pyarrow/_compute.pyx", line 192, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays
In this example, it would be useful if Take() or a higher-level wrapper could generate multiple record batches as output.
Reporter: Will Jones / @wjones127
Assignee: Will Jones / @wjones127
Related: `take` on a memory-mapped table #37766 (relates to)
Note: This issue was originally created as ARROW-9773. Please see the migration documentation for further details.