Take() currently concatenates the chunks of a ChunkedArray before taking. This breaks down when calling Take() on a ChunkedArray or Table where concatenating the chunks would produce an array that is too large (i.e. whose offsets would overflow). While inconvenient to implement, it would be useful if this case were handled.
This could perhaps be done as a higher-level wrapper around Take().
Example in Python:
>>> import pyarrow as pa
>>> pa.__version__
'1.0.0'
>>> rb1 = pa.RecordBatch.from_arrays([["a" * 2**30]], names=["a"])
>>> rb2 = pa.RecordBatch.from_arrays([["b" * 2**30]], names=["a"])
>>> table = pa.Table.from_batches([rb1, rb2], schema=rb1.schema)
>>> table.take([1, 0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 1145, in pyarrow.lib.Table.take
  File "/home/lidavidm/Code/twosigma/arrow/venv/lib/python3.8/site-packages/pyarrow/compute.py", line 268, in take
    return call_function('take', [data, indices], options)
  File "pyarrow/_compute.pyx", line 298, in pyarrow._compute.call_function
  File "pyarrow/_compute.pyx", line 192, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays
In this example, it would be useful if Take() or a higher-level wrapper could generate multiple record batches as output.
Reporter: Will Jones / @wjones127
Assignee: Will Jones / @wjones127
Related: `take` on a memory-mapped table #37766 (relates to)
Note: This issue was originally created as ARROW-9773. Please see the migration documentation for further details.