Showing content from https://github.com/apache/arrow/issues/37766 below:

[Python] Too much RAM consumption when using `take` on a memory-mapped table · Issue #37766 · apache/arrow · GitHub

Describe the bug, including details regarding any error messages, version, and platform.

I created a random array and wrote it repeatedly to an Arrow IPC file so that the resulting file was too large to fit in RAM. Then I read it back with memory mapping. Slicing the table worked without any problem, but when I tried to access rows at an arbitrary list of indices using take, RAM usage grew until the computer hung. The reproduction code is below (the array length and the number of writes may need adjusting for your disk space and RAM size):

import numpy as np
import pyarrow as pa
from pyarrow import feather

# Build a 1M-row table of random float64 values (~8 MB)
rng = np.random.default_rng(1337)
data = rng.normal(size=(1000000,))
table = pa.table({'data': data})

# Write the batch 1000 times (~8 GB total) to an Arrow IPC file
sink = pa.output_stream('data.feather')
schema = pa.schema([('data', pa.float64())])
with pa.ipc.new_file(sink, schema) as writer:
    for i in range(1000):
        writer.write_table(table)

# Memory-map the file and take a single row: RAM usage climbs until the machine hangs
table = feather.read_table('data.feather', memory_map=True)
print(table.take([0]))
Component(s)

Python
