The basic approach for the GPU Data Frame (GDF) is pretty simple: if applications and libraries agree on an in-memory data format for tabular data and associate metadata, then just a device pointer to the data structure need be exchanged. Additionally, the IPC mechanism built into the CUDA driver allows device pointers to be moved between processes.
Currently, the GDF format is a subset of the Apache Arrow specification. The precise subset has not been fully defined yet, but currently includes numerical columns, and will soon include dictionary-encoded columns (sometimes called "categorical" columns in other data frame systems).
Fundamentally, one can implement GDF by following the Arrow specification directly. In some cases, that is the easiest approach. However, there are some common operations that we expect many GDF-supporting applications will need. To help jumpstart other GDF users, we are working to develop several layers of GDF functionality that can be reused in other projects:
[Much of this functionality is still in progress...]
libgdf: A C library of helper functions, including:
pygdf: A Python library for manipulating GDFs
dask_gdf: Extension for Dask to work with distributed GDFs.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4