With the upcoming DatasetV2, many of the APIs are being simplified. That also opens up possibilities beyond just passing the dataset to tf.keras.
One area of interest is that we already support many columnar datasets, e.g., Arrow, Avro, Parquet, JSON, HDF5, etc. Those datasets could potentially be standardized behind the same API so that we can treat them homogeneously. For example, ArrowDataset already exposes a columns() property method. We could apply the same to Avro, Parquet, JSON, HDF5, etc. Thoughts?
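To make the idea concrete, here is a rough sketch of what a shared interface might look like. The names `ColumnarDataset`, `DictBackedDataset`, and `describe` are hypothetical and only for illustration; they are not existing tensorflow-io classes, and the toy dict-backed class just stands in for a real format-specific reader.

```python
from abc import ABC, abstractmethod


class ColumnarDataset(ABC):
    """Hypothetical common interface; NOT an existing tensorflow-io class."""

    @property
    @abstractmethod
    def columns(self):
        """Return the column names in order."""


class DictBackedDataset(ColumnarDataset):
    """Toy stand-in for a format-specific reader (Arrow, Avro, Parquet, ...)."""

    def __init__(self, data):
        # Copy the mapping of column name -> values.
        self._data = dict(data)

    @property
    def columns(self):
        # Dicts preserve insertion order, so column order is stable.
        return list(self._data)


def describe(dataset):
    # Works for any implementation, regardless of the underlying file format.
    return ", ".join(dataset.columns)


toy = DictBackedDataset({"x": [1, 2], "y": [3, 4]})
summary = describe(toy)
```

With something like this, code that only needs column names would not have to care which file format is behind the dataset.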
Since those columnar datasets hold largely numeric values, I think we could also have a common base class for them and support additional operations. For example, dataset_1 + dataset_2 => dataset_3 (add), where dataset_3 could be passed to tf.keras. The implementation could start with zip + map in Python (C++ would not even be needed). Maybe this is a use case that would help users?
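As a minimal sketch of the zip + map approach, assuming both datasets yield numeric elements of matching shape and dtype, something like the following could work with plain tf.data. The helper name `add_datasets` is hypothetical, and in-memory tensors stand in for real columnar files.

```python
import tensorflow as tf


def add_datasets(ds_a, ds_b):
    """Element-wise sum of two datasets (hypothetical helper, for illustration)."""
    # Pair up corresponding elements, then add each pair.
    zipped = tf.data.Dataset.zip((ds_a, ds_b))
    return zipped.map(lambda a, b: a + b)


# In-memory stand-ins for two columnar datasets.
dataset_1 = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0])
dataset_2 = tf.data.Dataset.from_tensor_slices([10.0, 20.0, 30.0])

dataset_3 = add_datasets(dataset_1, dataset_2)
result = [float(x) for x in dataset_3.as_numpy_iterator()]
```

A common base class could then define `__add__` in terms of a helper like this, so that `dataset_1 + dataset_2` works directly. Note that zip stops at the shorter of the two datasets, so mismatched lengths would be silently truncated.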
/cc @terrytangyuan @BryanCutler