This guide shows examples of how to use PyMongoArrow schemas in common situations.
When performing aggregate or find operations, you can provide a schema for nested data by using the struct
object. A sub-document may reuse a field name from its parent document; in the following example, the name start appears at both levels:
>>> from pymongo import MongoClient
>>> from pymongoarrow.api import Schema, find_arrow_all
>>> from pyarrow import struct, field, int32
>>> coll = MongoClient().db.coll
>>> coll.insert_many(
...     [
...         {"start": "string", "prop": {"name": "foo", "start": 0}},
...         {"start": "string", "prop": {"name": "bar", "start": 10}},
...     ]
... )
>>> arrow_table = find_arrow_all(
...     coll, {}, schema=Schema({"start": str, "prop": struct([field("start", int32())])})
... )
>>> print(arrow_table)
pyarrow.Table
start: string
prop: struct<start: int32>
  child 0, start: int32
----
start: [["string","string"]]
prop: [
  -- is_valid: all not null
  -- child 0 type: int32
[0,10]]
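To make the effect of the nested schema concrete, the following pure-Python sketch mimics which fields survive in the table above: only the names listed in the schema are kept at each nesting level, so prop.name is dropped while both start fields coexist. The select_fields helper and the dict-based wanted layout are hypothetical, introduced only for illustration; they are not part of PyMongoArrow, and the sketch ignores type conversion and null handling.

```python
def select_fields(doc, wanted):
    """Keep only the keys named in `wanted`, recursing into sub-documents.

    `wanted` maps a field name to None (keep the value as is) or to a
    nested dict naming the sub-fields to keep. Hypothetical helper.
    """
    result = {}
    for key, sub in wanted.items():
        if key not in doc:
            continue  # simplification: absent fields are skipped in this sketch
        value = doc[key]
        if isinstance(sub, dict) and isinstance(value, dict):
            result[key] = select_fields(value, sub)
        else:
            result[key] = value
    return result


docs = [
    {"start": "string", "prop": {"name": "foo", "start": 0}},
    {"start": "string", "prop": {"name": "bar", "start": 10}},
]
# Mirrors Schema({"start": str, "prop": struct([field("start", int32())])})
wanted = {"start": None, "prop": {"start": None}}
selected = [select_fields(d, wanted) for d in docs]
print(selected)
```

The parent start and the nested prop.start stay separate because each lives under its own key path.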
You can do the same thing when using Pandas and NumPy:
>>> from pymongoarrow.api import find_pandas_all
>>> df = find_pandas_all(
...     coll, {}, schema=Schema({"start": str, "prop": struct([field("start", int32())])})
... )
>>> print(df)
    start           prop
0  string   {'start': 0}
1  string  {'start': 10}
You can also use projections to flatten the data before passing it to PyMongoArrow. The following example illustrates how to do this by using a very simple nested document structure:
>>> from pymongoarrow.types import ObjectIdType
>>> df = find_pandas_all(
...     coll,
...     {
...         "prop.start": {
...             "$gte": 0,
...             "$lte": 10,
...         }
...     },
...     projection={"propName": "$prop.name", "propStart": "$prop.start"},
...     schema=Schema({"_id": ObjectIdType(), "propStart": int, "propName": str}),
... )
>>> print(df)
                                 _id  propStart propName
0  b'c\xec2\x98R(\xc9\x1e@#\xcc\xbb'          0      foo
1  b'c\xec2\x98R(\xc9\x1e@#\xcc\xbc'         10      bar
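The projection above maps each new top-level name to a "$"-prefixed dotted path. As a rough illustration of what that mapping does to a single document, here is a small pure-Python sketch. The resolve_path and flatten helpers are hypothetical and only approximate server behavior: for example, this sketch returns None for a missing path, whereas MongoDB would omit the field.

```python
def resolve_path(doc, path):
    """Return the value at a "$a.b.c" style field path, or None if missing."""
    value = doc
    for part in path.lstrip("$").split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def flatten(doc, projection):
    """Build a flat document from {new_name: "$dotted.path"} pairs."""
    flat = {name: resolve_path(doc, path) for name, path in projection.items()}
    if "_id" in doc:
        flat["_id"] = doc["_id"]  # a projection keeps _id unless it is excluded
    return flat


doc = {"_id": 1, "start": "string", "prop": {"name": "foo", "start": 0}}
flat = flatten(doc, {"propName": "$prop.name", "propStart": "$prop.start"})
print(flat)
```

Because the flattened document has only top-level scalar fields, the schema passed to PyMongoArrow needs no struct type.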
When performing an aggregate operation, you can flatten the fields by using the $project
stage, as shown in the following example:
>>> from pymongoarrow.api import aggregate_pandas_all
>>> df = aggregate_pandas_all(
...     coll,
...     pipeline=[
...         {"$match": {"prop.start": {"$gte": 0, "$lte": 10}}},
...         {
...             "$project": {
...                 "propStart": "$prop.start",
...                 "propName": "$prop.name",
...             }
...         },
...     ],
... )
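When several nested fields need flattening, writing the $project stage by hand gets repetitive. The sketch below builds the same pipeline from a mapping of output names to dotted paths. The build_flatten_pipeline helper is hypothetical and not part of PyMongoArrow or PyMongo; it only assembles plain Python dictionaries and needs no database connection.

```python
def build_flatten_pipeline(field_map, match=None):
    """Return an aggregation pipeline that exposes nested fields as flat ones.

    `field_map` maps output names to dotted paths, for example
    {"propStart": "prop.start"}. Hypothetical helper for illustration.
    """
    pipeline = []
    if match:
        pipeline.append({"$match": match})  # filter first, then reshape
    project = {name: f"${path}" for name, path in field_map.items()}
    pipeline.append({"$project": project})
    return pipeline


pipeline = build_flatten_pipeline(
    {"propStart": "prop.start", "propName": "prop.name"},
    match={"prop.start": {"$gte": 0, "$lte": 10}},
)
print(pipeline)
```

The resulting list has the same two stages as the example above and could be passed as the pipeline argument of aggregate_pandas_all.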