Showing content from https://www.mongodb.com/docs/languages/python/pymongo-arrow-driver/current/comparison/ below:

Comparing to PyMongo - PyMongoArrow v1.8

In this guide, you can learn about the differences between PyMongoArrow and the PyMongo driver. This guide assumes familiarity with basic PyMongo and MongoDB concepts.

The most basic way to read data using PyMongo is:

coll = db.benchmark
f = list(coll.find({}, projection={"_id": 0}))
table = pyarrow.Table.from_pylist(f)

This works, but you have to exclude the _id field; otherwise, you get the following error:

pyarrow.lib.ArrowInvalid: Could not convert ObjectId('642f2f4720d92a85355671b3') with type ObjectId: did not recognize Python value type when inferring an Arrow data type

The following code example shows a workaround for the preceding error when using PyMongo:

>>> f = list(coll.find({}))
>>> for doc in f:
...     doc["_id"] = str(doc["_id"])
...
>>> table = pyarrow.Table.from_pylist(f)
>>> print(table)
pyarrow.Table
_id: string
x: int64
y: double

Even though this avoids the error, a drawback is that Arrow can't identify that _id is an ObjectId, as noted by the schema showing _id as a string.

PyMongoArrow supports BSON types through Arrow or Pandas Extension Types. This allows you to avoid the preceding workaround.

>>> from pymongoarrow.types import ObjectIdType
>>> schema = Schema({"_id": ObjectIdType(), "x": pyarrow.int64(), "y": pyarrow.float64()})
>>> table = find_arrow_all(coll, {}, schema=schema)
>>> print(table)
pyarrow.Table
_id: extension<arrow.py_extension_type<ObjectIdType>>
x: int64
y: double

With this method, Arrow correctly identifies the type. Non-numeric extension types have limited use on their own, but they avoid unnecessary casting for certain operations, such as sorting datetimes:

f = list(coll.find({}, projection={"_id": 0, "x": 0}))
naive_table = pyarrow.Table.from_pylist(f)
schema = Schema({"time": pyarrow.timestamp("ms")})
table = find_arrow_all(coll, {}, schema=schema)
assert (
    table.sort_by([("time", "ascending")])["time"]
    == naive_table["time"].cast(pyarrow.timestamp("ms")).sort()
)

Additionally, PyMongoArrow supports Pandas extension types. With PyMongo, a Decimal128 value behaves as follows:

coll = client.test.test
coll.insert_many([{"value": Decimal128(str(i))} for i in range(200)])
cursor = coll.find({})
df = pd.DataFrame(list(cursor))
print(df.dtypes)

The equivalent in PyMongoArrow is:

from pymongoarrow.api import find_pandas_all

coll = client.test.test
coll.insert_many([{"value": Decimal128(str(i))} for i in range(200)])
df = find_pandas_all(coll, {})
print(df.dtypes)

In both cases, the underlying values are instances of the BSON Decimal128 class:

>>> print(df["value"][0])
Decimal128("0")

Writing data from an Arrow table using PyMongo looks like the following:

data = arrow_table.to_pylist()
db.collname.insert_many(data)

The equivalent in PyMongoArrow is:

from pymongoarrow.api import write

write(db.collname, arrow_table)

As of PyMongoArrow 1.0, the main advantage of using the write function is that it iterates over the Arrow table, data frame, or NumPy array rather than converting the entire object to a list first.

The following measurements were taken with PyMongoArrow version 1.0 and PyMongo version 4.4. For insertions, the library performs about the same as conventional PyMongo and uses the same amount of memory.

ProfileInsertSmall.peakmem_insert_conventional      107M
ProfileInsertSmall.peakmem_insert_arrow             108M
ProfileInsertSmall.time_insert_conventional         202±0.8ms
ProfileInsertSmall.time_insert_arrow                181±0.4ms
ProfileInsertLarge.peakmem_insert_arrow             127M
ProfileInsertLarge.peakmem_insert_conventional      125M
ProfileInsertLarge.time_insert_arrow                425±1ms
ProfileInsertLarge.time_insert_conventional         440±1ms

For reads, the library is slower for small documents and nested documents, but faster for large documents. It uses less memory in all cases.

ProfileReadSmall.peakmem_conventional_arrow     85.8M
ProfileReadSmall.peakmem_to_arrow               83.1M
ProfileReadSmall.time_conventional_arrow        38.1±0.3ms
ProfileReadSmall.time_to_arrow                  60.8±0.3ms
ProfileReadLarge.peakmem_conventional_arrow     138M
ProfileReadLarge.peakmem_to_arrow               106M
ProfileReadLarge.time_conventional_ndarray      243±20ms
ProfileReadLarge.time_to_arrow                  186±0.8ms
ProfileReadDocument.peakmem_conventional_arrow  209M
ProfileReadDocument.peakmem_to_arrow            152M
ProfileReadDocument.time_conventional_arrow     865±7ms
ProfileReadDocument.time_to_arrow               937±1ms
