Showing content from https://github.com/pandas-dev/pandas/issues/12021 below:

Serializing Pandas Functions · Issue #12021 · pandas-dev/pandas · GitHub

In recent efforts using Pandas on multiple machines, I've found that some of the functions are tricky to serialize. Apparently this might be because they are generated at runtime. Here are a few examples of serialization breaking, occasionally in unpleasant ways:

In [1]: import pandas as pd
In [2]: import pickle
In [3]: pd.read_csv
Out[3]: <function pandas.io.parsers._make_parser_function.<locals>.parser_f>
In [4]: pickle.loads(pickle.dumps(pd.read_csv))
AttributeError: Can't pickle local object '_make_parser_function.<locals>.parser_f'

Lest you think that this is just a problem with pickle (which has many flaws), dill, a much more robust function serialization library, also fails, although only on py35. (cc @mmckerns)

In [5]: import dill
In [6]: dill.loads(dill.dumps(pd.read_csv))
PicklingError: Can't pickle <function _make_parser_function.<locals>.parser_f at 0x7f71f5ec1158>: it's not found as pandas.io.parsers._make_parser_function.<locals>.parser_f

In this particular case though cloudpickle will work.
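The failure mode can be reproduced without pandas. The sketch below is only an illustration: `make_func` and `read_thing` are hypothetical names standing in for a factory like `_make_parser_function` and the function it returns. pickle serializes functions by reference (module plus qualified name), so a function whose `__qualname__` contains `<locals>` cannot be looked up again.

```python
import pickle


def make_func(name):
    """Hypothetical factory, standing in for a helper that builds functions at runtime."""
    def inner(x):
        return x
    return inner


# Bound at module level, but under a name that differs from its __qualname__.
read_thing = make_func("read_thing")

# The qualified name still points inside the factory.
print(read_thing.__qualname__)

try:
    pickle.dumps(read_thing)
    pickled_ok = True
except (AttributeError, pickle.PicklingError) as exc:
    # The exact exception type and message vary between Python versions.
    pickled_ok = False
    print("pickle failed:", exc)
```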

Other functions have this problem as well. Consider the series methods:

In [7]: pickle.loads(pickle.dumps(pd.Series.sum))
AttributeError: Can't pickle local object '_make_stat_function.<locals>.stat_func'

In this case, concerningly, cloudpickle completes but returns a wrong result:

In [9]: import cloudpickle
In [11]: pd.Series.sum
Out[11]: <function pandas.core.generic._make_stat_function.<locals>.stat_func>

In [12]: cloudpickle.loads(cloudpickle.dumps(pd.Series.sum))
Out[12]: <function stat_func>

I've been able to fix some of these in cloudpipe/cloudpickle#46, but generally speaking I'm running into a number of problems here. It would be useful if, during the generation of these functions, we could at least pay attention to assigning metadata like __name__ correctly. This one in particular confused me for a while:

In [15]: pd.Series.cumsum.__name__
Out[15]: 'sum'
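As a rough sketch of what "assigning metadata correctly" could look like, assuming a factory shaped like the ones above: the class `Thing` and this version of `_make_stat_function` are hypothetical and are not pandas' actual code. If the generated function's `__name__` and `__qualname__` match the attribute it is assigned to, plain pickle can find it by reference on Python 3.

```python
import pickle


class Thing:
    """Hypothetical stand-in for a class such as Series."""

    def __init__(self, values):
        self.values = list(values)


def _make_stat_function(cls, name, reducer):
    """Hypothetical factory that names the generated function after its target attribute."""
    def stat_func(self):
        return reducer(self.values)

    # Without these two lines the function would report itself as
    # '_make_stat_function.<locals>.stat_func' and pickle could not locate it.
    stat_func.__name__ = name
    stat_func.__qualname__ = f"{cls.__qualname__}.{name}"
    return stat_func


Thing.sum = _make_stat_function(Thing, "sum", sum)
Thing.max = _make_stat_function(Thing, "max", max)

# pickle stores a reference (module + qualified name) and resolves it on load.
restored = pickle.loads(pickle.dumps(Thing.sum))
print(restored is Thing.sum)
print(Thing.max.__name__)
```

This only helps when the module defining the class is importable on the receiving side, since nothing but the reference is transmitted.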
What would help?
    def save_instancemethod(self, obj):
        # Memoization is rarely useful here: Python creates a new bound-method
        # object on each attribute access.
        if obj.__self__ is None:
            # Unbound method (Python 2 only): look it up again on its class.
            self.save_reduce(getattr, (obj.im_class, obj.__name__))
        else:
            if PY3:
                self.save_reduce(types.MethodType, (obj.__func__, obj.__self__), obj=obj)
            else:
                # Python 2's MethodType also takes the class as a third argument.
                self.save_reduce(types.MethodType,
                                 (obj.__func__, obj.__self__, obj.__self__.__class__),
                                 obj=obj)

    def _reduce_method_descriptor(obj):
        return (getattr, (obj.__objclass__, obj.__name__))
