Showing content from https://github.com/pandas-dev/pandas/issues/6927 below:

DataFrame.__finalize__ not called in pd.concat · Issue #6927 · pandas-dev/pandas · GitHub

When I assign metadata to a df

import numpy as np
import pandas as pd
np.random.seed(10)

pd.DataFrame._metadata = ['filenames']
df1 = pd.DataFrame(np.random.randint(0, 4, (3, 2)), columns=list('ab'))
df1.filenames = {'a': 'f1', 'b': 'f2'}
df1
       a  b
    0  1  1
    1  0  3
    2  0  1

and define a __finalize__ that prints when it's called

def finalize_df(self, other, method=None, **kwargs):
    print('finalize called')
    # Copy every attribute listed in _metadata from `other` onto `self`
    for name in self._metadata:
        object.__setattr__(self, name, getattr(other, name, None))
    return self

pd.DataFrame.__finalize__ = finalize_df

nothing is preserved when pd.concat is called:

stacked = pd.concat([df1, df1])  # Nothing printed
stacked
       a  b
    0  1  1
    1  0  3
    2  0  1
    0  1  1
    1  0  3
    2  0  1
stacked.filenames  # => AttributeError

For this specific case it seems reasonable that __finalize__ should be called, since all of the elements come from the same DataFrame. I'm not sure about the general case, though, since concat can also take objects other than DataFrames. Either way, do we have (or should we add) some method to stack DataFrames that preserves metadata?
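
To make the question concrete, here is a minimal sketch of what such a stacking helper might look like. `concat_with_metadata` and its `names` parameter are hypothetical and not part of pandas. The sketch assumes the metadata should simply be taken from the first input, which is only one of several possible policies and does not settle what to do when the inputs disagree.

```python
import pandas as pd


def concat_with_metadata(frames, names, **concat_kwargs):
    """Concatenate `frames`, then copy the attributes in `names` from the first frame.

    Hypothetical helper for illustration only; not a pandas API.
    """
    frames = list(frames)
    result = pd.concat(frames, **concat_kwargs)
    for name in names:
        # object.__setattr__ bypasses DataFrame.__setattr__, as finalize_df does
        object.__setattr__(result, name, getattr(frames[0], name, None))
    return result


# Same values as df1 in the report
df1 = pd.DataFrame({"a": [1, 0, 0], "b": [1, 3, 1]})
object.__setattr__(df1, "filenames", {"a": "f1", "b": "f2"})

stacked = concat_with_metadata([df1, df1], names=["filenames"])
print(stacked.filenames)
```

Because the helper copies attributes after the concat rather than relying on __finalize__, it behaves the same whether or not pd.concat invokes __finalize__ internally.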

Similar to #6923.

