Showing content from https://github.com/pandas-dev/pandas/issues/10804 below:

categorical value_counts can be much faster · Issue #10804 · pandas-dev/pandas · GitHub

The internal implementation of Categorical.value_counts should just do what is shown below. I think it currently factorizes multiple times when that is not necessary.

In [32]: np.random.seed(1234)

In [33]: n = 500000

In [34]: u = int(0.1*n)

In [35]: arr = [ "s%04d" % i for i in np.random.randint(0,u,size=n) ]

In [36]: c = pd.Series(arr).astype('category')

In [37]: result1 = Series(np.arange(len(c.cat.categories)),c.cat.categories).map(c.cat.codes.value_counts()).order(ascending=False)

In [38]: result2 = c.value_counts()

In [39]: %timeit Series(np.arange(len(c.cat.categories)),c.cat.categories).map(c.cat.codes.value_counts()).order(ascending=False)
100 loops, best of 3: 17.2 ms per loop

In [40]: %timeit c.value_counts()
10 loops, best of 3: 62.3 ms per loop

In [41]: result1.equals(result2)
Out[41]: True
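For readers on current pandas, where `Series.order` is no longer available, here is a minimal sketch of the same idea: count the integer codes once and attach the category labels afterwards. It is not the implementation pandas adopted. The helper name `value_counts_from_codes` is hypothetical, `np.bincount` is swapped in for `codes.value_counts()` followed by `map`, and the sample size is reduced so that it runs quickly.

```python
import numpy as np
import pandas as pd


def value_counts_from_codes(cat_series):
    """Count category occurrences from the integer codes.

    Hypothetical helper for illustration, not part of pandas.
    """
    cat = cat_series.cat
    codes = cat.codes.to_numpy()
    # A code of -1 marks a missing value; drop those before counting.
    valid = codes[codes >= 0]
    # One pass over the codes; minlength keeps categories that never occur.
    counts = np.bincount(valid, minlength=len(cat.categories))
    result = pd.Series(counts, index=cat.categories)
    return result.sort_values(ascending=False)


# Same setup as the session above, with a smaller n (an assumption made
# only to keep the example fast).
rng = np.random.RandomState(1234)
n = 5000
u = int(0.1 * n)
arr = ["s%04d" % i for i in rng.randint(0, u, size=n)]
c = pd.Series(arr).astype("category")

fast = value_counts_from_codes(c)
slow = c.value_counts()

# Ties may be ordered differently, so compare after sorting by label.
same = bool((fast.sort_index().to_numpy() == slow.sort_index().to_numpy()).all())
```

Categories with equal counts can come out in a different order than in `c.value_counts()`, which is why the comparison sorts both results by label rather than calling `equals` directly.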
