Showing content from https://github.com/pandas-dev/pandas/issues/7032 below:

read_html infers wrong datatype · Issue #7032 · pandas-dev/pandas

As the code below shows, columns 3, 8, 9, and 10 were misinterpreted as datetime objects, and columns 1, 6, and 7 should be integers. Only columns 2, 4, 5, and 11 appear to have been read properly. How do I force the columns to be interpreted as the proper types? I suppose I could pass 'infer_types=False' and do the conversion manually afterwards, but since infer_types is going away, this won't work.

In [63]: import pandas as pd
In [64]: path = r"http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"
In [65]: tables = pd.read_html(path)
In [66]: df = tables[1]

In [67]: df.head()
Out[67]:
        1           2          3         4         5        6        7   8   \
1  !000001  California        NaT  37253956  33871648  !000053  !000055 NaT
2  !000002       Texas        NaT  25145561  20851820  !000036  !000038 NaT
3  !000003    New York 1965-11-27  19378102  18976457  !000027  !000029 NaT
4  !000004     Florida        NaT  18801310  15982378  !000027  !000029 NaT
5  !000005    Illinois        NaT  12830632  12419293  !000018  !000020 NaT

   9   10      11
1 NaT NaT  11.91%
2 NaT NaT   8.04%
3 NaT NaT   6.19%
4 NaT NaT   6.01%
5 NaT NaT   4.10%

[5 rows x 11 columns]

dtype: object

In [68]: df.dtypes
Out[68]:
1             object
2             object
3     datetime64[ns]
4             object
5             object
6             object
7             object
8     datetime64[ns]
9     datetime64[ns]
10    datetime64[ns]
11            object
dtype: object
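
For reference, the manual conversion mentioned above could look roughly like the sketch below. It is not a fix for the inference itself and not necessarily what the pandas developers would recommend. It assumes the affected cells arrive as strings, that the rank-like columns carry a leading "!" as in the output above, and that the percentage column ends in "%". The `raw` frame and the `clean_table` helper are hypothetical stand-ins built from a few of the values shown, so that the example runs without fetching the page.

```python
import pandas as pd

# Hypothetical stand-in for a few columns of the scraped table,
# with every cell kept as a string (an assumption, see above).
raw = pd.DataFrame(
    {
        1: ["!000001", "!000002", "!000003"],
        2: ["California", "Texas", "New York"],
        4: ["37253956", "25145561", "19378102"],
        11: ["11.91%", "8.04%", "6.19%"],
    }
)


def clean_table(df):
    """Return a copy of df with the numeric-looking columns converted."""
    out = df.copy()
    # Drop the leading "!" sort-key marker, then parse as integers.
    out[1] = pd.to_numeric(out[1].str.lstrip("!"))
    # Plain digit strings parse directly.
    out[4] = pd.to_numeric(out[4])
    # Drop the trailing "%" and parse as floats (values stay in percent).
    out[11] = pd.to_numeric(out[11].str.rstrip("%"))
    return out


cleaned = clean_table(raw)
print(cleaned.dtypes)
```

The same pattern would apply to columns 6 and 7. The columns that came back as datetime64[ns] cannot be repaired this way, because their original text has already been replaced by NaT or a parsed date by the time the frame is returned.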
