Encoding Impala data while reading from pandas.read_sql

Posted by lfapxunr on 2021-06-26 in Impala


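When I use the pyhive library together with pandas.read_sql, I get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 3071: unexpected end of data. The likely cause is that some of the data is corrupted.
How can I switch to a different encoding so that I can get the data into a DataFrame?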

impala python pandas pyhive

Source: https://stackoverflow.com/questions/55102307/encoding-impala-data-while-reading-from-pandas-read-sql

1 Answer


xmd2e60i1#

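The workaround is as follows:
1) Retrieve the data chunk by chunk through the pyhive cursor.
2) Preprocess each chunk: encode/decode.
3) Append it to the final DataFrame.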
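Step 2 depends on how the decode call treats invalid bytes. The following is a minimal sketch, not part of the original answer, that reproduces the "unexpected end of data" reason with a hypothetical truncated byte string and compares the 'ignore' and 'replace' error handlers. The sample value is an assumption chosen for illustration only.

```python
# A UTF-8 euro sign is three bytes (0xe2 0x82 0xac). Cutting off the last
# two bytes leaves a lone 0xe2 lead byte at the end of the data.
raw = "price: \u20ac".encode("utf-8")[:-2]

# Strict decoding (the default) raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_reason = None
except UnicodeDecodeError as exc:
    strict_reason = exc.reason

# 'ignore' silently drops the invalid byte.
ignored = raw.decode("utf-8", "ignore")

# 'replace' substitutes U+FFFD, so the damage stays visible in the result.
replaced = raw.decode("utf-8", "replace")

print(strict_reason, repr(ignored), repr(replaced))
```

Which handler is appropriate depends on whether losing the damaged characters without a trace is acceptable; 'replace' makes affected rows easier to find afterwards.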


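# imports needed by the snippet.

from pyhive import hive
import pandas as pd

# cursor to the database (HOST, PORT and USERNAME must be defined beforehand).

cursor = hive.Connection(host=HOST, port=PORT, username=USERNAME).cursor()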

# execute the query on the database side.

cursor.execute("SELECT id, message FROM table")

# result dataframe, empty for now.

df = pd.DataFrame(columns=['id', 'message'])

while True:
    # fetch 10k rows (as tuples).
    rows = cursor.fetchmany(10000)

    # if no more rows to retrieve, we stop.
    if not rows:
        break

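    # Preprocessing: do encoding/decoding here.
    # Only bytes values are decoded; None and str values pass through unchanged.
    rows = [(row_id, message.decode('utf-8', 'ignore') if isinstance(message, bytes) else message)
            for row_id, message in rows]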

    # put result in a temporary dataframe
    df_tmp = pd.DataFrame(rows, columns=['id', 'message'])

    # merge the temporary dataframe to the original df
    df = pd.concat([df, df_tmp], ignore_index=True)

df = ...
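
A variation on the loop above, offered as a sketch rather than as the original answer's method: collect the per-chunk DataFrames in a list and concatenate once at the end, which avoids copying the growing DataFrame on every iteration. The names decode_text, fetch_dataframe and FakeCursor are hypothetical. FakeCursor only stands in for a pyhive cursor so the example runs without a database, and the sketch assumes text columns may arrive as bytes.

```python
import pandas as pd


def decode_text(value, errors="replace"):
    """Decode bytes as UTF-8; return any other value (str, None, ...) unchanged."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8", errors)
    return value


def fetch_dataframe(cursor, columns, text_columns, chunk_size=10000):
    """Read all rows from a DB-API style cursor in chunks into one DataFrame."""
    text_idx = {columns.index(name) for name in text_columns}
    chunks = []
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        # Decode only the columns declared as text.
        cleaned = [
            tuple(decode_text(v) if i in text_idx else v for i, v in enumerate(row))
            for row in rows
        ]
        chunks.append(pd.DataFrame(cleaned, columns=columns))
    if not chunks:
        return pd.DataFrame(columns=columns)
    # One concat at the end instead of one per chunk.
    return pd.concat(chunks, ignore_index=True)


class FakeCursor:
    """Hypothetical stand-in for a pyhive cursor (fetchmany only)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def fetchmany(self, size):
        batch, self._rows = self._rows[:size], self._rows[size:]
        return batch


# Three sample rows: valid bytes, a truncated UTF-8 sequence, and a NULL.
demo_rows = [(1, b"ok"), (2, b"abc\xe2"), (3, None)]
demo_df = fetch_dataframe(
    FakeCursor(demo_rows), ["id", "message"], ["message"], chunk_size=2
)
print(demo_df)
```

With a real connection, the cursor returned by hive.Connection(...).cursor() after cursor.execute(...) would take the place of FakeCursor.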

Posted 2021-06-26
