Unreadable output when testing a list of encodings in pandas

bzzcjhmw posted on 2023-06-28 in Other

I am trying to read this dataset:

import pandas as pd

path = "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1165013/UK_Sanctions_List.ods"

using the code below (I have seen many threads and suggested solutions, but this one seemed the most reasonable):

encoding_list = ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp273', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737'
                 , 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862'
                 , 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950'
                 , 'cp1006', 'cp1026', 'cp1125', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254'
                 , 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr'
                 , 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2'
                 , 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2'
                 , 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9'
                 , 'iso8859_10', 'iso8859_11', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab'
                 , 'koi8_r', 'koi8_t', 'koi8_u', 'kz1048', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2'
                 , 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32'
                 , 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig']

for encoding in encoding_list:
    worked = True
    try:
        df = pd.read_csv(path, encoding=encoding, nrows=5)
        print(df)
    except Exception:  # e.g. UnicodeDecodeError or a CSV parsing error
        worked = False
    if worked:
        print(encoding, ':\n', df.head())

But when I print the DataFrame, the result is unreadable, like this:

ËäjñEÎÜ'g
«sQğøÆÿŞmÿ´;Ğ´³µÇÇm®©sbH«iw...  ¿`Ò­ìş#mxOnBXvFî&ƪPÊz1á3uoj_g
¢x>æi7¸}Z«¤õÔ3ÎílW|ùÍx¡c;PÓ©kê+_ëͪ...                                NaN

                                                          qJ|HfÆzÖ¤c[¨ÿ`ÉŞ` *ª
b¾?]ÔüR~
¾GÌOmxÜ?=v좦Í`                                                     NaN
D¾Å¢
Æ·äÎQ´
ûò£^×%óÒ·$]qÓ´În[l'ß                                        NaN
                                                                                &.
ËäjñEÎ"'g
«sQ}øÆÿ@mÿ´;!´³µ¢¢m®©sbH«iw...  ¿ýÒ­ì¦ÖmxOnBXvFî&ƪPÊz1á3uoj_g
^x>æi7¸ðZ«€õÔ3ÎílW]ùÍx¡c;PÓ©kê+_ëͪ...                                NaN

                                                          qJ]HfÆz#€cǨÿýÉ@ý *ª
b¾?ÐÔ\Rö
¾GÌOmx"?=vì^þÍý                                                     NaN
D¾Å^
Æ·äÎQ´
ûò£¬×%óÒ·ÝÐqÓ´ÎnÇl'ß                                        NaN
cp1140 :
                                                                                 &.
ËäjñEÎ"'g
«sQ}øÆÿ@mÿ´;!´³µ¢¢m®©sbH«iw...  ¿ýÒ­ì¦ÖmxOnBXvFî&ƪPÊz1á3uoj_g
^x>æi7¸ðZ«€õÔ3ÎílW]ùÍx¡c;PÓ©kê+_ëͪ...                                NaN

                                                          qJ]HfÆz#€cǨÿýÉ@ý *ª
b¾?ÐÔ\Rö
¾GÌOmx"?=vì^þÍý                                                     NaN
D¾Å^
Æ·äÎQ´
ûò£¬×%óÒ·ÝÐqÓ´ÎnÇl'ß                                        NaN

Does anyone know how I can read it?

Answer #1, by nmpmafwu

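This is not a CSV file but an ODS (OpenDocument Spreadsheet) file. An ODS file is a ZIP-compressed archive of XML documents, so no text encoding can turn its raw bytes into readable CSV. Single-byte codecs such as cp1140 simply map every byte to some character without raising an error, which is why some iterations of the loop "succeed" but print gibberish.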
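A quick way to confirm this before guessing encodings is to inspect the first bytes of the file: ZIP archives normally begin with the signature `PK\x03\x04`, and OpenDocument files store their MIME type in a member named `mimetype`. The sketch below uses only the standard library; `sniff_format` is a hypothetical helper name, and the small in-memory archive merely stands in for the real download.

```python
import io
import zipfile

# Local-file-header signature that a normal (non-empty) ZIP archive starts with;
# .ods and .xlsx files are ZIP archives, so they start with it too.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_format(data: bytes) -> str:
    """Heuristically classify raw file bytes (hypothetical helper)."""
    if not data.startswith(ZIP_MAGIC):
        return "not a zip"  # could be CSV or some other format
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if "mimetype" in zf.namelist():
            # OpenDocument files keep their MIME type in a 'mimetype' member.
            return zf.read("mimetype").decode("ascii")
    return "zip"


# Build a tiny stand-in for an .ods file in memory (content is illustrative).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.spreadsheet")
    zf.writestr("content.xml", "<office:document-content/>")
fake_ods = buf.getvalue()

print(sniff_format(fake_ods))
print(sniff_format(b"a,b\n1,2\n"))

# latin_1 maps all 256 byte values, so "decoding" the archive never fails;
# it just produces unreadable text, like the output in the question.
print(repr(fake_ods[:16].decode("latin_1")))
```

For the real file, pass the bytes obtained with `open(path, "rb").read()` on a downloaded copy; a result of `application/vnd.oasis.opendocument.spreadsheet` means no `encoding=` argument to `read_csv` will help.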
You should use pandas.read_excel instead (make sure the odfpy module is installed):

# pip install odfpy
import pandas as pd

# skiprows=2 skips the two rows above the actual column header
df = pd.read_excel("UK_Sanctions_List_2.ods", engine="odf", skiprows=2)
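  • Note: this is slow, so be patient. The original file did not work for me, but opening it in LibreOffice and saving it again did. Another option is to open the data in LibreOffice and export it to CSV from there.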
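Because parsing the ODS file is slow, one possible variation on the convert-to-CSV idea is to let pandas do the conversion once and reuse a cached CSV afterwards. This is a minimal sketch, assuming odfpy is installed; `load_sanctions_list` and the cache file name are hypothetical, and `skiprows=2` is taken from the answer above.

```python
from pathlib import Path

import pandas as pd


def load_sanctions_list(ods_path, csv_cache, skiprows=2):
    """Read the ODS once (slow, needs odfpy), then reuse a CSV copy.

    Hypothetical helper: the name and cache location are illustrative.
    """
    csv_cache = Path(csv_cache)
    if csv_cache.exists():
        # Fast path: plain CSV written by a previous run.
        return pd.read_csv(csv_cache)
    # Slow path: parse the spreadsheet with the odf engine, then cache it.
    df = pd.read_excel(ods_path, engine="odf", skiprows=skiprows)
    df.to_csv(csv_cache, index=False)
    return df
```

Usage would look like `df = load_sanctions_list("UK_Sanctions_List_2.ods", "UK_Sanctions_List.csv")`. CSV does not preserve dtypes, so a date column such as "Last Updated" comes back as strings unless you pass `parse_dates=["Last Updated"]` to `read_csv`; delete the cache file whenever a new version of the list is published.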

Output (first 5 rows):

Last Updated Unique ID  OFSI Group ID UN Reference Number                                      Name 6  Name 1  Name 2  Name 3  Name 4  Name 5  ... IMO number  Current owner/operator (s)  \
0   2022-01-12   AFG0001          12703             TAe.010  HAJI KHAIRULLAH HAJI SATTAR MONEY EXCHANGE     NaN     NaN     NaN     NaN     NaN  ...        NaN                         NaN   
1   2022-01-12   AFG0001          12703             TAe.010  HAJI KHAIRULLAH HAJI SATTAR MONEY EXCHANGE     NaN     NaN     NaN     NaN     NaN  ...        NaN                         NaN   
2   2022-01-12   AFG0001          12703             TAe.010  HAJI KHAIRULLAH HAJI SATTAR MONEY EXCHANGE     NaN     NaN     NaN     NaN     NaN  ...        NaN                         NaN   
3   2022-01-12   AFG0001          12703             TAe.010  HAJI KHAIRULLAH HAJI SATTAR MONEY EXCHANGE     NaN     NaN     NaN     NaN     NaN  ...        NaN                         NaN   
4   2022-01-12   AFG0001          12703             TAe.010  HAJI KHAIRULLAH HAJI SATTAR MONEY EXCHANGE     NaN     NaN     NaN     NaN     NaN  ...        NaN                         NaN   

   Previous owner/operator (s) Current believed flag of ship  Previous flags  Type of ship Tonnage of ship Length of ship Year Built Hull identification number (HIN)  
0                          NaN                           NaN             NaN           NaN             NaN            NaN        NaN                              NaN  
1                          NaN                           NaN             NaN           NaN             NaN            NaN        NaN                              NaN  
2                          NaN                           NaN             NaN           NaN             NaN            NaN        NaN                              NaN  
3                          NaN                           NaN             NaN           NaN             NaN            NaN        NaN                              NaN  
4                          NaN                           NaN             NaN           NaN             NaN            NaN        NaN                              NaN
