pandas drop删除整个 Dataframe ,只需删除空行

hxzsmxv2  于 2022-11-20  发布在  其他
关注(0)|答案(1)|浏览(219)

我用的是这样一段代码:

import pandas as pd
df = pd.read_excel('input.xls', sheet_name='Nouveau concept')
print(f"Dataframe:\n{df}")
new_df = df.dropna()
print(f"Dataframe now:\n{new_df}")

读取Excel文件(必须是xls而不是xlsx)并删除所有空行,即根本不包含数据的行。
当我运行上面的代码时,我得到了这个:

Anibals-New-MacBook-Air:UCNI anibal$ python3 test.py
Dataframe:
     Source Terminology Version  Requestor Internal ID    Parent ID                    Parent FSN  ... Unnamed: 77 Unnamed: 78 Unnamed: 79  Unnamed: 80
0                september 2022                    NaN  283403005.0  Cut of ear region (disorder)  ...         NaN         NaN         NaN          NaN
1                september 2022                    NaN  283403005.0  Cut of ear region (disorder)  ...         NaN         NaN         NaN          NaN
2                september 2022                    NaN  283412007.0   Cut of upper arm (disorder)  ...         NaN         NaN         NaN          NaN
3                september 2022                    NaN  283412007.0   Cut of upper arm (disorder)  ...         NaN         NaN         NaN          NaN
4                september 2022                    NaN  283413002.0       Cut of elbow (disorder)  ...         NaN         NaN         NaN          NaN
...                         ...                    ...          ...                           ...  ...         ...         ...         ...          ...
5056                        NaN                    NaN          NaN                           NaN  ...         NaN         NaN         NaN          NaN
5057                        NaN                    NaN          NaN                           NaN  ...         NaN         NaN         NaN          NaN
5058                        NaN                    NaN          NaN                           NaN  ...         NaN         NaN         NaN          NaN
5059                        NaN                    NaN          NaN                           NaN  ...         NaN         NaN         NaN          NaN
5060                        NaN                    NaN          NaN                           NaN  ...         NaN         NaN         NaN          NaN

[5061 rows x 81 columns]
Dataframe now:
Empty DataFrame
Columns: [Source Terminology Version, Requestor Internal ID, Parent ID, Parent FSN, FSN (*), Semantic Tag (*), PT (*), Synonym (1), Synonym (2), Definition, Reason for Change, Notes, References, Unnamed: 13, Unnamed: 14, Unnamed: 15, Unnamed: 16, Unnamed: 17, Unnamed: 18, Unnamed: 19, Unnamed: 20, Unnamed: 21, Unnamed: 22, Unnamed: 23, Unnamed: 24, Unnamed: 25, Unnamed: 26, Unnamed: 27, Unnamed: 28, Unnamed: 29, Unnamed: 30, Unnamed: 31, Unnamed: 32, Unnamed: 33, Unnamed: 34, Unnamed: 35, Unnamed: 36, Unnamed: 37, Unnamed: 38, Unnamed: 39, Unnamed: 40, Unnamed: 41, Unnamed: 42, Unnamed: 43, Unnamed: 44, Unnamed: 45, Unnamed: 46, Unnamed: 47, Unnamed: 48, Unnamed: 49, Unnamed: 50, Unnamed: 51, Unnamed: 52, Unnamed: 53, Unnamed: 54, Unnamed: 55, Unnamed: 56, Unnamed: 57, Unnamed: 58, Unnamed: 59, Unnamed: 60, Unnamed: 61, Unnamed: 62, Unnamed: 63, Unnamed: 64, Unnamed: 65, Unnamed: 66, Unnamed: 67, Unnamed: 68, Unnamed: 69, Unnamed: 70, Unnamed: 71, Unnamed: 72, Unnamed: 73, Unnamed: 74, Unnamed: 75, Unnamed: 76, Unnamed: 77, Unnamed: 78, Unnamed: 79, Unnamed: 80]
Index: []

那么,第二个 Dataframe 是完全空的。为什么?
我只需要读取包含任何数据的行,也就是说,如果某行为空,则跳过它。
输入文件input.xls可在以下位置找到:
https://docs.google.com/spreadsheets/d/1pXfhPHklnd0v45yLbff5E5dp2ypVIbxG/edit?usp=share_link&ouid=117900420544251849196&rtpof=true&sd=true
有什么想法吗?
顺便说一句,我不能清理文件。这个输入文件是由另一个系统生成的,我的程序应该自动处理这个文件,所以我不能直接在Excel中加载它并清理它。
我尝试了一大堆Dropna的组合都无济于事。我还尝试了StackOverflow中发现的其他几个解决方案,一次又一次,都无济于事。

8hhllhi2

8hhllhi21#

首先,只导入所需的列(即,使用use_cols排除空白列)

df = pd.read_excel('input.xls', sheet_name='Nouveau concept',usecols="A:M")

然后,要删除空行,必须考虑列的子集。当前,有几列完全为空,因此这就是删除所有行的原因。要解决此问题,请使用以下命令:

new_df =df.dropna(subset=['Source Terminology Version'], how = 'all')
# In this example, I used only one column, but you can pass in a list.

相关问题