I want to split the non-matching values in each column into separate rows, while keeping the "Gene.ID" value for the new rows.
import pandas as pd
data = {
'Gene.ID': ['NZ_JAHWGH010000001.1_15', 'NZ_JAHWGH010000001.1_17', 'NZ_JAHWGH010000001.1_68', 'NZ_JAHWGH010000001.1_7', 'NZ_JAHWGH010000001.1_7', 'NZ_JAHWGH010000001.1_7', 'NZ_JAHWGH010000001.1_7', 'NZ_JAHWGH010000001.1_7','NZ_JAHWGH010000001.1_7'],
'DIAMOND': ['SLH', 'GT2', 'GT2', 'CBM41', 'CBM48', 'GH11', 'GH13', 'GH13', ''],
'HMMER': ['', 'GT2', 'GT2', 'CBM41', 'CBM41', 'GH13', 'GH13', '', 'GH13'],
'dbCAN_sub': ['', 'GT2', 'GT2', 'CBM41', 'CBM41', 'CBM41', 'CBM48', '', 'GH13']
}
df = pd.DataFrame(data)
# Print the DataFrame
print(df)
The result should look like this:
expected_data = {
"Gene.ID": ["NZ_JAHWGH010000001.1_15", "NZ_JAHWGH010000001.1_17", "NZ_JAHWGH010000001.1_68", "NZ_JAHWGH010000001.1_7", "NZ_JAHWGH010000001.1_7", "NZ_JAHWGH010000001.1_7", "NZ_JAHWGH010000001.1_7", "NZ_JAHWGH010000001.1_7", "NZ_JAHWGH010000001.1_7", "NZ_JAHWGH010000001.1_7", "NZ_JAHWGH010000001.1_7", "NZ_JAHWGH010000001.1_7", "NZ_JAHWGH010000001.1_7"],
"DIAMOND": ["SLH", "GT2", "GT2", "CBM41", "CBM48", "", "GH11", "", "", "GH13", "", "GH13",""],
"HMMER": ["", "", "GT2", "CBM41", "", "CBM41", "", "GH13", "", "GH13", "", "", "GH13"],
"dbCAN_sub": ["", "", "GT2", "CBM41", "", "CBM41", "", "", "CBM41", "", "CBM48", "", "GH13"]
}
expected_df = pd.DataFrame(expected_data)
print(expected_df)
I tried this code:
import pandas as pd
print(df)
def g(df):
    for i in range(len(df)):
        if i == len(df) - 1:
            break
        if df.iloc[i, 0] == '':
            pass
        if df.iloc[i, 0] == df.iloc[i, 1]:
            pass
        if df.iloc[i, 0] != df.iloc[i, 1]:
            df.iloc[i, 1] = df.iloc[i+1, 1]
        if df.iloc[i, 1] == '':
            pass
        if df.iloc[i, 1] == df.iloc[i, 2]:
            pass
        if df.iloc[i, 1] != df.iloc[i, 2]:
            df.iloc[i, 2] = df.iloc[i+1, 2]
    return df
df = g(df.copy())
print(df)
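However, I am struggling with splitting the non-matching values and keeping the "Gene.ID" value for the new rows. Could someone help me with a solution, or suggest a more efficient way to achieve this?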
2 answers
fruv7luv #1
You can reshape the DataFrame and re-aggregate it.
In summary: stack to get rid of the empty cells, assign a new index with groupby.ngroup, then pivot to go back to 2D.
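The stack / groupby.ngroup / pivot steps could be sketched roughly as follows. This is a minimal illustration, not necessarily the code originally posted with the answer, and it assumes pandas 1.1 or later (for passing a list as `index` to `pivot`). The names `value_cols`, `long`, `row`, `tool`, `value`, and `new_row` are made up for the example. Empty strings are filtered out explicitly after `stack`, rather than relying on `stack` to drop missing values.

```python
import pandas as pd

data = {
    'Gene.ID': ['NZ_JAHWGH010000001.1_15', 'NZ_JAHWGH010000001.1_17',
                'NZ_JAHWGH010000001.1_68'] + ['NZ_JAHWGH010000001.1_7'] * 6,
    'DIAMOND': ['SLH', 'GT2', 'GT2', 'CBM41', 'CBM48', 'GH11', 'GH13', 'GH13', ''],
    'HMMER': ['', 'GT2', 'GT2', 'CBM41', 'CBM41', 'GH13', 'GH13', '', 'GH13'],
    'dbCAN_sub': ['', 'GT2', 'GT2', 'CBM41', 'CBM41', 'CBM41', 'CBM48', '', 'GH13'],
}
df = pd.DataFrame(data)

# Columns whose values are compared (assumption: everything except Gene.ID).
value_cols = ['DIAMOND', 'HMMER', 'dbCAN_sub']

# 1. stack: one entry per (original row, Gene.ID, column), in row-major order.
long = (
    df.set_index('Gene.ID', append=True)[value_cols]
      .stack()
      .rename_axis(['row', 'Gene.ID', 'tool'])
)
# Drop the empty cells, then turn the index levels into ordinary columns.
long = long[long != ''].reset_index(name='value')

# 2. groupby.ngroup: every distinct value within an original row gets its own
#    output row number; sort=False numbers groups in order of first appearance.
long['new_row'] = long.groupby(['row', 'value'], sort=False).ngroup()

# 3. pivot: back to one column per tool, one line per new row number.
out = (
    long.pivot(index=['new_row', 'Gene.ID'], columns='tool', values='value')
        .reindex(columns=value_cols)   # keep the original column order
        .fillna('')                    # cells with no value become empty again
        .reset_index('Gene.ID')
        .rename_axis(index=None, columns=None)
)

print(out)
```

With the question's data this yields 13 rows. Values that agree within an original row stay together on one line, so the row for NZ_JAHWGH010000001.1_17 keeps GT2 in all three columns, whereas the expected table in the question shows HMMER and dbCAN_sub as empty for that row.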
7cwmlq89 #2
Yes, I did it with this. Is it correct?