numpy Pandas-将具有多个标识符的DF处理到组合列中

aij0ehis  于 2022-11-10  发布在  其他
关注(0)|答案(1)|浏览(108)

我有下面的DF,我想要处理成像下面的样本。我有一个困难的时间,因为识别符是分享他们所在的网站的列。柱子的位置也不是固定的,因为在某些情况下,“地”有30天,而“水”只有5天
谢谢你对我的照顾
示例df:

df = pd.DataFrame({'Unnamed: 0': ['Site',"A", "B","C"],
               'Unnamed: 1': ['Identifier 1',"Land", "Land","Land"],
               'Unnamed: 2': ['Day 1',1, 2,3],
               'Unnamed: 3': ['Day 2',4, 5,6],
               'Unnamed: 4': ['Day 3',7, 8,9],
               'Unnamed: 5': ['Day 4',10, 11,12],
               'Unnamed: 6': ['Identifier 2',"Water", "Water","Water"],
               'Unnamed: 7': ['Day 1',13, 14,15],
               'Unnamed: 8': ['Day 2',16, 17,18],
               'Unnamed: 9': ['Day 3',19, 20,21],
               'Unnamed: 10': ['Day 4',22, 23,24],
               })

输出文件:

Site    Identifier 1    Day 1   Day 2   Day 3   Day 4
A   Land    1   4   7   10
B   Land    2   5   8   11
C   Land    3   6   9   12
A   Water   13  16  19  22
B   Water   14  17  20  23
C   Water   15  18  21  24
dtcbnfnu

dtcbnfnu1#

这是你想要的吗?我首先将 Dataframe 拆分成两个 Dataframe ,每个 Dataframe 包含不同的标识符,最后将它们合并回tgt。以下是代码,希望它能有所帮助。

df = pd.DataFrame({'Unnamed: 0': ['Site',"A", "B","C"],
           'Unnamed: 1': ['Identifier 1',"Land", "Land","Land"],
           'Unnamed: 2': ['Day 1',1, 2,3],
           'Unnamed: 3': ['Day 2',4, 5,6],
           'Unnamed: 4': ['Day 3',7, 8,9],
           'Unnamed: 5': ['Day 4',10, 11,12],
           'Unnamed: 6': ['Identifier 2',"Water", "Water","Water"],
           'Unnamed: 7': ['Day 1',13, 14,15],
           'Unnamed: 8': ['Day 2',16, 17,18],
           'Unnamed: 9': ['Day 3',19, 20,21],
           'Unnamed: 10': ['Day 4',22, 23,24],
           })

## Restriving Column Name

header = [item for item in df.iloc[0]]

## Filter Out the first row

df = df.iloc[1:]

# Getting first 6 columns as df_land and rename column by header

df_land = df.iloc[:, 0:6]
df_land.columns = header[:6]

# Getting the first column and the last 5 columns as df_water and rename column by header

df_water = df.iloc[:,[0,6,7,8,9,10]]
df_water.columns = header[:6]

# Union two tables

df_final = pd.concat( [df_land,df_water] , axis = 0 )

编辑-用于动态N列

df = pd.DataFrame({'Unnamed: 0': ['Site',"A", "B","C"],
           'Unnamed: 1': ['Identifier 1',"Land", "Land","Land"],
           'Unnamed: 2': ['Day 1',1, 2,3],
           'Unnamed: 3': ['Day 2',4, 5,6],
           'Unnamed: 4': ['Day 3',7, 8,9],
           'Unnamed: 5': ['Day 4',10, 11,12],
           'Unnamed: 11': ['Day 5',10, 11,12],
           'Unnamed: 12': ['Day 6',10, 11,12],
           'Unnamed: 6': ['Identifier 2',"Water", "Water","Water"],
           'Unnamed: 7': ['Day 1',13, 14,15],
           'Unnamed: 8': ['Day 2',16, 17,18],
           'Unnamed: 9': ['Day 3',19, 20,21],
           'Unnamed: 10': ['Day 4',22, 23,24],

# 'Unnamed: 13': ['Day 5',22, 23,24],

           })

## Restriving Column Name

header = [item for item in df.iloc[0]]

## Filter Out the first row

df = df.iloc[1:]

n_col = len(header)
n_of_column_exclude_from_land = 5

# Getting first 6 columns as df_land and rename column by header

df_land = df.iloc[:, 0:n_col-n_of_column_exclude_from_land]
df_land.columns = header[:n_col-n_of_column_exclude_from_land]

# Getting the first column and the last 5 columns as df_water and rename column by header

water_col_index = [n for n in range(n_col-n_of_column_exclude_from_land,n_col)]
water_col_index.append(0)
water_col_index.sort()

df_water = df.iloc[:,water_col_index]
df_water.columns = header[:n_of_column_exclude_from_land+1]

# Union two tables

df_final = pd.concat( [df_land,df_water] , axis = 0 )

相关问题