How can I merge multiple CSV files into one file with specific columns using pandas in Python?

swvgeqrz  posted on 2023-06-20  in  Python

I have 4 different CSV files.
csv1:

ID      Fruit
1001    Apple
1002    Banana
1003    Kiwi

csv2:

ID      Color
1001    Green
1005    Red
1006    Orange
1007    Yellow

csv3:

ID      Size
1001    Large
1008    Small
1009    Medium
1010    Large

csv4:

ID      Price
1002    20
1009    40
1010    30
1011    50

This is the master CSV file I want to create:

Number  ID      Fruit   Color   Size    Price
1       1001    Apple   Green   Large
2       1002    Banana                  20
3       1003    Kiwi
4       1005            Red
5       1006            Orange
6       1007            Yellow
7       1008                    Small
8       1009                    Medium  40
9       1010                    Large   30
10      1011                            50

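I think this would be easier with *pandas*, but I don't know Python.
Since each CSV file has different columns, how do I select the columns and paste them into the master CSV file? Where there is no information, I want the value to be NULL or N/A.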
I have already spent 6 hours on this, but I can't figure out how to do it.
Thanks in advance.

mklgxw1f  #1

reduce + combine_first

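The key is to set 'ID' as the index so that we get correct alignment on both axes. I'm assuming all of the DataFrames are already in memory; if not, you can read them into a list first, or read them during the reduce step.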

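from functools import reduce
import pandas as pd

# Index every frame by 'ID' so combine_first aligns rows on ID and columns on name
my_dfs = [df.set_index('ID') for df in [df1, df2, df3, df4]]
# Alternative: read the files directly instead of using frames already in memory
#my_dfs = [pd.read_csv(file).set_index('ID') for file in your_list_of_files]

# Fold the list pairwise; values missing on the left are filled from the right
reduce(lambda l,r: l.combine_first(r), my_dfs)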
       Color   Fruit  Price    Size
ID
1001   Green   Apple    NaN   Large
1002     NaN  Banana   20.0     NaN
1003     NaN    Kiwi    NaN     NaN
1005     Red     NaN    NaN     NaN
1006  Orange     NaN    NaN     NaN
1007  Yellow     NaN    NaN     NaN
1008     NaN     NaN    NaN   Small
1009     NaN     NaN   40.0  Medium
1010     NaN     NaN   30.0   Large
1011     NaN     NaN   50.0     NaN
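
To get from that result to the master file in the question, a few more steps are needed: restore the column order, add the Number column, and write the CSV. The following is only a sketch under some assumptions: the sample data is inlined with StringIO so it runs without files, and "N/A" as the missing-value marker is my choice, not something the answer specifies.

```python
from functools import reduce
from io import StringIO

import pandas as pd

# Sample data from the question, inlined so the sketch runs without files.
csv_texts = [
    "ID,Fruit\n1001,Apple\n1002,Banana\n1003,Kiwi\n",
    "ID,Color\n1001,Green\n1005,Red\n1006,Orange\n1007,Yellow\n",
    "ID,Size\n1001,Large\n1008,Small\n1009,Medium\n1010,Large\n",
    "ID,Price\n1002,20\n1009,40\n1010,30\n1011,50\n",
]
frames = [pd.read_csv(StringIO(text)).set_index("ID") for text in csv_texts]

# Combine on the ID index; cells with no data stay NaN.
combined = reduce(lambda left, right: left.combine_first(right), frames)
combined = combined.sort_index()

# The column order after combine_first need not match the requested layout,
# so select the columns explicitly, then turn ID back into a regular column.
master = combined[["Fruit", "Color", "Size", "Price"]].reset_index()

# Add a 1-based running row number as the first column.
master.insert(0, "Number", list(range(1, len(master) + 1)))

# na_rep controls how missing cells are written; "N/A" is an assumed choice.
csv_output = master.to_csv(index=False, na_rep="N/A")
print(csv_output)
```

Passing a filename as the first argument to to_csv writes the file instead of returning a string. Price becomes a float column because it contains NaN.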

1u4esq0p  #2

Something like this should work:

import pandas as pd

list_of_csv_filenames = ['csv1.csv', 'csv2.csv', 'csv3.csv', 'csv4.csv']
all_dfs = []
for i, filename in enumerate(list_of_csv_filenames, start=1):
    temp = pd.read_csv(filename)
    temp['Number'] = i  # tags each row with the number of its source file
    all_dfs.append(temp)
# Stack the frames; columns a file does not have are filled with NaN
full_df = pd.concat(all_dfs)
full_df.to_csv('output_filename.csv', index=False)
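
Note that stacking with concat leaves one row per file for any ID that appears in several files (1001 shows up three times in the sample data). One possible way to collapse those into a single row per ID is a groupby followed by first(), which takes the first non-null value in each column. This is a sketch, not part of the answer above; the data is inlined with StringIO so it runs as-is.

```python
from io import StringIO

import pandas as pd

# Sample data from the question, inlined so the sketch runs without files.
csv_texts = [
    "ID,Fruit\n1001,Apple\n1002,Banana\n1003,Kiwi\n",
    "ID,Color\n1001,Green\n1005,Red\n1006,Orange\n1007,Yellow\n",
    "ID,Size\n1001,Large\n1008,Small\n1009,Medium\n1010,Large\n",
    "ID,Price\n1002,20\n1009,40\n1010,30\n1011,50\n",
]

# Stack all rows; each row has values only in its own file's columns.
stacked = pd.concat(
    [pd.read_csv(StringIO(text)) for text in csv_texts],
    ignore_index=True,
)

# Collapse to one row per ID, keeping the first non-null value per column.
merged = stacked.groupby("ID", as_index=False).first()
merged = merged[["ID", "Fruit", "Color", "Size", "Price"]]
print(merged)
```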

axzmvihb  #3

You can use merge:

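import pandas as pd

df1 = pd.read_csv('1.csv')
df2 = pd.read_csv('2.csv')
df3 = pd.read_csv('3.csv')
df4 = pd.read_csv('4.csv')
# how='outer' keeps IDs that are missing from some files; the default inner join would drop them
df = (df1.merge(df2, on='ID', how='outer')
         .merge(df3, on='ID', how='outer')
         .merge(df4, on='ID', how='outer'))
df.to_csv('result.csv')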
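
DataFrame.merge defaults to how='inner', which keeps only the IDs present in both frames, so the join type matters for this data. A small sketch with the first two sample tables from the question (inlined with StringIO so it runs without files) shows the difference:

```python
from io import StringIO

import pandas as pd

df1 = pd.read_csv(StringIO("ID,Fruit\n1001,Apple\n1002,Banana\n1003,Kiwi\n"))
df2 = pd.read_csv(StringIO("ID,Color\n1001,Green\n1005,Red\n1006,Orange\n1007,Yellow\n"))

# Default join: only ID 1001 appears in both files.
inner = df1.merge(df2, on="ID")

# Outer join: every ID from either file, with NaN where a value is missing.
outer = df1.merge(df2, on="ID", how="outer")

print(len(inner))  # 1
print(len(outer))  # 6
```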

lhcgjxsq  #4

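Concat can be used to append data to a DataFrame:

import pandas as pd

file_path1 = r"path1.csv"
df1 = pd.read_csv(file_path1)
file_path2 = r"path2.csv"
df2 = pd.read_csv(file_path2)
file_path3 = r"path3.csv"
df3 = pd.read_csv(file_path3)
frames = [df1, df2, df3]
# Stack the frames vertically; columns a frame lacks become NaN
df = pd.concat(frames)

Reset the index after concat:

# drop=True discards the old per-file index instead of keeping it as a column
df.reset_index(drop=True, inplace=True)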
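
If the goal is one row per ID rather than stacked rows, concat can also join the frames side by side. Here is a rough sketch, assuming each ID occurs at most once within a file and using the question's sample data inlined with StringIO: set 'ID' as the index and pass axis=1, so rows are aligned on ID and missing cells become NaN.

```python
from io import StringIO

import pandas as pd

# Sample data from the question, inlined so the sketch runs without files.
csv_texts = [
    "ID,Fruit\n1001,Apple\n1002,Banana\n1003,Kiwi\n",
    "ID,Color\n1001,Green\n1005,Red\n1006,Orange\n1007,Yellow\n",
    "ID,Size\n1001,Large\n1008,Small\n1009,Medium\n1010,Large\n",
    "ID,Price\n1002,20\n1009,40\n1010,30\n1011,50\n",
]
frames = [pd.read_csv(StringIO(text)).set_index("ID") for text in csv_texts]

# axis=1 places the frames next to each other, aligned on the ID index.
wide = pd.concat(frames, axis=1).sort_index().rename_axis("ID")
print(wide)
```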
