pandas 为什么合并两个数据框后会得到一个包含三倍行的数据框

55ooxyrt  于 2022-11-20  发布在  其他
关注(0)|答案(3)|浏览(98)

I have df1 :

x            y no.
0  -17.7    -0.785430  y1
1  -15.0 -3820.085000  y4
2  -12.5     2.138833  y3
..  ....     ........  ..
40  15.6     5.486901  y2
41  19.2     1.980686  y3
42  19.6     9.364718  y2

and df2 :

delta y     x
0     0.053884 -17.7
1     0.085000 -15.0
2     0.143237 -12.5
..    ........  ....
40    0.113099  15.6
41    0.102245  19.2
42    0.235282  19.6

They both have 43 rows, and x column is exactly the same on both.
Somehow when I merge them on x I get a df with 123 rows:

x            y no.   delta y
0   -17.7    -0.785430  y1  0.053884
1   -15.0 -3820.085000  y4  0.085000
2   -12.5     2.138833  y3  0.143237
3   -12.4     1.721205  y3  0.251180
4   -12.1     2.227343  y2  0.127343
..    ...          ...  ..       ...
118  12.1     1.642526  y3  0.143886
119  14.4  2576.435000  y4  0.171000
120  15.6     5.486901  y2  0.113099
121  19.2     1.980686  y3  0.102245
122  19.6     9.364718  y2  0.235282

My input: final = df1.merge(df2, on="x") x float64 y float64 no. object dtype: object
delta y float64 x float64 dtype: object
x float64 y float64 no. object dtype: object
delta y float64 x float64 dtype: object
x float64 y float64 no. object dtype: object
delta y float64 x float64 dtype: object
df1 = pd.DataFrame({'x': {0: -17.7, 1: -15.0, 2: -12.5, 3: -12.4, 4: -12.1, 5: -11.2, 6: -8.9, 7: -7.5, 8: -7.5, 9: -6.0, 10: -6.0, 11: -4.7, 12: -4.1, 13: -3.8, 14: -3.4, 15: -3.4, 16: -1.9, 17: -1.5, 18: -1.1, 19: -0.4, 20: -0.1, 21: 3.5, 22: 3.8, 23: 5.3, 24: 5.3, 25: 5.3, 26: 5.3, 27: 5.3, 28: 5.3, 29: 5.3, 30: 5.3, 31: 5.3, 32: 6.4, 33: 6.8, 34: 6.8, 35: 10.2, 36: 10.3, 37: 11.9, 38: 12.1, 39: 14.4, 40: 15.6, 41: 19.2, 42: 19.6}, 'y': {0: -0.7854295, 1: -3820.085, 2: 2.1388333, 3: 1.7212046, 4: 2.227343, 5: 0.04315967, 6: -0.9616607, 7: -1.9878536, 8: -0.52237016, 9: -283.27216, 10: -282.5332, 11: -0.4335017, 12: -1.1585577, 13: -0.008831219, 14: 848.92303, 15: -57.407845, 16: -9.010686, 17: -3.2473037, 18: 0.5536767, 19: 1.8351307, 20: 4.8347697, 21: -6.45842, 22: -1.5683812, 23: 0.9338831, 24: 0.9338831, 25: 97.65833, 26: 1.6500127, 27: 1.6500127, 28: 97.65833, 29: 97.65833, 30: 1.6500127, 31: 97.65833, 32: -3.655422, 33: 1.9058462, 34: 227.5592, 35: 857.7455, 36: -0.68584794, 37: 1.6785516, 38: 1.6425261, 39: 2576.435, 40: 5.4869013, 41: 1.9806856, 42: 9.364718}, 'no.': {0: 'y1', 1: 'y4', 2: 'y3', 3: 'y3', 4: 'y2', 5: 'y3', 6: 'y2', 7: 'y2', 8: 'y2', 9: 'y4', 10: 'y4', 11: 'y1', 12: 'y3', 13: 'y1', 14: 'y4', 15: 'y4', 16: 'y4', 17: 'y4', 18: 'y1', 19: 'y3', 20: 'y4', 21: 'y2', 22: 'y3', 23: 'y3', 24: 'y3', 25: 'y4', 26: 'y3', 27: 'y3', 28: 'y4', 29: 'y3', 30: 'y4', 31: 'y4', 32: 'y2', 33: 'y3', 34: 'y3', 35: 'y4', 36: 'y3', 37: 'y3', 38: 'y3', 39: 'y4', 40: 'y2', 41: 'y3', 42: 'y2'}})
df2 = pd.DataFrame({'delta y': {0: 0.05388353000000001, 1: 0.08500000000003638, 2: 0.14323679999999994, 3: 0.25117999999999996, 4: 0.12734299999999976, 5: 0.36285006000000003, 6: 0.13833930000000005, 7: 0.5121464, 8: 1.97762984, 9: 0.2721599999999853, 10: 0.4667999999999779, 11: 0.2692114, 12: 0.00890970000000002, 13: 0.314458351, 14: 906.34703, 15: 0.0161549999999977, 16: 0.06831400000000087, 17: 0.3723036999999998, 18: 0.2988478, 19: 0.006991300000000145, 20: 0.14423030000000026, 21: 0.04157999999999973, 22: 0.013554200000000183, 23: 0.17486560000000007, 24: 0.17486560000000007, 25: 0.03866999999999621, 26: 0.541264, 27: 0.541264, 28: 0.03866999999999621, 29: 96.5495813, 30: 96.0469873, 31: 0.03866999999999621, 32: 0.05542200000000008, 33: 0.1670513, 34: 225.82040510000002, 35: 0.38250000000005, 36: 0.59580486, 37: 0.10641100000000003, 38: 0.14388610000000002, 39: 0.17099999999982174, 40: 0.11309869999999922, 41: 0.10224489999999986, 42: 0.23528199999999977}, 'x': {0: -17.7, 1: -15.0, 2: -12.5, 3: -12.4, 4: -12.1, 5: -11.2, 6: -8.9, 7: -7.5, 8: -7.5, 9: -6.0, 10: -6.0, 11: -4.7, 12: -4.1, 13: -3.8, 14: -3.4, 15: -3.4, 16: -1.9, 17: -1.5, 18: -1.1, 19: -0.4, 20: -0.1, 21: 3.5, 22: 3.8, 23: 5.3, 24: 5.3, 25: 5.3, 26: 5.3, 27: 5.3, 28: 5.3, 29: 5.3, 30: 5.3, 31: 5.3, 32: 6.4, 33: 6.8, 34: 6.8, 35: 10.2, 36: 10.3, 37: 11.9, 38: 12.1, 39: 14.4, 40: 15.6, 41: 19.2, 42: 19.6}})
final = df1.merge(df2, on="x")

vuktfyat

vuktfyat1#

请尝试以下操作:df1.join(df2)
连接是列向左连接
pd.merge是按列内部连接
pd.concat是按行的外连接
pd.concat:采用可迭代参数。因此,它不能直接采用DataFrame(使用[df,df2])。DataFrame的维度应沿着轴匹配
加入和分帐合并:可以接受DataFrame参数
参考:Merge two dataframes by index

lf5gs5x2

lf5gs5x22#

尝试以下语法,我鼓励你彻底阅读Python的官方文档,链接在底部给出。我想你可能在df 1和df 2中有不同的x值,它们不是100%相同。这可能是因为小数的缘故。

import pandas as pd

left = pd.DataFrame(
   {
       "key": ["K0", "K1", "K2", "K3"],
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
     }
 )

right = pd.DataFrame(
     {
        "key": ["K0", "K1", "K2", "K3"],
         "C": ["C0", "C1", "C2", "C3"],
         "D": ["D0", "D1", "D2", "D3"],
     }
 )
 

result = pd.merge(left, right, on="key")

Result Image
Python Merge,Join, Concatenate Official Guide

kqlmhetl

kqlmhetl3#

问题是x值不是唯一的,因此合并重复行以获得所有组合。

>>> import pandas as pd
>>> df1=pd.DataFrame({"a":[1,2,3,2], "b":['a', 'b', 'c', 'd']})
>>> df2=pd.DataFrame({"a":[1,2,3,2], "c":['aa', 'bb', 'cc', 'dd']})
>>> df1.merge(df2, on='a')
   a  b   c
0  1  a  aa
1  2  b  bb
2  2  b  dd
3  2  d  bb
4  2  d  dd
5  3  c  cc

2在该列中不是唯一的,并且获得所有组合(注意b --〉dd和d --〉dd)。
在您的例子中,x列在两个 Dataframe 中是相同的,这也意味着索引没有改变,您可以将您想要的列分配给df1

df1["delta y"] = df2["delta y"]

相关问题