The concat function in pandas 2.1.0

ljo96ir5 posted on 2023-09-29 in Other

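I get an error when concatenating two DataFrames row-wise. In the previous version I used pd.concat([df1, df2], axis=0), but this no longer works in pandas 2.1.0. Does anyone know how to fix this error?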

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[47], line 2
      1 print(real_last.shape, real_exp.shape) #(59202, 34) (4583, 34)
----> 2 real_out = pd.concat([real_exp, real_last], axis=0)
      3 print(real_out.shape)

File c:\Users\sarud\anaconda3\envs\ETLupdate\Lib\site-packages\pandas\core\reshape\concat.py:393, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    378     copy = False
    380 op = _Concatenator(
    381     objs,
    382     axis=axis,
   (...)
    390     sort=sort,
    391 )
--> 393 return op.get_result()

File c:\Users\sarud\anaconda3\envs\ETLupdate\Lib\site-packages\pandas\core\reshape\concat.py:680, in _Concatenator.get_result(self)
    676             indexers[ax] = obj_labels.get_indexer(new_labels)
    678     mgrs_indexers.append((obj._mgr, indexers))
...
--> 230 return super()._concat_same_type(to_concat, axis=axis)

File arrays.pyx:190, in pandas._libs.arrays.NDArrayBacked._concat_same_type()

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 4583 and the array at index 1 has size 59202

My package versions: print(sys.version, pd.__version__, np.__version__, sep='\n')

3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]
2.1.0
1.26.0

The DataFrames have the same structure; see these samples:

print(real_last.sample(2).T.to_markdown())

| | 43597 | 9338 |
| --|--|--|
| Orden | 006710000 | 006781111 |
| Operacion | 0010 | 0020 |
| Operacion.text | XXXXXX | YYYYYY |
| Cl.orden | NP | NP |
| Cl.actividad | 030 | 035 |
| Ubic.tecnica | XXXX-XX-LAS-DES-BAP19 | XXXX-XX-S13-MBA |
| Status.sistema |CTEC通知IMOP KKMP PREC| LIB. IMOP KKMP PREC|
| Status.sistema.op |注意事项CTEC NLIQ| LIB. NLIQ|
| Stat.Usuario | TRAT | TRAT |
| Fe.Entrada | 2023-06-25 00:00:00 | 2023-07-23 00:00:00 |
| Fe.Lib | 2023-07-06 00:00:00 | 2023-07-23 00:00:00 |
| Fe.Ini.real.ot | 2023-07-01 00:00:00 | NaT |
| Fe.Ini.real.op | 2023-07-01 00:00:00 | NaT |
| Fe.Ini.temp | 2023-07-06 00:00:00 | 2023-07-23 00:00:00 |
| Aviso | 00120100 | 11194911 |
| Modif.por | XXXXX005 | XXXXX011 |
| Fe.Modif | 2023-07-06 00:00:00 | 2023-07-23 00:00:00 |
| Autor | XXXXX003 | XXXXX021 |
| Grupo.planif | XXT | XX1 |
| G.hojas.ruta | nan | nan |
| CGH | nan | nan |
| Plan.mant.prev | nan | nan |
| Pos.PM | nan | nan |
| Pto.tbjo.resp | XXXXXXXX | XXXXXXXX |
| Pto.tbjo.op | XXXXXXXX | XXXXXXXX |
| Cantidad | 1 | 0 |
| Duracion.normal | 1.0 | 0.0 |
| Trabajo | 1.0 | 0.0 |
| Trabajo.real | 1.0 | 0.0 |
| Costo.tot.real | 147.03 | 0.0 |
| Sum.costo.plan | 147.03 | 479.96 |
| Tot.plan.general | 147.03 | 479.96 |
| Total.real.general | 147.03 | 0.0 |
| Costo.dist | 0.0 | 0.0 |

print(real_exp.sample(2).T.to_markdown())

| | 926 | 990 |
| --|--|--|
| Orden | 222212222 | 333323333 |
| Operacion | 0120 | 0040 |
| Operacion.text | XXXXXXXXXX | YYYYYY |
| Cl.orden | PL | PL |
| Cl.actividad | 010 | 010 |
| Ubic.tecnica | XXXX-XX-S07-ALI-CTR 7B | XXXX-XX-SCA-AL2-AOG1C |
| Status.sistema |CTEC通知进口FMAT IMOP MOVM NLIQ PREC*| LIB. NOTI IMPR DOCU IMOP KKMP NLIQ PREC*|
| Status.sistema.op |通知CTEC进口NLIQ|注意事项:进口LIB. NLIQ认证|
| Stat.Usuario | TBTR | TRAT |
| Fe.Entrada | 2019-08-02 00:00:00 | 2019-08-02 00:00:00 |
| Fe.Lib | 2019-08-23 00:00:00 | 2023-08-21 00:00:00 |
| Fe.Ini.real.ot | 2019-09-04 00:00:00 | 2019-09-05 00:00:00 |
| Fe.Ini.real.op | 2019-09-05 00:00:00 | 2023-09-06 00:00:00 |
| Fe.Ini.temp | 2023-09-07 00:00:00 | 2019-09-04 00:00:00 |
| Aviso | 33333333 | 44444444 |
| Modif.por | XXXXX009 | XXXXX003 |
| Fe.Modif | 2023-09-10 00:00:00 | 2023-09-07 00:00:00 |
| Autor | XXXXXXXXXXXX | XXXXXXXXXXXX |
| Grupo.planif | XX | XXC |
| G.hojas.ruta | 1886 | 76326 |
| CGH | 3 | 3 |
| Plan.mant.prev | 8763 | 191111 |
| Pos.PM | 95475 | 357140 |
| Pto.tbjo.resp | XXXXXXXX | XXXXXXXX |
| Pto.tbjo.op | XXXXXXXX | XXXXXXXX |
| Cantidad | 4 | 2 |
| Duracion.normal | 4.0 | 1.0 |
| Trabajo | 16.0 | 2.0 |
| Trabajo.real | 16.0 | 0.5 |
| Costo.tot.real | 1627.5 | 0.04 |
| Sum.costo.plan | 2336.45 | 0.09 |
| Tot.plan.general | 2336.45 | 0.09 |
| Total.real.general | 1627.5 | 0.04 |
| Costo.dist | nan | nan |

qxgroojn1#

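I can't trigger the ValueError with the given example, but since your DataFrames hold datetime values, it may be caused by a dtype and/or resolution mismatch, as in this Q/A. You can also look at GH55067, which discusses a similar problem.
Try this: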

real_out = pd.concat([real_exp, real_last.astype(real_exp.dtypes)], axis=0)

Output:

print(real_out)

           Orden Operacion  ... Total.real.general Costo.dist
926    222212222      0120  ...             1627.5        NaN
990    333323333      0040  ...               0.04        NaN
43597  006710000      0010  ...             147.03        0.0
9338   006781111      0020  ...                0.0        0.0

[4 rows x 34 columns]
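
If the cause really is a dtype or datetime-resolution mismatch, as suggested above, it may help to list which columns differ before casting. The following is only a minimal sketch, assuming pandas 2.x: `dtype_mismatches` is a hypothetical helper (not part of pandas), and `exp` and `last` are small made-up frames standing in for `real_exp` and `real_last`, not the asker's data.

```python
import pandas as pd


def dtype_mismatches(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """List the shared columns whose dtypes differ between two frames."""
    shared = left.columns.intersection(right.columns)
    report = pd.DataFrame(
        {
            "left": left.dtypes[shared].astype(str),
            "right": right.dtypes[shared].astype(str),
        }
    )
    return report[report["left"] != report["right"]]


# Made-up stand-ins: same columns, but "Fe.Lib" uses a different
# datetime resolution in each frame (seconds vs. nanoseconds).
exp = pd.DataFrame(
    {
        "Orden": ["222212222", "333323333"],
        "Fe.Lib": pd.to_datetime(["2019-08-23", "2023-08-21"]).as_unit("s"),
        "Trabajo": [16.0, 2.0],
    }
)
last = pd.DataFrame(
    {
        "Orden": ["006710000", "006781111"],
        "Fe.Lib": pd.to_datetime(["2023-07-06", "2023-07-23"]).as_unit("ns"),
        "Trabajo": [1.0, 0.0],
    }
)

# Show which columns disagree before attempting the concat.
print(dtype_mismatches(exp, last))

# Cast one frame to the other's dtypes, then concatenate row-wise.
aligned = last.astype(exp.dtypes.to_dict())
real_out = pd.concat([exp, aligned], axis=0)
print(real_out.dtypes)
```

Casting with `astype` mirrors the one-liner in the answer; the helper only adds a way to see which columns would be changed by that cast.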
