Transposing a pandas DataFrame based on value data type

8i9zcol2  posted on 2022-12-28 in Other

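I have pandas DataFrame A and am struggling to reshape it into the format I want; see DataFrame B. I tried pivot and melt, but I'm not sure how to make the reshaping conditional (string values go to FIELD_STR_VALUE, numeric values go to FIELD_NUM_VALUE). I'm hoping you can point me in the right direction.
A: Input DataFrame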

|FIELD_A |FIELD_B |FIELD_C |FIELD_D |
|--------|--------|--------|--------|
|123123  |8       |a       |23423   |
|123124  |7       |c       |6464    |
|123144  |99      |x       |234     |
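
For anyone who wants to reproduce the problem, input A could be built roughly as follows. This is a minimal sketch, assuming FIELD_A, FIELD_B and FIELD_D hold integers and FIELD_C holds strings; the table suggests those dtypes but does not state them.

```python
import pandas as pd

# Input A from the table above. The dtypes are an assumption:
# integer for FIELD_A/B/D, string for FIELD_C.
df = pd.DataFrame({
    "FIELD_A": [123123, 123124, 123144],
    "FIELD_B": [8, 7, 99],
    "FIELD_C": ["a", "c", "x"],
    "FIELD_D": [23423, 6464, 234],
})

print(df.dtypes)
```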

B: Desired output DataFrame

|ID |FIELD_A |FIELD_NAME |FIELD_STR_VALUE |FIELD_NUM_VALUE |
|---|--------|-----------|----------------|----------------|
|1  |123123  |B          |                |8               |
|2  |123123  |C          |a               |                |
|3  |123123  |D          |                |23423           |
|4  |123124  |B          |                |7               |
|5  |123124  |C          |c               |                |
|6  |123124  |D          |                |6464            |
|7  |123144  |B          |                |99              |
|8  |123144  |C          |x               |                |
|9  |123144  |D          |                |234             |
e0bqpujr  #1

You can use:

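    # Assumes `import pandas as pd` (plus `import numpy as np` if dic is used)
    # and that df is input A.
    # dic = {np.int64: 'NUM', object: 'STR'}
    (df.set_index('FIELD_A')
       # Label each column with both its name and its dtype.
       .pipe(lambda d: d.set_axis(
           pd.MultiIndex.from_arrays(
               [d.columns, d.dtypes],
               # or, for custom names:
               # [d.columns, d.dtypes.map(dic)],
               names=['FIELD_NAME', None]),
           axis=1))
       # Move the column names into the rows; one value column per dtype remains.
       .stack(0).add_prefix('FIELD_').add_suffix('_VALUE')
       .reset_index()
    )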
Note: if you really want STR/NUM in the column names, map the dtypes to those strings (see the comment in the code).

Output:

       FIELD_A FIELD_NAME  FIELD_int64_VALUE FIELD_object_VALUE
    0   123123    FIELD_B                8.0                NaN
    1   123123    FIELD_C                NaN                  a
    2   123123    FIELD_D            23423.0                NaN
    3   123124    FIELD_B                7.0                NaN
    4   123124    FIELD_C                NaN                  c
    5   123124    FIELD_D             6464.0                NaN
    6   123144    FIELD_B               99.0                NaN
    7   123144    FIELD_C                NaN                  x
    8   123144    FIELD_D              234.0                NaN
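
The idea behind this answer, classifying each column by its dtype and routing its values to a STR or NUM column, can also be sketched with melt instead of stack. This is not the answer's method, only an alternative under stated assumptions: the input dtypes are assumed (integers for FIELD_B and FIELD_D, strings for FIELD_C), and the names `kind`, `melted`, `is_num` and `out` are made up for illustration.

```python
import pandas as pd

# Input A; the dtypes are an assumption based on the question's table.
df = pd.DataFrame({
    "FIELD_A": [123123, 123124, 123144],
    "FIELD_B": [8, 7, 99],
    "FIELD_C": ["a", "c", "x"],
    "FIELD_D": [23423, 6464, 234],
})

# Classify every non-key column once, by its dtype.
kind = {
    col: "NUM" if pd.api.types.is_numeric_dtype(df[col]) else "STR"
    for col in df.columns.drop("FIELD_A")
}

# Long format: one row per (FIELD_A, original column) pair.
melted = df.melt(id_vars="FIELD_A", var_name="FIELD_NAME", value_name="value")

# True for rows whose value came from a numeric column.
is_num = melted["FIELD_NAME"].map(kind).eq("NUM")

out = (
    melted.assign(
        FIELD_STR_VALUE=melted["value"].where(~is_num),
        FIELD_NUM_VALUE=pd.to_numeric(melted["value"].where(is_num)),
        # Keep only the part after the "FIELD_" prefix (B, C, D).
        FIELD_NAME=melted["FIELD_NAME"].str.replace("FIELD_", "", regex=False),
    )
    .drop(columns="value")
    .sort_values(["FIELD_A", "FIELD_NAME"], ignore_index=True)
)

# Add the 1-based ID column from the desired output B.
out.insert(0, "ID", range(1, len(out) + 1))
print(out)
```

Because a missing numeric value is represented as NaN, FIELD_NUM_VALUE comes out as a float column here, just as in the output above.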
41ik7eoe  #2

You can also try this:

    (df.melt('FIELD_A')
       # Extract the part after "FIELD_" (B, C, D) as FIELD_NAME.
       .pipe(lambda d: d[['FIELD_A', 'value']].join(
           d.variable.str.extract(r'\w+_(?P<FIELD_NAME>\w+)')))
       # Route non-digit values to FIELD_STR_VALUE and digit runs to FIELD_NUM_VALUE.
       .pipe(lambda g: g[['FIELD_A', 'FIELD_NAME']].join(
           g.value.astype(str).str.extract(
               r'(?P<FIELD_STR_VALUE>\D+)|(?P<FIELD_NUM_VALUE>\d+)')))
       .sort_values('FIELD_A'))

Output:

       FIELD_A FIELD_NAME FIELD_STR_VALUE FIELD_NUM_VALUE
    0   123123          B             NaN               8
    3   123123          C               a             NaN
    6   123123          D             NaN           23423
    1   123124          B             NaN               7
    4   123124          C               c             NaN
    7   123124          D             NaN            6464
    2   123144          B             NaN              99
    5   123144          C               x             NaN
    8   123144          D             NaN             234
slmsl1lt  #3

    (df.melt(id_vars='FIELD_A', var_name='FIELD_NAME', value_name='FIELD_VALUE')
       .sort_values(by='FIELD_A')
       .reset_index(drop=True))

Output:

       FIELD_A FIELD_NAME FIELD_VALUE
    0   123123    FIELD_B           8
    1   123123    FIELD_C           a
    2   123123    FIELD_D       23423
    3   123124    FIELD_B           7
    4   123124    FIELD_C           c
    5   123124    FIELD_D        6464
    6   123144    FIELD_B          99
    7   123144    FIELD_C           x
    8   123144    FIELD_D         234

I hope this helps you get the output you want!
We can take it a step further:

    # Assumes `import numpy as np` and that df is input A.
    (df
       .melt(
           id_vars='FIELD_A',
           var_name='FIELD_NAME',
           value_name='FIELD_VALUE')
       .assign(
           # Keep only the last character of the column name (B, C, D).
           FIELD_NAME=lambda df: df.FIELD_NAME.str[-1],
           # Keep values whose Python type is int; everything else becomes NaN.
           FIELD_NUM_VALUE=lambda df: df.FIELD_VALUE.where(df.FIELD_VALUE.map(type) == int),
           # Whatever was not numeric goes to the string column.
           FIELD_STR_VALUE=lambda df: np.where(df.FIELD_NUM_VALUE.isna(), df.FIELD_VALUE, np.nan))
       .drop(columns='FIELD_VALUE')
       .sort_values(by='FIELD_A', ignore_index=True)
    )

Output:

       FIELD_A FIELD_NAME FIELD_NUM_VALUE FIELD_STR_VALUE
    0   123123          B               8             NaN
    1   123123          C             NaN               a
    2   123123          D           23423             NaN
    3   123124          B               7             NaN
    4   123124          C             NaN               c
    5   123124          D            6464             NaN
    6   123144          B              99             NaN
    7   123144          C             NaN               x
    8   123144          D             234             NaN

Another option, using pd.Series.str.extract:

    temp = df.melt(id_vars='FIELD_A', var_name='FIELD_NAME', value_name='FIELD_VALUE')
    temp = temp.assign(FIELD_NAME=lambda df: df.FIELD_NAME.str[-1])
    # Named groups become the output columns: non-digits vs. digit runs.
    regex = r"(?P<FIELD_STR_VALUE>\D+)|(?P<FIELD_NUM_VALUE>\d+)"
    extract = temp.FIELD_VALUE.astype(str).str.extract(regex)
    temp.drop(columns='FIELD_VALUE').assign(**extract)

Output:

       FIELD_A FIELD_NAME FIELD_STR_VALUE FIELD_NUM_VALUE
    0   123123          B             NaN               8
    1   123124          B             NaN               7
    2   123144          B             NaN              99
    3   123123          C               a             NaN
    4   123124          C               c             NaN
    5   123144          C               x             NaN
    6   123123          D             NaN           23423
    7   123124          D             NaN            6464
    8   123144          D             NaN             234
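
The regex above splits values character by character, so a value such as -5 or 1.5 would not be captured whole by `\d+`. If that matters for your data, one possible alternative is to let pd.to_numeric decide what counts as a number. This is only a sketch, not part of the answer above: the input dtypes are assumed, `num` and `out` are illustrative names, and strings that look numeric (for example "12") would also land in the numeric column.

```python
import pandas as pd

# Input A; the dtypes are an assumption based on the question's table.
df = pd.DataFrame({
    "FIELD_A": [123123, 123124, 123144],
    "FIELD_B": [8, 7, 99],
    "FIELD_C": ["a", "c", "x"],
    "FIELD_D": [23423, 6464, 234],
})

temp = df.melt(id_vars="FIELD_A", var_name="FIELD_NAME", value_name="FIELD_VALUE")

# Values that parse as numbers become floats; everything else becomes NaN.
num = pd.to_numeric(temp["FIELD_VALUE"], errors="coerce")

out = (
    temp.assign(
        FIELD_NAME=temp["FIELD_NAME"].str[-1],
        # Keep the original value only where it did not parse as a number.
        FIELD_STR_VALUE=temp["FIELD_VALUE"].where(num.isna()),
        FIELD_NUM_VALUE=num,
    )
    .drop(columns="FIELD_VALUE")
)
print(out)
```

Unlike the regex version, FIELD_NUM_VALUE is a float column here rather than a column of digit strings.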