pandas 在交叉表中使用堆栈或融合重塑数据

nafvub8i  于 2024-01-04  发布在  其他
关注(0)|答案(2)|浏览(96)

我正在使用一个包含列UPC、date_expected和quantity picked的数据框架。(每个订单一行,一天内包含相同UPC的多个订单),但也不列出每个UPC的每个日期,只列出挑库数量大于0的日期。目标:组织一个按UPC显示quantity_picked,然后按date_expected显示quantity_picked的结构,列出从2019年5月14日到当前的每个日期,即使quantity_picked = 0(显示quantity_picked = 0的行未包含在原始数据源中)。

  1. MFC_order_daily['date_expected'] = pd.to_datetime(MFC_order_daily['date_expected'], format='%Y-%m-%d')
  2. print('Daily Pick Data:')
  3. print(MFC_order_daily)

字符串
数据以这种格式出现:

  1. Daily Pick Data:
  2. UPC quantity_picked date_expected
  3. 0 0001111041660 1.0 2019-05-14
  4. 1 0001111045045 1.0 2019-05-14
  5. ... ... ...
  6. 39694 0004470036000 6.0 2019-06-24
  7. 39695 0007225001116 1.0 2019-06-24
  8. [39696 rows x 3 columns]


尝试使用groupby和reset_index进行组织,如下所示,但收到以下缺少日期的序列号,其中quantity_picked=0:

  1. tipd = MFC_order_daily.groupby(['UPC', 'date_expected']).sum().reset_index()
  2. tipd = tipd[['UPC','date_expected','quantity_picked']]
  3. print(tipd)
  1. UPC date_expected quantity_picked
  2. 0 0000000002554 2019-05-14 4.0
  3. 1 0001111041660 2019-05-14 2.0
  4. 2 0001111041660 2019-05-16 2.0
  5. 3 0004470036000 2019-05-14 3.0
  6. 4 0004470036000 2019-05-16 1.0

的字符串
然后尝试创建一个交叉表来获取零值,并使用stack或melt进行整形。成功创建并生成交叉表:

  1. tipd2 = pd.crosstab([MFC_order_daily["UPC"]], MFC_order_daily["date_expected"])
  2. print(tipd2)
  1. date_expected 2019-05-14 2019-05-15 ... 2019-06-23 2019-06-24
  2. UPC ...
  3. 0000000002554 0 0 ... 0 0
  4. 0000000003082 0 1 ... 2 3
  5. 0000000003107 1 0 ... 2 2
  6. ... ... ... ... ...
  7. 0360600051715 0 0 ... 0 0
  8. 0501072452748 0 0 ... 0 0
  9. 0880100551750 0 0 ... 0 0
  10. [8302 rows x 42 columns]

的字符串
尝试堆叠:

  1. tipd2.stack('date_expected')
  2. print('Stacked tipd2:')
  3. print(tipd2)

结果数据与上图交叉表相同,没有变化,没有错误。

尝试熔化:

  1. tipd2.melt(id_vars=['UPC', 'date_expected'])


产生的错误:

  1. KeyError: "The following 'id_vars' are not present in the DataFrame: ['UPC', 'date_expected']"


期望输出:

  1. UPC date_expected quantity_picked
  2. 0 0000000002554 2019-05-14 4.0
  3. 1 0000000002554 2019-05-15 0.0
  4. 2 0000000002554 2019-05-16 0.0
  5. 3 0001111041660 2019-05-14 2.0
  6. 4 0001111041660 2019-05-15 0.0
  7. 5 0001111041660 2019-05-16 2.0
  8. 6 0004470036000 2019-05-14 3.0
  9. 7 0004470036000 2019-05-15 0.0
  10. 8 0004470036000 2019-05-16 1.0


从5/14/19开始循环每个UPC的每个日期。

o2rvlv0m

o2rvlv0m1#

IIUC,您可以使用pivotstack

  1. # this is after aggregation by `groupby().sum()`
  2. df = pd.DataFrame({'UPC': ['0000000002554', '0001111041660', '0001111041660',
  3. '0004470036000', '0004470036000'],
  4. 'date_expected': ['2019-05-14',
  5. '2019-05-14',
  6. '2019-05-16',
  7. '2019-05-14',
  8. '2019-05-16'],
  9. 'quantity_picked': [4.0, 2.0, 2.0, 3.0, 1.0]})
  10. (df.pivot_table(index='UPC',
  11. columns='date_expected',
  12. values='quantity_picked',
  13. fill_value=0)
  14. .stack()
  15. .reset_index()
  16. )

字符串
输出量:

  1. UPC date_expected 0
  2. 0 0000000002554 2019-05-14 4
  3. 1 0000000002554 2019-05-16 0
  4. 2 0001111041660 2019-05-14 2
  5. 3 0001111041660 2019-05-16 2
  6. 4 0004470036000 2019-05-14 3
  7. 5 0004470036000 2019-05-16 1


如果你也想填写日期,那么你可能想看看reindex

展开查看全部
mctunoxg

mctunoxg2#

pd.melt中的ignore_index=False参数适用于这种情况
演示代码:

  1. import pandas as pd
  2. import string
  3. import random
  4. df_letter = pd.DataFrame({'Upper' : [random.choice(string.ascii_letters[26:30]) for _ in range(100)],
  5. 'Lower' : [random.choice(string.ascii_letters[0:5]) for _ in range(100)]} )

字符串
DF_字母是

  1. Upper Lower
  2. 0 A a
  3. 1 B b
  4. 2 D b
  5. 3 C c
  6. 4 D e


然后计算交叉表

  1. df_cross = pd.crosstab(df_letter.Upper, df_letter.Lower)


DF_Cross:

  1. Lower a b c d e
  2. Upper
  3. A 4 5 9 3 5
  4. B 7 7 5 6 5
  5. C 6 1 7 3 6
  6. D 7 4 3 3 4


然后你可以用ignore_index=False参数来调用pd.melt函数来重新塑造这个框架

  1. df_cross.melt(ignore_index=False)


输出头:

  1. Lower value
  2. Upper
  3. A a 4
  4. B a 7
  5. C a 6
  6. D a 7
  7. A b 5


也可以先添加索引列,然后将id_vars参数传递给pd.melt

  1. df_cross['Upper'] = df_cross.index
  2. df_cross.melt(id_vars='Upper')


输出负责人:

  1. Upper Lower value
  2. 0 A a 4
  3. 1 B a 7
  4. 2 C a 6
  5. 3 D a 7
  6. 4 A b 5
  7. 5 B b 7

展开查看全部

相关问题