pandas 如何从多个数据框列绘制甘特

5q4ezhmt  于 2023-06-20  发布在  其他
关注(0)|答案(2)|浏览(115)

我有一个CSV文件Master File,其中包含了我们仓库中所有产品的详细信息以及它们各自的时间细节(Time_In & Time_Out)。我已经使用pandas将CSV文件读入dataframe。
有人能告诉我,我如何用python将所有产品(Product_A1,...,Product_A7)的详细信息与它们各自的Time_In和Time_out详细信息绘制在一个绘图中。
我需要的是这样的东西。

master_df.csv

Date_A1,Product_A1,TIme_In_A1,TIme_out_A1,Date_A2,Product_A2,TIme_In_A2,TIme_out_A2,Date_A3,Product_A3,TIme_In_A3,TIme_out_A3,Date_A4,Product_A4,TIme_In_A4,TIme_out_A4,Date_A5,Product_A5,TIme_In_A5,TIme_out_A5,Date_A6,Product_A6,TIme_In_A6,TIme_out_A6,Date_A7,Product_A7,TIme_In_A7,TIme_out_A7
07-06-2023,A1,00:00:00,00:45:00,07-06-2023,A2,00:00:00,01:03:00,07-06-2023,A3,00:00:00,00:30:00,07-06-2023,A4,00:00:00,00:15:00,07-06-2023,A5,00:00:00,00:33:00,07-06-2023,A6,00:00:00,00:48:00,07-06-2023,A7,04:00:00,05:03:00
07-06-2023,A1,00:48:00,01:00:00,07-06-2023,A2,01:06:00,01:18:00,07-06-2023,A3,00:33:00,03:33:00,07-06-2023,A4,00:18:00,02:15:00,07-06-2023,A5,00:36:00,02:18:00,07-06-2023,A6,00:51:00,01:06:00,07-06-2023,A7,05:06:00,06:21:00
07-06-2023,A1,01:03:00,01:15:00,07-06-2023,A2,01:21:00,02:45:00,07-06-2023,A3,03:36:00,04:30:00,07-06-2023,A4,02:18:00,03:18:00,07-06-2023,A5,02:21:00,03:36:00,07-06-2023,A6,01:09:00,01:21:00,07-06-2023,A7,06:24:00,07:03:00
07-06-2023,A1,01:18:00,01:30:00,07-06-2023,A2,02:48:00,03:00:00,07-06-2023,A3,04:33:00,05:30:00,07-06-2023,A4,03:21:00,04:18:00,07-06-2023,A5,03:39:00,05:48:00,07-06-2023,A6,01:24:00,01:33:00,07-06-2023,A7,07:06:00,09:09:00
07-06-2023,A1,01:33:00,01:45:00,07-06-2023,A2,03:03:00,03:15:00,07-06-2023,A3,05:33:00,06:15:00,07-06-2023,A4,04:21:00,04:33:00,07-06-2023,A5,05:51:00,06:03:00,07-06-2023,A6,01:36:00,02:48:00,07-06-2023,A7,09:12:00,10:00:00
07-06-2023,A1,01:48:00,03:30:00,07-06-2023,A2,03:18:00,03:48:00,07-06-2023,A3,06:18:00,07:18:00,07-06-2023,A4,04:36:00,04:45:00,07-06-2023,A5,06:06:00,06:48:00,07-06-2023,A6,02:51:00,03:03:00,07-06-2023,A7,10:03:00,12:30:00
07-06-2023,A1,03:33:00,03:45:00,07-06-2023,A2,03:51:00,04:00:00,07-06-2023,A3,07:21:00,08:45:00,07-06-2023,A4,04:48:00,05:33:00,07-06-2023,A5,06:51:00,07:00:00,07-06-2023,A6,03:06:00,03:21:00,07-06-2023,A7,12:33:00,12:48:00
07-06-2023,A1,03:48:00,04:00:00,07-06-2023,A2,04:03:00,04:15:00,07-06-2023,A3,08:48:00,09:03:00,07-06-2023,A4,05:36:00,05:45:00,07-06-2023,A5,07:03:00,07:30:00,07-06-2023,A6,03:24:00,05:00:00,07-06-2023,A7,12:51:00,14:00:00
07-06-2023,A1,08:00:00,09:00:00,07-06-2023,A2,04:18:00,05:15:00,07-06-2023,A3,09:06:00,11:45:00,07-06-2023,A4,05:48:00,06:18:00,07-06-2023,A5,07:33:00,08:51:00,07-06-2023,A6,05:03:00,07:21:00,07-06-2023,A7,14:03:00,14:48:00
07-06-2023,A1,09:03:00,10:15:00,07-06-2023,A2,05:18:00,06:00:00,07-06-2023,A3,11:48:00,12:00:00,07-06-2023,A4,06:21:00,06:45:00,07-06-2023,A5,08:54:00,09:06:00,07-06-2023,A6,07:24:00,07:33:00,07-06-2023,A7,14:51:00,16:18:00
07-06-2023,A1,10:18:00,10:45:00,07-06-2023,A2,06:03:00,06:30:00,07-06-2023,A3,16:00:00,16:15:00,07-06-2023,A4,06:48:00,08:48:00,07-06-2023,A5,09:09:00,09:30:00,07-06-2023,A6,07:36:00,08:30:00,07-06-2023,A7,16:21:00,17:03:00
07-06-2023,A1,10:48:00,11:15:00,07-06-2023,A2,06:33:00,07:15:00,07-06-2023,A3,16:18:00,16:30:00,07-06-2023,A4,08:51:00,10:18:00,07-06-2023,A5,09:33:00,11:33:00,07-06-2023,A6,08:33:00,10:21:00,07-06-2023,A7,17:06:00,19:15:00
07-06-2023,A1,11:18:00,11:30:00,07-06-2023,A2,07:18:00,08:00:00,07-06-2023,A3,16:33:00,17:45:00,07-06-2023,A4,10:21:00,10:30:00,07-06-2023,A5,11:36:00,12:03:00,07-06-2023,A6,10:24:00,13:21:00,07-06-2023,A7,19:18:00,19:33:00
07-06-2023,A1,11:33:00,14:15:00,07-06-2023,A2,12:00:00,13:00:00,07-06-2023,A3,17:48:00,18:45:00,07-06-2023,A4,10:33:00,11:18:00,07-06-2023,A5,12:06:00,12:15:00,07-06-2023,A6,13:24:00,14:39:00,07-06-2023,A7,19:36:00,21:48:00
07-06-2023,A1,14:18:00,14:30:00,07-06-2023,A2,13:03:00,13:30:00,07-06-2023,A3,18:48:00,19:00:00,07-06-2023,A4,11:21:00,11:48:00,07-06-2023,A5,12:18:00,12:45:00,07-06-2023,A6,14:42:00,14:45:00,07-06-2023,A7,21:51:00,22:03:00
07-06-2023,A1,14:33:00,15:15:00,07-06-2023,A2,13:33:00,13:45:00,07-06-2023,A3,19:03:00,22:15:00,07-06-2023,A4,11:51:00,12:00:00,07-06-2023,A5,12:48:00,13:18:00,07-06-2023,A6,14:48:00,16:00:00,07-06-2023,A7,22:06:00,22:33:00
07-06-2023,A1,15:18:00,16:45:00,07-06-2023,A2,13:48:00,16:48:00,07-06-2023,A3,22:30:00,01:00:00,07-06-2023,A4,12:03:00,13:15:00,07-06-2023,A5,13:21:00,13:36:00,07-06-2023,A6,16:03:00,17:00:00,07-06-2023,A7,22:36:00,23:03:00
07-06-2023,A1,16:48:00,17:30:00,07-06-2023,A2,16:51:00,19:45:00,,,,,07-06-2023,A4,13:18:00,13:33:00,07-06-2023,A5,13:39:00,14:36:00,07-06-2023,A6,17:03:00,17:15:00,07-06-2023,A7,23:06:00,23:15:00
07-06-2023,A1,17:33:00,18:30:00,07-06-2023,A2,19:48:00,20:00:00,,,,,07-06-2023,A4,13:36:00,14:33:00,07-06-2023,A5,14:39:00,16:51:00,07-06-2023,A6,17:18:00,21:33:00,07-06-2023,A7,23:18:00,01:57:00
07-06-2023,A1,18:33:00,20:30:00,07-06-2023,A2,20:03:00,20:15:00,,,,,07-06-2023,A4,14:36:00,15:00:00,07-06-2023,A5,16:54:00,17:33:00,07-06-2023,A6,21:36:00,23:33:00,,,,
07-06-2023,A1,20:33:00,21:30:00,07-06-2023,A2,20:18:00,20:33:00,,,,,07-06-2023,A4,15:03:00,16:00:00,07-06-2023,A5,17:36:00,19:30:00,07-06-2023,A6,23:36:00,00:00:00,,,,
07-06-2023,A1,21:33:00,21:45:00,07-06-2023,A2,20:36:00,20:45:00,,,,,07-06-2023,A4,20:00:00,20:48:00,07-06-2023,A5,19:33:00,19:48:00,,,,,,,,
07-06-2023,A1,21:48:00,22:15:00,07-06-2023,A2,20:48:00,21:15:00,,,,,07-06-2023,A4,20:51:00,22:00:00,07-06-2023,A5,19:51:00,20:00:00,,,,,,,,
07-06-2023,A1,22:18:00,22:30:00,07-06-2023,A2,21:18:00,22:18:00,,,,,07-06-2023,A4,22:03:00,00:48:00,,,,,,,,,,,,
07-06-2023,A1,22:33:00,23:45:00,07-06-2023,A2,22:21:00,22:45:00,,,,,,,,,,,,,,,,,,,,
07-06-2023,A1,23:48:00,00:15:00,07-06-2023,A2,22:48:00,23:00:00,,,,,,,,,,,,,,,,,,,,
,,,,07-06-2023,A2,23:03:00,23:30:00,,,,,,,,,,,,,,,,,,,,
,,,,07-06-2023,A2,23:33:00,23:48:00,,,,,,,,,,,,,,,,,,,,
,,,,07-06-2023,A2,23:51:00,01:48:00,,,,,,,,,,,,,,,,,,,,
sh7euo9m

sh7euo9m1#

  • 此选项使用.barh绘制长 Dataframe ,如此answer所示。
  • 如果检测到'TIme_out'为第二天,则基于时间分量小于TIme_In',将1天添加到日期,因此可视化在每个'Product'的正确日期结束。
      • python 3.11.3pandas 2.0.2matplotlib 3.7.1numpy 1.24.3中测试**
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

# load the data
df = pd.read_csv('d:/data/2023-06-07_data_so.csv')

# convert the columns to datetime
for date, time_in, time_out in zip(df.columns[0::4], df.columns[2::4], df.columns[3::4]):
    df[time_in] = pd.to_datetime(df[date] + ' ' + df[time_in])
    df[time_out] = pd.to_datetime(df[date] + ' ' + df[time_out])    
    
    # if time out is less than time in, add 1 day to time out
    df[time_out] = np.where(df[time_out].lt(df[time_in]), df[time_out].add(pd.Timedelta(1, unit='D')), df[time_out])

# add an id column for use in wide_to_long
df['id'] = df.index

# convert the dataframe to long format
dfl = pd.wide_to_long(df.filter(regex='id|TIme'), stubnames=['TIme_In', 'TIme_out'], j='Product', sep='_', i='id', suffix='.+').reset_index(level=1).dropna()

# calculate the time difference
dfl['Width'] = dfl.TIme_out.sub(dfl.TIme_In)

# map Product to a color
cm = dict(zip(dfl.Product.unique(), ['#ff0000', '#92d050', '#00b0f0', '#ffff00', '#c55a11', '#7030a0', '#00b050']))
dfl['color'] = dfl.Product.map(cm)

# create fig and ax
fig, ax = plt.subplots(figsize=(15, 7), dpi=200, tight_layout=True)

# plot 
ax.barh(y='Product', width='Width', left='TIme_In', color='color', data=dfl, ec='k')

# set the x-axis limits
ax.set_xlim(dfl.TIme_In.min(), dfl.TIme_out.max())

# format the xticks and the interval
ax.xaxis.set_major_locator(mdates.MinuteLocator(byminute=range(0, 60, 30)))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
fig.autofmt_xdate(rotation=90, ha="center")

dfl.head()

Product             TIme_In            TIme_out           Width    color
id                                                                         
0       A1 2023-07-06 00:00:00 2023-07-06 00:45:00 0 days 00:45:00  #ff0000
1       A1 2023-07-06 00:48:00 2023-07-06 01:00:00 0 days 00:12:00  #ff0000
2       A1 2023-07-06 01:03:00 2023-07-06 01:15:00 0 days 00:12:00  #ff0000
3       A1 2023-07-06 01:18:00 2023-07-06 01:30:00 0 days 00:12:00  #ff0000
4       A1 2023-07-06 01:33:00 2023-07-06 01:45:00 0 days 00:12:00  #ff0000
qoefvg9y

qoefvg9y2#

这里有一个关于broken_barh的建议,但你需要先重塑你的数据集:

import pandas as pd

tmp = pd.read_csv("master_df.csv")

df = (
    tmp.filter(regex="Date|TIme")
        .pipe(lambda x: x.set_axis(
            x.columns.str.rsplit("_", n=1, expand=True), axis=1))
        .stack().droplevel(0).reset_index(names="Product")
        .assign(DateTime_In = lambda x:
                pd.to_datetime(x["Date"].str.cat(x.pop("TIme_In"), sep=" ")),
                DateTime_Out= lambda x:
                pd.to_datetime(x.pop("Date").str.cat(x.pop("TIme_out"), sep=" ")))
)

# to fix certain out dates
df["DateTime_Out"] = (
    df["DateTime_Out"].add(pd.Timedelta(days=1))
        .where(
            (df["DateTime_Out"].sub(df["DateTime_In"])).dt.days.eq(-1),
            df["DateTime_Out"]
        )
)

然后你就可以做GANTT chart了:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

fig, ax = plt.subplots()

groups, products = df.groupby("Product"), df["Product"].unique()

colors = {
    "A1": "#ff0000", "A2": "#92d050", "A3": "#00b0f0",
    "A4": "#ffff00", "A5": "#c55a11", "A6": "#7030a0",
    "A7": "#00b050"
}

height = 0.5

for y, (product, group) in enumerate(groups):
    durations = group["DateTime_Out"] - group["DateTime_In"]
    ax.broken_barh(list(zip(group["DateTime_In"], durations)), (y - height/2, height),
                   facecolors=colors.get(product),
                   edgecolor="blue", linewidth=0.5)

ax.set_ylim(-height, len(products) - 1 + height)
ax.set_xlim(df["DateTime_In"].min(), df["DateTime_Out"].max().ceil("H"))
ax.set_yticks(range(len(products)))
ax.set_yticklabels(sorted(products), fontsize=14)

ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=30))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
fig.autofmt_xdate(rotation=90, ha="center")
fig.set_size_inches(15, 7)

ax.set_xlabel("Time", fontweight="bold", fontsize=14)
ax.set_ylabel("Product", fontweight="bold", fontsize=14)

ax.grid(axis="y", color="lightblue")

plt.show();

输出:

相关问题