我有一个大型数据集的运行统计数据多年存储在excel我想处理在pythonPandas和目前的输出与分组的距离范围作为行和年作为列,每个单元格的表将包含计数,平均速度,和最小速度。我已经能够创建输出的排序得到那里,但输出格式不好
import pandas as pd
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 100)
# set max width to 100
pd.set_option('display.width', 100)
# Replace "file_path.xlsx" with the actual path to your Excel file
df = pd.read_excel(r"C:\xxxxxx\OneDrive\Documents\garmin running data1.xlsx",
sheet_name='jun 15 2018 to apr 19 2023')
df = df[df['Activity Type'] == 'Running']
# Define the bins for grouping distances and labels for each bin
bins = [0, .5, 2.4, 3.3, 4.6, 5.4, 6.5, 13, 19.5, 20.2, 30]
labels = ['false start', '2 miles', '5k', '4 miles', '5 miles', '10K', 'Half', 'Long Train', '20', 'Marathon']
# Extract the year from the date field
df['Year'] = df['Date'].dt.year
# Create a new column that assigns each distance to a bin with named labels
df['Distances'] = pd.cut(df['Distance'], bins=bins, labels=labels)
df['Avg Pace'] = pd.to_numeric(df['Avg Pace'], errors='coerce')
df = df.dropna(subset=['Avg Pace'])
# Create a pivot table that shows the average of average_paces for each distance bin
pivot_table = pd.pivot_table(df, values='Avg Pace', index=['Distances'], columns=['Year'], aggfunc=('mean', 'count', 'min'))
#pivot_table['Avg Pace'] = pd.to_datetime(pivot_table['Avg Pace'] / 60 * 86400, unit='s').dt.strftime('%M:%S.%f')
# Divide all the values in the pivot table by 60 and multiply by 86400
pivot_table = pivot_table / 60 * 86400
# Format the values as mm:ss.0
pivot_table = pivot_table.applymap(lambda x: pd.to_datetime(x, unit='s').strftime('%M:%S.%S') if not pd.isna(x) else '')
看上面-我想要的是这样的
Year 2018 2019 2020 2021
Distances
mean 8:59.59 7:45.45 8:15.22 8:34.36
5k min 8:05.15 7:13.45 8:05.22 8:04.36
count 50 75 85 35
mean 8:59.59 7:45.45 8:15.22 8:34.36
10k min 8:05.15 7:13.45 8:05.22 8:04.36
count 50 75 85 35
我得到的是:
count mean \
Year 2018 2019 2020 2021 2022 2023 2018 2019
Distances
false start 00:00.00 00:00.00 00:00.00 00:00.00 00:00.00 00:00.00
2 miles 00:00.00 24:00.00 12:00.00 23:59.59 36:00.00 24:00.00 09:14.14 19:00.00
5k 24:00.00 48:00.00 24:00.00 24:00.00 35:59.59 48:00.00 10:11.11 09:18.18
4 miles 12:00.00 36:00.00 36:00.00 00:00.00 00:00.00 12:00.00 09:12.12 08:45.45
5 miles 11:59.59 36:00.00 00:00.00 48:00.00 24:00.00 48:00.00 09:11.11 08:49.49
10K 47:59.59 24:00.00 48:00.00 24:00.00 12:00.00 36:00.00 09:02.02 08:37.37
Half 00:00.00 48:00.00 48:00.00 24:00.00 48:00.00 48:00.00 09:13.13 08:34.34
Long Train 24:00.00 36:00.00 24:00.00 00:00.00 24:00.00 00:00.00 09:08.08 08:21.21
20 00:00.00 12:00.00 24:00.00 00:00.00 00:00.00 00:00.00 09:00.00
Marathon 00:00.00 24:00.00 24:00.00 00:00.00 48:00.00 00:00.00 09:34.34
min \
Year 2020 2021 2022 2023 2018 2019 2020 2021
Distances
false start
2 miles 13:04.04 14:27.27 09:30.30 08:58.58 08:20.20 08:48.48 09:34.34 08:49.49
5k 08:56.56 09:03.03 08:41.41 08:26.26 09:34.34 07:27.27 08:16.16 08:13.13
4 miles 09:00.00 09:06.06 08:36.36 08:30.30 08:28.28 07:35.35 08:05.05 08:18.18
5 miles 08:48.48 08:59.59 08:36.36 08:27.27 08:26.26 07:43.43 07:45.45 08:17.17
10K 08:54.54 09:10.10 08:35.35 08:34.34 07:31.31 07:16.16 08:17.17 09:02.02
Half 08:29.29 08:50.50 08:54.54 08:29.29 09:06.06 07:31.31 08:23.23 08:50.50
Long Train 07:49.49 08:37.37 09:08.08 07:33.33 07:49.49
20 08:34.34 08:44.44 08:34.34
Marathon 08:30.30 09:40.40 09:34.34 08:30.30
Year 2022 2023
Distances
false start
2 miles 09:19.19 08:58.58
5k 07:35.35 07:37.37
4 miles 08:04.04 08:09.09
5 miles 07:39.39 08:09.09
10K 07:43.43 08:28.28
Half 08:33.33 08:23.23
Long Train 08:03.03
20
Marathon 08:53.53
2条答案
按热度按时间rta7y2nd1#
您可以通过以下方式执行此操作:
也就是说
sr4lhrrt2#
我是这样做的:
最后我创建了3个透视表,并通过一些重新格式化将它们合并在一起