How to use pandas value_counts to count the occurrences of events (column a), grouped by year (column b)

cu6pst1q  posted on 2021-09-08  in Java

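I have preprocessed a df containing historical information on US emergencies and disasters. It now contains [location, disaster type, start date, end date, disaster length, year] for the years 1960-2017.
Now I want to create 2 new dfs:
1. the number of disasters that occurred each year,
2. the number of disasters of each type that occurred each year.
Below is my current attempt at counting the disasters per year and creating a new df from the result, but I'm not sure how to make it count specifically the number of disasters in each year.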

    # Number of each Disaster each year
    df_yearly_dcount=df_time.groupby(df_time['Start_year']).count()

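As for the second df, I'm not sure how to get the count of each disaster type per year, since I need to work out the first one before I can move on and split the counts further.
Here is the full code: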

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns
    from scipy.stats import zscore
    # Import Dataset
    df = pd.read_csv('database.csv')
    df_time = (df[['County','Disaster Type','Start Date', 'End Date']][0: :])
    # Preprocessing
    # Number of NaN values
    df_nan = df[['County','Disaster Type','Start Date', 'End Date']].isna().sum()
    # NaN values as a percentage of total
    df_nan_number = [(df_nan.sum(axis=0)), str((((539/45330)*100))) +'%']
    # Remove NaN values
    df_time.dropna(subset = ["County", 'End Date'], inplace=True)
    # Set Date Format
    df_time['Start_Date_A'] = pd.to_datetime(df['Start Date'], format='%m/%d/%Y')
    df_time['End_Date_A'] = pd.to_datetime(df['End Date'], format='%m/%d/%Y')
    # Create new column == Disaster Length
    df_time['Disaster_Length'] = (df_time.Start_Date_A - df_time.End_Date_A).dt.days
    # Create new column == start year
    df_time['Start_year'] = df_time['Start_Date_A'].dt.year
    # Dropped Old Date Formats from df
    df_time = df_time.drop(columns=['Start Date', 'End Date'], axis=1)
    # Replace 0 day values with 1 to indicate a Disaster length of 1 Day
    df_time['Disaster_Length'] = df_time['Disaster_Length'].replace({0:1})
    # Replace all values with absolute values so all days are represented as positive numeric values
    df_time['Disaster_Length'] = df_time['Disaster_Length'].abs()
    # Locating man-made and non 'natural' disasters, sorting Disaster types, and analyzing value counts
    df_DTypes= df_time['Disaster Type'].values
    df_DTypes=pd.DataFrame(df_DTypes)
    df_DType_VCounts=(df_DTypes.value_counts()).sort_values(ascending=True)
    Df_DType_Natural=(df_DType_VCounts.drop(['Human Cause', 'Chemical', 'Dam/Levee Break', 'Terrorism','Other'],axis=0)).sort_values(ascending=True)
    df_time = df_time.rename(columns={'Disaster Type': 'Disaster_Type'})
    # Removing non-natural disasters from main df_time
    df_time = df_time[(df_time.Disaster_Type != 'Human Cause') & (df_time.Disaster_Type != 'Chemical') & (df_time.Disaster_Type != 'Dam/Levee Break') & (df_time.Disaster_Type != 'Terrorism') & (df_time.Disaster_Type != 'Other') ]
    # Analysis
    # Dataframe with mean disaster length for each year
    df_yearly_mean = df_time.groupby(['Start_year']).mean()
    # Number of Disasters per year
    df_yearly_dcount=df_time.groupby(df_time['Start_year']).count().reset_index(name='Disaster_Type')
    # Number of each Disaster each year

Here is a reproducible sample of the df:

    ,County,Disaster_Type,Start_Date_A,End_Date_A,Disaster_Length,Start_year
    89,Clay County,Flood,1959-01-29,1959-01-29,1,1959
    181,Alpine County,Flood,1964-12-24,1964-12-24,1,1964
    182,Amador County,Flood,1964-12-24,1964-12-24,1,1964
    183,Butte County,Flood,1964-12-24,1964-12-24,1,1964
    184,Colusa County,Flood,1964-12-24,1964-12-24,1,1964
    185,Del Norte County,Flood,1964-12-24,1964-12-24,1,1964
    186,El Dorado County,Flood,1964-12-24,1964-12-24,1,1964
    187,Glenn County,Flood,1964-12-24,1964-12-24,1,1964
    188,Humboldt County,Flood,1964-12-24,1964-12-24,1,1964
    189,Lake County,Flood,1964-12-24,1964-12-24,1,1964
    190,Lassen County,Flood,1964-12-24,1964-12-24,1,1964
    191,Marin County,Flood,1964-12-24,1964-12-24,1,1964
    192,Mendocino County,Flood,1964-12-24,1964-12-24,1,1964
    193,Modoc County,Flood,1964-12-24,1964-12-24,1,1964
    194,Napa County,Flood,1964-12-24,1964-12-24,1,1964
    195,Nevada County,Flood,1964-12-24,1964-12-24,1,1964
    196,Placer County,Flood,1964-12-24,1964-12-24,1,1964
    197,Plumas County,Flood,1964-12-24,1964-12-24,1,1964
    198,Sacramento County,Flood,1964-12-24,1964-12-24,1,1964
    199,San Joaquin County,Flood,1964-12-24,1964-12-24,1,1964
    200,Shasta County,Flood,1964-12-24,1964-12-24,1,1964
    201,Sierra County,Flood,1964-12-24,1964-12-24,1,1964
    202,Siskiyou County,Flood,1964-12-24,1964-12-24,1,1964
    203,Solano County,Flood,1964-12-24,1964-12-24,1,1964
    204,Sonoma County,Flood,1964-12-24,1964-12-24,1,1964
    205,Stanislaus County,Flood,1964-12-24,1964-12-24,1,1964
    206,Sutter County,Flood,1964-12-24,1964-12-24,1,1964
    207,Tehama County,Flood,1964-12-24,1964-12-24,1,1964
    208,Trinity County,Flood,1964-12-24,1964-12-24,1,1964
    209,Tuolumne County,Flood,1964-12-24,1964-12-24,1,1964
    210,Yolo County,Flood,1964-12-24,1964-12-24,1,1964
    211,Yuba County,Flood,1964-12-24,1964-12-24,1,1964
    212,Baker County,Flood,1964-12-24,1964-12-24,1,1964
    213,Benton County,Flood,1964-12-24,1964-12-24,1,1964
    214,Clackamas County,Flood,1964-12-24,1964-12-24,1,1964
    215,Clatsop County,Flood,1964-12-24,1964-12-24,1,1964
    216,Columbia County,Flood,1964-12-24,1964-12-24,1,1964
    217,Coos County,Flood,1964-12-24,1964-12-24,1,1964
    218,Crook County,Flood,1964-12-24,1964-12-24,1,1964
    219,Curry County,Flood,1964-12-24,1964-12-24,1,1964
    220,Deschutes County,Flood,1964-12-24,1964-12-24,1,1964
    221,Douglas County,Flood,1964-12-24,1964-12-24,1,1964
    222,Gilliam County,Flood,1964-12-24,1964-12-24,1,1964
    223,Grant County,Flood,1964-12-24,1964-12-24,1,1964
    224,Harney County,Flood,1964-12-24,1964-12-24,1,1964
    225,Hood River County,Flood,1964-12-24,1964-12-24,1,1964
    226,Jackson County,Flood,1964-12-24,1964-12-24,1,1964
    227,Jefferson County,Flood,1964-12-24,1964-12-24,1,1964
    228,Josephine County,Flood,1964-12-24,1964-12-24,1,1964
    229,Klamath County,Flood,1964-12-24,1964-12-24,1,1964
    230,Lake County,Flood,1964-12-24,1964-12-24,1,1964
    231,Lane County,Flood,1964-12-24,1964-12-24,1,1964
    232,Lincoln County,Flood,1964-12-24,1964-12-24,1,1964
    233,Linn County,Flood,1964-12-24,1964-12-24,1,1964
    234,Malheur County,Flood,1964-12-24,1964-12-24,1,1964
    235,Marion County,Flood,1964-12-24,1964-12-24,1,1964
    236,Morrow County,Flood,1964-12-24,1964-12-24,1,1964
    237,Multnomah County,Flood,1964-12-24,1964-12-24,1,1964
    238,Polk County,Flood,1964-12-24,1964-12-24,1,1964
    239,Sherman County,Flood,1964-12-24,1964-12-24,1,1964
    240,Tillamook County,Flood,1964-12-24,1964-12-24,1,1964
    241,Umatilla County,Flood,1964-12-24,1964-12-24,1,1964
    242,Union County,Flood,1964-12-24,1964-12-24,1,1964
    243,Wallowa County,Flood,1964-12-24,1964-12-24,1,1964
    244,Wasco County,Flood,1964-12-24,1964-12-24,1,1964
    245,Washington County,Flood,1964-12-24,1964-12-24,1,1964
    246,Wheeler County,Flood,1964-12-24,1964-12-24,1,1964
    247,Yamhill County,Flood,1964-12-24,1964-12-24,1,1964
    248,Asotin County,Flood,1964-12-29,1964-12-29,1,1964
    249,Benton County,Flood,1964-12-29,1964-12-29,1,1964
    250,Clark County,Flood,1964-12-29,1964-12-29,1,1964
    251,Columbia County,Flood,1964-12-29,1964-12-29,1,1964
    252,Cowlitz County,Flood,1964-12-29,1964-12-29,1,1964
    253,Garfield County,Flood,1964-12-29,1964-12-29,1,1964
    254,Grays Harbor County,Flood,1964-12-29,1964-12-29,1,1964
    255,King County,Flood,1964-12-29,1964-12-29,1,1964
    256,Kittitas County,Flood,1964-12-29,1964-12-29,1,1964
    257,Klickitat County,Flood,1964-12-29,1964-12-29,1,1964
    258,Lewis County,Flood,1964-12-29,1964-12-29,1,1964
    259,Mason County,Flood,1964-12-29,1964-12-29,1,1964
    260,Pacific County,Flood,1964-12-29,1964-12-29,1,1964
    261,Pierce County,Flood,1964-12-29,1964-12-29,1,1964
    262,Skamania County,Flood,1964-12-29,1964-12-29,1,1964
    263,Snohomish County,Flood,1964-12-29,1964-12-29,1,1964
    264,Spokane County,Flood,1964-12-29,1964-12-29,1,1964
    265,Wahkiakum County,Flood,1964-12-29,1964-12-29,1,1964
    266,Walla Walla County,Flood,1964-12-29,1964-12-29,1,1964
    267,Whitman County,Flood,1964-12-29,1964-12-29,1,1964
    268,Yakima County,Flood,1964-12-29,1964-12-29,1,1964
    269,Ada County,Flood,1964-12-31,1964-12-31,1,1964
    270,Bannock County,Flood,1964-12-31,1964-12-31,1,1964
    271,Benewah County,Flood,1964-12-31,1964-12-31,1,1964
    272,Blaine County,Flood,1964-12-31,1964-12-31,1,1964
    273,Boise County,Flood,1964-12-31,1964-12-31,1,1964
    274,Bonneville County,Flood,1964-12-31,1964-12-31,1,1964
    275,Butte County,Flood,1964-12-31,1964-12-31,1,1964
    276,Camas County,Flood,1964-12-31,1964-12-31,1,1964
    277,Caribou County,Flood,1964-12-31,1964-12-31,1,1964
    278,Cassia County,Flood,1964-12-31,1964-12-31,1,1964
    279,Clearwater County,Flood,1964-12-31,1964-12-31,1,1964
Answer #1, by tktrz96b

You can call size() on the groupby to get the counts.

    # Number of disasters each year.
    df.groupby('Start_year').size()

    Start_year
    1959     1
    1964    99
    dtype: int64

    # Number of each disaster type for each year.
    df.groupby(['Start_year', 'Disaster_Type']).size()

    Start_year  Disaster_Type
    1959        Flood             1
    1964        Flood            99
    dtype: int64
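Since the question asks for two new DataFrames rather than Series, the size() results above can be turned into DataFrames with reset_index(name=...). The sketch below is only an illustration under stated assumptions: the four-row sample (including the 'Fire' row) and the column name Disaster_Count are hypothetical and not taken from the original data. It also shows a value_counts variant, since the title asks about it.

```python
import io

import pandas as pd

# Hypothetical miniature sample with the same column names as the question's df_time.
csv_text = """County,Disaster_Type,Start_year
Clay County,Flood,1959
Alpine County,Flood,1964
Amador County,Flood,1964
Butte County,Fire,1964
"""
df_time = pd.read_csv(io.StringIO(csv_text))

# DataFrame 1: number of disasters per year.
# size() counts rows per group; reset_index(name=...) names the count column.
df_yearly_dcount = (
    df_time.groupby('Start_year')
    .size()
    .reset_index(name='Disaster_Count')  # 'Disaster_Count' is an assumed name
)

# DataFrame 2: number of disasters of each type per year.
df_yearly_type_count = (
    df_time.groupby(['Start_year', 'Disaster_Type'])
    .size()
    .reset_index(name='Disaster_Count')
)

# Alternative for DataFrame 2 using value_counts on the grouped column.
# Within each year, rows come out ordered by descending count.
df_yearly_type_vc = (
    df_time.groupby('Start_year')['Disaster_Type']
    .value_counts()
    .reset_index(name='Disaster_Count')
)

print(df_yearly_dcount)
print(df_yearly_type_count)
print(df_yearly_type_vc)
```

Unlike count(), which reports the number of non-null values separately for every column, size() returns a single row count per group, which is why it fits the "how many disasters per year" question more directly.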