R: summing one variable while collapsing by another

Posted by cu6pst1q on 2022-12-06 in Other

Here is my dataset: https://wiki.csc.calpoly.edu/datasets/attachment/wiki/HighwayAccidents/ACCIDENT2007-FullDataSet.csv

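I have data on all car accidents in the US in 2007, by state (coded 1:56), in a large CSV file with 9 variables such as state, vehicles, pedestrians, persons, drunk drivers, fatalities, date and time. The CSV lists each accident as a separate row. States are identified by number. I want to sum the individual columns within each state, without summing the state codes themselves.
I would like a result like this: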

State        Drunk_Dr
1               345
2              1023

State       Fatalities   Drunk_Dr
34              123        134
35               56         64

and so on, for states 1:56.


gab6jxml #1

library(data.table)

file.in <- "path/to/your/file.csv"
DT.accidents <- fread(file.in)

## Have a look at the different DRUNK_DR values
DT.accidents[, table(DRUNK_DR)]
## Nine?? Really?  

DT.accidents[DRUNK_DR == 9]

## Anyway, to sum up by state and drunk drivers, assuming one row of data is one accident, you can simply use: 

DT.accidents[, .N, by=list(STATE, DRUNK_DR)]

## If you want to ignore cases with zero drunk drivers, filter those out
DT.drunks <- DT.accidents[DRUNK_DR > 0, .N, by=list(STATE, DRUNK_DR)]

## You can reshape it too, if you'd like

library(reshape2)
## Name the value column explicitly; the argument is value.var, not value
DT.drunks <- as.data.table(dcast(DT.drunks, STATE ~ DRUNK_DR, value.var="N"))
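
The counts above tell you how many accidents fall into each (state, number of drunk drivers) combination. If what you want is the per-state totals shown in the question, a minimal sketch along the following lines may be closer. It runs on a small toy table rather than the real file, and `FATALS` is only an assumed name for the fatalities column; substitute whatever the CSV actually calls it.

```r
library(data.table)

# Toy stand-in for the accident file: one row per accident.
# FATALS is a hypothetical column name used for illustration.
DT.toy <- data.table(
  STATE    = c(1, 1, 2, 2, 2),
  DRUNK_DR = c(0, 2, 1, 0, 1),
  FATALS   = c(1, 3, 1, 1, 3)
)

# Sum a single column within each state
DT.toy[, .(Drunk_Dr = sum(DRUNK_DR)), by = STATE]

# Sum several columns within each state in one pass;
# STATE is the grouping key, so it is never summed itself
sum_cols <- c("FATALS", "DRUNK_DR")
DT.sums <- DT.toy[, lapply(.SD, sum), by = STATE, .SDcols = sum_cols]
print(DT.sums)
```

On the real data the same pattern would be applied to `DT.accidents`, with `sum_cols` listing the columns you want totalled.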

Adding state names

State names, according to
ftp://ftp.nhtsa.dot.gov/FARS/FARS-DOC/USERGUIDE-2007.pdf

## start with the built in variable 'state.name' (no "s")
state_names <- state.name[1:50]
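## Add in DC and the territories, then sort alphabetically.
## Note: sort() order for strings is locale-dependent (e.g. "Virgin Islands" vs
## "Virginia"), so check the resulting pairs against the user guide's code table.
state_names <- sort(c(state_names, "District of Columbia", "Puerto Rico", "Virgin Islands"))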
## Create index numbers that match what is shown in the file
state_number <- setdiff(1:56, c(3, 7, 14))
## Create a data.table for joining
DT.states <- data.table(state_number=state_number, state_names=state_names)

## Join in the info
setkey(DT.states, "state_number")
setkey(DT.accidents, "STATE")
DT.accidents[DT.states, STATE_NAMES := state_names]

## Now you can reshape, but include the names
DT.drunks <- DT.accidents[DRUNK_DR > 0, .N, by=list(STATE, STATE_NAMES, DRUNK_DR)]

## You can reshape it too, if you'd like
DT.drunks <- as.data.table(dcast(DT.drunks, STATE + STATE_NAMES ~ DRUNK_DR, value.var="N"))
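
Because the code-to-name lookup is built by hand, it may be worth checking that every state code in the data actually received a name. The following is a rough sketch on toy tables; the codes and names in it are illustrative only, and it uses an `on=` join instead of keys purely to keep the example self-contained.

```r
library(data.table)

# Toy lookup and accident tables (illustrative values, not the real file)
DT.lookup <- data.table(state_number = c(1L, 2L, 4L),
                        state_names  = c("Alabama", "Alaska", "Arizona"))
DT.acc <- data.table(STATE    = c(1L, 2L, 2L, 4L, 43L),
                     DRUNK_DR = c(0L, 1L, 2L, 0L, 1L))

# Update join: copy the name across wherever the code matches.
# The i. prefix makes it explicit that the value comes from DT.lookup.
DT.acc[DT.lookup, on = .(STATE = state_number), STATE_NAMES := i.state_names]

# Codes present in the data that found no name in the lookup
unmatched <- DT.acc[is.na(STATE_NAMES), unique(STATE)]
print(unmatched)
```

An empty result on the real data would suggest the lookup covers every code; anything else points at a code that needs to be added or corrected.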

Now... as for that accident with nine drunk drivers.

DT.accidents[DRUNK_DR == 9]

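Google search: "Montana traffic accident May 19, 2007". The first result is http://www.city-data.com/accidents/acc-Browning-Montana.html
which contains this entry:
Browning fatal car crashes and road traffic accidents (wrecks) list for 2007: May 19, 2007 05:55 PM, Us-2, Sr-464, Lat: 48.555692, Lon: -113.010247, Vehicles: 1, Fatalities: 1, Drunken drivers: data inconsistent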
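
If you decide that records like this one should not count toward the totals, one possible approach is to flag rows that report more drunk drivers than vehicles and leave them out of the sum. This is only a sketch on toy data: `VEHICLES` is an assumed column name, and whether to drop, keep or correct such rows is a judgment call about the data, not something the file settles for you.

```r
library(data.table)

# Toy rows; VEHICLES is a hypothetical name for the vehicle-count column
DT.toy <- data.table(
  STATE    = c(30L, 30L, 30L),
  VEHICLES = c(1L, 2L, 1L),
  DRUNK_DR = c(9L, 1L, 0L)
)

# Flag rows reporting more drunk drivers than vehicles
# (assumes at most one driver per vehicle)
DT.toy[, suspect := DRUNK_DR > VEHICLES]

# Per-state total with the flagged rows left out
DT.clean <- DT.toy[suspect == FALSE, .(Drunk_Dr = sum(DRUNK_DR)), by = STATE]
print(DT.clean)
```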
