R: Predicting race from surname and party affiliation

fjaof16o  posted on 2023-06-19  in  Other

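I am trying to use the wru package to predict race from the surname and location of a sample of individuals in the United States, based on their names and addresses. The predict_race function is described in the package documentation.
However, the function throws an error when I run it and never completes. It would be very useful for my analysis, so I am hoping someone can help me work out whether the package has a bug or whether I am doing something wrong.
This is the code I am using: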

library(dplyr)  # provides %>%, mutate() and filter()

output <- wru::predict_race(
  voter.file = df %>%
    # zero-pad the FIPS codes: county to 3 characters, tract to 6
    mutate(county = sprintf('%03d', county),
           tract = sprintf('%06d', tract)) %>%
    filter(!is.na(tract)),
  census.geo = "tract",
  census.key = census_api_key, # obtained from: https://api.census.gov/data/key_signup.html
  party = "party_code")

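Note that the documentation specifies the format of the geographic identifiers that must be passed to the function:

  • state is the two-character abbreviation of the state each person lives in (e.g., "nj" for New Jersey)
  • county is three characters (e.g., "031" rather than "31"), and tract is six characters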
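
The padding rules above can be checked in isolation. This is a minimal sketch in base R using two county and tract values from the sample data; the variable names and the `geo_key` string (built to mirror the state-county-tract form that the error message prints) are only illustrative.

```r
# Integer codes as they appear in the sample data
county <- c(71L, 71L)
tract  <- c(30101L, 30303L)

# Zero-pad to the widths the documentation asks for
county_padded <- sprintf("%03d", county)  # 3 characters
tract_padded  <- sprintf("%06d", tract)   # 6 characters

# Illustrative key in the same state-county-tract form as the error message
geo_key <- paste("OR", county_padded, tract_padded, sep = "-")
print(geo_key)
```

Printing `geo_key` makes it easy to compare the values being sent against the locations named in an error.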

That is why I format county and tract as shown in the code above. Below I have shared a slice of the data that reproduces the error.
This is the error I get:

County 1 of 1: 071
Proceeding with last name predictions...
ℹ Downloading "wru-data-census_last_c.rds"...
  |=======================================================================================| 100%
ℹ Downloading "wru-data-first_c.rds"...
  |=======================================================================================| 100%
ℹ Downloading "wru-data-last_c.rds"...
  |=======================================================================================| 100%
ℹ Downloading "wru-data-mid_c.rds"...
  |=======================================================================================| 100%
Proceeding with Census geographic data at tract level...
Using Census geographic data from provided census.data object...
State 1 of 1: OR
Error in census_helper_new(key = census.key, voter.file = voter.file,  : 
  The following locations in the voter.file are not available in the census data (listed as state-county-tract):
OR-071-030303

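My thinking was that the function dislikes certain county + tract combinations, so I could split the data into small groups (say n = 10) and pass each group to the function in a loop, saving every successful output to a csv. I could then store the failed groups and revisit them, splitting them into smaller and smaller pieces until at least some of the names get a prediction. I tried this, but I get exactly the same error and the loop breaks.
See below for the reprex data: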
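
One way to keep such a loop from stopping at the first bad group is to wrap each call in `tryCatch`. The sketch below is only an illustration of that idea in base R: `run_in_chunks` and `fake_predict` are hypothetical names, and `fake_predict` is a stand-in that fails on one tract so the example runs without wru or a Census API key.

```r
# Hypothetical helper: apply `fn` to each chunk of `data` and record
# failures instead of letting one error stop the whole loop.
run_in_chunks <- function(data, chunk_size, fn) {
  chunk_id <- ceiling(seq_len(nrow(data)) / chunk_size)
  chunks <- unname(split(data, chunk_id))
  results <- lapply(chunks, function(chunk) {
    tryCatch(
      list(ok = TRUE, value = fn(chunk)),
      error = function(e) {
        list(ok = FALSE, value = chunk, message = conditionMessage(e))
      }
    )
  })
  succeeded <- Filter(function(r) r$ok, results)
  failed <- Filter(function(r) !r$ok, results)
  list(
    succeeded = do.call(rbind, lapply(succeeded, function(r) r$value)),
    failed = do.call(rbind, lapply(failed, function(r) r$value))
  )
}

# Stand-in for the real prediction call (an assumption for this demo):
# it raises an error whenever a chunk contains tract "030303".
fake_predict <- function(chunk) {
  if (any(chunk$tract == "030303")) stop("tract not available in census data")
  chunk$predicted <- TRUE
  chunk
}

demo <- data.frame(
  surname = c("ALEXANDER", "BARTMAN", "BENNETT", "BOWLIN"),
  tract = c("030101", "030303", "030102", "030202"),
  stringsAsFactors = FALSE
)

out <- run_in_chunks(demo, chunk_size = 1, fn = fake_predict)
```

Here `out$succeeded` holds the rows whose chunk ran cleanly and `out$failed` holds the untouched rows of every chunk that raised an error, so they can be retried with a smaller `chunk_size` or a different setting.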

df <- tibble::tribble(
  ~surname, ~state, ~county, ~tract, ~party_code,
  "ALEXANDER",   "OR",     71L, 30101L,       "NAV",
  "AQUIPEL",   "OR",     71L, 30101L,       "NAV",
  "BABBITT",   "OR",     71L, 30101L,       "NAV",
  "BACKUS",   "OR",     71L, 30101L,       "DEM",
  "BACKUS",   "OR",     71L, 30101L,       "DEM",
  "BARKER",   "OR",     71L, 30101L,       "DEM",
  "BARTMAN",   "OR",     71L, 30303L,       "REP",
  "BARTMAN",   "OR",     71L, 30303L,       "REP",
  "BASS",   "OR",     71L, 30101L,       "DEM",
  "BATTERMAN",   "OR",     71L, 30303L,       "NAV",
  "BATTERMAN",   "OR",     71L, 30303L,       "NAV",
  "BEARDEN",   "OR",     71L, 30101L,       "NAV",
  "BELANDER",   "OR",     71L, 30101L,       "NAV",
  "BELL",   "OR",     71L, 30303L,       "NAV",
  "BEM",   "OR",     71L, 30101L,       "NAV",
  "BENNETT",   "OR",     71L, 30102L,       "NAV",
  "BERG",   "OR",     71L, 30101L,       "NAV",
  "BERGER",   "OR",     71L, 30303L,       "NAV",
  "BESEAU",   "OR",     71L, 30303L,       "NAV",
  "BIERER",   "OR",     71L, 30101L,       "IND",
  "BILLETTE",   "OR",     71L, 30303L,       "IND",
  "BISCHOFF",   "OR",     71L, 30101L,       "NAV",
  "BLATT",   "OR",     71L, 30101L,       "NAV",
  "BOCHART",   "OR",     71L, 30101L,       "NAV",
  "BOWLIN",   "OR",     71L, 30202L,       "NAV",
  "BURGESS",   "OR",     71L, 30303L,       "NAV",
  "BURNETT",   "OR",     71L, 30101L,       "NAV",
  "BURNETT",   "OR",     71L, 30101L,       "NAV",
  "BYE ODEA",   "OR",     71L, 30101L,       "NAV",
  "BYINGTON",   "OR",     71L, 30101L,       "NAV",
  "CARSLEY",   "OR",     71L, 30102L,       "NAV",
  "CARTWRIGHT",   "OR",     71L, 30101L,       "NAV",
  "CATES",   "OR",     71L, 30101L,       "NAV",
  "CHANDLER",   "OR",     71L, 30101L,       "NAV",
  "CHESHIER",   "OR",     71L, 30102L,       "NAV",
  "CISNEROS",   "OR",     71L, 30303L,       "NAV",
  "COE",   "OR",     71L, 30101L,       "NAV",
  "CORREA",   "OR",     71L, 30303L,       "NAV",
  "COSHOW",   "OR",     71L, 30101L,       "NAV",
  "COURTNEY",   "OR",     71L, 30101L,       "NAV",
  "CROFT",   "OR",     71L, 30101L,       "NAV",
  "CROSSLAND",   "OR",     71L, 30101L,       "NAV",
  "CRUZ",   "OR",     71L, 30102L,       "NAV",
  "CULLENS",   "OR",     71L, 30101L,       "NAV",
  "CURRIER",   "OR",     71L, 30101L,       "NAV",
  "DAHME",   "OR",     71L, 30303L,       "DEM",
  "DAHME",   "OR",     71L, 30303L,       "DEM",
  "DAVIS",   "OR",     71L, 30303L,       "NAV",
  "DAVIS",   "OR",     71L, 30101L,       "NAV",
  "DEHART",   "OR",     71L, 30303L,       "NAV",
  "DENMAN",   "OR",     71L, 30101L,       "NAV",
  "DENNIS",   "OR",     71L, 30101L,       "NAV",
  "DILLESHAW",   "OR",     71L, 30101L,       "NAV",
  "DOOTSON",   "OR",     71L, 30101L,       "NAV",
  "EIDE",   "OR",     71L, 30101L,       "NAV",
  "EILERS",   "OR",     71L, 30101L,       "NAV",
  "EKREN",   "OR",     71L, 30101L,       "DEM",
  "ELLIS",   "OR",     71L, 30101L,       "NAV",
  "ERICKSON",   "OR",     71L, 30101L,       "NAV",
  "ESKELSEN",   "OR",     71L, 30101L,       "NAV",
  "EVANS",   "OR",     71L, 30303L,       "NAV",
  "FETTIG",   "OR",     71L, 30102L,       "NAV",
  "FINDLEY",   "OR",     71L, 30102L,       "NAV",
  "FLANAGAN",   "OR",     71L, 30101L,       "DEM",
  "FRAYCHINEAUD",   "OR",     71L, 30102L,       "NAV",
  "FREY",   "OR",     71L, 30101L,       "NAV"
)

Answer 1, by mklgxw1f

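So, if I change the line census.geo = "tract" to census.geo = "county", the code runs fine! It is not a direct answer, because the package claims I can get predictions at the tract level, but it is good enough!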
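
For reference, the county-level call could look roughly like the sketch below. The wrapper name `predict_race_county` and its arguments are hypothetical; the `predict_race` arguments are the ones used in the question. It assumes the wru package is installed and that `key` is a valid Census API key, and it only pads `county` because tract is not used at this level.

```r
# Hypothetical wrapper around the county-level call; nothing runs until
# it is called with a data frame and a valid Census API key.
predict_race_county <- function(voter_file, key) {
  # county must be a three-character string, e.g. "071"
  voter_file$county <- sprintf("%03d", voter_file$county)
  wru::predict_race(
    voter.file = voter_file,
    census.geo = "county",  # changed from "tract"
    census.key = key,
    party = "party_code"
  )
}
```

It would be called as `output <- predict_race_county(df, census_api_key)` with the `df` from the question.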
