R: How can I find and replace cell values using a variable + value lookup in another data frame?

kqqjbcuj, posted on 2023-10-13 in Other

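I'm working with a large survey data frame in which the answer to every question is stored as a number. For numeric questions such as age, the number is the actual value. For multiple-choice questions, though, the number is a code that corresponds to text stored in a separate lookup table.

How can I replace all the numbers in each variable with the corresponding labels from the lookup table?

Example data: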

df_numeric <- 
  tibble::tribble(
    ~gender, ~age, ~city, ~yearly_income, ~fav_colour,  ~over_100_more_vars,
          1,   22,     1,          55000,           1,                "...",
          2,   31,     2,         122000,           2,                "...",
          1,   41,     1,         101000,           2,                "...",
          2,   19,     5,          76000,           1,                "...",
          1,   64,     7,          32000,           6,                "...")
    
df_lookup <- 
  tibble::tribble(
           ~variable, ~number,        ~label,
            "gender",       1,        "Male",
            "gender",       2,      "Female",
              "city",       1,    "New York", 
              "city",       2,      "Sydney",
              "city",       5,      "London",
              "city",       7,       "Paris",
        "fav_colour",       1,         "Red",
        "fav_colour",       2,        "Blue",
        "fav_colour",       6,      "Purple",
   "one_of_100_more",       1,       "Label",
   "one_of_100_more",       2,       "Label",
   "two_of_100_more",       1,       "Label",
               "etc",       1,         "etc")

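Ideally, I'd like to: check a variable name in df_numeric, look that variable up in df_lookup, replace each 'number' for that variable with its corresponding 'label', then move on to the next variable and do the same. The result should look like this: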

df_output <- 
  tibble::tribble(
    ~gender, ~age,      ~city, ~yearly_income, ~fav_colour,  ~over_100_more_vars,
    "Male",   22,  "New York",          55000,       "Red",                "...",
  "Female",   31,    "Sydney",         122000,      "Blue",                "...",
    "Male",   41,  "New York",         101000,      "Blue",                "...",
  "Female",   19,    "London",          76000,       "Red",                "...",
    "Male",   64,     "Paris",          32000,    "Purple",                "...")

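Important caveats:

  • There are hundreds of variables, so writing out each variable's name in the code is not feasible (as is done in, e.g., this answer).
  • Only the coded categorical variables, such as gender and city, need replacing. Numeric variables such as age and income are already in the correct format and do not appear in df_lookup.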
bejyjqdl #1

I'd offer this tidyverse solution (this version also handles age):

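library(tidyverse)
df_numeric %>%
  # Convert everything except yearly_income to character so the coded columns
  # can be stacked together (age becomes <chr> as well)
  mutate(across(-yearly_income, as.character)) %>%
  # Long format: one row per (respondent, variable) pair
  pivot_longer(-c("yearly_income", "age")) %>%
  # Attach the label for each variable/code pair
  left_join(mutate(df_lookup, number = as.character(number)),
            by = c("name" = "variable", "value" = "number")) %>%
  select(-value) %>%
  # Back to wide format, with the labels as cell values
  pivot_wider(id_cols = c("yearly_income", "age"), values_from = label, names_from = name)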

# A tibble: 5 x 6
  yearly_income age   gender city     fav_colour over_100_more_vars
          <dbl> <chr> <chr>  <chr>    <chr>      <chr>             
1         55000 22    Male   New York Red        <NA>              
2        122000 31    Female Sydney   Blue       <NA>              
3        101000 41    Male   New York Blue       <NA>              
4         76000 19    Female London   Red        <NA>              
5         32000 64    Male   Paris    Purple     <NA>
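A possible variant of the same pivot-and-join idea, sketched under a few assumptions: it reshapes only the columns that actually appear in df_lookup, so age stays numeric, and it adds a temporary key column (row_id, a name made up for this example) so that two respondents with the same age and income cannot be merged into one row when pivoting back. The data below is a trimmed copy of the question's example.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Trimmed copy of the example data from the question
df_numeric <- tribble(
  ~gender, ~age, ~city, ~yearly_income, ~fav_colour,
        1,   22,     1,          55000,           1,
        2,   31,     2,         122000,           2,
        1,   41,     1,         101000,           2,
        2,   19,     5,          76000,           1,
        1,   64,     7,          32000,           6
)

df_lookup <- tribble(
     ~variable, ~number,     ~label,
      "gender",       1,     "Male",
      "gender",       2,   "Female",
        "city",       1, "New York",
        "city",       2,   "Sydney",
        "city",       5,   "London",
        "city",       7,    "Paris",
  "fav_colour",       1,      "Red",
  "fav_colour",       2,     "Blue",
  "fav_colour",       6,   "Purple"
)

# Only columns that have entries in the lookup table get recoded
coded_cols <- intersect(names(df_numeric), df_lookup$variable)

df_labelled <- df_numeric %>%
  # row_id is a hypothetical helper column that keeps each respondent distinct
  mutate(row_id = row_number()) %>%
  pivot_longer(all_of(coded_cols), names_to = "variable", values_to = "number") %>%
  left_join(df_lookup, by = c("variable", "number")) %>%
  select(-number) %>%
  pivot_wider(names_from = variable, values_from = label) %>%
  # Restore the original column order and drop the helper column
  select(all_of(names(df_numeric)))
```

Because the join key is numeric on both sides here, no as.character() conversion is needed; that relies on the codes being stored with the same type in both tables.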
ut6juiuv #2

Here is a base R approach that uses match() to subset df_lookup to the relevant values.

cols_to_replace <- intersect(df_lookup$variable, names(df_numeric))
# ^^ In this case:  "gender"     "city"       "fav_colour"

# Iterate over these columns matching the label to the lookup value
df_numeric[cols_to_replace] <- lapply(cols_to_replace, \(col) {
    col_df <- df_lookup[df_lookup$variable == col, ]
    col_df$label[match(df_numeric[[col]], col_df$number)]
})

# A tibble: 5 × 6
#   gender   age city     yearly_income fav_colour over_100_more_vars
#   <chr>  <dbl> <chr>            <dbl> <chr>      <chr>             
# 1 Male      22 New York         55000 Red        ...               
# 2 Female    31 Sydney          122000 Blue       ...               
# 3 Male      41 New York        101000 Blue       ...               
# 4 Female    19 London           76000 Red        ...               
# 5 Male      64 Paris            32000 Purple     ...

If no match is found, NA is returned.
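If NA is not the desired outcome for unmatched codes, one option is to fall back to the original code as text. The following is only a sketch of that idea on a small made-up data set (the code 9 is invented here to force a miss); it is not part of the answer above.

```r
# Small made-up data: gender code 9 has no entry in the lookup table
df_numeric <- data.frame(
  gender = c(1, 2, 9),
  age    = c(22, 31, 41)
)
df_lookup <- data.frame(
  variable = c("gender", "gender"),
  number   = c(1, 2),
  label    = c("Male", "Female"),
  stringsAsFactors = FALSE
)

cols_to_replace <- intersect(df_lookup$variable, names(df_numeric))

df_numeric[cols_to_replace] <- lapply(cols_to_replace, function(col) {
  col_df <- df_lookup[df_lookup$variable == col, ]
  labels <- col_df$label[match(df_numeric[[col]], col_df$number)]
  # Where no label was found, keep the original code (converted to text)
  ifelse(is.na(labels), as.character(df_numeric[[col]]), labels)
})
```

Keeping the raw code makes gaps in the lookup table easy to spot afterwards, at the cost of a column that mixes labels and codes.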
