R: Create new columns by multiplying some columns in a data frame by a single column

yvt65v4c posted on 2023-10-13 in Other


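I want to generate several new columns by multiplying some columns of a data frame by a single column in R, and then append the new columns to the original df.
My initial data looks like this: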

ID  amount  supplier_1  supplier_2   supplier_3 ... supplier_100
1   10       0               1            0             0
1   15       1               0            0             0
1   20       1               0            0             0
2    5       0               0            0             1
2    8       0               1            0             0
2   10       0               0            0             1

#I have more than 100 suppliers in this df.

My desired output multiplies every supplier_n column (dummy variables) by amount:

ID  amount  supplier_1  supplier_2   supplier_3 ... supplier_100  
1   10       0               1            0             0                 
1   15       1               0            0             0                 
1   20       1               0            0             0                 
2    5       0               0            0             1                 
2    8       0               1            0             0                 
2   10       0               0            0             1                 

amt*supplier_1   amt*supplier_2  amt*supplier_3 ..... amt*supplier_100   Total_amt 
 0               10               0                      0                 45 
15                0               0                      0                 45
20                0               0                      0                 45
 0                0               0                      5                 23
 0                8               0                      0                 23
 0                0               0                     10                 23

#total_amt is the sum of amount conditional on ID.

I found a similar example here and tried mutate_all with a function(col) command, but it did not work:
Multiply all columns in dataframe by single column.
I would appreciate any suggestions!

Source: https://stackoverflow.com/questions/77246929/create-new-columns-by-multiplying-some-columns-in-dataframe-by-single-column

3 Answers


uqzxnwby1#

You can use dplyr with mutate() and across() to apply the same operation to multiple columns. For example:

dd %>% mutate(across(starts_with("supplier"), ~amount * .x))
#   ID amount supplier_1 supplier_2 supplier_3 supplier_100
# 1  1     10          0         10          0            0
# 2  1     15         15          0          0            0
# 3  1     20         20          0          0            0
# 4  2      5          0          0          0            5
# 5  2      8          0          8          0            0
# 6  2     10          0          0          0           10

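To add the total amount, if you assume that only one supplier has a value in each row and that the value is 0/1, you can simply sum amount by ID: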

dd %>% 
  mutate(across(starts_with("supplier"), ~amount * .x)) %>% 
  mutate(Total_amt = sum(amount), .by=ID)

Tested with sample data:

dd <- read.table(text="
ID  amount  supplier_1  supplier_2   supplier_3  supplier_100
1   10       0               1            0             0
1   15       1               0            0             0
1   20       1               0            0             0
2    5       0               0            0             1
2    8       0               1            0             0
2   10       0               0            0             1", header=T)
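
The question asks for new columns appended next to the original dummies, whereas the call above overwrites them. A minimal sketch of one way to keep both, assuming dplyr 1.1.0 or later (needed for `.by`) and using the `.names` argument of `across()`. The `amt_x_` prefix and the `res` name are illustrative choices, not part of the question:

```r
library(dplyr)

dd <- read.table(text = "
ID  amount  supplier_1  supplier_2   supplier_3  supplier_100
1   10       0               1            0             0
1   15       1               0            0             0
1   20       1               0            0             0
2    5       0               0            0             1
2    8       0               1            0             0
2   10       0               0            0             1", header = TRUE)

res <- dd %>%
  # .names builds a new column per supplier instead of overwriting it
  mutate(across(starts_with("supplier"), ~ amount * .x,
                .names = "amt_x_{.col}")) %>%
  # per-ID sum of amount, repeated on every row of that ID
  mutate(Total_amt = sum(amount), .by = ID)

res
```

The original supplier_n columns stay as 0/1 dummies, and the products land in amt_x_supplier_n.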

Posted 2023-10-13

wd2eg0qa2#

In base R, you can use lapply to create the new columns while performing the operation:

ccols <- names(df)[grep("supplier", names(df))]
# [1] "supplier_1"   "supplier_2"   "supplier_3"   "supplier_100"

df[paste0("amt_x_",ccols)] <- lapply(df[ccols], \(x) df$amount * x)

Output:

#   ID amount supplier_1 supplier_2 supplier_3 supplier_100 amt_x_supplier_1 amt_x_supplier_2 amt_x_supplier_3 amt_x_supplier_100
# 1  1     10          0          1          0            0                0               10                0                  0
# 2  1     15          1          0          0            0               15                0                0                  0
# 3  1     20          1          0          0            0               20                0                0                  0
# 4  2      5          0          0          0            1                0                0                0                  5
# 5  2      8          0          1          0            0                0                8                0                  0
# 6  2     10          0          0          0            1                0                0                0                 10

Data:

df <- read.table(text = "ID  amount  supplier_1  supplier_2   supplier_3  supplier_100
1   10       0               1            0             0
1   15       1               0            0             0
1   20       1               0            0             0
2    5       0               0            0             1
2    8       0               1            0             0
2   10       0               0            0             1", h = TRUE)

Alternatively, if you only want to replace the existing values in those columns, simply overwrite them:

df[ccols] <- lapply(df[ccols], \(x) df$amount * x)

Output:

#   ID amount supplier_1 supplier_2 supplier_3 supplier_100
# 1  1     10          0         10          0            0
# 2  1     15         15          0          0            0
# 3  1     20         20          0          0            0
# 4  2      5          0          0          0            5
# 5  2      8          0          8          0            0
# 6  2     10          0          0          0           10
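
The lapply snippets in this answer do not add the Total_amt column from the question. One base R possibility, sketched here with `ave()`, computes the per-ID sum of amount. The `stopifnot()` line checks the assumption that each row has exactly one supplier dummy set to 1, under which this total also equals the per-ID sum of the multiplied columns:

```r
df <- read.table(text = "ID  amount  supplier_1  supplier_2   supplier_3  supplier_100
1   10       0               1            0             0
1   15       1               0            0             0
1   20       1               0            0             0
2    5       0               0            0             1
2    8       0               1            0             0
2   10       0               0            0             1", header = TRUE)

ccols <- grep("supplier", names(df), value = TRUE)

# Assumption check: exactly one supplier is flagged in every row
stopifnot(all(rowSums(df[ccols]) == 1))

# ave() applies sum() within each ID and returns one value per row
df$Total_amt <- ave(df$amount, df$ID, FUN = sum)

df
```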

Posted 2023-10-13

y53ybaqx3#

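You (and the other answerers) may want to recall that R is vectorized and uses recycling, which means you can also multiply an entire matrix by a vector. There is really no need for lapply or the like here.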

scls <- grep('^supplier_\\d+$', names(df))
df[scls] <- df[scls]*df$amount

df
#   ID amount supplier_1 supplier_2 supplier_3 supplier_100
# 1  1     10          0         10          0            0
# 2  1     15         15          0          0            0
# 3  1     20         20          0          0            0
# 4  2      5          0          0          0            5
# 5  2      8          0          8          0            0
# 6  2     10          0          0          0           10

Data:

# Input data with the original 0/1 dummies, i.e. before the multiplication
df <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), amount = c(10L, 
15L, 20L, 5L, 8L, 10L), supplier_1 = c(0L, 1L, 1L, 0L, 0L, 
0L), supplier_2 = c(1L, 0L, 0L, 0L, 1L, 0L), supplier_3 = c(0L, 
0L, 0L, 0L, 0L, 0L), supplier_100 = c(0L, 0L, 0L, 1L, 0L, 1L
)), row.names = c(NA, -6L), class = "data.frame")
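
To keep the original dummies and append the products as new columns, as the question asks, the same vectorized multiplication can be assigned to new column names. A minimal sketch: the `amt_x_` prefix and the `new_names` variable are illustrative, and the recycling lines up column by column here because `df$amount` holds exactly one value per row:

```r
df <- read.table(text = "ID  amount  supplier_1  supplier_2   supplier_3  supplier_100
1   10       0               1            0             0
1   15       1               0            0             0
1   20       1               0            0             0
2    5       0               0            0             1
2    8       0               1            0             0
2   10       0               0            0             1", header = TRUE)

# Positions of the supplier_<number> columns
scls <- grep('^supplier_\\d+$', names(df))

# Names for the appended columns
new_names <- paste0("amt_x_", names(df)[scls])

# amount is recycled down each selected column; results go to new columns
df[new_names] <- df[scls] * df$amount

df
```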

Posted 2023-10-13
