Referencing data.table columns inside a function in R

9bfwbjaz  posted 12 months ago in Other

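In SAP, I upload text files of transactions that are identified by two known fields: "Document No" and "Position". Several transactions in a file can share one unique ID. Transactions that share the same unique ID have a single document number and consecutive positions. Below is a sample of the text file.

| UniqueID | Document No | Position |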
| --|--|--|
| 200 | 1 | 1 |
| 200 | 1 | 2 |
| 300 | 2 | 1 |
| 300 | 2 | 2 |
| 400 | 3 | 1 |
| 400 | 3 | 2 |
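
I managed to write code that produces this sequence of document numbers and positions for a sample data table (tblSAP). The code below works fine. However, I need it as a function, because I have many tables that have to be converted into SAP text-file uploads.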

library(data.table)
# sample data table
tblSAP <- data.table(
  UniqueID = c(200,200,300,300,400,400)
  
)

# code to generate Document No and Position fields

## Sort entire tblSAP table
tblSAP <- setorder(tblSAP,cols=UniqueID)

## create a temp table of UniqueID 
dfUniqueID <- tblSAP[,.(UniqueID)]

## create "Document No" and "Position" count
DT1 <- dfUniqueID[,.N, by=.(UniqueID, "Document No"=rleid(UniqueID))]

## extend UniqueIDs with Positions
DT2 <- dfUniqueID[,seq_len(.N), by = UniqueID] |> 
  setnames(old = "V1",new = "Position")

## Join UniqueID, Document Number, and Positions in one table
DT3 <- DT1[DT2,on = .(UniqueID)][,-c("N")]

## Bind SAP Upload table with the columns of Document No and Position from DT3 table
tblSAP <- cbind(tblSAP,DT3[,-c("UniqueID")])

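When I converted the code to a function, I only replaced UniqueID with the function's input parameter, as shown. But referencing the tblSAP column (UniqueID) inside the function seems to fail. I get the error "Error in setorderv(x,order,na.last): some columns are not in the data.table: IdFld".
Here is the function I wrote: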

# function to generate Document No and Position fields
SAPDocPos <- function(tblSAP,IdFld){
  
  ## Sort entire tblSAP table
  tblSAP <- setorder(tblSAP,cols=IdFld)
  
  ## create a temp table of IdFld 
  dfUniqueID <- tblSAP[,.(IdFld)]
  
  ## create "Document No" and "Position" count
  DT1 <- dfUniqueID[,.N, by=.(IdFld, "Document No"=rleid(IdFld))]
  
  ## extend IdFlds with Positions
  DT2 <- dfUniqueID[,seq_len(.N), by = IdFld] %>% 
    setnames(old = "V1",new = "Position")
  
  ## Join IdFlds, Document Number, and Positions in one table
  DT3 <- DT1[DT2,on = .(IdFld)][,-c("N")]
  
  ## Bind SAP Upload table with the columns of Document No and Position from DT3 table
  tblSAP <- cbind(tblSAP,DT3[,-c("UniqueID")])
  return(tblSAP)
}

SAPDocPos(tblSAP,UniqueID)


Many thanks for any help in resolving this.

sshcrbum  #1

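A few changes:

  • To use the symbol UniqueID, we need to convert it to a character string with deparse(substitute(.)). In general I tend to advise against non-standard evaluation (as you are attempting here), especially when the only value it adds is omitting the quotes.
  • setorder requires non-standard evaluation, but we can use setorderv, which takes a character vector.
  • I use .SDcols= to subset by column instead of trying .(UniqueID).
  • For DT1, by= can also be a character vector, but that requires Document No to be defined beforehand, hence the slight update.
  • Perhaps just a nuance: I moved from %>% to |> to reduce dependencies; this is not needed if you already depend on dplyr/magrittr elsewhere.
  • This is only a preference, but a literal return(.) statement is needed only when the function has to return a value earlier in the code, such as inside an if block. Since a function always returns the value of its last expression, we can omit it. (Calling return(tblSAP) explicitly is fine, but it adds an unnecessary function call to the stack.)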
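
As a small base-R illustration of the first point, the sketch below shows what deparse(substitute(.)) does to an argument. The helper name col_name is hypothetical and exists only for this demonstration; it is not part of data.table.

```r
# Hypothetical helper, for illustration only: capture the unevaluated
# argument and turn it into a character string.
col_name <- function(x) deparse(substitute(x))

# An unquoted symbol becomes its name, even if no such object exists.
col_name(UniqueID)

# Caveat: a quoted string keeps its quote characters after deparse(),
# so the result is not the plain column name.
col_name("UniqueID")

# Caveat: a variable holding the name yields the variable's own name ("id"),
# not the value it holds.
id <- "UniqueID"
col_name(id)
```

The two caveats are part of why passing the column name as a plain string is often the simpler interface.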
SAPDocPos <- function(tblSAP,IdFld){
  IdFld <- deparse(substitute(IdFld))
  
  ## Sort entire tblSAP table
  tblSAP <- setorderv(tblSAP,cols=IdFld)
  
  ## create a temp table of IdFld 
  dfUniqueID <- tblSAP[,.SD, .SDcols = IdFld]
  
  ## create "Document No" and "Position" count
  DT1 <- dfUniqueID[, `Document No` := rleid(get(IdFld))][, .N, by = c(IdFld, "Document No")]
  
  ## extend IdFlds with Positions
  DT2 <- dfUniqueID[, seq_len(.N), by = IdFld] |>
    setnames(old = "V1", new = "Position")
  
  ## Join IdFlds, Document Number, and Positions in one table
  DT3 <- DT1[DT2, on = c(IdFld)][,-c("N")]
  
  ## Bind SAP Upload table with the columns of Document No and Position from DT3 table
  tblSAP <- cbind(tblSAP, DT3[, -c(IdFld), with=FALSE])

  tblSAP
}

SAPDocPos(tblSAP, UniqueID)
#    UniqueID Document No Position
#       <num>       <int>    <int>
# 1:      200           1        1
# 2:      200           1        2
# 3:      300           2        1
# 4:      300           2        2
# 5:      400           3        1
# 6:      400           3        2
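
Since non-standard evaluation adds little here, a variant that takes the column name as a quoted string may be easier to maintain. The following is only a sketch under that assumption: the name SAPDocPosChr is made up for illustration, and it skips the intermediate tables by computing both columns directly with rleid and seq_len.

```r
library(data.table)

# Hypothetical variant: IdFld is a character string such as "UniqueID".
SAPDocPosChr <- function(tblSAP, IdFld) {
  out <- copy(tblSAP)            # work on a copy so the caller's table is untouched
  setorderv(out, cols = IdFld)   # sort by the ID column, given as a string

  # Document number: one run id per block of identical (sorted) IDs
  out[, "Document No" := rleid(out[[IdFld]])]

  # Position: 1, 2, ... within each ID
  out[, Position := seq_len(.N), by = IdFld]

  out[]
}

tbl <- data.table(UniqueID = c(300, 200, 400, 200, 300, 400))
res <- SAPDocPosChr(tbl, "UniqueID")
res
```

Because the function copies its input first, the result has to be assigned (as with res above); the original tbl keeps its single column and its original row order.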

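If you want to go "all in" on data.table's by-reference (in-place) semantics, you can add a couple of set calls (a more direct way of assigning values in a data.table) to add the columns to the original frame. Note: this is consistent with data.table, but not with base R's copy-on-write semantics.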

SAPDocPos <- function(tblSAP,IdFld){
  IdFld <- deparse(substitute(IdFld))
  
  ## Sort entire tblSAP table
  tblSAP <- setorderv(tblSAP,cols=IdFld)
  
  ## create a temp table of IdFld 
  dfUniqueID <- tblSAP[,.SD, .SDcols = IdFld]
  
  ## create "Document No" and "Position" count
  DT1 <- dfUniqueID[, `Document No` := rleid(get(IdFld))][, .N, by = c(IdFld, "Document No")]
  
  ## extend IdFlds with Positions
  DT2 <- dfUniqueID[, seq_len(.N), by = IdFld] |>
    setnames(old = "V1", new = "Position")
  
  ## Join IdFlds, Document Number, and Positions in one table
  DT3 <- DT1[DT2, on = c(IdFld)][,-c("N")]
  
  set(tblSAP, i = NULL, j = "Document No", value = DT3[["Document No"]])
  set(tblSAP, i = NULL, j = "Position", value = DT3[["Position"]])

  tblSAP
}

tblSAP
#    UniqueID
#       <num>
# 1:      200
# 2:      200
# 3:      300
# 4:      300
# 5:      400
# 6:      400
SAPDocPos(tblSAP, UniqueID) # not re-assigning over tblSAP
tblSAP
#    UniqueID Document No Position
#       <num>       <int>    <int>
# 1:      200           1        1
# 2:      200           1        2
# 3:      300           2        1
# 4:      300           2        2
# 5:      400           3        1
# 6:      400           3        2
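
If the in-place behaviour is not wanted for a particular call, one option is to pass a copy. The sketch below uses a deliberately tiny, hypothetical helper (add_flag, not part of the answer) to show the difference between modifying a copy() and modifying the original.

```r
library(data.table)

# Hypothetical helper: adds a column by reference with set()
add_flag <- function(dt) {
  set(dt, j = "flag", value = TRUE)
  invisible(dt)
}

orig <- data.table(UniqueID = c(200, 200, 300))

safe <- copy(orig)   # deep copy: changes to `safe` do not reach `orig`
add_flag(safe)
names_after_copy <- names(orig)   # still only "UniqueID"

add_flag(orig)       # no re-assignment: `orig` itself gains the column
names_after_inplace <- names(orig)

names_after_copy
names_after_inplace
```

The same pattern applies to any function that uses set, := or setorderv on its argument: call it on copy(x) when the caller's table must stay unchanged.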
