Converting a PostgreSQL integer[] array to a numeric list in R

Posted by 7eumitmz on 2023-03-05 in PostgreSQL

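I store the result of a PostgreSQL query as a data.frame in R. One of the "columns" is of type integer[] (an array). The RPostgreSQL package imports it into R as a character string.
How can I convert that string column in the data.frame into a column of numeric lists (or into separate numeric columns)?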

Connect and fetch the data

require(RPostgreSQL)
drv = dbDriver("PostgreSQL")
con = dbConnect(drv, host = ..., port = ..., dbname = ..., user = ..., password = ...)
df = dbGetQuery(con, query_string)
dbDisconnect(con)

Minimal working example

library(dplyr)
# randomized arrays of 10 numbers
set.seed(10)
df = data.frame(id = c(1:10)) %>%
  mutate(arrcol = lapply(id, function(X) sample(1:99, 10, replace=T)),
         arrcol = gsub("c(","{{",arrcol,fixed=T),
         arrcol = gsub(")","}}",arrcol,fixed=T))

Remove the braces

df$arrcol = gsub(fixed=T, "{", "", df$arrcol)
df$arrcol = gsub(fixed=T, "}", "", df$arrcol)

Convert to a list of numerics

# Attempt 1: 
df$arrcol = as.numeric(df$arrcol)
# Error: (list) object cannot be coerced to type 'double'

# Attempt 2:
df$arrcol = lapply(df$arrcol, 
                   function(x) strsplit(x, ",", fixed=T))
# no error, but now the data appears to be stored as a list of character lists: 
# arrcol[1]: list(c("1", "2", "3", "4", "5",...

# Attempt 3:
df$arrcol = lapply(df$arrcol, 
                   function(x) as.numeric(
                     unlist(
                       strsplit(x, ",", fixed=T))
                   )
)
# this one seems to work

Answer #1 (vzgqcmou)

My own best answer:

df$numcol = gsub(fixed=T, "{", "", df$arrcol)
df$numcol = gsub(fixed=T, "}", "", df$numcol)

df$numcol <- lapply(df$numcol, 
                    function(x) as.numeric(
                      unlist(
                        strsplit(x = x, split = ",", fixed=T)
                      )
                    )
)

[Updated to do all the steps in one go]

df$numcol <- lapply(df$arrcol, 
                    function(x) as.numeric(
                      unlist(
                        strsplit(
                          x = gsub("[{]|[}]", "", x), 
                          split = ",", fixed=T))))

Or, equivalently:

df$numcol <- lapply(df$arrcol, 
                    function(x) as.numeric(
                      strsplit(
                          x = gsub("[{]|[}]", "", x), 
                          split = ",", fixed=T)[[1]]
                    )
)
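
The one-step conversion above can also be wrapped in a small helper. The following is only a sketch under stated assumptions: the function name parse_pg_array is hypothetical (it is not part of RPostgreSQL), the sample strings are made up, and it assumes the column arrives as one-dimensional array literals in which PostgreSQL prints SQL NULL elements as the bare word NULL and empty arrays as {}.

```r
# Hypothetical helper (not from the original post): parse one PostgreSQL
# array literal such as "{1,2,3}" into a numeric vector.
parse_pg_array <- function(x) {
  # Strip all braces, then split on commas
  parts <- strsplit(gsub("[{}]", "", x), ",", fixed = TRUE)[[1]]
  # Trim whitespace around each element
  parts <- trimws(parts)
  # Map NULL elements to NA so as.numeric() does not warn
  parts[parts == "NULL"] <- NA
  as.numeric(parts)
}

# Made-up stand-in for the imported column
arrcol <- c("{1,2,3}", "{4,NULL,6}", "{}")
numcol <- lapply(arrcol, parse_pg_array)
```

Because lapply returns a list, rows with different array lengths (including empty arrays) can coexist in the same list column.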

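Alternatively (as long as every array has the same length), you can use the trick from "Splitting a dataframe string column into multiple different columns" to parse the strings into separate columns. Note that read.table is smart enough to recognize each new variable as an integer.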

# strip the braces first so that every field parses as an integer
newdf = read.table(text = gsub("[{]|[}]", "", df$arrcol), header = F, sep = ",")

You can then easily append these as columns of their own to the original data.frame:

df = cbind(df, newdf)

Or, if you know how many new columns will be produced:

# 3:101 assumes 99 elements per array; adjust the range to your array length
df[,3:101] <- read.table(text = gsub("[{]|[}]", "", df$arrcol), 
                         header = F, sep = ",")
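
Since the read.table approach only works when every array has the same length, it may be worth checking that before parsing. Here is a minimal sketch, assuming a made-up stand-in for the imported column; the variable names stripped and n_elems are illustrative only.

```r
# Made-up stand-in for the imported column
arrcol <- c("{1,2,3}", "{4,5,6}", "{7,8,9}")

# Remove the braces from every array literal
stripped <- gsub("[{}]", "", arrcol)

# Number of elements in each array
n_elems <- lengths(strsplit(stripped, ",", fixed = TRUE))

# read.table needs rectangular input, so stop early if the lengths differ
stopifnot(length(unique(n_elems)) == 1)

# One row per array, one integer column per element
newdf <- read.table(text = stripped, header = FALSE, sep = ",")
```

If the check fails, the list-column approach shown earlier in this answer is the safer choice, because it does not require equal lengths.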
