R: Subsetting rows based on the values in a specific column

hfsqlsce, posted on 2023-07-31 in Other

I want to retrieve the rows whose value in the Distance column is between -50,000 and 50,000.

library(data.table)
all.correlated <- setDT(all.correlated)[, .SD[abs(.N) >= 50000], by = Distance]

Result:

Empty data.table (0 rows and 7 cols): Distance,Probe,GeneID,Symbol,Sides,Raw.p...


Input:

all.correlated <- structure(list(Probe = c("cg24800175", "cg08036309", "cg21411366", 
"cg25449950", "cg02155398", "cg03714619"), GeneID = c("ENSG00000179115", 
"ENSG00000002549", "ENSG00000002549", "ENSG00000002549", "ENSG00000171132", 
"ENSG00000171132"), Symbol = c("FARSA", "LAP3", "LAP3", "LAP3", 
"PRKCE", "PRKCE"), Distance = c(-3007, 4041822, -7187580, -7187578, 
717992, 718037), Sides = c("L10", "R10", "L9", "L9", "R1", "R1"
), Raw.p = c(3.89552514236314e-08, 5.19302181010518e-08, 5.19302181010518e-08, 
5.19302181010518e-08, 1.27186058587196e-07, 1.27186058587196e-07
), FDR = c(3.79438408118467e-06, 3.79438408118467e-06, 3.79438408118467e-06, 
3.79438408118467e-06, 3.79438408118467e-06, 3.79438408118467e-06
)), row.names = c(NA, -6L), class = c("data.table", "data.frame"
))

#1 i5desfxk

A tidyverse approach:

library(dplyr)
filter(all.correlated, between(Distance, -50000, 50000))

Or with base R subsetting:

# between() is not part of base R; it comes from dplyr or data.table
all.correlated[between(all.correlated$Distance, -50000, 50000), ]
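
The same filter can also be written with base R operators only, without a helper such as between(). This is a minimal sketch: `df` is a hypothetical stand-in built from the Probe and Distance values in the question, not the full table.

```r
# Hypothetical stand-in data frame (Probe and Distance values from the question)
df <- data.frame(
  Probe = c("cg24800175", "cg08036309", "cg21411366",
            "cg25449950", "cg02155398", "cg03714619"),
  Distance = c(-3007, 4041822, -7187580, -7187578, 717992, 718037)
)

# Logical vector: TRUE where Distance lies within [-50000, 50000]
keep <- df$Distance >= -50000 & df$Distance <= 50000

# Row subsetting with the logical vector; the empty second index keeps all columns
res <- df[keep, ]
```

Both comparisons are inclusive, which matches the default behaviour of between().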

#2 gab6jxml

Filtering is done in the i slot of data.table (see ?`[.data.table`), i.e.:

all.correlated[between(Distance, -50000, 50000)]
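
One detail that may matter at the edges of the range: data.table's between() includes both bounds by default, and its incbounds argument switches that off. A small sketch, where the values in `x` are made up for illustration:

```r
library(data.table)

# Illustrative values: both bounds, one value inside and one outside the range
x <- c(-50000, -3007, 50000, 718037)

# Default (incbounds = TRUE): the bounds themselves count as inside
inclusive <- between(x, -50000, 50000)

# incbounds = FALSE: only values strictly inside the range are kept
exclusive <- between(x, -50000, 50000, incbounds = FALSE)
```

With the question's data the choice makes no difference, since no Distance value sits exactly on a bound.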

There are a few issues with your code:

all.correlated[, .SD[abs(.N) >= 50000], by = Distance]

  1. .N returns the number of rows in each group:
all.correlated[, .N, by = Distance]
#    Distance     N
#       <num> <int>
# 1:    -3007     1
# 2:  4041822     1
# 3: -7187580     1
# 4: -7187578     1
# 5:   717992     1
# 6:   718037     1


So you are essentially asking for all rows whose *absolute count* is at least 50000.

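  2. .SD: from the help file ?.SD:
     ".SD" is a "data.table" containing the Subset of x's Data for each group, excluding any columns used in "by" (or "keyby").

     So you are requesting each group's subset and then row-binding the pieces back together. Compare all.correlated[, .SD, by = Distance] with all.correlated and you will see that all that changes is the order: the grouping column now comes first.
  3. Overall, your code translates to:
     • Count the number of observations per Distance.
     • Return all rows for which the (absolute) number of observations is at least 50000.
     Since every Distance value occurs only once, no group meets that condition, hence the empty result.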
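
The points above can be checked on a small stand-in table. This is a minimal sketch, assuming only the Distance values from the question; the `id` column is hypothetical and added purely so that .SD has something to hold.

```r
library(data.table)

# Stand-in data: Distance values from the question; `id` is a hypothetical column
dt <- data.table(
  id = 1:6,
  Distance = c(-3007, 4041822, -7187580, -7187578, 717992, 718037)
)

# Every Distance value is unique, so each group holds exactly one row
counts <- dt[, .N, by = Distance]

# The original expression compares the group size (always 1 here) with 50000,
# so the condition is FALSE for every group and no rows come back
by_count <- dt[, .SD[abs(.N) >= 50000], by = Distance]

# Filtering on the column itself, in the i slot, keeps the intended row
by_value <- dt[abs(Distance) <= 50000]
```

Here abs(Distance) <= 50000 is just another way of writing the inclusive range check, equivalent to between(Distance, -50000, 50000) for this symmetric interval.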

#3 xcitsw88

I had trouble reproducing your data, but you can use filter() from the dplyr package, like so:
library(dplyr)
your_dataframe <- your_dataframe %>% filter(between(Distance, -50000, 50000))

#4 bprjcwpo

Using [:

library(data.table)

all.correlated[Distance >= -50000 & Distance <= 50000,]
        Probe          GeneID Symbol Distance Sides        Raw.p          FDR
1: cg24800175 ENSG00000179115  FARSA    -3007   L10 3.895525e-08 3.794384e-06

Using subset:

library(data.table)

subset(all.correlated, Distance >= -50000 & Distance <= 50000)
        Probe          GeneID Symbol Distance Sides        Raw.p          FDR
1: cg24800175 ENSG00000179115  FARSA    -3007   L10 3.895525e-08 3.794384e-06
