R: How do I merge two data frames based on multiple columns?

jchrr9hc  posted 9 months ago in Other

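I'm having trouble in R mapping gene symbols from one data frame (df2) onto another (df1) based on their shared chromosome, start position, and end position columns. The startpos and endpos values in df2 fall inside the corresponding intervals in df1. Here is the structure of the two data frames.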

The first data frame (df1):

structure(list(chr = c("1", "1", "1", "2", "2", "2", "2", "2", 
"3", "3", "3", "3", "3", "3", "4", "4", "4", "4", "4", "5", "5", 
"5", "5", "5", "5", "5", "5", "5", "6", "6", "6", "6", "6", "6", 
"6", "6", "7", "7", "7", "7", "7", "8", "8", "8", "8", "9", "9", 
"9", "9", "10", "10", "10", "11", "11", "11", "11", "11", "12", 
"12", "12", "12", "13", "13", "13", "13", "14", "14", "14", "14", 
"15", "16", "16", "16", "16", "17", "17", "17", "17", "17", "18", 
"18", "18", "18", "19", "19", "20", "20", "20", "21", "22", "X", 
"X"), startpos = c(3763769L, 30204151L, 145574212L, 41404L, 79025902L, 
84425655L, 97207752L, 195771938L, 319825L, 53724022L, 81670925L, 
84760199L, 130389220L, 167473864L, 4166887L, 9755086L, 36316146L, 
51848345L, 181522885L, 2788095L, 21585311L, 29848748L, 50371143L, 
72115891L, 94628989L, 107861719L, 142773060L, 167755050L, 549364L, 
8054180L, 36024843L, 44302628L, 63211948L, 93358143L, 106544755L, 
122454050L, 2712235L, 63876731L, 77122341L, 116695594L, 122013344L, 
219366L, 4787599L, 46635389L, 116942766L, 407227L, 61665918L, 
68540505L, 131604834L, 972645L, 42785641L, 58400552L, 4367675L, 
26537294L, 54591798L, 69295669L, 100356152L, 140964L, 38670828L, 
92531096L, 123835317L, 23009501L, 58528741L, 67228207L, 89361193L, 
20002158L, 42528760L, 85298658L, 106377432L, 19964897L, 1586202L, 
46618297L, 64982005L, 71230496L, 156366L, 27079757L, 29571810L, 
34959645L, 55315183L, 196829L, 20979714L, 42004300L, 67512592L, 
7117415L, 29606361L, 96321L, 31760029L, 46583816L, 14163568L, 
17424460L, 312451L, 155774775L), endpos = c(29516595L, 119471365L, 
248917151L, 75517955L, 80604356L, 89027732L, 191836667L, 239800120L, 
46460352L, 77635071L, 81670925L, 126852836L, 164163609L, 193626229L, 
5640434L, 32409374L, 48832879L, 177353681L, 186709835L, 16689924L, 
25911075L, 43609241L, 68226894L, 91476629L, 103201499L, 137946019L, 
163475701L, 175509230L, 8015924L, 31288236L, 41282856L, 56607197L, 
90312920L, 106234115L, 119269213L, 167948331L, 57200843L, 72629090L, 
113084740L, 118192084L, 159138060L, 3950145L, 42318418L, 112643523L, 
140300545L, 38615782L, 61666267L, 127344499L, 133402909L, 38378075L, 
54527809L, 131956064L, 23404921L, 50416083L, 63408895L, 96373497L, 
134381883L, 33426473L, 89523741L, 119857421L, 130036761L, 52820957L, 
61414933L, 85795917L, 110503806L, 39399442L, 81916397L, 100984697L, 
106874951L, 101828980L, 27483556L, 58920764L, 71203799L, 89562821L, 
21210841L, 29565997L, 34636381L, 51633681L, 81715184L, 13884567L, 
36653268L, 64232128L, 77436580L, 19679350L, 58478128L, 25320386L, 
46048325L, 64219694L, 42904255L, 49657199L, 2720458L, 155774775L
)), class = "data.frame", row.names = c(NA, -92L))


The second data frame (df2):

structure(list(hgnc_symbol = c("ERBB2", "PAK1"), chr = c("17", 
"11"), startpos = c(39687914L, 77322017L), endpos = c(39730426L, 
77474635L)), row.names = c(NA, -2L), class = "data.frame")


I have already tried the merge function, but it returns zero rows.

merge(df1, df2, by = c('chr', 'startpos', 'endpos'))


Is there another way to do this mapping?
Thanks

vu8f3i0k  #1

Here is what I tried:

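library(data.table)
# Convert both data frames to data.tables in place
setDT(df1)
setDT(df2)

# Option 1: keep every row of df2 and attach the df1 intervals that overlap it
# (df1 startpos <= df2 endpos and df1 endpos >= df2 startpos)
merged_df1 <- df1[df2, on = .(chr, startpos <= endpos, endpos >= startpos)]
merged_df1

# Option 2: keep every row of df1 and attach the df2 genes that overlap it
# (hgnc_symbol is NA where nothing matches)
merged_df2 <- df2[df1, on = .(chr, startpos <= endpos, endpos >= startpos)]
merged_df2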

You can change the join conditions as needed.
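For example, merge() returned zero rows because it only matches rows whose chr, startpos, and endpos are exactly equal, while here each gene lies inside a larger df1 interval. Below is a minimal sketch of one possible change to the conditions, assuming the goal is strict containment (the df1 interval fully encloses the df2 gene) rather than any overlap. It uses only a four-row subset of the question's df1 so that it runs on its own, and the output column names seg_start, seg_end, gene_start, and gene_end are labels I made up for illustration.

```r
library(data.table)

# Subset of the question's data (chr 11 and chr 17 rows of df1, df2 as posted),
# built directly as data.tables so the example is self-contained.
df1 <- data.table(
  chr      = c("11", "11", "17", "17"),
  startpos = c(54591798L, 69295669L, 29571810L, 34959645L),
  endpos   = c(63408895L, 96373497L, 34636381L, 51633681L)
)
df2 <- data.table(
  hgnc_symbol = c("ERBB2", "PAK1"),
  chr         = c("17", "11"),
  startpos    = c(39687914L, 77322017L),
  endpos      = c(39730426L, 77474635L)
)

# Containment join: df1 startpos <= df2 startpos and df1 endpos >= df2 endpos.
# Inside on = .(), the left-hand name is a df1 column, the right-hand name a df2 column.
# The x. and i. prefixes pick the df1 and df2 versions of a column explicitly.
hits <- df1[df2,
  .(chr,
    seg_start  = x.startpos, seg_end  = x.endpos,
    hgnc_symbol,
    gene_start = i.startpos, gene_end = i.endpos),
  on = .(chr, startpos <= startpos, endpos >= endpos),
  nomatch = NULL  # drop genes that fall in no interval
]
hits

# Alternatively, write the symbol straight into df1 (update join).
# Rows without a matching gene get NA; if several genes fell in one
# interval, only one of them would be kept.
df1[df2,
  hgnc_symbol := i.hgnc_symbol,
  on = .(chr, startpos <= startpos, endpos >= endpos)
]
df1
```

With this subset, hits has one row per gene, and df1 gains an hgnc_symbol column that is filled only for the two intervals that contain a gene. Note that nomatch = NULL needs a reasonably recent data.table; on older versions nomatch = 0L does the same thing.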
Please let me know if this works for you.
