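I'm having trouble in R mapping gene symbols from one data frame (df2) to another (df1) based on their shared chromosome, start position, and end position columns. The startpos and endpos values in df2 fall within the corresponding intervals in df1. Here are the structures of the two data frames.
First data frame (df1):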
structure(list(chr = c("1", "1", "1", "2", "2", "2", "2", "2",
"3", "3", "3", "3", "3", "3", "4", "4", "4", "4", "4", "5", "5",
"5", "5", "5", "5", "5", "5", "5", "6", "6", "6", "6", "6", "6",
"6", "6", "7", "7", "7", "7", "7", "8", "8", "8", "8", "9", "9",
"9", "9", "10", "10", "10", "11", "11", "11", "11", "11", "12",
"12", "12", "12", "13", "13", "13", "13", "14", "14", "14", "14",
"15", "16", "16", "16", "16", "17", "17", "17", "17", "17", "18",
"18", "18", "18", "19", "19", "20", "20", "20", "21", "22", "X",
"X"), startpos = c(3763769L, 30204151L, 145574212L, 41404L, 79025902L,
84425655L, 97207752L, 195771938L, 319825L, 53724022L, 81670925L,
84760199L, 130389220L, 167473864L, 4166887L, 9755086L, 36316146L,
51848345L, 181522885L, 2788095L, 21585311L, 29848748L, 50371143L,
72115891L, 94628989L, 107861719L, 142773060L, 167755050L, 549364L,
8054180L, 36024843L, 44302628L, 63211948L, 93358143L, 106544755L,
122454050L, 2712235L, 63876731L, 77122341L, 116695594L, 122013344L,
219366L, 4787599L, 46635389L, 116942766L, 407227L, 61665918L,
68540505L, 131604834L, 972645L, 42785641L, 58400552L, 4367675L,
26537294L, 54591798L, 69295669L, 100356152L, 140964L, 38670828L,
92531096L, 123835317L, 23009501L, 58528741L, 67228207L, 89361193L,
20002158L, 42528760L, 85298658L, 106377432L, 19964897L, 1586202L,
46618297L, 64982005L, 71230496L, 156366L, 27079757L, 29571810L,
34959645L, 55315183L, 196829L, 20979714L, 42004300L, 67512592L,
7117415L, 29606361L, 96321L, 31760029L, 46583816L, 14163568L,
17424460L, 312451L, 155774775L), endpos = c(29516595L, 119471365L,
248917151L, 75517955L, 80604356L, 89027732L, 191836667L, 239800120L,
46460352L, 77635071L, 81670925L, 126852836L, 164163609L, 193626229L,
5640434L, 32409374L, 48832879L, 177353681L, 186709835L, 16689924L,
25911075L, 43609241L, 68226894L, 91476629L, 103201499L, 137946019L,
163475701L, 175509230L, 8015924L, 31288236L, 41282856L, 56607197L,
90312920L, 106234115L, 119269213L, 167948331L, 57200843L, 72629090L,
113084740L, 118192084L, 159138060L, 3950145L, 42318418L, 112643523L,
140300545L, 38615782L, 61666267L, 127344499L, 133402909L, 38378075L,
54527809L, 131956064L, 23404921L, 50416083L, 63408895L, 96373497L,
134381883L, 33426473L, 89523741L, 119857421L, 130036761L, 52820957L,
61414933L, 85795917L, 110503806L, 39399442L, 81916397L, 100984697L,
106874951L, 101828980L, 27483556L, 58920764L, 71203799L, 89562821L,
21210841L, 29565997L, 34636381L, 51633681L, 81715184L, 13884567L,
36653268L, 64232128L, 77436580L, 19679350L, 58478128L, 25320386L,
46048325L, 64219694L, 42904255L, 49657199L, 2720458L, 155774775L
)), class = "data.frame", row.names = c(NA, -92L))
Second data frame (df2):
structure(list(hgnc_symbol = c("ERBB2", "PAK1"), chr = c("17",
"11"), startpos = c(39687914L, 77322017L), endpos = c(39730426L,
77474635L)), row.names = c(NA, -2L), class = "data.frame")
I have tried the merge function, but it returns zero rows.
merge(df1, df2, by = c('chr', 'startpos', 'endpos'))
I would like to know whether there is another way to do this mapping.
Thanks
1 Answer
Here is what I tried:
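One way to do this in base R is sketched below. It assumes the goal is to label each df1 segment with any df2 gene that lies entirely inside it on the same chromosome. merge on all three columns returns nothing because it requires exact equality of startpos and endpos, whereas here the gene coordinates only fall inside the segment. To keep it short, only four rows of df1 are reproduced (chromosomes 11 and 17), and `pairs`, `hits`, and `result` are illustrative names.

```r
# Small stand-ins for the question's data: df1 holds segments, df2 holds genes.
df1 <- data.frame(
  chr      = c("11", "11", "17", "17"),
  startpos = c(69295669L, 100356152L, 34959645L, 55315183L),
  endpos   = c(96373497L, 134381883L, 51633681L, 81715184L)
)
df2 <- data.frame(
  hgnc_symbol = c("ERBB2", "PAK1"),
  chr         = c("17", "11"),
  startpos    = c(39687914L, 77322017L),
  endpos      = c(39730426L, 77474635L)
)

# Pair every segment with every gene on the same chromosome.
# df1 columns keep their names; df2 coordinates get the "_gene" suffix.
pairs <- merge(df1, df2, by = "chr", suffixes = c("", "_gene"))

# Keep only the pairs where the gene lies entirely inside the segment.
hits <- pairs[pairs$startpos <= pairs$startpos_gene &
              pairs$endpos   >= pairs$endpos_gene,
              c("chr", "startpos", "endpos", "hgnc_symbol")]

# Left-join back onto df1 so segments without a gene keep NA in hgnc_symbol.
result <- merge(df1, hits, by = c("chr", "startpos", "endpos"), all.x = TRUE)
print(result)
```

The intermediate merge by chr builds every segment/gene pair per chromosome, which is fine at this size but can grow large for big inputs.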
You can change the conditions as needed.
Please let me know if this works for you.