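I've been struggling with a join between two data frames, X and Y. X is a list of industries, their sub-industries and the dollars spent in each; Y is an index that matches the same industries to codes: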
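```r
library(dplyr)  # left_join() and join_by(); join_by() needs dplyr >= 1.1.0

IND     <- c("Ag", "Ag", "Total Ag", "Min", "Min", "Min", "Total Min")
SubIND  <- c("agriculture", "aquaculture", "Total", "gold", "copper", "zinc", "Total")
Dollars <- sample(1:100, 7)  # random spend per row
INDcode <- c("A", "B", "C", "D", "E", "G", "H", "M", "R", "Y", "Z")
INDi    <- c("Ag", "Bar", "Car", "Don", "Ec", "Gl", "Hu", "Min", "Run", "Yt", "Zal")

X <- data.frame(IND, SubIND, Dollars)  # spend by industry and sub-industry
Y <- data.frame(INDi, INDcode)         # industry name -> code lookup
join <- left_join(X, Y, by = join_by(IND == INDi))
```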
```
        IND      SubIND Dollars INDcode
1        Ag agriculture       4       A
2        Ag aquaculture      63       A
3  Total Ag       Total      35    <NA>
4       Min        gold      68       M
5       Min      copper      14       M
6       Min        zinc      80       M
7 Total Min       Total      48    <NA>
```
"Total" pops up throughout the data frame, and I'm wondering whether there is a way to do the join so that, e.g., both "Min" and "Total Min" end up with INDcode "M".
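One way to get there without fuzzy matching is to normalise the key before the lookup: strip a leading `"Total "` from `IND`, then match against `INDi`. A minimal base-R sketch, assuming every total row is labelled exactly `"Total <IND>"` and that `INDi` values are unique; the `key` vector is a hypothetical helper, and `Dollars` is fixed to the values printed above so the result is reproducible.

```r
# Data from the question, with Dollars fixed to the printed values
X <- data.frame(
  IND     = c("Ag", "Ag", "Total Ag", "Min", "Min", "Min", "Total Min"),
  SubIND  = c("agriculture", "aquaculture", "Total", "gold", "copper", "zinc", "Total"),
  Dollars = c(4, 63, 35, 68, 14, 80, 48)
)
Y <- data.frame(
  INDi    = c("Ag", "Bar", "Car", "Don", "Ec", "Gl", "Hu", "Min", "Run", "Yt", "Zal"),
  INDcode = c("A", "B", "C", "D", "E", "G", "H", "M", "R", "Y", "Z")
)

# Hypothetical helper key: drop a leading "Total " so "Total Min" becomes "Min"
key <- sub("^Total ", "", X$IND)

# Left join by lookup: match() gives NA where no industry matches,
# so unmatched rows of X are kept, as with left_join()
X$INDcode <- Y$INDcode[match(key, Y$INDi)]
print(X)
```

With dplyr the same idea would be `X %>% mutate(key = sub("^Total ", "", IND)) %>% left_join(Y, by = join_by(key == INDi))`.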
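There are few enough of these in my df that I could realistically fix them by hand, or compute a sum for each code and replace the total rows entirely, but I wonder whether anyone has ideas for doing this more cleanly?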
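The second option, dropping the total rows and recomputing one sum per code, could look like this in base R. It is a sketch assuming total rows are exactly those with `SubIND == "Total"` and that the remaining rows already carry their `INDcode`; the `detail` and `totals` names are hypothetical.

```r
# Non-total rows with their codes attached (values from the question's output)
detail <- data.frame(
  IND     = c("Ag", "Ag", "Min", "Min", "Min"),
  SubIND  = c("agriculture", "aquaculture", "gold", "copper", "zinc"),
  Dollars = c(4, 63, 68, 14, 80),
  INDcode = c("A", "A", "M", "M", "M")
)

# One recomputed total per industry code, replacing the original "Total" rows
totals <- aggregate(Dollars ~ INDcode, data = detail, FUN = sum)
print(totals)
```

Because the question's `Dollars` come from `sample()`, the recomputed totals will not equal the printed "Total" rows; with real data they should agree if the totals are consistent.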
I've been looking at the fuzzyjoin package, but can't figure out how to make it work for this task!
Thanks!
1 Answer
You can perform a `fuzzy` join:

Created on 2023-06-13 with reprex v2.0.2
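A fuzzy join here means matching `IND` against `INDi` as a pattern instead of requiring equality. The sketch below does that matching in base R rather than with the fuzzyjoin package, so it illustrates the idea and is not necessarily the answerer's code. It assumes the `INDi` values contain no regex metacharacters and that a total row always ends with its industry name; `patterns` and `hit` are hypothetical names.

```r
X <- data.frame(
  IND     = c("Ag", "Ag", "Total Ag", "Min", "Min", "Min", "Total Min"),
  SubIND  = c("agriculture", "aquaculture", "Total", "gold", "copper", "zinc", "Total"),
  Dollars = c(4, 63, 35, 68, 14, 80, 48)
)
Y <- data.frame(
  INDi    = c("Ag", "Bar", "Car", "Don", "Ec", "Gl", "Hu", "Min", "Run", "Yt", "Zal"),
  INDcode = c("A", "B", "C", "D", "E", "G", "H", "M", "R", "Y", "Z")
)

# Each INDi becomes a pattern that must be the whole IND value or its last word,
# so "Min" matches "Min" and "Total Min" but not, say, "Mining"
patterns <- paste0("(^| )", Y$INDi, "$")

# For each row of X, take the first Y row whose pattern matches (NA if none):
# a left join on a pattern match rather than on equality
hit <- vapply(X$IND, function(v) {
  idx <- which(vapply(patterns, grepl, logical(1), x = v))
  if (length(idx) == 0) NA_integer_ else as.integer(idx[1])
}, integer(1), USE.NAMES = FALSE)

X$INDcode <- Y$INDcode[hit]
print(X)
```

If fuzzyjoin is installed, its `regex_left_join()` treats a column of `y` as regular expressions in a similar way; check its documentation for the exact `by` semantics before relying on it.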