Finding the intersection of two different DataFrames

juzqafwq  posted on 2022-09-21  in Other

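I'm trying to find the intersection (shared elements) of these two datasets, but when I run my code it doesn't return the intersection. The two should not have identical rows, yet the result comes back as if everything matched. I checked on a text-comparison site: the two lists share some elements but do not match exactly. Here is my code: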

import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
import matplotlib.pyplot as plt

nofilt_0 = ['LINC02295', 'AL596202.1', 'TAS2R5', 'NOG', 'AL161644.1', 'AL163932.1', 'PIGY-DT', 'LINC01193', 'PLA2G4C', 'GLIS3', 'AC079779.1', 'LINC00501', 'NPAS2', 'SIAH3', 'AC118658.1', 'Z98745.2', 'AC007785.3', 'WARS2-IT1', 'TMEM132E', 'GPC2', 'AL132801.1', 'SLC22A17', 'AC092115.2', 'MMP28', 'SORCS3', 'LINC02714', 'AADACL2-AS1', 'AP000345.2', 'AC138819.1', 'AC099066.1', 'NUPR2', 'AC090018.2', 'DSC1', 'GOLGA6L7', 'AL390866.1', 'CCR7', 'AC112254.1', 'SLC22A31', 'AL442003.1', 'ME1', 'AC013270.1', 'SLC22A10', 'AL449214.1', 'AC022023.2', 'AC096719.1', 'DBX1', 'SSTR3', 'LRRN3', 'FGF17', 'CAV3', 'AL117350.1', 'SEMA3A', 'GABRB3', 'AC008686.1', 'CNNM1', 'SV2B', 'AC001226.1', 'LMOD2', 'DACT1', 'AKR1C1', 'AC129507.1', 'SGCG', 'CHRNG', 'AL109936.2', 'AL157936.1', 'EPAS1', 'AC093599.1', 'FAM83F', 'AP000238.1', 'AC117500.1', 'AC022113.1', 'CGN', 'ACTN1-AS1', 'AC004067.1', 'F11', 'AC007786.1', 'ADTRP', 'AL035252.2', 'CAPSL', 'AC018809.1', 'AC022445.1', 'AC069287.3', 'AL391097.1', 'ASB9', 'PRRT1', 'AC104463.2', 'AL118511.3', 'AL590999.1', 'AC090515.6', 'AC019080.5', 'IGLV1-36', 'A1CF', 'MYO1F', 'AL512452.1', 'ASIC2', 'AC008496.2', 'TEAD3', 'TMEM273', 'DMRTC1', 'LAMC3', 'RASGEF1A', 'AC005899.5', 'PNMA8B', 'AC021148.2', 'AL020997.5', 'SHCBP1L', 'AC069029.1', 'AC092068.2', 'PRR5L', 'AL732437.2', 'SLAMF7', 'LCTL', 'BFSP1', 'AC023794.2', 'GJC3', 'RSPH10B', 'CA6', 'ZNF662', 'AL138686.1', 'RND2', 'FAM242C', 'ANXA2', 'SLC22A13', 'FAM181B', 'IL10', 'AL662907.3', 'ASCL2', 'AC004988.1', 'TUSC1', 'AC109460.4', 'DGKB', 'AC009244.2', 'ACVR2B-AS1', 'PKP3', 'NIBAN1', 'NOXO1', 'MSLNL', 'AP001627.1', 'REG4', 'AC092794.1', 'AC097641.2', 'ALOX15', 'AC139491.4', 'ADAMTSL2', 'DDC', 'AL049557.1', 'RNF144A-AS1', 'IFITM10', 'AC009053.3', 'RAB9B', 'RAI2', 'AC103563.9', 'GIMD1', 'LINC01030', 'ADGRA3', 'AC074044.1', 'AL133445.3', 'CD58', 'AC004477.1', 'IL12A-AS1', 'AC089983.1', 'AC024588.1', 'MTTP', 'TRAJ11', 'AL161729.2', 'CASR', 'AADACL2', 'AL133387.1', 'AL807776.1', 'FREM3', 'AL021918.3', 'AL845472.1', 
'AC009041.2', 'IGKV2-29', 'AL359764.3']
dub_0 =['DUSP4', 'AC017002.3', 'LINC02804', 'GML', 'GNAO1', 'AC025946.1', 'TGFBR3L', 'EDN3', 'TMEM200A', 'MSC-AS1', 'ZP1', 'MSC', 'AC090515.6', 'LINC00892', 'HAS3', 'PLSCR2', 'SLC44A3', 'AC013460.1', 'CNTNAP1', 'TMPRSS11E', 'LINC01229', 'WDR17', 'TNFRSF18', 'LINC02295', 'EFNA5', 'AC074366.1', 'AL391056.1', 'ADAMTSL4', 'EPGN', 'APOBEC3H', 'IRAIN', 'CXCR6', 'LINC01498', 'WDR86-AS1', 'TSSK6', 'A1CF', 'CPHXL', 'IGLV7-46', 'AC087473.1', 'AC096577.1', 'AL590550.1', 'TP63', 'AC016866.1']

df = pd.DataFrame(nofilt_0, columns =['markers'])
df1 = pd.DataFrame(dub_0, columns = ['markers'])
int_df = pd.merge(df,df1, how='inner') # intersection
df1[~df1.isin(df)] #not intersecting

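I don't know what is going wrong, or whether my whole code is wrong... Any help is greatly appreciated, thank you. Note: the real nofilt_0 and dub_0 lists are much larger (23192 rows).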

jdgnovmf #1

It is best to use built-in methods for this. With your df and df1 DataFrames, it looks like this:

set(df['markers']).intersection(set(df1['markers']))
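
As a possible explanation of the symptom in the question: when `DataFrame.isin` is given another DataFrame, it compares cells that share the same index and column labels instead of testing membership anywhere in the column. So `df1.isin(df)` checks row 0 against row 0, row 1 against row 1, and so on. With the markers in a different order almost nothing lines up, the negated mask is True nearly everywhere, and every row looks "not intersecting". Below is a minimal sketch on a small made-up subset of the markers (the shortened lists are an assumption for illustration, not the real data). It uses `Series.isin` for membership and `merge(..., indicator=True)` to label where each marker came from.

```python
import pandas as pd

# Small illustrative subset of the question's markers (not the full lists).
df = pd.DataFrame({"markers": ["LINC02295", "NOG", "A1CF", "AC090515.6"]})
df1 = pd.DataFrame({"markers": ["DUSP4", "AC090515.6", "LINC02295", "A1CF"]})

# DataFrame.isin(DataFrame) compares position by position (matching index
# and column labels), so shared markers in different rows are not detected.
aligned_mask = df1.isin(df)

# Series.isin tests membership against all values of the other column.
in_both = df1["markers"].isin(df["markers"])
shared = df1[in_both]          # rows of df1 whose marker also occurs in df
only_in_df1 = df1[~in_both]    # rows of df1 whose marker does not occur in df

# An outer merge with indicator=True adds a "_merge" column whose values are
# "both", "left_only" (only in df) or "right_only" (only in df1).
merged = df.merge(df1, on="markers", how="outer", indicator=True)

print(shared["markers"].tolist())
print(only_in_df1["markers"].tolist())
print(merged)
```

The `pd.merge(df, df1, how='inner')` call from the question should already return the shared markers, since it joins on the common `markers` column; it is the `df1[~df1.isin(df)]` line that needs the `Series.isin` form.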
