MATLAB: How do I extract the molecules in an SDF file whose IDs are listed in a separate text file into a new SDF file?

ehxuflar  posted on 2022-11-15  in Matlab

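I have an SDF file containing thousands of molecules and several text files of IDs, each grouping molecules by some characteristic. Currently, I have a script that loads a CSV database of molecular characteristics and generates the ID text files by classifying the molecules on those characteristics. I want to use these text files to parse the SDF file and obtain new SDF files containing the corresponding molecules. I would also like to do this in MATLAB.
For example, here are some of the molecules in the original SDF file: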

NCGC00178831-03
  Marvin  07111412562D          

 34 37  0  0  0  0            999 V2000
    4.8814   -2.7443    0.0000 Cl  0  5  0  0  0  0  0  0  0  0  0  0
    2.8647   -2.4751    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8647   -1.6501    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
    3.5808   -1.2318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2970   -1.6501    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.0017   -1.2318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7179   -1.6501    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    5.0017   -0.4068    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2970    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5808   -0.4068    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8647    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1485   -0.4068    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1485   -1.2318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4324   -1.6501    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7162   -1.2318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -1.6501    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.7162   -0.4068    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4324    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8761   -3.5407    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.5923   -3.9590    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.3084   -3.5407    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.0132   -3.9590    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7293   -3.5407    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    5.0132   -4.7840    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.3084   -5.1908    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5923   -4.7840    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8761   -5.1908    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1599   -4.7840    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1599   -3.9590    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4438   -3.5407    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7276   -3.9590    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0115   -3.5407    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.7276   -4.7840    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4438   -5.1908    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  3 13  1  0  0  0  0
  4  5  1  0  0  0  0
  4 10  1  0  0  0  0
  5  6  2  0  0  0  0
  6  7  1  0  0  0  0
  6  8  1  0  0  0  0
  8  9  2  0  0  0  0
  9 10  1  0  0  0  0
 10 11  2  0  0  0  0
 11 12  1  0  0  0  0
 12 13  2  0  0  0  0
 12 18  1  0  0  0  0
 13 14  1  0  0  0  0
 14 15  2  0  0  0  0
 15 16  1  0  0  0  0
 15 17  1  0  0  0  0
 17 18  2  0  0  0  0
 19 20  2  0  0  0  0
 19 29  1  0  0  0  0
 20 21  1  0  0  0  0
 20 26  1  0  0  0  0
 21 22  2  0  0  0  0
 22 23  1  0  0  0  0
 22 24  1  0  0  0  0
 24 25  2  0  0  0  0
 25 26  1  0  0  0  0
 26 27  2  0  0  0  0
 27 28  1  0  0  0  0
 28 29  2  0  0  0  0
 28 34  1  0  0  0  0
 29 30  1  0  0  0  0
 30 31  2  0  0  0  0
 31 32  1  0  0  0  0
 31 33  1  0  0  0  0
 33 34  2  0  0  0  0
M  CHG  2   1  -1   3   1
M  END
>  <Formula>
C27H25ClN6

>  <FW>
468.9806 (35.4535+224.2805+209.2465)

>  <DSSTox_CID>
25848

>  <SR-HSE>
0

$$$$
NCGC00166114-03
  Marvin  07111412562D          

 31 32  0  0  0  0            999 V2000
    4.9884   -1.2417    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9884   -2.0696    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2748   -2.4764    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2748   -3.7038    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9884   -4.1178    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7021   -3.7038    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.4157   -4.1178    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    5.7021   -2.8760    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.9884   -4.9385    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2748   -5.3524    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5612   -4.9385    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5612   -4.1178    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5612   -2.0696    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5612   -1.2417    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2748   -0.8279    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.8403   -0.8279    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1267   -1.2417    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1267   -2.0696    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8403   -2.4764    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4202   -2.4764    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
    1.4202   -0.8279    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    2.8403    0.0000    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
    5.7021   -2.4764    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.4229   -2.0696    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.4229   -1.2417    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7021   -0.8279    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7021    0.0000    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
    7.1366   -0.8279    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    7.1366   -2.4764    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
    7.0866   -4.1963    0.0000 Na  0  3  0  0  0  0  0  0  0  0  0  0
    0.0000   -0.7708    0.0000 Na  0  3  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1 15  1  0  0  0  0
  1 26  2  0  0  0  0
  2  3  2  0  0  0  0
  2 23  1  0  0  0  0
  3  4  1  0  0  0  0
  3 13  1  0  0  0  0
  4  5  2  0  0  0  0
  4 12  1  0  0  0  0
  5  6  1  0  0  0  0
  5  9  1  0  0  0  0
  6  7  1  0  0  0  0
  6  8  2  0  0  0  0
  9 10  2  0  0  0  0
 10 11  1  0  0  0  0
 11 12  2  0  0  0  0
 13 14  2  0  0  0  0
 13 19  1  0  0  0  0
 14 15  1  0  0  0  0
 14 16  1  0  0  0  0
 16 17  2  0  0  0  0
 16 22  1  0  0  0  0
 17 18  1  0  0  0  0
 17 21  1  0  0  0  0
 18 19  2  0  0  0  0
 18 20  1  0  0  0  0
 23 24  2  0  0  0  0
 24 25  1  0  0  0  0
 24 29  1  0  0  0  0
 25 26  1  0  0  0  0
 25 28  2  0  0  0  0
 26 27  1  0  0  0  0
M  CHG  4   7  -1  21  -1  30   1  31   1
M  END
>  <Formula>
C20H6Br4Na2O5

>  <FW>
691.8542 (645.8757+22.9892+22.9892)

>  <DSSTox_CID>
5234

>  <SR-HSE>
0

$$$$
NCGC00263563-01
  Marvin  07111412562D          

 71 76  0  0  1  0            999 V2000
    2.1953   -4.9878    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    3.6803   -4.9878    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    2.9701   -5.4074    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.5858   -4.9878    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    5.1008   -4.9878    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    2.1953   -4.1484    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   11.8157   -5.6335    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   14.1239   -5.8755    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   11.0893   -5.1008    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    3.6803   -4.1484    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   10.2015   -5.1008    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   12.5905   -5.1653    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   14.9633   -5.8755    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    4.3905   -5.4074    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    5.8755   -5.4074    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.9701   -3.6803    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   11.4606   -4.3905    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   13.6558   -5.1653    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    9.5559   -5.5043    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    7.2476   -5.5043    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    5.1008   -4.1484    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    1.4850   -5.4074    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.8157   -2.4858    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.9578   -4.9878    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    6.5858   -4.1484    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.5905   -2.9055    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   12.3483   -4.3905    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.8157   -1.6626    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.8755   -3.6803    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   13.3008   -1.6626    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   12.5905   -1.2429    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   13.3008   -2.4858    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    8.8457   -4.9878    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   11.4606   -3.1961    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   14.1239   -4.5035    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    0.7748   -4.9878    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.4314   -5.2137    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   14.9633   -4.5035    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.9756   -4.2776    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -5.4074    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    7.6673   -4.2776    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1953   -5.7464    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.8764   -4.2776    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.0877   -4.2776    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7748   -4.1484    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   14.5437   -6.4567    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.6803   -3.3736    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.9701   -2.9055    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    5.8755   -2.9055    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   14.0110   -1.2429    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   12.5905   -0.4197    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.4850   -3.6803    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.5444   -6.4082    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.5566   -4.3905    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.3905   -6.1177    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5035   -3.7933    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.1838   -4.2776    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   14.0110   -2.9055    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   13.6558   -3.7449    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   16.1416   -5.2137    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2130   -2.9701    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1953   -2.3729    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   14.7858   -1.6626    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   13.3008    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.0893   -5.8755    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   12.5905   -5.9885    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    8.8941   -5.7464    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.6803   -5.7464    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.1008   -5.7464    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   13.6558   -5.9885    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.4681   -6.7634    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
  1  3  1  0  0  0  0
  1  6  1  0  0  0  0
  1 22  1  6  0  0  0
  1 42  1  1  0  0  0
  2  3  1  0  0  0  0
  2 14  1  0  0  0  0
  2 68  1  1  0  0  0
  2 10  1  0  0  0  0
  4 15  1  0  0  0  0
  4 20  1  1  0  0  0
  4 43  1  0  0  0  0
  4 25  1  0  0  0  0
  5 14  1  0  0  0  0
  5 15  1  0  0  0  0
  5 21  1  0  0  0  0
  5 69  1  1  0  0  0
  6 16  1  0  0  0  0
  6 52  1  1  0  0  0
  7  9  1  0  0  0  0
  7 12  1  0  0  0  0
  8 18  1  0  0  0  0
  8 13  1  0  0  0  0
  9 11  1  0  0  0  0
  9 17  1  0  0  0  0
  9 65  1  6  0  0  0
 10 16  1  0  0  0  0
 10 47  1  1  0  0  0
 11 19  1  0  0  0  0
 11 54  1  6  0  0  0
 11 39  1  0  0  0  0
 12 18  1  0  0  0  0
 12 66  1  1  0  0  0
 12 27  1  0  0  0  0
 13 46  1  1  0  0  0
 13 53  1  6  0  0  0
 13 37  1  0  0  0  0
 14 55  1  1  0  0  0
 16 48  1  6  0  0  0
 17 27  1  0  0  0  0
 17 34  1  1  0  0  0
 18 35  1  0  0  0  0
 18 70  1  1  0  0  0
 19 33  1  0  0  0  0
 20 24  1  0  0  0  0
 21 29  1  0  0  0  0
 21 56  1  6  0  0  0
 22 36  1  0  0  0  0
 23 34  1  0  0  0  0
 23 26  1  0  0  0  0
 23 28  1  0  0  0  0
 24 33  1  0  0  0  0
 24 57  1  6  0  0  0
 24 41  1  0  0  0  0
 25 29  1  0  0  0  0
 26 32  1  0  0  0  0
 28 31  1  0  0  0  0
 29 49  1  1  0  0  0
 30 31  1  0  0  0  0
 30 50  1  1  0  0  0
 30 32  1  0  0  0  0
 31 51  1  6  0  0  0
 32 58  1  6  0  0  0
 33 44  1  0  0  0  0
 33 67  1  6  0  0  0
 35 38  1  0  0  0  0
 35 59  1  1  0  0  0
 36 40  1  0  0  0  0
 36 45  2  0  0  0  0
 37 38  1  0  0  0  0
 37 60  1  1  0  0  0
 39 44  1  0  0  0  0
 41 43  1  0  0  0  0
 47 61  1  0  0  0  0
 48 62  1  0  0  0  0
 50 63  1  0  0  0  0
 51 64  1  0  0  0  0
M  CHG  2  40  -1  71   1
M  END
>  <Formula>
C47H83NO17

>  <FW>
934.1584 (916.1205+18.0379)

>  <DSSTox_CID>
28909

>  <SR-HSE>
0

$$$$

Here are some of the IDs in one of the text files:

NCGC00015959-03
NCGC00168261-01
NCGC00257010-01
NCGC00254654-01
NCGC00254471-01

The resulting SDF file should look like this:

NCGC00015959-03
  Marvin  07111412562D          

 25 30  0  0  0  0            999 V2000
    3.4098   -1.3130    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
    4.8329   -1.3130    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.4098   -2.1380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.1248   -2.5436    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.6948   -2.5436    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.8329   -2.1380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.1248   -0.8937    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.5547   -0.8937    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9799   -2.1380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.6948   -3.3548    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2718   -2.5436    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2718   -3.3548    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.1248   -3.3548    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9799   -3.7741    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.5547   -2.5436    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.2765   -1.3130    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7128   -0.0894    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.4881   -2.2755    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.4881   -3.6160    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.8746   -0.7562    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.5378    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -2.9423    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.4098   -3.7741    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.2765   -2.1380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.6948   -0.8937    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  3  1  0  0  0  0
  1  7  2  0  0  0  0
  1 25  1  0  0  0  0
  2  7  1  0  0  0  0
  2  6  2  0  0  0  0
  2  8  1  0  0  0  0
  3  4  2  0  0  0  0
  3  5  1  0  0  0  0
  4 13  1  0  0  0  0
  4  6  1  0  0  0  0
  5  9  1  0  0  0  0
  5 10  2  0  0  0  0
  6 15  1  0  0  0  0
  8 16  2  0  0  0  0
  8 17  1  0  0  0  0
  9 11  2  0  0  0  0
 10 14  1  0  0  0  0
 10 23  1  0  0  0  0
 11 18  1  0  0  0  0
 11 12  1  0  0  0  0
 12 14  2  0  0  0  0
 12 19  1  0  0  0  0
 13 23  2  0  0  0  0
 15 24  2  0  0  0  0
 16 20  1  0  0  0  0
 16 24  1  0  0  0  0
 17 21  1  0  0  0  0
 18 22  1  0  0  0  0
 19 22  1  0  0  0  0
 20 21  1  0  0  0  0
M  CHG  1   1   1
M  END
>  <Formula>
C20H14NO4

>  <FW>
332.3289

>  <DSSTox_CID>
25204

>  <NR-AR>
0

>  <NR-ER-LBD>
1

>  <NR-AhR>
1

$$$$
NCGC00168261-01
  Marvin  07111412562D          

 23 25  0  0  0  0            999 V2000
    2.1236   -2.4895    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4205   -2.0662    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1236   -3.3074    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4205   -3.7235    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.7174   -2.4895    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7174   -3.3074    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8554   -2.0662    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -2.0662    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4205   -1.2412    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8554   -3.7235    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5656   -2.4895    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5656   -3.3074    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8554   -1.2412    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.7174   -0.8251    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -1.2412    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0430   -2.8984    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7174   -4.1324    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2902   -3.7378    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7174    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.0292   -3.3145    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.4569   -3.3360    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7538   -3.7378    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.1743   -3.7378    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  2  0  0  0  0
  1  7  1  0  0  0  0
  2  5  2  0  0  0  0
  2  9  1  0  0  0  0
  3  4  1  0  0  0  0
  3 10  1  0  0  0  0
  4  6  1  0  0  0  0
  5  8  1  0  0  0  0
  5  6  1  0  0  0  0
  6 16  1  0  0  0  0
  6 17  1  0  0  0  0
  7 11  2  0  0  0  0
  7 13  1  0  0  0  0
  8 15  2  0  0  0  0
  9 14  2  0  0  0  0
 10 12  2  0  0  0  0
 11 12  1  0  0  0  0
 12 18  1  0  0  0  0
 14 15  1  0  0  0  0
 14 19  1  0  0  0  0
 18 20  1  0  0  0  0
 20 22  1  0  0  0  0
 21 22  1  0  0  0  0
 21 23  1  0  0  0  0
M  END
>  <Formula>
C21H26O2

>  <FW>
310.4299

>  <DSSTox_CID>
28922

>  <NR-AR>
0

>  <NR-AhR>
1

>  <SR-MMP>
1

$$$$
NCGC00257010-01
  Marvin  07111412562D          

 35 37  0  0  0  0            999 V2000
    2.0286   -3.5779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.0019   -7.8578    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.0019   -0.7019    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8589   -3.5779    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.6092   -2.8589    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.6092   -4.2799    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.2784   -4.2799    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    6.5825   -7.1217    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.5825   -1.4381    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.3681   -3.5779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5024   -3.5779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5024   -4.9989    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.0915   -4.2799    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.3412   -3.5779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.3412   -4.9989    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7704   -4.2799    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7704   -2.8589    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.7294   -1.1385    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    6.2829   -0.2996    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    7.7294   -7.4213    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    7.4384   -8.5597    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    6.2829   -8.2601    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    7.4384    0.0000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    7.0019   -2.1485    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.0019   -6.4112    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7607   -1.4381    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7607   -7.1217    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7607   -5.7008    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7607   -2.8589    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.5825   -5.7008    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.5825   -2.8589    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.3412   -6.4112    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.3412   -2.1485    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -2.9103    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0086   -4.2542    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  4  2  0  0  0  0
  1  5  1  0  0  0  0
  1  6  1  0  0  0  0
  2  8  1  0  0  0  0
  2 20  1  0  0  0  0
  2 21  1  0  0  0  0
  2 22  1  0  0  0  0
  3  9  1  0  0  0  0
  3 18  1  0  0  0  0
  3 19  1  0  0  0  0
  3 23  1  0  0  0  0
  4  7  1  0  0  0  0
  5 17  1  0  0  0  0
  6 16  1  0  0  0  0
  7 13  2  0  0  0  0
  8 27  1  0  0  0  0
  8 25  2  0  0  0  0
  9 26  2  0  0  0  0
  9 24  1  0  0  0  0
 10 16  1  0  0  0  0
 10 34  1  0  0  0  0
 10 35  1  0  0  0  0
 10 17  1  0  0  0  0
 11 13  1  0  0  0  0
 11 14  2  0  0  0  0
 12 13  1  0  0  0  0
 12 15  2  0  0  0  0
 14 29  1  0  0  0  0
 15 28  1  0  0  0  0
 24 31  2  0  0  0  0
 25 30  1  0  0  0  0
 26 33  1  0  0  0  0
 27 32  2  0  0  0  0
 28 30  2  0  0  0  0
 28 32  1  0  0  0  0
 29 31  1  0  0  0  0
 29 33  2  0  0  0  0
M  END
>  <Formula>
C25H24F6N4

>  <FW>
494.4753

>  <DSSTox_CID>
3868

>  <NR-AR>
0

>  <NR-ER>
1

>  <NR-AhR>
1

$$$$

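I have seen this post: Extract molecules in order from SDF file according to IDs given in another file, which gives a Unix solution to this problem. I used this workaround on the command line: awk 'BEGIN{ORS="$$$$"}NR==FNR{a[$1]=$0;next}$1 in a' ids.txt RS="$" molecules.sdf > molecules_by_ids.sdf, and was able to get most of what I wanted. However, even with this command I cannot extract 100% of the molecules from the SDF file. For example, 981 molecules are positive for one of the characteristics, so the text file contains 981 IDs, but the command yields only 950 molecules in the output SDF file.
What I really want is a MATLAB solution that does not leave any molecule out of the generated file. I appreciate any effort toward solving this problem. Thank you!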
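
One way to track down the 31 missing molecules is to compare the IDs in the text file against the title lines of the SDF records. The following is only a minimal sketch: the function name find_missing_ids is hypothetical, and it assumes that each record's first line is the molecule ID and that records end with a line containing only $$$$. Possible causes it can surface (trailing whitespace, repeated IDs, IDs absent from the SDF file) are guesses, not a confirmed diagnosis.

```matlab
function missing = find_missing_ids(idFile, sdfFile)
% Sketch: list the IDs that never appear as a record title in the SDF file.
% Assumes the first line of every record is the molecule ID.

% Read the IDs, trimming whitespace and dropping blank lines.
ids = strtrim(splitlines(string(fileread(idFile))));
ids = ids(ids ~= "");

% A record title is the first line of the file or the line that
% directly follows a '$$$$' terminator.
lines  = splitlines(string(fileread(sdfFile)));
starts = [1; find(strtrim(lines) == "$$$$") + 1];
starts = starts(starts <= numel(lines));
titles = strtrim(lines(starts));

% IDs with no matching title, plus a count of repeated IDs.
missing = setdiff(ids, titles);
nDup    = numel(ids) - numel(unique(ids));
fprintf('%d IDs read, %d repeated, %d without a matching record.\n', ...
    numel(ids), nDup, numel(missing));
end
```

Calling missing = find_missing_ids('ids.txt', 'molecules.sdf') returns the unmatched IDs as a string array. Note that the awk command keys its array on the ID, so repeated IDs in the text file would collapse to a single output record.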

3zwtqj6y  #1

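One solution I found in MATLAB is the following function, where "id" is the name of the text file of IDs, "sdfs" is the SDF database, and "sdf_name" is the name of the new SDF file that receives the molecules extracted by ID: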

function write_sdf(id, sdfs, sdf_name)
% Open the text file of ids.
fid = fopen(id);

% Convert the sdf file to a character array.
data = fileread(sdfs);

% For each id, get the portion of the sdf file corresponding
% to the molecule id.
while true
    mol_id = fgetl(fid);

    % fgetl returns the number -1 (not text) at the end of the file.
    if ~ischar(mol_id)
        % We're done with the id file.
        fclose(fid);
        break;
    else
        % Take the text between the first occurrence of the id
        % and the next record terminator.
        mol_after = extractAfter(data, mol_id);
        mol_between = extractBefore(mol_after, '$$$$');
        mol_full = [char(mol_id) char(mol_between) '$$$$'];

        % Write the molecule to the sdf file.
        writelines(mol_full, sdf_name, WriteMode='append');
    end
end

end

The problem with this solution is that it is very slow. If anyone knows a faster way, please let me know! For now, I will use this.
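
A likely reason for the slowness is that the function searches the whole SDF text once per ID and reopens the output file for every molecule. The sketch below is one possible alternative, not a tested drop-in replacement: it reads the SDF file once, indexes records by their title line, and writes the output in a single call. The name write_sdf_fast is hypothetical, and it assumes that each record's first line is the ID, that records end with a line containing only $$$$, and that repeated IDs should be written only once.

```matlab
function n = write_sdf_fast(idFile, sdfFile, outFile)
% Sketch: copy the SDF records whose title line matches an ID,
% in ID-file order, and return the number of records written.

% Read the IDs, trim whitespace, drop blank lines and repeats.
ids = strtrim(splitlines(string(fileread(idFile))));
ids = unique(ids(ids ~= ""), 'stable');

% Split the SDF file into lines and locate the record terminators.
lines = splitlines(string(fileread(sdfFile)));
stops = find(strtrim(lines) == "$$$$");
if isempty(stops)
    error('No "$$$$" record terminators found in %s.', sdfFile);
end

% Each record starts on line 1 or right after the previous terminator.
starts = [1; stops + 1];
starts(end) = [];
titles = strtrim(lines(starts));

% Look up every ID among the titles (first match wins).
[found, loc] = ismember(ids, titles);
if any(~found)
    warning('%d IDs were not found in %s.', nnz(~found), sdfFile);
end
loc = loc(found);
n = numel(loc);

% Gather the lines of each matched record.
parts = cell(n, 1);
for k = 1:n
    parts{k} = lines(starts(loc(k)):stops(loc(k)));
end

% Write everything in one call.
fid = fopen(outFile, 'w');
if fid == -1
    error('Could not open %s for writing.', outFile);
end
if n > 0
    fprintf(fid, '%s\n', join(vertcat(parts{:}), newline));
end
fclose(fid);
end
```

For example, n = write_sdf_fast('ids.txt', 'molecules.sdf', 'molecules_by_ids.sdf') would report how many records were written, and the warning gives the number of IDs that had no matching title, so nothing is dropped silently.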
