从分子动力学模拟文件中提取信息的Python代码

xytpbqjk  于 2023-04-10  发布在  Python
关注(0)|答案(1)|浏览(136)

我有一个分子动力学模拟的输出文件。文件的一部分如下:

Entering Dynamics:    iteration =  3200
                           time      =   0.7740 pico-seconds

     Andersen thermostat:   22 collisions

ATOMIC_POSITIONS (angstrom)
O             1.2548684197        0.6424461240        0.7655878783    0   0   0
O            13.1348755441        1.7104775362        0.9831193322    0   0   0
Si            0.2961184815        3.9514997961        1.4334508031    0   0   0
Si            3.4858850598        6.3535750817       -0.0759632619    0   0   0
O             4.1551025227        5.0479390695        0.3635581483    0   0   0
Si            7.0355725487        6.6231051760        3.5097262837    0   0   0
O             6.2913557105        5.2296438029        3.7778484627    0   0   0
O             9.8141798579        5.2550557888        0.6738163980    0   0   0
Si            9.4725273964        6.6846863678        1.5204773455    0   0   0
Si            4.8006823796       11.3515927947        1.8133971611    0   0   0
O             5.9975449939        7.8470275094        3.2154093810    0   0   0
O             7.8754782756        9.8763121275        0.5722748194    0   0   0
O            -0.2959658746        0.0533054014        2.8651388320    0   0   0
Si            7.4183110714        1.6011961440        1.4468400910    0   0   0
O             5.0804368525       -0.3344709270        5.3555368062
Si           11.5338967914        1.7892923751        1.3908462020    0   0   0
O            12.0656978779        1.0046462656        4.0351985742
O             0.5207366786        4.4457166923        2.9846728696    0   0   0
Si            5.1841357277        4.0728488582        4.0746941885
O             5.1872386308        3.1106064770        2.7381557077    0   0   0
O             5.8056941280        3.0861498500        5.2873425714
O             8.4040054974        2.9471067707        1.1598483191    0   0   0
O             1.8608924768        6.1316507204        5.0920863017
O             3.7164141756        4.6241968448        4.3720160199
O            10.7430822384        6.8454765622        2.5612287404    0   0   0
Si            1.8961563104        7.4020010486        3.9672729592    0   0   0
O             3.4314681107        7.7675775631        3.6568414736    0   0   0
O             7.8525657574        7.1129666203        4.8139136918
Si            9.2285669588        9.1345382788        1.2296763780    0   0   0
O             8.0175722171        6.4483381802        2.2249489379    0   0   0
Si           -1.5864332607       10.6831540134        3.6182013263    0   0   0
O             4.6458025650        9.9775888757        2.7132759539    0   0   0
Si            4.9655131010       -0.8235046017        3.7462341115    0   0   0
O             8.7806776962       11.8629093332        4.0996484633
Si            9.8620383541       13.0065268916        4.4627978826
O            11.1203784142       11.3989385058        3.0725920949    0   0   0
Si            1.7370720581        0.4985632232        5.4133887769
O             7.8689631579       -0.8411424613        6.8001682976
O             1.8593021975        3.4289762669        5.3588309051
Si            8.2630864331        0.6006636472        6.0640224774
O             7.7797871380        2.0415566369        6.7014630911
O             1.7253534194        5.0053996874        7.1288337168
Si            2.8380624986        4.7709548130        5.7830629234
O             4.8238599095        6.1446392654        8.9217003102
O             6.1138021636        3.8559899537        7.9573682387
Si            5.4316507955        5.2744024202        7.6601085633
O             9.1685324914        4.4050300422        5.4453592810
Si            6.2413164651        2.6077699732        6.7965931056
Si           10.0301402752        5.4300848786        6.3239657737
O            13.7770558709        4.7605959661        6.6173171924
O             1.2583474444        8.5881843912        4.8423762424
Si            3.7672356322        7.4407292523        9.0018747173
O             4.4866104836        8.5787761710        7.9957768354
Si            7.7655911405        7.1471163307        6.4241215090
O             6.6577840429        6.1141075159        7.0251280749
O             9.3451435563        6.7756485511        6.9331690833
O            12.2125407892        9.8457335674        6.5913932616
Si            4.2384301635        9.6692959125        6.7673324052
O             5.0544710564       10.9211013065        7.3404829696
O             9.6907487259       10.2861855536        6.1779769547
C             3.7233324270        7.7989590137       10.8768331090
H             3.0282047684        7.0416044129       11.2795989504
H             4.6979392526        7.5564111614       11.2953932171
H             3.3833398135        8.7948495362       11.2347725215

     kinetic energy (Ekin) =           0.09821287 Ry
     temperature           =         279.39795411 K 
     Ekin + Etot (const)   =       -2637.60538954 Ry

     Writing config-only to output data dir ./SiO2.save/

它为每次迭代提供了这样的原子位置。我想计算键长,即Si原子的最高z坐标和C原子的z坐标之间的距离。键长应该为文件中的每一组坐标计算并存储为数组或列表。如何使用Python脚本来完成?我试过很多方法,但没有一个能给我正确的答案。
我试过这个

import re

# Regular expression pattern to extract the Si atomic positions
pattern = r"Si\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)"

# List to store the highest z coordinate of Si atoms for each iteration
highest_z_coords = []

# Open the file containing the atomic positions and iterate over its lines
with open("cf3.txt", "r") as f:
    for line in f:
        # Check if the line contains the iteration number and atomic positions
        if "ATOMIC_POSITIONS (angstrom)\n" in line:
            # Initialize the highest z coordinate to a negative value
            highest_z = -float("inf")
            # Iterate over the next few lines to extract the atomic positions of Si atoms
            for i in range(6):
                next_line = next(f)
                # Use regular expression to extract the Si atomic positions
                match = re.search(pattern, next_line)
                if match:
                    # Extract the z coordinate of the Si atom and update the highest z coordinate
                    z_coord = float(match.group(3))
                    if z_coord > highest_z:
                        highest_z = z_coord
            # Add the highest z coordinate to the list for this iteration
            highest_z_coords.append(highest_z)

# Print the list of highest z coordinates for each iteration
print(highest_z_coords)

但它给出误差。

yfwxisqw

yfwxisqw1#

在你的代码中,你有一个主循环来逐行读取文件。
检查每行的文本

"ATOMIC_POSITIONS (angstrom)\n"

当找到这个文本时,你循环接下来的6行,从你的Si行中得到z coord
6行中的最高值被设置为变量highest_z,然后在6的循环结束时将其附加到列表highest_z_coords
然后,代码跳回到主循环,继续从刚刚处理的6之后的下一行阅读文件中的行,并再次查找文本为"ATOMIC_POSITIONS (angstrom)\n"的行
在你的示例数据中,只有一行包含"ATOMIC_POSITIONS (angstrom)\n",这意味着循环现在只是迭代文件中的其余行,不再做任何事情。
highest_z_coords不再进行追加,因此它只保存并打印"ATOMIC_POSITIONS (angstrom)\n"后面前6行的一个值
如果你想得到64个元素中每组6个元素的最高z值,你可以包含一个while循环,它基本上是在你开始处理6个元素的片段后检查下一个空行。或者你可以设置它在找到Carbon元素时结束。
我修改了你的代码,循环遍历所有元素(64),并作为一批处理6个元素,从而在highest_z_coords列表中产生10个highest_z values
示例代码

import re

# Regular expression pattern to extract the Si atomic positions
pattern = r"Si\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)"

# List to store the highest z coordinate of Si atoms for each iteration
highest_z_coords = []

# Open the file containing the atomic positions and iterate over its lines
with open("cf3.txt", "r") as f:
    for line in f:
        # Check if the line contains the iteration number and atomic positions
        if "ATOMIC_POSITIONS (angstrom)\n" in line:
            ### Header text for elements found start processing all in the list
            processing_segments = True
            ### While loop continually runs the range of 6 until processing_segments = False
            while processing_segments is True:
                # Initialize the highest z coordinate to a negative value
                highest_z = -float("inf")
                # Iterate over the next few lines to extract the atomic positions of Si atoms
                for i in range(6):
                    next_line = next(f)
                    ### When the next line is empty signifying the line after the last
                    ### element, processing can stop and break from the highest z coord check
                    if next_line == "\n":
                        processing_segments = False  # This will end the while loop
                        break  # This will end the range loop 
                    # Use regular expression to extract the Si atomic positions
                    match = re.search(pattern, next_line)
                    if match:
                        # Extract the z coordinate of the Si atom and update the highest z coordinate
                        z_coord = float(match.group(3))
                        if z_coord > highest_z:
                            highest_z = z_coord

                # Add the highest z coordinate to the list for this iteration
                if highest_z != float('-inf'):
                    highest_z_coords.append(highest_z)

# Print the list of highest z coordinates for each iteration
print(highest_z_coords)

我假设列表highest_z_coords的预期输出是这样的:

[3.5097262837, 1.8133971611, 1.446840091, 4.0746941885, 3.9672729592, 4.4627978826, 6.0640224774, 7.6601085633, 9.0018747173, 6.7673324052]

相关问题