pandas 显示索引[1]中第二次出现两个连续零的索引[0]的值

toiithl6  于 12个月前  发布在  其他
关注(0)|答案(2)|浏览(88)

问题/上下文:我正在尝试编写一段代码,它将读取指定文件夹中的每个CSV文件,在列索引1中查找第二个连续出现的零,并在找到时显示列索引[0]中的相应值。下面是我目前所做的,但它不起作用。它只是说没有带有两个连续零的列。但是,当我打开各个文件时,我可以清楚地看到有两个连续的零的列。我以前问过这个问题,并得到了有效的结果,直到我在自己的 Dataframe 中进行了替换。也许是因为我自己的 Dataframe 既包含数字又包含单词?
数据框架

我的天|试验编号| 0||未命名:0|文件名|| 0 |0.0|| 1 |1.0|| 2 |0.0|| 3 |1.0|| 4 |1.0|| 5 |0.0|| 6 |1.0|| 7 |1.0|| 8 |1.0|| 9 |1.0|| 10 |0.0|| 11 |0.0|| 12 |1.0|| 13 |0.0|| 14 |1.0|| 15 |1.0|

当前编码

import os
import pandas as pd

folder_path = "/content/drive/session 1 & 2"

def find_column1_value_for_second_zero(file_path):
    try:
        df = pd.read_csv(file_path)
        consecutive_zeros = 0
        column1_value = None

        for _, row in df.iterrows():
            if row.iloc[1] == 0:
                consecutive_zeros += 1
                if consecutive_zeros == 2:
                    column1_value = row.iloc[0]
                    break
            else:
                consecutive_zeros = 0

        return column1_value
    except Exception as e:
        print(f"Error reading file '{file_path}': {str(e)}")
        return None

for filename in os.listdir(folder_path):
    if filename.endswith(".csv"):  # Assuming your files are CSV format
        file_path = os.path.join(folder_path, filename)
        
        column1_value = find_column1_value_for_second_zero(file_path)
        
        if column1_value is not None:
            print(f"In file '{filename}', the value in column 1 for the second zero in column 2 is: {column1_value}")
        else:
            print(f"In file '{filename}', no second zero in column 2 was found.")

预期效果:获取列[0]中的值,该值与列1中的第二个连续零位于同一行。在上面的dataframe中,它应该返回'11'。
实际结果:每一行返回“在列2中没有找到第二个零”。

deyfvvtc

deyfvvtc1#

不要使用复杂的循环,而是使用向量代码。
假设此示例输入:

col1  col2 col3
0      0     0    A
1      1     1    B
2      2     0    C
3      3     1    D
4      4     1    E
5      5     0    F
6      6     0    G  # should be selected
7      7     0    H
8      8     1    I
9      9     0    K
10    10     0    L  # should be selected

使用groupby.cumcount和布尔索引:

N = 2         # nth consecutive 0 to select
col = 'col2'  # target column

grp = df[col].ne(0).cumsum()      # form groups
cnt = df.groupby(grp).cumcount()  # enumerate 0s
out = df[cnt.eq(N)]               # select nth

输出量:

col1  col2 col3
6      6     0    G
10    10     0    L

中间体:

col1  col2 col3  grp  cnt  cnt.eq(N)
0      0     0    A    0    0      False
1      1     1    B    1    0      False
2      2     0    C    1    1      False
3      3     1    D    2    0      False
4      4     1    E    3    0      False
5      5     0    F    3    1      False
6      6     0    G    3    2       True
7      7     0    H    3    3      False
8      8     1    I    4    0      False
9      9     0    K    4    1      False
10    10     0    L    4    2       True
zi8p0yeb

zi8p0yeb2#

另一个基于我认为您的CSV的工作解决方案是:

from pathlib import Path
import csv

folder_path = '.'

def find_column1_value_for_second_zero(file_path):
    with open(file_path, newline='') as f:
        consec = 0
        for row in csv.reader(f):
            try: x, y = row[0], float(row[1])
            except ValueError: continue
            consec = 0 if y != 0 else consec + 1
            if consec == 2: return x

if __name__ == '__main__':
    
    for filename in Path(folder_path).glob('*csv'):
        column1_value = find_column1_value_for_second_zero(filename.resolve())
        if column1_value:
            print(f"In file '{filename}', the value in column 1 for the second zero in column 2 is: {column1_value}")
        else:
            print(f"In file '{filename}', no second zero in column 2 was found.")

相关问题