pandas: How to find over-long strings in each row of a DataFrame and print the row number when a length limit is exceeded

icomxhvb  posted on 2022-12-02  in Other

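I want to write a program that searches a DataFrame and, if any item in it is longer than 50 characters, prints the row number and asks whether to continue searching the DataFrame.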

threshold = 50 

mask = (df.drop(columns=exclude, errors='ignore')
          .apply(lambda s: s.str.len().ge(threshold))
        )

out = df.loc[~mask.any(axis=1)]

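I tried this approach, but I don't want to drop the rows, only print the row numbers where a string is longer than 50 characters.
Input: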

0 "Robert","20221019161921","London"
1 "Edward","20221019161921","London"
2 "Johnny","20221019161921","London"
3 "Insane string which is way too longggggggggggg","20221019161921","London"

Expected output:

Row 3 is above the 50-character limit.

I would also like the program to print the specific value or string that is too long.

d7v8vwbk  #1

You can use:

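exclude = []    # columns to skip, if any
threshold = 30

# True where a cell's string length is at least `threshold`
# (.ge means >=; use .gt to flag only strictly longer strings).
# Assumes every remaining column holds strings; .str raises on numeric columns.
mask = (df.drop(columns=exclude, errors='ignore')
          .apply(lambda s: s.str.len().ge(threshold))
        )

# True for rows that contain at least one over-long cell
s = mask.any(axis=1)

for idx in s[s].index:
    print(f'row {idx} is above the {threshold}-character limit.')
    # columns of this row that triggered the limit
    s2 = mask.loc[idx]
    # reindex adds any excluded columns back as False so the mask aligns with df.columns
    for string in df.loc[idx, s2.reindex(df.columns, fill_value=False)]:
        print(string)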

Output:

row 3 is above the 30-character limit.
"Insane string which is way too longggggggggggg","20221019161921","London"

Intermediate s:

0    False
1    False
2    False
3     True
dtype: bool
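
The question also asks for a prompt on whether to keep searching, which the snippet above does not cover. The following is a minimal sketch of one way to add it, not the answerer's method. The column names `name`, `timestamp` and `city`, the helper names `find_long_rows` and `report_long_rows`, and the `ask` parameter are all assumptions made for illustration. It uses a strict "longer than" comparison and casts every cell to `str`, so numeric columns do not raise and missing values count as the 3-character string `nan`.

```python
import numpy as np
import pandas as pd


def find_long_rows(df, threshold=50, exclude=()):
    """Yield (row_label, {column: value}) for rows with a cell longer than threshold."""
    # Skip excluded columns; names that are not present are ignored.
    candidates = df.drop(columns=list(exclude), errors="ignore")
    # Cast to str so non-string columns work (assumption: NaN counts as "nan").
    lengths = candidates.apply(lambda s: s.astype(str).str.len())
    mask = lengths.gt(threshold).to_numpy()  # strictly longer than the limit
    for row_pos in np.flatnonzero(mask.any(axis=1)):
        too_long = {
            col: candidates.iat[row_pos, col_pos]
            for col_pos, col in enumerate(candidates.columns)
            if mask[row_pos, col_pos]
        }
        yield df.index[row_pos], too_long


def report_long_rows(df, threshold=50, exclude=(), ask=input):
    """Print each offending row, then ask whether to continue; return reported labels."""
    reported = []
    for idx, too_long in find_long_rows(df, threshold, exclude):
        print(f"Row {idx} is above the {threshold}-character limit.")
        for col, value in too_long.items():
            print(f"  column {col!r}: {value}")
        reported.append(idx)
        # Stop unless the user explicitly answers yes.
        if ask("Continue searching? [y/n] ").strip().lower() not in ("y", "yes"):
            break
    return reported


# Hypothetical column names; the sample input in the question has no header.
df = pd.DataFrame(
    {
        "name": ["Robert", "Edward", "Johnny",
                 "Insane string which is way too longggggggggggg"],
        "timestamp": ["20221019161921"] * 4,
        "city": ["London"] * 4,
    }
)

# A canned answer keeps the example non-interactive; drop `ask` to be prompted.
rows = report_long_rows(df, threshold=30, ask=lambda prompt: "y")
```

With the sample data and a 30-character limit, this reports row 3 together with the offending value from the hypothetical `name` column. Because `find_long_rows` is a generator, answering anything other than yes stops the scan before later rows are examined.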
