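I have a Pandas DataFrame in which every row has a column named "Word". Sentences are separated by an empty string "", so I read the file with skip_blank_lines=False to keep the separators visible.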
import pandas as pd

df = pd.read_csv("Data-June-2023.txt", sep=" ", skip_blank_lines=False)
df.tail(20)
Index Word _ _ Tag
0 I _ _ O
1 am _ _ O
2 from _ _ O
3 Madrid _ _ B-City
4 NaN NaN NaN NaN
5 Alice _ _ B-Person
6 likes _ _ O
7 Bob _ _ B-Person
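I want to create a new column named "Sentence #" by iterating over the blank lines (the NaN values). Each NaN value in "Word" should start a new sentence and increment the counter: Sentence: 1, Sentence: 2, Sentence: 3, and so on.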
Index Sentence # Word _ _ Tag
0 Sentence: 1 I _ _ O
1 am _ _ O
2 from _ _ O
3 Oxford _ _ B-City
4 NaN NaN NaN NaN
5 Sentence: 2 Alice _ _ B-Person
6 likes _ _ O
7 Bob _ _ B-Person
8 NaN NaN NaN NaN
9 Sentence: 3 Alice _ _ B-Person
Thanks for any feedback.
1 Answer
szqfcxe2 #1
Use boolean indexing:
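A minimal sketch of one way to do this, not necessarily the exact code from the answer. It assumes the separator rows show up as NaN in "Word", and it builds a small hypothetical sample frame (without the two "_" columns) in place of Data-June-2023.txt. The names is_blank, starts, and labels are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's data (the "_" columns are omitted)
df = pd.DataFrame({
    "Word": ["I", "am", "from", "Madrid", np.nan, "Alice", "likes", "Bob"],
    "Tag": ["O", "O", "O", "B-City", np.nan, "B-Person", "O", "B-Person"],
})

# True on the separator rows
is_blank = df["Word"].isna()

# A sentence starts on a non-blank row whose previous row is blank;
# fill_value=True makes the very first row count as a start as well
starts = ~is_blank & is_blank.shift(fill_value=True)

# Running count of sentence starts, formatted as "Sentence: N"
labels = "Sentence: " + starts.cumsum().astype(str)

# Create the new column as the first column, empty by default,
# then fill only the sentence-start rows through the boolean mask
df.insert(0, "Sentence #", "")
df.loc[starts, "Sentence #"] = labels[starts]

print(df)
```

Because the mask only marks a row as a start when the row before it is blank, several consecutive blank rows still advance the counter by one.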
Output: