pandas: Pythonic way to create a dataset for multilabel text classification

Asked by bvjveswy on 2022-11-20 in Python

I have a text dataset that looks like this:

import pandas as pd
df = pd.DataFrame({'Sentence': ['Hello World',
                                'The quick brown fox jumps over the lazy dog.',
                                'Just some text to make third sentence!'
                               ],
                   'label': ['greetings',
                             'dog,fox',
                             'some_class,someother_class'
                            ]})

I want to convert this data into something like this:

For multilabel classification, is there a Pythonic way to do this conversion?
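Assuming the target is one 0/1 column per distinct label (the layout the answer produces), the conversion is a multi-hot encoding. A minimal plain-Python sketch of the idea, without pandas; `multi_hot` is a hypothetical helper name, not part of any library:

```python
# Hypothetical helper: multi-hot encode comma-separated label strings.
def multi_hot(label_strings, sep=","):
    # Split each string into a set of labels (a set ignores repeated labels).
    label_sets = [set(s.split(sep)) for s in label_strings]
    # Vocabulary: every distinct label, sorted so the column order is stable.
    vocab = sorted(set().union(*label_sets))
    # One row per input string, one 0/1 entry per vocabulary label.
    rows = [[1 if label in labels else 0 for label in vocab] for labels in label_sets]
    return vocab, rows


vocab, rows = multi_hot(["greetings", "dog,fox", "some_class,someother_class"])
print(vocab)
for row in rows:
    print(row)
```

Sorting the vocabulary means the same label always lands in the same column position, which is what a classifier's target matrix needs.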

Tags: pandas

Source: https://stackoverflow.com/questions/74440007/pythonic-way-to-create-dataset-for-multilabel-text-classification

1 answer

fgw7neuy (answer #1)

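You can split each label string into a list with str.split(","), use pandas.Series.explode to give every label its own row, and then use pandas.crosstab to cross-tabulate the labels against the Sentence column.
Try this: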

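# Count (sentence, label) pairs: one row per sentence, one column per label
def cross_labels(df):
    return pd.crosstab(df["Sentence"], df["label"])

out = (
    df.assign(label=df["label"].str.split(","))  # "dog,fox" -> ["dog", "fox"]
      .explode("label")                          # one row per (sentence, label)
      .pipe(cross_labels)                        # sentence x label count table
      .rename_axis(None, axis=1)                 # drop the "label" name from the columns
      .reset_index()                             # move Sentence from the index to a column
)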

print(out)

Output:

                                       Sentence  dog  fox  greetings  some_class  someother_class
0                                   Hello World    0    0          1           0                0
1        Just some text to make third sentence!    0    0          0           1                1
2  The quick brown fox jumps over the lazy dog.    1    1          0           0                0
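To see what explode hands to crosstab, the pipeline can be run one step at a time. This sketch rebuilds the question's df so it runs on its own; the intermediate names split, long_df and table are illustrative and not from the answer.

```python
import pandas as pd

# The question's example data.
df = pd.DataFrame({
    "Sentence": ["Hello World",
                 "The quick brown fox jumps over the lazy dog.",
                 "Just some text to make third sentence!"],
    "label": ["greetings", "dog,fox", "some_class,someother_class"],
})

# Step 1: "dog,fox" -> ["dog", "fox"]
split = df.assign(label=df["label"].str.split(","))

# Step 2: one row per (sentence, label) pair; the original index value
# is repeated for every label of a multi-label sentence.
long_df = split.explode("label")
print(long_df)

# Step 3: count (sentence, label) pairs. crosstab sorts the rows by
# Sentence, which is why the answer's output is in alphabetical order.
table = pd.crosstab(long_df["Sentence"], long_df["label"])
print(table)
```

Because crosstab counts pairs, a label listed twice for the same sentence (for example "dog,dog") would show up as 2 rather than 1, and a space after a comma ("dog, fox") would create a separate " fox" label unless the pieces are stripped first.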

Answered on 2022-11-20
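An alternative that the answer does not use: pandas' Series.str.get_dummies(sep=...) splits each string and one-hot encodes it in a single call. It records presence (0/1), so a repeated label does not inflate the count, and it keeps the rows in the original order instead of sorting them by sentence. A sketch assuming the same df as in the question; the name encoded is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Sentence": ["Hello World",
                 "The quick brown fox jumps over the lazy dog.",
                 "Just some text to make third sentence!"],
    "label": ["greetings", "dog,fox", "some_class,someother_class"],
})

# Split on "," and build one 0/1 column per distinct label (columns sorted).
dummies = df["label"].str.get_dummies(sep=",")

# Put the sentences back in front; rows stay in df's original order.
encoded = pd.concat([df[["Sentence"]], dummies], axis=1)
print(encoded)
```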
