Python: why is the index of label encoding not sequential?

Posted by 6vl6ewon on 2022-12-25 in Python


These are my label values:

df['Label'].value_counts()
------------------------------------
Benign                    4401366
DDoS attacks-LOIC-HTTP     576191
FTP-BruteForce             193360
SSH-Bruteforce             187589
DoS attacks-GoldenEye       41508
DoS attacks-Slowloris       10990
Name: Label, dtype: int64

I use label encoding to encode them:

from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
label_encoder.fit(df.Label)
df['Label']= label_encoder.transform(df.Label)

This is the result:

df['Label'].value_counts()
------------------------------
0    4380628
1     576191
4     193354
5     187589
2      41508
3      10990
Name: Label, dtype: int64

I want a result like this:

df['Label'].value_counts()
------------------------------
0    4380628
1     576191
2     193354
3     187589
4      41508
5      10990
Name: Label, dtype: int64

Does anyone know what the problem is and how to fix it?

python

Source: https://stackoverflow.com/questions/74912173/why-the-index-of-label-encoding-is-not-seriated

1 Answer


swvgeqrz1#

**Example**

We need a minimal, reproducible example to answer. Let's use:

import pandas as pd
df = pd.DataFrame(list('BACCCCAAAA'), columns=['col1'])

df

col1
0   B
1   A
2   C
3   C
4   C
5   C
6   A
7   A
8   A
9   A

**Code**

df['col1'].value_counts()

A    5
C    4
B    1
Name: col1, dtype: int64

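The problem is that LabelEncoder assigns codes by the sorted (alphabetical) order of the labels, not by their frequency.
For this df that gives A-0, B-1, C-2.
If you want A-0, C-1, B-2 (ordered by frequency), pandas alone can do it, with no other library needed. Use the following code: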

order = df['col1'].value_counts().index  # labels sorted by frequency, computed once
s = df['col1'].map(order.get_loc)

s

0    2
1    0
2    1
3    1
4    1
5    1
6    0
7    0
8    0
9    0
Name: col1, dtype: int64
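
On a large frame, a dictionary lookup is one possible alternative to calling get_loc for every row. The sketch below is pandas-only and rebuilds the same df. The names rank, codes, inverse, and decoded are illustrative and do not come from the original answer. Labels with tied counts simply get whatever order value_counts returns.

```python
import pandas as pd

df = pd.DataFrame(list('BACCCCAAAA'), columns=['col1'])

# Labels ordered from most to least frequent.
order = df['col1'].value_counts().index

# label -> frequency rank (0 = most frequent).
rank = {label: i for i, label in enumerate(order)}
codes = df['col1'].map(rank)

# Reverse mapping, so the original labels can be recovered later.
inverse = {i: label for label, i in rank.items()}
decoded = codes.map(inverse)

print(rank)
print(codes.tolist())
```

Keeping the inverse dictionary around plays the role that inverse_transform plays for LabelEncoder.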

Assign s to col1:

out = df.assign(col1=s)

out

Check value_counts:

out['col1'].value_counts()

0    5
1    4
2    1
Name: col1, dtype: int64
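
To see which code LabelEncoder gave each label, its classes_ attribute can be inspected. This is a minimal sketch on the same toy df, assuming scikit-learn is installed. The names encoder, encoded, and mapping are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame(list('BACCCCAAAA'), columns=['col1'])

encoder = LabelEncoder()
encoded = encoder.fit_transform(df['col1'])

# classes_ holds the sorted unique labels; a label's position is its code.
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns the integer codes back into the original labels.
restored = encoder.inverse_transform(encoded)
print(list(restored))
```

Because the codes follow the sorted label names, they match frequency order only by coincidence.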

Answered 2022-12-25
