Python: removing \xf characters

Posted by pinkon5k on 2022-10-30 in Python


I am trying to remove all of the

\xf0\x9f\x93\xa2, \xf0\x9f\x95\x91\n\, \xe2\x80\xa6,\xe2\x80\x99t

characters from the following strings in Python:

Text
  _____________________________________________________
"b'Hello! \xf0\x9f\x93\xa2 End Climate Silence is looking for volunteers! \n\n1-2 hours per week. \xf0\x9f\x95\x91\n\nExperience doing digital research\xe2\x80\xa6

"b'I doubt if climate emergency 8s real, I think people will look ba\xe2\x80\xa6 '

"b'No, thankfully it doesn\xe2\x80\x99t. Can\xe2\x80\x99t see how cheap to overtourism in the alan alps can h\xe2\x80\xa6"

"b'Climate Change Poses a WidelllThreat to National Security "

"b""This doesn't feel like targeted propaganda at all. I mean states\xe2\x80\xa6"

"b'berates climate change activist who confronted her in airport\xc2\xa0

The above is one column of a Pandas dataframe.
I have tried

string.encode('ascii', errors= 'ignore')

and regex, but with no luck. Any suggestions would be helpful.

python

Source: https://stackoverflow.com/questions/69765477/removing-xf-characters

3 Answers


eagi6jfj1#

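The data in the question suggests that you have the text representation of byte strings stored as Unicode strings. You want to decode those bytes, but first you need to encode the string back into a byte string.
So, in your case, you can use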

x = "b""This doesn't feel like targeted propaganda at all. I mean states\xe2\x80\xa6"
x = x.encode('latin1').decode('unicode-escape').encode('latin1').decode('utf8')
print(x)

# => bThis doesn't feel like targeted propaganda at all. I mean states…

See this Python demo.
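
To illustrate the same round trip on text that holds literal backslashes (which is probably what a dataframe column read from a file contains), here is a minimal sketch. `decode_escaped` is a hypothetical helper name, and the sketch assumes every character in the input fits in Latin-1. The final ASCII step is just one possible way to drop the decoded symbols afterwards, which is what the question asks for.

```python
def decode_escaped(s: str) -> str:
    """Hypothetical helper: turn literal '\\xe2\\x80\\xa6' text into real characters."""
    # 1. latin1 maps each character to the byte with the same value
    # 2. unicode-escape interprets the literal backslash escapes
    # 3. latin1 again turns the resulting characters back into raw bytes
    # 4. utf8 decodes those bytes into the intended text
    return s.encode("latin1").decode("unicode-escape").encode("latin1").decode("utf8")


raw = r"I mean states\xe2\x80\xa6"  # raw string: the backslashes are literal
decoded = decode_escaped(raw)

# Drop everything outside ASCII once the text is properly decoded
ascii_only = decoded.encode("ascii", errors="ignore").decode("ascii")

print(decoded)
print(ascii_only)
```

For a dataframe, something like `df['Text'].map(decode_escaped)` should apply it per cell; the column name `Text` is taken from the header shown in the question and may differ in the real data.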

Answered 2022-10-30

kuhbmx9i2#

Try decoding the bytes.

text = b'Hello! \xf0\x9f\x93\xa2 End Climate Silence is looking for volunteers! \n\n1-2 hours per week. \xf0\x9f\x95\x91\n\nExperience doing digital research\xe2\x80\xa6'.decode("utf8")
print(text)
# Output:
# Hello! 📢 End Climate Silence is looking for volunteers!
#
# 1-2 hours per week. 🕑
#
# Experience doing digital research…
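
The snippet above starts from a real bytes object, whereas the dataframe in the question appears to hold strings that merely look like bytes literals. A possible bridge, sketched here with the standard-library `ast.literal_eval`, is to parse each cell back into bytes before decoding. `repr_to_text` is a hypothetical helper name, and this assumes each cell is a complete, well-formed literal; truncated cells (for example, one missing its closing quote) would raise an error and need separate handling.

```python
import ast


def repr_to_text(cell: str) -> str:
    """Hypothetical helper: parse a "b'...'" string into bytes, then decode it."""
    value = ast.literal_eval(cell)  # safely evaluates the literal, no arbitrary code
    if isinstance(value, bytes):
        return value.decode("utf8")
    return str(value)  # the cell was not a bytes literal after all


cell = r"b'No, thankfully it doesn\xe2\x80\x99t.'"  # literal backslashes, as stored
text = repr_to_text(cell)
print(text)
```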

Answered 2022-10-30

vom3gejh3#

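These are hexadecimal escape sequences left over from encoding. Every occurrence has the form \xAB, where A and B are each one of the characters 0123456789abcdefABCDEF. Try a regular expression with the pattern \\x[0123456789abcdefABCDEF][0123456789abcdefABCDEF]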
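
A minimal sketch of that regex approach, using the equivalent shorthand `[0-9a-fA-F]{2}` for the two hex digits. It assumes the strings contain literal backslash sequences (the four characters `\`, `x`, and two hex digits) and not already-decoded characters; `HEX_ESCAPE` is just an illustrative name.

```python
import re

# Matches a literal backslash, an 'x', and exactly two hex digits
HEX_ESCAPE = re.compile(r"\\x[0-9a-fA-F]{2}")

raw = r"Hello! \xf0\x9f\x93\xa2 End Climate Silence"  # raw string: backslashes are literal
cleaned = HEX_ESCAPE.sub("", raw)

print(cleaned)
```

Note that this only deletes the escape text; it leaves other literal escapes such as `\n` in place, and the space on each side of a removed sequence remains, so some whitespace cleanup may still be needed.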

Answered 2022-10-30