TypeError: string indices must be integers, with JSON full of backslashes

8fq7wneg  posted on 2022-11-26  in Other

I am trying to extract JSON data from a script tag and then read values out of it.
My code:

import requests, json
from bs4 import BeautifulSoup
head = {
    "Accept": 'application/json, text/plain, */*',
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8,km;q=0.7",
    "Connection": "keep-alive",
    "Host": "www.ixigua.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"
}
url = "https://www.ixigua.com/home/58484635562"
ree = requests.get(url, headers=head)
soup = BeautifulSoup(ree.content, 'html.parser')

script = soup.find_all('script')[-2].text
print(script)

with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(script, f, ensure_ascii = False)

The result looks like this:

"window._SSR_HYDRATED_DATA={\"recommendFeed\":null,\"attentionFeed\":null,\"nbaFeed\":null,\"livingFeed\":null,\"channelFeed\":[],\"homeFeed\":null,\"adBanner\":[],\"channelInfo\":null,\"ChannelFeedList\":[],\"UserDetail\":{\"enableTabs\":[],\"hotPersonList\":[],\"userInfo\":{\"name\":\"-\",\"description\":\"-\",\"avatar\":\"\",\"followersCount\":0,\"followingCount\":0,\"user_id\":\"\",\"follow\":false},\"videoData\":{\"videoList\":[],\"loading\":true},\"hotsoonData\":{\"hotsoonList\":[]},\"preview_series\":[],\"seriesData\":{\"series_list\":[],\"hasMore\":false,\"nextCursor\":\"0\"}},\"FooterLinks\":[],\"LvideoChannel\":[],\"LvideoChannelOnTcc\":[],\"LvideoCategory\":[],\"AlbumInCategory\":[],\"ChannelFeedV2\":[],\"ChannelLevelOneConfig\":[],\"ChannelLevelTwoConfig\":[],\"HighQualityFeed\":[],\"ChannelBannerConfig\":[],\"Teleplay\":null,\"Projection\":{\"video\":{},\"series\":{},\"pSeries\":{},\"playlist\":{\"item_num\":0},\"shouldReturn404\":false,\"item_id\":\"\",\"key\":undefined},\"CinemaChannelFeed\":[],\"CinemaFeedRebojiemu\":[],\"CinemaFeedFromRedis\":[],\"MyWatchHistory\":[{\"type\":\"all\",\"videoFeed\":[],\"hasMore\":true},{\"type\":\"svideo\",\"videoFeed\":[],\"hasMore\":true},{\"type\":\"lvideo\",\"videoFeed\":[],\"hasMore\":true}],\"MyFavorite\":[{\"type\":\"all\",\"videoFeed\":[],\"hasMore\":true},{\"type\":\"svideo\",\"videoFeed\":[],\"hasMore\":true},{\"type\":\"lvideo\",\"videoFeed\":[],\"hasMore\":true}],\"AuthorDetailInfo\":{\"user_id\":\"58484635562\",\"media_id\":\"1562629337991170\",\"name\":\"鼎力推鉴王鼎杰工作室\",\"introduce\":\"小细节里的大战略,大格局里的小动作。\",\"avatar\":\"https:\\u002F\\u002Fsf3-cdn-tos.bdxiguastatic....

But whenever I try to print ["AuthorDetailInfo"], I get an error.

print(script["AuthorDetailInfo"])

Error output:

print(script["AuthorDetailInfo"])
TypeError: string indices must be integers

How can I print this? And how can I remove all the backslashes from the JSON?
Code:

print(script["AuthorDetailInfo"])

Expected result:

{
 "user_id":"58484635562",
 "media_id":"1562629337991170",
 "name":"鼎力推鉴王鼎杰工作室",
 "introduce":"小细节里的大战略"...
}
bq3bfh9z1#

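script is a string of JavaScript code, not JSON. Note the window._SSR_HYDRATED_DATA= in front of the {. Everything after it can be treated as JSON (although technically it is a JavaScript object literal, not JSON). You first have to get rid of the variable assignment. One way is to use split():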

_, my_json = script.split('=', maxsplit=1)

Now you can parse it with json.loads():

obj = json.loads(my_json)

Finally, you can get the part you want:

print(obj['AuthorDetailInfo'])
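On the backslashes: they appear because json.dump() in the question is handed a str, so it writes one JSON string literal and escapes every inner quote. Dumping the parsed object instead should give plain JSON. A minimal sketch of the difference, assuming a shortened, hypothetical stand-in for the real script text:

```python
import json

# Hypothetical, shortened stand-in for the script text from the question.
script = 'window._SSR_HYDRATED_DATA={"AuthorDetailInfo":{"user_id":"58484635562","name":"鼎力推鉴王鼎杰工作室"}}'

# Dumping a str yields a JSON string literal, so each inner quote becomes \".
as_string = json.dumps(script, ensure_ascii=False)

# Parse first, then dump the resulting dict: no escaped quotes.
_, my_json = script.split('=', maxsplit=1)
obj = json.loads(my_json)
as_object = json.dumps(obj["AuthorDetailInfo"], ensure_ascii=False, indent=1)

print(as_string)
print(as_object)
```

The same applies when writing data.json: pass obj (or a part of it) to json.dump() rather than script.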

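Note: maxsplit=1 is there in case the string contains further = characters after the first one. Also, this only works if the JavaScript object in the assignment is valid JSON.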
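The output shown in the question contains "key":undefined, which is a JavaScript value and not valid JSON, so json.loads() would likely reject the payload as it stands. One possible workaround is sketched below. The helper name parse_hydrated_data is hypothetical, and the regex assumes that undefined only ever appears as a bare value directly after a colon; if that text also occurs inside a string value, the string would be altered too.

```python
import json
import re


def parse_hydrated_data(script: str) -> dict:
    """Hypothetical helper: turn 'window.X={...}' script text into a dict."""
    # Drop the variable assignment; maxsplit=1 keeps later '=' characters intact.
    _, payload = script.split('=', maxsplit=1)
    # Remove surrounding whitespace and a trailing semicolon, if present.
    payload = payload.strip().rstrip(';')
    # Assumption: 'undefined' only appears as a bare value after a colon.
    payload = re.sub(r':\s*undefined\b', ':null', payload)
    return json.loads(payload)


# Shortened, hypothetical sample modelled on the output in the question.
sample = ('window._SSR_HYDRATED_DATA={"Projection":{"item_id":"","key":undefined},'
          '"AuthorDetailInfo":{"user_id":"58484635562"}}')
obj = parse_hydrated_data(sample)
print(obj["AuthorDetailInfo"])
```

If the page embeds other JavaScript-only constructs, text substitution like this will not be enough and a parser that understands JavaScript object literals would be the safer choice.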
