How to format an HTML URL that has no formatting in Python

Posted by m528fe3b on 2023-09-28 in Python

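I'm building a simple scraper that goes to a URL and pulls information from that page. I know, right? My problem is that when I pull the page information, it isn't traditional HTML with headers and formatting. It is just plain text. Is there a way to grab only certain information? I was going to try exporting the page information, then reading through it and writing another text file with only the bits I need!
The reason I need this is that I am trying to pull 13,000+ item IDs and organize them into one big ID dump! I am trying to convert the data into the JSON text format the site normally uses. This is for Moviestarplanet 2, and I have been tasked with researching the game.
Here is my code so far (I know it's basic!):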

# Web scraper test
from bs4 import BeautifulSoup
import html_to_json
import json
import requests
import time

IDNum = 688
url = 'https://us.mspapis.com/shopinventory/v1/shops/listings/' + str(IDNum)
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup)

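I need to figure this out soon, since this is the main function of the scraper. Any ideas or suggestions would be helpful! I am not advanced in Python, but I can usually follow along pretty well. Apologies in advance if I have asked anything redundant or stupid.
I have tried the html_to_json library and the built-in json module, googled for 2 hours, and just slapped things together to see whether they worked. I want to really learn why it behaves this way rather than copy and paste from someone else.
Edit! This is the data I am trying to format!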

{'id': '688', 'item': {'id': '912', 'type': 'item', 'singlePurchase': True, 'objectSource': 'curatedcontentitemtemplates', 'objectId': '596', 'resourceIdentifiers': [{'type': 'name', 'key': 'Neutral'}, {'type': 'graphics', 'key': 'default'}], 'tags': [{'hidden': False, 'id': '62', 'resourceIdentifiers': [{'type': 'label', 'key': 'TAG_MOODS'}, {'type': 'graphics', 'key': 'moods'}], 'type': 'category.animation', 'gameId': '5lxc'}, {'hidden': False, 'id': '85', 'resourceIdentifiers': [{'type': 'label', 'key': 'TAG_MOODS_BASIC'}, {'type': 'graphics', 'key': 'moods'}], 'type': 'subcategory.animation.62', 'gameId': '5lxc'}, {'hidden': False, 'id': '168', 'resourceIdentifiers': [{'type': 'label', 'key': 'TAG_FREE'}, {'type': 'graphics', 'key': 'free'}], 'type': 'category.artbooks', 'gameId': '5lxc', 'lookUpId': 'tag_free'}], 'lookUpId': 'f4b919d8-15f9-4dae-964f-bd9262db0a5b', 'additionalData': {'NebulaData': {'DefaultColors': '#FFFFFF', 'Snapshot': 'default_preview'}, 'MSP2Data': {'Loop': 'false'}}}, 'shopId': '8', 'price': {'currency': 'soft', 'salesPrice': 0.0, 'onSale': False}, 'lookUpId': '827e8ca7-60de-4d07-b0ae-61154d579b77'}

Answer #1 (2o7dmzc5)

That endpoint just returns JSON, so simply call resp.json():

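import requests
from pprint import pprint  # import the function, not just the module

IDNum = 688
url = f'https://us.mspapis.com/shopinventory/v1/shops/listings/{IDNum}'
resp = requests.get(url)
resp.raise_for_status()  # raise an exception on a 4xx/5xx response
data = resp.json()  # parse the JSON body into a Python dict
pprint(data["item"])  # or whichever part you need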
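Once the response is a plain Python dict, picking out "only certain information" is ordinary dictionary and list access. The sketch below is one possible way to do it, not the only one. It uses a trimmed copy of the data posted in the question so that it runs offline. The helper name summarize_listing and the choice of fields are hypothetical, and it assumes the item's display name is the resourceIdentifiers entry whose type is "name", which is only a guess based on that sample.

```python
# A trimmed copy of the response shown in the question, so the example
# runs without network access.
sample = {
    "id": "688",
    "item": {
        "id": "912",
        "type": "item",
        "resourceIdentifiers": [
            {"type": "name", "key": "Neutral"},
            {"type": "graphics", "key": "default"},
        ],
        "tags": [
            {"id": "62", "type": "category.animation"},
            {"id": "85", "type": "subcategory.animation.62"},
            {"id": "168", "type": "category.artbooks"},
        ],
    },
    "shopId": "8",
    "price": {"currency": "soft", "salesPrice": 0.0, "onSale": False},
}


def summarize_listing(data):
    """Hypothetical helper: keep only a few fields from one listing."""
    item = data.get("item", {})
    # Assumption: the display name is the identifier whose type is "name".
    name = next(
        (
            res["key"]
            for res in item.get("resourceIdentifiers", [])
            if res.get("type") == "name"
        ),
        None,  # fall back to None when no such identifier exists
    )
    return {
        "listing_id": data.get("id"),
        "item_id": item.get("id"),
        "name": name,
        "tag_ids": [tag["id"] for tag in item.get("tags", [])],
        "price": data.get("price", {}).get("salesPrice"),
    }


summary = summarize_listing(sample)
print(summary)
```

Using .get() with defaults keeps the helper from raising KeyError if some listing lacks a field, which seems likely when walking thousands of IDs.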

Answer #2 (6vl6ewon)

If I understand you correctly, you want to print the JSON on multiple lines:

import json
import requests

IDNum = 688
url = "https://us.mspapis.com/shopinventory/v1/shops/listings/{}"

page = requests.get(url.format(IDNum))
data = page.json()

# print the JSON on multiple lines:
print(json.dumps(data, indent=4))

Prints:

{
    "id": "688",
    "item": {
        "id": "912",
        "type": "item",
        "singlePurchase": true,
        "objectSource": "curatedcontentitemtemplates",
        "objectId": "596",
        "resourceIdentifiers": [
            {
                "type": "name",
                "key": "Neutral"
            },
            {
                "type": "graphics",
                "key": "default"
            }
        ],
        "tags": [
            {
                "hidden": false,
                "id": "62",
                "resourceIdentifiers": [
                    {
                        "type": "label",
                        "key": "TAG_MOODS"
                    },
                    {
                        "type": "graphics",
                        "key": "moods"
                    }
                ],
                "type": "category.animation",
                "gameId": "5lxc"
            },
            {
                "hidden": false,
                "id": "85",
                "resourceIdentifiers": [
                    {
                        "type": "label",
                        "key": "TAG_MOODS_BASIC"
                    },
                    {
                        "type": "graphics",
                        "key": "moods"
                    }
                ],
                "type": "subcategory.animation.62",
                "gameId": "5lxc"
            },
            {
                "hidden": false,
                "id": "168",
                "resourceIdentifiers": [
                    {
                        "type": "label",
                        "key": "TAG_FREE"
                    },
                    {
                        "type": "graphics",
                        "key": "free"
                    }
                ],
                "type": "category.artbooks",
                "gameId": "5lxc",
                "lookUpId": "tag_free"
            }
        ],
        "lookUpId": "f4b919d8-15f9-4dae-964f-bd9262db0a5b",
        "additionalData": {
            "NebulaData": {
                "DefaultColors": "#FFFFFF",
                "Snapshot": "default_preview"
            },
            "MSP2Data": {
                "Loop": "false"
            }
        }
    },
    "shopId": "8",
    "price": {
        "currency": "soft",
        "salesPrice": 0.0,
        "onSale": false
    },
    "lookUpId": "827e8ca7-60de-4d07-b0ae-61154d579b77"
}
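The question mentions pulling 13,000+ IDs into one big dump. A rough sketch of how the same resp.json() call could be looped and the results written to a single JSON file is given below. The function names, the delay value, the output filename, and the ID range in the usage comment are all hypothetical, and the code assumes that a missing listing comes back with a non-200 status, which has not been verified against this API.

```python
import json
import time

URL_TEMPLATE = "https://us.mspapis.com/shopinventory/v1/shops/listings/{}"


def fetch_with_requests(listing_id):
    """Fetch one listing; return the parsed dict, or None on a non-200 reply."""
    # Imported here so the helpers below can be tried without requests installed.
    import requests

    resp = requests.get(URL_TEMPLATE.format(listing_id), timeout=10)
    if resp.status_code != 200:
        return None  # assumption: missing listings are reported this way
    return resp.json()


def collect_listings(ids, fetch, delay=0.0):
    """Call fetch(id) for every id and keep the results that are not None."""
    results = {}
    for listing_id in ids:
        data = fetch(listing_id)
        if data is not None:
            results[str(listing_id)] = data
        time.sleep(delay)  # pause between requests to go easy on the server
    return results


def save_listings(listings, path):
    """Write all collected listings to one indented JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(listings, fh, indent=4)


# Example usage (hypothetical ID range and filename):
# listings = collect_listings(range(1, 101), fetch_with_requests, delay=0.2)
# save_listings(listings, "listings_dump.json")
```

Passing the fetch function in as an argument keeps the loop testable with a fake fetcher and makes it easy to swap in a requests.Session later.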
