How do I format an HTML URL that doesn't have formatting in Python?

Posted by m528fe3b on 2023-09-28 in Python


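I'm building a simple scraper that goes to a URL and pulls information from that page. I know, right? My problem is that when I pull the page information, it isn't traditional HTML with headers and formatting. It's just plain text. Is there a way to grab only certain information? I was planning to export the page information, read through it, and write another text file with only the bits I need!
The reason I need this is that I'm trying to pull 13,000+ item IDs and organize them into one big ID dump! I'm trying to convert it into the JSON text format the website normally uses. This is for Moviestarplanet 2, and I've been tasked with researching the game.
This is my code so far (I know it's basic!):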

# Web scraper test
from bs4 import BeautifulSoup
import html_to_json
import json
import requests
import time

IDNum = 688
url = 'https://us.mspapis.com/shopinventory/v1/shops/listings/' + str(IDNum)
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup)

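I need to figure this out ASAP, since it's the main function of this scraper. Any ideas or suggestions would be helpful! I'm not advanced in Python, but I can usually follow along pretty well. Apologies in advance if I've asked anything redundant or stupid.
I've tried the html_to_json library and the built-in json module, googled for 2 hours, and just slapped things together to see if they worked. I want to actually learn why it behaves the way it does, rather than copy and paste from someone else.
Edit: this is the data I'm trying to format!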

{'id': '688', 'item': {'id': '912', 'type': 'item', 'singlePurchase': True, 'objectSource': 'curatedcontentitemtemplates', 'objectId': '596', 'resourceIdentifiers': [{'type': 'name', 'key': 'Neutral'}, {'type': 'graphics', 'key': 'default'}], 'tags': [{'hidden': False, 'id': '62', 'resourceIdentifiers': [{'type': 'label', 'key': 'TAG_MOODS'}, {'type': 'graphics', 'key': 'moods'}], 'type': 'category.animation', 'gameId': '5lxc'}, {'hidden': False, 'id': '85', 'resourceIdentifiers': [{'type': 'label', 'key': 'TAG_MOODS_BASIC'}, {'type': 'graphics', 'key': 'moods'}], 'type': 'subcategory.animation.62', 'gameId': '5lxc'}, {'hidden': False, 'id': '168', 'resourceIdentifiers': [{'type': 'label', 'key': 'TAG_FREE'}, {'type': 'graphics', 'key': 'free'}], 'type': 'category.artbooks', 'gameId': '5lxc', 'lookUpId': 'tag_free'}], 'lookUpId': 'f4b919d8-15f9-4dae-964f-bd9262db0a5b', 'additionalData': {'NebulaData': {'DefaultColors': '#FFFFFF', 'Snapshot': 'default_preview'}, 'MSP2Data': {'Loop': 'false'}}}, 'shopId': '8', 'price': {'currency': 'soft', 'salesPrice': 0.0, 'onSale': False}, 'lookUpId': '827e8ca7-60de-4d07-b0ae-61154d579b77'}


Source: https://stackoverflow.com/questions/77152783/how-do-i-format-an-html-url-that-doesnt-have-formatting-in-python

2 Answers


2o7dmzc51#

That endpoint just returns JSON, so there is no HTML to parse. Simply call resp.json().

import requests
from pprint import pprint

IDNum = 688
url = f'https://us.mspapis.com/shopinventory/v1/shops/listings/{IDNum}'
resp = requests.get(url)
resp.raise_for_status()
data = resp.json()
pprint(data["item"])  # or whatever
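
To keep only the bits you need, you can pick fields out of the parsed dict. Here is a minimal sketch, assuming every listing has the same shape as the data shown in the question. `summarize_listing` and the output key names are hypothetical, and `sample` is a trimmed copy of the question's data so the example runs without a network call.

```python
def summarize_listing(data):
    """Return a small dict holding a few fields from one listing."""
    item = data.get("item", {})
    # In the sample data, the display name is the resource identifier
    # whose "type" is "name".
    name = next(
        (r["key"] for r in item.get("resourceIdentifiers", [])
         if r.get("type") == "name"),
        None,
    )
    return {
        "listing_id": data.get("id"),
        "item_id": item.get("id"),
        "name": name,
        "tag_ids": [t["id"] for t in item.get("tags", [])],
        "sales_price": data.get("price", {}).get("salesPrice"),
    }


# Trimmed copy of the data from the question (most keys omitted).
sample = {
    "id": "688",
    "item": {
        "id": "912",
        "resourceIdentifiers": [
            {"type": "name", "key": "Neutral"},
            {"type": "graphics", "key": "default"},
        ],
        "tags": [{"id": "62"}, {"id": "85"}, {"id": "168"}],
    },
    "price": {"currency": "soft", "salesPrice": 0.0, "onSale": False},
}

summary = summarize_listing(sample)
print(summary)
```

With a live response you would pass `resp.json()` instead of `sample`. The `.get()` calls are there so that a listing with a missing key yields `None` or an empty list rather than raising `KeyError`.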

Answered on 2023-09-28

6vl6ewon2#

If I understand you correctly, you want to print the JSON across multiple lines:

import json
import requests

IDNum = 688
url = "https://us.mspapis.com/shopinventory/v1/shops/listings/{}"

page = requests.get(url.format(IDNum))
data = page.json()

# print the Json on multiple lines:
print(json.dumps(data, indent=4))

Prints:

{
    "id": "688",
    "item": {
        "id": "912",
        "type": "item",
        "singlePurchase": true,
        "objectSource": "curatedcontentitemtemplates",
        "objectId": "596",
        "resourceIdentifiers": [
            {
                "type": "name",
                "key": "Neutral"
            },
            {
                "type": "graphics",
                "key": "default"
            }
        ],
        "tags": [
            {
                "hidden": false,
                "id": "62",
                "resourceIdentifiers": [
                    {
                        "type": "label",
                        "key": "TAG_MOODS"
                    },
                    {
                        "type": "graphics",
                        "key": "moods"
                    }
                ],
                "type": "category.animation",
                "gameId": "5lxc"
            },
            {
                "hidden": false,
                "id": "85",
                "resourceIdentifiers": [
                    {
                        "type": "label",
                        "key": "TAG_MOODS_BASIC"
                    },
                    {
                        "type": "graphics",
                        "key": "moods"
                    }
                ],
                "type": "subcategory.animation.62",
                "gameId": "5lxc"
            },
            {
                "hidden": false,
                "id": "168",
                "resourceIdentifiers": [
                    {
                        "type": "label",
                        "key": "TAG_FREE"
                    },
                    {
                        "type": "graphics",
                        "key": "free"
                    }
                ],
                "type": "category.artbooks",
                "gameId": "5lxc",
                "lookUpId": "tag_free"
            }
        ],
        "lookUpId": "f4b919d8-15f9-4dae-964f-bd9262db0a5b",
        "additionalData": {
            "NebulaData": {
                "DefaultColors": "#FFFFFF",
                "Snapshot": "default_preview"
            },
            "MSP2Data": {
                "Loop": "false"
            }
        }
    },
    "shopId": "8",
    "price": {
        "currency": "soft",
        "salesPrice": 0.0,
        "onSale": false
    },
    "lookUpId": "827e8ca7-60de-4d07-b0ae-61154d579b77"
}
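
Since the goal in the question is one big dump of many IDs, the same idea can be extended to a loop. This is only a sketch under assumptions: the helper names (`fetch_listing`, `collect_listings`, `save_dump`), the 10-second timeout, and the choice to skip failed requests are mine, not something the API documents. The fetch function is passed in as a parameter so the loop can be tried with a stand-in before hitting the real endpoint.

```python
import json
import time

BASE_URL = "https://us.mspapis.com/shopinventory/v1/shops/listings/{}"


def fetch_listing(listing_id):
    """Fetch one listing; return the parsed JSON, or None on an HTTP error."""
    # Imported here so the other helpers work without requests installed.
    import requests

    resp = requests.get(BASE_URL.format(listing_id), timeout=10)
    if not resp.ok:
        return None
    return resp.json()


def collect_listings(ids, fetch=fetch_listing, delay=0.0):
    """Fetch every ID in `ids` and return {id_as_string: listing_data}."""
    results = {}
    for listing_id in ids:
        data = fetch(listing_id)
        if data is not None:
            results[str(listing_id)] = data
        if delay:
            # Pause between requests to avoid hammering the server.
            time.sleep(delay)
    return results


def save_dump(results, path):
    """Write the collected listings to `path` as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=4)
```

For example, `save_dump(collect_listings([688, 689], delay=0.5), "listings.json")` would fetch those two IDs and write them to one file. Which IDs actually exist is something you would have to find out yourself.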

Answered on 2023-09-28
