pandas: This Python program suddenly cannot fetch data from the bseindia API. Is there a way to debug changes in the API and pass the correct parameters?

r7knjye2 posted on 2023-08-01 in Python

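This code has worked for many years. A few years ago I ran into a similar problem when the API changed. I don't remember how I debugged it, but I found that an extra page-number parameter had been added. Now there seems to have been another slight change, and my program can no longer fetch data. Any help would be appreciated.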

import requests
import pandas as pd
import sys
import numpy as np
from pandas.io.json import json_normalize
pdate ="20230721"               # starting date
date ="20230724"            # till this date
url = 'https://api.bseindia.com/BseIndiaAPI/api/AnnGetData/w'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

payload = {
'Pageno': 1,
'strCat': '-1',
'strPrevDate': pdate,
'strScrip': '',
'strSearch': 'P',
'strToDate':   date,
'strType': 'C'}

data = []
should_fetch_next_page = True
while should_fetch_next_page:
    print(f"Fetching page {payload['Pageno']} ...")
    jsonData = requests.get(url, headers=headers, params=payload).json()
    if jsonData["Table"]:
        data.extend(jsonData["Table"])
        payload['Pageno'] += 1
        # every thing we want to do

    else:
        should_fetch_next_page = False

df = pd.DataFrame(data)
print(df)


ulmd4ohb #1

The API URL has changed, and the server now requires the HTTP Referer header:

import requests
import pandas as pd

pdate = "20230721"  # starting date
date = "20230724"  # till this date
url = "https://api.bseindia.com/BseIndiaAPI/api/AnnSubCategoryGetData/w"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Referer": "https://www.bseindia.com/",
}

payload = {
    "pageno": 1,
    "strCat": "-1",
    "strPrevDate": pdate,
    "strScrip": "",
    "strSearch": "P",
    "strToDate": date,
    "strType": "C",
    "subcategory": "",
}

data = []
should_fetch_next_page = True
while should_fetch_next_page:
    print(f"Fetching page {payload['pageno']} ...")
    jsonData = requests.get(url, headers=headers, params=payload).json()
    if jsonData["Table"]:
        data.extend(jsonData["Table"])
        payload["pageno"] += 1
        # every thing we want to do

    else:
        should_fetch_next_page = False

df = pd.DataFrame(data)
print(df)

Prints:

Fetching page 1 ...
Fetching page 2 ...
Fetching page 3 ...

...
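
The loop above calls .json() on every response, so if the server answers with a redirect or an HTML page the failure shows up only as a JSON decode error. A minimal sketch of a check that could run before .json() is shown below. The helper name describe_response is hypothetical, and the fake response uses made-up stand-in values (only the redirect target is taken from the comment in this thread); it is not output captured from the real server.

```python
from types import SimpleNamespace


def describe_response(resp):
    """Return a list of warnings for a requests-style response object.

    Only the attributes status_code, history, url and headers are used,
    all of which exist on requests.Response.
    """
    problems = []
    if resp.status_code != 200:
        problems.append(f"unexpected status {resp.status_code}")
    # requests stores any followed redirects in resp.history
    if resp.history:
        problems.append(f"redirected to {resp.url}")
    content_type = resp.headers.get("Content-Type", "")
    if "json" not in content_type.lower():
        problems.append(f"non-JSON content type: {content_type or 'missing'}")
    return problems


# Hypothetical stand-in for a response that was redirected to an HTML page.
fake = SimpleNamespace(
    status_code=200,
    history=["<redirect>"],
    url="https://www.bseindia.com/members/showinterest.aspx",
    headers={"Content-Type": "text/html"},
)
for problem in describe_response(fake):
    print(problem)
```

With requests, the object returned by requests.get(url, headers=headers, params=payload) can be passed to the helper directly. An empty list means the response at least looks like JSON from the expected endpoint.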


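Note: To debug future issues, the page that issues this API request is https://www.bseindia.com/corporates/ann.html. Open that URL in your browser, open Web Developer Tools -> Network tab, and reload the page.
You should see the API URL plus the required parameters, HTTP headers, cookies, and so on.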
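
Once a request URL has been copied from the Network tab, it can be compared against the old payload to spot renamed or added parameters. The following is a rough sketch using only the standard library. The helper names split_captured_url and diff_params are hypothetical, the captured URL is the one quoted in this thread, and the old payload is the one from the question.

```python
from urllib.parse import parse_qsl, urlsplit


def split_captured_url(captured):
    """Split a URL copied from the browser into (endpoint, params dict)."""
    parts = urlsplit(captured)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values=True keeps empty parameters such as strScrip=
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return endpoint, params


def diff_params(old, new):
    """List parameter names that appear in only one of the two payloads."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
    }


# URL as quoted in this thread
CAPTURED = (
    "https://api.bseindia.com/BseIndiaAPI/api/AnnSubCategoryGetData/w"
    "?pageno=1&strCat=-1&strPrevDate=20230731&strScrip=&strSearch=P"
    "&strToDate=20230731&strType=C&subcategory="
)

# Payload from the question (the version that stopped working)
OLD_PAYLOAD = {
    "Pageno": 1,
    "strCat": "-1",
    "strPrevDate": "20230721",
    "strScrip": "",
    "strSearch": "P",
    "strToDate": "20230724",
    "strType": "C",
}

endpoint, new_params = split_captured_url(CAPTURED)
print(endpoint)
print(diff_params(OLD_PAYLOAD, new_params))
```

The comparison is case-sensitive on purpose, so a rename such as Pageno to pageno shows up as one removed and one added name. Header changes such as Referer are not part of the URL and still have to be read from the request headers in the Network tab.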

gab6jxml #2

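Hey @Andrej Kesely, your code works well, thank you. How did you work this out, given that BSE has blocked direct access to the API? And how can the API link provided by BSE be accessed directly? When I opened the URL (https://api.bseindia.com/BseIndiaAPI/api/AnnSubCategoryGetData/w?pageno=1&strCat=-1&strPrevDate=20230731&strScrip=&strSearch=P&strToDate=20230731&strType=C&subcategory=), it redirected me to another site (https://www.bseindia.com/members/showinterest.aspx).
Please share your thoughts, thank you.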
