I am trying to read Indian stock market data using API calls. In this example, I am using 10 stocks. My current program is as follows.
First, I define the function:
import requests

def get_prices(stock):
    start_unix = 1669794745
    end_unix = start_unix + 1800
    interval = 1
    url = ('https://priceapi.moneycontrol.com/techCharts/indianMarket/stock/history'
           '?symbol=' + str(stock) + '&resolution=' + str(interval)
           + '&from=' + str(start_unix) + '&to=' + str(end_unix))
    url_data = requests.get(url).json()
    print(url_data['c'])
Next, I use multithreading. I don't know much about how multithreading works - I just used code from a tutorial I found on the web.
from threading import Thread

stocks = ['ACC', 'ADANIENT', 'ADANIGREEN', 'ADANIPORTS', 'ADANITRANS',
          'AMBUJACEM', 'ASIANPAINT', 'ATGL', 'BAJAJ-AUTO', 'BAJAJHLDNG']

threads = []
for i in stocks:
    threads.append(Thread(target=get_prices, args=(i,)))
    threads[-1].start()

for thread in threads:
    thread.join()
The above program takes about 250 to 300 milliseconds to run. In practice, I need to run it for thousands of stocks. Is there any way to make it run faster? I am running the code in a Jupyter notebook on an Apple M1 8-core chip. Any help would be appreciated. Thanks!
1 Answer
When scraping data from the web, most of the time is typically spent waiting for server responses. To issue a large number of queries and get the responses back as fast as possible, issuing multiple queries in parallel is the right approach. To be as efficient as possible, you have to find the right balance between a large number of parallel requests and being throttled (or blacklisted) by the remote service.
In your code, you create as many threads as there are requests. In general, you want to limit the number of threads and reuse each thread for several requests in order to save resources. This is called a thread pool.
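For example, Python's standard library provides a thread pool in concurrent.futures. Below is a minimal sketch that reuses the get_prices function from your question; the worker count of 16 is an arbitrary starting point (not from your code) that you would tune against the API's rate limits:

from concurrent.futures import ThreadPoolExecutor

stocks = ['ACC', 'ADANIENT', 'ADANIGREEN', 'ADANIPORTS', 'ADANITRANS',
          'AMBUJACEM', 'ASIANPAINT', 'ATGL', 'BAJAJ-AUTO', 'BAJAJHLDNG']

# A fixed pool of worker threads; each worker handles many stocks in turn.
# max_workers=16 is an arbitrary guess - tune it against the API's limits.
with ThreadPoolExecutor(max_workers=16) as executor:
    executor.map(get_prices, stocks)

All calls are submitted immediately, and leaving the with block waits for every task to finish, so no explicit join is needed.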
Since you are using Python, another lighter-weight alternative to multiple threads is to run parallel I/O tasks using asyncio. Sample implementations of parallel requests using either a thread pool or asyncio are shown in this Stack Overflow answer.

Edit: here is an adapted example from your code using asyncio:
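This is a minimal sketch rather than a tested drop-in: it assumes the third-party aiohttp package (pip install aiohttp, not part of your original code), and it adds an asyncio.Semaphore to cap the number of simultaneous requests, per the throttling caveat above.

import asyncio
import aiohttp  # assumed dependency: pip install aiohttp

START_UNIX = 1669794745
END_UNIX = START_UNIX + 1800
INTERVAL = 1
MAX_CONCURRENT = 20  # arbitrary cap; tune against the API's rate limits

async def get_prices(session, semaphore, stock):
    url = ('https://priceapi.moneycontrol.com/techCharts/indianMarket/stock/history'
           f'?symbol={stock}&resolution={INTERVAL}'
           f'&from={START_UNIX}&to={END_UNIX}')
    async with semaphore:  # limits how many requests are in flight at once
        async with session.get(url) as response:
            # content_type=None skips aiohttp's strict MIME check, in case
            # the API does not label its response as application/json
            data = await response.json(content_type=None)
    print(stock, data['c'])

async def main(stocks):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    # One shared session reuses TCP connections across all requests.
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(get_prices(session, semaphore, s)
                               for s in stocks))

stocks = ['ACC', 'ADANIENT', 'ADANIGREEN', 'ADANIPORTS', 'ADANITRANS',
          'AMBUJACEM', 'ASIANPAINT', 'ATGL', 'BAJAJ-AUTO', 'BAJAJHLDNG']

# From a plain script: asyncio.run(main(stocks))
# Inside Jupyter an event loop is already running, so run it with:
#   await main(stocks)

With thousands of symbols, raise MAX_CONCURRENT gradually and watch for HTTP 429 responses; the semaphore is what keeps you on the polite side of the service.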