How to create multiple columns in pandas from data appended to a list in a single column

Posted by sqyvllje on 2022-09-21 in Other

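I am scraping data from Yahoo Finance, and the scraping itself works fine. However, when I try to store the appended lists in a DataFrame that I can index by ticker, I get an empty DataFrame back. When I store the data in a plain DataFrame without that indexing, the data is kept.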

When I print temp I can see the data, and converting temp to a DataFrame also succeeds. But when I run financial_dir[ticker]=temp.append(soup.find('div', {'class' : "D(tbrg)"}).find_all('div')[i].get_text(separator='|').split('|')), it does not create a DataFrame indexed by ticker; it produces an empty DataFrame instead.

I want to build a financial_dir that can be looked up by stock. For example, running financial_dir['INDUSINDBK.NS'] should give the DataFrame for INDUSINDBK.NS, as shown in the image. Any help would be appreciated.

"""

import requests
from bs4 import BeautifulSoup
import pandas as pd

tickers = ['KOTAKBANK.NS','WIPRO.NS','HINDALCO.NS','RELIANCE.NS',
           'INDUSINDBK.NS','HDFCLIFE.NS','TATACONSUM.NS','TITAN.NS',
           'ULTRACEMCO.NS']

financial_dir = pd.DataFrame()
temp = []
for ticker in tickers:
    url = 'https://finance.yahoo.com/quote/'+ticker+'/financials?p='+ticker
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'})

    # page_content = page.content

    soup = BeautifulSoup(page.text, 'html.parser')
    a = list(range(0,2000,1))

    # while IndexError(True):

    try:
        for i in a:
            financial_dir[ticker]=temp.append(soup.find('div', {'class' : "D(tbrg)"}).find_all('div')[i].get_text(separator='|').split('|'))
    except:
        pass

temp
data5 = pd.DataFrame(temp)
financial_dir

"""

bkhjykvo
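
A likely reason the original loop ends with an empty DataFrame is that `list.append` changes the list in place and returns `None`, so the column assignment receives `None` rather than the scraped row. The sketch below reproduces this offline. The row content is copied from the sample output in this thread, and the bare `except: pass` in the question would additionally hide the `IndexError` that ends the loop.

```python
import pandas as pd

temp = []
# list.append mutates the list in place and returns None
result = temp.append(["Total Revenue", "126,653,800"])

financial_dir = pd.DataFrame()
# Assigning None to a column of a frame that has no rows adds the
# column label only, so the frame still contains zero rows
financial_dir["TATACONSUM.NS"] = result

print(result)               # None
print(temp)                 # the row did land in the list
print(financial_dir.empty)  # True
```

This matches what the question describes: `temp` holds the data, while `financial_dir` stays empty.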


Try this:

1. Create a function that returns a DataFrame for each ticker:

import requests
from bs4 import BeautifulSoup
import pandas as pd

def f(ticker):
    url = 'https://finance.yahoo.com/quote/'+ticker+'/financials?p='+ticker
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'})
    soup = BeautifulSoup(page.text, 'html.parser')
    # Column headers of the table; the first entry is skipped when building the frame
    ticker_header = [i.text for i in soup.find('div', {'class' : "D(tbhg)"}).find('div', {'class' : 'D(tbr)'}).find_all('div', {'class': 'D(ib)'})]
    # Every value cell of the table as one flat list, row after row
    values = [i.text for i in soup.find('div', {'class' : "D(tbrg)"}).find_all('div', {'class': 'Ta(c)'})]
    # Row labels such as "Total Revenue"
    ticker_index = [i.text for i in soup.find('div', {'class' : "D(tbrg)"}).find_all('div', {'class': 'D(ib)'})]
    # Regroup the flat list into rows of five values, one per period column
    chunk_size = 5
    list_chunked = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    df = pd.DataFrame(list_chunked, columns=ticker_header[1:])
    df_index = pd.Index(ticker_index)
    df = df.set_index(df_index)
    # Tag each row with its ticker so frames can be combined later
    df['ticker'] = ticker
    df = df.reset_index()
    return df

f('TATACONSUM.NS') #return dataframe
    index               ttm         3/31/2022   3/31/2021   3/31/2020   3/31/2019   ticker
0   Total Revenue       126,653,800 123,470,100 115,832,200 95,966,000  72,093,500  TATACONSUM.NS
1   Cost of Revenue     74,531,800  73,265,100  70,742,800  55,775,900  41,540,400  TATACONSUM.NS
2   Gross Profit        52,122,000  50,205,000  45,089,400  40,190,100  30,553,100  TATACONSUM.NS
3   Operating Expense   37,051,000  35,650,800  32,199,200  29,685,700  24,003,600  TATACONSUM.NS

# ...

f('HINDALCO.NS') #return dataframe
    index               ttm             3/31/2022       3/31/2021       3/31/2020       3/31/2019       ticker
0   Total Revenue       2,104,160,000   1,937,560,000   1,310,090,000   1,171,400,000   1,297,455,700   HINDALCO.NS
1   Cost of Revenue     1,531,870,000   1,398,820,000   953,430,000     859,720,000     958,279,000     HINDALCO.NS
2   Gross Profit        572,290,000     538,740,000     356,660,000     311,680,000     339,176,700     HINDALCO.NS
3   Operating Expense   312,010,000     298,540,000     240,410,000     215,740,000     230,666,900     HINDALCO.NS

# ...
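
The core of `f` is the step that regroups a flat list of cell values into rows. Here is a minimal offline sketch of that step, assuming the three lists look roughly like the stand-ins below. The header label `"Breakdown"` is a hypothetical placeholder, and the numbers are taken from the sample output above. The final loop, which converts the comma-formatted strings to numbers, is an optional extra that is not part of the original function.

```python
import pandas as pd

# Hypothetical stand-ins for the lists that f() collects from the page
header = ["Breakdown", "ttm", "3/31/2022"]   # "Breakdown" is an assumed label
row_labels = ["Total Revenue", "Cost of Revenue"]
values = ["126,653,800", "123,470,100", "74,531,800", "73,265,100"]

# One chunk per table row: as many values as there are period columns
chunk_size = len(header) - 1
rows = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

df = pd.DataFrame(rows, columns=header[1:], index=pd.Index(row_labels))
df["ticker"] = "TATACONSUM.NS"
df = df.reset_index()  # the unnamed index becomes a column called "index"

# Optional extra: turn "126,653,800" style strings into numbers
for col in header[1:]:
    cleaned = df[col].str.replace(",", "", regex=False)
    df[col] = pd.to_numeric(cleaned, errors="coerce")

print(df)
```

Deriving `chunk_size` from the header length, instead of hard-coding 5, is one way to keep the chunking consistent with the column list if the number of reported periods changes.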

2. You can then save each ticker's data to its own CSV file and work with each one separately:

tickers = ['KOTAKBANK.NS','WIPRO.NS','HINDALCO.NS','RELIANCE.NS',
           'INDUSINDBK.NS','HDFCLIFE.NS','TATACONSUM.NS','TITAN.NS',
           'ULTRACEMCO.NS']

for ticker in tickers:
    f(ticker).to_csv(f'{ticker}.csv', index=False)
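
Reading one of these files back might look like the sketch below. It writes a small stand-in frame (values copied from the sample output above) to a temporary directory so that it runs offline. Since the scraped figures are strings with comma separators, passing `thousands=","` to `pd.read_csv` is one way to get numeric columns on load.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for one frame returned by f(); the scalar ticker is broadcast to every row
df = pd.DataFrame({
    "index": ["Total Revenue", "Gross Profit"],
    "ttm": ["126,653,800", "52,122,000"],
    "ticker": "TATACONSUM.NS",
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "TATACONSUM.NS.csv"
    df.to_csv(path, index=False)
    # thousands="," makes read_csv parse "126,653,800" as a number
    back = pd.read_csv(path, thousands=",")

print(back.dtypes)
```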

3. Or you can put them all in a single DataFrame:

tickers = ['KOTAKBANK.NS','WIPRO.NS','HINDALCO.NS','RELIANCE.NS',
           'INDUSINDBK.NS','HDFCLIFE.NS','TATACONSUM.NS','TITAN.NS',
           'ULTRACEMCO.NS']

all_dataframes = []
for ticker in tickers:
    print(ticker)
    all_dataframes.append(f(ticker))

df_all = pd.concat(all_dataframes)
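
Since the question asks for a lookup such as `financial_dir['INDUSINDBK.NS']`, a plain dictionary keyed by ticker is another option. The sketch below assumes a hypothetical `fake_f` helper that stands in for `f` and returns placeholder numbers, so it can run without network access. In real use you would call `f(ticker)` instead.

```python
import pandas as pd

def fake_f(ticker):
    """Hypothetical offline stand-in for f(ticker); the numbers are placeholders."""
    return pd.DataFrame({
        "index": ["Total Revenue", "Gross Profit"],
        "ttm": [100, 40],
        "ticker": ticker,
    })

tickers = ["INDUSINDBK.NS", "TITAN.NS"]

# One DataFrame per ticker, retrievable by name
financial_dir = {ticker: fake_f(ticker) for ticker in tickers}
print(financial_dir["INDUSINDBK.NS"])

# The same frames can still be stacked into one long table
df_all = pd.concat(list(financial_dir.values()), ignore_index=True)
print(df_all)
```

The dictionary keeps each ticker's table separate for lookups, while `pd.concat` gives the combined frame when you need to compare tickers.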

4. You can also pivot the resulting DataFrame:

df_all.pivot(index='ticker', columns='index', values=[ 'ttm', '3/31/2022', '3/31/2021', '3/31/2020', '3/31/2019',])
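
To see what the pivoted result looks like, here is a small sketch on made-up data. The tickers `A.NS` and `B.NS` and all numbers are placeholders. With a list passed as `values`, the columns become a two-level MultiIndex of (period, line item). Note that `pivot` raises a `ValueError` if any (ticker, index) pair occurs more than once.

```python
import pandas as pd

# Placeholder long-format data in the same layout as df_all
df_all = pd.DataFrame({
    "index": ["Total Revenue", "Gross Profit", "Total Revenue", "Gross Profit"],
    "ttm": [100, 40, 200, 90],
    "3/31/2022": [95, 38, 180, 85],
    "ticker": ["A.NS", "A.NS", "B.NS", "B.NS"],
})

wide = df_all.pivot(index="ticker", columns="index", values=["ttm", "3/31/2022"])

# One row per ticker; columns are (period, line item) pairs
print(wide)

# Select a single series or a single cell through the column tuple
print(wide[("ttm", "Total Revenue")])
print(wide.loc["B.NS", ("3/31/2022", "Gross Profit")])
```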
