pandas 用一个键对多列排序

67up9zun  于 2023-03-28  发布在  其他
关注(0)|答案(2)|浏览(116)

我有一个下面的代码,它从yfinance获取数据的单个股票代码(GOOGL),然后按收盘价(升序)排序此数据.

import bs4 as bs
import requests
import yfinance as yf
import datetime
import pandas as pd
import time
import numpy as np

starttimer = time.time()
resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = bs.BeautifulSoup(resp.text, 'lxml')
table = soup.find('table', {'class': 'wikitable sortable'})

tickers = []

for row in table.findAll('tr')[1:]:
    ticker = row.findAll('td')[0].text
    tickers.append(ticker)

tickers = [s.replace('\n', '') for s in tickers]

start = datetime.datetime(2013, 2, 27)
end = datetime.datetime(2023, 3, 3)
data = yf.download("GOOGL MSFT AAPL", start=start, end=end)

df = data.stack().reset_index().rename(index=str, columns={"level_1": "Symbol"}).sort_values(['Symbol','Date'])
df['Date'] = df['Date'].apply(lambda x: x.strftime('%d%m%Y'))
df.set_index('Date', inplace=True)
df = pd.DataFrame(data)
print(df)
eodPrices   = pd.DataFrame(data=data);
percentageChange = round(eodPrices.pct_change()*100,2).shift(-1);
percentageChange.sort_values(by=['Close'],inplace=True)
print(percentageChange)
endtimer = time.time()
elapsed_time = endtimer - starttimer
print('Execution time: ', elapsed_time, 'seconds' )

但是,当我想获得多个股票代码的排序列时

data = yf.download("GOOGL MSFT AAPL", start=start, end=end)

我收到以下错误:

数值错误:列标签“Close”不唯一。对于多索引,该标签必须是包含与每个级别对应的元素的元组。

我的目标是排序股票GOOGL的收盘价,股票MSFT的收盘价和股票AAPL的收盘价,然后打印这些'收盘'列(只有这3列)。

e0bqpujr

e0bqpujr1#

要只获取Close列,可以用途:

df1 = yf.download("GOOGL MSFT AAPL", start=start, end=end)['Close']

要对每列的值进行排序(是否忽略日期索引?),可以用途:

df2 = pd.concat([df[col].sort_values().reset_index(drop=True)
                    for col in df.columns], axis=1)

输出:

>>> df1
                  AAPL      GOOGL        MSFT
Date                                         
2013-02-26   16.034643  19.773024   27.370001
2013-02-27   15.877500  20.014515   27.809999
2013-02-28   15.764286  20.050051   27.799999
2013-03-01   15.373929  20.174925   27.950001
2013-03-04   15.001786  20.558058   28.150000
...                ...        ...         ...
2023-02-24  146.710007  89.129997  249.220001
2023-02-27  147.919998  89.870003  250.160004
2023-02-28  147.410004  90.059998  249.419998
2023-03-01  145.309998  90.360001  246.270004
2023-03-02  145.910004  92.000000  251.110001

[2522 rows x 3 columns]

>>> df2
            AAPL       GOOGL        MSFT
0      13.947500   19.166918   27.370001
1      14.001786   19.390640   27.799999
2      14.063571   19.460711   27.809999
3      14.161786   19.567818   27.870001
4      14.216786   19.583584   27.910000
...          ...         ...         ...
2517  179.380005  148.852005  341.269989
2518  179.449997  148.918503  341.950012
2519  179.699997  148.926498  342.450012
2520  180.330002  149.031006  342.540009
2521  182.009995  149.838501  343.109985

[2522 rows x 3 columns]
knpiaxh1

knpiaxh12#

首先,可以重置索引(不再是键)并将其设置为列,如下所示:

df = yf.download("GOOGL MSFT AAPL", start=start, end=end)['Close'].reset_index()

要对每列单独排序,可以使用apply()lambda函数。

df1 = df.apply(lambda x: x.sort_values().reset_index(drop=True))

相关问题