pandas: Why do I get NaN after a left join?

chhqkbe1, posted on 2023-01-19 in Other

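I'm working through the Udacity ML course. After df_final.join(df_temp, how="left") I get NaN, but in the course venv everything works fine. What could be wrong?
P.S.: I also tried df_temp.index = pd.to_datetime(df_temp.index, utc=True) for each frame, and it seems to have no effect.
Here we load the data.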

import yfinance as yf

tickets = ["AAPL", "AMD", "GOOG", "GLD"]

def download_tickets(tickets):
    for ticket in tickets:      
        df = yf.Ticker(ticket)
        df = df.history(period="max")
        df.to_csv(symbol_to_path(ticket))

Here we build the CSV path from the symbol.

import os

def symbol_to_path(symbol, base_dir="data"):
    # Create the data directory on first use.
    os.makedirs(base_dir, exist_ok=True)
    return os.path.join(base_dir, "{}.csv".format(str(symbol)))

Here we join the data.

# Create empty df with specified dates. 
    start_date = "2022-01-01"
    end_date = "2023-01-01"
    dates = pd.date_range(start_date, end_date)
    df_final = pd.DataFrame(index=dates)
    df_final.index = pd.to_datetime(df_final.index, utc=True)
    
    # Combine all with df_final
    for symbol in tickets:  # loop variable must be `symbol`, which the body uses
        file_path = symbol_to_path(symbol)
        df_temp = pd.read_csv(file_path, parse_dates=True, index_col="Date",
                              usecols=["Date", "Close"], na_values=["nan"])
        df_temp = df_temp.rename(columns={"Close": symbol})
        df_final = df_final.join(df_temp, how="left")
        print(df_temp.head())
        print(df_final.head())

    return df_final

Output:
As you can see, the float values become NaN with a left join.
With a right join we get data, but not for the range 2022-01-01 to 2023-01-01.
Inner join
Outer join
Thank you.
UPD: Data after 2021

Answer #1 (wrrgggsh)

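The problem is the time zone. The ticker data is stamped at -05:00 (New York, I assume), while you generate df_final in UTC (+00:00). Midnight at -05:00 and midnight UTC are different instants, so when you join, pandas finds no intersection between the two indexes.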
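A minimal sketch of that mismatch, using made-up prices instead of the downloaded CSV (the frame names and values here are illustrative assumptions, not the asker's data). Both indexes cover the same calendar dates, but none of the timestamps coincide:

```python
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical stand-in for the ticker data: daily rows stamped at midnight -05:00.
minus_five = timezone(timedelta(hours=-5))
prices = pd.DataFrame(
    {"AAPL": [100.0, 101.0, 102.0]},
    index=pd.date_range("2022-01-03", periods=3, tz=minus_five),
)

# Frame built the way the question builds it: midnight UTC.
utc_frame = pd.DataFrame(index=pd.date_range("2022-01-03", periods=3, tz="UTC"))

# Same dates, different instants, so the left join matches nothing.
joined = utc_frame.join(prices, how="left")
nan_count = int(joined["AAPL"].isna().sum())
print(joined)
print("NaN rows:", nan_count)
```

Printing `utc_frame.index.tz` and `prices.index.tz` before joining is a quick way to spot this kind of difference.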
For me, the simplest solution is to change the time zone (tz) of df_final, i.e. generate it with the correct tz:

# Create empty df with specified dates. 
start_date = "2022-01-01"
end_date = "2023-01-01"
dates = pd.date_range(start_date, end_date, tz='-05:00') # change here
df_final = pd.DataFrame(index=dates)
# df_final.index = pd.to_datetime(df_final.index, utc=True)  # NOT needed anymore
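One caveat: New York switches between -05:00 and -04:00 with daylight saving time, so a fixed -05:00 range may still miss rows stamped -04:00. An alternative sketch, not part of the original answer, is to reduce both sides to tz-naive calendar dates before joining. The CSV excerpt below is hypothetical, and the zone name "America/New_York" is an assumption about the exchange:

```python
import io

import pandas as pd

# Hypothetical CSV excerpt with offsets that change across the DST switch.
csv_text = """Date,Close
2022-03-11 00:00:00-05:00,154.7
2022-03-14 00:00:00-04:00,150.6
"""

df_temp = pd.read_csv(io.StringIO(csv_text), usecols=["Date", "Close"])

# Parse to UTC first (this copes with mixed offsets), convert to the
# exchange's zone (assumed), then drop the tz to get plain midnight dates.
idx = pd.to_datetime(df_temp["Date"], utc=True)
idx = idx.dt.tz_convert("America/New_York").dt.tz_localize(None).dt.normalize()
df_temp.index = pd.DatetimeIndex(idx, name="Date")
df_temp = df_temp[["Close"]].rename(columns={"Close": "AAPL"})

# tz-naive date range, so both indexes hold the same kind of value.
df_final = pd.DataFrame(index=pd.date_range("2022-03-11", "2022-03-14"))
df_final = df_final.join(df_temp, how="left")
print(df_final)
```

The weekend rows stay NaN here because no prices exist for those dates; that is expected and separate from the time zone issue.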
