pandas: creating a DataFrame from printed output

Posted by 5us2dqdw on 2023-04-04 in Other

I want to create a DataFrame based on the string name of a unit. Consider:

import pandas as pd

# Imports our data

df = pd.read_csv('https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv', 
sep=';', parse_dates=['Year'], index_col='Year')

# Sorts our data
df = df.sort_values(by=['State', 'Year'])

# Generates a unique ID for our units-- why doesn't it begin from 1?
df['id'] = df.State.map(hash)


# Stores the treated units names in a list
treatedunit = df[df['treated']==1].State.unique().tolist()

# Checks we only have ONE treated unit

assert len(treatedunit) == 1

# Extracts the unique ID of the treated unit

df[df['treated']==1]['id'].describe().loc['min']

# Extract its name from the list

res = [treatedunit[0]]

# Putting it in 'quotes'

trname = print(str(res)[1:-1])

# Now we create a dataframe based on its name

df_treat = df[df['State'] == trname]

df_treat

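I would expect Python to produce a new DataFrame containing only the values for California, but I get an empty DataFrame instead. If I hard-code the name: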

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv', 
sep=';', parse_dates=['Year'], index_col='Year')

df = df.sort_values(by=['State', 'Year'])

df['id'] = df.State.map(hash)

df[df['treated']==1]['id'].describe().loc['min']

treatedunit = df[df['treated']==1].State.unique().tolist()

res = [treatedunit[0]]

df_treat = df[df['State'] == 'California']

df_treat

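Python gives me what I expect. But I need to generalize this, because the treated unit of interest will not always be California. So how can I get Python to treat trname as 'California' instead of returning an empty DataFrame?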
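A minimal sketch of why the filter comes back empty, using a small made-up DataFrame in place of the real CSV (the rows below are hypothetical and only mirror the `State` and `treated` columns). The built-in `print()` writes its argument to standard output and returns `None`, so `trname` never holds the string:

```python
import pandas as pd

# Hypothetical stand-in for the real data set; values are made up for illustration.
df = pd.DataFrame({
    "State": ["Alabama", "California", "California"],
    "treated": [0, 0, 1],
})

treatedunit = df[df["treated"] == 1].State.unique().tolist()
res = [treatedunit[0]]

# print() displays 'California' on screen but returns None,
# so None (not the string) is what gets assigned to trname.
trname = print(str(res)[1:-1])

# Comparing the State column against None matches no row,
# which is why the resulting DataFrame is empty.
empty = df[df["State"] == trname]
```

Here `trname is None` holds, and `empty` has no rows, which matches the behaviour described in the question.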

fae0ux8s  #1

Technically, it is possible to do it like this:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv', 
sep=';', parse_dates=['Year'], index_col='Year')

df = df.sort_values(by=['State', 'Year'])

df['id'] = df.State.map(hash)

trid = df[df['treated']==1]['id'].describe().loc['min']

df_treat = df[df['id'] == trid]
df_treat

However, I would still like to know how to do this using the string.
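One possible way to use the string directly, sketched on a small made-up DataFrame rather than the real CSV (the rows and the `outcome` column are hypothetical). The element taken from the `treatedunit` list is already a plain Python string, so it can be compared against the `State` column without any quoting or printing. The sketch also swaps `hash` for `pd.factorize` as one option for IDs that start at 1, since Python's string hashes are arbitrary integers that vary between interpreter runs by default:

```python
import pandas as pd

# Hypothetical stand-in for the real data set; values are made up for illustration.
df = pd.DataFrame(
    {
        "State": ["Alabama", "Alabama", "California", "California"],
        "treated": [0, 0, 0, 1],
        "outcome": [89.8, 95.4, 123.0, 90.1],
    },
    index=pd.Index([1988, 1989, 1988, 1989], name="Year"),
)

# Consecutive integer IDs starting at 1, in order of first appearance.
df["id"] = pd.factorize(df["State"])[0] + 1

# Names of the treated units; we expect exactly one.
treatedunit = df.loc[df["treated"] == 1, "State"].unique().tolist()
assert len(treatedunit) == 1

# The list element is already the string 'California'; no quotes need to be added.
trname = treatedunit[0]

# Keep every row (treated or not) that belongs to the treated unit.
df_treat = df[df["State"] == trname]
```

The quotes shown when a string is displayed are part of its printed representation, not of its value, so `trname` can be used in the comparison as it is. The same code works unchanged when the treated unit is some other state.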
