I want to create a DataFrame based on the string name of a unit. Consider:
import pandas as pd
# Imports our data
df = pd.read_csv('https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv',
sep=';', parse_dates=['Year'], index_col='Year')
# Sorts our data
df = df.sort_values(by=['State', 'Year'])
# Generates a unique ID for our units-- why doesn't it begin from 1?
df['id'] = df.State.map(hash)
# Stores the treated units names in a list
treatedunit = df[df['treated']==1].State.unique().tolist()
# Checks we only have ONE treated unit
assert len(treatedunit) == 1
# Extracts the unique ID of the treated unit
df[df['treated']==1]['id'].describe().loc['min']
# Extract its name from the list
res = [treatedunit[0]]
# Putting it in 'quotes'
trname = print(str(res)[1:-1])
# Now we create a dataframe based on its name
df_treat = df[df['State'] == trname]
df_treat
I want Python to produce a new DataFrame that contains only the values for California. When I hard-code the name instead:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv',
sep=';', parse_dates=['Year'], index_col='Year')
df = df.sort_values(by=['State', 'Year'])
df['id'] = df.State.map(hash)
df[df['treated']==1]['id'].describe().loc['min']
treatedunit = df[df['treated']==1].State.unique().tolist()
res = [treatedunit[0]]
df_treat = df[df['State'] == 'California']
df_treat
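Python gives me what I expect. However, I need to generalize this, because the treated unit of interest will not always be California. So how can I get Python to recognize that trname is 'California', so that I do not end up with an empty DataFrame?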
1 Answer
Technically, it is possible to do this.
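The snippet that accompanied this statement is not preserved here. One route that avoids the string, assuming the idea was to filter on the numeric id column, might look like the sketch below. The toy data is made up for illustration, and pd.factorize is swapped in for hash so that the ids start at 1 (Python randomizes string hashes per process by default, so hash-based ids are arbitrary integers that can change between runs).

```python
import pandas as pd

# Toy stand-in for the Prop 99 data; the values are made up for illustration.
df = pd.DataFrame({
    'State': ['Alabama', 'Alabama', 'California', 'California'],
    'Year': [1988, 1989, 1988, 1989],
    'treated': [0, 0, 0, 1],
})

# factorize numbers the states 0, 1, 2, ... in order of first appearance;
# adding 1 makes the ids start at 1.
df['id'] = pd.factorize(df['State'])[0] + 1

# ids of the units that are treated in at least one row
treated_ids = df.loc[df['treated'] == 1, 'id'].unique()
assert len(treated_ids) == 1  # exactly one treated unit is expected

# Keep every row of the treated unit, including its pre-treatment rows
df_treat = df[df['id'] == treated_ids[0]]
print(df_treat)
```

Filtering on the id keeps all rows of the treated unit, not just the rows where treated equals 1.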
However, I would still like to know how to do it using the string.
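As for the string: the likely cause of the empty result is that print() writes to the screen and returns None, so trname ends up as None rather than 'California'. Separately, str(res)[1:-1] keeps the literal quote characters, so even without print it would not equal the state name. The list element is already a plain string and can be used directly. A minimal sketch on made-up toy data (not the real Prop 99 file):

```python
import pandas as pd

# Toy stand-in for the Prop 99 data; the values are made up for illustration.
df = pd.DataFrame({
    'State': ['Alabama', 'Alabama', 'California', 'California'],
    'Year': [1988, 1989, 1988, 1989],
    'treated': [0, 0, 0, 1],
})

# Names of the units that are treated in at least one row
treatedunit = df.loc[df['treated'] == 1, 'State'].unique().tolist()
assert len(treatedunit) == 1  # exactly one treated unit is expected

# The list element is already a str; no quoting or printing is needed
trname = treatedunit[0]
df_treat = df[df['State'] == trname]
print(df_treat)

# What went wrong in the question:
broken = print(str([trname])[1:-1])   # prints 'California' but returns None
quoted = str([trname])[1:-1]          # the text includes the quote characters
assert broken is None
assert quoted != trname
assert df[df['State'] == broken].empty  # comparing against None matches no row
```

Because trname is taken from the data rather than typed in, the same lines should work when the treated unit is some other state.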