python-3.x: Getting the same output in a more elegant and efficient way

c9qzyr3d  posted on 2022-12-05  in Python

I have a df:

info
{"any_name":{"value":["5"], "ref":"any text"}, "another_name":{"value":["2"], "ref":"any text"}}
{"any_name":{"value":["1"], "ref":"any text"}, "another_name":{"value":["12"], "ref":"any text"}}

The column's dtype is:

df['info'].apply(type) =>   <class 'str'>

I want to create a DataFrame with the following output:

any_name  another_name
       5             2
       1            12

My solution is:

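# Collect the raw strings from the 'info' column
A = list(df['info'])

J = []
for i in range(len(A)):
    D = eval(A[i])  # parse each string into a dict
    # keep only the wanted names, each mapped to its 'value' list
    foo = {k: v['value'] for k, v in D.items() if k in list_to_filter_columns}
    J.append(foo)
out = pd.DataFrame(J)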

Code to convert the entries in value to numeric, since each entry is a one-element list:

out = out.apply(lambda x: x.str[0])
out = out.apply(pd.to_numeric)
out.head(2)

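The solution above works fine.
I would like to know whether there is a more elegant way to get the same result. I think the code above is quite inefficient and inelegant. Is there a better way to do this?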

y53ybaqx  #1

It looks like you are trying to extract certain values from the info column of a DataFrame and convert them to numeric values. There are several ways to do this more concisely than the approach in your question.
Here is one way, using the json_normalize function from pandas, which flattens nested JSON objects into columns:

import json

import pandas as pd

# Load the DataFrame (closing braces added so each string is valid JSON)
df = pd.DataFrame({
    'info': [
        '{"any_name":{"value":["5"], "ref":"any text"}, "another_name":{"value":["2"], "ref":"any text"}}',
        '{"any_name":{"value":["1"], "ref":"any text"}, "another_name":{"value":["12"], "ref":"any text"}}'
    ]
})

# Parse each string with json.loads (no need for eval), then flatten the
# nested dicts into columns such as 'any_name.value' and 'any_name.ref'
df_wide = pd.json_normalize(df['info'].apply(json.loads).tolist())

# Keep the '<name>.value' columns and melt them into long format,
# preserving the original row index
df_flattened = df_wide.filter(like='.value').melt(
    var_name='column', value_name='value', ignore_index=False
)
df_flattened['column'] = df_flattened['column'].str.replace('.value', '', regex=False)

# Cast the 'value' column to numeric (each cell is a one-element list)
df_flattened['value'] = pd.to_numeric(df_flattened['value'].str[0])

# Output the resulting DataFrame
print(df_flattened)

This will produce the following output:

         column  value
0      any_name      5
1      any_name      1
0  another_name      2
1  another_name     12

You can then use the pivot method of the DataFrame to convert this into the format you want:

# Use the pivot method to convert the DataFrame to the desired format;
# with no index argument, pivot uses the existing row index
df_pivoted = df_flattened.pivot(columns='column', values='value')

# Output the resulting DataFrame
print(df_pivoted)

This will produce the following output (pivot sorts the column labels):

column  another_name  any_name
0                  2         5
1                 12         1

This approach leaves the flattening and pivoting to built-in pandas functions instead of a hand-written loop over the rows of the DataFrame. Each string is still parsed by one Python call per row, so the gain is mainly in readability rather than raw speed.
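If the long format is not needed for anything else, the reshaping step can be skipped, because json_normalize already returns one column per nested key. The following is a minimal sketch under stated assumptions: every top-level entry carries a one-element "value" list as in the question, the helper name extract_values is hypothetical, and the sample strings have their closing braces added so that they parse as JSON.

```python
import json

import pandas as pd


def extract_values(info: pd.Series) -> pd.DataFrame:
    """Return one numeric column per top-level name in the JSON strings.

    Hypothetical helper: assumes each entry looks like
    {"<name>": {"value": ["<number>"], "ref": ...}}.
    """
    # Parse each string with json.loads instead of eval.
    records = info.apply(json.loads).tolist()
    # Flatten nested dicts into columns such as "any_name.value".
    flat = pd.json_normalize(records)
    # Keep only the "<name>.value" columns and drop the suffix.
    values = flat.filter(regex=r"\.value$")
    values = values.rename(columns=lambda c: c[: -len(".value")])
    # Each cell is a one-element list: take the element and convert it.
    return values.apply(lambda col: pd.to_numeric(col.str[0]))


df = pd.DataFrame({
    "info": [
        '{"any_name": {"value": ["5"], "ref": "any text"}, '
        '"another_name": {"value": ["2"], "ref": "any text"}}',
        '{"any_name": {"value": ["1"], "ref": "any text"}, '
        '"another_name": {"value": ["12"], "ref": "any text"}}',
    ]
})

out = extract_values(df["info"])
print(out)
```

For the sample data this gives the columns any_name (5, 1) and another_name (2, 12), in the order the keys appear in the strings.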

EDIT : Passing 'value' as record_path and 'column' as meta to json_normalize does not work for this data. The top-level keys of each parsed record are any_name and another_name; value sits one level below them, and no key is called column. Since there is no list of records to unpack, call json_normalize without those arguments (this needs import json):

df_wide = pd.json_normalize(df['info'].apply(json.loads).tolist())

This produces one column per nested key, named any_name.value, any_name.ref, another_name.value and another_name.ref. A line that casts to numeric must then refer to those column names, for example:

df_wide['any_name.value'] = pd.to_numeric(df_wide['any_name.value'].str[0])
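For comparison, the loop from the question can also be tightened without json_normalize, which keeps the name filter it applies. This is only a sketch: the contents of list_to_filter_columns are an assumption (the question uses the variable without defining it), and the sample strings have their closing braces added so that json.loads accepts them.

```python
import json

import pandas as pd

# Assumption: the names to keep. The question never shows this list.
list_to_filter_columns = ["any_name", "another_name"]

df = pd.DataFrame({
    "info": [
        '{"any_name": {"value": ["5"], "ref": "any text"}, '
        '"another_name": {"value": ["2"], "ref": "any text"}}',
        '{"any_name": {"value": ["1"], "ref": "any text"}, '
        '"another_name": {"value": ["12"], "ref": "any text"}}',
    ]
})

wanted = set(list_to_filter_columns)

# One dict per row: name -> first (and only) element of its "value" list.
rows = [
    {
        name: entry["value"][0]
        for name, entry in json.loads(raw).items()
        if name in wanted
    }
    for raw in df["info"]
]

# Build the frame in one go and convert every column from str to numbers.
out = pd.DataFrame(rows).apply(pd.to_numeric)
print(out)
```

Taking the single list element inside the comprehension removes the separate x.str[0] pass, and json.loads parses the strings without evaluating them as Python code.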
