pandas eval with MultiIndex DataFrames

Posted by wgx48brx on 2023-03-11 under Other


Consider a DataFrame df with MultiIndex columns:

A       bar                flux          
B       one     three       six     three
x  0.627915  0.507184  0.690787  1.166318
y  0.927342  0.788232  1.776677 -0.512259
z  1.000000  1.000000  1.000000  0.000000

I'd like to use eval to subtract ('bar', 'one') from ('flux', 'six'). Does the eval syntax support this kind of indexing?

pandas

Source: https://stackoverflow.com/questions/28422081/pandas-eval-with-multi-index-dataframes

3 answers


xzv2uavs1#

You can do this without eval, using the equivalent standard Python notation:

df['bar']['one'] - df['flux']['six']

Take a look at this reference. Here is an example based on the object in your question:

from pandas import DataFrame, MultiIndex

# Create the object
columns = [
    ('bar', 'one'),
    ('bar', 'three'),
    ('flux', 'six'),
    ('flux', 'three')
]
data    = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000]
]
index   = MultiIndex.from_tuples(columns, names=['A', 'B'])
df      = DataFrame(data, index=['x', 'y', 'z'], columns=index)

# Calculate the difference
sub = df['bar']['one'] - df['flux']['six']
print(sub)

# Assign that difference to a new column in the object
df['new', 'col'] = sub
print(df)

The printed DataFrame is:

A       bar                flux                 new
B       one     three       six     three       col
x  0.627915  0.507184  0.690787  1.166318 -0.062872
y  0.927342  0.788232  1.776677 -0.512259 -0.849335
z  1.000000  1.000000  1.000000  0.000000  0.000000
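If eval is not required, a single tuple key selects one MultiIndex column directly, which avoids the chained df['bar']['one'] lookup. A minimal sketch, rebuilding the question's frame and computing ('flux', 'six') minus ('bar', 'one') as the question literally asks (the answer above computes the opposite sign); the column name ('new', 'diff') is only an illustrative choice.

```python
import pandas as pd

# Rebuild the example frame from the question
columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "three"), ("flux", "six"), ("flux", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    [
        [0.627915, 0.507184, 0.690787, 1.166318],
        [0.927342, 0.788232, 1.776677, -0.512259],
        [1.000000, 1.000000, 1.000000, 0.000000],
    ],
    index=["x", "y", "z"],
    columns=columns,
)

# A tuple key addresses one column across both levels in a single lookup
diff = df[("flux", "six")] - df[("bar", "one")]

# Store the result under a new two-level column name (illustrative name)
df[("new", "diff")] = diff
print(df)
```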

2023-03-11

ar5n3qh52#

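Here is an example of a workaround that lets you use tuple indexes in the DataFrame eval function. I know this is an old question, but I couldn't find a good answer to the original one.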

from pandas import DataFrame, MultiIndex
import re

LEVEL_DELIMITER = "___"

def tuples_to_str(t):
    return LEVEL_DELIMITER.join(t)

def str_to_tuples(s):
    return tuple(s.split(LEVEL_DELIMITER))

def flatten_mi_var_expression(e):
    # Find tuple references such as ('flux', 'six') and replace them with flattened names
    tuple_re = r'\(.*?,.*?\)'
    for tuple_str in re.findall(tuple_re, e):
        e = e.replace(tuple_str, tuples_to_str(eval(tuple_str)))
    return e

# Create the object
columns = [
    ('bar', 'one'),
    ('bar', 'three'),
    ('flux', 'six'),
    ('flux', 'three')
]
data = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000]
]
index = MultiIndex.from_tuples(columns, names=['A', 'B'])
df = DataFrame(data, index=['x', 'y', 'z'], columns=index)

# Desired multi-index variable expression (using tuple indexes)
new_col = ('new', 'col')
mi_expression = f"{new_col} = {('flux', 'six')} + {('bar', 'one')}"

# Capture the original multi-index column object
mi_cols = df.columns

# Flatten the multi-index columns
df.columns = [LEVEL_DELIMITER.join(col) for col in df.columns.values]

# Convert multi-index variable expression to flattened indexing
flat_expression = flatten_mi_var_expression(mi_expression)

# Evaluate
df.eval(flat_expression, inplace=True)

# Append the new column to the original multi-index instance and assign to the DataFrame
df.columns = MultiIndex.from_tuples(mi_cols.tolist() + [new_col], names=mi_cols.names)

print(df)

This should give the following:

A       bar                flux                 new
B       one     three       six     three       col
x  0.627915  0.507184  0.690787  1.166318  1.318702
y  0.927342  0.788232  1.776677 -0.512259  2.704019
z  1.000000  1.000000  1.000000  0.000000  2.000000

I'm not sure whether using Python's built-in eval here (it isn't really needed) is safe, but the example appears to work.
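On the safety question: the regex matches are meant to be tuple literals, so the standard library's ast.literal_eval can parse them without executing arbitrary code. A minimal sketch of the same flattening workaround with that substitution, which also evaluates against a flattened copy so df.columns never has to be swapped out and restored. The delimiter and function name follow the answer; the rest is an assumption of this sketch, and like the original regex it assumes the expression contains no parentheses other than the tuple references.

```python
import ast
import re

import pandas as pd

LEVEL_DELIMITER = "___"
# Non-greedy match from "(" through a comma to the next ")", e.g. ('flux', 'six')
TUPLE_RE = re.compile(r"\(.*?,.*?\)")


def flatten_mi_var_expression(expr):
    """Replace tuple literals in expr with delimiter-joined column names."""
    def repl(match):
        # literal_eval only accepts Python literals, so nothing is executed
        levels = ast.literal_eval(match.group(0))
        return LEVEL_DELIMITER.join(levels)
    return TUPLE_RE.sub(repl, expr)


columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "three"), ("flux", "six"), ("flux", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    [
        [0.627915, 0.507184, 0.690787, 1.166318],
        [0.927342, 0.788232, 1.776677, -0.512259],
        [1.000000, 1.000000, 1.000000, 0.000000],
    ],
    index=["x", "y", "z"],
    columns=columns,
)

flat_expr = flatten_mi_var_expression("('flux', 'six') - ('bar', 'one')")

# Evaluate on a copy with flattened column names; df keeps its MultiIndex
flat = df.copy()
flat.columns = [LEVEL_DELIMITER.join(col) for col in flat.columns]
result = flat.eval(flat_expr)

df[("new", "col")] = result
print(flat_expr)
print(df)
```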


nom7f22z3#

For a two-level MultiIndex, you can use:

f"`('{level1}', '{level2}')`"

For your example:

df.eval("`('bar', 'one')` = `('flux', 'six')`", inplace=True)
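The f-string above reproduces how Python prints a two-level tuple of strings, so the same backtick reference can be built directly from a column key. A sketch under that assumption; mi_ref is a hypothetical helper, and the snippet only builds the expression string (the subtraction the question asks for) rather than calling eval, so whether eval resolves it still depends on the backtick behavior this answer relies on.

```python
def mi_ref(key):
    """Wrap a MultiIndex column key in backticks for use in an eval string."""
    # Formatting a tuple of strings yields "('flux', 'six')", the same text
    # the answer's f"`('{level1}', '{level2}')`" pattern produces
    return f"`{key}`"


# Expression for ('flux', 'six') minus ('bar', 'one')
expr = f"{mi_ref(('flux', 'six'))} - {mi_ref(('bar', 'one'))}"
print(expr)
```

The resulting string has the same shape as the one passed to df.eval in the answer. The helper formats keys with more than two levels the same way, but whether eval accepts those deeper keys is an assumption to check against your pandas version.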

