Applying a function element-wise in xarray, like pandas .apply/.map

ebdffaop  posted on 2023-10-14  in  Other

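In xarray, how can I apply a non-vectorized, non-ufunc function to a DataArray so that each element of its values is mapped to a new element? The custom function should take a scalar value and return a scalar value: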

import numpy as np
import xarray as xr

def custom_function(x):
    # imagine some non-vectorized, non-numpy-ufunc stuff
    return type(x) # dummy example function

data = xr.DataArray([1, 2, 3, 4], dims='x')

# this doesn't work: custom_function is sent the whole array [1, 2, 3, 4]
# xr.apply_ufunc(custom_function, data)

# I'd expect something like this, which is basically a loop over all elements
# xr.apply(custom_function, data)

# in pandas, I would just use the .apply or .map method of Series
import pandas as pd
s = data.to_series()
s.apply(custom_function)
s.map(custom_function)

xuo3flqw1#

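You can apply a custom function to a DataArray with the apply_ufunc method and vectorize=False. Note that with vectorize=False (the default) the function receives the whole array rather than one element at a time, so this works only because x * 2 is already vectorized by NumPy; a function that truly requires scalar input needs vectorize=True.
Like this: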

import numpy as np
import xarray as xr

def custom_function(x):
    # Your custom non-vectorized function here
    scalar_output = x * 2
    return scalar_output

data = xr.DataArray([1, 2, 3, 4], dims='x')

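# Apply the custom function; with vectorize=False it receives the whole array,
# and x * 2 still works because NumPy broadcasts the multiplication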
result = xr.apply_ufunc(custom_function, data, vectorize=False)

print(result)

Output:

<xarray.DataArray (x: 4)>
array([2, 4, 6, 8])
Dimensions without coordinates: x
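
To make the difference between the two settings visible, here is a minimal sketch that records what the function actually receives. The helper `record_and_double` and the list `seen_types` are hypothetical names introduced only for this illustration; they are not part of the original answer.

```python
import xarray as xr

seen_types = []  # type name of every argument the function receives

def record_and_double(x):
    # Hypothetical helper: log the argument type, then double the value.
    seen_types.append(type(x).__name__)
    return x * 2

data = xr.DataArray([1, 2, 3, 4], dims="x")

# vectorize=False (the default): one call with the whole NumPy array.
result_default = xr.apply_ufunc(record_and_double, data, vectorize=False)
types_default = list(seen_types)

# vectorize=True: the function is called with individual elements.
seen_types.clear()
result_vectorized = xr.apply_ufunc(record_and_double, data, vectorize=True)
types_vectorized = list(seen_types)

print(types_default)     # ['ndarray']
print(types_vectorized)  # scalar types only, no 'ndarray'
```

Both calls return the same values here only because `x * 2` happens to work on arrays as well as on scalars.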

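**[Edit]** You can also achieve this by converting the DataArray to a pandas Series, applying your custom function, and converting the result back to a DataArray. You can do it like this: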

import numpy as np
import xarray as xr

def custom_function(x):
    scalar_output = x * 2
    return scalar_output

data = xr.DataArray([1, 2, 3, 4], dims='x')

# Convert DataArray to pandas Series
data_series = data.to_series()

# Apply custom function to each element
result_series = data_series.apply(custom_function)

# Convert the result back to a DataArray
result = xr.DataArray(result_series, dims='x')

print(result)

Output:

<xarray.DataArray (x: 4)>
array([2, 4, 6, 8], dtype=int64)
Coordinates:
  * x        (x) int64 0 1 2 3
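
The same round trip should also carry over to arrays with more than one dimension. The following is a sketch under that assumption, not something from the original answer: it uses `Series.map` and `Series.to_xarray` instead of rebuilding the DataArray by hand, and the branching `custom_function` is a made-up example of scalar-only logic.

```python
import xarray as xr

def custom_function(x):
    # Scalar-only logic: the `if` would not work on a whole array.
    if x % 2 == 0:
        return x * 10
    return x

data = xr.DataArray([[1, 2, 3], [4, 5, 6]], dims=("x", "y"))

# A 2-D DataArray becomes a Series with an (x, y) MultiIndex.
series = data.to_series()

# Series.map calls custom_function once per element.
mapped = series.map(custom_function)

# Series.to_xarray rebuilds the dimensions from the index levels.
result = mapped.to_xarray()
print(result.dims)
```

Attributes stored in `data.attrs` are not carried through the Series, so they would have to be copied over separately if they matter.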

ewm0tg9j2#

Or another brainstormed idea: a list comprehension

PS:

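  • Although a list comprehension is straightforward for simple cases, it does not automatically handle metadata or dimension information. If those matter for your use case, you need to manage them manually.
  • If you mainly work with xarray DataArrays and want to keep dimension and metadata information, apply_ufunc is the more suitable choice. For more general cases, or when you need finer-grained control over the operation, a list comprehension can be a suitable choice.
    Snippet: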
result_values = [custom_function(x) for x in data.values]
result = xr.DataArray(result_values, dims='x')
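
If dimensions, coordinates, and attributes need to survive the list comprehension, one possible way to manage them manually is to flatten the values, map them, reshape, and hand the new array to `DataArray.copy(data=...)`. This is a sketch under assumptions: the 2-D example data, the coordinate values, the `units` attribute, and the branching `custom_function` are all made up for illustration.

```python
import numpy as np
import xarray as xr

def custom_function(x):
    # Scalar-only logic: doubles values greater than 2, keeps the rest.
    return x * 2 if x > 2 else x

data = xr.DataArray(
    [[1, 2], [3, 4]],
    dims=("x", "y"),
    coords={"x": [10, 20]},
    attrs={"units": "m"},
)

# Loop over every element in flattened order, then restore the shape.
flat = [custom_function(v) for v in data.values.ravel()]
new_values = np.array(flat).reshape(data.shape)

# copy(data=...) keeps dims, coords and attrs and swaps in the new values.
result = data.copy(data=new_values)
print(result)
```

The new array must have the same shape as the original, which is why the flattened results are reshaped before the copy.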

Example code:

import xarray as xr

def custom_function(x):
    scalar_output = x * 2
    return scalar_output

data = xr.DataArray([1, 2, 3, 4], dims='x')

# Apply the custom function element-wise using a list comprehension
result_values = [custom_function(x) for x in data.values]
result = xr.DataArray(result_values, dims='x')

print(result)

Output:

<xarray.DataArray (x: 4)>
array([2, 4, 6, 8])
Dimensions without coordinates: x

vzgqcmou3#

Using apply_ufunc as suggested, I got the expected result with vectorize=True:

data = xr.DataArray([1, 2, 3, 4], dims='x')
# xr.apply_ufunc(custom_function, data) # fails, equivalent to vectorize=False
xr.apply_ufunc(custom_function, data, vectorize=True) # working
<xarray.DataArray (x: 4)>
array([<class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>],
      dtype=object)
Dimensions without coordinates: x

Explanation: with vectorize=True, numpy.vectorize wraps custom_function, essentially turning it into a for loop, which is exactly what I needed.
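
The wrapping described above can be sketched by calling `numpy.vectorize` directly and comparing it with `apply_ufunc`. The branching `custom_function` below is an assumed stand-in for scalar-only logic, chosen because its `if` statement cannot handle a whole array.

```python
import numpy as np
import xarray as xr

def custom_function(x):
    # Scalar-only: `if` on a whole array raises ValueError.
    if x > 2:
        return 0
    return x * 2

data = xr.DataArray([1, 2, 3, 4], dims="x")

# Without vectorize, the whole array reaches the `if` and the call fails.
try:
    xr.apply_ufunc(custom_function, data)
    failed_without_vectorize = False
except ValueError:
    failed_without_vectorize = True

# With vectorize=True, xarray wraps the function in numpy.vectorize.
via_xarray = xr.apply_ufunc(custom_function, data, vectorize=True)

# Roughly the same loop on the raw values, without the xarray metadata.
via_numpy = np.vectorize(custom_function)(data.values)

print(via_xarray.values, via_numpy)
```

Since `numpy.vectorize` is a convenience loop rather than a performance tool, this is mainly useful when the function genuinely cannot be written in terms of array operations.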
