pandas 如何从4D xarray数据集创建多索引矩阵?

nlejzf6q  于 2023-04-10  发布在  其他
关注(0)|答案(1)|浏览(96)

我目前在xarray中有一个4D数据集ds,看起来像这样:

<xarray.Dataset>
Dimensions:  (lat: 60, lon: 78, time: 216, pres: 395)
Coordinates:
  * lat      (lat) float32 0.5 1.5 2.5 3.5 4.5 5.5 ... 55.5 56.5 57.5 58.5 59.5
  * lon      (lon) float32 -45.5 -44.5 -43.5 -42.5 ... -69.5 -75.5 -74.5 -76.5
  * time     (time) float32 7.32e+05 7.32e+05 7.32e+05 ... 7.385e+05 7.385e+05
  * pres     (pres) float64 2.5 7.5 12.5 17.5 ... 1.962e+03 1.968e+03 1.972e+03
Data variables:
    var       (pres, lat, lon, time) float64 2.03e+03 2.03e+03 ... nan nan>

我的目标是把它变成一个看起来像这样的pandas df:

id   time  pres param  20.5-70.5  20.5-71.5  20.5-72.5
0     0     0   var       2085       2073       2057
1     0     1   var       2114       2156       2054
2     0     2   var       2039       2006       2179
3     1     0   var       2199       2144       2033
4     1     1   var       2056       2102       2191
5     1     2   var       2062       2033       2052
6     2     0   var       2001       2153       2170
7     2     1   var       2187       2120       2100
8     2     2   var       2138       2076       2002

这里有一个timepres的多索引,一个param列(因为我可能一次有多个变量),每个像素(如此配对的lat-lon)作为列标题,使得对于每个像素列I,具有对应于timepresvar值。我的分析的下一部分(包括一些矢量化)需要使用此格式。
我尝试了一些东西,包括stacked = ds.stack(coordinates=["lat", "lon"]),我认为这是我想做的事情的开始,然后做stacked.to_dataframe(),但后者解栈我的配对坐标。我想我错过了一些东西,但我不太确定如何去做?
任何帮助都非常感谢!
谢谢

mhd8tkvw

mhd8tkvw1#

(由于您没有提供示例,因此应该修改此示例)
使用stack/unstack重塑数据集:

import xarray as xr
import pandas as pd

ds = xr.tutorial.load_dataset('air_temperature')
df = ds.to_dataframe().rename_axis(columns='param').stack('param').unstack(['lat', 'lon'])
df.columns = [f"{lat}-{lon}" for lat, lon in df.columns]

输出:

>>> df
                           75.0-200.0  75.0-202.5  75.0-205.0  75.0-207.5  ...  15.0-322.5  15.0-325.0  15.0-327.5  15.0-330.0
time                param                                                  ...                                                
2013-01-01 00:00:00 air    241.199997  242.500000  243.500000  244.000000  ...  297.600006  296.899994  296.790009  296.600006
2013-01-01 06:00:00 air    242.099991  242.699997  243.099991  243.389999  ...  296.899994  296.399994  296.399994  296.600006
2013-01-01 12:00:00 air    242.299988  242.199997  242.299988  242.500000  ...  297.600006  297.000000  297.000000  296.790009
2013-01-01 18:00:00 air    241.889999  241.799988  241.799988  242.099991  ...  298.199982  297.790009  298.000000  297.899994
2013-01-02 00:00:00 air    243.199997  243.099991  243.099991  243.299988  ...  297.699982  297.100006  297.399994  297.399994
...                               ...         ...         ...         ...  ...         ...         ...         ...         ...
2014-12-30 18:00:00 air    243.089996  243.389999  243.689987  243.789993  ...  297.989990  297.389984  296.889984  296.089996
2014-12-31 00:00:00 air    242.489990  242.389999  242.189987  241.689987  ...  297.290009  296.589996  295.989990  295.489990
2014-12-31 06:00:00 air    243.489990  242.989990  242.089996  240.689987  ...  297.089996  296.089996  295.790009  295.790009
2014-12-31 12:00:00 air    245.789993  244.789993  243.489990  241.889999  ...  296.589996  295.690002  295.489990  295.190002
2014-12-31 18:00:00 air    245.089996  244.289993  243.289993  242.189987  ...  297.190002  296.489990  296.190002  295.690002

[2920 rows x 1325 columns]

>>> ds
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7

相关问题