pandas: How to use xarray to put data into the data variables section of a netCDF file

Posted by 5n0oy7gb on 2023-04-28 in Etcd

Question

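I'm writing a program that reads an Excel file, extracts the data and puts it into a netCDF file. The problem is that I can't get the data into the data variables section. I'm using xarray and pandas for this. Here is my code: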

import pandas as pd
import xarray as xr

# Open the xlsx file
df = pd.read_excel('./data/data_example/dd.xlsx')

col_names = df.columns.tolist()

col_names_clean = []      # column names with the "(unit)" part removed
col_names_units = {}      # clean column name -> unit
col_names_no_unit = []    # column names without a unit (including the Beta(...) columns)

for col_name in col_names:
    if "(" in col_name and "Beta" not in col_name:
        col_unit = col_name[col_name.find("(")+1:col_name.find(")")]
        clean_name = col_name.replace("("+col_unit+")", "")
        col_names_clean.append(clean_name)
        col_names_units[clean_name] = col_unit
    else:
        col_names_no_unit.append(col_name)

global_attributes = {
    'title': 'Data from Guyane',
    'institution': 'CNRS',
    'source': 'Data from Guyane',
    'history': 'Created by the CNRS',
    'references': 'https://www.cnrs.fr/',
    'comment': 'Data from Guyane',
    'Conventions': 'CF-1.6',
}

col_values1 = []
for col_name in col_names_clean:
    # Look the column up under its original header, e.g. "Pres" + "(" + unit + ")"
    col_values1.append(sorted(set(df[col_name+"("+col_names_units[col_name]+")"])))

# Dictionary mapping each clean column name to its values
col_names_values = dict(zip(col_names_clean, col_values1))

col_values2 = []
# Loop over the list of column names
for col_name in col_names_no_unit:
    # Add the column's values to the list of values
    col_values2.append(sorted(set(df[col_name])))

# Dictionary mapping the column names to their values
col_names_values2 = dict(zip(col_names_no_unit, col_values2))

# Build the dataset
ds = xr.Dataset(
    coords={
        # Take the values from col_names_values by column name
        col_name: (col_name, col_names_values[col_name]) for col_name in col_names_clean
    },
    # Here is the problem:
    data_vars={col_name: (col_name, col_names_values2[col_name]) for col_name in col_names_no_unit},
    attrs=global_attributes,
)

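This is what I get when I run the code. I only want the coordinates section to contain Time, Pres, Temp, Sal and CHL. I don't understand why these variables end up there. Does anyone have a solution?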

<xarray.Dataset>
Dimensions:      (_G100_: 1, _G125_: 1, _G150_: 1, bbp_VSF_532: 1113,
                  Beta(488): 146, Beta(510): 38, Beta(532): 1, Beta(595): 1,
                  Beta(650): 194, Beta(676): 1, Beta(715): 1, Beta(765): 1,
                  Beta(865): 1, bbp488: 1148, bbp510: 1136, bbp532: 1113,
                  bbp595: 1081, bbp650: 1130, bbp676: 1101, bbp715: 1111,
                  bbp765: 258, bbp865: 258, CDOM-ppb: 283, a401: 1175,
                  ...
                  c716: 1064, c717: 1064, c718: 1064, c719: 1066, c720: 1066,
                  c721: 1066, c722: 1066, c723: 1068, c724: 1068, c725: 1068,
                  c726: 1068, c727: 1075, c728: 1075, c729: 1075, c730: 1075,
                  c731: 1083, c732: 1083, c733: 1083, c734: 1083, c735: 1011,
                  c736: 1090, Time: 1317, Pres: 126, Temp: 201, Cond: 219,
                  Sal: 258, CHL: 54)
Coordinates: (12/701)
  * _G100_       (_G100_) float64 55
  * _G125_       (_G125_) float64 55
  * _G150_       (_G150_) float64 55
  * bbp_VSF_532  (bbp_VSF_532) float64 55
  * Beta(488)    (Beta(488)) float64 55
  * Beta(510)    (Beta(510)) float64 55
...
  * Time         (Time) int32 55 55
  * Pres         (Pres) float64 55
  * Temp         (Temp) float64 55
  * Cond         (Cond) float64 55
  * Sal          (Sal) float64 55
  * CHL          (CHL) float64 55
Data variables:
    *empty*
Attributes:
    title:        Data from Guyane
    institution:  CNRS
    source:       Data from Guyane
    history:      Created by the CNRS
    references:   https://www.cnrs.fr/
    comment:      Data from Guyane
    Conventions:  CF-1.6
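The empty Data variables section can be reproduced on a tiny dataset. A likely cause (this is a reading of how xarray behaves, not something stated in the question): when a variable's name is the same as the name of its only dimension, xarray treats it as a dimension coordinate, so every col_name: (col_name, values) entry is moved under Coordinates. The two columns and their values below are made up for illustration.

```python
import xarray as xr

# Hypothetical toy values standing in for two Excel columns.
values = {"Temp": [29.1, 29.5, 29.98], "CHL": [0.1, 0.2, 0.3]}

# Same pattern as the question: each variable uses its own name as its dimension,
# so xarray promotes it to a dimension coordinate.
ds_bad = xr.Dataset(
    data_vars={name: (name, vals) for name, vals in values.items()}
)
print(ds_bad)  # Temp and CHL appear under Coordinates; Data variables is empty

# Sharing one dimension whose name differs from the variable names
# (the name "obs" is hypothetical) keeps them in Data variables.
ds_good = xr.Dataset(
    data_vars={name: ("obs", vals) for name, vals in values.items()}
)
print(ds_good)
```

This is also why giving the variables a separate dimension name, as in the solution that follows, moves them into the data variables section.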

Solution:

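You just need to give your variables a "description", i.e. a dimension name such as "Variables" that is different from the variables' own names, so that they go into the data variables section. For example: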

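# col_names_clean, col_names_values, col_names_no_unit, col_names_values2 and
# global_attributes come from the code in the question
coords = {}
for i in col_names_clean:
    # Add the column's values to the coords dictionary
    coords[i] = ("Coords", col_names_values[i])

# Build a dataset with all the values and put them in the Data variables section.
# Every array sharing a dimension ("Coords" or "Variables") must have the same length.
ds = xr.Dataset(
    coords=coords,
    data_vars={
        i: ("Variables", col_names_values2[i]) for i in col_names_no_unit
    },
    attrs=global_attributes,
)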

Here is the result in the terminal:

<xarray.Dataset>
Dimensions:      (Variables: 1317, Coords: 1317)
Coordinates:
    Time         (Coords) int32 55
    Pres         (Coords) float64 55
    Temp         (Coords) float64 55 29.98
    Cond         (Coords) float64 55
    Sal          (Coords) float64 55 26.95
    CHL          (Coords) float64 55
Dimensions without coordinates: Variables, Coords
Data variables: (12/695)
    _G100_       (Variables) float64 55
    _G125_       (Variables) float64 55 
    _G150_       (Variables) float64 55
    bbp_VSF_532  (Variables) float64 55
    Beta(488)    (Variables) float64 55
    Beta(510)    (Variables) float64 55
    ...           ...
    c731         (Variables) float64 355
    c732         (Variables) float64 55
    c733         (Variables) float64 55
    c734         (Variables) float64 55
    c735         (Variables) float64 55
    c736         (Variables) float64 55
Attributes:
    title:        Data from Guyane
    institution:  CNRS
    source:       Data from Guyane
    history:      Created by the CNRS
    references:   https://www.cnrs.fr/
    comment:      Data from Guyane
    Conventions:  CF-1.6
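The solution above puts the coordinates and the data variables on two separate dimensions ("Coords" and "Variables"), so xarray does not know that row i of Time belongs to row i of c731. A possible alternative, sketched here under assumptions: keep every row of the sheet along one shared dimension (the name obs is hypothetical), make the wanted columns coordinates, put all other columns in data variables, and store each "(unit)" from the header as a CF-style units attribute. The small DataFrame, its unit strings and COORD_NAMES stand in for the real Excel file and are invented for illustration.

```python
import re

import pandas as pd
import xarray as xr

# Hypothetical stand-in for pd.read_excel('./data/data_example/dd.xlsx');
# headers follow the "Name(unit)" pattern from the question, units are invented.
df = pd.DataFrame({
    "Time(s)": [0, 1, 2],
    "Pres(dbar)": [1.0, 2.0, 3.0],
    "Temp(degC)": [29.9, 29.8, 29.7],
    "Sal(psu)": [26.9, 26.95, 27.0],
    "CHL(mg/m3)": [0.1, 0.2, 0.3],
    "Beta(488)": [1e-4, 2e-4, 3e-4],
    "bbp488": [5e-3, 6e-3, 7e-3],
})

# Columns to expose as coordinates (assumption, taken from the question's wish list).
COORD_NAMES = ["Time", "Pres", "Temp", "Sal", "CHL"]
DIM = "obs"  # hypothetical name for the shared row dimension

UNIT_RE = re.compile(r"^(?P<name>[^(]+?)\s*\((?P<unit>[^)]*)\)$")


def split_header(header):
    """Return (name, unit) for 'Name(unit)'; Beta(...) headers keep their full name."""
    match = UNIT_RE.match(header)
    if match is None or "Beta" in header:
        return header, None
    return match.group("name"), match.group("unit")


coords, data_vars = {}, {}
for header in df.columns:
    name, unit = split_header(header)
    attrs = {"units": unit} if unit else {}
    # Keep every row in its original order (no set()/sorted()),
    # so all variables stay aligned along the same dimension.
    entry = (DIM, df[header].to_numpy(), attrs)
    if name in COORD_NAMES:
        coords[name] = entry
    else:
        data_vars[name] = entry

ds = xr.Dataset(data_vars=data_vars, coords=coords, attrs={"Conventions": "CF-1.6"})
print(ds)
```

Writing the file would then be ds.to_netcdf("out.nc"), which needs a netCDF backend such as netCDF4 or scipy to be installed. If the Time values are unique, ds.swap_dims({"obs": "Time"}) would make Time the dimension instead of the hypothetical obs.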
Answer #1 by dzjeubhm

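I'm not sure I understood your question correctly, but it seems you are confusing data variables and dimensions. When building an xarray.Dataset, the first element of each tuple should be the dimension(s), rather than the variable's own name col_name repeated.
So instead of: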

data_vars= {col_name: (col_name, col_names_values2[col_name]) for col_name in col_names_no_unit}

I suggest you use:

data_vars= {col_name: (["Time", "Pres", "Temp", "Cond", "Sal", "CHL"], col_names_values2[col_name]) for col_name in col_names_no_unit}

or, more concisely:

data_vars= {col_name: (col_names_clean, col_names_values2[col_name]) for col_name in col_names_no_unit}

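if I understood you correctly.
However, you could make your question more precise by stating your dimensions, coordinates and variables, and how they relate to col_names_clean and col_names_no_unit.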
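For the suggestion above to work, the dims list in each (dims, data) tuple needs exactly one name per axis of the data. Each column in the question is a 1-D array, so, as far as I can tell, pairing it with five dimension names makes xarray raise a ValueError; several names only fit a multi-dimensional array. A minimal check, with made-up values and a hypothetical obs dimension:

```python
import numpy as np
import xarray as xr

values_1d = np.array([0.1, 0.2, 0.3])  # hypothetical 1-D column

# A 1-D array paired with five dimension names cannot be converted to a variable.
raised = False
try:
    xr.Dataset(data_vars={"bbp488": (["Time", "Pres", "Temp", "Sal", "CHL"], values_1d)})
except ValueError as err:
    raised = True
    print("ValueError:", err)

# One dimension name per array axis works.
ds = xr.Dataset(data_vars={"bbp488": (["obs"], values_1d)})
print(ds["bbp488"].dims)  # ('obs',)
```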
