python-3.x: Providing an implicit resolver and a robust representer for ruamel.yaml to support human-friendly tuples and np arrays

Posted by mzillmmw on 2022-12-30 in Python

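I have a project in which users need to write a yaml file by hand. This yaml file may have some entries formatted as tuples or numpy arrays. We distinguish tuples from lists internally in Python to give users a convenient interface, e.g. (1,2,3) is different from [1,2,3].
For convenience, I would like users to be able to enter tuples directly with parentheses, as in name: (1,2,3). I would also like users to be able to provide numpy arrays by entering something like other_name: np.array([1,2,3]). I know this will not preserve the exact numerical precision of the numpy arrays, but we consider it a fair trade-off for a better user experience.
I use ruamel.yaml, mainly because it preserves comments.
I managed to get something working, but it doesn't feel "right" to me, especially the resolving part. There is basically no implicit resolver; I use a dirty eval instead. I did find some information about implicit resolvers in the ruamel.yaml documentation online, on SO, and by digging through the source code, but I couldn't really make sense of it.
Below is a minimal working example, with comments pointing out where I feel the implementation is not robust or not clean.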

import sys
import numpy as np
import ruamel.yaml

def _tupleRepresenter(dumper, data):
    # TODO: Make this more robust
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', str(data))

def _numpyRepresenter(dumper, data):
    # TODO: Make this more robust
    as_string = 'np.array(' + np.array2string(data, max_line_width=np.inf, precision=16, prefix='np.array(', separator=', ', suffix=')') + ')'
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', as_string)

def load_yaml(file):
    # TODO: Resolve tuples and arrays properly when loading
    yaml = ruamel.yaml.YAML()
    yaml.Representer.add_representer(tuple, _tupleRepresenter)
    yaml.Representer.add_representer(np.ndarray, _numpyRepresenter)
    return yaml.load(file)

def dump_yaml(data, file):
    yaml = ruamel.yaml.YAML()
    yaml.Representer.add_representer(tuple, _tupleRepresenter)
    yaml.Representer.add_representer(np.ndarray, _numpyRepresenter)
    return yaml.dump(data, file)

yaml_file = """
test_tuple: (1, 2, 3)
test_array: np.array([4,5,6])
"""

data = load_yaml(yaml_file)

data['test_tuple'] = eval(data['test_tuple']) # This feels dirty
data['test_array'] = eval(data['test_array']) # This feels dirty

dump_yaml(data, sys.stdout)
# test_tuple: (1, 2, 3)
# test_array: np.array([4, 5, 6])

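I welcome any help with improving this implementation by using a proper implicit resolver, robust representers, and generally using ruamel.yaml more as intended.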

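Update:

With help from the comments, I managed to get something that almost fully works. For now, let's ignore that I still need to write a proper non-eval parser.
The only remaining problem is that the new tags are now dumped as strings, so on reload they are not interpreted correctly; they become strings instead and do not survive multiple round-trips.
How can I avoid that?
Here is a simple working example: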

import sys
import numpy as np
import ruamel.yaml

# TODO: Replace evals by actual parsing
# TODO: Represent custom types without the string quotes

# Raw strings avoid invalid-escape warnings for \( and \d on newer Python versions.
_tuple_re = r"^(?:\((?:.|\n|\r)*,(?:.|\n|\r)*\){1}(?: |\n|\r)*$)"
_array_re = r"^(?:(np\.|)array\(\[(?:.|\n|\r)*,(?:.|\n|\r)*\]\){1}(?: |\n|\r)*$)"
_complex_re = r"^(?:(?:\d+(?:(?:\.\d+)?(?:e[+\-]\d+)?)?)?(?: *[+\-] *))?(?:\d+(?:(?:\.\d+)?(?:e[+\-]\d+)?)?)?[jJ]$"

def _tuple_constructor(self, node):
    return eval(self.construct_scalar(node))

def _array_constructor(self, node):
    value = node.value
    if not value.startswith('np.'):
        value = 'np.' + value
    return eval(value)

def _complex_constructor(self, node):
    return eval(node.value)

def _tuple_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', str(data))

def _array_representer(dumper, data):
    as_string = 'np.array(' + np.array2string(data, max_line_width=np.inf, precision=16, prefix='np.array(', separator=', ', suffix=')') + ')'
    as_string = as_string.replace(' ', '').replace(',', ', ')
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', as_string)

def _complex_representer(dumper, data):
    repr = str(data).replace('(', '').replace(')', '')
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', repr)

custom_types = {
    '!tuple':   {'re':_tuple_re,   'constructor': _tuple_constructor,   'representer':_tuple_representer,   'type': tuple,      'first':list('(')             },
    '!nparray': {'re':_array_re,   'constructor': _array_constructor,   'representer':_array_representer,   'type': np.ndarray, 'first':list('an')            },
    '!complex': {'re':_complex_re, 'constructor': _complex_constructor, 'representer':_complex_representer, 'type': complex,    'first':list('0123456789+-jJ')},
}

def load_yaml(file):
    yaml = ruamel.yaml.YAML()
    for tag,ct in custom_types.items():
        yaml.Constructor.add_constructor(tag, ct['constructor'])
        yaml.Resolver.add_implicit_resolver(tag, ruamel.yaml.util.RegExp(ct['re']), ct['first'])
        yaml.Representer.add_representer(ct['type'], ct['representer'])
    return yaml.load(file)

def dump_yaml(data, file):
    yaml = ruamel.yaml.YAML()
    for tag,ct in custom_types.items():
        yaml.Constructor.add_constructor(tag, ct['constructor'])
        yaml.Resolver.add_implicit_resolver(tag, ruamel.yaml.util.RegExp(ct['re']), ct['first'])
        yaml.Representer.add_representer(ct['type'], ct['representer'])
    return yaml.dump(data, file)

yaml_file = """
test_tuple: (1, 2, 3)
test_array: array([4.0,5+0j,6.0j])
test_complex: 3 + 2j
"""

data = load_yaml(yaml_file)

dump_yaml(data, sys.stdout)
# test_tuple: '(1, 2, 3)'
# test_array: 'np.array([4.+0.j, 5.+0.j, 0.+6.j])'
# test_complex: '3+2j'

Thanks, everyone!

j0pj023g1#

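The representer in ruamel.yaml is for dumping Python types as YAML in a specific way; you normally cannot use it to create a Python type from some part of your YAML. For the latter you need a constructor.
Constructors can be explicit, using a tag (for example !!float; the original answer links to a list of such tags), or implicit, i.e. recognizing the input, which in ruamel.yaml is originally done on scalars using regular expressions.
Your example seems to require extending YAML's collection types beyond mappings and sequences. I don't think you will succeed in that without rewriting a large part of the ruamel.yaml code. What I recommend is that you write code to construct your numpy arrays from tagged input, at first like this: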

test_tuple: !np.array [1, 2, 3]

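even though you don't want your users to write things that way, and that you dump numpy arrays with this tag.
The next step is to write a constructor that matches scalars that start and end with a left and right parenthesis, or that start with np.array([ and end with ]) (even though there is a [ in there, it does not start a sequence when it is in the middle of a scalar). You should keep track of which of the formats was used to construct the NumPy array (for example, using some unique attribute that has three states, for tagged, parenthesis, or np.array input). You will need to parse the matched scalar, but you don't need eval() to do that. For an alternative, look at how !timestamp is handled. Although your example only has integer arrays, you probably also want to look at accepting floats.
Once you have these additional untagged constructors, you can adapt the representer for NumPy arrays to use the untagged format depending on that attribute.
Good examples of the above are the round-trip handling of floats (which preserves scientific notation) and the timestamps mentioned earlier.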
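
As a rough illustration of parsing the matched scalar without eval(), the sketch below swaps in the standard library's ast.literal_eval, which only accepts Python literals and rejects function calls. This is not how ruamel.yaml handles !timestamp; the helper names parse_tuple_scalar and parse_array_scalar and the simplified _ARRAY_CALL pattern are assumptions made for this example. A real constructor would pass the returned list to np.array().

```python
import ast
import re

# Hypothetical, simplified pattern: an optional "np." prefix, then array([...]).
_ARRAY_CALL = re.compile(r"^(?:np\.)?array\((\[.*\])\)$", re.DOTALL)


def parse_tuple_scalar(text):
    """Parse a scalar such as '(1, 2, 3)' into a tuple without eval()."""
    value = ast.literal_eval(text.strip())
    if not isinstance(value, tuple):
        raise ValueError(f"not a tuple literal: {text!r}")
    return value


def parse_array_scalar(text):
    """Parse 'np.array([...])' or 'array([...])' into a (nested) list."""
    match = _ARRAY_CALL.match(text.strip())
    if match is None:
        raise ValueError(f"not an array expression: {text!r}")
    value = ast.literal_eval(match.group(1))
    if not isinstance(value, list):
        raise ValueError(f"array argument is not a list: {text!r}")
    return value  # a real constructor would return np.array(value)


print(parse_tuple_scalar("(1, 2, 3)"))
print(parse_array_scalar("array([4.0,5+0j,6.0j])"))
```

Because ast.literal_eval raises ValueError on anything that is not a literal, input such as a function call is rejected rather than executed.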

polkgigr2#

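With Anthon's help in the comments, and after reading his ruamel.yaml source code, I managed to answer my own question.
I give a minimal working solution here for reference. Replacing the evals with an actual parser is probably a good idea, to avoid vulnerabilities if this is ever run on a YAML file you do not trust.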

import sys
import numpy as np
import ruamel.yaml

from ruamel.yaml.comments import TaggedScalar

# TODO: Replace evals by actual parsing

_tuple_re = r"^(?:\((?:.|\n|\r)*,(?:.|\n|\r)*\){1}(?: |\n|\r)*$)"
_array_re = r"^(?:(np\.|)array\(\[(?:.|\n|\r)*,(?:.|\n|\r)*\]\){1}(?: |\n|\r)*$)"

def _tuple_constructor(self, node):
    return eval(self.construct_scalar(node))

def _array_constructor(self, node):
    value = node.value
    if not value.startswith('np.'):
        value = 'np.' + value
    return eval(value)

def _tuple_representer(dumper, data):
    repr = str(data)
    return dumper.represent_tagged_scalar(TaggedScalar(repr, style=None, tag='!tuple'))

def _array_representer(dumper, data):
    repr = 'np.array(' + np.array2string(data, max_line_width=np.inf, precision=16, prefix='np.array(', separator=', ', suffix=')') + ')'
    repr = repr.replace(' ', '').replace(',', ', ')
    return dumper.represent_tagged_scalar(TaggedScalar(repr, style=None, tag='!nparray'))

custom_types = {
    '!tuple':   {'re':_tuple_re,   'constructor': _tuple_constructor,   'representer':_tuple_representer,   'type': tuple,      'first':list('(')             },
    '!nparray': {'re':_array_re,   'constructor': _array_constructor,   'representer':_array_representer,   'type': np.ndarray, 'first':list('an')            },
}

def load_yaml(file):
    yaml = ruamel.yaml.YAML()
    for tag,ct in custom_types.items():
        yaml.Constructor.add_constructor(tag, ct['constructor'])
        yaml.Resolver.add_implicit_resolver(tag, ruamel.yaml.util.RegExp(ct['re']), ct['first'])
        yaml.Representer.add_representer(ct['type'], ct['representer'])
    return yaml.load(file)

def dump_yaml(data, file):
    yaml = ruamel.yaml.YAML()
    for tag,ct in custom_types.items():
        yaml.Constructor.add_constructor(tag, ct['constructor'])
        yaml.Resolver.add_implicit_resolver(tag, ruamel.yaml.util.RegExp(ct['re']), ct['first'])
        yaml.Representer.add_representer(ct['type'], ct['representer'])
    return yaml.dump(data, file)

yaml_file = """
test_tuple: (1, 2, 3)
test_array: array([4.0,5+0j,6.0j])
"""

data = load_yaml(yaml_file)

dump_yaml(data, sys.stdout)
# test_tuple: (1, 2, 3)
# test_array: np.array([4.+0.j, 5.+0.j, 0.+6.j])
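
A likely explanation for why tagging the scalars with !tuple and !nparray removes the quotes: a plain scalar that matches an implicit resolver pattern is read back with that pattern's tag, so a value tagged as a string has to be quoted to remain a string, while a value carrying the matching tag can be written plain. The sketch below is a simplified model of that lookup using only the re module, not ruamel.yaml's actual code; IMPLICIT and implicit_tag are hypothetical names, and the patterns and first characters are copied from the solution above.

```python
import re

# Patterns and first-character hints copied from the custom_types table.
_tuple_re = r"^(?:\((?:.|\n|\r)*,(?:.|\n|\r)*\){1}(?: |\n|\r)*$)"
_array_re = r"^(?:(np\.|)array\(\[(?:.|\n|\r)*,(?:.|\n|\r)*\]\){1}(?: |\n|\r)*$)"

IMPLICIT = [
    ("!tuple", re.compile(_tuple_re), "("),
    ("!nparray", re.compile(_array_re), "an"),
]


def implicit_tag(text):
    """Return the first custom tag whose pattern claims this plain scalar."""
    for tag, pattern, first_chars in IMPLICIT:
        # The first character is a cheap pre-filter before the regex runs.
        if text and text[0] in first_chars and pattern.match(text):
            return tag
    return None


for scalar in ["(1, 2, 3)", "np.array([4, 5, 6])", "(1)", "plain text"]:
    print(repr(scalar), "->", implicit_tag(scalar))
```

Both patterns require a comma, so under this model a one-element input such as (1) or array([4]) would fall through and stay a string.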

vd2z7a6w3#

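I want to extend the provided solution by adding a way to do this without eval(). It may also fail in some cases, and dumping may need to use the numpy array's tolist() method.
The basic idea for avoiding eval() is that the constructor rewrites the inside of the array or tuple call into list form. Once it is in list-like form, the ruamel.yaml parser can try to load the items as a list, and calling np.array() or tuple() then turns the object into what we want.
What is added here is a complex-number regex, constructor, and representer. These are needed for YAML to load the inside of a numpy array as a list on its own, so that lists of complex numbers are supported.
The load and dump functions also received some minor cosmetic changes.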

import sys
import numpy as np
import ruamel.yaml
import re

from ruamel.yaml.comments import TaggedScalar

def _complex_re_gen():
    '''
    Because it is complicated, returns a string which parses complex expressions.
    # Gave up and looked for complex number regular expression,
    #   modified to include scientific numbers. 
    # See https://web.archive.org/web/20221228150825/https://stackoverflow.com/questions/67818976/regular-expression-for-complex-numbers
    '''
    num = r'(?:[+\-]?(?:\d*\.)?\d+)'
    num_sci = r'(?:{num}(?:e[+\-]?\d+)?)'.format(num=num)
    cx_num = r'(?:{num_sci}?{num_sci}[ij])'.format(num_sci=num_sci)
    cx_match_wrapped= r"^(?:{cx_num}|\({cx_num}\))$".format(cx_num=cx_num)
    return cx_match_wrapped
_tuple_re = r"^(?:\((?:.|\n|\r)*,(?:.|\n|\r)*\){1}(?: |\n|\r)*$)"
_array_re = r"^(?:(np\.|)array\(\[(?:.|\n|\r)*,(?:.|\n|\r)*\]\){1}(?: |\n|\r)*$)"
_complex_re= _complex_re_gen()

# def _tuple_constructor(self, node):
#    return eval(self.construct_scalar(node))

def _tuple_constructor_safe(self,node):
    value = node.value
    value = re.sub(r"^\(","[",value)
    value = re.sub(r"\)$","]",value)
    safe_l = yaml_load(value,yaml=None)
    return tuple(safe_l)

def _complex_constructor(self,node):
    return complex(node.value)

def _array_constructor_safe(self,node):
    value = node.value
    value = re.sub(r"^(?:np\.|)array\(","",value)
    value = re.sub(r"\)$","",value)
    safe_l = yaml_load(value,yaml=None)
    return np.array(safe_l)

def _tuple_representer(dumper, data):
    repr = str(data)
    return dumper.represent_tagged_scalar(TaggedScalar(repr, style=None, tag='!tuple'))

def _complex_representer(dumper,data):
    repr = str(data)
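    # Strip the parentheses that str() puts around complex numbers;
    # the pattern "()" is an empty group and would remove nothing.
    repr = re.sub(r"[()]","",repr)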
    return dumper.represent_tagged_scalar(TaggedScalar(repr, style=None, tag='!complex'))

def _array_representer(dumper, data):
    repr = 'np.array(' + np.array2string(data, max_line_width=np.inf, precision=16, prefix='np.array(', separator=', ', suffix=')') + ')'
    repr = repr.replace(' ', '').replace(',', ', ')
    return dumper.represent_tagged_scalar(TaggedScalar(repr, style=None, tag='!nparray'))
    
def _complex_resolver(str_resolve,match_re = re.compile(r'[ij]')):
    '''
    For debugging. Sees if the built-in complex constructor allows it.
    '''
    if re.search(match_re,str_resolve):
        try:
            cplx = complex(str_resolve.replace('i','j'))
            return cplx
        except ValueError:
            pass
    return None

custom_types = {
    '!tuple':   {'re':_tuple_re,   'constructor': _tuple_constructor_safe,   'representer':_tuple_representer,   'type': tuple,      'first':list('(')             },
    '!nparray': {'re':_array_re,   'constructor': _array_constructor_safe,   'representer':_array_representer,   'type': np.ndarray, 'first':list('an')},
    '!complex': {'re':_complex_re,   'constructor': _complex_constructor,   'representer':_complex_representer,   'type': complex, 'first':None }
}


def yaml_add_custom_types(yaml,custom_types):
    for tag,ct in custom_types.items():
            yaml.Constructor.add_constructor(tag, ct['constructor'])
            yaml.Resolver.add_implicit_resolver(tag, ruamel.yaml.util.RegExp(ct['re']), ct['first'])
            yaml.Representer.add_representer(ct['type'], ct['representer'])
    
def setup_yaml(yaml,custom_types):
    yaml_add_custom_types(yaml,custom_types)

yaml = yaml_preset = ruamel.yaml.YAML()
def yaml_load(fin,yaml = yaml_preset, custom_setup=True):
    '''
    Assumes globally setup yaml is used. 
    If we want to start from scratch and configure new YAML instance,
        we set yaml=None. custom_setup 
        then will setup yaml to deal with custom types
    '''
    if yaml is None:
        yaml = ruamel.yaml.YAML()
    if custom_setup:
        setup_yaml(yaml,custom_types)
    return yaml.load(fin)

def yaml_dump(obj,fout, yaml= yaml_preset,custom_setup=True):
    '''
    Assumes globally setup yaml is used. 
    If we want to start from scratch and configure new YAML instance,
        we set yaml=None. custom_setup 
        then will setup yaml to deal with custom types
    
    '''
    if yaml is None:
        yaml = ruamel.yaml.YAML()
    if custom_setup:
        setup_yaml(yaml,custom_types)
    return yaml.dump(obj,fout)
    
setup_yaml(yaml,custom_types)


yaml_file = """
test_tuple: (1, 2, 3)
test_array: array([4.0,5+0j,6.0j])
"""

data = yaml_load(yaml_file)

yaml_dump(data, sys.stdout)
# test_tuple: (1, 2, 3)
# test_array: np.array([4.+0.j, 5.+0.j, 0.+6.j])
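
To see what the complex-number pattern accepts, the sketch below rebuilds the same regex as _complex_re_gen and compares it with Python's built-in complex(). The helper names complex_pattern, is_complex_scalar, and try_complex are assumptions made for this check only.

```python
import re


def complex_pattern():
    """Rebuild the regex from _complex_re_gen using f-strings."""
    num = r"(?:[+\-]?(?:\d*\.)?\d+)"
    num_sci = rf"(?:{num}(?:e[+\-]?\d+)?)"
    cx_num = rf"(?:{num_sci}?{num_sci}[ij])"
    return re.compile(rf"^(?:{cx_num}|\({cx_num}\))$")


PATTERN = complex_pattern()


def is_complex_scalar(text):
    """True if the implicit-resolver regex would claim this scalar."""
    return PATTERN.match(text) is not None


def try_complex(text):
    """Return complex(text), or None if Python rejects the string."""
    try:
        return complex(text)
    except ValueError:
        return None


for scalar in ["5+0j", "6.0j", "(1+2j)", "4.0", "3 + 2j", "2i"]:
    print(repr(scalar), is_complex_scalar(scalar), try_complex(scalar))
```

Two things stand out under these assumptions: the regex does not allow spaces around the sign, so a scalar written as 3 + 2j is not matched, and it accepts an i suffix that complex() rejects, so a constructor calling complex(node.value) directly would raise ValueError on input like 2i unless the i is replaced by j first.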

To highlight the new approach more clearly, I will explain one of the constructors in more detail:

def _tuple_constructor_safe(self,node):
    value = node.value
    value = re.sub(r"^\(","[",value)
    value = re.sub(r"\)$","]",value)
    safe_l = yaml_load(value,yaml=None)
    return tuple(safe_l)

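We modify the value of the parsed string so that it is wrapped in square brackets instead of parentheses, which makes it a YAML list. We then load that YAML list with yaml. One key point is that we have to create a new YAML object, because the YAML object that is currently loading is in an invalid state and cannot load another object.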
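
The string rewriting step can be tried in isolation, without ruamel.yaml. The sketch below applies the same substitutions as the two safe constructors and returns the text that would be handed to the nested YAML load; the function names are hypothetical.

```python
import re


def tuple_scalar_to_flow_sequence(text):
    """Swap the outer parentheses for square brackets: '(1, 2)' -> '[1, 2]'."""
    text = re.sub(r"^\(", "[", text)
    return re.sub(r"\)$", "]", text)


def array_scalar_to_flow_sequence(text):
    """Drop the 'np.array(' or 'array(' wrapper and keep the inner list."""
    text = re.sub(r"^(?:np\.|)array\(", "", text)
    return re.sub(r"\)$", "", text)


print(tuple_scalar_to_flow_sequence("(1, 2, 3)"))
print(array_scalar_to_flow_sequence("np.array([4, 5, 6])"))
print(tuple_scalar_to_flow_sequence("((1, 2), 3)"))
```

Only the outermost delimiters are rewritten, so a nested tuple keeps its inner parentheses. Inside a YAML flow sequence the comma within those parentheses would most likely be read as an item separator, which is one of the cases where this approach can fail.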
