jq: splitting multiple text files into multiple arrays

Posted by r6l8ljro on 2023-08-08 in Other

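I have some files whose contents consist of multiple key-value pairs, and I want to convert them into multiple arrays.
Let me illustrate what I mean with some made-up examples. First, the contents of the files: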

    # cat content/1.yaml
    time: "2020-09-14T22:33:40Z"
    id: ed1d4321
    name: One
    description: 'Here is number "one"
     this is good'
    # cat content/2yaml
    time: "2021-09-14T22:33:40Z"
    id: eg134841
    name: Two
    description: 'Here is number "two"
     best of all'
    newkey: value

In the next step, I merge these files into a single blob that also includes the file names, which I want to keep:

    # for file in $(ls content/*yaml); do echo filename: $file; cat $file; done
    filename: content/1.yaml
    time: "2020-09-14T22:33:40Z"
    id: ed1d4321
    name: One
    description: 'Here is number "one"
     this is good'
    filename: content/2yaml
    time: "2021-09-14T22:33:40Z"
    id: eg134841
    name: Two
    description: 'Here is number "two"
     best of all'
    newkey: value


Now the problem starts: how do I assemble these into JSON arrays?
This is what I have come up with so far:

    # for file in $(ls content/*yaml); do echo filename: $file; cat $file; done | jq -Rn '[inputs|split(": ")] | map({(.[0]): .[1]})'
    [
      {
        "filename": "content/1.yaml"
      },
      {
        "time": "\"2020-09-14T22:33:40Z\""
      },
      {
        "id": "ed1d4321"
      },
      {
        "name": "One"
      },
      {
        "description": "'Here is number \"one\""
      },
      {
        " this is good'": null
      },
      {
        "filename": "content/2yaml"
      },
      {
        "time": "\"2021-09-14T22:33:40Z\""
      },
      {
        "id": "eg134841"
      },
      {
        "name": "Two"
      },
      {
        "description": "'Here is number \"two\""
      },
      {
        " best of all'": null
      },
      {
        "newkey": "value"
      }
    ]


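This is already close, but there are still a few problems I have not been able to solve:

1. The entries are not split into a separate array per file name.
2. The time field should not contain an escaped quoted string. I would like a solution that iterates over all fields and unwraps such quoted values, as in this example: "time": "2021-09-14T22:33:40Z"
3. The description value is spread over multiple lines. I would like the lines merged into a single value, which is not what currently happens. It should look like this: "description": "Here is number \"two\" best of all". The single quotes should not be kept.

So the final result should look like this:
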
    [
      {
        "filename": "content/1.yaml",
        "time": "2020-09-14T22:33:40Z",
        "id": "ed1d4321",
        "name": "One",
        "description": "Here is number \"one\" this is good"
      },
      {
        "filename": "content/2yaml",
        "time": "2021-09-14T22:33:40Z",
        "id": "eg134841",
        "name": "Two",
        "description": "Here is number \"two\" best of all",
        "newkey": "value"
      }
    ]

3ks5zfa0 #1

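The code below processes one file at a time and assumes that jq is invoked with the -R and -s command-line options (jq -Rs). Combining the results for multiple files is left as an exercise. (Hint: for the file name, use input_filename.)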

    # Turn one "key: value" line into a single-key object.
    def objectify:
      capture("(?<key>[^:]+): *(?<value>.*)")
      | .value = (.value | (fromjson? // .))   # unwrap JSON-style quoted values
      | [.]
      | from_entries;

    gsub("\n +"; " ")           # join dangling text (newline followed by at least one space)
    | . / "\n"                  # split
    | map(select(length>0))     # ignore ""
    | map(objectify)            # {key, value}
    | add
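
For comparison, the same line-oriented steps (join dangling lines, split on the first colon, try to decode the value as JSON) can be sketched in Python and extended to several files. This is a minimal sketch rather than part of the answer: `parse_blob` and the script name `parse_blobs.py` are hypothetical, and it assumes that continuation lines are indented. Like the jq filter, it leaves the single quotes around the description value in place.

```python
import json
import re
import sys


def parse_blob(text):
    """Parse 'key: value' lines into one dict, joining indented continuation lines."""
    # Join dangling text: a newline followed by whitespace becomes a single space.
    joined = re.sub(r"\n[ \t]+", " ", text)
    record = {}
    for line in joined.split("\n"):
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # empty line or no colon: not a key/value line
        value = value.strip()
        try:
            # Unwrap JSON-style values such as "2020-09-14T22:33:40Z".
            record[key.strip()] = json.loads(value)
        except ValueError:
            # Anything that is not valid JSON is kept as the raw string.
            record[key.strip()] = value
    return record


if __name__ == "__main__":
    # Hypothetical usage: python parse_blobs.py content/*yaml
    records = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            records.append({"filename": path, **parse_blob(handle.read())})
    print(json.dumps(records, indent=2))
```

Putting the file name first in each record plays the role that input_filename plays in the hint above.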

mfuanj7w #2

This is a partial solution: the values have not been "cleaned up" yet. That is left as an exercise for the reader :-)
Start with jq --slurp --raw-input:

    # split lines
    split("\n")
    # join lines starting with whitespace with previous line
    | reduce .[] as $l (
        null;
        if $l | startswith(" ") then .[-1] += $l else . += [$l] end
      )
    # split on first colon, returning an array of objects like {key: X, value: Y}
    | map(capture("^(?<key>[^:]+):\\s*(?<value>.*)$"))
    # combine these simple objects into bigger objects, but begin a new object when encountering "filename"
    | reduce .[] as $e (null;
        if $e.key == "filename" then . += [{}] else . end
        | .[-1][$e.key] = $e.value
      )

The output looks like this:

    [
      {
        "filename": "content/1.yaml",
        "time": "\"2020-09-14T22:33:40Z\"",
        "id": "ed1d4321",
        "name": "One",
        "description": "'Here is number \"one\" this is good'"
      },
      {
        "filename": "content/2yaml",
        "time": "\"2021-09-14T22:33:40Z\"",
        "id": "eg134841",
        "name": "Two",
        "description": "'Here is number \"two\" best of all'",
        "newkey": "value"
      }
    ]
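
The remaining cleanup step could look roughly like the following Python sketch, which strips one pair of matching surrounding quotes from each value. `clean_value` and `clean_records` are hypothetical helper names that are not part of the answer, and the sketch assumes simple quoting only: YAML escape sequences inside quoted strings are not handled.

```python
import json
import sys


def clean_value(value):
    """Strip one pair of matching surrounding quotes (single or double), if present."""
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value[1:-1]
    return value


def clean_records(records):
    """Apply clean_value to every value of every record (all values are strings here)."""
    return [{key: clean_value(val) for key, val in rec.items()} for rec in records]


if __name__ == "__main__" and not sys.stdin.isatty():
    # Reads the JSON array produced by the jq filter from standard input.
    raw = sys.stdin.read()
    if raw.strip():
        print(json.dumps(clean_records(json.loads(raw)), indent=2))
```

Applied to the output above, the first record's time becomes 2020-09-14T22:33:40Z and its description becomes Here is number "one" this is good, while unquoted values such as One are left unchanged.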

sbdsn5lh #3

This is probably a better fit for yq than trying to reimplement a YAML parser.
Something like this will work:

    yq eval-all -o=json '[{"filename": filename} + .]' *.yaml

This results in:

    [
      {
        "filename": "1.yaml",
        "time": "2020-09-14T22:33:40Z",
        "id": "ed1d4321",
        "name": "One",
        "description": "Here is number \"one\" this is good"
      },
      {
        "filename": "2.yaml",
        "time": "2021-09-14T22:33:40Z",
        "id": "eg134841",
        "name": "Two",
        "description": "Here is number \"two\" best of all",
        "newkey": "value"
      }
    ]

nwo49xxi #4

OK, I found another answer that uses Python instead of yq. Python is very likely to be installed on many machines already:

    # for file in $(ls content/*yaml); do (echo filename: $file; cat $file) | python -c 'import yaml; import json; import sys; print(json.dumps(yaml.safe_load(sys.stdin)));' ; done | jq -s
    [
      {
        "filename": "content/1.yaml",
        "time": "2020-09-14T22:33:40Z",
        "id": "ed1d4321",
        "name": "One",
        "description": "Here is number \"one\" this is good"
      },
      {
        "filename": "content/2yaml",
        "time": "2021-09-14T22:33:40Z",
        "id": "eg134841",
        "name": "Two",
        "description": "Here is number \"two\" best of all",
        "newkey": "value"
      }
    ]

au9on6nz #5

For the record, you could use gojq, the Go implementation of jq, since it supports YAML:

    gojq -n --yaml-input '[inputs | {filename: input_filename} + .]' *.yaml

Note that gojq sorts the object keys.
