How to parse a Lisp-readable property list file in Python

Posted by o3imoua4 on 2021-08-20 in Java

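I am building an NLP application in Python and am trying to parse an English verb lexicon so that I can combine it with my NLTK scripts. The lexicon is a Lisp-readable property list file, but I need a simpler format, such as a JSON file or a pandas DataFrame.
An example from the lexicon database: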

    ;; Grid: 51.2#1#_th,src#abandon#abandon#abandon#abandon+ingly#(1.5,01269572,01188040,01269413,00345378)(1.6,01524319,01421290,01524047,00415625)###AD
    (
    :DEF_WORD "abandon"
    :CLASS "51.2"
    :WN_SENSE (("1.5" 01269572 01188040 01269413 00345378)
    ("1.6" 01524319 01421290 01524047 00415625))
    :PROPBANK ("arg1 arg2")
    :THETA_ROLES ((1 "_th,src"))
    :LCS (go loc (* thing 2)
    (away_from loc (thing 2) (at loc (thing 2) (* thing 4)))
    (abandon+ingly 26))
    :VAR_SPEC ((4 :optional) (2 (animate +)))
    )
    ;; Grid: 45.4.a#1#_ag_th,instr(with)#abase#abase#abase#abase+ed#(1.5,01024949)(1.6,01228249)###AD
    (
    :DEF_WORD "abase"
    :CLASS "45.4.a"
    :WN_SENSE (("1.5" 01024949)
    ("1.6" 01228249))
    :PROPBANK ("arg0 arg1 arg2(with)")
    :THETA_ROLES ((1 "_ag_th,instr(with)"))
    :LCS (cause (* thing 1)
    (go ident (* thing 2)
    (toward ident (thing 2) (at ident (thing 2) (abase+ed 9))))
    ((* with 19) instr (*head*) (thing 20)))
    :VAR_SPEC ((1 (animate +)))
    )

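The complete data is available at https://raw.githubusercontent.com/ihmc/lcs/master/verbs-english.lcs
I tried the ideas from the post "Parsing a Lisp file with Python", using something like the following, but the result is in a different format from the one I am looking for: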
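The `;; Grid: ...` lines in the file are Lisp comments rather than property lists, and they contain parentheses of their own, so it is probably simplest to drop them before handing the text to an S-expression parser. Below is a minimal sketch of that step. The function name `strip_lisp_comments` is made up for illustration, and it assumes that comments always occupy a whole line, as they do in the sample entries.

```python
def strip_lisp_comments(text):
    """Drop whole-line comments, i.e. lines whose first non-blank character is ';'.

    Assumption: no comment ever follows data on the same line.
    """
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith(";")]
    return "\n".join(kept)


# A shortened entry copied from the sample, including its comment line
sample = ''';; Grid: 45.4.a#1#_ag_th,instr(with)#abase#abase#abase#abase+ed#(1.5,01024949)(1.6,01228249)###AD
(
:DEF_WORD "abase"
:CLASS "45.4.a"
)'''

cleaned = strip_lisp_comments(sample)
print(cleaned)
```

For the full lexicon, the same function could be applied to the text of a locally saved copy of the file before parsing.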

    from pyparsing import OneOrMore, nestedExpr

    inputdata = '''
    (
    :DEF_WORD "abandon"
    :CLASS "51.2"
    :WN_SENSE (("1.5" 01269572 01188040 01269413 00345378)
    ("1.6" 01524319 01421290 01524047 00415625))
    :PROPBANK ("arg1 arg2")
    :THETA_ROLES ((1 "_th,src"))
    :LCS (go loc (* thing 2)
    (away_from loc (thing 2) (at loc (thing 2) (* thing 4)))
    (abandon+ingly 26))
    :VAR_SPEC ((4 :optional) (2 (animate +)))
    )
    (
    :DEF_WORD "abase"
    :CLASS "45.4.a"
    :WN_SENSE (("1.5" 01024949)
    ("1.6" 01228249))
    :PROPBANK ("arg0 arg1 arg2(with)")
    :THETA_ROLES ((1 "_ag_th,instr(with)"))
    :LCS (cause (* thing 1)
    (go ident (* thing 2)
    (toward ident (thing 2) (at ident (thing 2) (abase+ed 9))))
    ((* with 19) instr (*head*) (thing 20)))
    :VAR_SPEC ((1 (animate +)))
    )'''

    data = OneOrMore(nestedExpr()).parseString(inputdata)
    print(data)

I get the following output:

    [
      [':DEF_WORD', '"abandon"',
       ':CLASS', '"51.2"',
       ':WN_SENSE', [
         ['"1.5"', '01269572', '01188040', '01269413', '00345378'],
         ['"1.6"', '01524319', '01421290', '01524047', '00415625']
       ],
       ':PROPBANK', ['"arg1 arg2"'],
       ':THETA_ROLES', [['1', '"_th,src"']],
       ':LCS', ['go', 'loc', ['*', 'thing', '2'],
         ['away_from', 'loc', ['thing', '2'],
           ['at', 'loc', ['thing', '2'], ['*', 'thing', '4']]], ['abandon+ingly', '26']],
       ':VAR_SPEC', [['4', ':optional'], ['2', ['animate', '+']]]],
      [':DEF_WORD', '"abase"',
       ':CLASS', '"45.4.a"',
       ':WN_SENSE', [
         ['"1.5"', '01024949'],
         ['"1.6"', '01228249']
       ],
       ':PROPBANK', ['"arg0 arg1 arg2(with)"'],
       ':THETA_ROLES', [['1', '"_ag_th,instr(with)"']],
       ':LCS', ['cause', ['*', 'thing', '1'],
         ['go', 'ident', ['*', 'thing', '2'],
           ['toward', 'ident', ['thing', '2'],
             ['at', 'ident', ['thing', '2'],
               ['abase+ed', '9']]]],
         [['*', 'with', '19'], 'instr', ['*head*'], ['thing', '20']]],
       ':VAR_SPEC', [['1', ['animate', '+']]]
      ]
    ]

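I am not sure how to work with this output format to get, for example, the THETA_ROLES values or other verb features from this lexicon. I use pandas and NLTK and hold all my sentences in an array, so my idea is to find the sentences that contain a verb with a specific THETA_ROLES value or some other feature listed in this lexicon.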
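One possible way to handle that output is to treat each entry as a flat sequence of alternating `:KEY`, value items and pair them up into a dictionary, which can then be serialized to JSON or searched directly. The sketch below is only an illustration under stated assumptions: the helper names (`clean`, `entry_to_dict`, `verbs_with_role`) are invented here, the input is a shortened copy of the two parsed entries written as plain nested lists, every `THETA_ROLES` item is assumed to be a `[number, grid]` pair as in the samples, and a role is matched by simple substring search on the grid string.

```python
import json


def clean(value):
    """Recursively strip the double quotes the parser leaves around string tokens."""
    if isinstance(value, list):
        return [clean(item) for item in value]
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def entry_to_dict(entry):
    """Pair up the alternating ':KEY', value items of one parsed entry."""
    return {key.lstrip(":"): clean(value)
            for key, value in zip(entry[0::2], entry[1::2])}


def verbs_with_role(records, role):
    """Verbs with at least one THETA_ROLES grid containing `role` as a substring.

    Assumption: each THETA_ROLES item is a [number, grid] pair.
    """
    return sorted({rec["DEF_WORD"] for rec in records
                   if any(role in grid
                          for _num, grid in rec.get("THETA_ROLES", []))})


# Shortened copy of the two parsed entries, as plain nested lists
parsed = [
    [':DEF_WORD', '"abandon"', ':CLASS', '"51.2"',
     ':PROPBANK', ['"arg1 arg2"'],
     ':THETA_ROLES', [['1', '"_th,src"']]],
    [':DEF_WORD', '"abase"', ':CLASS', '"45.4.a"',
     ':PROPBANK', ['"arg0 arg1 arg2(with)"'],
     ':THETA_ROLES', [['1', '"_ag_th,instr(with)"']]],
]

records = [entry_to_dict(entry) for entry in parsed]
print(json.dumps(records, indent=2))      # JSON-ready list of dicts
print(verbs_with_role(records, "instr"))  # ['abase']

# Hypothetical tokenized sentences; real text would need lemmatizing first
sentences = [["they", "abandon", "the", "ship"], ["we", "abase", "ourselves"]]
wanted = set(verbs_with_role(records, "src"))
matches = [s for s in sentences if wanted.intersection(s)]
print(matches)  # [['they', 'abandon', 'the', 'ship']]
```

With pyparsing, `data.asList()` should give the plain nested lists this sketch expects, and `pandas.DataFrame(records)` would turn the dictionaries into one row per entry if a data frame is preferred over JSON. Nested properties such as LCS stay as nested lists either way.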
