spaCy displaCy visualizer only sometimes shows the label

evrscar2 posted 9 months ago in Other

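In many cases I can't see the label at the start of a span in displaCy. It shouldn't be a problem with one particular label, because with other text examples I do sometimes see that same label.
Is there anything I should watch out for to avoid this?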

    colors = {}  # omitted in this example
    options = {"spans_key": "sentences", "colors": colors}
    displacy.serve(doc, style="span", options=options)

Thanks!
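
For readers who want to reproduce the setup, here is a minimal sketch of rendering a span group to an HTML string instead of serving it. It assumes a spaCy version that supports the `span` style; the helper names `build_span_options`, `render_spans_html` and `demo`, the sample sentence, and the colour value are illustrative and do not come from the original post.

```python
from typing import Dict, Optional


def build_span_options(spans_key: str, colors: Optional[Dict[str, str]] = None) -> dict:
    """Build the options dict for displaCy's "span" style (hypothetical helper)."""
    options = {"spans_key": spans_key}
    if colors:  # leave "colors" out so displaCy falls back to its defaults
        options["colors"] = dict(colors)
    return options


def render_spans_html(doc, options: dict) -> str:
    """Return the span visualisation as an HTML string instead of serving it."""
    from spacy import displacy  # imported lazily; requires spaCy to be installed

    # jupyter=False makes displaCy return the markup rather than display it
    return displacy.render(doc, style="span", options=options, jupyter=False)


def demo() -> str:
    """Illustrative only: one sentence-wide span in the "sentences" group."""
    import spacy
    from spacy.tokens import Span

    nlp = spacy.blank("en")
    doc = nlp("The Buyer shall pay the price within thirty days.")
    doc.spans["sentences"] = [Span(doc, 0, len(doc), label="Pricing")]
    options = build_span_options("sentences", {"Pricing": "#7aecec"})
    return render_spans_html(doc, options)
```

Calling `demo()` returns the markup, which can be written to an `.html` file and opened in a browser. Comparing such files may be easier than comparing screenshots of `displacy.serve` across environments.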

ffvjumwh1#

Hey 高勋,
Thanks for sharing your observation! Could you provide some examples of labels that render and labels that don't?

vd2z7a6w2#

Hi @kadarakos, thanks for your reply. Here are some examples of missing labels:

lokaqttq3#

Thanks for the examples! Could you also provide a code example, so that it's easier for me to test?

dy2hfwbg4#

Apologies for the late reply.
The code I used to create the patterns:

    import spacy
    from spacy import displacy
    from spacy.tokens import DocBin, Span, SpanGroup

    nlp = spacy.blank("en")
    ruler = nlp.add_pipe("span_ruler")  # matches are stored in doc.spans["ruler"]
    nlp.add_pipe("sentencizer")

    patterns = [
        {"label": "Governing Law", "pattern": "the laws of the"},
        {"label": "Governing Law", "pattern": "shall be governed by"},
        {"label": "Governing Law", "pattern": "governed in accordance with"},
        {"label": "Governing Law", "pattern": "governed under"},
        {"label": "Governing Law", "pattern": "in accordance with the laws of"},
        {"label": "Assignment", "pattern": "may be assigned"},
        {"label": "Assignment", "pattern": "may not be assigned"},
        {"label": "Assignment", "pattern": "shall not assign"},
        {"label": "Assignment", "pattern": "shall not be assigned"},
        {"label": "Assignment", "pattern": "shall not be assignable"},
        {"label": "Assignment", "pattern": "the right to assign"},
        {"label": "Assignment", "pattern": "no assignment"},
        {"label": "Pricing", "pattern": "calculated as follows"},
        {"label": "Pricing", "pattern": "the price shall be"},
        {"label": "Pricing", "pattern": "shall pay"},
        {"label": "Pricing", "pattern": "undertakes to pay"},
        {"label": "Notices", "pattern": "Notices under this"},
        {"label": "Notices", "pattern": "any notice required"},
        {"label": "Notices", "pattern": "any notice served"},
        {"label": "Notices", "pattern": "any notice given"},
        {"label": "Notices", "pattern": "all notices provided"},
        {"label": "Notices", "pattern": "every notice"},
        {"label": "Term", "pattern": "term of this"},
        {"label": "Term", "pattern": "shall commence on the Effective Date"},
        {"label": "Term", "pattern": "come into force on the date"},
        {"label": "Term", "pattern": "effective until terminated"},
        {"label": "Term", "pattern": "this agreement commences"},
        {"label": "License Grant", "pattern": "grant to each other a limited license"},
        {"label": "License Grant", "pattern": "Licensor hereby grants"},
        {"label": "License Grant", "pattern": "irrevocable, worldwide"},
        {"label": "License Grant", "pattern": "fully paid, limited, non exclusive"},
        {"label": "Termination/Convenience", "pattern": "may terminate this Agreement"},
        {"label": "Termination/Convenience", "pattern": "may terminate this Agreement at any time"},
        {"label": "Termination/Convenience", "pattern": "to terminate this Agreement"},
        {"label": "Termination/Convenience", "pattern": "right to terminate this Agreement immediately upon written notice"},
        {"label": "Termination/Convenience", "pattern": "may terminate this Agreement for no reason or for any reason "},
        {"label": "Termination/Convenience", "pattern": "may terminate this Agreement for any reason"},
        {"label": "Termination/Convenience", "pattern": "shall have the right to terminate"},
        {"label": "Non-solicit", "pattern": "employee of the other"},
        {"label": "Non-solicit", "pattern": "any employee of the"},
        {"label": "Insurance", "pattern": "any employee of the"},
        {"label": "Insurance", "pattern": "as an additional insured"},
        {"label": "Covenant Not to Sue", "pattern": "shall not now or in the future contest the validity of"},
        {"label": "Covenant Not to Sue", "pattern": "contest the validity of"},
        {"label": "IP Assignment", "pattern": "right, title and interest in and to"},
        {"label": "Warranty", "pattern": "represents and warrants that"},
        {"label": "Warranty", "pattern": "be free from defects"},
    ]
    ruler.add_patterns(patterns)

    for i in range(1, 400):
        # The context manager closes the file even if reading fails
        with open(f"inputfiles/ ({i}).txt", "r", encoding="utf8") as text_file:
            text = text_file.read()
        doc = nlp(text)
        doc.spans["sentences"] = SpanGroup(doc)
        db = DocBin()
        # Label each sentence with the label of every pattern matched inside it
        for sentence in doc.sents:
            for span in doc.spans["ruler"]:
                if span.start >= sentence.start and span.end <= sentence.end:
                    doc.spans["sentences"].append(
                        Span(doc, start=sentence.start, end=sentence.end, label=span.label_)
                    )
                    doc.set_ents(entities=[span], default="unmodified")
        db.add(doc)
        db.to_disk(f"./train/{i}.spacy")

You can test the model in the following Flask app (for example, with a random contract from the CUAD dataset). It is just a displaCy rendering, so nothing special.
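
The nested loop in the script above promotes every ruler match to a span covering the whole sentence it falls in. A simplified, spaCy-free sketch of that step, using plain token offsets, shows one consequence: a sentence hit by several patterns ends up with several spans that share the same start and end. The function name `promote_to_sentences` and the sample offsets are hypothetical, and the thread does not establish whether this stacking is related to the missing labels.

```python
from typing import List, Tuple

Sentence = Tuple[int, int]      # (start_token, end_token), end exclusive
Match = Tuple[int, int, str]    # (start_token, end_token, label)


def promote_to_sentences(sentences: List[Sentence], matches: List[Match]) -> List[Match]:
    """Emit one sentence-wide labelled span per match contained in a sentence."""
    promoted = []
    for sent_start, sent_end in sentences:
        for start, end, label in matches:
            # Same containment test as in the script: the match must lie
            # entirely inside the sentence boundaries.
            if start >= sent_start and end <= sent_end:
                promoted.append((sent_start, sent_end, label))
    return promoted


# Made-up offsets: two sentences, three matches.
sentences = [(0, 12), (12, 30)]
matches = [
    (3, 7, "Termination/Convenience"),   # e.g. "may terminate this Agreement"
    (3, 10, "Termination/Convenience"),  # the longer, overlapping pattern
    (15, 17, "Pricing"),                 # e.g. "shall pay"
]
result = promote_to_sentences(sentences, matches)
```

Here `result` contains the first sentence twice with the same label, because both the short and the long termination pattern matched inside it. Deduplicating on `(start, end, label)` before rendering would be one way to avoid stacking identical spans.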


z9ju0rcb5#

Could you provide an example Doc (as a DocBin, if possible) where the visualization is wrong?
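
For reference, a Doc can be packed into a DocBin and shared as a single byte string or file. The sketch below assumes spaCy v3 is installed; the function names `pack_docs` and `unpack_docs` and the `store_user_data=True` choice are illustrative, not something requested in the thread.

```python
from typing import Iterable, List


def pack_docs(docs: Iterable) -> bytes:
    """Serialize Doc objects, including their span groups, to bytes."""
    from spacy.tokens import DocBin  # imported lazily; requires spaCy

    doc_bin = DocBin(store_user_data=True)
    for doc in docs:
        doc_bin.add(doc)
    return doc_bin.to_bytes()


def unpack_docs(data: bytes, vocab) -> List:
    """Restore the Doc objects; `vocab` is typically `nlp.vocab`."""
    from spacy.tokens import DocBin

    return list(DocBin().from_bytes(data).get_docs(vocab))
```

`DocBin.to_disk` and `DocBin.from_disk` do the same for files with the `.spacy` extension, which is what the script earlier in the thread writes to `./train/`.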

azpvetkf6#

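I can't upload a DocBin here, so I hope this helps. Run the code below with example.txt in the root directory. I also noticed that the labels only go missing when I visualize the whole text. If I copy a portion of it and visualize that, it displays correctly (as shown in the picture at the bottom).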

s6fujrry7#

Thanks for all the information! We'll look into this in more detail and let you know what we find.

7rtdyuoh8#

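I tried to reproduce the issue by running the code you provided in a Jupyter Notebook. Below are screenshots of what was rendered in my notebook.

In the first example, the "Termination/Convenience" label is rendered in my notebook but is missing from the screenshot you provided:

Likewise, the "Pricing" label is rendered correctly in my notebook but is missing from the screenshot you provided:

Unfortunately, I was unable to reproduce the issue you are seeing.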
