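In many cases, I can't see the label at the start of a span in displaCy. It doesn't seem to be a problem with one particular label, because I sometimes do see that same label when I use a different text example. Is there anything I should watch out for to avoid this?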
colors = {}  # omitted in this example
options = {"spans_key": "sentences", "colors": colors}
displacy.serve(doc, style="span", options=options)
Thanks!
ffvjumwh1#
Hey 高勋, thanks for sharing your observation! Could I ask you for some examples where the labels are rendered and where they are not?
vd2z7a6w2#
Hi @kadarakos, thanks for your reply. Here are some examples of missing labels:
lokaqttq3#
Thanks for the examples! Could I also ask you for a code example, so that it is easier for me to test?
dy2hfwbg4#
Apologies for the late reply. This is the code I use to create the patterns:
import spacy
from spacy.tokens import SpanGroup, Span
from spacy import displacy
from spacy.tokens import DocBin
import re
nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
nlp.add_pipe("sentencizer")
patterns = [{"label": "Governing Law", "pattern": "the laws of the"},
{"label": "Governing Law", "pattern": "shall be governed by"},
{"label": "Governing Law", "pattern": "governed in accordance with"},
{"label": "Governing Law", "pattern": "governed under"},
{"label": "Governing Law", "pattern": "in accordance with the laws of"},
{"label": "Assignment", "pattern": "may be assigned"},
{"label": "Assignment", "pattern": "may not be assigned"},
{"label": "Assignment", "pattern": "shall not assign"},
{"label": "Assignment", "pattern": "shall not be assigned"},
{"label": "Assignment", "pattern": "shall not be assignable"},
{"label": "Assignment", "pattern": "the right to assign"},
{"label": "Assignment", "pattern": "no assignment"},
{"label": "Pricing", "pattern": "calculated as follows"},
{"label": "Pricing", "pattern": "the price shall be"},
{"label": "Pricing", "pattern": "shall pay"},
{"label": "Pricing", "pattern": "undertakes to pay"},
{"label": "Notices", "pattern": "Notices under this"},
{"label": "Notices", "pattern": "any notice required"},
{"label": "Notices", "pattern": "any notice served"},
{"label": "Notices", "pattern": "any notice given"},
{"label": "Notices", "pattern": "all notices provided"},
{"label": "Notices", "pattern": "every notice"},
{"label": "Term", "pattern": "term of this"},
{"label": "Term", "pattern": "shall commence on the Effective Date"},
{"label": "Term", "pattern": "come into force on the date"},
{"label": "Term", "pattern": "effective until terminated"},
{"label": "Term", "pattern": "this agreement commences"},
{"label": "License Grant", "pattern": "grant to each other a limited license"},
{"label": "License Grant", "pattern": "Licensor hereby grants"},
{"label": "License Grant", "pattern": "irrevocable, worldwide"},
{"label": "License Grant", "pattern": "fully paid, limited, non exclusive"},
{"label": "Termination/Convenience", "pattern": "may terminate this Agreement"},
{"label": "Termination/Convenience", "pattern": "may terminate this Agreement at any time"},
{"label": "Termination/Convenience", "pattern": "to terminate this Agreement"},
{"label": "Termination/Convenience", "pattern": "right to terminate this Agreement immediately upon written notice"},
{"label": "Termination/Convenience", "pattern": "may terminate this Agreement for no reason or for any reason "},
{"label": "Termination/Convenience", "pattern": "may terminate this Agreement for any reason"},
{"label": "Termination/Convenience", "pattern": "shall have the right to terminate"},
{"label": "Non-solicit", "pattern": "employee of the other"},
{"label": "Non-solicit", "pattern": "any employee of the"},
{"label": "Insurance", "pattern": "any employee of the"},
{"label": "Insurance", "pattern": "as an additional insured"},
{"label": "Covenant Not to Sue", "pattern": "shall not now or in the future contest the validity of"},
{"label": "Covenant Not to Sue", "pattern": "contest the validity of"},
{"label": "IP Assignment", "pattern": "right, title and interest in and to"},
{"label": "Warranty", "pattern": "represents and warrants that"},
{"label": "Warranty", "pattern": "be free from defects"}
]
ruler.add_patterns(patterns)
for i in range(1, 400):
    # Read one input contract; the context manager closes the file for us.
    with open(f"inputfiles/ ({i}).txt", "r", encoding="utf8") as text_open:
        text = text_open.read()
    doc = nlp(text)
    doc.spans["sentences"] = SpanGroup(doc)
    db = DocBin()
    for sentence in doc.sents:
        for span in doc.spans["ruler"]:
            # If a ruler match lies inside this sentence, label the whole sentence.
            if span.start >= sentence.start and span.end <= sentence.end:
                doc.spans["sentences"] += [
                    Span(doc, start=sentence.start, end=sentence.end, label=span.label_)
                ]
                doc.set_ents(entities=[span], default="unmodified")
    db.add(doc)
    db.to_disk(f"./train/{i}.spacy")
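One thing that may be worth checking while debugging: the loop above adds one sentence-level span per ruler match, so a sentence that contains two matches with the same label (for example "may terminate this Agreement" and "may terminate this Agreement at any time", if the span ruler keeps both) ends up with two identical spans. This is not a confirmed cause of the missing labels, only a way to simplify the input before rendering. The sketch below uses a hypothetical helper, `unique_spans`, and a stand-in `FakeSpan` tuple instead of real spaCy objects so that it runs without spaCy installed.

```python
from collections import namedtuple

# Stand-in for spacy.tokens.Span in this sketch; only the three attributes
# read by unique_spans are modelled. Both names are hypothetical.
FakeSpan = namedtuple("FakeSpan", ["start", "end", "label_"])


def unique_spans(spans):
    """Drop spans whose (start, end, label_) triple was already seen, keeping order."""
    seen = set()
    result = []
    for span in spans:
        key = (span.start, span.end, span.label_)
        if key not in seen:
            seen.add(key)
            result.append(span)
    return result


# Two matches with the same label inside one sentence (tokens 0 to 12) give
# two identical sentence-level spans; a third match carries another label.
sentence_spans = [
    FakeSpan(0, 12, "Termination/Convenience"),
    FakeSpan(0, 12, "Termination/Convenience"),
    FakeSpan(0, 12, "Pricing"),
]
deduped = unique_spans(sentence_spans)
print([s.label_ for s in deduped])
```

Because the helper only reads `start`, `end` and `label_`, it should also accept real spans, e.g. `doc.spans["sentences"] = unique_spans(doc.spans["sentences"])` before calling displaCy.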
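You can test the model in the following Flask app (for example, with a random contract from the CUAD dataset). It is just a displaCy rendering, so there is nothing special about it.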
z9ju0rcb5#
Could you please provide an example Doc (as a DocBin, if possible) for which the visualization is wrong?
azpvetkf6#
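I can't upload a DocBin here, so I hope this helps. Run the code below with example.txt in the root directory. I also noticed that the labels only go missing when the whole text is visualized. If I copy a portion of it and visualize that, it displays correctly (as shown in the picture at the bottom).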
s6fujrry7#
Thanks for all the information! We will look into this in more detail and let you know what we find.
7rtdyuoh8#
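I tried to reproduce the issue by running the code you provided in a Jupyter Notebook. Below are screenshots of what was rendered in my notebook.
In the first example, the "Termination/Convenience" label is rendered in my notebook but is missing from the screenshot you provided:
Likewise, the "Pricing" label is rendered correctly in my notebook but is missing from the screenshot you provided:
Unfortunately, I was unable to reproduce the problem you are seeing.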