python-3.x - spaCy custom component function is never called

Posted by kg7wmglp on 2023-10-21 in Python


I am adding a custom component to spaCy, but it is never called:

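import spacy
from spacy.language import Language

@Language.component("custom_sentence_boundaries")
def custom_sentence_boundaries(doc):
    print(".")  # debug output to check whether the component runs
    # Mark the token after each newline token as the start of a sentence
    for token in doc[:-1]:
        if token.text == "\n":
            doc[token.i + 1].is_sent_start = True
    return doc

nlp = spacy.load("de_core_web_sm")
nlp.add_pipe("custom_sentence_boundaries", after="parser")
nlp.analyze_pipes(pretty=True)
doc = nlp(text)  # text: the input string (not shown in the question)
sentences = [sent.text for sent in doc.sents]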

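I do get a result in the sentences list, and analyze_pipes does list my component, but the component seems to have no effect: I never see the printed dot appear.
Any ideas?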
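To check the newline rule on its own, without loading any model, the component's loop can be mirrored in plain Python on a list of token strings. This is a minimal sketch under stated assumptions: newline_sentence_starts and split_sentences are hypothetical helpers, not part of spaCy's API, the token list is written by hand rather than produced by spaCy's tokenizer, and the sketch ignores any sentence boundaries the parser itself would set.

```python
def newline_sentence_starts(tokens):
    """Return the indices the component would mark as sentence starts:
    every token that directly follows a "\n" token (hypothetical helper)."""
    starts = []
    # Mirror `for token in doc[:-1]`: skip the last token so i + 1 stays in range.
    for i, text in enumerate(tokens[:-1]):
        if text == "\n":
            starts.append(i + 1)
    return starts


def split_sentences(tokens, starts):
    """Group tokens into sentences, opening a new sentence at each start index.
    The first token always opens a sentence (hypothetical helper)."""
    sentences, current = [], []
    start_set = set(starts)
    for i, text in enumerate(tokens):
        if i in start_set and current:
            sentences.append(current)
            current = []
        current.append(text)
    if current:
        sentences.append(current)
    return sentences


# Hand-written token list standing in for a parsed Doc.
tokens = ["Erste", "Zeile", "\n", "Zweite", "Zeile"]
starts = newline_sentence_starts(tokens)
print(starts)                           # [3]
print(split_sentences(tokens, starts))  # [['Erste', 'Zeile', '\n'], ['Zweite', 'Zeile']]
```

If the sketch marks the expected indices but the real pipeline still prints nothing, the problem lies in how the component is registered or invoked rather than in the loop itself.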

python-3.x

Source: https://stackoverflow.com/questions/77311518/spacy-custom-component-function-is-never-called

1 answer


okxuctiv1#

In the code you pasted, you are doing:

nlp = spacy.load("de_core_web_sm")

However, it should be:

nlp = spacy.load("en_core_web_sm")

I tried to reproduce your code, and this is what I got:

import spacy
from spacy.language import Language

@Language.component("custom_sentence_boundaries")
def custom_sentence_boundaries(doc):
    print("...$...")  # printing "...$..." so that it is easy to spot in the output
    # Mark the token after each newline token as the start of a sentence
    for token in doc[:-1]:
        if token.text == "\n":
            doc[token.i + 1].is_sent_start = True
    return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("custom_sentence_boundaries", after="parser")
nlp.analyze_pipes(pretty=True)
text = ("When Sebastian Thrun started working on self-driving cars at "
        "Google in 2007, few people outside of the company took him "
        "seriously. “I can tell you very senior CEOs of major American "
        "car companies would shake my hand and turn away because I wasn’t "
        "worth talking to,” said Thrun, in an interview with Recode earlier "
        "this week.")
doc = nlp(text)
sentences = [sent.text for sent in doc.sents]

Output

(Note that "...$..." is printed at the bottom, and that custom_sentence_boundaries is listed right after parser because we passed the keyword argument after="parser".)

============================= Pipeline Overview =============================

#   Component                    Assigns               Requires   Scores             Retokenizes
-   --------------------------   -------------------   --------   ----------------   -----------
0   tok2vec                      doc.tensor                                          False      
                                                                                                
1   tagger                       token.tag                        tag_acc            False      
                                                                                                
2   parser                       token.dep                        dep_uas            False      
                                 token.head                       dep_las                       
                                 token.is_sent_start              dep_las_per_type              
                                 doc.sents                        sents_p                       
                                                                  sents_r                       
                                                                  sents_f                       
                                                                                                
3   custom_sentence_boundaries                                                       False      
                                                                                                
4   attribute_ruler                                                                  False      
                                                                                                
5   lemmatizer                   token.lemma                      lemma_acc          False      
                                                                                                
6   ner                          doc.ents                         ents_f             False      
                                 token.ent_iob                    ents_p                        
                                 token.ent_type                   ents_r                        
                                                                  ents_per_type                 

✔ No problems found.
...$...
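The position of custom_sentence_boundaries in the table comes from the after="parser" argument. As a rough illustration of that ordering effect only, here is a plain-Python sketch; insert_after is a hypothetical helper that mimics where add_pipe(..., after=...) places a component, not spaCy code, and the base list is simply the component order from the table above with the custom component left out.

```python
def insert_after(pipe_names, new_name, after):
    """Return a new list with new_name placed directly after `after`.
    Hypothetical helper: it only mimics the ordering effect of
    nlp.add_pipe(..., after=...) and does not touch spaCy."""
    if after not in pipe_names:
        raise ValueError(f"{after!r} is not in the pipeline")
    idx = pipe_names.index(after) + 1
    return pipe_names[:idx] + [new_name] + pipe_names[idx:]


# Component order from the table above, without the custom component.
base = ["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer", "ner"]
pipeline = insert_after(base, "custom_sentence_boundaries", after="parser")
for position, name in enumerate(pipeline):
    print(position, name)
```

In this sketch, as in the table, custom_sentence_boundaries lands at position 3, directly after parser.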

Answered on 2023-10-21
