I'm using Stanford NLP to extract date entities from text. Here's the code I tried:
import java.io.IOException;
import java.util.List;

import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;

public class StanfordNLP_POC
{
    public static void main(String[] args) throws IOException
    {
        String classifierPath = "src//main//resources//classifiers//english.muc.7class.distsim.crf.ser.gz";
        String inputString = "Appointment Facility: ABC Medicine Clinic 05/07/2020 Progress Notes: Niel Armstrong, DO Current Medications Reason for Appointment";

        // Load the standalone 7-class CRF model (statistical NER only).
        AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifierNoExceptions(classifierPath);
        List<List<CoreLabel>> out = classifier.classify(inputString);
        System.out.println(out.toString());

        // Print every token whose label is not "O" (outside any entity).
        for (List<CoreLabel> sentence : out)
        {
            for (CoreLabel word : sentence)
            {
                if (word.getString(CoreAnnotations.AnswerAnnotation.class).equals("O"))
                    continue;
                System.out.println(word.word() + " = " + word.get(CoreAnnotations.AnswerAnnotation.class));
            }
        }
    }
}
I don't understand why it isn't extracting the date (05/07/2020), even though it is clearly identifiable in the text.
When I try the full pipeline instead, it does extract the date, but it takes a bit longer to do so.
1 Answer
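The statistical model has no experience recognizing that particular date format. If you run CoreNLP with only the statistical model, you'll see that it fails to recognize the date there as well:

java edu.stanford.nlp.pipeline.StanfordCoreNLP -ner.statisticalOnly

The full CoreNLP pipeline additionally uses some hard-coded expressions that can recognize dates in this format, which is why the pipeline finds the date and the standalone CRF classifier does not.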
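To illustrate what a "hard-coded expression" for this date format might look like, here is a minimal plain-Java sketch that uses only java.util.regex. It is not CoreNLP's actual rule set. The class name DateRegexSketch, the method findDates, and the pattern itself are assumptions made for illustration, and the pattern only covers numeric MM/DD/YYYY dates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CoreNLP: finds numeric MM/DD/YYYY dates with a regex.
final class DateRegexSketch {

    // Assumed pattern: month 1-12, day 1-31 (leading zero optional), four-digit year.
    // It does not check calendar validity (for example, 02/31/2020 would still match).
    private static final Pattern NUMERIC_DATE =
            Pattern.compile("\\b(0?[1-9]|1[0-2])/(0?[1-9]|[12]\\d|3[01])/(\\d{4})\\b");

    // Returns every substring of the text that matches the pattern, in order of appearance.
    static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher matcher = NUMERIC_DATE.matcher(text);
        while (matcher.find()) {
            dates.add(matcher.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        String inputString = "Appointment Facility: ABC Medicine Clinic 05/07/2020 "
                + "Progress Notes: Niel Armstrong, DO Current Medications Reason for Appointment";
        for (String date : findDates(inputString)) {
            System.out.println(date + " = DATE");
        }
    }
}
```

If only this one date format matters and pipeline start-up time is a concern, a rule like this can be run alongside the standalone CRF classifier. For broader date coverage, the pipeline's built-in rules are likely the more robust choice.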