Java: Can someone explain how to create a PTB dataset and/or train my own model with StanfordNLP?

Posted by pn9klfpd on 2023-03-16 in Java

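I am learning sentiment analysis, but I can't find any overview online of how to create a PTB dataset. I am using StanfordNLP with Java. I downloaded the test, dev, and validation data they use, but I can't work out how the data is laid out:
test.txt: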

(3 (2 (2 The) (2 Rock)) (4 (3 (2 is) (4 (2 destined) (2 (2 (2 (2 (2 to) (2 (2 be) (2 (2 the) (2 (2 21st) (2 (2 (2 Century) (2 's)) (2 (3 new) (2 (2 ``) (2 Conan)))))))) (2 '')) (2 and)) (3 (2 that) (3 (2 he) (3 (2 's) (3 (2 going) (3 (2 to) (4 (3 (2 make) (3 (3 (2 a) (3 splash)) (2 (2 even) (3 greater)))) (2 (2 than) (2 (2 (2 (2 (1 (2 Arnold) (2 Schwarzenegger)) (2 ,)) (2 (2 Jean-Claud) (2 (2 Van) (2 Damme)))) (2 or)) (2 (2 Steven) (2 Segal))))))))))))) (2 .)))

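I think the numbers correspond to sentiment values, but I still don't understand how it works.
TL;DR: I am trying to develop my own model for news analysis. The StanfordNLP model was trained on movie reviews, which leads to poor sentiment analysis, so I want to train my own model, but I can't find anything online that explains what each element is or how to do it.
The most I have found is this page, https://nlp.stanford.edu/sentiment/code.html, which says that the dataset and the training code are available: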

Models can be retrained using the following command using the PTB format dataset:

java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz

I already have the data that needs to be parsed.

Answer #1 (dddzy1tm)

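OK, so I did some digging and have finally started to understand (somewhat) how to create a dataset of trees. I'll try to break it down for anyone who stumbles on this post with the same trouble I had.
Step 1: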

  • Find your data (my example is news articles about the UK property market)
UK renters: are you living with someone you’ve fallen out with?
UK property asking prices stagnating, lifting hopes of softer landing for housing market

Step 2:

  • Annotate the data
2 UK renters: are you living with someone you’ve fallen out with?
1 fallen out with
1 fallen out
2 UK renters
2 living with someone
3 fallen
2 :
2 ?
2 living with
2 someone

3 UK property asking prices stagnating, lifting hopes of softer landing for housing market
2 UK property
3 asking prices stagnating
2 asking prices
4 lifting hopes
2 hopes
4 lifting hopes of softer landing
3 softer landing for housing market
2 housing market
2 lifting
2 landing
2 ,
  • Annotation meanings
Very Positive= 4
Positive = 3
Neutral = 2
Negative = 1
Very Negative = 0
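
These are the same numbers that prefix every node in the trees from the question: each "(label ...)" group is one phrase, and a group that contains a single word is a leaf. As a rough illustration of how that nesting can be read, here is a minimal sketch in plain Java. The class SentimentTreeReader and its methods are hypothetical names made up for this example and are not part of CoreNLP; the sketch also assumes one tree per line and single-space separators, as in the sample data.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative reader for labeled trees such as "(2 (2 UK) (2 renters))". Not a CoreNLP class. */
class SentimentTreeReader {

    // Label names as listed above: index 0 = Very Negative ... 4 = Very Positive.
    static final String[] NAMES =
        {"Very Negative", "Negative", "Neutral", "Positive", "Very Positive"};

    /** A tree node: a label plus either one word (leaf) or child nodes. */
    static final class Node {
        final int label;
        final String word; // null for inner nodes
        final List<Node> children = new ArrayList<>();

        Node(int label, String word) {
            this.label = label;
            this.word = word;
        }

        /** All words under this node, joined with single spaces. */
        String text() {
            if (word != null) return word;
            StringBuilder sb = new StringBuilder();
            for (Node child : children) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(child.text());
            }
            return sb.toString();
        }
    }

    private final String src;
    private int pos;

    private SentimentTreeReader(String src) { this.src = src; }

    /** Parses one line holding exactly one tree. */
    static Node parse(String line) {
        SentimentTreeReader reader = new SentimentTreeReader(line.trim());
        Node root = reader.readNode();
        if (reader.pos != reader.src.length()) {
            throw new IllegalArgumentException("Unexpected text at offset " + reader.pos);
        }
        return root;
    }

    private Node readNode() {
        expect('(');
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Missing label at offset " + pos);
        int label = Integer.parseInt(src.substring(start, pos));
        if (label >= NAMES.length) throw new IllegalArgumentException("Label out of range: " + label);
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == '(') {
            // Inner node: one or more nested "(label ...)" groups.
            Node inner = new Node(label, null);
            while (pos < src.length() && src.charAt(pos) == '(') {
                inner.children.add(readNode());
                skipSpaces();
            }
            expect(')');
            return inner;
        }
        // Leaf: a single word that runs up to the closing parenthesis.
        int wordStart = pos;
        while (pos < src.length() && src.charAt(pos) != ')' && src.charAt(pos) != ' ') pos++;
        Node leaf = new Node(label, src.substring(wordStart, pos));
        expect(')');
        return leaf;
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at offset " + pos);
        }
        pos++;
    }

    /** Appends "label (name): phrase" for the node and all of its descendants, top-down. */
    static void describe(Node node, List<String> out) {
        out.add(node.label + " (" + NAMES[node.label] + "): " + node.text());
        for (Node child : node.children) describe(child, out);
    }
}
```

Applied to the fragment (3 (2 a) (3 splash)) from the tree in the question, describe produces three entries: "3 (Positive): a splash", "2 (Neutral): a" and "3 (Positive): splash".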
  • Structure
2 UK renters: are you living with someone you’ve fallen out with?
   //Overall sentiment

1 fallen out with
   // Negative

1 fallen out
   // Negative

2 UK renters
   // Neutral

...etc..
  • Save the annotated data as a .txt file (sample.txt)
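
If the annotations live in code or a spreadsheet export rather than being typed by hand, the file layout shown in Step 2 can be generated. The following is a minimal sketch that assumes the layout is exactly what the sample shows: the first line of each block is the overall label and the full sentence, the following lines are labeled sub-phrases, and a blank line separates sentences. AnnotationFileBuilder and Annotated are hypothetical helper names, not CoreNLP classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative builder for the "label phrase" text layout shown in Step 2. Not a CoreNLP class. */
class AnnotationFileBuilder {

    /** One sentence with its overall label and any labeled sub-phrases. */
    static final class Annotated {
        final int label;
        final String sentence;
        final Map<String, Integer> phrases = new LinkedHashMap<>(); // keeps insertion order

        Annotated(int label, String sentence) {
            checkLabel(label);
            this.label = label;
            this.sentence = sentence;
        }

        /** Adds a labeled sub-phrase and returns this object so calls can be chained. */
        Annotated phrase(int phraseLabel, String text) {
            checkLabel(phraseLabel);
            phrases.put(text, phraseLabel);
            return this;
        }
    }

    /** Labels follow the 0 (Very Negative) to 4 (Very Positive) scale. */
    static void checkLabel(int label) {
        if (label < 0 || label > 4) {
            throw new IllegalArgumentException("Label must be between 0 and 4: " + label);
        }
    }

    /** Renders all sentences, separating consecutive sentences with one blank line. */
    static String render(List<Annotated> sentences) {
        StringBuilder sb = new StringBuilder();
        for (Annotated a : sentences) {
            if (sb.length() > 0) sb.append('\n'); // blank line between sentence blocks
            sb.append(a.label).append(' ').append(a.sentence).append('\n');
            for (Map.Entry<String, Integer> e : a.phrases.entrySet()) {
                sb.append(e.getValue()).append(' ').append(e.getKey()).append('\n');
            }
        }
        return sb.toString();
    }
}
```

The rendered string can then be saved as sample.txt, for example with java.nio.file.Files.write and an explicit UTF-8 charset. In the Step 5 output, words without their own annotation line appear to take the sentence-level label, so it is worth labeling every phrase whose sentiment differs from the sentence as a whole.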

Step 3:

  • Locate your stanford-corenlp-4.5.2.jar
    • Example: ~/.m2/repository/edu/stanford/nlp/stanford-corenlp/4.5.2

Step 4:

  • Open Bash in that directory and run
  • java -cp "*" -mx5g edu.stanford.nlp.sentiment.BuildBinarizedDataset -input /c/Users/rusku/Desktop/StanfordNPL/rusSample/sample.txt
    • Replace the data path above with your own

Step 5:

  • Result
(2 (2 (2 (2 UK) (2 renters)) (2 :)) (2 (2 (2 (2 are) (2 you)) (2 (2 living) (2 (2 with) (2 (2 someone) (2 (2 you) (2 (2 ’ve) (1 (1 (3 fallen) (2 out)) (2 with)))))))) (2 ?)))
(3 (3 (2 (3 UK) (3 property)) (2 (3 asking) (3 prices))) (3 (3 (3 stagnating) (3 (2 ,) (4 (2 lifting) (2 hopes)))) (3 (3 of) (3 (3 (3 softer) (2 landing)) (3 (3 for) (2 (3 housing) (3 market)))))))
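
The training command quoted in the question expects separate train.txt and dev.txt files in this tree format. One way to get from a single file of trees to those two files is sketched below. The every-n-th-tree split rule is an assumption chosen for simplicity, and TreeSplitter is a hypothetical helper, not part of CoreNLP. It also rejects lines whose parentheses do not balance, which is a cheap sanity check before training.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative train/dev splitter for one-tree-per-line files. Not a CoreNLP class. */
class TreeSplitter {

    /** True if the line is a single tree: starts with '(' and closes only at the last character. */
    static boolean isBalanced(String tree) {
        if (!tree.startsWith("(")) return false;
        int depth = 0;
        for (int i = 0; i < tree.length(); i++) {
            char c = tree.charAt(i);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            if (depth < 0) return false;
            // Returning to depth 0 before the end means extra text or a second tree.
            if (depth == 0 && i < tree.length() - 1) return false;
        }
        return depth == 0;
    }

    /**
     * Splits trees into two lists: index 0 is train, index 1 is dev.
     * Every devEvery-th non-blank tree goes to dev (an assumed, simplistic rule).
     */
    static List<List<String>> split(List<String> lines, int devEvery) {
        if (devEvery < 2) throw new IllegalArgumentException("devEvery must be at least 2");
        List<String> train = new ArrayList<>();
        List<String> dev = new ArrayList<>();
        int count = 0;
        for (String line : lines) {
            String tree = line.trim();
            if (tree.isEmpty()) continue; // skip blank lines
            if (!isBalanced(tree)) {
                throw new IllegalArgumentException("Unbalanced tree: " + tree);
            }
            count++;
            (count % devEvery == 0 ? dev : train).add(tree);
        }
        List<List<String>> result = new ArrayList<>();
        result.add(train);
        result.add(dev);
        return result;
    }
}
```

Each returned list can be written out with java.nio.file.Files.write as train.txt and dev.txt and passed to -trainPath and -devPath.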

Resource: Train Stanford CoreNLP about the sentiment of domain-specific phrases
This is as far as I have got.
Hope this helps.
