Can someone explain how to create a PTB dataset and/or train my own model using StanfordNLP?

Posted by pn9klfpd on 2023-03-16 in Java


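I'm learning sentiment analysis, but I can't find any overview online of how to create a PTB dataset. I'm using StanfordNLP with Java. I downloaded the test, dev, and validation data they use, but I can't understand how the data is laid out:
test.txt: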

(3 (2 (2 The) (2 Rock)) (4 (3 (2 is) (4 (2 destined) (2 (2 (2 (2 (2 to) (2 (2 be) (2 (2 the) (2 (2 21st) (2 (2 (2 Century) (2 's)) (2 (3 new) (2 (2 ``) (2 Conan)))))))) (2 '')) (2 and)) (3 (2 that) (3 (2 he) (3 (2 's) (3 (2 going) (3 (2 to) (4 (3 (2 make) (3 (3 (2 a) (3 splash)) (2 (2 even) (3 greater)))) (2 (2 than) (2 (2 (2 (2 (1 (2 Arnold) (2 Schwarzenegger)) (2 ,)) (2 (2 Jean-Claud) (2 (2 Van) (2 Damme)))) (2 or)) (2 (2 Steven) (2 Segal))))))))))))) (2 .)))

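I assume the numbers correspond to sentiment values, but I still don't know how it works.
TL;DR: I'm trying to develop my own model for news analysis. The StanfordNLP model was trained on movie reviews, which gives poor sentiment results on my data, so I want to train my own model, but I can't find anything online that explains what each element is or how to do it.
At best, this page outlines that the dataset and the training code are available: https://nlp.stanford.edu/sentiment/code.html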

Models can be retrained using the following command using the PTB format dataset:

java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz

I already have the data I need to parse.


Source: https://stackoverflow.com/questions/75744401/can-someone-explain-how-to-create-a-ptb-dataset-and-or-train-my-own-model-using

1 Answer


dddzy1tm (answer #1)

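OK, so I did some digging and have finally started to understand (somewhat) how to create a dataset tree. I'll try to break it down for anyone who stumbles on this post with the same trouble I had.
Step 1:

Find your data (my example is news articles about the UK property market)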

UK renters: are you living with someone you’ve fallen out with?
UK property asking prices stagnating, lifting hopes of softer landing for housing market

Step 2:

Annotate the data

2 UK renters: are you living with someone you’ve fallen out with?
1 fallen out with
1 fallen out
2 UK renters
2 living with someone
3 fallen
2 :
2 ?
2 living with
2 someone

3 UK property asking prices stagnating, lifting hopes of softer landing for housing market
2 UK property
3 asking prices stagnating
2 asking prices
4 lifting hopes
2 hopes
4 lifting hopes of softer landing
3 softer landing for housing market
2 housing market
2 lifting
2 landing
2 ,

Annotation meaning:

Very Positive= 4
Positive = 3
Neutral = 2
Negative = 1
Very Negative = 0
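
As a small illustration of the label scale above, the mapping could be captured in Java roughly as follows. This is only a sketch: the `SentimentLabel` enum and its method names are hypothetical helpers for this post, not part of Stanford CoreNLP.

```java
/** Hypothetical helper: maps the integer labels 0..4 to readable names. */
enum SentimentLabel {
    // Declared in score order so that ordinal() equals the numeric label.
    VERY_NEGATIVE, NEGATIVE, NEUTRAL, POSITIVE, VERY_POSITIVE;

    /** Returns the label for a score in 0..4. */
    static SentimentLabel fromScore(int score) {
        SentimentLabel[] all = values();
        if (score < 0 || score >= all.length) {
            throw new IllegalArgumentException("score must be 0..4, got " + score);
        }
        return all[score];
    }

    /** Reads the leading number of an annotation line such as "1 fallen out with". */
    static SentimentLabel fromAnnotationLine(String line) {
        String first = line.trim().split("\\s+", 2)[0];
        return fromScore(Integer.parseInt(first));
    }
}
```

For example, `SentimentLabel.fromAnnotationLine("1 fallen out with")` yields `NEGATIVE`, which can help when sanity-checking a hand-annotated file.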

Structure:

2 UK renters: are you living with someone you’ve fallen out with?
   //Overall sentiment

1 fallen out with
   // Negative

1 fallen out
   // Negative

2 UK renters
   // Neutral

...etc..

Save the annotated data as a .txt file (sample.txt).
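
If the annotations live in code rather than in a hand-edited file, the layout shown in Step 2 (a labelled sentence line, then one labelled line per phrase, with a blank line between sentences) could be generated with something like the sketch below. The class `AnnotationFileBuilder` and its methods are hypothetical names invented for this example, and the truncated sample sentence is just a placeholder.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that writes the "label text" annotation layout. */
class AnnotationFileBuilder {

    /** One block: the whole sentence first, then each annotated phrase. */
    static String formatBlock(int sentenceLabel, String sentence,
                              Map<String, Integer> phraseLabels) {
        StringBuilder sb = new StringBuilder();
        sb.append(checked(sentenceLabel)).append(' ').append(sentence).append('\n');
        for (Map.Entry<String, Integer> e : phraseLabels.entrySet()) {
            sb.append(checked(e.getValue())).append(' ').append(e.getKey()).append('\n');
        }
        return sb.toString();
    }

    /** Joins blocks so that a blank line separates consecutive sentences. */
    static String formatFile(List<String> blocks) {
        return String.join("\n", blocks);
    }

    /** Rejects labels outside the 0..4 scale. */
    private static int checked(int label) {
        if (label < 0 || label > 4) {
            throw new IllegalArgumentException("label must be 0..4, got " + label);
        }
        return label;
    }

    public static void main(String[] args) throws Exception {
        // LinkedHashMap keeps the phrases in insertion order.
        Map<String, Integer> phrases = new LinkedHashMap<>();
        phrases.put("fallen out with", 1);
        phrases.put("UK renters", 2);
        String block = formatBlock(2, "UK renters: are you living with someone ...", phrases);
        Files.writeString(Path.of("sample.txt"), formatFile(List.of(block)));
    }
}
```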

Step 3:

Locate your stanford-corenlp-4.5.2.jar
Example: ~/.m2/repository/edu/stanford/nlp/stanford-corenlp/4.5.2

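Step 4:

Open Bash in the directory that contains the jar (the -cp "*" classpath picks up every jar in the current directory) and run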
java -cp "*" -mx5g edu.stanford.nlp.sentiment.BuildBinarizedDataset -input /c/Users/rusku/Desktop/StanfordNPL/rusSample/sample.txt
Replace the data location above with the path to your own file.

Step 5:

Result

(2 (2 (2 (2 UK) (2 renters)) (2 :)) (2 (2 (2 (2 are) (2 you)) (2 (2 living) (2 (2 with) (2 (2 someone) (2 (2 you) (2 (2 ’ve) (1 (1 (3 fallen) (2 out)) (2 with)))))))) (2 ?)))
(3 (3 (2 (3 UK) (3 property)) (2 (3 asking) (3 prices))) (3 (3 (3 stagnating) (3 (2 ,) (4 (2 lifting) (2 hopes)))) (3 (3 of) (3 (3 (3 softer) (2 landing)) (3 (3 for) (2 (3 housing) (3 market)))))))
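
To make the output format easier to read: each `(label ...)` group is a node whose first token is the sentiment label, and which contains either a single word (a leaf) or further nested nodes. The sketch below is a small standalone reader for that format, written without CoreNLP on the classpath. The class `PtbSentimentTree` is a hypothetical name for this post; CoreNLP ships its own tree classes, which this does not try to reproduce.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal reader for sentiment trees such as "(4 (2 lifting) (2 hopes))". */
class PtbSentimentTree {
    final int label;                 // sentiment label of this node, 0..4 in the data above
    final String word;               // non-null only for leaf nodes
    final List<PtbSentimentTree> children = new ArrayList<>();

    private PtbSentimentTree(int label, String word) {
        this.label = label;
        this.word = word;
    }

    static PtbSentimentTree parse(String text) {
        int[] pos = {0};             // shared cursor into the string
        PtbSentimentTree root = parseNode(text, pos);
        skipSpaces(text, pos);
        if (pos[0] != text.length()) {
            throw new IllegalArgumentException("Unexpected text at offset " + pos[0]);
        }
        return root;
    }

    private static PtbSentimentTree parseNode(String s, int[] pos) {
        skipSpaces(s, pos);
        expect(s, pos, '(');
        int label = Integer.parseInt(readToken(s, pos));
        skipSpaces(s, pos);
        PtbSentimentTree node;
        if (peek(s, pos) == '(') {   // internal node: one or more nested nodes
            node = new PtbSentimentTree(label, null);
            while (peek(s, pos) == '(') {
                node.children.add(parseNode(s, pos));
                skipSpaces(s, pos);
            }
        } else {                     // leaf: a single word
            node = new PtbSentimentTree(label, readToken(s, pos));
            skipSpaces(s, pos);
        }
        expect(s, pos, ')');
        return node;
    }

    private static char peek(String s, int[] pos) {
        return pos[0] < s.length() ? s.charAt(pos[0]) : '\0';
    }

    private static void expect(String s, int[] pos, char c) {
        if (peek(s, pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at offset " + pos[0]);
        }
        pos[0]++;
    }

    private static void skipSpaces(String s, int[] pos) {
        while (pos[0] < s.length() && Character.isWhitespace(s.charAt(pos[0]))) pos[0]++;
    }

    private static String readToken(String s, int[] pos) {
        int start = pos[0];
        while (pos[0] < s.length()) {
            char c = s.charAt(pos[0]);
            if (Character.isWhitespace(c) || c == '(' || c == ')') break;
            pos[0]++;
        }
        return s.substring(start, pos[0]);
    }

    /** Joins the leaf words from left to right. */
    String text() {
        if (word != null) return word;
        List<String> parts = new ArrayList<>();
        for (PtbSentimentTree child : children) parts.add(child.text());
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        PtbSentimentTree t = parse("(4 (2 lifting) (2 hopes))");
        System.out.println(t.label + " " + t.text());   // prints "4 lifting hopes"
    }
}
```

Reading the second result line this way shows, for instance, that the phrase "lifting hopes" kept its annotated label of 4 while the individual words stayed neutral.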

Resource: Train Stanford CoreNLP about the sentiment of domain-specific phrases
This is as far as I've got so far.
Hope this helps.

Answered 2023-03-16
