Python readability score for an article (using spaCy)

Posted by zpjtge22 on 2023-05-08 in Python

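I looped over the text of 100 articles and scored each one with spaCy. The readability measures I used are:

Dale-Chall readability index
Coleman-Liau index
Automated Readability Index

However, I don't know how to interpret the results.
For example, the Coleman-Liau index should come out around 14 or 15, corresponding to a grade level.
Yet I am getting scores of 1245 and 1633.
As another example, the Dale-Chall readability formula should give something like 4.5 or 5.6.
Yet my scores are 245 and 340.
Do I need to do something to these scores?
Here is my code: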

import pandas as pd
import spacy
from spacy_readability import Readability

# Assumed setup (not shown in the post): load a spaCy model and register the
# spacy_readability component, spaCy 2.x style. Without this, `nlp` is undefined.
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe(Readability(), last=True)

articles = pd.read_csv('articles.csv')

# keep only the title and body text of the articles
df = articles[['title', 'body']]

# rows 1..99 (row 0 is skipped)
for i in range(1, 100):
    # select the body text to score on the readability indices
    text = df.iloc[i, 1]
    doc = nlp(text)
    # print the cell containing the article title
    print(df.iloc[i, 0])
    # score the text
    print("dale chall", doc._.dale_chall)
    print("coleman", doc._.coleman_liau_index)
    print("readability", doc._.automated_readability_index)

Sample output:
10 Ways Machine Learning Can Help People Innovate
Dale-Chall: 255.843042857
Coleman-Liau: 1245.025714285
Automated Readability Index: 998.556428571

python-3.x

Source: https://stackoverflow.com/questions/58215681/python-readability-score-for-an-article-using-spacy

1 Answer

pbpqsu0x1#

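I would look into using a pre-implemented library, or read its source code. This is my own repo, but I'm sharing it because I think it can help you: https://github.com/brucewlee/lftk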

# bash
pip install lftk
pip install spacy
python -m spacy download en_core_web_sm

# python
import spacy
import lftk

nlp = spacy.load("en_core_web_sm")
doc = nlp("Your text")
LFTK = lftk.Extractor(docs = doc)

# extract readability features
extracted_features = LFTK.extract(features = ["cole", "auto"])
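
To judge whether a library's numbers are plausible, it can help to compute two of the indices by hand from their published formulas. The following is a minimal sketch, not the implementation used by spacy_readability or lftk: the helper names (`text_counts`, `coleman_liau_index`, `automated_readability_index`) are hypothetical, and the regex-based word and sentence splitting is a simplifying assumption that will miscount abbreviations and decimals. For ordinary English prose both scores should land roughly in the range of school grade levels, so values in the hundreds or thousands suggest that the word or sentence counts feeding the formula are off.

```python
import re


def text_counts(text):
    """Return (letters, words, sentences) using simple regex heuristics.

    Assumption: words are runs of ASCII letters/digits (with an optional
    apostrophe suffix) and sentences end at '.', '!' or '?'.
    """
    words = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)
    letters = sum(ch.isalnum() for word in words for ch in word)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat text without terminal punctuation as a single sentence.
    return letters, len(words), max(len(sentences), 1)


def coleman_liau_index(text):
    """Coleman-Liau: 0.0588 * L - 0.296 * S - 15.8.

    L = letters per 100 words, S = sentences per 100 words.
    """
    letters, words, sentences = text_counts(text)
    if words == 0:
        return 0.0
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def automated_readability_index(text):
    """ARI: 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    letters, words, sentences = text_counts(text)
    if words == 0:
        return 0.0
    return 4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog ran."
    print("counts:", text_counts(sample))
    print("coleman-liau:", round(coleman_liau_index(sample), 2))
    print("ari:", round(automated_readability_index(sample), 2))
```

Running the same functions on one article body and comparing against the library output is a quick way to see whether the discrepancy comes from the formula or from the counts feeding it.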

Posted 2023-05-08
