Python NLTK text dispersion plot has its y (vertical) axis in backwards/reversed order

Posted by kmbjn2e3 on 2023-10-14 in Python

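Since last month, NLTK's dispersion_plot seems to draw the y (vertical) axis in reversed order on my machine. This may be related to my software versions (I am on a school virtual machine).
Versions: nltk 3.8.1, matplotlib 3.7.2, Python 3.9.13
Code: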

from nltk.draw.dispersion import dispersion_plot
words=['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets=['aa','bbb', 'f', 'cccc']
dispersion_plot(words, targets)

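Expected: 'aa' marks appear at the beginning and 'cccc' marks at the end. Actual: it is backwards! Also note that 'f' should be completely absent, but instead 'bbb' is shown as absent.
Conclusion: the y axis is backwards.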

Source: https://stackoverflow.com/questions/77262318/python-nltk-text-dispersion-plot-has-y-vertical-axis-is-in-backwards-reversed

2 Answers

46qrfjad1#

I looked at the source code of nltk.draw.dispersion, and it appears to contain a bug.

def dispersion_plot(text, words, ignore_case=False, title="Lexical Dispersion Plot"):
    """
    Generate a lexical dispersion plot.
    :param text: The source text
    :type text: list(str) or iter(str)
    :param words: The target words
    :type words: list of str
    :param ignore_case: flag to set if case should be ignored when searching text
    :type ignore_case: bool
    :return: a matplotlib Axes object that may still be modified before plotting
    :rtype: Axes
    """
    try:
        import matplotlib.pyplot as plt
    except ImportError as e:
        raise ImportError(
            "The plot function requires matplotlib to be installed. "
            "See https://matplotlib.org/"
        ) from e
    word2y = {
        word.casefold() if ignore_case else word: y
        for y, word in enumerate(reversed(words))  # <--- HERE
    }
    xs, ys = [], []
    for x, token in enumerate(text):
        token = token.casefold() if ignore_case else token
        y = word2y.get(token)
        if y is not None:
            xs.append(x)
            ys.append(y)
    _, ax = plt.subplots()
    ax.plot(xs, ys, "|")
    ax.set_yticks(list(range(len(words))), words, color="C0")  # <--- HERE
    ax.set_ylim(-1, len(words))
    ax.set_title(title)
    ax.set_xlabel("Word Offset")
    return ax
if __name__ == "__main__":
    import matplotlib.pyplot as plt
    from nltk.corpus import gutenberg
    words = ["Elinor", "Marianne", "Edward", "Willoughby"]
    dispersion_plot(gutenberg.words("austen-sense.txt"), words)
    plt.show()

It computes word2y using reversed(words):

for y, word in enumerate(reversed(words))

But later it calls ax.set_yticks() with words, where it should use reversed(words):

ax.set_yticks(list(range(len(words))), words, color="C0")

(Alternatively, it should compute word2y without reversed().)
I added # <--- HERE comments in the code above to mark these places.
This should probably be reported as an issue.
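To see why the mismatch produces exactly the plot described in the question, here is a minimal sketch that mimics only the two marked lines, without matplotlib. The helper name buggy_row_labels is hypothetical and is not part of NLTK; it simply pairs each target word with the label that ends up next to its row.

```python
def buggy_row_labels(targets):
    """Mimic the two marked lines of nltk 3.8.1's dispersion_plot.

    Hypothetical helper for illustration only (not part of NLTK).
    Returns a dict mapping each target word to the label drawn on its row.
    """
    # Row positions are assigned in reversed order ...
    word2y = {word: y for y, word in enumerate(reversed(targets))}
    # ... but tick labels are assigned in the original order.
    tick_labels = list(targets)
    return {word: tick_labels[y] for word, y in word2y.items()}


targets = ['aa', 'bbb', 'f', 'cccc']
shown = buggy_row_labels(targets)
for word, label in shown.items():
    print(f"marks for {word!r} are drawn on the row labeled {label!r}")
```

With the targets from the question, the empty row that belongs to 'f' carries the label 'bbb', which is why 'bbb' looks absent, and the marks for 'aa' sit on the row labeled 'cccc'.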
For now, you can take the returned ax and call set_yticks with reversed to correct the labels.
In your code the list is targets rather than words:

ax = dispersion_plot(words, targets)
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")

Full working code:

import matplotlib.pyplot as plt
from nltk.draw.dispersion import dispersion_plot
words = ['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets = ['aa','bbb', 'f', 'cccc']
ax = dispersion_plot(words, targets)
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")
plt.show()

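**Edit:** It seems this problem was already reported a few months ago, and reversed() has been added to the code on GitHub, so it will probably work in the next release.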

dispersion plot not working properly · Issue #3133 · nltk/nltk
dispersion plot not working properly by Apros7 · Pull Request #3134 · nltk/nltk

Posted 2023-10-14

lnvxswe22#

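Building on @furas's answer, I added an if condition that reverses the y ticks only when they are actually broken/backwards. This means the code will keep working once the library bug is fixed (which should be soon).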

import matplotlib.pyplot as plt
from nltk.draw.dispersion import dispersion_plot
targets = ['a', 'b']
filtered_text = ["a", "a", "b"]
my_plot = dispersion_plot(filtered_text, targets, ignore_case=True)
# THIS IS NEW: if the tick labels are wrong, fix them (reverse them).
# reversed() returns an iterator, so convert it to a list before comparing.
expected = list(reversed(targets))
if [label.get_text() for label in my_plot.get_yticklabels()] != expected:
    my_plot.set_yticks(list(range(len(targets))), expected)
plt.show()

(I patched the library locally and tested with the new version: the code works with both the old broken library and the newly fixed one.)
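One detail worth checking in a conditional fix like this: reversed() returns an iterator, and a list never compares equal to an iterator, so the comparison only becomes meaningful once the reversed targets are turned into a list. A minimal sketch of the check as a reusable function follows. The name relabel_if_reversed is hypothetical, and the function assumes ax is the Axes returned by dispersion_plot and that the installed matplotlib accepts labels as the second argument of set_yticks, as the calls above already do.

```python
def relabel_if_reversed(ax, targets):
    """Relabel the y ticks only when they are not already in reversed order.

    Hypothetical helper: `ax` is assumed to be the Axes returned by
    nltk's dispersion_plot. Returns True if the labels were changed.
    """
    # Materialize the iterator so it can be compared with a list.
    expected = list(reversed(targets))
    current = [label.get_text() for label in ax.get_yticklabels()]
    if current != expected:
        ax.set_yticks(list(range(len(targets))), expected)
        return True
    return False
```

Usage would be relabel_if_reversed(dispersion_plot(filtered_text, targets), targets) followed by plt.show(). On a library version that already labels the rows correctly, the function leaves the plot untouched and returns False.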

Posted 2023-10-14
