NLTK: running the tests offline


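(Sorry for bothering you on the issue tracker with a question that is probably not a bug, but I have no other functional way of communicating with the project.)

While packaging NLTK for openSUSE, I would like to run the tests. The problem is that our build system (like the build systems of all distributions) is isolated from the Internet, so I need to be able to run the test suite without touching the network. I have therefore downloaded all of ntlk_data and set the NTLK_DATA variable accordingly. Unfortunately, the results are not good: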

    [ 78s] + cd /home/abuild/rpmbuild/BUILD
    [ 78s] + cd nltk-3.7
    [ 78s] ++ readlink -f ./ntlk_data/
    [ 78s] + export NLTK_DATA=/home/abuild/rpmbuild/BUILD/nltk-3.7/ntlk_data
    [ 78s] + NLTK_DATA=/home/abuild/rpmbuild/BUILD/nltk-3.7/ntlk_data
    [ 78s] ++ '[' -f _current_flavor ']'
    [ 78s] ++ cat _current_flavor
    [ 78s] + last_flavor=python38
    [ 78s] + '[' -z python38 ']'
    [ 78s] + '[' python38 '!=' python39 ']'
    [ 78s] + '[' -d build ']'
    [ 78s] + mv build _build.python38
    [ 78s] + '[' -d _build.python39 ']'
    [ 78s] + mv _build.python39 build
    [ 78s] + echo python39
    [ 78s] + python_flavor=python39
    [ 78s] + PYTHONPATH=/home/abuild/rpmbuild/BUILDROOT/python-nltk-3.7-0.x86_64/usr/lib/python3.9/site-packages
    [ 78s] + PYTHONDONTWRITEBYTECODE=1
    [ 78s] + pytest-3.9 --ignore=_build.python39 --ignore=_build.python310 --ignore=_build.python38 -v
    [ 79s] ============================= test session starts ==============================
    [ 79s] platform linux -- Python 3.9.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /usr/bin/python3.9
    [ 79s] cachedir: .pytest_cache
    [ 79s] rootdir: /home/abuild/rpmbuild/BUILD/nltk-3.7
    [ 79s] plugins: cov-3.0.0, mock-3.6.1
    [ 95s] collecting ... collected 424 items / 3 errors / 421 selected
    [ 95s]
    [ 95s] ==================================== ERRORS ====================================
    [ 95s] _______________ ERROR collecting nltk/test/unit/test_corpora.py ________________
    [ 95s] nltk/corpus/util.py:84: in __load
    [ 95s] root = nltk.data.find(f"{self.subdir}/{zip_name}")
    [ 95s] nltk/data.py:583: in find
    [ 95s] raise LookupError(resource_not_found)
    [ 95s] E LookupError:
    [ 95s] E **********************************************************************
    [ 95s] E Resource ptb not found.
    [ 95s] E Please use the NLTK Downloader to obtain the resource:
    [ 95s] E
    [ 95s] E >>> import nltk
    [ 95s] E >>> nltk.download('ptb')
    [ 95s] E
    [ 95s] E For more information see: https://www.nltk.org/data.html
    [ 95s] E
    [ 95s] E Attempted to load corpora/ptb.zip/ptb/
    [ 95s] E
    [ 95s] E Searched in:
    [ 95s] E - '/home/abuild/rpmbuild/BUILD/nltk-3.7/ntlk_data'
    [ 95s] E - '/home/abuild/nltk_data'
    [ 95s] E - '/usr/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/local/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/local/lib/nltk_data'
    [ 95s] E **********************************************************************
    [ 95s]
    [ 95s] During handling of the above exception, another exception occurred:
    [ 95s] nltk/test/unit/test_corpora.py:186: in <module>
    [ 95s] ???
    [ 95s] nltk/corpus/util.py:121: in __getattr__
    [ 95s] self.__load()
    [ 95s] nltk/corpus/util.py:86: in __load
    [ 95s] raise e
    [ 95s] nltk/corpus/util.py:81: in __load
    [ 95s] root = nltk.data.find(f"{self.subdir}/{self.__name}")
    [ 95s] nltk/data.py:583: in find
    [ 95s] raise LookupError(resource_not_found)
    [ 95s] E LookupError:
    [ 95s] E **********************************************************************
    [ 95s] E Resource ptb not found.
    [ 95s] E Please use the NLTK Downloader to obtain the resource:
    [ 95s] E
    [ 95s] E >>> import nltk
    [ 95s] E >>> nltk.download('ptb')
    [ 95s] E
    [ 95s] E For more information see: https://www.nltk.org/data.html
    [ 95s] E
    [ 95s] E Attempted to load corpora/ptb
    [ 95s] E
    [ 95s] E Searched in:
    [ 95s] E - '/home/abuild/rpmbuild/BUILD/nltk-3.7/ntlk_data'
    [ 95s] E - '/home/abuild/nltk_data'
    [ 95s] E - '/usr/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/local/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/local/lib/nltk_data'
    [ 95s] E **********************************************************************
    [ 95s] _______________ ERROR collecting nltk/test/unit/test_nombank.py ________________
    [ 95s] nltk/corpus/util.py:84: in __load
    [ 95s] root = nltk.data.find(f"{self.subdir}/{zip_name}")
    [ 95s] nltk/data.py:583: in find
    [ 95s] raise LookupError(resource_not_found)
    [ 95s] E LookupError:
    [ 95s] E **********************************************************************
    [ 95s] E Resource nombank.1.0 not found.
    [ 95s] E Please use the NLTK Downloader to obtain the resource:
    [ 95s] E
    [ 95s] E >>> import nltk
    [ 95s] E >>> nltk.download('nombank.1.0')
    [ 95s] E
    [ 95s] E For more information see: https://www.nltk.org/data.html
    [ 95s] E
    [ 95s] E Attempted to load corpora/nombank.1.0.zip/nombank.1.0/
    [ 95s] E
    [ 95s] E Searched in:
    [ 95s] E - '/home/abuild/rpmbuild/BUILD/nltk-3.7/ntlk_data'
    [ 95s] E - '/home/abuild/nltk_data'
    [ 95s] E - '/usr/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/local/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/local/lib/nltk_data'
    [ 95s] E **********************************************************************
    [ 95s]
    [ 95s] During handling of the above exception, another exception occurred:
    [ 95s] nltk/test/unit/test_nombank.py:10: in <module>
    [ 95s] nombank.nouns()
    [ 95s] nltk/corpus/util.py:121: in __getattr__
    [ 95s] self.__load()
    [ 95s] nltk/corpus/util.py:86: in __load
    [ 95s] raise e
    [ 95s] nltk/corpus/util.py:81: in __load
    [ 95s] root = nltk.data.find(f"{self.subdir}/{self.__name}")
    [ 95s] nltk/data.py:583: in find
    [ 95s] raise LookupError(resource_not_found)
    [ 95s] E LookupError:
    [ 95s] E **********************************************************************
    [ 95s] E Resource nombank.1.0 not found.
    [ 95s] E Please use the NLTK Downloader to obtain the resource:
    [ 95s] E
    [ 95s] E >>> import nltk
    [ 95s] E >>> nltk.download('nombank.1.0')
    [ 95s] E
    [ 95s] E For more information see: https://www.nltk.org/data.html
    [ 95s] E
    [ 95s] E Attempted to load corpora/nombank.1.0
    [ 95s] E
    [ 95s] E Searched in:
    [ 95s] E - '/home/abuild/rpmbuild/BUILD/nltk-3.7/ntlk_data'
    [ 95s] E - '/home/abuild/nltk_data'
    [ 95s] E - '/usr/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/local/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/local/lib/nltk_data'
    [ 95s] E **********************************************************************
    [ 95s] _______________ ERROR collecting nltk/test/unit/test_wordnet.py ________________
    [ 95s] nltk/corpus/util.py:84: in __load
    [ 95s] root = nltk.data.find(f"{self.subdir}/{zip_name}")
    [ 95s] nltk/data.py:583: in find
    [ 95s] raise LookupError(resource_not_found)
    [ 95s] E LookupError:
    [ 95s] E **********************************************************************
    [ 95s] E Resource wordnet not found.
    [ 95s] E Please use the NLTK Downloader to obtain the resource:
    [ 95s] E
    [ 95s] E >>> import nltk
    [ 95s] E >>> nltk.download('wordnet')
    [ 95s] E
    [ 95s] E For more information see: https://www.nltk.org/data.html
    [ 95s] E
    [ 95s] E Attempted to load corpora/wordnet.zip/wordnet/
    [ 95s] E
    [ 95s] E Searched in:
    [ 95s] E - '/home/abuild/rpmbuild/BUILD/nltk-3.7/ntlk_data'
    [ 95s] E - '/home/abuild/nltk_data'
    [ 95s] E - '/usr/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/local/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/local/lib/nltk_data'
    [ 95s] E **********************************************************************
    [ 95s]
    [ 95s] During handling of the above exception, another exception occurred:
    [ 95s] nltk/test/unit/test_wordnet.py:10: in <module>
    [ 95s] wn.ensure_loaded()
    [ 95s] nltk/corpus/util.py:121: in __getattr__
    [ 95s] self.__load()
    [ 95s] nltk/corpus/util.py:86: in __load
    [ 95s] raise e
    [ 95s] nltk/corpus/util.py:81: in __load
    [ 95s] root = nltk.data.find(f"{self.subdir}/{self.__name}")
    [ 95s] nltk/data.py:583: in find
    [ 95s] raise LookupError(resource_not_found)
    [ 95s] E LookupError:
    [ 95s] E **********************************************************************
    [ 95s] E Resource wordnet not found.
    [ 95s] E Please use the NLTK Downloader to obtain the resource:
    [ 95s] E
    [ 95s] E >>> import nltk
    [ 95s] E >>> nltk.download('wordnet')
    [ 95s] E
    [ 95s] E For more information see: https://www.nltk.org/data.html
    [ 95s] E
    [ 95s] E Attempted to load corpora/wordnet
    [ 95s] E
    [ 95s] E Searched in:
    [ 95s] E - '/home/abuild/rpmbuild/BUILD/nltk-3.7/ntlk_data'
    [ 95s] E - '/home/abuild/nltk_data'
    [ 95s] E - '/usr/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/share/nltk_data'
    [ 95s] E - '/usr/local/share/nltk_data'
    [ 95s] E - '/usr/lib/nltk_data'
    [ 95s] E - '/usr/local/lib/nltk_data'
    [ 95s] E **********************************************************************
    [ 95s] =============================== warnings summary ===============================
    [ 95s] nltk/test/unit/test_tokenize.py:22
    [ 95s] /home/abuild/rpmbuild/BUILD/nltk-3.7/nltk/test/unit/test_tokenize.py:22: DeprecationWarning:
    [ 95s] The StanfordTokenizer will be deprecated in version 3.2.5.
    [ 95s] Please use nltk.parse.corenlp.CoreNLPTokenizer instead.'
    [ 95s] seg = StanfordSegmenter()
    [ 95s]
    [ 95s] -- Docs: https://docs.pytest.org/en/stable/warnings.html
    [ 95s] =========================== short test summary info ============================
    [ 95s] ERROR nltk/test/unit/test_corpora.py - LookupError:
    [ 95s] ERROR nltk/test/unit/test_nombank.py - LookupError:
    [ 95s] ERROR nltk/test/unit/test_wordnet.py - LookupError:
    [ 95s] !!!!!!!!!!!!!!!!!!! Interrupted: 3 errors during collection !!!!!!!!!!!!!!!!!!!!
    [ 95s] ======================== 1 warning, 3 errors in 16.17s =========================
    [ 95s] error: Bad exit status from /var/tmp/rpm-tmp.xNuAZW (%check)

Complete log
Any ideas?
Thank you for any reply,
Matěj

https://matej.ceplovi.cz/blog/ , Jabber: mcepl@ceplovi.cz
GPG fingerprint: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8
Clear ideas and thick chocolate.
(Ideas should be clear and chocolate thick.)
-- Spanish proverb
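
For readers who want to reproduce the build system's network isolation on a connected machine, here is a minimal sketch using only the standard library. The names `no_network` and `NetworkBlockedError` are hypothetical and not part of NLTK or pytest. The guard only affects code that creates sockets through the `socket.socket` module attribute, so it approximates an isolated build root rather than replacing one.

```python
import contextlib
import socket


class NetworkBlockedError(RuntimeError):
    """Raised when code tries to create a socket while the guard is active."""


@contextlib.contextmanager
def no_network():
    """Temporarily replace socket.socket so that any attempt to open one fails."""
    original = socket.socket

    def guarded(*args, **kwargs):
        raise NetworkBlockedError("network access attempted while offline")

    socket.socket = guarded
    try:
        yield
    finally:
        # Always restore the real class, even if the guarded code raised.
        socket.socket = original


if __name__ == "__main__":
    with no_network():
        try:
            socket.socket()
        except NetworkBlockedError as exc:
            print("blocked:", exc)
```

Running the code under test inside `with no_network():` makes an accidental download fail immediately instead of silently succeeding on a developer machine.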

8ljdwjyq #1

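@mcepl The library is called NLTK (not "NTLK"): it looks like you accidentally swapped the "L" and the "T" 😊
Please let us know whether this resolves the problem. I have a feeling you may run into some other issues (e.g. the test_downloader.py tests), but please try this first :)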
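
A misspelled data directory like this can be caught before pytest starts. The following is a minimal sketch, assuming only what the log above shows: that a corpus is looked up as `corpora/<name>` or `corpora/<name>.zip` under each searched directory. The helpers `has_corpus` and `check_nltk_data` are hypothetical, and the sketch does not reproduce NLTK's full lookup logic.

```python
import os
from pathlib import Path


def has_corpus(data_dir, name):
    """True if data_dir/corpora holds the corpus as a directory or a .zip file."""
    corpora = Path(data_dir) / "corpora"
    return (corpora / name).is_dir() or (corpora / f"{name}.zip").is_file()


def check_nltk_data(environ, names):
    """Return a list of human-readable problems with the NLTK_DATA setting."""
    raw = environ.get("NLTK_DATA", "")
    dirs = [Path(p) for p in raw.split(os.pathsep) if p]
    problems = []
    if not dirs:
        problems.append("NLTK_DATA is not set")
    for directory in dirs:
        if not directory.is_dir():
            problems.append(f"{directory} does not exist")
    existing = [d for d in dirs if d.is_dir()]
    for name in names:
        if not any(has_corpus(d, name) for d in existing):
            problems.append(f"corpus {name!r} not found")
    return problems


if __name__ == "__main__":
    # The three corpora reported as missing in the log above.
    for problem in check_nltk_data(os.environ, ["ptb", "nombank.1.0", "wordnet"]):
        print(problem)
```

With the path from the log, the check would report the `ntlk_data` directory as missing if no directory of that name exists, which points at the spelling rather than at the corpora themselves.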

zwghvu4y #2

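Yes, that was a mistake, but making the tests work offline takes more effort:

  1. Much more than just all is needed (at least comtrans, conll2007, jeita, knbc, machado, masc_tagged, nombank.1.0, panlex_swadesh, perluniprops, propbank, reuters, semcor, universal_treebanks_v20 have to be added)
  2. rm tools/nltk_term_index.py tools/run_doctests.py nltk_data/corpora/semcor/semcor.py , which seem to be obsolete and not portable (or use obsolete libraries ... epydoc? Really?)
  3. These two patches:
    a. to allow skipping the network tests: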
        ---
         nltk/test/unit/test_downloader.py | 4 ++++
         setup.cfg | 4 ++++
         2 files changed, 8 insertions(+)
        --- a/nltk/test/unit/test_downloader.py
        +++ b/nltk/test/unit/test_downloader.py
        @@ -1,6 +1,9 @@
         from nltk import download
        +import pytest
        +
        +@pytest.mark.network
         def test_downloader_using_existing_parent_download_dir(tmp_path):
             """Test that download works properly when the parent folder of the download_dir exists"""
        @@ -9,6 +12,7 @@ def test_downloader_using_existing_paren
             assert download_status is True
        +@pytest.mark.network
         def test_downloader_using_non_existing_parent_download_dir(tmp_path):
             """Test that download works properly when the parent folder of the download_dir does not exist"""
        --- a/setup.cfg
        +++ b/setup.cfg
        @@ -1,3 +1,7 @@
        +[tool:pytest]
        +markers =
        +    network: test case requires network connection
        +
         [metadata]
         license_files =
             LICENSE.txt
    b. to make all those ancient scripts at least syntactically correct (some were written for Python 2.6!):
        ---
         nltk_data/corpora/pl196x/splitter.py | 4 ++--
         nltk_data/taggers/universal_tagset/universal_tags.py | 5 -----
         tools/find_deprecated.py | 2 +-
         3 files changed, 3 insertions(+), 8 deletions(-)
        --- a/nltk_data/corpora/pl196x/splitter.py
        +++ b/nltk_data/corpora/pl196x/splitter.py
        @@ -1,4 +1,4 @@
        -#!/usr/bin/python
        +#!/usr/bin/python3
         import sys, re
        @@ -7,7 +7,7 @@ TEXTID = re.compile(r'<text id="(.*)">')
         if __name__ == '__main__':
             if len(sys.argv) != 2:
        -        print 'One argument required: a pl196x corpus to split.'
        +        print('One argument required: a pl196x corpus to split.')
                 sys.exit()
             inputFileName = sys.argv[1]
        --- a/nltk_data/taggers/universal_tagset/universal_tags.py
        +++ b/nltk_data/taggers/universal_tagset/universal_tags.py
        @@ -22,11 +22,6 @@ X - other: foreign words, typos, abbrevi
         @author: Nathan Schneider (nschneid)
         @since: 2011-05-06
         '''
        -
        -# Strive towards Python 3 compatibility
        -from __future__ import print_function, unicode_literals, division
        -from future_builtins import map, filter
        -
         import re, glob
         from collections import defaultdict
        --- a/tools/find_deprecated.py
        +++ b/tools/find_deprecated.py
        @@ -29,7 +29,7 @@ import textwrap
         import tokenize
         from doctest import DocTestParser, register_optionflag
        -from cStringIO import StringIO
        +from io import StringIO
         import nltk.corpus
         from nltk import defaultdict
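
Scripts that are not valid Python 3 can be listed up front with a small standard-library check. This is a minimal sketch and not part of the packaging described here; `find_unparsable` is a hypothetical helper. It only detects syntax-level problems such as the `print` statement in splitter.py. Imports of modules that no longer exist (`cStringIO`, `future_builtins`) compile fine and only fail when the script is actually imported.

```python
import os


def find_unparsable(root):
    """Return (path, message) pairs for .py files under root that do not compile."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as handle:
                source = handle.read()
            try:
                # Compile only; nothing from the file is executed.
                compile(source, path, "exec")
            except (SyntaxError, ValueError) as exc:
                failures.append((path, str(exc)))
    return failures


if __name__ == "__main__":
    for path, message in find_unparsable("."):
        print(f"{path}: {message}")
```

Running it over the unpacked source and data trees gives a list of candidates to either patch or remove.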
Even that is not enough to get the doctests working. When I add a doctest run, so that the %check section looks like this:

    %check
    export NLTK_DATA=$(readlink -f ./nltk_data/)
    export PYTEST_ADDOPTS="--doctest-modules"
    %pytest -k 'not network'

( %pytest expands here to:

    PYTHONPATH=/home/abuild/rpmbuild/BUILDROOT/python-nltk-3.7-0.x86_64/usr/lib/python3.8/site-packages
    PYTHONDONTWRITEBYTECODE=1
    pytest-3.8 --ignore=_build.python38 --ignore=_build.python39 --ignore=_build.python310 -v -k 'not network'

), I get:

    [ 68s] + pytest-3.8 --ignore=_build.python38 --ignore=_build.python39 --ignore=_build.python310 -v -k 'not network'
    [ 68s] ============================= test session starts ==============================
    [ 68s] platform linux -- Python 3.8.16, pytest-7.1.2, pluggy-1.0.0 -- /usr/bin/python3.8
    [ 68s] cachedir: .pytest_cache
    [ 68s] rootdir: /home/abuild/rpmbuild/BUILD/nltk-3.7, configfile: setup.cfg
    [ 68s] plugins: mock-3.6.1, cov-4.0.0
    [ 112s] collecting ... collected 726 items / 1 error / 2 deselected / 724 selected
    [ 112s]
    [ 112s] ==================================== ERRORS ====================================
    [ 112s] ______________________ ERROR collecting nltk/__init__.py _______________________
    [ 112s] /usr/lib64/python3.8/doctest.py:939: in find
    [ 112s] self._find(tests, obj, name, module, source_lines, globs, {})
    [ 112s] /usr/lib/python3.8/site-packages/_pytest/doctest.py:533: in _find
    [ 112s] super()._find( # type:ignore[misc]
    [ 112s] /usr/lib64/python3.8/doctest.py:995: in _find
    [ 112s] for valname, val in obj.__dict__.items():
    [ 112s] E RuntimeError: dictionary changed size during iteration
    [ 112s] =============================== warnings summary ===============================
    [ 112s] nltk/test/unit/test_tokenize.py:22
    [ 112s] /home/abuild/rpmbuild/BUILD/nltk-3.7/nltk/test/unit/test_tokenize.py:22: DeprecationWarning:
    [ 112s] The StanfordTokenizer will be deprecated in version 3.2.5.
    [ 112s] Please use nltk.parse.corenlp.CoreNLPTokenizer instead.'
    [ 112s] seg = StanfordSegmenter()
    [ 112s]
    [ 112s] -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
    [ 112s] =========================== short test summary info ============================
    [ 112s] ERROR nltk/__init__.py - RuntimeError: dictionary changed size during iteration
    [ 112s] !!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
    [ 112s] ================== 2 deselected, 1 warning, 1 error in 43.62s ==================

For now I am skipping the doctests.
Merry Christmas and all the best in 2023!
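
The `RuntimeError` in the traceback above says that a module's `__dict__` gained or lost entries while doctest was iterating over it. One way this can happen is that inspecting a value triggers a lazy load which registers new names in the same namespace. Whether that is what happens inside nltk/__init__.py is an assumption, not something the log confirms. The sketch below reproduces the general mechanism with a hypothetical `LazyEntry` class and shows that iterating over a snapshot avoids the error.

```python
class LazyEntry:
    """Stand-in for a lazily loaded object (hypothetical, not NLTK code).

    The first lookup of an unknown attribute adds a new key to the
    namespace the object lives in, as a lazy loader might.
    """

    def __init__(self, namespace, name):
        self._namespace = namespace
        self._name = name

    def __getattr__(self, attr):
        # Only reached for attributes that are not found normally.
        self._namespace[self._name + "_loaded"] = object()
        raise AttributeError(attr)


def make_namespace():
    namespace = {}
    namespace["corpus"] = LazyEntry(namespace, "corpus")
    namespace["other"] = 1
    return namespace


def scan_live(namespace):
    """Iterate over the live dict view while touching each value."""
    names = []
    for name, value in namespace.items():
        getattr(value, "__wrapped__", None)  # triggers the lazy side effect
        names.append(name)
    return names


def scan_snapshot(namespace):
    """Same scan, but over a list copy, so later insertions are harmless."""
    names = []
    for name, value in list(namespace.items()):
        getattr(value, "__wrapped__", None)
        names.append(name)
    return names


if __name__ == "__main__":
    try:
        scan_live(make_namespace())
    except RuntimeError as exc:
        print("live view:", exc)
    print("snapshot:", scan_snapshot(make_namespace()))
```

Since the iteration in the traceback lives in the standard library's doctest.py, a snapshot cannot simply be introduced there from the packaging side; the sketch is only meant to explain the error message.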
