Linux: How can I download NLTK packages with the proper security certificates in a Docker container?

Posted by gjmwrych on 2023-05-22 in Linux

I have tried every combination mentioned here and elsewhere, but I keep getting the same error.
Here is my Dockerfile:

FROM python:3.9

RUN pip install virtualenv && virtualenv venv -p python3
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt

RUN git clone https://github.com/facebookresearch/detectron2.git
RUN python -m pip install -e detectron2

# Install dependencies
RUN apt-get update && apt-get install libgl1 -y
RUN pip install -U nltk
RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]

COPY . /app

# Run the application:
CMD ["python", "-u", "app.py"]

The Docker image builds fine (I pass the platform flag to build an image that runs on Linux, because the local machine I build on is Windows and the detectron2 library does not install on Windows):

>>> docker buildx build --platform=linux/amd64 -t my_app .
[+] Building 23.2s (16/16) FINISHED
 => [internal] load .dockerignore                                                                                  0.0s
 => => transferring context: 2B                                                                                    0.0s
 => [internal] load build definition from Dockerfile                                                               0.0s
 => => transferring dockerfile: 634B                                                                               0.0s
 => [internal] load metadata for docker.io/library/python:3.9                                                      0.9s
 => [internal] load build context                                                                                  0.0s
 => => transferring context: 1.85kB                                                                                0.0s
 => [ 1/11] FROM docker.io/library/python:3.9@sha256:6ea9dafc96d7914c5c1d199f1f0195c4e05cf017b10666ca84cb7ce8e269  0.0s
 => CACHED [ 2/11] RUN pip install virtualenv && virtualenv venv -p python3                                        0.0s
 => CACHED [ 3/11] WORKDIR /app                                                                                    0.0s
 => CACHED [ 4/11] COPY requirements.txt ./                                                                        0.0s
 => CACHED [ 5/11] RUN pip install -r requirements.txt                                                             0.0s
 => CACHED [ 6/11] RUN git clone https://github.com/facebookresearch/detectron2.git                                0.0s
 => CACHED [ 7/11] RUN python -m pip install -e detectron2                                                         0.0s
 => CACHED [ 8/11] RUN apt-get update && apt-get install libgl1 -y                                                 0.0s
 => CACHED [ 9/11] RUN pip install -U nltk                                                                         0.0s
 => [10/11] RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]   22.1s
 => [11/11] COPY . /app                                                                                            0.0s
 => exporting to image                                                                                             0.1s
 => => exporting layers                                                                                            0.1s
 => => writing image sha256:83e2495addbc4cdf9b0885e1bb4c5b0fb0777177956eda56950bbf59c095d23b                       0.0s
 => => naming to docker.io/library/my_app

But I keep getting the following error whenever I try to run the image:

>>> docker run -p 8080:8080 my_app
[nltk_data] Error loading punkt: <urlopen error EOF occurred in
[nltk_data]     violation of protocol (_ssl.c:1129)>
[nltk_data] Error loading punkt: <urlopen error EOF occurred in
[nltk_data]     violation of protocol (_ssl.c:1129)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     EOF occurred in violation of protocol (_ssl.c:1129)>
Traceback (most recent call last):
  File "/app/app.py", line 16, in <module>
    index = VectorstoreIndexCreator().from_loaders(loaders)
  File "/venv/lib/python3.9/site-packages/langchain/indexes/vectorstore.py", line 72, in from_loaders
    docs.extend(loader.load())
  File "/venv/lib/python3.9/site-packages/langchain/document_loaders/unstructured.py", line 70, in load
    elements = self._get_elements()
  File "/venv/lib/python3.9/site-packages/langchain/document_loaders/pdf.py", line 37, in _get_elements
    return partition_pdf(filename=self.file_path, **self.unstructured_kwargs)
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 75, in partition_pdf
    return partition_pdf_or_image(
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 137, in partition_pdf_or_image
    return _partition_pdf_with_pdfminer(
  File "/venv/lib/python3.9/site-packages/unstructured/utils.py", line 43, in wrapper
    return func(*args, **kwargs)
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 248, in _partition_pdf_with_pdfminer
    elements = _process_pdfminer_pages(
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 293, in _process_pdfminer_pages
    _elements = partition_text(text=text)
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text.py", line 89, in partition_text
    elif is_possible_narrative_text(ctext):
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 76, in is_possible_narrative_text
    if exceeds_cap_ratio(text, threshold=cap_threshold):
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 273, in exceeds_cap_ratio
    if sentence_count(text, 3) > 1:
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 222, in sentence_count
    sentences = sent_tokenize(text)
  File "/venv/lib/python3.9/site-packages/unstructured/nlp/tokenize.py", line 38, in sent_tokenize
    return _sent_tokenize(text)
  File "/venv/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
  File "/venv/lib/python3.9/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
  File "/venv/lib/python3.9/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
  File "/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/root/nltk_data'
    - '/venv/nltk_data'
    - '/venv/share/nltk_data'
    - '/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
ozxc1zmp 1#

I disconnected my machine from Wi-Fi and connected it to my phone's hotspot, and it then ran without any errors because it was now able to download the NLTK packages. A very strange (and silly) problem. I would like to know whether there is a better solution, because nothing else worked for me.
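
One thing worth checking in the traceback above: at runtime NLTK searches /usr/local/share/nltk_data (among other directories) but not /usr/local/nltk_data, which is where the Dockerfile downloads punkt. The application therefore falls back to downloading the data when the container starts, and that is the download that fails on your network. A minimal sketch of a build-time-only variant (assuming the rest of the Dockerfile stays as in the question, and also fetching averaged_perceptron_tagger since the runtime log tries to load it):

# Download the NLTK data during the build into a directory that is already
# on NLTK's default runtime search path (see the "Searched in:" list above),
# so the running container never needs to reach the NLTK download server.
RUN pip install -U nltk
RUN python3 -c "import nltk; [nltk.download(p, download_dir='/usr/local/share/nltk_data') for p in ('punkt', 'averaged_perceptron_tagger')]"

# Alternatively, keep download_dir='/usr/local/nltk_data' and point NLTK at it:
# ENV NLTK_DATA=/usr/local/nltk_data

If the download still fails during the build with the same "EOF occurred in violation of protocol" error, that usually points to something on the network path (a proxy or firewall interfering with the TLS connection) rather than to missing certificates, which would be consistent with the hotspot workaround above.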
