python-3.x UnicodeEncodeError: 'charmap' codec can't encode character '\u25c4' in position 276445: character maps to <undefined>

Posted by 9bfwbjaz on 2023-02-01 in Python

In this project, I am trying to analyze some time series with the pycaret package, with the help of scikit-learn. Specifically, I import some modules as follows:

from pycaret.regression import (setup, compare_models, predict_model, plot_model, finalize_model, load_model)

# setting up the stage to initialize the training environment
s = setup(
    data=train,
    target=target_var,
    ignore_features=['Series'],
    numeric_features=involved_numerics,
    categorical_features=categorics,
    silent=True,
    log_experiment=True,
)

# Now, to train machine learning models, we need to compare models and find the best one
best_model = compare_models(sort='MAE')

# Making some plots
for id, name in zip(ids, names):
    plot_model(best_model, plot=id, scale=3, save=True)
.
.
.

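I can run the code successfully for some of the plots in the list of available plots given in the documentation, but not for all of them. For certain plots (such as Recursive Feat. Selection), I get this error message: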

Traceback (most recent call last):
  File "c:/Users/username/Desktop/project/project.py", line 55, in <module>
    main()
  File "c:/Users/username/Desktop/project/project.py", line 48, in main
    ml_modelling(data, train, test)
  File "c:\Users\username\Desktop\project\utilities.py", line 1070, in ml_modelling
    plot_model(best_model, plot=id, scale=3, save=True)
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\regression.py", line 1601, in plot_model
    return pycaret.internal.tabular.plot_model(
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\tabular.py", line 7712, in plot_model
    ret = locals()[plot]()
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\tabular.py", line 6293, in residuals_interactive
    resplots.write_html(plot_filename)
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\plots\residual_plots.py", line 673, in write_html
    f.write(html)
  File "C:\Users\username\anaconda3\envs\py38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]        
UnicodeEncodeError: 'charmap' codec can't encode character '\u25c4' in position 276445: character maps to <undefined>

Here is the train DataFrame:

   Series  x  y  z     ID  var1  var2  var3  var4  var5  var6
0       1  2  1  3   True    -3    -4     6     7     4     6
1       2  2  1  7  False    22     0     3     5     2     8
2       3  2  1  0   True     3    -6     3     5     4     4
3       4  2  1  4  False    27    -4     8     3    -3     2
.
.
.

I run my Python tool from VSCode on a Windows 10 machine. Here is the list of all packages installed in the conda environment:

name: py38
channels:
  - conda-forge
  - defaults
dependencies:
  - bzip2=1.0.8=h8ffe710_4
  - ca-certificates=2022.12.7=h5b45459_0
  - et_xmlfile=1.1.0=pyhd8ed1ab_0
  - libffi=3.4.2=h8ffe710_5
  - libsqlite=3.40.0=hcfcfb64_0
  - libzlib=1.2.13=hcfcfb64_4
  - openpyxl=3.0.10=py38h91455d4_2
  - openssl=3.0.7=hcfcfb64_2
  - pip=22.3.1=pyhd8ed1ab_0
  - python=3.8.15=h4de0772_1_cpython
  - python_abi=3.8=3_cp38
  - setuptools=66.1.1=pyhd8ed1ab_0
  - tk=8.6.12=h8ffe710_0
  - ucrt=10.0.22621.0=h57928b3_0
  - vc=14.3=hb6edc58_10
  - vs2015_runtime=14.34.31931=h4c5c07a_10
  - wheel=0.38.4=pyhd8ed1ab_0
  - xz=5.2.6=h8d14728_0
  - pip:
      - alembic==1.9.2
      - asttokens==2.2.1
      - attrs==22.2.0
      - backcall==0.2.0
      - blis==0.7.9
      - boruta==0.3
      - catalogue==1.0.2
      - certifi==2022.12.7
      - charset-normalizer==3.0.1
      - click==8.1.3
      - cloudpickle==2.2.1
      - colorama==0.4.6
      - colorlover==0.3.0
      - comm==0.1.2
      - contourpy==1.0.7
      - cufflinks==0.17.3
      - cycler==0.11.0
      - cymem==2.0.7
      - cython==0.29.14
      - databricks-cli==0.17.4
      - debugpy==1.6.6
      - decorator==5.1.1
      - docker==6.0.1
      - entrypoints==0.4
      - executing==1.2.0
      - flask==2.2.2
      - fonttools==4.38.0
      - funcy==1.18
      - future==0.18.3
      - gensim==3.8.3
      - gitdb==4.0.10
      - gitpython==3.1.30
      - greenlet==2.0.2
      - htmlmin==0.1.12
      - idna==3.4
      - imagehash==4.3.1
      - imbalanced-learn==0.7.0
      - importlib-metadata==5.2.0
      - importlib-resources==5.10.2
      - ipykernel==6.20.2
      - ipython==8.9.0
      - ipywidgets==8.0.4
      - itsdangerous==2.1.2
      - jedi==0.18.2
      - jinja2==3.1.2
      - joblib==1.2.0
      - jupyter-client==8.0.1
      - jupyter-core==5.1.5
      - jupyterlab-widgets==3.0.5
      - kiwisolver==1.4.4
      - kmodes==0.12.2
      - lightgbm==3.3.5
      - llvmlite==0.37.0
      - mako==1.2.4
      - markdown==3.4.1
      - markupsafe==2.1.2
      - matplotlib==3.6.3
      - matplotlib-inline==0.1.6
      - mlflow==2.1.1
      - mlxtend==0.19.0
      - multimethod==1.9.1
      - murmurhash==1.0.9
      - nest-asyncio==1.5.6
      - networkx==3.0
      - nltk==3.8.1
      - numba==0.54.1
      - numexpr==2.8.4
      - numpy==1.20.3
      - oauthlib==3.2.2
      - packaging==22.0
      - pandas==1.5.3
      - pandas-profiling==3.6.3
      - parso==0.8.3
      - patsy==0.5.3
      - phik==0.12.3
      - pickleshare==0.7.5
      - pillow==9.4.0
      - plac==1.1.3
      - platformdirs==2.6.2
      - plotly==5.13.0
      - preshed==3.0.8
      - prompt-toolkit==3.0.36
      - protobuf==4.21.12
      - psutil==5.9.4
      - pure-eval==0.2.2
      - pyarrow==10.0.1
      - pycaret==2.3.10
      - pydantic==1.10.4
      - pygments==2.14.0
      - pyjwt==2.6.0
      - pyldavis==3.3.1
      - pynndescent==0.5.8
      - pyod==1.0.7
      - pyparsing==3.0.9
      - python-dateutil==2.8.2
      - pytz==2022.7.1
      - pywavelets==1.4.1
      - pywin32==305
      - pyyaml==5.4.1
      - pyzmq==25.0.0
      - querystring-parser==1.2.4
      - regex==2022.10.31
      - requests==2.28.2
      - scikit-learn==0.23.2
      - scikit-plot==0.3.7
      - scipy==1.5.4
      - seaborn==0.12.2
      - shap==0.41.0
      - six==1.16.0
      - sklearn==0.0.post1
      - slicer==0.0.7
      - smart-open==6.3.0
      - smmap==5.0.0
      - spacy==2.3.9
      - sqlalchemy==1.4.46
      - sqlparse==0.4.3
      - srsly==1.0.6
      - stack-data==0.6.2
      - statsmodels==0.13.5
      - tabulate==0.9.0
      - tangled-up-in-unicode==0.2.0
      - tenacity==8.1.0
      - textblob==0.17.1
      - thinc==7.4.6
      - threadpoolctl==3.1.0
      - tornado==6.2
      - tqdm==4.64.1
      - traitlets==5.8.1
      - typeguard==2.13.3
      - typing-extensions==4.4.0
      - umap-learn==0.5.3
      - urllib3==1.26.14
      - visions==0.7.5
      - waitress==2.1.2
      - wasabi==0.10.1
      - wcwidth==0.2.6
      - websocket-client==1.5.0
      - werkzeug==2.2.2
      - widgetsnbextension==4.0.5
      - wordcloud==1.8.2.2
      - yellowbrick==1.2.1
      - zipp==3.12.0
prefix: C:\Users\username\anaconda3\envs\py38

0lvr5msh  #1

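This is probably an issue in the library: the HTML being written contains a Unicode character (here '\u25c4') that cp1252, the encoding shown in the traceback, cannot represent.
Here is the relevant pycaret source code: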

def write_html(self, plot_filename):
        """
        Write the current plots to a file in HTML format.
        Parameters
        ----------
        plot_filename: str
            name of the file
        """

        html = self.get_html()

        with open(plot_filename, "w") as f:
            f.write(html)
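
The open call above passes no encoding, so Python falls back to the platform's preferred encoding, which is cp1252 according to the traceback. The following is a minimal sketch of that failure, not pycaret's code: it uses an in-memory stream with encoding="cp1252" to simulate the Windows default on any platform, and the html string and the write_as helper are made up for illustration.

```python
import io
import unicodedata

# The character named in the traceback.
char = "\u25c4"
char_name = unicodedata.name(char)


def write_as(text, encoding):
    """Write text through a text-mode stream; return the error name, or None on success."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as exc:
        return type(exc).__name__
    return None


# Hypothetical stand-in for the string returned by get_html().
html = "<span>" + char + "</span>"

print(char_name)
print(write_as(html, "cp1252"))
print(write_as(html, "utf-8"))
```

The cp1252 write fails with the same exception type as in the traceback, while the UTF-8 write succeeds, because UTF-8 can encode every Unicode character.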

As described in this Stack Overflow question, the problem can be solved by specifying the encoding when opening the file:

with open(plot_filename, "w", encoding='utf-8') as f:
    f.write(html)
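
The same fix can be tried outside pycaret. This is a small sketch under stated assumptions: save_html, the file name, and the html string are hypothetical and only stand in for the library's write_html. The file is written and read back with an explicit encoding="utf-8", so the result no longer depends on the platform default.

```python
import os
import tempfile


def save_html(html, path):
    """Write html to path as UTF-8, independent of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


# Hypothetical HTML containing the character from the traceback.
html = "<span>\u25c4</span>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "plot.html")
    save_html(html, path)
    # Read the file back as text and as raw bytes to check what was stored.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
    with open(path, "rb") as f:
        raw = f.read()

print(round_trip == html)
```

Reading the file back with the same encoding returns the original string, and the raw bytes contain the three-byte UTF-8 sequence for U+25C4.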

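However, since you cannot change the library's code, try running the following commands in the console before running the script, as mentioned in this answer: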

chcp 65001
set PYTHONIOENCODING=utf-8
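
It is worth checking whether these commands change the encoding that open() actually uses. PYTHONIOENCODING is documented as overriding the encoding of stdin, stdout, and stderr, and chcp changes the console code page, so the default for files opened in text mode may stay at cp1252. The sketch below (encoding_report is a hypothetical helper name) prints the relevant settings from inside the same console session:

```python
import locale
import sys


def encoding_report():
    """Collect the encodings the running interpreter is currently using."""
    return {
        # Default used by open() in text mode when no encoding is passed.
        "open_default": locale.getpreferredencoding(False),
        # Encoding of standard output; this is what PYTHONIOENCODING overrides.
        "stdout": getattr(sys.stdout, "encoding", None),
        # 1 when UTF-8 mode is on (PYTHONUTF8=1 or -X utf8, Python 3.7+).
        "utf8_mode": sys.flags.utf8_mode,
    }


report = encoding_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If open_default still reports cp1252 after the commands above, Python's UTF-8 mode (set PYTHONUTF8=1 in the console, or python -X utf8, both available since Python 3.7) is the documented switch that makes open() default to UTF-8 without touching the library.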
