In this project I am trying to analyze some time series with the pycaret package, with the help of scikit-learn. Specifically, I imported some modules as follows:
from pycaret.regression import (setup, compare_models, predict_model, plot_model, finalize_model, load_model)

# Initialize the training environment
s = setup(
    data=train,
    target=target_var,
    ignore_features=['Series'],
    numeric_features=involved_numerics,
    categorical_features=categorics,
    silent=True,
    log_experiment=True,
)

# Train and compare the available models, keeping the best one by MAE
best_model = compare_models(sort='MAE')

# Save each requested plot for the best model (ids and names are defined elsewhere)
for id, name in zip(ids, names):
    plot_model(best_model, plot=id, scale=3, save=True)
.
.
.
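I can run the code successfully for some of the plots in the list of available plots mentioned in the documentation, but not for all of them. For some specific plots (such as Recursive Feat. Selection), I get an error message: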
Traceback (most recent call last):
  File "c:/Users/username/Desktop/project/project.py", line 55, in <module>
    main()
  File "c:/Users/username/Desktop/project/project.py", line 48, in main
    ml_modelling(data, train, test)
  File "c:\Users\username\Desktop\project\utilities.py", line 1070, in ml_modelling
    plot_model(best_model, plot=id, scale=3, save=True)
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\regression.py", line 1601, in plot_model
    return pycaret.internal.tabular.plot_model(
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\tabular.py", line 7712, in plot_model
    ret = locals()[plot]()
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\tabular.py", line 6293, in residuals_interactive
    resplots.write_html(plot_filename)
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\plots\residual_plots.py", line 673, in write_html
    f.write(html)
  File "C:\Users\username\anaconda3\envs\py38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25c4' in position 276445: character maps to <undefined>
The train data looks like this:
   Series  x  y  z     ID  var1  var2  var3  var4  var5  var6
0       1  2  1  3   True    -3    -4     6     7     4     6
1       2  2  1  7  False    22     0     3     5     2     8
2       3  2  1  0   True     3    -6     3     5     4     4
3       4  2  1  4  False    27    -4     8     3    -3     2
.
.
.
I am using VSCode on a Windows 10 machine to run my Python tool. Below is the list of all packages installed in the conda environment:
name: py38
channels:
  - conda-forge
  - defaults
dependencies:
  - bzip2=1.0.8=h8ffe710_4
  - ca-certificates=2022.12.7=h5b45459_0
  - et_xmlfile=1.1.0=pyhd8ed1ab_0
  - libffi=3.4.2=h8ffe710_5
  - libsqlite=3.40.0=hcfcfb64_0
  - libzlib=1.2.13=hcfcfb64_4
  - openpyxl=3.0.10=py38h91455d4_2
  - openssl=3.0.7=hcfcfb64_2
  - pip=22.3.1=pyhd8ed1ab_0
  - python=3.8.15=h4de0772_1_cpython
  - python_abi=3.8=3_cp38
  - setuptools=66.1.1=pyhd8ed1ab_0
  - tk=8.6.12=h8ffe710_0
  - ucrt=10.0.22621.0=h57928b3_0
  - vc=14.3=hb6edc58_10
  - vs2015_runtime=14.34.31931=h4c5c07a_10
  - wheel=0.38.4=pyhd8ed1ab_0
  - xz=5.2.6=h8d14728_0
  - pip:
    - alembic==1.9.2
    - asttokens==2.2.1
    - attrs==22.2.0
    - backcall==0.2.0
    - blis==0.7.9
    - boruta==0.3
    - catalogue==1.0.2
    - certifi==2022.12.7
    - charset-normalizer==3.0.1
    - click==8.1.3
    - cloudpickle==2.2.1
    - colorama==0.4.6
    - colorlover==0.3.0
    - comm==0.1.2
    - contourpy==1.0.7
    - cufflinks==0.17.3
    - cycler==0.11.0
    - cymem==2.0.7
    - cython==0.29.14
    - databricks-cli==0.17.4
    - debugpy==1.6.6
    - decorator==5.1.1
    - docker==6.0.1
    - entrypoints==0.4
    - executing==1.2.0
    - flask==2.2.2
    - fonttools==4.38.0
    - funcy==1.18
    - future==0.18.3
    - gensim==3.8.3
    - gitdb==4.0.10
    - gitpython==3.1.30
    - greenlet==2.0.2
    - htmlmin==0.1.12
    - idna==3.4
    - imagehash==4.3.1
    - imbalanced-learn==0.7.0
    - importlib-metadata==5.2.0
    - importlib-resources==5.10.2
    - ipykernel==6.20.2
    - ipython==8.9.0
    - ipywidgets==8.0.4
    - itsdangerous==2.1.2
    - jedi==0.18.2
    - jinja2==3.1.2
    - joblib==1.2.0
    - jupyter-client==8.0.1
    - jupyter-core==5.1.5
    - jupyterlab-widgets==3.0.5
    - kiwisolver==1.4.4
    - kmodes==0.12.2
    - lightgbm==3.3.5
    - llvmlite==0.37.0
    - mako==1.2.4
    - markdown==3.4.1
    - markupsafe==2.1.2
    - matplotlib==3.6.3
    - matplotlib-inline==0.1.6
    - mlflow==2.1.1
    - mlxtend==0.19.0
    - multimethod==1.9.1
    - murmurhash==1.0.9
    - nest-asyncio==1.5.6
    - networkx==3.0
    - nltk==3.8.1
    - numba==0.54.1
    - numexpr==2.8.4
    - numpy==1.20.3
    - oauthlib==3.2.2
    - packaging==22.0
    - pandas==1.5.3
    - pandas-profiling==3.6.3
    - parso==0.8.3
    - patsy==0.5.3
    - phik==0.12.3
    - pickleshare==0.7.5
    - pillow==9.4.0
    - plac==1.1.3
    - platformdirs==2.6.2
    - plotly==5.13.0
    - preshed==3.0.8
    - prompt-toolkit==3.0.36
    - protobuf==4.21.12
    - psutil==5.9.4
    - pure-eval==0.2.2
    - pyarrow==10.0.1
    - pycaret==2.3.10
    - pydantic==1.10.4
    - pygments==2.14.0
    - pyjwt==2.6.0
    - pyldavis==3.3.1
    - pynndescent==0.5.8
    - pyod==1.0.7
    - pyparsing==3.0.9
    - python-dateutil==2.8.2
    - pytz==2022.7.1
    - pywavelets==1.4.1
    - pywin32==305
    - pyyaml==5.4.1
    - pyzmq==25.0.0
    - querystring-parser==1.2.4
    - regex==2022.10.31
    - requests==2.28.2
    - scikit-learn==0.23.2
    - scikit-plot==0.3.7
    - scipy==1.5.4
    - seaborn==0.12.2
    - shap==0.41.0
    - six==1.16.0
    - sklearn==0.0.post1
    - slicer==0.0.7
    - smart-open==6.3.0
    - smmap==5.0.0
    - spacy==2.3.9
    - sqlalchemy==1.4.46
    - sqlparse==0.4.3
    - srsly==1.0.6
    - stack-data==0.6.2
    - statsmodels==0.13.5
    - tabulate==0.9.0
    - tangled-up-in-unicode==0.2.0
    - tenacity==8.1.0
    - textblob==0.17.1
    - thinc==7.4.6
    - threadpoolctl==3.1.0
    - tornado==6.2
    - tqdm==4.64.1
    - traitlets==5.8.1
    - typeguard==2.13.3
    - typing-extensions==4.4.0
    - umap-learn==0.5.3
    - urllib3==1.26.14
    - visions==0.7.5
    - waitress==2.1.2
    - wasabi==0.10.1
    - wcwidth==0.2.6
    - websocket-client==1.5.0
    - werkzeug==2.2.2
    - widgetsnbextension==4.0.5
    - wordcloud==1.8.2.2
    - yellowbrick==1.2.1
    - zipp==3.12.0
prefix: C:\Users\username\anaconda3\envs\py38
1 answer
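This is probably an issue in the library: the HTML being written contains a Unicode character that the default Windows encoding cannot represent. The traceback shows '\u25c4' (a left-pointing triangle rather than a dash) failing under cp1252.
The failing write is in pycaret's own source (write_html in pycaret\internal\plots\residual_plots.py, per the traceback), where the output file is apparently opened with the platform default encoding.
As described in this stackoverflow question, this can be solved by specifying the encoding when opening the file.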
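To illustrate the point about specifying the encoding, here is a minimal sketch that is independent of pycaret. The helper names write_text and can_write and the sample HTML string are made up for this example; only the character '\u25c4' and the cp1252 codec come from the traceback.

```python
import os
import tempfile

# The character from the traceback: U+25C4, a left-pointing triangle.
html = "<span>\u25c4</span>"


def write_text(path, text, encoding):
    """Write text to path, opening the file with an explicit encoding."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def can_write(text, encoding):
    """Return True if text can be written to a temporary file with this encoding."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        write_text(path, text, encoding)
        return True
    except UnicodeEncodeError:
        return False
    finally:
        os.remove(path)


print("cp1252:", can_write(html, "cp1252"))
print("utf-8:", can_write(html, "utf-8"))
```

Writing with cp1252 hits the same UnicodeEncodeError as in the question, while UTF-8 can encode the character. On a Windows machine where cp1252 is the default, calling open() without an encoding argument behaves like the first case.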
However, since you cannot change the library's code, try running the command mentioned in this answer in the console before running the script.
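The exact command from the linked answer is not reproduced here. One setting that does exist is Python's UTF-8 mode (Python 3.7+, enabled with the environment variable PYTHONUTF8=1 or the -X utf8 option), which makes open() default to UTF-8. Whether that is the command the answer meant is an assumption. The sketch below only reports what the current interpreter would use, so you can check whether a setting took effect; default_text_encoding is a name made up for this example.

```python
import locale
import sys


def default_text_encoding():
    """Return the encoding open() falls back to when none is given."""
    return locale.getpreferredencoding(False)


# sys.flags.utf8_mode is 1 when Python was started in UTF-8 mode.
print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
print("Default encoding for open():", default_text_encoding())
```

If this prints cp1252 as the default, any library code that opens a text file without an encoding argument will fail on characters such as '\u25c4'.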