How can I speed up Google Colab when running a randomized search in sklearn?

Asked by nnsrf1az on 2021-09-08

The code below takes 5.0 minutes to run on Google Colab, while on my own machine it takes about 3.0 minutes. In every other task I have tested (machine learning or otherwise), Colab beats my machine by 50-100%. I have tried installing different sklearn versions, running on a GPU runtime, and experimenting with different n_jobs values, but the runtime either got worse or stayed the same.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.model_selection import RandomizedSearchCV, KFold
from datetime import datetime

# Two parameter subspaces: gradient boosting and L1 logistic regression,
# each with RFE feature selection and standard scaling.
param_grid = [
    {'feature_selection': [RFE(estimator=GradientBoostingClassifier(random_state=0))],
     'feature_selection__n_features_to_select': [2],
     'scaling': [StandardScaler()],
     'classification': [GradientBoostingClassifier(random_state=0)],
     'classification__n_estimators': [100, 500],
     'classification__max_features': ['auto', 'log2'],
     'classification__max_depth': [2, 4],
     'classification__learning_rate': [0.01],
     'classification__loss': ['exponential'],
     'classification__min_samples_split': [2, 200],
     'classification__min_samples_leaf': [1, 20]},
    {'feature_selection': [RFE(estimator=LogisticRegression(random_state=0))],
     'feature_selection__n_features_to_select': [2],
     'scaling': [StandardScaler()],
     'classification': [LogisticRegression(random_state=0)],
     'classification__C': [0.1, 100, 1000],
     'classification__penalty': ['l1'],
     'classification__solver': ['liblinear']}
]

# Pipeline: scale -> recursive feature elimination -> classifier.
pipe = Pipeline(steps=[('scaling', StandardScaler()),
                       ('feature_selection', RFE(estimator=GradientBoostingClassifier())),
                       ('classification', GradientBoostingClassifier())])

# Randomized search over both subspaces, parallelized across all available cores.
grid_obj = RandomizedSearchCV(estimator=pipe, param_distributions=param_grid,
                              scoring='neg_brier_score', cv=KFold(shuffle=True), random_state=0,
                              return_train_score=True, n_jobs=-1, verbose=10)

X, y = load_breast_cancer(return_X_y=True)
grid_obj.fit(X, y)
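
The datetime import above is never actually used in the snippet; presumably it was meant for measuring the quoted wall-clock times. A minimal sketch of how the fit could be timed under that assumption (not part of the original question):

# Hypothetical timing wrapper around the fit call (assumed intent of the unused datetime import).
start = datetime.now()
grid_obj.fit(X, y)
print('RandomizedSearchCV took', datetime.now() - start)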

Google Colab output:

# [Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
# [Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 13.3s
# [Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 25.8s
# [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.0min
# [Parallel(n_jobs=-1)]: Done 14 tasks | elapsed: 1.4min
# [Parallel(n_jobs=-1)]: Done 21 tasks | elapsed: 2.2min
# [Parallel(n_jobs=-1)]: Done 28 tasks | elapsed: 2.8min
# [Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 3.8min
# [Parallel(n_jobs=-1)]: Done 46 tasks | elapsed: 4.6min
# [Parallel(n_jobs=-1)]: Done 50 out of 50 | elapsed: 5.0min finished
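
The first line of that log already hints at the likely bottleneck: joblib only found 2 concurrent workers on the Colab VM, so with n_jobs=-1 the search has less parallelism to spread the 50 fits across than on a local multi-core machine. It would also explain why a GPU runtime made no difference, since these sklearn estimators run entirely on the CPU. A quick way to compare the two environments is to print the core count each one exposes (a diagnostic sketch, not part of the original question):

import os
import joblib

# Logical CPUs the OS reports -- the ceiling for n_jobs=-1.
print('os.cpu_count():', os.cpu_count())
# Worker count joblib will actually use for n_jobs=-1.
print('joblib.cpu_count():', joblib.cpu_count())

If this prints 2 on Colab, it matches the "LokyBackend with 2 concurrent workers" line above, while the local machine presumably reports more cores.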

No answers yet.
