Connecting with pyspark is very slow

qybjjes1 · posted 2021-05-29 in Hadoop

I am playing around with the following pyspark code:

  from pyspark.sql import SparkSession
  spark = SparkSession.builder.appName("Scoring System").getOrCreate()
  df = spark.read.csv('output.csv')
  df.show()

After I run python trial.py from the command line, roughly 5 to 10 minutes go by with no progress:

  To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
  2019-05-05 22:58:31 WARN Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
  2019-05-05 22:58:32 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
  [Stage 0:> (0 + 0) / 1]2019-05-05 23:00:08 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  2019-05-05 23:00:23 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  2019-05-05 23:00:38 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  2019-05-05 23:00:53 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  [Stage 0:> (0 + 0) / 1]2019-05-05 23:01:08 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  2019-05-05 23:01:23 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  2019-05-05 23:01:38 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

My hunch is that my worker nodes (?) are short on resources, or am I missing something?

5f0d552i (answer 1):

Try increasing the number of executors and the executor memory: pyspark --num-executors 5 --executor-memory 1g
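
If the script is launched with python trial.py rather than through the pyspark/spark-submit wrappers, the same settings can be applied when building the session. A minimal sketch, assuming a YARN cluster and the answer's suggested values (5 executors, 1g each), which you would tune to what the cluster can actually grant:

  from pyspark.sql import SparkSession

  # Programmatic equivalent of `pyspark --num-executors 5 --executor-memory 1g`.
  # The values below are the answer's suggestion, not verified against this cluster.
  spark = (
      SparkSession.builder
      .appName("Scoring System")
      .config("spark.executor.instances", "5")   # maps to --num-executors
      .config("spark.executor.memory", "1g")     # maps to --executor-memory
      .getOrCreate()
  )

  df = spark.read.csv('output.csv')
  df.show()

The repeated "Initial job has not accepted any resources" warning generally means YARN cannot grant the requested executors, so also confirm in the ResourceManager UI that NodeManagers are registered and have enough free memory and vcores, as the warning itself suggests.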
