PySpark ImportError: No module named src.etl.spark

14ifxucb · asked 2021-05-29 · in Spark

I am working on a PySpark project; below is my project directory structure.

```
project_dir/
├── src/
│   ├── etl/
│   │   ├── __init__.py
│   │   ├── etl_1.py
│   │   └── spark.py
│   ├── config/
│   │   └── __init__.py
│   └── utils/
│       └── __init__.py
├── test/
│   └── test_etl_1.py
├── setup.py
├── README.md
└── requirements.txt
```

When I run the unit test below, I get this error:

```
$ python test_etl_1.py
Traceback (most recent call last):
  File "test_etl_1.py", line 1, in <module>
    from src.etl.spark import get_spark
ImportError: No module named src.etl.spark
```

This is my unit test file:

```python
from src.etl.spark import get_spark
from src.etl.addcol import with_status


class TestAppendCol(object):
    def test_with_status(self):
        source_data = [
            ("p", "w", "pw@sample.com"),
            ("j", "b", "jb@sample.com")
        ]
        source_df = get_spark().createDataFrame(
            source_data,
            ["first_name", "last_name", "email"]
        )
        actual_df = with_status(source_df)
        expected_data = [
            ("p", "w", "pw@sample.com", "added"),
            ("j", "b", "jb@sample.com", "added")
        ]
        expected_df = get_spark().createDataFrame(
            expected_data,
            ["first_name", "last_name", "email", "status"]
        )
        assert expected_df.collect() == actual_df.collect()
```

I need to run this file with pytest, but it fails because of the module error. Can you help me fix this error?


tcbh2hod · answer #1

Your source root is `src`, and the packages inside it are `etl`, `config`, and `utils`. So update the imports as follows:

```python
from etl.spark import get_spark
from etl.addcol import with_status
```

Make sure `PYTHONPATH` points to the `project_dir/src` directory, e.g. `export PYTHONPATH=/path/to/project_dir/src` before running pytest.
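The mechanism behind this advice: Python can only import `etl.spark` if the directory *containing* the `etl` package (here, `src/`) is on `sys.path`, which is exactly what `PYTHONPATH` feeds at interpreter startup. A self-contained sketch demonstrating this with a throwaway package (all paths and contents are illustrative):

```python
import os
import sys
import tempfile

# Build a throwaway project: <root>/src/etl/{__init__.py, spark.py}
root = tempfile.mkdtemp()
pkg = os.path.join(root, "src", "etl")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "spark.py"), "w") as f:
    f.write("def get_spark():\n    return 'spark session placeholder'\n")

# Equivalent of running with PYTHONPATH=<root>/src:
# the parent of the "etl" package is now on the module search path.
sys.path.insert(0, os.path.join(root, "src"))

from etl.spark import get_spark

print(get_spark())  # -> spark session placeholder
```

Note the import is `etl.spark`, not `src.etl.spark`: `src` is the path entry, not a package, so it never appears in the import statement.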
