I'm working on a PySpark project; below is my project directory structure.
project_dir/
    src/
        etl/
            __init__.py
            etl_1.py
            spark.py
        config/
            __init__.py
        utils/
            __init__.py
    test/
        test_etl_1.py
    setup.py
    README.md
    requirements.txt
When I run the unit test like this:
python test_etl_1.py
Traceback (most recent call last):
File "test_etl_1.py", line 1, in <module>
from src.etl.spark import get_spark
ImportError: No module named src.etl.spark
Here is my unit test file:
from src.etl.spark import get_spark
from src.etl.addcol import with_status


class TestAppendCol(object):
    def test_with_status(self):
        source_data = [
            ("p", "w", "pw@sample.com"),
            ("j", "b", "jb@sample.com")
        ]
        source_df = get_spark().createDataFrame(
            source_data,
            ["first_name", "last_name", "email"]
        )
        actual_df = with_status(source_df)
        expected_data = [
            ("p", "w", "pw@sample.com", "added"),
            ("j", "b", "jb@sample.com", "added")
        ]
        expected_df = get_spark().createDataFrame(
            expected_data,
            ["first_name", "last_name", "email", "status"]
        )
        assert(expected_df.collect() == actual_df.collect())
I need to run this file with pytest, but it fails with the module error above. Can you help me fix this error?
1 Answer
Your source root is src, and your modules are etl, config and utils, so update the imports accordingly (drop the src. prefix). Make sure PYTHONPATH points to the project_dir/src directory.
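A minimal runnable sketch of why this works (the src/etl layout and module names are taken from the question; the stub get_spark body is invented purely for illustration). It recreates the package in a temporary directory and shows that putting project_dir/src on sys.path is what makes from etl.spark import get_spark resolve:

```python
import os
import sys
import tempfile

# Recreate a minimal src/etl package (layout from the question) in a temp dir.
project_dir = tempfile.mkdtemp()
pkg = os.path.join(project_dir, "src", "etl")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "spark.py"), "w") as f:
    # Stub body; in the real project this would build a SparkSession.
    f.write("def get_spark():\n    return 'stub-spark-session'\n")

# Equivalent to: export PYTHONPATH=/path/to/project_dir/src
sys.path.insert(0, os.path.join(project_dir, "src"))

# With src on the path, the import no longer needs the "src." prefix.
from etl.spark import get_spark
print(get_spark())  # -> stub-spark-session
```

With PYTHONPATH set this way, the test file's imports become from etl.spark import get_spark and from etl.addcol import with_status, and you can run python -m pytest test/test_etl_1.py from project_dir. Since the project already has a setup.py, another common fix is to install the package in editable mode (pip install -e .) so no environment variable is needed.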