The error is:
AttributeError: 'function' object has no attribute '_get_object_id'
The relevant code in test_functions.py is:
import urllib.request as urllib
import os
import pandas as pd
import pyspark.sql.functions as psf
from pyspark.sql import SparkSession
from src.functions import save, write, query
import pytest

global spark
spark = SparkSession.builder \
    .master("local") \
    .appName("load_parquet") \
    .config("spark.jars", "/opt/spark/jars/postgresql-42.2.5.jar") \
    .getOrCreate()

@pytest.fixture(scope="session")
def cache_dir(tmp_path_factory):
    return tmp_path_factory.mktemp("files") / "test.parquet"

def test_save(url='https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet', filename=cache_dir):
    print(cache_dir)
    assert save(url, filename), "Error saving file"

def test_write(write_method='psycopg2'):
    df = spark.read.parquet(cache_dir)
    if 'filename' not in df.columns:
        df = df.withColumn('filename', psf.lit('20-03'))
.....................
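What am I doing wrong? I have spent a lot of time trying to get this to work.
I have gone through several pages of Google results and tried different approaches of my own, and none of them worked. I expect the assert in the first test to pass, and if the second test can read the file, it should pass as well.
The code ends like this: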
#%% Run tests
if __name__ == "__main__":
    print(cache_dir)
    test_save()
    test_write()
    test_query()
    test_percentile()
    print("Everything passed")
# %%
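A note on the first error: the name cache_dir at module level refers to the fixture function itself, so a default argument such as filename=cache_dir (or a direct use of cache_dir inside the test body) hands a function object to save and to Spark. pytest only substitutes the fixture's return value when the test lists the fixture as a parameter, and calling test_save() by hand from the __main__ block bypasses that mechanism entirely. Below is a minimal plain-Python sketch of the difference; the function names and the path are hypothetical stand-ins, and no pytest machinery is involved.

```python
from pathlib import Path


def cache_dir():
    # Stand-in for the fixture in the question; the path is hypothetical.
    return Path("/tmp/files") / "test.parquet"


def save_with_default(filename=cache_dir):
    # The default is bound to the function object, not to its return value.
    return isinstance(filename, Path)


def save_with_injected(cache_dir):
    # pytest would pass the fixture's return value through this parameter.
    return isinstance(cache_dir, Path)


print(save_with_default())              # False: filename is a function
print(save_with_injected(cache_dir()))  # True: a real path was passed in
```

Under that reading, a test written as def test_save(cache_dir): and run through pytest rather than called directly would receive the actual path.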
After Magus's reply, test_save now works; only test_write still has a problem:
def test_write(cache_dir):
    write_method = 'psycopg2'
    url = 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet'
    save(url, cache_dir)
    df = spark.read.parquet(cache_dir)
    if 'filename' not in df.columns:
        df = df.withColumn('filename', psf.lit('20-03'))
2023-05-03 13:42:13 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2023-05-03 13:42:13
2023-05-03 13:42:13 parameter = PosixPath('/tmp/pytest-of-root/pytest-0/files0/test.parquet')
2023-05-03 13:42:13 python_proxy_pool = <py4j.java_gateway.PythonProxyPool object at 0x7ff238c2b580>
2023-05-03 13:42:13
2023-05-03 13:42:13     def get_command_part(parameter, python_proxy_pool=None):
2023-05-03 13:42:13         """Converts a Python object into a string representation respecting the
2023-05-03 13:42:13         Py4J protocol.
2023-05-03 13:42:13
2023-05-03 13:42:13         For example, the integer `1` is converted to `u"i1"`
2023-05-03 13:42:13
2023-05-03 13:42:13         :param parameter: the object to convert
2023-05-03 13:42:13         :rtype: the string representing the command part
2023-05-03 13:42:13         """
2023-05-03 13:42:13         command_part = ""
2023-05-03 13:42:13
2023-05-03 13:42:13         if parameter is None:
2023-05-03 13:42:13             command_part = NULL_TYPE
2023-05-03 13:42:13         elif isinstance(parameter, bool):
2023-05-03 13:42:13             command_part = BOOLEAN_TYPE + smart_decode(parameter)
2023-05-03 13:42:13         elif isinstance(parameter, Decimal):
2023-05-03 13:42:13             command_part = DECIMAL_TYPE + smart_decode(parameter)
2023-05-03 13:42:13         elif isinstance(parameter, int) and parameter <= JAVA_MAX_INT\
2023-05-03 13:42:13                 and parameter >= JAVA_MIN_INT:
2023-05-03 13:42:13             command_part = INTEGER_TYPE + smart_decode(parameter)
2023-05-03 13:42:13         elif isinstance(parameter, long) or isinstance(parameter, int):
2023-05-03 13:42:13             command_part = LONG_TYPE + smart_decode(parameter)
2023-05-03 13:42:13         elif isinstance(parameter, float):
2023-05-03 13:42:13             command_part = DOUBLE_TYPE + encode_float(parameter)
2023-05-03 13:42:13         elif isbytearray(parameter):
2023-05-03 13:42:13             command_part = BYTES_TYPE + encode_bytearray(parameter)
2023-05-03 13:42:13         elif ispython3bytestr(parameter):
2023-05-03 13:42:13             command_part = BYTES_TYPE + encode_bytearray(parameter)
2023-05-03 13:42:13         elif isinstance(parameter, basestring):
2023-05-03 13:42:13             command_part = STRING_TYPE + escape_new_line(parameter)
2023-05-03 13:42:13         elif is_python_proxy(parameter):
2023-05-03 13:42:13             command_part = PYTHON_PROXY_TYPE + python_proxy_pool.put(parameter)
2023-05-03 13:42:13             for interface in parameter.Java.implements:
2023-05-03 13:42:13                 command_part += ";" + interface
2023-05-03 13:42:13         else:
2023-05-03 13:42:13 >           command_part = REFERENCE_TYPE + parameter._get_object_id()
2023-05-03 13:42:13 E           AttributeError: 'PosixPath' object has no attribute '_get_object_id'
2023-05-03 13:42:13
2023-05-03 13:42:13 /usr/local/lib/python3.10/dist-packages/py4j/protocol.py:298: AttributeError
2023-05-03 13:42:13 ----------------------------- Captured stdout call -----------------------------
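After some testing, it seems there is some problem in Spark's configuration when it uses the temporary folder. Switching to Pandas to read the file from that folder works.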
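Judging only from the traceback above, another explanation is possible: the value that reaches Py4J is a PosixPath, and get_command_part has branches for strings, numbers and bytes but not for pathlib objects, so it falls through to the _get_object_id call. Pandas accepts path objects directly, which would explain why it works there. Below is a minimal sketch of converting the path to a plain string before handing it to Spark; as_spark_path is a hypothetical helper name, and this is a guess from the log rather than a confirmed fix.

```python
import os
from pathlib import Path


def as_spark_path(path):
    # Hypothetical helper: turn a pathlib.Path (or a str) into a plain str,
    # which Py4J can encode through its string branch.
    return os.fspath(path)


# Path copied from the traceback in the question.
parquet_path = Path("/tmp/pytest-of-root/pytest-0/files0/test.parquet")
print(type(as_spark_path(parquet_path)).__name__)  # str
```

In the test this would read df = spark.read.parquet(as_spark_path(cache_dir)), or simply spark.read.parquet(str(cache_dir)). Whether the Spark temp-folder configuration plays any additional role cannot be determined from the output shown.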
1 Answer
Here is what you are doing wrong:
Use yield for the tmp_path fixture: