How do I invalidate Impala metadata from Spark?

rkttyhzu  asked on 2021-07-13  in  Spark

I'm using pyspark to insert data into an initially empty table, and I'll need to automate this process. From pyspark, how can I invalidate the metadata or refresh the data so that it is read correctly in Impala?
Here is a sample of my code:

spark.sql("""
select
 gps_data_adj.trip_duration
 , gps_data_adj.geometry
 , trip_summary.TRIP_HAVERSINE_DISTANCE
 , trip_summary.TRIP_GPS_DURATION
 , gps_data_adj.HAVERSINE_DISTANCE
 , gps_data_adj.GPS_INTERVAL
 , gps_data_adj.HAVERSINE_DISTANCE/trip_summary.TRIP_HAVERSINE_DISTANCE AS HAVERSINE_DISTANCE_FRACTION
 , gps_data_adj.GPS_INTERVAL/trip_summary.TRIP_GPS_DURATION AS GPS_INTERVAL_FRACTION
 , (gps_data_adj.HAVERSINE_DISTANCE/trip_summary.TRIP_HAVERSINE_DISTANCE)*gps_data_adj.trip_distance_travelled AS HAVERSINE_DISTANCE_ADJ
 , (gps_data_adj.GPS_INTERVAL/trip_summary.TRIP_GPS_DURATION)*gps_data_adj.trip_duration AS GPS_INTERVAL_ADJ
    FROM
        gps_data_adj
    INNER JOIN
        (
            SELECT
                trip_id 
                , sum(COSINES_DISTANCE) as TRIP_COSINES_DISTANCE
                , sum(HAVERSINE_DISTANCE) as TRIP_HAVERSINE_DISTANCE
                , sum(GPS_INTERVAL) AS TRIP_GPS_DURATION
            FROM
                gps_data_adj
            GROUP BY
                trip_id
        ) trip_summary
on gps_data_adj.trip_id = trip_summary.trip_id
""").write.mode('append').insertInto('driving_data_TEST')
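Not an accepted answer, but one common approach is to issue an Impala `REFRESH` (or, for newly created tables, `INVALIDATE METADATA`) right after the Spark write finishes, e.g. over a DB-API connection such as the one the impyla package provides. A minimal sketch, where the host/port and the use of impyla are assumptions, not something from the question:

```python
# Sketch: after spark.sql(...).write.insertInto(...) completes, tell Impala
# to reload the table's file/block metadata so the new rows are visible.
def refresh_impala_table(cursor, table):
    """Run REFRESH <table> through any DB-API-style cursor."""
    cursor.execute(f"REFRESH {table}")

# Hypothetical wiring with impyla (host and port are placeholders):
#
#   from impala.dbapi import connect
#   conn = connect(host="impala-coordinator", port=21050)
#   refresh_impala_table(conn.cursor(), "driving_data_TEST")
```

`REFRESH` is the lighter operation and picks up new data files in an existing table; `INVALIDATE METADATA` discards and reloads all cached metadata and is only needed when the table itself was created or altered outside Impala.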

No answers yet!

No one has answered this question yet — be the first to answer!
