I am new to Spark. I created a DataFrame from a SQL query in PySpark, and I want to save it as a permanent table so I can reuse it in future jobs. I used the following code:
spark.sql("""
    select b.ENTITYID as ENTITYID,
           cm.BLDGID as BldgID,
           cm.LEASID as LeaseID,
           coalesce(l.SUITID, (select EmptyDefault from EmptyDefault)) as SuiteID,
           (select CurrDate from CurrDate) as TxnDate,
           cm.INCCAT as IncomeCat,
           '??' as SourceCode,
           (select CurrPeriod from CurrPeriod) as Period,
           coalesce(case when cm.DEPARTMENT = '@' then 'null' else cm.DEPARTMENT end, null) as Dept,
           'Lease' as ActualProjected,
           fnGetChargeInd(cm.EFFDATE, cm.FRQUENCY, cm.BEGMONTH, (select CurrPeriod from CurrPeriod)) * coalesce(cm.AMOUNT, 0) as ChargeAmt,
           0 as OpenAmt,
           null as Invoice,
           cm.CURRCODE as CurrencyCode,
           case when ('PERIOD.DATACLSD') is null then 'Open' else 'Closed' end as GLClosedStatus,
           'Unposted' as GLPostedStatus,
           'Unpaid' as PaidStatus,
           cm.FRQUENCY as Frequency,
           0 as RetroPD
    from CMRECC cm
    join BLDG b on cm.BLDGID = b.BLDGID
    join LEAS l on cm.BLDGID = l.BLDGID and cm.LEASID = l.LEASID
         and (l.VACATE is null or l.VACATE >= ('select CurrDate from CurrDate'))
         and (l.EXPIR >= ('select CurrDate from CurrDate') or l.EXPIR < ('select RunDate from RunDate'))
    left outer join PERIOD on b.ENTITYID = PERIOD.ENTITYID
         and ('select CurrPeriod from CurrPeriod') = PERIOD.PERIOD
    where ('select CurrDate from CurrDate') >= cm.EFFDATE
      and (select CurrDate from CurrDate) <= coalesce(
            cm.EFFDATE,
            cast(date_add((select min(cm2.EFFDATE) from CMRECC cm2
                           where cm2.BLDGID = cm.BLDGID and cm2.LEASID = cm.LEASID
                             and cm2.INCCAT = cm.INCCAT and 'cm2.EFFDATE' > 'cm.EFFDATE'), -1) as timestamp),
            case when l.EXPIR < (select RunDate from RunDate) then (select RunDate from RunDate) else l.EXPIR end)
""").write.saveAsTable('FactChargeTempTable')
But I got this error:
Job aborted due to stage failure: Task 11 in stage 73.0 failed 1 times, most recent failure: Lost task 11.0 in stage 73.0 (TID 2464, localhost): java.lang.RuntimeException: Unsupported data type NullType.
I don't know why this is happening or how I can fix it. Please guide me. Thank you, Kalyan.
3 Answers
ev7lccsx #1
I hit this error when running a Spark SQL application. You can cast the
NULL
to a string first, as shown below:
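A minimal sketch of that fix, assuming the bare null as Invoice projection in the question's query is one of the offending NullType columns (the table name here is only illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Casting the NULL literal to an explicit type gives the column a concrete
# schema (StringType here) that Parquet can serialize.
df = spark.sql("select cast(null as string) as Invoice, 0 as OpenAmt")
df.write.saveAsTable('CastDemoTable')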
uajslkp6 #2
@Denny Lee is correct. Someone opened a JIRA for your issue and got a similar response. One of the comments suggested the following:
michael: Yeah, Parquet doesn't have a concept of a null type. I'd probably suggest they cast the null to a concrete type, cast(null as int), if they really want to do this, but really you should just omit the column.
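A sketch of the "omit the column" alternative, again under the assumption that Invoice (from the question's null as Invoice) is the all-NULL column:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Hypothetical stand-in for the question's full query; Invoice is all NULL.
df = spark.sql("select 0 as OpenAmt, null as Invoice")
# Dropping the NullType column avoids the Parquet error entirely.
df.drop('Invoice').write.saveAsTable('FactChargeTempTable')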
dldeef67 #3
Your error
Unsupported data type NullType
indicates that one of the columns of the table you are trying to save is entirely NULL. To work around this, run a null check on the table's columns and make sure that no column is all NULL. Note that if a mostly-NULL column contains even one non-null row, Spark is usually able to infer a concrete data type for it (e.g. StringType, IntegerType, etc.) rather than NullType.
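A minimal sketch of such a null check, assuming df is the DataFrame produced by the question's spark.sql(...) call:

from pyspark.sql import functions as F

# F.count() counts only non-null values, so a per-column count of 0 means
# that column is entirely NULL and will be written as NullType.
non_null = df.select([F.count(F.col(c)).alias(c) for c in df.columns]).first()
all_null_cols = [c for c in df.columns if non_null[c] == 0]
print(all_null_cols)  # cast or drop these columns before saveAsTable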