如何解决hawq投诉时的错误:“soldtoaddr2”列缺少数据

cnwbcb6i  于 2021-06-02  发布在  Hadoop
关注(0)|答案(2)|浏览(348)

我们有一个关键的hadoop hawq系统的小集群。我们必须读一张外表。
ie从ext\ U表格中选择*
但当我在hawq中发出查询时,我抱怨了以下错误:

Error Hawq complaints for : missing data for column "SoldToAddr2"

我们尝试了以下方法:
我们尝试在ext\u表定义的format子句中使用不同的特殊字符:

for ex:
CREATE READABLE EXTERNAL TABLE ext_table
(
 "ID" INTEGER,
 time timestamp,
 "Customer" char(7),
 "Name" varchar,
 "ShortName" char(10),
 "ExemptFinChg" char(1),
 "MaintHistory" char(1),
 "CustomerType" char(1),
 "MasterAccount" char(7),
 "StoreNumber" char(7),
 "PrtMasterAdd" char(1),
 "CreditStatus" char(1),
 "CreditLimit" decimal(14),
 "InvoiceCount" decimal(7),
 "Salesperson" char(3),
 "Salesperson1" char(3),
 "Salesperson2" char(3),
 "Salesperson3" char(3),
 "PriceCode" char(2),
 "CustomerClass" char(2),
 "Branch" char(2),
 "TermsCode" char(2),
 "InvDiscCode" char(2),
 "BalanceType" char(1),
 "Area" char(2),
 "LineDiscCode" char(2),
 "TaxStatus" char(1),
 "TaxExemptNumber" char(30),
 "SpecialInstrs" char(30),
 "PriceCategoryTable" char(52),
 "DateLastSale" date,
 "DateLastPay" date,
 "OutstOrdVal" decimal(16),
 "NumOutstOrd" decimal(6),
 "Telephone" char(20),
 "Contact" varchar,
 "AddTelephone" char(20),
 "Fax" char(20),
 "Telex" char(10),
 "TelephoneExtn" char(5),
 "Currency" text,
 "UserField1" char(10),
 "UserField2" decimal(16),
 "GstExemptFlag" char(1),
 "GstExemptNum" char(15),
 "GstLevel" char(1),
 "DetailMoveReqd" char(1),
 "InterfaceFlag" char(1),
 "ContractPrcReqd" char(1),
 "BuyingGroup1" char(2),
 "BuyingGroup2" char(2),
 "BuyingGroup3" char(2),
 "BuyingGroup4" char(2),
 "BuyingGroup5" char(2),
 "StatementReqd" char(1),
 "BackOrdReqd" char(1),
 "ShippingInstrs" char(30),
 "StateCode" char(3),
 "DateCustAdded" date,
 "StockInterchange" char(1),
 "MaintLastPrcPaid" char(1),
 "IbtCustomer" char(1),
 "SoDefaultDoc" char(1),
 "CounterSlsOnly" char(1),
 "PaymentStatus" char(1),
 "Nationality" char(3),
 "HighestBalance" decimal(16),
 "CustomerOnHold" char(1),
 "InvCommentCode" char(3),
 "EdiSenderCode" char(40),
 "RelOrdOsValue" decimal(16),
 "EdiFlag" char(1),
 "SoDefaultType" char(1),
 "Email" char(50),
 "ApplyOrdDisc" char(1),
 "ApplyLineDisc" char(1),
 "FaxInvoices" char(1),
 "FaxStatements" char(1),
 "HighInvDays" decimal(5),
 "HighInv" char(6),
 "DocFax" char(20),
 "DocFaxContact" char(40),
 "SoldToAddr1" char(150),
 "SoldToAddr2" char(80),
 "SoldToAddr3" char(40),
 "SoldToAddr4" char(40),
 "SoldToAddr5" char(40),
 "SoldPostalCode" char(9),
 "ShipToAddr1" char(40),
 "ShipToAddr2" char(40),
 "ShipToAddr3" char(40),
 "ShipToAddr4" char(40),
 "ShipToAddr5" char(40),
 "ShipPostalCode" char(9),
 "State" char(2),
 "CountyZip" char(5),
 "City" char(3),
 "State1" char(2),
 "CountyZip1" char(5),
 "City1" char(3),
 "DefaultOrdType" char(2),
 "PoNumberMandatory" char(1),
 "CreditCheckFlag" char(1),
 "CompanyTaxNumber" char(15),
 "DeliveryTerms" char(3),
 "TransactionNature" decimal(5),
 "DeliveryTermsC" char(3),
 "TransactionNatureC" decimal(5),
 "RouteCode" char(10),
 "FaxQuotes" char(1),
 "RouteDistance" decimal(6),
 "TpmCustomerFlag" char(1),
 "SalesWarehouse" text,
 "TpmPricingFlag" char(1),
 "ArStatementNo" char(2),
 "TpmCreditCheck" char(1),
 "WholeOrderShipFlag" char(1),
 "MinimumOrderValue" decimal(12),
 "MinimumOrderChgCod" char(6),
 "UkVatFlag" char(1),
 "UkCurrency" char(3),
 "TimeStamp" bytea
 )
LOCATION (
    'pxf://hostname/path/to/hdfs?profile=HdfsTextSimple')
FORMAT 'CSV' (delimiter '^' null 'null' quote '"')
ENCODING 'UTF8';

错误详细信息:
即在第20行遇到坏行

error detail :DETAIL:  External table tablename, line 20 of pxf://hostname/path/to/hdfs?profile=HdfsTextSimple: "23020^2015-12-02 11:14:26.52^0023482^Carlos iglesias               ^          ^N^Y^ ^       ^       ..."

解决关键hadoop hawq系统中错误的更好方法是什么?
如有任何帮助将不胜感激?

z8dt9xmd

z8dt9xmd1#

尝试使用一个文本列的pxf外部表,然后使用regex解析(调试)hawq中的外部数据。
此外,错误表可能会有所帮助。csv(“逗号”分隔向量)很棘手。在你的陈述中,你有空的'null'。如果有空字段,例如“”,pxf将抛出错误。
我的建议是:使用“”作为null,然后对“null”进行后期处理。前任。

CREATE READABLE EXTERNAL TABLE ext_table
(raw_text text)
LOCATION ('pxf://hostname/path/to/hdfs?profile=HdfsTextSimple')
FORMAT 'text' (delimiter '^' null '' quote '"')
LOGS ERRORS INTO pxf_errors SEGMENT REJECT LIMIT 1000 ROWS
ENCODING 'UTF8';
dhxwm5r4

dhxwm5r42#

由于您使用的是csv格式,并带有“作为字符串终止符,因此您的数据中很可能有一个额外的”错误地终止字段。您需要删除多余的引号或用“\”转义。
您可以定义外部表来记录错误,然后使用gp\u read\u error\u log()来读取这些错误。
有关记录错误的详细信息,请参阅《管理指南》:http://gpdb.docs.pivotal.io/4360/admin_guide/load/topics/g-define-an-external-table-with-single-row-error-isolation.html

相关问题