我正在尝试将数据从mysql表摄取到hdfs。但它给了我以下的错误
IST ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task [demo_user_1582873318919_0] 504 - Processing record incurs an unexpected exception:
java.lang.RuntimeException: Unable to convert field:derivedwatermarkcolumn for value:"abc" for record:
{"id":"1","name":"abc","password":"abc","derivedwatermarkcolumn":"abc"}
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:647)
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:81)
at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:50)
at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecordImpl(InstrumentedConverterDecorator.java:74)
at org.apache.gobblin.instrumented.converter.InstrumentedConverterBase.convertRecord(InstrumentedConverterBase.java:125)
at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecord(InstrumentedConverterDecorator.java:68)
at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator$ChainedConverterIterator.<init>(MultiConverter.java:174)
at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator.<init>(MultiConverter.java:130)
at org.apache.gobblin.runtime.MultiConverter$1.iterator(MultiConverter.java:95)
at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:499)
at org.apache.gobblin.runtime.Task.run(Task.java:362)
at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to parse the date
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$DateConverter.convertField(JsonElementConversionFactory.java:450)
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:639)
... 22 more
IST ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task [demo_user_1582893709536_0] 567 - Task task_GobblinMySql_1582893709536_0 failed
java.lang.RuntimeException: java.lang.RuntimeException: Failed to parse the date
at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:505)
at org.apache.gobblin.runtime.Task.run(Task.java:362)
at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to parse the date
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$DateConverter.convertField(JsonElementConversionFactory.java:450)
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:639)
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:81)
at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:50)
at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecordImpl(InstrumentedConverterDecorator.java:74)
at org.apache.gobblin.instrumented.converter.InstrumentedConverterBase.convertRecord(InstrumentedConverterBase.java:125)
at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecord(InstrumentedConverterDecorator.java:68)
at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator$ChainedConverterIterator.<init>(MultiConverter.java:174)
at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator.<init>(MultiConverter.java:130)
at org.apache.gobblin.runtime.MultiConverter$1.iterator(MultiConverter.java:95)
at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:499)
... 12 more
下面是记录模式
IST INFO [JobScheduler-0] org.apache.gobblin.source.jdbc.JdbcExtractor [demo_user_1582893709536_0] 361 - Schema:[
{"columnName":"id","dataType":{"type":"int"},"isWaterMark":false,"primaryKey":1,"length":0,"precision":10,"scale":0,"isNullabl
e":false,"format":"","comment":"","isUnique":false},
{"columnName":"name","dataType":"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNulla
ble":true,"format":"","comment":"","isUnique":false},
{"columnName":"password","dataType":{"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNulla
ble":true,"format":"","comment":"","isUnique":false},
{"columnName":"derivedwatermarkcolumn","dataType":{"type":"timestamp"},"isWaterMark":true,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNul
lable":false,"comment":"Default watermark column","isUnique":false}]
水印派生的水印列的数据类型是timestamp,但在记录中是字符串“”。
作业和属性文件如下所示。
mysql.pull文件
# Job properties
job.name=GobblinMySql
job.group=MySql
job.description=Data pull from MySql
job.lock.enabled=False
# Extract properties
extract.namespace=demo
extract.table.type=snapshot_only
extract.table.name=user
extract.delta.fields=name,password
extract.primary.key.fields=id
# Property to consider the extract as full dump
extract.is.full=true
# Source properties
source.querybased.schema=demo
source.entity=user
source.querybased.extract.type=snapshot
mysql.properties属性
# Source properties - source class to extract data from Mysql Source
source.class=org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource
# Source properties
source.max.number.of.partitions=1
source.querybased.partition.interval=1
source.querybased.is.compression=false
source.querybased.watermark.type=timestamp
# Source connection properties
source.conn.driver=com.mysql.jdbc.Driver
source.conn.username=root
source.conn.password=root
source.conn.host=localhost
source.conn.port=3306
source.conn.timeout=1500
# Converter properties - Record from mysql source will be processed by the below series of converters
converter.classes=org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter
# date columns format
converter.avro.timestamp.format=YYYY-MM-DD HH:MM:SS
converter.avro.date.format=yyyy-MM-dd
converter.avro.time.format=HH:mm:ss
# Qualitychecker properties
qualitychecker.task.policies=org.apache.gobblin.policies.count.RowCountPolicy,org.apache.gobblin.policies.schema.SchemaCompatibilityPolicy
qualitychecker.task.policy.types=OPTIONAL,OPTIONAL
# Publisher properties
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher
是什么导致配置文件中出现此错误?如果有人知道,请帮忙。
1条答案
按热度按时间xbp102n01#
水印列的名称似乎来自extract.delta.fields属性。在您的示例中,它被设置为“name,password”,因此名称被视为水印。尝试将其设置为“derivedwatermarkcolumn”。
我是如何发现的:我查看了mysqlsource类的代码,找到了提到水印的地方,然后使用intellij的inspector找出了数据的来源。您可以通过上下文菜单->分析->分析数据流来获取它。