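I have some crawled content (via Nutch) in an HBase table. I have written a MapReduce job that processes one table and writes its statistics to a new table. Below is the code snippet of the MR job.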
NutchJob job = NutchJob.getInstance(getConf(), "customJob");
// === Map ===
DataStore<String, WebPage> pageStore = StorageUtils.createWebStore(
    job.getConfiguration(), String.class, WebPage.class);
Query<String, WebPage> query = pageStore.newQuery();
// Note: pages without these fields are skipped
query.setFields(StorageUtils.toStringArray(FIELDS));
LOG.info("Table before mapper: " + job.getConfiguration().get(Nutch.CRAWL_ID_KEY));
GoraMapper.initMapperJob(job, pageStore, Text.class, WebPage.class,
    TableCopy.Mapper2.class, true);
job.setNumReduceTasks(1);
job.getConfiguration().set(Nutch.CRAWL_ID_KEY, "txt");
LOG.info("Table before reducer: " + job.getConfiguration().get(Nutch.CRAWL_ID_KEY));
DataStore<String, WebPage> hostStore = StorageUtils.createWebStore(
    job.getConfiguration(), String.class, WebPage.class);
GoraReducer.initReducerJob(job, hostStore, MarkerUpdateReducer2.class);
job.waitForCompletion(true);
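There are two tables here: one is given on the command line, and the second is hard-coded ("txt"). My goal is to create the reducer data store with a new table name so that the output is stored there. What actually happens is that the mapper processes table "txt", and since that table contains no data, the job does nothing. Below is the log snippet.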
2019-10-15 15:38:05,007 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-15 15:38:07,028 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'a_webpage'Assuming they are the same.
2019-10-15 15:38:07,647 INFO marker.TableCopy - Table before mapper: a
2019-10-15 15:38:07,738 INFO marker.TableCopy - Table before reducer: txt
2019-10-15 15:38:07,775 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:08,316 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,401 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,453 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,491 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
2019-10-15 15:38:09,604 INFO marker.TableCopy - map table: txt
2019-10-15 15:38:09,869 WARN store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'txt_webpage'Assuming they are the same.
I print the table name in the mapper's setup method, as shown by "map table: txt" in the log above. The actual input table is "a".
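To illustrate what I suspect is happening (this is only a guess, not confirmed Nutch or Gora behaviour): if the map side resolves its table name from the job configuration at task time rather than when `pageStore` is created, then overwriting `Nutch.CRAWL_ID_KEY` before the job is submitted would change the mapper's table as well. The sketch below uses a plain `Map` as a stand-in for the shared job configuration. `SharedConfSketch`, `resolveTable`, and the key string are hypothetical names for illustration, not part of Nutch or Gora.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedConfSketch {
    // Hypothetical stand-in for Nutch.CRAWL_ID_KEY (the key string is an assumption).
    static final String CRAWL_ID_KEY = "storage.crawl.id";

    // Hypothetical stand-in for a store that looks up its table name lazily,
    // i.e. every time it is (re)created from the configuration.
    static String resolveTable(Map<String, String> conf) {
        return conf.get(CRAWL_ID_KEY) + "_webpage";
    }

    public static void main(String[] args) {
        // One configuration object shared by the map and reduce sides of the job.
        Map<String, String> conf = new HashMap<>();
        conf.put(CRAWL_ID_KEY, "a");
        String beforeMapperInit = resolveTable(conf); // what the driver sees first

        // Overwritten for the reducer store, but in the same shared object.
        conf.put(CRAWL_ID_KEY, "txt");

        // Anything that re-reads the shared configuration later sees the new value.
        String mapSide = resolveTable(conf);
        String reduceSide = resolveTable(conf);

        // prints "a_webpage / txt_webpage / txt_webpage"
        System.out.println(beforeMapperInit + " / " + mapSide + " / " + reduceSide);
    }
}
```

If this is the cause, it would match the log: the driver prints "a" before the mapper is initialised, yet the map task reports "txt" once it runs.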