I am experimenting with GeoMesa (backed by HBase), running BBOX queries over OSM node data. I have found that, for one particular area, GeoMesa does not return all of the nodes inside the bounding box.
For example, I issued 3 queries:
bbox(-122.0,47.4,-122.01,47.5) - the output has 5477 unique features
bbox(-122.0,47.5,-122.01,47.6) - the output has 9879 unique features
bbox(-122.0,47.4,-122.01,47.6) - the output has 13374 unique features
Looking at these bounding boxes, I would expect the features of query 1 + query 2 to equal those of query 3. In reality they do not match: the sad part is that query 1 and query 2 contain some elements that are not present in the query 3 results at all.
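To make the comparison concrete, here is a simplified sketch of how the three queries can be issued through the GeoTools API and the returned feature IDs diffed. The type name "osm-nodes", the geometry attribute "geom", and the connection parameters are placeholders, not my exact setup, and the boxes are rewritten in (minX, minY, maxX, maxY) order:

import java.io.Serializable;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.geotools.data.DataStore;
import org.geotools.data.DataStoreFinder;
import org.geotools.data.simple.SimpleFeatureIterator;
import org.geotools.filter.text.ecql.ECQL;
import org.opengis.filter.Filter;

public class BboxDiff {

    // Collect the IDs of all features matching an ECQL bbox filter.
    static Set<String> idsFor(DataStore ds, String ecql) throws Exception {
        Filter filter = ECQL.toFilter(ecql);
        Set<String> ids = new HashSet<>();
        try (SimpleFeatureIterator it =
                 ds.getFeatureSource("osm-nodes").getFeatures(filter).features()) {
            while (it.hasNext()) {
                ids.add(it.next().getID());
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder connection parameters; the keys depend on the GeoMesa version.
        Map<String, Serializable> params = new HashMap<>();
        params.put("hbase.catalog", "osm");
        DataStore ds = DataStoreFinder.getDataStore(params);

        // The same three boxes as above.
        Set<String> q1 = idsFor(ds, "bbox(geom, -122.01, 47.4, -122.0, 47.5)");
        Set<String> q2 = idsFor(ds, "bbox(geom, -122.01, 47.5, -122.0, 47.6)");
        Set<String> q3 = idsFor(ds, "bbox(geom, -122.01, 47.4, -122.0, 47.6)");

        // Features returned by the two half boxes but missing from the full box.
        Set<String> missing = new HashSet<>(q1);
        missing.addAll(q2);
        missing.removeAll(q3);

        System.out.println("q1=" + q1.size() + " q2=" + q2.size() + " q3=" + q3.size());
        System.out.println("in q1 or q2 but not in q3: " + missing.size());
        ds.dispose();
    }
}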
Below is the result plotted in Kepler. Can anyone help us understand what is going on here, and how to track down the root cause?
[image: query results plotted in Kepler]
I am seeing the following exception:
19/09/27 14:57:34 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=38583 ms ago, cancelled=false, msg=java.io.FileNotFoundException: File not present on S3
at com.amazon.ws.emr.hadoop.fs.s3.S3FSInputStream.read(S3FSInputStream.java:133)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:738)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1493)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1770)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1596)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:454)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:651)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:601)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:302)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:201)
at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:391)
at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:224)
at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2208)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6112)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:6086)
at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2841)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2821)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2803)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2797)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2697)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3012)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36613)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2380)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
2 Answers
1tu0hz3e #1
Edit: given the additional information about the S3 exception, this suggestion no longer applies.
I would try disabling "loose bounding boxes", as described here. If that does not resolve the discrepancy, please file a bug report in the GeoMesa JIRA, ideally with steps to reproduce.
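For reference, loose bounding boxes can typically be turned off through a data store parameter when connecting. A minimal sketch, with a placeholder catalog name and parameter keys that may differ between GeoMesa versions:

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import org.geotools.data.DataStore;
import org.geotools.data.DataStoreFinder;

public class ExactBboxConnection {
    public static void main(String[] args) throws Exception {
        Map<String, Serializable> params = new HashMap<>();
        params.put("hbase.catalog", "osm"); // placeholder catalog table name
        // Ask for exact post-filtering instead of loose (index-resolution) bounding boxes.
        params.put("geomesa.query.loose-bounding-box", "false");

        DataStore ds = DataStoreFinder.getDataStore(params);
        // ... re-run the bbox queries against this store and compare the counts ...
        ds.dispose();
    }
}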
Thanks,
r3i60tvu #2
This looks like an S3 consistency issue. Try running:
emrfs sync -m <your DynamoDB catalog table> s3://<your bucket>/<your hbase root dir>
Then re-run the query. It is common for S3 and the DynamoDB table used to manage HBase's S3 consistency model to get out of sync. Running this sync command as a cron job helps avoid the problem, or resolves it automatically when it does occur.