XPath to extract the maximum value when there are multiple matches

2fjabf4q, posted on 2021-06-28 in Hive

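I am creating a Hive external table from XML. I want to extract the value of the element with the greatest timestamp. How do I write that in the CREATE TABLE statement?
My XML: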

<Parent>
    <Child>
        <Purchase value="100" id="350" timestamp="2016-10-08T14:22:31.0000000"/>
    </Child>
    <Child>
        <Purchase value="110" id="350" timestamp="2016-10-08T14:22:32.0000000"/>
    </Child>
    <Child>
        <Purchase value="105" id="350" timestamp="2016-10-09T14:22:32.0000000"/>
    </Child>
    <Child>
        <Purchase value="75" id="350" timestamp="2016-10-10T14:22:32.0000000"/>
    </Child>
</Parent>

The query below gives me all 4 prices, but I only want the price with the latest timestamp. How can I do that in Hive?

CREATE EXTERNAL TABLE Recommended_StagingTable (

 ItemPrice INT
 )
 ROW FORMAT SERDE 
  'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
WITH SERDEPROPERTIES ( 
  "column.xpath.id" ="/Parent/Child/Purchase[@id='350']/@value"
  )

tvmytwxo  #1

Add a purchase_timestamp column to Recommended_StagingTable, then use the SQL row_number() analytic function to find the latest row by timestamp:

select ItemPrice 
  from 
      (
      select 
            ItemPrice ,
            purchase_timestamp,
            row_number() over(order by purchase_timestamp desc ) rn
                              --add partition by if necessary 
        from Recommended_StagingTable
      )s
 where rn = 1; --the latest by timestamp
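
For the query above to work, the table needs one row per Purchase with both the price and the timestamp. Below is a minimal sketch of how the table definition might look, not a tested definition. It assumes that the XmlSerDe is told to split records on the Child element through the xmlinput.start and xmlinput.end table properties, that the suffix of each column.xpath.* property names the column it fills, and that the timestamp is kept as a STRING (fixed-width ISO 8601 text sorts chronologically, so ORDER BY still works). The LOCATION path is hypothetical.

```sql
-- Sketch only: one row per <Child>, so each Purchase becomes its own row.
CREATE EXTERNAL TABLE Recommended_StagingTable (
  ItemPrice          INT,
  purchase_timestamp STRING   -- assumption: kept as text; ISO 8601 strings sort in time order
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  -- XPaths are relative to the record element chosen below (Child)
  "column.xpath.ItemPrice"          = "/Child/Purchase[@id='350']/@value",
  "column.xpath.purchase_timestamp" = "/Child/Purchase[@id='350']/@timestamp"
)
STORED AS
  INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/path/to/xml/files'   -- hypothetical path
TBLPROPERTIES (
  "xmlinput.start" = "<Child>",
  "xmlinput.end"   = "</Child>"
);
```

With the sample XML, this table would hold four rows, and the row_number() query would return the price from the row whose timestamp is 2016-10-10T14:22:32.0000000.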
