xpath,用于在存在多个匹配项时提取最大值

2fjabf4q  于 2021-06-28  发布在  Hive
关注(0)|答案(1)|浏览(429)

我正在从xml创建一个配置单元外部表。我想提取时间戳最大的元素的值。如何在CREATETABLE语句中编写该语句?
我的xml:

  1. <Parent>
  2. <Child>
  3. <Purchase value ="100" id ="350" timestamp="2016-10-08T14:22:31.0000000">
  4. </Child>
  5. <Child>
  6. <Purchase value ="110" id ="350" timestamp="2016-10-08T14:22:32.0000000">
  7. </Child>
  8. <Child>
  9. <Purchase value ="105" id ="350" timestamp="2016-10-09T14:22:32.0000000">
  10. </Child>
  11. <Child>
  12. <Purchase value ="75" id ="350" timestamp="2016-10-10T14:22:32.0000000">
  13. </Child>
  14. </Parent>

下面的查询给了我所有的4个价格。但我只想要最新时间戳的价格?在 hive 里怎么办?

  1. CREATE EXTERNAL TABLE Recommended_StagingTable (
  2. ItemPrice INT
  3. )
  4. ROW FORMAT SERDE
  5. 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
  6. WITH SERDEPROPERTIES (
  7. "column.xpath.id" ="/Parent/Child/Purchase[@id='350']/@value"
  8. )
tvmytwxo

tvmytwxo1#

将purchase\u timestamp列添加到recommended\u stagingtable,然后使用sql row number分析函数查找最新的by timestamp:

  1. select ItemPrice
  2. from
  3. (
  4. select
  5. ItemPrice ,
  6. purchase_timestamp,
  7. row_number() over(order by purchase_timestamp desc ) rn
  8. --add partition by if necessary
  9. from Recommended_StagingTable
  10. )s
  11. where rn = 1; --the latest by timestamp

相关问题