Is there a way to check which Hive external tables were created more than 90 days ago and drop those tables together with their underlying HDFS data? Can this be done in a Unix shell script?
1 Answer

yhuiod9q1#
If the Hive table path is /path/your_hive_table_path/, the directory listing looks like this:
hadoop --cluster your-hadoop-cluster fs -ls /path/your_hive_table_path/
drwxrwxrwx+ - h_mifi supergroup 0 2019-01-24 10:33 /path/your_hive_table_path//mifidw_car_insurance_expire_month_data
drwxrwxrwx+ - h_mifi supergroup 0 2019-01-24 10:39 /path/your_hive_table_path//mifidw_car_owner
drwxr-xr-x+ - h_mifi supergroup 0 2019-05-30 03:01 /path/your_hive_table_path//push_credit_card_mine_result_new
drwxr-xr-x+ - h_mifi supergroup 0 2019-05-30 03:41 /path/your_hive_table_path//push_live_payment_bill_mine_result_new
We can get the latest update date of each table directory like this:
hadoop --cluster your-hadoop-cluster fs -ls /path/your_hive_table_path/ | awk -F'[ ]+' '{print $6}'
2019-01-24
2019-01-24
2019-05-30
2019-05-30
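Before the full script, here is a minimal sketch of the age check it is built on (assuming GNU date, as on most Linux systems; the variable names are illustrative only): convert the date string from column 6 into a Unix timestamp and divide its difference from now by 86400 seconds to get whole days.

#!/bin/bash
# One of the dates printed by `hadoop fs -ls` (column 6)
date_str="2019-01-24"

# Seconds since the epoch for today and for the table's last update (GNU date -d)
today_stamp=$(date +%s)
table_stamp=$(date -d "$date_str" +%s)

# Whole days between the two timestamps (86400 seconds per day)
days_diff=$(( (today_stamp - table_stamp) / 86400 ))
echo "$date_str is $days_diff days old"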
We need a loop that checks whether each table is more than 90 days old and, if so, performs the remove and drop operations. (Since fs -ls reports the directory's last modification time, the check is effectively "not updated in the last 90 days" rather than "created more than 90 days ago".) Below is the complete shell script; I have tested it and it works well. Hope it helps.
hadoop --cluster your-hadoop-cluster fs -ls /path/your_hive_table_path/ | grep '/path/your_hive_table_path/' | while read line
do
    # Get the last update date of the table directory (column 6 of the ls output)
    date_str=`echo $line | awk -F'[ ]+' '{print $6}'`
    # Get the HDFS path of the table directory (column 8 of the ls output)
    table_path=`echo $line | awk -F'[ ]+' '{print $8}'`
    # Get the table name from the path (the field index depends on the depth of your path)
    table_name=`echo $table_path | awk -F'/' '{print $7}'`
    today_date_stamp=`date +%s`
    table_date_stamp=`date -d $date_str +%s`
    stamp_diff=`expr $today_date_stamp - $table_date_stamp`
    # Convert the difference in seconds to days
    days_diff=`expr $stamp_diff / 86400`
    # If the table has not been updated for more than 90 days, remove the data and drop the table
    if [ $days_diff -gt 90 ]; then
        # Remove the HDFS directory (-r because the table path is a directory)
        hadoop --cluster your-hadoop-cluster fs -rm -r $table_path
        # Drop the Hive table
        hive -e "drop table $table_name"
    fi
done
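As a usage sketch (the file name, cron schedule, and log path below are only examples, not part of the original answer), the loop can be saved as a script and scheduled so the cleanup runs automatically:

# Save the loop above as clean_old_hive_tables.sh, then:
chmod +x clean_old_hive_tables.sh
./clean_old_hive_tables.sh

# Example crontab entry (crontab -e) to run it daily at 02:00
0 2 * * * /path/to/clean_old_hive_tables.sh >> /var/log/clean_old_hive_tables.log 2>&1

One design note: for Hive external tables, DROP TABLE only removes the metadata, which is why the script deletes the HDFS directory explicitly before dropping the table.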