Testing for \u0000 (ASCII 00) without `chr()`

mhd8tkvw  posted on 2021-06-24  in Hive

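I have a dataset with some corrupted data: a string column has values that contain \u0000. I need to filter all of them out, and the only thing I control is the WHERE clause.
I tried WHERE field NOT LIKE concat('%', chr(00), '%'), but my Hive distribution (AWS EMR) does not recognize chr(). Is there anything else I can put in the WHERE clause to filter out values containing \u0000 without using chr()?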

wwtsj6pe 1#

Try the following:

WHERE field NOT LIKE '%\000%'
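
If the octal escape is not interpreted by your Hive version, a couple of alternatives that avoid chr() might be worth trying. This is only a sketch and has not been verified on EMR: my_table is a placeholder name, and it assumes unhex(), decode(), instr() and RLIKE are available in your build.

```sql
-- Option A: build the NUL character from its hex code, then search for it.
-- instr() returns 0 when the substring is not found.
SELECT *
FROM my_table  -- hypothetical table name
WHERE instr(field, decode(unhex('00'), 'UTF-8')) = 0;

-- Option B: regex match; '\\x00' should reach the Java regex engine as \x00.
SELECT *
FROM my_table  -- hypothetical table name
WHERE field NOT RLIKE '\\x00';
```

Both predicates also drop rows where field is NULL, the same as NOT LIKE does, so add an explicit OR field IS NULL if those rows should be kept.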

7hiiyaii 2#

You can try it out and see:

SELECT '\u0000' AS text;

+-------+--+
| text  |
+-------+--+
|      |
+-------+--+
-- NOT EMPTY
SELECT '\u0000abc' AS text;

+-------+--+
| text  |
+-------+--+
| abc  |
+-------+--+
-- NOT EMPTY

So:

SELECT text 
FROM(SELECT '\u0000abc' AS text) AS t  
WHERE text NOT LIKE('\u0000%');

+-------+--+
| text  |
+-------+--+
+-------+--+
-- EMPTY
SELECT text 
FROM(SELECT '\u0000abc' AS text) AS t  
WHERE text LIKE('\u0000%');

+-------+--+
| text  |
+-------+--+
| abc  |
+-------+--+
-- NOT EMPTY
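
The queries above only test for \u0000 at the start of the value. To filter it out anywhere in the string, as the question asks, the pattern presumably needs a leading % as well. A minimal sketch, assuming the '\u0000' literal is handled the same way as in the outputs above (the sample rows are made up for illustration):

```sql
-- Two illustrative rows: one clean, one with an embedded NUL character.
SELECT text
FROM (
  SELECT 'abc' AS text
  UNION ALL
  SELECT 'a\u0000bc' AS text
) AS t
-- Keep only rows that do not contain \u0000 at any position.
WHERE text NOT LIKE '%\u0000%';
```

If the literal works as shown earlier, only the clean row should remain; it is worth running this on your own cluster before relying on it.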
