Native Impala UDF (C++) randomly returns NULL for the same input from the same table across multiple calls in one query

k5hmc34c posted on 2021-06-26 in Impala

I have a native Impala UDF (C++) with two functions that are complementary to each other:

  1. String myUDF(BigInt)
  2. BigInt myUDFReverso(String)

`myUDF("myInput")` gives some output such that `myUDFReverso(myUDF("myInput"))` should give back `myInput`. When I run an Impala query on a Parquet table like this:

`select column1,myUDF(column1),length(myUDF(column1)),myUDFreverso(myUDF(column1)) from my_parquet_table order by column1 LIMIT 10;`

the output is randomly NULL. Say the output on the first run is:

+------------+----------------------+------------------------+-------------------------------------+
| column1 | myDB.myUDF(column1) | length(myUDF(column1)) | myDB.myUDFReverso(myUDF(column1)) |
+------------+----------------------+------------------------+-------------------------------------+
| 27011991 | 1.0.128.9 | 9 | 27011991 |
| 27011991 | 1.0.128.9 | 9 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | 14022013 |
| 14022013 | 1.0.131.239 | 11 | NULL |
+------------+----------------------+------------------------+-------------------------------------+

And, say, on the second run:

+------------+----------------------+------------------------+-------------------------------------+
| column1 | myDB.myUDF(column1) | length(myUDF(column1)) | myDB.myUDFReverso(myUDF(column1)) |
+------------+----------------------+------------------------+-------------------------------------+
| 27011991 | 1.0.128.9 | 9 | 27011991 |
| 27011991 | 1.0.128.9 | 9 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | 14022013 |
| 14022013 | 1.0.131.239 | 11 | 14022013 |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | 14022013 |
| 14022013 | 1.0.131.239 | 11 | NULL |
+------------+----------------------+------------------------+-------------------------------------+

And sometimes it gives correct values for all rows too.

I have tested this on Impala v1.2.4 and v2.1. What is the reason for this? Some memory issue?

Edit 1:

#include <impala_udf/udf.h>
#include <cstdint>
#include <cstring>

using namespace impala_udf;

BigIntVal myUDF(FunctionContext* context, const StringVal& myInput)
{
  if (myInput.is_null) return BigIntVal::null();

  unsigned int temp_op= 0;
  unsigned long result= 0;
  uint8_t *p;
  char c= '.';

  p=myInput.ptr;

  // Reads bytes until it finds a '\0' terminator.
  while (*p != '\0')
  {
    c= *p++;
    int digit= (int) (c - '0');

    if (digit >= 0 && digit <= 9)
    {
      // Accumulate the current dotted component; it must fit in one byte.
      if ((temp_op= temp_op * 10 + digit) > 255)
      {
        return BigIntVal::null();
      }
    }
    else if (c == '.')
    {
      // Component finished: shift it into the result.
      result= (result << 8) + (unsigned long) temp_op;
      temp_op= 0;
    }
    else
    {
      return BigIntVal::null();
    }
  }

  return BigIntVal((result << 8) + (unsigned long) temp_op);
}

In the .h file, the macro lowerbytify is defined as:

#define lowerbytify(T,A) { *(T)= (char)((A));\
                           *((T)+1)= (char)(((A) >> 8));\
                           *((T)+2)= (char)(((A) >> 16));\
                           *((T)+3)= (char)(((A) >> 24)); }
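To see which byte lands where, here is a small sketch that reuses the macro unchanged and reads the bytes back in the order `myUDFReverso` visits them (`temp[3]` down to `temp[0]`). `LowBytesAsText` is a hypothetical helper for illustration only. The value 16809993 is 1·2^24 + 0·2^16 + 128·2^8 + 9, i.e. the dotted form `1.0.128.9`.

```cpp
#include <cstdint>
#include <string>

// Macro copied from the question: writes the low 4 bytes of A into T[0..3],
// least significant byte first, independent of the host's byte order.
#define lowerbytify(T,A) { *(T)= (char)((A));\
                           *((T)+1)= (char)(((A) >> 8));\
                           *((T)+2)= (char)(((A) >> 16));\
                           *((T)+3)= (char)(((A) >> 24)); }

// Hypothetical helper: prints temp[3], temp[2], temp[1], temp[0], which is
// the order in which myUDFReverso consumes the bytes.
std::string LowBytesAsText(int64_t value) {
  unsigned char temp[8] = {0};
  lowerbytify(temp, value);
  std::string out;
  for (int i = 3; i >= 0; --i) {
    out += std::to_string(static_cast<unsigned int>(temp[i]));
    if (i > 0) out += '.';  // separator between components only
  }
  return out;
}
```

For example, `LowBytesAsText(16809993)` returns `"1.0.128.9"`: the most significant of the four low bytes becomes the first component.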

StringVal myUDFReverso(FunctionContext* context, const BigIntVal& origMyInput)
{
  if (origMyInput.is_null)
    return StringVal::null();

  int64_t myInput=origMyInput.val;
  char myInputArr[16];            // room for "255.255.255.255" plus '\0'
  unsigned int l=0;

  unsigned char temp[8];
  lowerbytify(temp, myInput);     // temp[0] = lowest byte, temp[3] = highest

  char calc[4];
  calc[3]= '.';

  // Walk the four low bytes from most to least significant.
  for (unsigned char *p= temp + 4; p-- > temp;)
  {
    unsigned int c= *p;
    unsigned int n1, n2;
    n1= c / 100;                  // hundreds digit
    c-= n1 * 100;
    n2= c / 10;                   // tens digit
    c-= n2 * 10;                  // units digit
    calc[0]= (char) n1 + '0';
    calc[1]= (char) n2 + '0';
    calc[2]= (char) c + '0';
    unsigned int length= (n1 ? 4 : (n2 ? 3 : 2));  // digits plus '.', no leading zeros
    unsigned int point= (p <= temp) ? 1 : 0;       // drop the '.' after the last byte

    char * begin = &calc[4-length];
    for(int step = length - point;step>0;step--,l++,begin++)
    {
      myInputArr[l]=*begin;
    }
  }

  myInputArr[l]='\0';

  StringVal result(context,l);
  memcpy(result.ptr, myInputArr,l);

  return result;
}
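Because this formatting logic does not depend on Impala types, one way to make it unit-testable is to pull it into a plain function. The sketch below is a hypothetical `ToDottedQuad` helper (not part of the question) that should produce the same text as `myUDFReverso` for the low 32 bits of the input. It keeps the same hundreds/tens/units split but appends to a `std::string` instead of copying out of the fixed `calc` buffer.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: formats the four low bytes of `value` as a dotted
// string, first component = bits 24..31, using the hundreds/tens/units
// split from the question and skipping leading zeros.
std::string ToDottedQuad(int64_t value) {
  std::string out;
  for (int shift = 24; shift >= 0; shift -= 8) {
    unsigned int c = static_cast<unsigned int>((value >> shift) & 0xFF);
    unsigned int n1 = c / 100;        // hundreds digit
    c -= n1 * 100;
    unsigned int n2 = c / 10;         // tens digit
    c -= n2 * 10;                     // units digit
    if (n1) out += static_cast<char>('0' + n1);
    if (n1 || n2) out += static_cast<char>('0' + n2);  // keeps the inner 0 of e.g. 105
    out += static_cast<char>('0' + c);
    if (shift > 0) out += '.';        // no dot after the last component
  }
  return out;
}
```

Returning `std::string` avoids the manual index bookkeeping (`l`, `begin`, `point`); inside the UDF the result would still have to be copied into a `StringVal` allocated with `StringVal(context, len)`, as the original does.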

Answer by pb3skfrl:

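I don't think you can assume the string is null-terminated. You should iterate over the characters using `StringVal::len` instead of `while (*p != '\0')`. Also, I'd suggest writing some unit tests with the UDF test framework from the Impala UDF samples on GitHub; see this example.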
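A minimal sketch of the fix the answer describes, assuming the parsing loop is moved into a hypothetical helper `ParseDottedQuad` that receives the pointer and the length separately (inside the UDF these would be `myInput.ptr` and `myInput.len`). The loop is bounded by the length, so bytes that happen to follow the value in memory are never read.

```cpp
#include <cstdint>

// Hypothetical helper holding the question's parsing loop. It reads exactly
// `len` bytes starting at `ptr` and never looks for a '\0' terminator.
// Returns false in every case where the UDF should return BigIntVal::null().
//
// Possible use inside the UDF (requires <impala_udf/udf.h>):
//   int64_t v;
//   if (myInput.is_null || !ParseDottedQuad(myInput.ptr, myInput.len, &v))
//     return BigIntVal::null();
//   return BigIntVal(v);
bool ParseDottedQuad(const uint8_t* ptr, int len, int64_t* out) {
  unsigned int octet = 0;
  uint64_t result = 0;
  for (const uint8_t* p = ptr; p < ptr + len; ++p) {
    char c = static_cast<char>(*p);
    int digit = c - '0';
    if (digit >= 0 && digit <= 9) {
      octet = octet * 10 + static_cast<unsigned int>(digit);
      if (octet > 255) return false;    // component does not fit in a byte
    } else if (c == '.') {
      result = (result << 8) + octet;   // shift in the finished component
      octet = 0;
    } else {
      return false;                     // any other byte is invalid input
    }
  }
  *out = static_cast<int64_t>((result << 8) + octet);
  return true;
}
```

As an illustration, take the buffer `"1.0.128.9XYZ"` as a stand-in for a 9-byte value followed by unrelated bytes. With `len = 9` it parses to 16809993; scanning past the value, as the `'\0'`-terminated loop does here, reaches `X` and fails. That is consistent with the random NULLs in the question, since what follows a value in memory can differ between rows and runs.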
