Storing data from a Pig relation in Cassandra

zzlelutf, posted 2021-06-21 in Pig

I have the following Cassandra table:

CREATE TABLE segments (
  b text,
  s int,
  c int,
  PRIMARY KEY (b)
)

and the following relation:

data: {b: chararray,s: long,c: long}

I'm loading it from a file that was written with PigStorage:

data = LOAD 'some_file' as (b:chararray,s:long,c:long);

I'm trying to store the Pig relation in the Cassandra table, without success. I tried:

to_cassandra = FOREACH (GROUP data ALL) 
  GENERATE 
    TOTUPLE(TOTUPLE('b',data.b)),
    TOTUPLE('s',data.s),
    TOTUPLE('c',data.c);
STORE to_cassandra INTO
  'cql://pv/segments?output_query=UPDATE%20pv.segments%20SET%20s%3D%3F%2Cc%3D%3F'
  USING CqlStorage();

where the decoded output query is:

UPDATE pv.segments SET s=?,c=?
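Encoding the output_query by hand is easy to get wrong. Python's standard urllib.parse can generate the encoded form and decode it again to check it. The following is a minimal sketch: cql_location is a hypothetical helper, and the cql://keyspace/table?output_query=... layout is copied from the STORE statement above rather than taken from CqlStorage documentation.

```python
from urllib.parse import quote, unquote


def cql_location(keyspace: str, table: str, output_query: str) -> str:
    """Hypothetical helper: build a cql:// STORE location like the one above.

    safe="" makes quote() percent-encode '/', '=', '?' and ',' as well, so the
    CQL statement cannot be confused with the URL's own query-string syntax.
    """
    return f"cql://{keyspace}/{table}?output_query={quote(output_query, safe='')}"


location = cql_location("pv", "segments", "UPDATE pv.segments SET s=?,c=?")
print(location)
# cql://pv/segments?output_query=UPDATE%20pv.segments%20SET%20s%3D%3F%2Cc%3D%3F

# Decoding the parameter recovers the original statement.
print(unquote(location.split("output_query=", 1)[1]))
# UPDATE pv.segments SET s=?,c=?
```

The generated string is identical to the one used in the question, so the URL encoding is not the source of the error.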

But I get the following error:

[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - 
  ERROR: java.lang.ClassCastException: 
    org.apache.pig.data.DefaultDataBag cannot be cast to org.apache.pig.data.DataByteArray

That's a bit cryptic. Which field is the offending one, and how can I fix this?
Edit
I ran illustrate to_cassandra; and got:

-----------------------------------------------------------------------------------------------------
| data     | b:chararray                                                  | s:long     | c:long     | 
-----------------------------------------------------------------------------------------------------
|          | 03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB | 1          | 1          | 
|          | 0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG | 1          | 1          | 
-----------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1-3     | group:chararray     | data:bag{:tuple(b:chararray,s:long,c:long)}                                                                                                  | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|         | all                 | {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB, 1, 1), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG, 1, 1)} | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| to_cassandra     | org.apache.pig.builtin.totuple_org.apache.pig.builtin.totuple_29_30:tuple(org.apache.pig.builtin.totuple_29:tuple(:chararray,:bag{:tuple(b:chararray)}))                         | org.apache.pig.builtin.totuple_31:tuple(:chararray,:bag{:tuple(s:long)})                     | org.apache.pig.builtin.totuple_32:tuple(:chararray,:bag{:tuple(c:long)})                     | 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                  | ((b, {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG)}))                                          | (s, {(1), (1)})                                                                              | (c, {(1), (1)})                                                                              | 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Answer by gmxoilav:

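The grouping is the problem. GROUP data ALL followed by data.b, data.s and data.c produces a bag of values for each field instead of the single value Cassandra expects, which is why the storer fails when it tries to cast a DefaultDataBag. Your output should end up looking like this: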

((b, 03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB)), (s, 1), (c, 1)

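... to match your schema. Since the output schema maps directly onto the input, it isn't clear what the grouping is for; you can drop the GROUP and run the FOREACH directly over data.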
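The bag-versus-scalar difference can be reproduced without Pig. Below is a toy Python model, a sketch rather than Pig's real data types: a bag is represented as a list of tuples, the helper names grouped_output, per_row_output and has_bag_value are hypothetical, and the per-row layout simply follows the target shape described in this answer.

```python
from typing import List, Tuple

# Toy model only, not Pig's real data model: a Pig bag is represented as a
# Python list of tuples, a Pig tuple as a Python tuple, chararray/long as str/int.
Row = Tuple[str, int, int]  # (b, s, c), mirroring the schema of `data`


def grouped_output(rows: List[Row]) -> list:
    """Mimics FOREACH (GROUP data ALL) GENERATE TOTUPLE(TOTUPLE('b', data.b)), ...

    GROUP ... ALL collapses everything into one group, and projecting
    data.b / data.s / data.c from it yields a bag of one-field tuples.
    """
    bag_b = [(b,) for b, _, _ in rows]
    bag_s = [(s,) for _, s, _ in rows]
    bag_c = [(c,) for _, _, c in rows]
    return [((("b", bag_b),), ("s", bag_s), ("c", bag_c))]


def per_row_output(rows: List[Row]) -> list:
    """Mimics the same GENERATE applied directly to data, with no GROUP:
    one output row per input row, each value a single scalar."""
    return [((("b", b),), ("s", s), ("c", c)) for b, s, c in rows]


def has_bag_value(output_row) -> bool:
    """True if any (name, value) pair in the row carries a bag (a list here)."""
    key_tuple, *value_tuples = output_row
    pairs = list(key_tuple) + value_tuples
    return any(isinstance(value, list) for _, value in pairs)


# Shortened versions of the keys from the illustrate output above.
rows = [("03Wat7NfMi8Q", 1, 1), ("0qadR3YpgVEw", 1, 1)]
print(grouped_output(rows))
print(per_row_output(rows))
```

The grouped version prints a single row whose values are lists, matching the bags in the illustrate output; the ungrouped version prints two rows that carry only scalar values.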
