I ran into a problem when trying to load a table from Cassandra through Pig. The Cassandra version is 2.0.3.
Here are two rows from my dataset:
> The format is "user_name", "tweet", "user_id":
>chaaiinzz | RT @Luis_Cortes35: @3_chaaiinzz @jonaski720 @sarajanellxo @skylalopez man I love this Spanish class | 408845338091343872
>Jessicaokelley | Absolutely love the movie "The Mortal Instruments: City Of Bones!! | 408845337965907968
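To make the three-column layout explicit, here is a minimal Python sketch (not part of the original question) that splits the two sample rows on `|` in the same column order the `COPY` statement uses, `(user, tweet, user_id)`. It assumes the spaces around `|` are only display formatting and strips them; `SAMPLE_ROWS`, `COLUMNS` and `parse_row` are hypothetical names.

```python
# Sample rows copied from the question; '|' is the delimiter passed to COPY.
SAMPLE_ROWS = [
    "chaaiinzz | RT @Luis_Cortes35: @3_chaaiinzz @jonaski720 @sarajanellxo @skylalopez man I love this Spanish class | 408845338091343872",
    'Jessicaokelley | Absolutely love the movie "The Mortal Instruments: City Of Bones!! | 408845337965907968',
]

# Column order matches the COPY statement: (user, tweet, user_id).
COLUMNS = ("user", "tweet", "user_id")


def parse_row(line: str) -> dict:
    """Split one pipe-delimited row into a dict keyed by column name.

    Assumption: surrounding spaces are not part of the values, so each
    field is stripped.
    """
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}: {line!r}")
    return dict(zip(COLUMNS, fields))


if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        print(parse_row(row))
```

Each row yields exactly three fields, one of which is `user_id`.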
I created a keyspace and copied the dataset into the table `twitters`:
cqlsh:pxh130430> CREATE KEYSPACE cql3ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
cqlsh:pxh130430> use cql3ks;
cqlsh:cql3ks> CREATE TABLE twitters ( user_id varchar PRIMARY KEY, tweet varchar, user varchar);
cqlsh:cql3ks> COPY twitters (user, tweet, user_id) FROM '/tmp/nameT.csv' with delimiter = '|';
3625 rows imported in 2.142 seconds.
cqlsh:cql3ks> select count(*) from twitters;
count
-------
3620
(1 rows)
Loading the data in Pig:
grunt> moretestvalues= LOAD 'cql://cql3ks/twitters/' USING CqlStorage;
grunt> describe moretestvalues;
moretestvalues: {user_id: chararray,tweet: chararray,user: chararray,user_id: chararray}
grunt> dump moretestvalues;
2013-12-08 22:09:19,337 [main] ERROR org.apache.pig.tools.grunt.Grunt -ERROR 1108: Duplicate schema alias: user_id in "moretestvalues"
Details at logfile: /Users/pengyuhou/apache-cassandra/examples/pig/bin/pig_1386562141091.log
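Pig's ERROR 1108 means that two fields in one relation's schema share the same alias. The following Python sketch only models that check; it is not Pig's implementation. Given the aliases printed by `describe moretestvalues`, it reports `user_id` as duplicated, while the three columns defined by `CREATE TABLE twitters` contain no duplicates. `duplicate_aliases` is a hypothetical helper.

```python
from collections import Counter

# Field aliases exactly as printed by `describe moretestvalues` in the question.
REPORTED_SCHEMA = ["user_id", "tweet", "user", "user_id"]

# Columns actually defined by CREATE TABLE twitters.
TABLE_COLUMNS = ["user_id", "tweet", "user"]


def duplicate_aliases(aliases):
    """Return the aliases that occur more than once, in first-seen order."""
    counts = Counter(aliases)
    # dict.fromkeys de-duplicates while preserving insertion order.
    return [a for a in dict.fromkeys(aliases) if counts[a] > 1]


if __name__ == "__main__":
    print(duplicate_aliases(REPORTED_SCHEMA))  # prints ['user_id']
    print(duplicate_aliases(TABLE_COLUMNS))    # prints []
```

The reported schema contains the same three names as the table, with `user_id` listed twice.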
The table actually has only three columns, with a single "user_id" column. I don't know why Pig produces two "user_id" columns.
Does anyone have an idea? Thanks!
1 answer
This is a known bug; it should be fixed by CASSANDRA-6309.