This article collects code examples of the Java method org.apache.spark.rdd.RDD.collect and shows how RDD.collect is used in practice. The examples were extracted from selected projects on platforms such as GitHub, Stack Overflow, and Maven, so they are fairly representative and should serve as useful references. Details of the RDD.collect method:

Package path: org.apache.spark.rdd
Class name: RDD
Method name: collect
Method description: none available
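Before the collected snippets, a minimal self-contained sketch of what collect does (the class name, app name, and local master below are made up for illustration): it ships every element of the RDD back to the driver, so it is only safe when the full result fits in driver memory.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CollectSketch {
  public static void main(String[] args) {
    // Local two-thread master, for illustration only
    SparkConf conf = new SparkConf().setAppName("collect-sketch").setMaster("local[2]");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

      // collect() materializes all partitions on the driver as a single List
      List<Integer> all = rdd.collect();
      System.out.println(all); // [1, 2, 3, 4, 5]
    }
  }
}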
Code example source: origin: org.apache.spark/spark-core

@Test
public void collectUnderlyingScalaRDD() {
  List<SomeCustomClass> data = new ArrayList<>();
  for (int i = 0; i < 100; i++) {
    data.add(new SomeCustomClass());
  }
  JavaRDD<SomeCustomClass> rdd = sc.parallelize(data);
  SomeCustomClass[] collected =
    (SomeCustomClass[]) rdd.rdd().retag(SomeCustomClass.class).collect();
  assertEquals(data.size(), collected.length);
}
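A note on the retag call above: a JavaRDD created through the Java API carries an Object ClassTag, and the Scala-side collect() uses that tag to allocate its result array. retag(SomeCustomClass.class) restores the concrete element type so the returned array really is a SomeCustomClass[] and the cast succeeds.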
Code example source: origin: org.apache.spark/spark-core_2.11 and org.apache.spark/spark-core_2.10

The same collectUnderlyingScalaRDD test appears verbatim in the Scala 2.11 and 2.10 artifacts; the code is identical to the spark-core example above, so it is not repeated here.
Code example source: origin: uber/uberscriptquery

// Read the file at the path held in `query` as a single (path, content) pair,
// then replace the path with the file's contents
Tuple2<String, String>[] tuples = (Tuple2<String, String>[]) sparkSession.sparkContext().wholeTextFiles(query, 1).collect();
query = tuples[0]._2();
System.out.println("Query: " + query);
Code example source: origin: com.stratio.deep/deep-cassandra

public static <W> void doCql3SaveToCassandra(RDD<W> rdd, ICassandraDeepJobConfig<W> writeConfig,
    Function1<W, Tuple2<Cells, Cells>> transformer) {
  if (!writeConfig.getIsWriteConfig()) {
    throw new IllegalArgumentException("Provided configuration object is not suitable for writing");
  }

  Tuple2<Map<String, ByteBuffer>, Map<String, ByteBuffer>> tuple = new Tuple2<>(null, null);
  RDD<Tuple2<Cells, Cells>> mappedRDD = rdd.map(transformer,
      ClassTag$.MODULE$.<Tuple2<Cells, Cells>>apply(tuple.getClass()));

  ((CassandraDeepJobConfig) writeConfig).createOutputTableIfNeeded(mappedRDD.first());

  final int pageSize = writeConfig.getBatchSize();
  int offset = 0;

  // collect() pulls the whole mapped RDD onto the driver; the rows are then
  // written to Cassandra in CQL batches of pageSize via subList paging
  List<Tuple2<Cells, Cells>> elements = Arrays.asList((Tuple2<Cells, Cells>[]) mappedRDD.collect());
  List<Tuple2<Cells, Cells>> split;
  do {
    split = elements.subList(pageSize * (offset++), Math.min(pageSize * offset, elements.size()));
    Batch batch = QueryBuilder.batch();
    for (Tuple2<Cells, Cells> t : split) {
      Tuple2<String[], Object[]> bindVars = Utils.prepareTuple4CqlDriver(t);
      Insert insert = QueryBuilder
          .insertInto(quote(writeConfig.getKeyspace()), quote(writeConfig.getTable()))
          .values(bindVars._1(), bindVars._2());
      batch.add(insert);
    }
    writeConfig.getSession().execute(batch);
  } while (!split.isEmpty() && split.size() == pageSize);
}
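The paging loop at the end is worth isolating. A standalone sketch of the same subList pattern on plain data (the numbers and class name are made up for illustration); it emits full pages until a short or empty final page ends the loop:

import java.util.Arrays;
import java.util.List;

public class SubListPaging {
  public static void main(String[] args) {
    List<Integer> elements = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
    int pageSize = 3;
    int offset = 0;
    List<Integer> split;
    do {
      // Each pass takes the next pageSize-sized window, clamped to the list end
      split = elements.subList(pageSize * (offset++), Math.min(pageSize * offset, elements.size()));
      System.out.println(split); // [1, 2, 3] then [4, 5, 6] then [7]
    } while (!split.isEmpty() && split.size() == pageSize);
  }
}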