I can't find what I need: there is plenty of code in Scala and Python, but not in Java. Here is what I have:
import org.apache.log4j.Logger;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class CassandraWriter {

    private transient Logger logger = Logger.getLogger(CassandraWriter.class);
    private Dataset<Row> hdfsDF;

    public CassandraWriter(Dataset<Row> dataFrame) {
        hdfsDF = dataFrame;
    }

    public void writeToCassandra(String tableName, String keyspace) {
        logger.info("Writing DataFrame to table: " + tableName);
        hdfsDF.write().format("org.apache.spark.sql.cassandra").mode("overwrite")
            .option("table", tableName)
            .option("keyspace", keyspace)
            .save();
        logger.info("Inserted DataFrame to Cassandra successfully");
    }
}
The error I get at runtime is:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark.apache.org/third-party-projects.html
Any ideas?
1 Answer
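You need to make sure that the Spark Cassandra Connector is included in the resulting jar that you submit.
One way to do that is to build a so-called fat jar and submit it. For example, here is a sample (and here is the full pom):
Alternatively, you can pass the Spark Cassandra Connector to spark-submit as a package via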
--packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
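Whichever of the two approaches is used, it can help to check before the write that the connector really ended up on the classpath. The following is only a minimal sketch: the class name `ConnectorCheck` and the method `isOnClasspath` are hypothetical, and it assumes that the connector exposes its data source as `org.apache.spark.sql.cassandra.DefaultSource` (the class Spark looks for when given the format name `org.apache.spark.sql.cassandra`).

```java
public class ConnectorCheck {

    // Assumption: entry-point class of the Spark Cassandra Connector data source.
    static final String CASSANDRA_SOURCE = "org.apache.spark.sql.cassandra.DefaultSource";

    // Returns true if the named class can be located, without initializing it.
    public static boolean isOnClasspath(String className) {
        // Prefer the context class loader; fall back to the loader of this class.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ConnectorCheck.class.getClassLoader();
        }
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(CASSANDRA_SOURCE)) {
            System.out.println("Cassandra connector found on the classpath");
        } else {
            System.err.println("Cassandra connector missing: build a fat jar or use --packages");
        }
    }
}
```

Calling `ConnectorCheck.isOnClasspath(ConnectorCheck.CASSANDRA_SOURCE)` at the start of `writeToCassandra` would turn the late `ClassNotFoundException` into an early, clearer message.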