Using Spark on Hive through Eclipse

mwg9r5ms · posted 2021-06-02 in Hadoop

We connect to Hive from an Eclipse program through the JDBC API to access Hive tables. Here is the code:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import org.testng.annotations.Test;

public class FetchHiveData_test {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    @Test
    public void FetchHiveDataMethod() {
        try {
            System.out.println("Inside Hive Method");
            // Register the HiveServer2 JDBC driver.
            Class.forName(driverName);

            /*********** Hive ***************/
            // try-with-resources closes the connection, statement and
            // result set even if the query throws.
            try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://XXXXX:20000", "xxxxxx", "yyyyy");
                 Statement stmt = con.createStatement();
                 ResultSet hiveres = stmt.executeQuery("select count(*) from table")) {

                // Print the column headers.
                ResultSetMetaData rsmd = hiveres.getMetaData();
                int numCols = rsmd.getColumnCount();
                for (int j = 1; j <= numCols; j++) {
                    System.out.print(rsmd.getColumnName(j) + " ");
                }
                System.out.println();

                // Print one row per line.
                while (hiveres.next()) {
                    for (int i = 1; i <= numCols; i++) {
                        System.out.print(hiveres.getString(i) + " ");
                    }
                    System.out.println();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The approach below uses a Spark context, but I am not sure where it takes the server name and credentials from (a note on this follows the code). How should the program above be modified to use Spark? The goal is to make our queries run faster, since the JDBC API is somewhat slow.
Spark code:

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;
import org.testng.annotations.Test;

public class SparkTest {

    @SuppressWarnings("serial")
    @Test
    public void f() {
        // setMaster() expects a Spark master URL such as "local[*]" or
        // "spark://host:7077", not a HiveServer2 address.
        final SparkConf sparkConf = new SparkConf()
                .setMaster("xxxxx:20000")
                .setAppName("HiveConnector");
        final JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        // No server name or credentials are passed to the HiveContext here.
        JavaHiveContext hiveCtx = new JavaHiveContext(sparkContext);
        JavaSchemaRDD rdd = hiveCtx.sql("select count(*) from table");
        // count(*) comes back as a bigint, so read it as a long.
        JavaRDD<Long> keys = rdd.map(new Function<Row, Long>() {
            public Long call(Row row) { return row.getLong(0); }
        });
        List<Long> res = keys.collect();
        for (Long val : res) {
            System.out.println("val " + val);
        }
    }
}
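
Note on the credentials question: a JavaHiveContext does not take a server name or credentials in code at all. It connects to the Hive metastore, whose location Spark reads from a hive-site.xml found on the classpath (typically in $SPARK_HOME/conf). A minimal sketch with a placeholder metastore host; 9083 is the usual metastore Thrift port:

<!-- hive-site.xml, placed in $SPARK_HOME/conf or on the application classpath -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- placeholder host; point this at your Hive metastore service -->
    <value>thrift://XXXXX:9083</value>
  </property>
</configuration>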

gr8qqesn1#

Try running the Thrift JDBC/ODBC server (http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server); you can then access it over JDBC exactly as you would Hive.
Note: it may not support all of HiveQL.
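
Concretely, once the Thrift server is started (with ./sbin/start-thriftserver.sh from the Spark installation), the question's JDBC code works against it unchanged, because the Spark Thrift Server speaks the HiveServer2 protocol. A minimal sketch; the host is a placeholder and 10000 is the server's default port:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftServerTest {
    public static void main(String[] args) throws Exception {
        // Same Hive JDBC driver as before; only the endpoint changes.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://XXXXX:10000", "xxxxxx", "yyyyy");
             Statement stmt = con.createStatement();
             // The query is planned and executed by Spark SQL.
             ResultSet rs = stmt.executeQuery("select count(*) from table")) {
            while (rs.next()) {
                System.out.println("count = " + rs.getLong(1));
            }
        }
    }
}

Queries submitted this way run on Spark executors rather than as Hive MapReduce jobs, which is where the speedup the question asks about comes from.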
