我正在尝试读取配置单元查询中gcp存储桶中的文件。
基本上,我想做的就是
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.StorageOptions;
Storage storage = StorageOptions.getDefaultInstance().getService();
Blob blob = storage.get(BlobId.of(bucketName, srcFilename));
String fileContent = new String(blob.getContent());
return fileContent;
现在,当我在mac上运行这个程序时,它就可以工作了(我有一个可以访问bucket的gcloud设置)
现在,我想有相同的功能,但在一个Hive自定义项。所以,我做了一个非常简单的jar
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.StorageOptions;
@UDFType(deterministic = true)
public class MyAwesomeUDF extends GenericUDF{
@Override
public String process(String srcFilename, String bucketName) throws IOException {
Storage storage = StorageOptions.getDefaultInstance().getService();
Blob blob = storage.get(BlobId.of(bucketName, srcFilename));
String fileContent = new String(blob.getContent());
return fileContent;
}
}
这是我的pom.xml
<dependencies>
<!-- https://mvnrepository.com/artifact/com.google.cloud/google-cloud-storage -->
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-storage</artifactId>
<version>1.71.0</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-serde</artifactId>
<version>1.2.1</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>apache-log4j-extras</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<finalName>hive-exe-jar-with-dependencies</finalName>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<relocations>
<relocation>
<pattern>com.google.common</pattern>
<shadedPattern>repackaged.com.google.common</shadedPattern>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
接下来,我构建了这个jar,并且可以在vm上运行它。
最后,这里是我要运行的配置单元查询
add jar /path/to/my/awesome/jar;
use myDb;
create temporary function awesome_fun as 'package.path.to.my.MyAwesomeUDF';
select
awesome_fun('bucketName','srcFileName');
但我得到了
Exception in thread "main" java.lang.NoSuchMethodError: com.google.api.services.storage.Storage$Objects$Get.setUserProject(Ljava/lang/String;)Lcom/google/api/services/storage/Storage$Objects$Get;
at com.google.cloud.storage.spi.v1.HttpStorageRpc.getCall(HttpStorageRpc.java:403)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:411)
at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:198)
at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:195)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:89)
at com.google.cloud.RetryHelper.run(RetryHelper.java:74)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51)
at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:195)
at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:209)
错误发生在 Storage storage = StorageOptions.getDefaultInstance().getService();
此外,在构建jar之后,我可以看到(使用 jar -tf
)那个 com.google.api.services.storage.Storage$Objects$Get
存在。
我做错什么了?
1条答案
按热度按时间9q78igpj1#
问题是缺少方法,请确保编译时实际运行的类文件已更新,或者验证编译的类和库是否在同一版本中。