Delta Lake without Databricks Runtime

Posted by yebdmbv4 on 2022-12-09 in HDFS


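Is it possible to use Delta Lake without depending on Databricks Runtime? (I mean, is it possible to use Delta Lake with just HDFS and Spark on-premises?) If not, could you elaborate on why that is from a technical perspective?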

hdfs

Source: https://stackoverflow.com/questions/60817234/delta-lake-without-databricks-runtime

4 answers


ttisahbt #1

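Yes, Delta Lake has been open sourced by Databricks (https://delta.io/). I am using Delta Lake (0.6.1) along with Apache Spark (2.4.5) and S3. Many other integrations are also available to fit an existing technology stack, for example Hive, Presto, and Athena.
Connectors: https://github.com/delta-io/connectors
Integrations: https://docs.delta.io/latest/presto-integration.html and https://docs.delta.io/latest/integrations.html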

2022-12-09

rhfm7lfc #2

According to https://vimeo.com/338100834, it is possible to use Delta Lake without Databricks Runtime. Delta Lake is just a library that "knows" how to read and write a table (a collection of Parquet files) transactionally by maintaining a special transaction log alongside each table. Of course, external applications (e.g. Hive) need a special connector to work with such tables; otherwise, the transactional and consistency guarantees cannot be enforced.
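To make the transaction log idea concrete, here is a minimal sketch in plain Python, not Delta Lake's own reader. It assumes the open source on-disk layout, where each table directory contains a `_delta_log` folder of zero-padded, newline-delimited JSON commit files holding `add` and `remove` actions. The function name `live_files` is hypothetical. The sketch ignores checkpoint files and log cleanup, so it is only reliable for small tables whose full JSON history is still present, and it reads from a local path rather than HDFS.

```python
import json
from pathlib import Path


def live_files(table_dir):
    """Replay the JSON commits under _delta_log and return the set of
    data files that make up the latest version of the table."""
    log_dir = Path(table_dir) / "_delta_log"
    files = set()
    # Commit files are named by zero-padded version number, so a
    # lexicographic sort replays them in commit order.
    for commit in sorted(log_dir.glob("*.json")):
        if not commit.stem.isdigit():
            continue  # skip anything that is not a plain commit file
        for line in commit.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files
```

A reader that lists the Parquet files directly, without replaying the log, would also see files that a later commit removed, which is why external engines need a Delta-aware connector.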

2022-12-09

qco9c6ql #3

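According to the documentation (https://docs.delta.io/latest/quick-start.html#set-up-apache-spark-with-delta-lake), Delta Lake is open source and can be used with Apache Spark. Integration is easy: add the Delta Lake jar to your application, or add the library to the Spark installation path. Hive integration is available through https://github.com/delta-io/connectors.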
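As a rough illustration of that setup with PySpark, the sketch below collects the Spark settings in one place and then writes and reads a small Delta table. The helper names `build_delta_conf` and `write_and_read` are hypothetical, and the default Maven coordinate is only an example that has to be matched to the Spark and Scala versions of your cluster. The two `spark.sql.*` settings apply to Delta Lake 0.7.0 and later on Spark 3.x; older combinations such as 0.6.1 on Spark 2.4.5 need only the package.

```python
def build_delta_conf(delta_package="io.delta:delta-core_2.12:1.0.0"):
    """Return Spark settings that pull in the open source Delta Lake library.

    The default coordinate is an example only; pick the one that matches
    your Spark and Scala versions.
    """
    return {
        # Resolves the Delta Lake jar and its dependencies at startup.
        "spark.jars.packages": delta_package,
        # Enable Delta SQL commands and the Delta-aware catalog
        # (Delta Lake 0.7.0+ on Spark 3.x).
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog":
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def write_and_read(table_path):
    """Write a tiny Delta table to table_path and read it back."""
    # Imported lazily so build_delta_conf works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("delta-without-databricks")
    for key, value in build_delta_conf().items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # "delta" is the data source name registered by the Delta Lake jar.
    spark.range(5).write.format("delta").mode("overwrite").save(table_path)
    return spark.read.format("delta").load(table_path)
```

On a cluster with HDFS configured, calling `write_and_read("hdfs:///tmp/delta-demo")` (a placeholder path) should create the Parquet files and the transaction log under that directory, with no Databricks component involved.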

2022-12-09

taor4pac #4

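Delta Lake is an open source project that enables building a Lakehouse architecture on top of existing storage systems such as S3, ADLS, GCS, and HDFS.
You can find Delta's GitHub repo here: https://github.com/delta-io/delta
In short, you can use Delta Lake without Databricks Runtime because it is open source. With Databricks, however, you get some optimizations as part of the managed commercial product that are not available by default in the open source version.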

2022-12-09
