4 Answers
ttisahbt1#
Yes, Delta Lake has been open-sourced by Databricks (https://delta.io/). I am using Delta Lake (0.6.1) along with Apache Spark (2.4.5) and S3. Many other integrations are also available to fit an existing technology stack, e.g. Hive, Presto, Athena, etc.
Connectors: https://github.com/delta-io/connectors
Integrations: https://docs.delta.io/latest/presto-integration.html and https://docs.delta.io/latest/integrations.html
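For reference, here is a minimal sketch of that setup (open-source Delta Lake 0.6.1 on plain Apache Spark 2.4.5, reading and writing S3). The bucket name is hypothetical, and the hadoop-aws version and credential mechanism are assumptions that depend on your Spark build:

```python
# Sketch: open-source Delta Lake on plain Spark with S3 -- no Databricks Runtime.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-without-databricks")
    # Pull the open-source Delta Lake jar plus S3A support at session startup.
    # hadoop-aws 2.7.3 matches the Hadoop 2.7 build Spark 2.4.5 ships with (assumption).
    .config("spark.jars.packages",
            "io.delta:delta-core_2.11:0.6.1,org.apache.hadoop:hadoop-aws:2.7.3")
    # Use the S3A filesystem for s3a:// paths; credentials are assumed to come
    # from the environment or an instance profile.
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)

path = "s3a://my-bucket/tables/demo"  # hypothetical bucket
spark.range(0, 5).write.format("delta").mode("overwrite").save(path)
spark.read.format("delta").load(path).show()
```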
rhfm7lfc2#
According to this talk, https://vimeo.com/338100834 , it is possible to use Delta Lake without the Databricks Runtime. Delta Lake is just a library that "knows" how to read from and write to a table (a collection of Parquet files) transactionally, by maintaining a special transaction log alongside each table. Of course, a special connector is needed for external applications (e.g. Hive) to work with such tables; otherwise, the transactional and consistency guarantees cannot be enforced.
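To make the transaction log concrete, here is a small sketch (the local path is a placeholder): after two commits, the table directory holds plain Parquet files plus a _delta_log/ folder of ordered JSON commit files, which readers replay to obtain a consistent snapshot.

```python
# Sketch: inspect the transaction log Delta Lake keeps alongside each table.
import os
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-log-demo")
    .config("spark.jars.packages", "io.delta:delta-core_2.11:0.6.1")
    .getOrCreate()
)

path = "/tmp/delta-demo"  # placeholder path
spark.range(0, 3).write.format("delta").mode("overwrite").save(path)  # commit 0
spark.range(3, 6).write.format("delta").mode("append").save(path)     # commit 1

# Each commit is recorded as an ordered JSON file, e.g.
# 00000000000000000000.json, 00000000000000000001.json, ...
print(sorted(os.listdir(os.path.join(path, "_delta_log"))))
```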
qco9c6ql3#
According to the documentation (https://docs.delta.io/latest/quick-start.html#set-up-apache-spark-with-delta-lake), Delta Lake is open source and works with Apache Spark. The integration is straightforward: add the Delta Lake jar to your code, or add the library to your Spark installation path. Hive integration can be done via https://github.com/delta-io/connectors.
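For newer releases, the linked quick-start takes the pip route (the delta-spark package on Spark 3.x). A sketch, assuming the installed delta-spark version matches your Spark installation:

```python
# Sketch: quick-start-style setup with pip-installed Delta Lake (pip install delta-spark).
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("delta-quickstart")
    # Enable Delta's SQL extensions and catalog on a plain Spark 3.x install.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
# Attaches the Delta jar matching the installed pip package to the session.
spark = configure_spark_with_delta_pip(builder).getOrCreate()

spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta-quickstart")
spark.read.format("delta").load("/tmp/delta-quickstart").show()
```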
taor4pac4#
Delta Lake is an open-source project that lets you build a Lakehouse architecture on top of existing storage systems such as S3, ADLS, GCS, and HDFS.
You can find Delta's GitHub repo here: https://github.com/delta-io/delta
In short, you can use Delta Lake without the Databricks Runtime, because it is open source; but with Databricks you get some optimizations as part of the managed commercial product that are not available out of the box.