I want to export my Datastore contents to a GCS bucket every day in the `datastore_backup` format. Currently I am doing the export with a curl call to the GCP Datastore export service, like this:
```
curl \
-X POST \
-H "Authorization: Bearer $access_token" \
-H "Content-Type: application/json" \
https://datastore.googleapis.com/v1/projects/viu-data-warehouse-prod:export \
-d '{
  "labels": {
    "exportVersion": "'"$BUILD_ID"'"
  },
  "outputUrlPrefix": "'"$output_url"'",
  "entityFilter": {
    "namespaceIds": ["customer_one_view"],
    "kinds": ["user_view"]
  }
}'
```
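(For context, this endpoint does not finish synchronously; it returns a long-running operation whose status can be polled. A minimal sketch, assuming the `name` field from the export response has been captured into a hypothetical `$operation_name` variable:

```
# $operation_name is hypothetical: the "name" field of the export response,
# of the form projects/PROJECT_ID/operations/OPERATION_ID.
curl -s \
-H "Authorization: Bearer $access_token" \
"https://datastore.googleapis.com/v1/$operation_name"
# The returned JSON includes "done": true once the export has completed.
```
)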
I would like this to be done with Apache Spark to make it faster. My problem is that the export takes 5 to 6 hours to finish, and as the data grows it keeps getting slower.
I need suggestions for optimizing this process through parallel processing. I would like to do it via Apache Spark, since it is very fast. Please suggest how I can do that.
1 Answer

jucafojl:
If you are not tied to Spark or to a specific export format, you could start from the Datastore-to-Cloud-Storage Apache Beam (Dataflow) template (https://cloud.google.com/dataflow/docs/guides/templates/provided-batch#datastore-to-cloud-storage-text) and then fork it to fit your needs.
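For example, the provided template can be launched without writing any pipeline code. A minimal sketch using gcloud, where the job name, region, and output bucket are placeholders, and the parameter names follow the Datastore_to_GCS_Text template's documentation:

```
# Hypothetical job name, region, and output bucket; adjust to your project.
gcloud dataflow jobs run datastore-export-daily \
  --gcs-location gs://dataflow-templates/latest/Datastore_to_GCS_Text \
  --region us-central1 \
  --parameters "datastoreReadGqlQuery=SELECT * FROM user_view,\
datastoreReadProjectId=viu-data-warehouse-prod,\
datastoreReadNamespace=customer_one_view,\
textWritePathPrefix=gs://YOUR_BUCKET/datastore-export/"
```

Note that this template writes entities as JSON text rather than the `datastore_backup` format, which matters if you plan to load the exports into BigQuery directly; that is the "specific export format" caveat above.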