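I'm doing some analysis of Spark SQL query execution plans. The plan printed by the explain() API isn't very readable. The Spark Web UI, by contrast, draws a DAG broken down into jobs, stages, and tasks, which is much easier to read. Is there any way to create that graph from the execution plan, or from any API in code? If not, is there any API that can read the graph from the UI?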
bxjv4tth1#
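As far as I can tell, this project (https://github.com/absaoss/spline-spark-agent) can interpret the execution plan and produce it in a readable form. The Spark job in this example reads a file, converts it to a CSV file, and writes it to the local file system. Sample output in JSON: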
{
  "id": "3861a1a7-ca31-4fab-b0f5-6dbcb53387ca",
  "operations": {
    "write": {
      "outputSource": "file:/output.csv",
      "append": false,
      "id": 0,
      "childIds": [1],
      "params": {"path": "output.csv"},
      "extra": {"name": "InsertIntoHadoopFsRelationCommand", "destinationType": "csv"}
    },
    "reads": [
      {
        "inputSources": ["file:/Users/liajiang/Downloads/spark-onboarding-demo-application/src/main/resources/wikidata.csv"],
        "id": 2,
        "schema": [
          "6742cfd4-d8b6-4827-89f2-4b2f7e060c57",
          "62c022d9-c506-4e6e-984a-ee0c48f9df11",
          "26f1d7b5-74a4-459c-87f3-46a3df781400",
          "6e4063cf-4fd0-465d-a0ee-0e5c53bd52b0",
          "2e019926-3adf-4ece-8ea7-0e01befd296b"
        ],
        "params": {"inferschema": "true", "header": "true"},
        "extra": {"name": "LogicalRelation", "sourceType": "csv"}
      }
    ],
    "other": [
      {
        "id": 1,
        "childIds": [2],
        "params": {"name": "`source`"},
        "extra": {"name": "SubqueryAlias"}
      }
    ]
  },
  "systemInfo": {"name": "spark", "version": "2.4.2"},
  "agentInfo": {"name": "spline", "version": "0.5.5"},
  "extraInfo": {
    "appName": "spark-spline-demo-application",
    "dataTypes": [
      {"_typeHint": "dt.Simple", "id": "f0dede5e-8fe1-4c22-ab24-98f7f44a9a5a", "name": "timestamp", "nullable": true},
      {"_typeHint": "dt.Simple", "id": "dbe1d206-3d87-442c-837d-dfa47c88b9c1", "name": "string", "nullable": true},
      {"_typeHint": "dt.Simple", "id": "0d786d1e-030b-4997-b005-b4603aa247d7", "name": "integer", "nullable": true}
    ],
    "attributes": [
      {"id": "6742cfd4-d8b6-4827-89f2-4b2f7e060c57", "name": "date", "dataTypeId": "f0dede5e-8fe1-4c22-ab24-98f7f44a9a5a"},
      {"id": "62c022d9-c506-4e6e-984a-ee0c48f9df11", "name": "domain_code", "dataTypeId": "dbe1d206-3d87-442c-837d-dfa47c88b9c1"},
      {"id": "26f1d7b5-74a4-459c-87f3-46a3df781400", "name": "page_title", "dataTypeId": "dbe1d206-3d87-442c-837d-dfa47c88b9c1"},
      {"id": "6e4063cf-4fd0-465d-a0ee-0e5c53bd52b0", "name": "count_views", "dataTypeId": "0d786d1e-030b-4997-b005-b4603aa247d7"},
      {"id": "2e019926-3adf-4ece-8ea7-0e01befd296b", "name": "total_response_size", "dataTypeId": "0d786d1e-030b-4997-b005-b4603aa247d7"}
    ]
  }
}
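Since the question is about getting a graph, here is a rough sketch of how the JSON above could be turned into one. It assumes only the structure visible in this sample (an "operations" object with "write", "reads", and "other", each operation carrying "id", optional "childIds", and "extra.name"); other Spline agent versions may lay the output out differently. The function name lineage_to_dot is made up for illustration and is not part of Spline or Spark. It emits Graphviz DOT text, which a tool such as the Graphviz dot command can render to an image.

```python
def lineage_to_dot(plan: dict) -> str:
    """Render the "operations" section of a Spline-style plan as Graphviz DOT text.

    Hypothetical helper: it relies only on the fields seen in the sample output.
    """
    ops = plan["operations"]
    # Flatten the write, read, and "other" operations into one list of nodes.
    nodes = [ops["write"], *ops.get("reads", []), *ops.get("other", [])]

    lines = ["digraph plan {", "  rankdir=BT;"]
    for op in nodes:
        label = op.get("extra", {}).get("name", "op")
        lines.append(f'  n{op["id"]} [label="{label}"];')
    for op in nodes:
        for child in op.get("childIds", []):
            # Data flows from the child operation into the operation that lists it.
            lines.append(f'  n{child} -> n{op["id"]};')
    lines.append("}")
    return "\n".join(lines)


# A trimmed copy of the sample output, keeping only the fields the helper reads.
sample = {
    "operations": {
        "write": {"id": 0, "childIds": [1],
                  "extra": {"name": "InsertIntoHadoopFsRelationCommand"}},
        "reads": [{"id": 2, "extra": {"name": "LogicalRelation"}}],
        "other": [{"id": 1, "childIds": [2], "extra": {"name": "SubqueryAlias"}}],
    }
}

dot = lineage_to_dot(sample)
print(dot)
```

For this sample the result is a three-node chain: LogicalRelation feeds SubqueryAlias, which feeds InsertIntoHadoopFsRelationCommand. Note that this is the logical plan's operation graph, not the job/stage/task DAG shown in the Web UI.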