如何获取spark sql查询执行计划的dag?

yfwxisqw  于 2021-05-24  发布在  Spark
关注(0)|答案(1)|浏览(1141)

我正在对sparksql查询执行计划进行一些分析。explain()api打印的执行计划可读性不强。如果我们看到SparkWebUI,就会创建一个dag图,它分为作业、阶段和任务,可读性更高。有没有任何方法可以从执行计划或代码中的任何api创建该图?如果没有,是否有任何api可以从ui读取grap?

bxjv4tth

bxjv4tth1#

在我看来,这个项目(https://github.com/absaoss/spline-spark-agent)能够解释执行计划并以可读的方式生成它。这个spark作业是读取一个文件,将其转换为csv文件,然后写入本地。
json中的示例输出如下

{
    "id": "3861a1a7-ca31-4fab-b0f5-6dbcb53387ca",
    "operations": {
        "write": {
            "outputSource": "file:/output.csv",
            "append": false,
            "id": 0,
            "childIds": [
                1
            ],
            "params": {
                "path": "output.csv"
            },
            "extra": {
                "name": "InsertIntoHadoopFsRelationCommand",
                "destinationType": "csv"
            }
        },
        "reads": [
            {
                "inputSources": [
                    "file:/Users/liajiang/Downloads/spark-onboarding-demo-application/src/main/resources/wikidata.csv"
                ],
                "id": 2,
                "schema": [
                    "6742cfd4-d8b6-4827-89f2-4b2f7e060c57",
                    "62c022d9-c506-4e6e-984a-ee0c48f9df11",
                    "26f1d7b5-74a4-459c-87f3-46a3df781400",
                    "6e4063cf-4fd0-465d-a0ee-0e5c53bd52b0",
                    "2e019926-3adf-4ece-8ea7-0e01befd296b"
                ],
                "params": {
                    "inferschema": "true",
                    "header": "true"
                },
                "extra": {
                    "name": "LogicalRelation",
                    "sourceType": "csv"
                }
            }
        ],
        "other": [
            {
                "id": 1,
                "childIds": [
                    2
                ],
                "params": {
                    "name": "`source`"
                },
                "extra": {
                    "name": "SubqueryAlias"
                }
            }
        ]
    },
    "systemInfo": {
        "name": "spark",
        "version": "2.4.2"
    },
    "agentInfo": {
        "name": "spline",
        "version": "0.5.5"
    },
    "extraInfo": {
        "appName": "spark-spline-demo-application",
        "dataTypes": [
            {
                "_typeHint": "dt.Simple",
                "id": "f0dede5e-8fe1-4c22-ab24-98f7f44a9a5a",
                "name": "timestamp",
                "nullable": true
            },
            {
                "_typeHint": "dt.Simple",
                "id": "dbe1d206-3d87-442c-837d-dfa47c88b9c1",
                "name": "string",
                "nullable": true
            },
            {
                "_typeHint": "dt.Simple",
                "id": "0d786d1e-030b-4997-b005-b4603aa247d7",
                "name": "integer",
                "nullable": true
            }
        ],
        "attributes": [
            {
                "id": "6742cfd4-d8b6-4827-89f2-4b2f7e060c57",
                "name": "date",
                "dataTypeId": "f0dede5e-8fe1-4c22-ab24-98f7f44a9a5a"
            },
            {
                "id": "62c022d9-c506-4e6e-984a-ee0c48f9df11",
                "name": "domain_code",
                "dataTypeId": "dbe1d206-3d87-442c-837d-dfa47c88b9c1"
            },
            {
                "id": "26f1d7b5-74a4-459c-87f3-46a3df781400",
                "name": "page_title",
                "dataTypeId": "dbe1d206-3d87-442c-837d-dfa47c88b9c1"
            },
            {
                "id": "6e4063cf-4fd0-465d-a0ee-0e5c53bd52b0",
                "name": "count_views",
                "dataTypeId": "0d786d1e-030b-4997-b005-b4603aa247d7"
            },
            {
                "id": "2e019926-3adf-4ece-8ea7-0e01befd296b",
                "name": "total_response_size",
                "dataTypeId": "0d786d1e-030b-4997-b005-b4603aa247d7"
            }
        ]
    }
}

相关问题