Apache Flink 1.11: cannot use a Python UDF via SQL function DDL in a Java Flink streaming job

rjjhvcjd · posted 2021-06-21 in Flink

FLIP-106 has an example of how to call a user-defined Python function via SQL function DDL in a batch-job Java application...

BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
tEnv.getConfig().getConfiguration().setString("python.files", "/home/my/test1.py");
tEnv.getConfig().getConfiguration().setString("python.client.executable", "python3");

tEnv.sqlUpdate("create temporary system function func1 as 'test1.func1' language python");
Table table = tEnv.fromDataSet(env.fromElements("1", "2", "3")).as("str").select("func1(str)");
tEnv.toDataSet(table, String.class).collect();

I have been trying to reproduce the same example in a streaming-job Java application; here is my code:

final StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(EnvironmentConfiguration.getEnv(), fsSettings);
fsTableEnv.getConfig().getConfiguration().setString("python.files", "/Users/jf/Desktop/flink/fca/test.py");
fsTableEnv.getConfig().getConfiguration().setString("python.client.executable", "/Users/jf/opt/anaconda3/bin/python");

fsTableEnv.sqlUpdate("CREATE TEMPORARY SYSTEM FUNCTION func1 AS 'test.func1' LANGUAGE PYTHON");
Table table = fsTableEnv.fromValues("1", "2", "3").as("str").select("func1(str)");
/* Missing line */

For this specific line in the batch job:

tEnv.toDataSet(table, String.class).collect();

I have not yet found an equivalent for a streaming job.
1. Can you help me map this FLIP-106 example from batch to streaming?
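For reference, Flink 1.11 introduced TableResult.collect() (FLIP-84), which is the closest streaming counterpart to the batch toDataSet(table, String.class).collect() call. A minimal sketch (not tested against this exact setup), assuming table is the Table built in the snippet above:

// Sketch: client-side collection of a streaming Table result in Flink 1.11.
// Needs org.apache.flink.types.Row and org.apache.flink.util.CloseableIterator.
// table.execute() submits the job; collect() streams result rows back to the client.
try (CloseableIterator<Row> it = table.execute().collect()) {
    while (it.hasNext()) {
        Row row = it.next();
        System.out.println(row);
    }
}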
Ultimately, with Flink 1.11, I want to call a Python function in a Java Flink streaming application like this:

final StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(EnvironmentConfiguration.getEnv(), fsSettings);
fsTableEnv.getConfig().getConfiguration().setString("python.files", "/Users/jf/Desktop/flink/fca/test.py");
fsTableEnv.getConfig().getConfiguration().setString("python.client.executable", "/Users/jf/opt/anaconda3/bin/python");

fsTableEnv.sqlUpdate("CREATE TEMPORARY SYSTEM FUNCTION func1 AS 'test.func1' LANGUAGE PYTHON");
final Table table = fsTableEnv.fromDataStream(stream_filtered.map(x->x.idsUmid)).select("func1(f0)").as("umid");
System.out.println("Result --> " + table.select($("umid")) + " --> End of Result");

and use the result of that UDF for further processing (not necessarily printing it to the console).
I have edited the test.py file to see whether anything at all is being executed on the Python side, regardless of the table result.

from pyflink.table.types import DataTypes
from pyflink.table.udf import udf
from os import getcwd

@udf(input_types=[DataTypes.STRING()], result_type=DataTypes.STRING())
def func1(line):
    print(line)
    print(getcwd())
    with open("test.txt", "a") as myfile:
        myfile.write(line)
    return line

Nothing gets printed, no test.txt file is created, and no value is returned to the streaming job, so basically the Python function is never called.
2. What am I missing?
Thanks to David, Wei and Xingbo for the support so far; every detail has been useful to me.
Best regards,
Jonathan


tkqqtvp11#

You can try this:

final StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(EnvironmentConfiguration.getEnv(), fsSettings);
fsTableEnv.getConfig().getConfiguration().setString("python.files", "/Users/jf/Desktop/flink/fca/test.py");
fsTableEnv.getConfig().getConfiguration().setString("python.client.executable", "/Users/jf/opt/anaconda3/bin/python");

// You need to specify the python interpreter used to run the python udf on cluster.
// I assume this is a local program so it is the same as the "python.client.executable".
fsTableEnv.getConfig().getConfiguration().setString("python.executable", "/Users/jf/opt/anaconda3/bin/python");

fsTableEnv.sqlUpdate("CREATE TEMPORARY SYSTEM FUNCTION func1 AS 'test.func1' LANGUAGE PYTHON");
final Table table = fsTableEnv.fromDataStream(stream_filtered.map(x->x.idsUmid)).select("func1(f0)").as("umid");

// 'table.select($("umid"))' will not trigger job execution. You need to call the "execute()" method explicitly.
table.execute().print();
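
If the goal is further processing rather than printing, the result can also be converted back into a DataStream (a sketch continuing the snippet above; the column access and job name are illustrative assumptions):

// Sketch: route the UDF output back into the DataStream API for further processing.
// toAppendStream works here because the query only appends rows (no aggregation/retraction).
// Needs org.apache.flink.types.Row and org.apache.flink.api.common.typeinfo.Types.
DataStream<Row> resultStream = fsTableEnv.toAppendStream(table, Row.class);
resultStream
    .map(row -> (String) row.getField(0)) // "umid" is the single column of `table`
    .returns(Types.STRING)                // type hint needed because of lambda type erasure
    .print();

// Once a DataStream sink is part of the pipeline, trigger execution explicitly:
EnvironmentConfiguration.getEnv().execute("python-udf-streaming-job");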
