我正在运行一个概念验证,以连接气流和oracle数据库来存储气流元数据。从文档(https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#database-uri)中,我看到大多数PostgreSQL,MySQL和MSSQL都受到支持,PorsgreSQL是最受欢迎的。然而,由于SQLAlchemy支持Oracle连接,我将airflow docker镜像配置为使用oracle db初始化。
首先,我想澄清的是,我正在Linux Ubuntu 20.4 LTS版本上使用Docker,所有内容都部署在那里。
我将从docker-compose.yml文件开始,它类似于官方airflow文档(https://airflow.apache.org/docs/apache-airflow/2.6.1/docker-compose.yaml)中的文件:
version: '3.8'
x-airflow-common:
&airflow-common
#image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.6.1-python3.8}
build: .
env_file: .env
user: "${AIRFLOW_UID:-50000}:0"
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: LocalExecutor
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: oracle+cx_oracle://NIKOS:Nikos_123@oracle-db:1521/?service_name=ORCLPDB1
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'False'
AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session'
AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
_PIP_ADDITIONAL_REQUIREMENTS: ''
volumes:
- ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
- ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
- ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
- ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
depends_on:
&airflow-common-depends-on
oracle-db:
condition: service_healthy
services:
oracle-db:
image: container-registry.oracle.com/database/enterprise:21.3.0.0
container_name: oracle-db-airflow
environment:
ORACLE_SID: ORCLCDB
ORACLE_PDB: ORCLPDB1
ORACLE_PWD: Oracle_123
#ORACLE_EDITION: standard
ports:
- 1521:1521
volumes:
- oracle-data:/opt/oracle/oradata
- oracle-backup:/opt/oracle/backup
- ./create_oracle_user.sql:/create_oracle_user.sql
healthcheck:
test: [ "CMD", "/opt/oracle/checkDBStatus.sh"]
interval: 30s
timeout: 10s
retries: 2
restart: unless-stopped
airflow-webserver:
<<: *airflow-common
command: webserver
ports:
- "8080:8080"
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 30s
restart: unless-stopped
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-scheduler:
<<: *airflow-common
command: scheduler
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8974/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 30s
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-init:
<<: *airflow-common
env_file:
- .env
user: root
entrypoint:
- /bin/bash
- -c
- |
mkdir -p /opt/oracle
apt-get update
apt-get install -y unzip
unzip /opt/instantclient-basic-linux.x64-21.10.0.0.0dbru.zip -d /opt/oracle/
apt-get install -y libaio1
sh -c "echo /opt/oracle/instantclient_21_10 > /etc/ld.so.conf.d/oracle-instantclient.conf"
ldconfig
ls /opt/oracle/instantclient_21_10 | grep "libclntsh.so"
/usr/bin/dumb-init -- /entrypoint version
volumes:
- ${AIRFLOW_PROJ_DIR:-.}/instantclient-basic-linux.x64-21.10.0.0.0dbru.zip:/opt/instantclient-basic-linux.x64-21.10.0.0.0dbru.zip
environment:
<<: *airflow-common-env
_AIRFLOW_DB_UPGRADE: 'true'
_AIRFLOW_WWW_USER_CREATE: 'true'
_AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
_AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
_PIP_ADDITIONAL_REQUIREMENTS: ''
LD_LIBRARY_PATH: /opt/oracle/instantclient_21_10
volumes:
oracle-data:
external: false
name: oracla_store
oracle-backup:
external: false
name: oracla_backup
networks:
default:
driver: bridge
name: network_airflow_poc
字符串
oracle-db服务是从container.registry of Oracle获取的官方oracle数据库。数据库运行良好。我可以使用Oracle SQL Developer免费应用程序检查与数据库的连接沿着用户凭据。此外,用户NIKOS
也是有效的,并且具有以下权限GRANT CONNECT, CREATE SESSION, CREATE TABLE, CREATE VIEW TO NIKOS;
我的问题是在执行服务airflow-init
时发现的。你会注意到,我在docker-compose文件中的这个服务下添加了一些额外的bash命令。这是因为默认情况下未安装cx_Oracle
软件包和Oracle Instant Client。因此,我必须执行以下步骤:
步骤1:分别使用以下配置创建Dockerfile
和requirements.txt
(文档):
FROM apache/airflow:2.6.1-python3.8
ADD requirements.txt .
RUN pip install -r requirements.txt
sqlalchemy==1.4.48
cx_Oracle==8.3.0
的数据
并通过取消注解docker service的build参数(build: .
)来重建docker airflow镜像。
步骤2:在气流容器内安装Oracle Instant Client。因此,按照官方documentation_1,documentation_2的步骤,我设法克服了以下错误
DPI-1047:找不到64位Oracle客户端库
但是,由于以下日志,整个解决方案无法完成:
x1c 0d1x的数据
完整错误日志here
这个错误基本上说的是表log
,我假设是一个airflow metadata
表,没有找到。因为从来没有创造过。
奇怪的是,当我使用PostgreSQL
数据库作为元数据后端执行整个docker-compose文件并在PGAdmin中检查结果时,我可以清楚地看到初始化的表,如log
和其他表。
的
因此,我的问题是如何克服当前在错误日志中遇到的问题?即使Airflow具有有效的SQLAlchemy连接字符串并安装了必要的依赖库,它也无法在oracle db中正确启动其元数据表。
1条答案
按热度按时间7tofc5zh1#
我已经找到了解决这个问题的办法。根据完整的堆栈跟踪,问题位于
db.py
文件下,该文件位于 /home/airflow/.local/lib/python3.8/site-packages/airflow/utils/在此文件中,在开始构建元数据表之前调用了函数
check()
。check
函数实际上是检查数据库是否处于活动状态。调用函数时执行了以下命令。session.execute("SELECT 1 AS IS_ALIVE")
个SQL语句不遵循Oracle SQL系统。因此,我将命令更改为
session.execute("SELECT 1 AS IS_ALIVE FROM DUAL;")
个这就是我发布的问题的解决方案。