How can I use Python variables in an Impala query to find the previous day using epoch time?

Posted by nbysray5 on 2021-06-26 in Impala

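My goal is to use Impala to query only yesterday's data, filtering on a Unix timestamp field. I don't want to hard-code the dates, because this script should run every day and query only the previous day. I'm using Python and have built strings for the start and end times.
endtime is stored as a bigint, like this: 1561996779000.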

yesterday = dt.date.fromordinal(dt.date.today().toordinal()-1).strftime("%F")
yesterday_start = yesterday + ' 00:00:00'
yesterday_end = yesterday + ' 23:59:59'

yesterday_start
'2019-07-28 00:00:00'
yesterday_end
'2019-07-28 23:59:59'

I have tried the following, but none of them seem to work:

cursor.execute('select sourceaddress, sourcehostname, sourceusername, endtime from proxy where endtime between unix_timestamp("+yesterday_start+") and unix_timestamp("+yesterday_end+")')
cursor.execute("select sourceaddress, sourcehostname, sourceusername, endtime from proxy where endtime between unix_timestamp("+yesterday_start+") and unix_timestamp("+yesterday_end+")")
cursor.execute("select sourceaddress, sourcehostname, sourceusername, endtime from proxy where endtime between unix_timestamp('yesterday_start') and unix_timestamp('yesterday_end')")
cursor.execute("SELECT * from proxy where endtime between unix_timestamp('"+yesterday_start+"') and unix_timestamp('"+yesterday_end+"')")
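To see why these attempts behave differently, it can help to build the same fragments in plain Python and print the SQL text that would actually reach Impala. This is only an illustrative sketch: the `attempt_*` names are made up here, and the date is the sample value shown above.

```python
# Sample value from the question; in the real script it is computed per day.
yesterday_start = '2019-07-28 00:00:00'

# Attempt 1: the whole statement is one single-quoted Python string, so
# "+yesterday_start+" is literal text and no concatenation takes place.
attempt_1 = 'unix_timestamp("+yesterday_start+")'

# Attempt 2: concatenation happens, but the timestamp reaches the SQL
# without quotes around it, which is not a valid string literal.
attempt_2 = "unix_timestamp(" + yesterday_start + ")"

# Attempt 3: the variable name itself is sent as the string to parse.
attempt_3 = "unix_timestamp('yesterday_start')"

# Attempt 4: the value is concatenated and quoted as a SQL string literal.
attempt_4 = "unix_timestamp('" + yesterday_start + "')"

for label, sql in [("1", attempt_1), ("2", attempt_2),
                   ("3", attempt_3), ("4", attempt_4)]:
    print(label, sql)
```

Only the fourth form hands Impala a quoted timestamp string. If it still returns no rows, the likely remaining cause is a unit mismatch between the value `unix_timestamp()` returns and the values stored in `endtime`.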

Here is an example from the Impala documentation:

select unix_timestamp('2015-05-15 12:00:00');
+---------------------------------------+
| unix_timestamp('2015-05-15 12:00:00') |
+---------------------------------------+
| 1431691200                            |
+---------------------------------------+
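The documented result has 10 digits, while the sample `endtime` value has 13. A quick check in Python suggests that the documentation value is in seconds and that `endtime` is probably in milliseconds. This is an inference from the single sample value, not something confirmed by the table definition.

```python
import datetime as dt

doc_example = 1431691200       # result of unix_timestamp() in the Impala docs
sample_endtime = 1561996779000  # sample endtime value from the question

# Interpret the documentation value as seconds since the epoch (UTC).
doc_dt = dt.datetime.fromtimestamp(doc_example, tz=dt.timezone.utc)

# Interpret the sample endtime as milliseconds since the epoch (UTC).
endtime_dt = dt.datetime.fromtimestamp(sample_endtime // 1000, tz=dt.timezone.utc)

print(doc_dt.isoformat())      # 2015-05-15T12:00:00+00:00
print(endtime_dt.isoformat())  # 2019-07-01T15:59:39+00:00
```

If that reading is right, bounds produced by `unix_timestamp()` would need to be multiplied by 1000 before being compared with `endtime`.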
Answer #1 (30byixjq)

I'm still looking for a better way to do this, but the following works.


import datetime as dt
import os
import time
import timeit

from impala.dbapi import connect
from impala.util import as_pandas

# Date pattern
date_pattern = '%Y-%m-%d %H:%M:%S'

# Yesterday's date, taken from the system clock (local time)
yesterday = dt.date.fromordinal(dt.date.today().toordinal()-1).strftime('%Y-%m-%d')

# Start datetime, converted to epoch seconds (local time)
yesterday_start = yesterday + ' 00:00:00'
yesterday_start_epoch = int(time.mktime(time.strptime(yesterday_start, date_pattern)))
yesterday_start_epoch_str = str(yesterday_start_epoch)

# End datetime, converted to epoch seconds (local time)
yesterday_end = yesterday + ' 23:59:59'
yesterday_end_epoch = int(time.mktime(time.strptime(yesterday_end, date_pattern)))
yesterday_end_epoch_str = str(yesterday_end_epoch)

# Start timer
start_time = timeit.default_timer()

# Connection and query
IMPALA_HOST = os.getenv('HOST', 'server')
# Assumption: the port comes from a PORT variable, defaulting to 21050
IMPALA_PORT = int(os.getenv('PORT', '21050'))
# auth_mechanism is left blank here; fill in the value your cluster uses
conn = connect(host=IMPALA_HOST, port=IMPALA_PORT, auth_mechanism='', use_ssl=True)
cursor = conn.cursor()
cursor.execute('SHOW TABLES')
tables = as_pandas(cursor)
cursor.execute("select sourceaddress, sourcehostname, sourceusername, endtime from proxy where endtime between cast('"+yesterday_start_epoch_str+"' AS INT) and cast('"+yesterday_end_epoch_str+"' AS INT)")
df = as_pandas(cursor)

# End timer
end_time = timeit.default_timer()

# Print the time it took
print("Elapsed time: {}".format(end_time - start_time))
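One possible tidier variant is sketched below. It computes the day boundaries with `datetime` and passes them as query parameters instead of concatenating strings. It rests on several assumptions that are not confirmed in this thread: that `endtime` holds epoch milliseconds (suggested by the 13-digit sample value), that day boundaries should be taken in UTC, and that the cursor accepts pyformat-style parameters, which is the parameter style impyla's DB-API cursor declares. The names `yesterday_bounds_ms`, `fetch_yesterday` and `QUERY` are hypothetical.

```python
import datetime as dt

# Half-open range [start, end) so no row at 23:59:59.xxx is missed.
QUERY = (
    "select sourceaddress, sourcehostname, sourceusername, endtime "
    "from proxy "
    "where endtime >= %(start_ms)s and endtime < %(end_ms)s"
)


def yesterday_bounds_ms(today=None, tz=dt.timezone.utc):
    """Return (start_ms, end_ms) for the day before `today` in timezone `tz`.

    start_ms is inclusive; end_ms is exclusive (midnight at the start of `today`).
    Assumes the stored timestamps are epoch milliseconds.
    """
    if today is None:
        today = dt.datetime.now(tz).date()
    end = dt.datetime.combine(today, dt.time.min, tzinfo=tz)
    start = end - dt.timedelta(days=1)
    return int(start.timestamp()) * 1000, int(end.timestamp()) * 1000


def fetch_yesterday(cursor, today=None):
    """Run QUERY on an open DB-API cursor and return all rows for yesterday."""
    start_ms, end_ms = yesterday_bounds_ms(today)
    cursor.execute(QUERY, {"start_ms": start_ms, "end_ms": end_ms})
    return cursor.fetchall()
```

With the connection from the code above, `rows = fetch_yesterday(conn.cursor())` would run the query for the previous day. If `endtime` turns out to be in seconds, drop the `* 1000`; if the data should follow local time rather than UTC, pass a different `tz`.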
