我有一个约133m行和16列的表。我想在同一服务器上的另一个数据库上为第3-16列中的每一列创建14个表（第1列和第2列是相同的） id 以及 timestamp 它们也将出现在最后14个表中，但不会有自己的表），其中每个表都有原始列的名称。这是否可以仅用sql脚本来实现？在我看来，这是一种更可取的、最快的方法，这似乎是合乎逻辑的。
目前，我有一个python脚本，它通过解析原始表的csv转储（使用50行进行测试）、创建新表和添加相关值来“工作”，但速度非常慢（我估计传输所有1.33亿行需要将近1年时间，这显然是不可接受的）。这是我第一次以任何能力使用sql，我确信我的代码可以加快速度，但我不确定如何加快速度，因为我不熟悉sql。中间的大sql字符串命令是从我们的代码库中的其他代码复制过来的。我试过使用如下所示的事务，但它似乎对速度没有任何显著影响。

import re
import mysql.connector
import time

# option flags

debug = False  # prints out information during runtime
timing = True  # times the execution time of the program

# save start time for timing. won't be used later if timing is false

start_time = time.time()

# open file for reading

path = 'test_vaisala_sql.csv'
file = open(path, 'r')

# read in column values

column_str = file.readline().strip()
columns = re.split(',vaisala_|,', column_str)  # parse columns with regex to remove commas and vasiala_
if debug:
    print(columns)

# open connection to MySQL server

cnx = mysql.connector.connect(user='root', password='<redacted>',
                              host='127.0.0.1',
                              database='measurements')
cursor = cnx.cursor()

# create the table in the MySQL database if it doesn't already exist

for i in range(2, len(columns)):
    table_name = 'vaisala2_' + columns[i]
    sql_command = "CREATE TABLE IF NOT EXISTS " + \
                  table_name + "(`id` BIGINT(20) NOT NULL AUTO_INCREMENT, " \
                               "`timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, " \
                               "`milliseconds` BIGINT(20) NOT NULL DEFAULT '0', " \
                               "`value` varchar(255) DEFAULT NULL, " \
                               "PRIMARY KEY (`id`), " \
                               "UNIQUE KEY `milliseconds` (`milliseconds`)" \
                               "COMMENT 'Eliminates duplicate millisecond values', " \
                               "KEY `timestamp` (`timestamp`)) " \
                               "ENGINE=InnoDB DEFAULT CHARSET=utf8;"

    if debug:
        print("Creating table", table_name, "in database")

    cursor.execute(sql_command)

# read in rest of lines in CSV file

for line in file.readlines():
    cursor.execute("START TRANSACTION;")
    line = line.strip()
    values = re.split(',"|",|,', line)  # regex split along commas, or commas and quotes
    if debug:
        print(values)

    # iterate of each data column. Starts at 2 to eliminate `id` and `timestamp`
    for i in range(2, len(columns)):
        table_name = "vaisala2_" + columns[i]
        timestamp = values[1]

        # translate timestamp back to epoch time
        try:
            pattern = '%Y-%m-%d %H:%M:%S'
            epoch = int(time.mktime(time.strptime(timestamp, pattern)))
            milliseconds = epoch * 1000  # convert seconds to ms
        except ValueError:  # errors default to 0
            milliseconds = 0

        value = values[i]

        # generate SQL command to insert data into destination table
        sql_command = "INSERT IGNORE INTO {} VALUES (NULL,'{}',{},'{}');".format(table_name, timestamp,
                                                                                 milliseconds, value)
        if debug:
            print(sql_command)

        cursor.execute(sql_command)
cnx.commit()  # commits changes in destination MySQL server

# print total execution time

if timing:
    print("Completed in %s seconds" % (time.time() - start_time))

这不需要进行难以置信的优化；如果机器必须运行几天才能运行，这是完全可以接受的。但是一年太长了。

2条答案

按热度按时间

vhipe2zx1#

当使用脚本与数据库对话时，您希望尽量减少发送的消息数量，因为每条消息会进一步延迟执行时间。目前，您似乎正在发送（按您的近似值）1.33亿条消息，因此，您的脚本速度减慢了1.33亿次。一个简单的优化方法是解析电子表格并将数据拆分到表中（在内存中或保存到磁盘中），然后将数据发送到新的数据库。
正如您所暗示的，编写sql脚本来重新分发数据要快得多。

赞(0）回复(0）举报 2021-06-20

uurv41yg2#

可以从 SELECT 比如：

CREATE TABLE <other database name>.<column name>
             AS
             SELECT <column name>
                    FROM <original database name>.<table name>;

（更换 <...> 或者用其他列或 WHERE 条款或……）
这也会将查询中的数据插入到新表中。这可能是最快的方法。
您可以使用动态sql和目录中的信息（即 information_schema.columns )创建 CREATE 语句或手动创建它们，这很烦人，但我想对于14列是可以接受的。

python—基于另一个表中列的名称在mysql中创建表

2条答案

相关问题

热门标签

最新问答