我最近编写了一个python脚本,从json文件中提取一些数据,并使用它为以下语句生成一些sql插入值:
INSERT INTO `card`(`artist`,`class_pid`,`collectible`,`cost`, `dbfid`, `api_db_id`, `name`, `rarity`, `cardset_pid`, `cardtype`, `attack`, `health`, `race`, `durability`, `armor`,`multiclassgroup`, `text`) VALUES ("generated entry goes here")
在我的sql表中,某些属性的名称不同,但使用相同的值(json文件/python脚本中的示例cardclass在sql表中称为class\u pid)。从脚本生成的值是有效的sql,可以成功地插入到数据库中,但是我注意到,在生成的export.txt文件中,一些值与原来的值不同。例如,utf-8编码的json文件中的以下json条目:
[{"artist":"Arthur Bozonnet","attack":3,"cardClass":8,"collectible":1,"cost":2,"dbfId":2545,"flavor":"And he can't get up.","health":2,"id":"AT_003","mechanics":["HEROPOWER_DAMAGE"],"name":"Fallen Hero","rarity":"RARE","set":1,"text":"Your Hero Power deals 1 extra damage.","type":"MINION"},{"artist":"Ivan Fomin","attack":2,"cardClass":11,"collectible":1,"cost":2,"dbfId":54256,"flavor":"Were you expectorating another bad pun?","health":4,"id":"ULD_182","mechanics":["TRIGGER_VISUAL"],"name":"Spitting Camel","race":"BEAST","rarity":"COMMON","set":22,"text":"[x]At the end of your turn,\n deal 1 damage to another \nrandom friendly minion.","type":"MINION"}]
生成此输出:
('Arthur Bozonnet',8,1,2,'2545','AT_003','Fallen Hero','RARE',1,'MINION',3,2,'NULL',0,0,'NULL','Your Hero Power deals 1\xa0extra damage.'),('Ivan Fomin',11,1,2,'54256','ULD_182','Spitting Camel','COMMON',22,'MINION',2,4,'BEAST',0,0,'NULL','[x]At the end of your turn,\n\xa0\xa0deal 1 damage to another\xa0\xa0\nrandom friendly minion.')
如您所见,json条目中的一些值以某种方式被更改,就好像文本编码在某个地方被更改一样,尽管在我的脚本中,我确保json文件是用utf-8编码打开的,并且生成的文本文件也是用utf-8打开并写入的,以匹配json文件。我的目标是保持值与json文件中的值完全相同,并将这些值与json中的值完全相同地传输到生成的sql值条目中。例如,在生成的sql中,我希望第二个条目的“text”值为:
"[x]At the end of your turn,\n deal 1 damage to another \nrandom friendly minion."
而不是:
"[x]At the end of your turn,\n\xa0\xa0deal 1 damage to another\xa0\xa0\nrandom friendly minion."
我尝试使用unicodedata.normalize()等函数,但不幸的是,它似乎没有以任何方式更改输出。这是我编写的用于生成sql值的脚本:
import json
import io
chosen_keys = ['artist','cardClass','collectible','cost',
'dbfId','id','name','rarity','set','type','attack','health',
'race','durability','armor',
'multiClassGroup','text']
defaults = ['NULL','0','0','0',
'NULL','NULL','NULL','NULL','0','NULL','0','0',
'NULL','0','0',
'NULL','NULL']
def saveChangesString(dataList, filename):
with io.open(filename, 'w', encoding='utf-8') as f:
f.write(dataList)
f.close()
def generateSQL(json_dict):
count = 0
endCount = 1
records = ""
finalState = ""
print('\n'+str(len(json_dict))+' records will be processed\n')
for i in json_dict:
entry = "("
jcount = 0
for j in chosen_keys:
if j in i.keys():
if str(i.get(j)).isdigit() and j != 'dbfId':
entry = entry + str(i.get(j))
else:
entry = entry + repr(str(i.get(j)))
else:
if str(defaults[jcount]).isdigit() and j != 'dbfId':
entry = entry + str(defaults[jcount])
else:
entry = entry + repr(str(defaults[jcount]))
if jcount != len(chosen_keys)-1:
entry = entry+","
jcount = jcount + 1
entry = entry + ")"
if count != len(json_dict)-1:
entry = entry+","
count = count + 1
if endCount % 100 == 0 and endCount >= 100 and endCount < len(json_dict):
print('processed records '+str(endCount - 99)+' - '+str(endCount))
if endCount + 100 > len(json_dict):
finalState = 'processed records '+str(endCount+1)+' - '+str(len(json_dict))
if endCount == len(json_dict):
print(finalState)
records = records + entry
endCount = endCount + 1
saveChangesString(records,'export.txt')
print('done')
with io.open('cards.collectible.sample.example.json', 'r', encoding='utf-8') as f:
json_to_dict = json.load(f)
f.close()
generateSQL(json_to_dict)
任何帮助都将不胜感激,因为我实际使用的json文件包含2000多个条目,所以我更希望避免手动编辑。谢谢您。
sql表结构代码也是:
-- phpMyAdmin SQL Dump
CREATE TABLE `card` (
`pid` int(10) NOT NULL,
`api_db_id` varchar(50) NOT NULL,
`dbfid` varchar(50) NOT NULL,
`name` varchar(50) NOT NULL,
`cardset_pid` int(10) NOT NULL,
`cardtype` varchar(50) NOT NULL,
`rarity` varchar(20) NOT NULL,
`cost` int(3) NOT NULL,
`attack` int(10) NOT NULL,
`health` int(10) NOT NULL,
`artist` varchar(50) NOT NULL,
`collectible` tinyint(1) NOT NULL,
`class_pid` int(10) NOT NULL,
`race` varchar(50) NOT NULL,
`durability` int(10) NOT NULL,
`armor` int(10) NOT NULL,
`multiclassgroup` varchar(50) NOT NULL,
`text` text NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
ALTER TABLE `card`
ADD PRIMARY KEY (`pid`);
ALTER TABLE `card`
MODIFY `pid` int(10) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=1;
COMMIT;
1条答案
按热度按时间f5emj3cl1#
\xa0
是太空的变种。它是来自文字吗?但是,更相关的是,它不是utf8;它是
latin1
或其他非utf8编码。你需要回到它的来源,把它改成utf8。或者,如果下一步只是将其放入mysql表中,那么您需要告诉客户机的真相——即它是用拉丁文1(而不是utf8)编码的。一旦您完成了这一点,mysql将在测试期间为您处理转换
INSERT
.