I am almost new in database. I want to organize my data base I really have no idea how to do it. I want to get the data or insert it in the fastest way. So organizing the table is important for me.
My main table consist more than 6 or 6.3 million records. So the time will be valuable. I should mention that this data will store in mysql and I want to use it in my web application. So the speed is important. every week I renew the main data. Could you do me favor and tell me how to broke it to several tables which could be efficient and fast?
the main data which will import in mysql which is in text file.
index | maket_indicator|country|commodity|year|month|Thousand Ton| sign
1 STOCK U.K. Soybeans 2013 1 141 *
2 STOCK U.K. Soybeans 2013 2 129 *
3 STOCK U.K. Soybeans 2013 3 82.2 *
4 STOCK U.K. Soybeans 2013 4 127.6 *
5 STOCK U.K. Soybeans 2013 5 120.2 *
6 STOCK U.K. Soybeans 2013 6 136 *
7 STOCK U.K. Soybeans 2013 7 134.6 *
8 STOCK U.K. Soybeans 2013 8 113.2 *
9 STOCK U.K. Soybeans 2013 9 135 *
4186 STOCK U.S.A. Soybeans 2017 8 14852 *
4187 STOCK U.S.A. Soybeans 2017 9 8208.1
4188 STOCK U.S.A. Soybeans 2017 10 99900 *
4189 STOCK U.S.A. Soybeans 2017 11 99900 *
4190 STOCK U.S.A. Soybeans 2017 12 86019.6
4191 STOCK U.S.A. Soybeans 2018 1 78400 *
4192 STOCK U.S.A. Soybeans 2018 2 67500 *
4193 STOCK U.S.A. Soybeans 2018 3 57405.8
4194 STOCK U.S.A. Soybeans 2018 4 50540 *
4195 STOCK U.S.A. Soybeans 2018 5 43722.8 *
4196 STOCK U.S.A. Soybeans 2018 6 33184.7
- this sign is something that we know what is it. maybe the table has it or maybe not.
market_indicator,country and commodity repeat a lot. I think that maybe I should separated each of them to 3 separated tables and has a separated index. like this:
table 1
index | maket_indicator
a1 STOCK
a2 STOCK
a3 STOCK
a4 STOCK
table 2
index | country
b1 U.K.
b2 U.K.
b3 U.K.
b4 U.K.
table 3
index | commodity
c1 Soybeans
c2 Soybeans
c3 Soybeans
c4 Soybeans
then I refer them to the main table
index | mcc |year|month|Thousand Ton| sign
1 a1b1c1 2013 1 141 *
2 a1b1c1 2013 2 129 *
3 a1b1c1 2013 3 82.2 *
4 a1b1c1 2013 4 127.6 *
5 a1b1c1 2013 5 120.2 *
6 a1b1c1 2013 6 136 *
7 a1b1c1 2013 7 134.6 *
8 a1b1c1 2013 8 113.2 *
9 a1b1c1 2013 9 135 *
please guide me.
1条答案
按热度按时间7gcisfzg1#
every week I renew the main data
The best way to do that is to build a new table then do
This avoids downtime. And it gives you a chance to validate and cleanse the data before putting it in place.
broke into several tables
Don't. This is usually a bad idea. (Provide more details; you may have an exception.)
several tables
Your layout implies "normalization". Yes, this is often a good idea. However, since your strings are quite short, there is not much advantage.
Based on the data indicated implies a single table would be under 1GB. With suitable indexes, most activity will be quite efficient.
Packing "a1b2c1" will be a nightmare if you need to split it apart; do not do that.
Suggestions:
ENUM
for "STOCK", etc. It works like a string, but takes only 1 byte.country_code CHAR(2) CHARACTER SET ascii
with 'US', 'UK', etc. Only 2 bytes -- same as your "b1", while providing the entire information.TINYINT
,SMALLINT
, etc as appropriate for integers.DECIMAL
forTon
. But first, discover how many decimal places and leading digits you need.To do the Normalization,
LOAD DATA
into a temp table, then do some SQL to convert to the ids. Then do theRENAME
mentioned above. The LOAD and normalization might add up to a minute; the RENAME should take less than a second.We have not looked at your SELECTs; they may need some INDEXes and/or Summary Tables.