How to organise million of records in mysql?

ar5n3qh5  于 2022-12-22  发布在  Mysql
关注(0)|答案(1)|浏览(136)

I am almost new in database. I want to organize my data base I really have no idea how to do it. I want to get the data or insert it in the fastest way. So organizing the table is important for me.
My main table consist more than 6 or 6.3 million records. So the time will be valuable. I should mention that this data will store in mysql and I want to use it in my web application. So the speed is important. every week I renew the main data. Could you do me favor and tell me how to broke it to several tables which could be efficient and fast?
the main data which will import in mysql which is in text file.

index | maket_indicator|country|commodity|year|month|Thousand Ton| sign
1   STOCK   U.K.    Soybeans    2013    1   141     *
2   STOCK   U.K.    Soybeans    2013    2   129     *
3   STOCK   U.K.    Soybeans    2013    3   82.2    *
4   STOCK   U.K.    Soybeans    2013    4   127.6   *
5   STOCK   U.K.    Soybeans    2013    5   120.2   *
6   STOCK   U.K.    Soybeans    2013    6   136     *
7   STOCK   U.K.    Soybeans    2013    7   134.6   *
8   STOCK   U.K.    Soybeans    2013    8   113.2   *
9   STOCK   U.K.    Soybeans    2013    9   135     *
4186    STOCK   U.S.A.  Soybeans    2017    8   14852   *
4187    STOCK   U.S.A.  Soybeans    2017    9   8208.1   
4188    STOCK   U.S.A.  Soybeans    2017    10  99900   *
4189    STOCK   U.S.A.  Soybeans    2017    11  99900   *
4190    STOCK   U.S.A.  Soybeans    2017    12  86019.6  
4191    STOCK   U.S.A.  Soybeans    2018    1   78400   *
4192    STOCK   U.S.A.  Soybeans    2018    2   67500   *
4193    STOCK   U.S.A.  Soybeans    2018    3   57405.8  
4194    STOCK   U.S.A.  Soybeans    2018    4   50540   *
4195    STOCK   U.S.A.  Soybeans    2018    5   43722.8 *
4196    STOCK   U.S.A.  Soybeans    2018    6   33184.7
  • this sign is something that we know what is it. maybe the table has it or maybe not.

market_indicator,country and commodity repeat a lot. I think that maybe I should separated each of them to 3 separated tables and has a separated index. like this:
table 1

index | maket_indicator
a1  STOCK   
a2  STOCK   
a3  STOCK   
a4  STOCK

table 2

index | country
b1  U.K.    
b2  U.K.    
b3  U.K.    
b4  U.K.

table 3

index | commodity
    c1      Soybeans    
    c2      Soybeans    
    c3      Soybeans    
    c4      Soybeans

then I refer them to the main table

index | mcc     |year|month|Thousand Ton| sign
1   a1b1c1      2013    1   141     *
2   a1b1c1      2013    2   129     *
3   a1b1c1      2013    3   82.2    *
4   a1b1c1      2013    4   127.6   *
5   a1b1c1      2013    5   120.2   *
6   a1b1c1      2013    6   136     *
7   a1b1c1      2013    7   134.6   *
8   a1b1c1      2013    8   113.2   *
9   a1b1c1      2013    9   135     *

please guide me.

7gcisfzg

7gcisfzg1#

every week I renew the main data
The best way to do that is to build a new table then do

RENAME TABLE main_table TO old_table,
             new_table TO main_table;
DROP old_table;

This avoids downtime. And it gives you a chance to validate and cleanse the data before putting it in place.
broke into several tables
Don't. This is usually a bad idea. (Provide more details; you may have an exception.)
several tables
Your layout implies "normalization". Yes, this is often a good idea. However, since your strings are quite short, there is not much advantage.
Based on the data indicated implies a single table would be under 1GB. With suitable indexes, most activity will be quite efficient.
Packing "a1b2c1" will be a nightmare if you need to split it apart; do not do that.
Suggestions:

  • Consider an ENUM for "STOCK", etc. It works like a string, but takes only 1 byte.
  • Use the standard country_code CHAR(2) CHARACTER SET ascii with 'US', 'UK', etc. Only 2 bytes -- same as your "b1", while providing the entire information.
  • Use TINYINT , SMALLINT , etc as appropriate for integers.
  • Use a suitable DECIMAL for Ton . But first, discover how many decimal places and leading digits you need.

To do the Normalization, LOAD DATA into a temp table, then do some SQL to convert to the ids. Then do the RENAME mentioned above. The LOAD and normalization might add up to a minute; the RENAME should take less than a second.
We have not looked at your SELECTs; they may need some INDEXes and/or Summary Tables.

相关问题