Tecdoc queries and structure manual - version 2

From the MstarProject Manual
Revision as of 07:56, 10 June 2013 by Mstar (talk | contribs)

MySQL installation and operation guide for TecDoc database

=====================================


ATTENTION: Optimize your MySQL settings before loading such a large database into MySQL
          - in some cases this reduces the load time from several days to several hours!
          See the section "MySQL optimization" below.

Folder and file structure


   base.full/ - full SQL database dump (all countries, all languages)
   base.noctm/ - full SQL database dump (all countries, all languages) without the column of part delivery availability to different countries - after loading into MySQL it is 4 times smaller than base.full!
   base.europe/ - SQL database dump filtered by the country group "Europe" (code 248), in all languages - after loading into MySQL it is 4 times smaller than base.full!
   base.russia/ - SQL database dump filtered by the country group "Former USSR countries" (code 204) and the Russian language (code 16) - after loading into MySQL it is 5 times smaller than base.full!
   distributive/ - the TecDoc catalog distribution (as DVD images) from which the data was extracted; the crack works fine on Windows XP/Vista/7/2003/2008/2008 R2.
   queries/ - samples of SQL queries for the MySQL database.
   queries/Filter_Country_Language/ - database downsizing script: filters the data by country and/or language. Lets you produce your own analog of the base.russia database from base.full for any country/language combination.
   addons/count_rows - the number of rows in each table of every database version (useful as a check that all data has been loaded).

   addons/models.php - PHP sample that outputs all models of the ALFA ROMEO brand (based on the sample SQL query MODELS.sql)
   addons/SEARCH_TREE4.xlsx - a table with the tree of all spare-part categories.
   addons/tecdoc_schema.png - a flowchart of the main TecDoc tables.
   tecdoc_tables.txt - description of the main TecDoc tables.
   useful_tables.txt - the list of tables used in my sample SQL queries - lets you avoid loading unnecessary tables into the database.

MySQL Database load procedure


1. Download the necessary files

2. Check archive consistency with MD5 checksums

3. Configure and restart MySQL

4. Create an empty database

5. Create the empty tables with the table-structure SQL script

6. Unpack and load the table data from the SQL dump

Loading the SQL dumps into MySQL


The table structure and the data itself come in separate files. The data is further split into several files, one per table:

   tecdoc.Version.BaseType.structure.mysql.StorageType.sql - SQL script that creates the structure of all tables with the full set of indexes needed for fast execution of the sample SQL queries.
   tecdoc.Version.BaseType.structure.mysql.StorageType.nokeys.sql - SQL script that creates the structure of all tables with a minimum set of indexes.
   data/tecdoc.Version.BaseType.data.TableName.7z - archives containing the table data.
   Here StorageType is InnoDB or MyISAM (detailed description below).
   Do not try to load SQL dumps of this size through phpMyAdmin - it is useless.

Use the native console utility mysql (mysql.exe on Windows), which ships with the MySQL DBMS. On Windows run it in the command prompt; on UNIX, in an SSH console. If you have no SSH access to your hosting, try the PHP script BigDump: http://www.ozerov.de/bigdump/ (the console mysql client is still much faster, though)

   To create the empty tables, run the following command in the folder with the SQL script:

mysql -uLogin -pPassword BaseName <tecdoc.Version.BaseType.structure.mysql.StorageType.sql

   To load all .sql dump files from the current folder, run the following command in that folder:

For UNIX:    cat *.sql | mysql -uLogin -pPassword BaseName
For Windows: type *.sql | mysql -uLogin -pPassword BaseName

   If you want to load only some of the tables into MySQL, you do not need to download all the data archives - pick only the ones you need. Also remove the unneeded tables from the table-structure SQL script, or execute it in full and then drop the unneeded empty tables from MySQL.

   You almost certainly will not need ALL TecDoc tables. For instance, to execute all of my sample SQL queries, the 30 tables listed in useful_tables.txt are enough, compared to a total of 77 TecDoc tables! Moreover, if you do not use all of the sample SQL queries, you can pick an even smaller subset of these 30 by looking at their short descriptions in the same useful_tables.txt.

   If you only need crosses from TecDoc (cross links for finding analog spare parts), then the following tables are enough:

ART_LOOKUP
ARTICLES
BRANDS
DES_TEXTS
DESIGNATIONS
SUPPLIERS
   And the following sample SQL queries will suffice:
ART_LOOKUP.sql
ART_LOOKUP2.sql
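As an illustration of how these six tables relate, a cross lookup might follow the join pattern sketched below. This is only a sketch: the column names (ARL_SEARCH_NUMBER, ARL_ART_ID, ART_ID, ART_ARTICLE_NR, ART_SUP_ID, SUP_ID, SUP_BRAND) are assumptions based on TecDoc's usual naming and must be verified against tecdoc_tables.txt; the authoritative versions are the ART_LOOKUP.sql and ART_LOOKUP2.sql samples themselves.

```sql
-- Sketch: find analog articles and their brands for a searched number.
-- Column names are assumed, not taken from this manual - check tecdoc_tables.txt.
SELECT a.ART_ARTICLE_NR, s.SUP_BRAND
FROM ART_LOOKUP al
JOIN ARTICLES  a ON a.ART_ID = al.ARL_ART_ID
JOIN SUPPLIERS s ON s.SUP_ID = a.ART_SUP_ID
WHERE al.ARL_SEARCH_NUMBER = '0986424455'   -- example search number
LIMIT 100;
```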
   Extracting archives
   -------------------------------
   The database is packed with the 7-Zip archiver. You can download it here:

For Windows: http://7-zip.org/download.html (GUI + console version)
For Linux/UNIX (p7zip): http://sourceforge.net/projects/p7zip/files/ (console version only)

   p7zip is available both as prebuilt binaries (for Linux x86: p7zip_*_x86_linux_bin.tar.bz2) and as source code (p7zip_*_src_all.tar.bz2), which can be compiled for any other platform. The standard package repositories of most operating systems already contain a build of p7zip for that OS; the package is usually called p7zip or p7zip-full (either is enough to extract .7z archives).

   On some Linux systems the binary is named 7za instead of 7z - keep this in mind in all the examples below.

   If you want to use p7zip to extract not only the SQL dump but also the pictures, my advice is to use only versions 4.58-4.65: they unpack a large number of small files considerably faster than all other versions (including the latest 9.xx betas), up to 5 times faster! For 7-Zip on Windows there is no difference in extraction speed between versions.

   Extract from the command line like this:

7z x ArchiveName (or the name of the first part of a multi-part archive, with extension .001)

   You can load the SQL dumps into MySQL with 7-Zip extraction "on the fly" - my advice is to do it this way, as it is faster and requires less disk space. To load the data from all downloaded archives in the current folder into MySQL at once, use the following command:

7z x -so "*.7z" | mysql -uLogin -pPassword BaseName

   On Windows, specify the full paths to 7z.exe and mysql.exe, enclosing them in "double quotes" if the paths contain spaces.

   The tables are loaded into MySQL in UTF-8.
   Loading all the tables takes 3-12 hours with the minimum set of indexes and 6-24 hours with the full set. If you choose the minimum option, the additional indexes needed for fast execution of the sample SQL queries can be added later from the queries/KEYS.sql file, although generating them will then take roughly the same 5 hours you saved during the initial table load.

   Archive consistency check (MD5)
   -----------------------------------------------------
   As the archives are large and take a fairly long time to extract (especially when the data is piped straight into MySQL during extraction), first check the downloaded files for consistency against the file dir.md5, so you do not hit an error somewhere in the middle of the process. It contains MD5 checksums for all the archives.

   MD5 checks can be performed with:

1. The built-in Total Commander tools (on Windows)
2. The command-line utility md5sum.

   md5sum for Windows can be downloaded here: http://etree.org/cgi-bin/counter.cgi/software/md5sum.exe
   md5sum for Linux/UNIX is part of GNU Coreutils: http://www.gnu.org/software/coreutils/ - available out of the box in all Linux/UNIX distributions.

   To run the check, type:

md5sum.exe -c dir.md5 & pause

   If the output contains no WARNING line, everything is OK.
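The check can be rehearsed end-to-end on a throwaway file first - a minimal sketch (the file name archive.7z.001 and the folder /tmp/md5demo are made up for the demonstration):

```shell
# Make a scratch folder with a stand-in "archive" and a checksum list for it
# (these play the role of the downloaded archives and dir.md5):
mkdir -p /tmp/md5demo
cd /tmp/md5demo
echo "test data" > archive.7z.001
md5sum archive.7z.001 > dir.md5

# Verify: md5sum -c prints one "OK" line per file and reports
# a WARNING summary if any checksum does not match
md5sum -c dir.md5
```

On Linux md5sum ships with GNU Coreutils; on Windows use the md5sum.exe mentioned above in the same way.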
   MySQL optimization
   -----------------------------
   I recommend using the InnoDB storage engine, as it lets you cache in memory not only indexes but also table data. If needed, you can load everything into MyISAM instead, using the corresponding table-structure SQL script.

   All of the following parameters go into the [mysqld] section of the configuration file:

my.cnf (for Linux/UNIX)
my.ini (for Windows)

   MySQL must be restarted after changing parameters in this file.
   InnoDB storage optimization
   -------------------------------------------
   If you load into InnoDB storage, the main setting affecting the load (as well as further database operation) is:

innodb_buffer_pool_size = 1024M

   It allocates 1024 MB of shared cache for InnoDB data and indexes, including index-creation operations. With this setting MySQL's total memory usage is about 1.5 GB RAM. If that much memory is not available, allocate at least 512 MB for this parameter.

   With InnoDB it is also useful to add the following parameter:

innodb_file_per_table = 1

   It makes MySQL store each InnoDB table in a separate file named "BaseName/TableName.ibd". Otherwise, all the tables live in the shared file "ibdata1", which only ever grows and never shrinks - even when tables and databases are deleted!

   Restart MySQL after changing the settings.
   If you already had InnoDB tables in MySQL before the change, they remain in "ibdata1" - only tables created afterwards get separate files. If you want to move the existing InnoDB tables into separate files, run the SQL command ALTER TABLE BaseName.TableName ENGINE=InnoDB; for each of your existing tables. Then shut down MySQL and delete the file "ibdata1". On the next start MySQL will create a default empty 10 MB "ibdata1", and all the tables will be in separate files.

   MyISAM storage optimization
   --------------------------------------------
   If you load into MyISAM storage, use the following settings:

   Index cache - it may not be used during index generation, but is certainly used for further work with the tables:

key_buffer_size = 128M

   RAM buffer for index-generation operations:

myisam_sort_buffer_size = 512M

   Set a fairly large limit for the temporary index-generation file on disk, otherwise the indexes are built very slowly via the key cache:

myisam_max_sort_file_size = 10G
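Putting the settings of both sections together, the [mysqld] fragment of my.cnf (or my.ini on Windows) might look like this - values as recommended above, to be adjusted to your available RAM:

```ini
[mysqld]
# InnoDB: shared cache for data and indexes (drop to 512M if RAM is scarce)
innodb_buffer_pool_size = 1024M
# store each InnoDB table in its own .ibd file instead of the ever-growing ibdata1
innodb_file_per_table = 1

# MyISAM: index cache for regular table operation
key_buffer_size = 128M
# RAM buffer for index generation
myisam_sort_buffer_size = 512M
# large temp-file limit so indexes are not built via the slow key cache
myisam_max_sort_file_size = 10G
```

Remember to restart MySQL after editing the file.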

   SSH connection drops during data load on Linux/UNIX
   ----------------------------------------------------------------------------------
   There are two ways to deal with the issue:

1. Run the command in the background by appending >/dev/null 2>&1 & to it. E.g.:

7za x "*.7z" >/dev/null 2>&1 &

   You can then detach this background process from the current SSH session with the command: disown %1
   Background process management in the Linux console is described in detail here: http://clinuxworld.com/ubuntu/36-administration/278-running-processes-in-the-background-and-foreground

2. Linux also has the console utility screen. It creates its own virtual session, separate from the current SSH connection. It also lets you create multiple virtual sessions and switch between them as between windows. Moreover, if the SSH session drops, all your processes keep running inside screen. On the next SSH connection you can reattach to any previously opened session, see how things are going, interrupt processes, etc. The main thing is not to forget about your open screen sessions and to close them with the command 'exit' once they are no longer needed. If screen is not installed in your OS, it is available in the standard set of packages or ports of every distribution. The utility is described in detail here: http://help.ubuntu.ru/wiki/screen
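The background-run pattern from option 1 can be tried safely with a stand-in command before using it on a real extraction (a short sleep plays the role of 7za here, and /tmp/bg_demo.log is a made-up file name):

```shell
# Start the "long job" in the background with all output discarded,
# exactly as in option 1 (a sleep stands in for the real 7za extraction):
(sleep 1; echo "extraction finished" > /tmp/bg_demo.log) >/dev/null 2>&1 &
BG_PID=$!

# In a real SSH session you would now detach it with:  disown %1
# For this demonstration we instead wait for it and check the result:
wait "$BG_PID"
cat /tmp/bg_demo.log
```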

   Choosing hosting
   ----------------
   It is recommended to host the database on either a dedicated server or a virtual private server (VPS), as it loads the hosting heavily during data loads and updates, and also requires additional MySQL configuration.

   The most critical resource is the amount of RAM.

Recommended: 2 GB RAM, 1.5 GB of which allocated to MySQL.
Minimum required: 2 GB RAM, 700 MB for MySQL. It will run, but loading/updating the data will take a long time.

   You may also use the database remotely from my MySQL server - in this case the initial load and further updates are simplified, and the requirements for your web hosting drop: in most cases a cheap 1-5 GB shared hosting will be enough for the site.


   Sample SQL-queries
   -----------------------
   If it is not clear how one of the sample SQL queries works, prefix it with the word EXPLAIN, e.g.:

EXPLAIN SELECT * FROM LANGUAGES;

   This often helps in understanding the structure of a complex query: how the tables relate to each other and exactly how MySQL processes the query. The EXPLAIN output usually also shows which additional indexes could speed up a query that runs slowly.

   The 'queries' folder contains samples of 22 popular queries against the resulting MySQL database, with detailed comments:

ART_LOOKUP.sql - Product search (from any manufacturer) by a specified original/non-original commercial number (article)

                It is used to determine the manufacturer, for a subsequent analog search by the combination Article+Manufacturer

ART_LOOKUP2.sql - Analog search (original/non-original/commercial) by any specified article (original/non-original/commercial)

ARTICLES.sql - Displays detailed information for a given non-original product (ART_ID):

   Product Overview
   Criteria (features)
   Additional product information
   Analog articles (original/non-original/commercial)
   File paths for pictures
   File paths for vendors logos
   PDF-files paths
   Applicability to vehicles
   Recommended prices

ARTICLES_SEARCH.sql - Search of non-original products by category with the specified text in the name, suitable for a specified car (TYP_ID)

COUNTRIES.sql - Displays a list of all TecDoc countries and groups of countries

MANUFACTURERS.sql - Displays all car brands

MODELS.sql - Displays a list of models for a specified make of car (MFA_ID)

SEARCH_TREE.sql - Builds the product category tree for a specified make of car (TYP_ID)

SEARCH_TREE2.sql - Displays a list of non-original products for a specified make of car (TYP_ID) and category (STR_ID)

SEARCH_TREE3.sql - Displays a list of non-original products for a specified category (STR_ID)

SEARCH_TREE4.sql - Displays the full category tree for all spare parts

SEARCH_TREE4.xlsx - A table with the result of the same name query for the current version of TecDoc

SUPPLIER_COUNTRIES.sql - Displays the country of the specified non-original manufacturer (SUP_ID)

TYPE_NUMBERS.sql - Displays a list of car types for a specified KBA number (used in Germany)

TYPES.sql - Displays a list of car types for a specified model (MOD_ID)

There is a LIMIT 100 restriction in some of the samples, to display only the first 100 matches. If you want to see all the matches, remove this restriction. These samples are most often used in a chain, feeding the values obtained at each step into the input of the next one:

1. Select via the tree of cars and parts categories:

MANUFACTURERS.sql -> MODELS.sql -> TYPES.sql -> SEARCH_TREE.sql (several times) -> SEARCH_TREE2.sql -> ARTICLES.sql

2. Direct article search (including search by the analog number):

ART_LOOKUP.sql -> ART_LOOKUP2.sql -> ARTICLES.sql

Add-ons:

FUNCTIONS.sql - Additional MySQL functions and examples of their use

KEYS.sql - Adds supplementary indexes to the tables to speed up the sample queries. Only needed if you initially loaded the tables with the minimum set of indexes.