MySQL 8.0 Reference Manual (Reading Notes, Section 53 -- Optimizing SQL Statements)
1. Optimizing INFORMATION_SCHEMA Queries
Applications that monitor databases may make frequent【ˈfriːkwənt , friˈkwent 频繁的;经常发生的;】 use of INFORMATION_SCHEMA tables. To write queries for these tables most efficiently, use the following general【ˈdʒenrəl 一般的;总的;普遍的;大致的,大概的(方向或地区);常规的;大体的;整体的;正常的;全体的;首席的;笼统的;概括性的;非专门的;】 guidelines:
• Try to query only INFORMATION_SCHEMA tables that are views on data dictionary tables.
• Try to query only for static metadata. Selecting columns or using retrieval conditions for dynamic metadata along with static metadata adds overhead to process the dynamic metadata.
These INFORMATION_SCHEMA tables are implemented【ˈɪmplɪmentɪd 实施;执行;贯彻;使生效;】 as views on data dictionary tables, so queries on them retrieve information from the data dictionary:
CHARACTER_SETS  CHECK_CONSTRAINTS  COLLATIONS  COLLATION_CHARACTER_SET_APPLICABILITY
COLUMNS  EVENTS  FILES  INNODB_COLUMNS  INNODB_DATAFILES  INNODB_FIELDS
INNODB_FOREIGN  INNODB_FOREIGN_COLS  INNODB_INDEXES  INNODB_TABLES  INNODB_TABLESPACES
INNODB_TABLESPACES_BRIEF  INNODB_TABLESTATS  KEY_COLUMN_USAGE  PARAMETERS  PARTITIONS
REFERENTIAL_CONSTRAINTS  RESOURCE_GROUPS  ROUTINES  SCHEMATA  STATISTICS
TABLES  TABLE_CONSTRAINTS  TRIGGERS  VIEWS  VIEW_ROUTINE_USAGE  VIEW_TABLE_USAGE
Some types of values, even for a non-view INFORMATION_SCHEMA table, are retrieved by lookups from the data dictionary. This includes values such as database and table names, table types, and storage engines.
Some INFORMATION_SCHEMA tables contain columns that provide table statistics:
STATISTICS.CARDINALITY  TABLES.AUTO_INCREMENT  TABLES.AVG_ROW_LENGTH  TABLES.CHECKSUM
TABLES.CHECK_TIME  TABLES.CREATE_TIME  TABLES.DATA_FREE  TABLES.DATA_LENGTH
TABLES.INDEX_LENGTH  TABLES.MAX_DATA_LENGTH  TABLES.TABLE_ROWS  TABLES.UPDATE_TIME
Those columns represent【ˌreprɪˈzent 代表,表示;(在竞赛或体育赛事中)代表(国家或地区);(标志或符号)意味着,代表着,标志着;描绘,(形象地)表现,描写;代理(个人或团体);代表(个人或团体)出席;使】 dynamic table metadata; that is, information that changes as table contents change.
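To make the distinction concrete, here is a minimal sketch (the world schema and its tables are only placeholders): the first query reads static metadata straight from the data dictionary, while the second also selects statistics columns and therefore involves cached or storage-engine statistics.

-- Static metadata only: served from data dictionary tables.
SELECT TABLE_NAME, ENGINE, TABLE_COLLATION
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'world';

-- Adds dynamic statistics columns: values come from the statistics cache
-- or, if the cache has expired, from the storage engine.
SELECT TABLE_NAME, TABLE_ROWS, AVG_ROW_LENGTH
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'world';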
By default, MySQL retrieves cached values for those columns from the mysql.index_stats and mysql.table_stats dictionary tables when the columns are queried, which is more efficient than retrieving statistics directly from the storage engine. If cached statistics are not available or have expired, MySQL retrieves the latest statistics from the storage engine and caches them in the mysql.index_stats and mysql.table_stats dictionary tables. Subsequent【ˈsʌbsɪkwənt 随后的;之后的;后来的;接后的;】 queries retrieve the cached statistics until the cached statistics expire. Neither a server restart nor the first opening of the mysql.index_stats and mysql.table_stats tables updates cached statistics automatically.
The information_schema_stats_expiry session variable defines the period of time before cached statistics expire. The default is 86400 seconds (24 hours), but the time period can be extended to as much as one year.
To update cached values at any time for a given table, use ANALYZE TABLE.
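For example, assuming a placeholder table world.city, the cached statistics can be refreshed and then read back roughly as follows:

-- Refresh the cached statistics for this table immediately.
ANALYZE TABLE world.city;

-- Subsequent reads of the statistics columns use the refreshed cache.
SELECT TABLE_ROWS, AVG_ROW_LENGTH, UPDATE_TIME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'world' AND TABLE_NAME = 'city';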
Querying statistics columns does not store or update statistics in the mysql.index_stats and mysql.table_stats dictionary tables under these circumstances【ˈsɜrkəmˌstænsəz , ˈsɜrkəmˌstænsɪz 环境;条件;状况;境况;境遇;(尤指)经济状况;命运;客观环境;】:
• When cached statistics have not expired.
• When information_schema_stats_expiry is set to 0.
• When the server is in read_only, super_read_only, transaction_read_only, or innodb_read_only mode.
• When the query also fetches Performance Schema data.
information_schema_stats_expiry is a session variable, and each client session can define its own expiration value. Statistics that are retrieved from the storage engine and cached by one session are available to other sessions.
【If the innodb_read_only system variable is enabled, ANALYZE TABLE may fail because it cannot update statistics tables in the data dictionary, which use InnoDB. For ANALYZE TABLE operations that update the key distribution, failure may occur even if the operation updates the table itself (for example, if it is a MyISAM table). To obtain the updated distribution statistics, set information_schema_stats_expiry=0.】
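A minimal sketch of that workaround (the table names are placeholders); because information_schema_stats_expiry is a session variable, the setting affects only the current session:

-- Bypass the statistics cache for this session only.
SET SESSION information_schema_stats_expiry = 0;

-- Index statistics now come directly from the storage engine.
SELECT INDEX_NAME, CARDINALITY
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = 'world' AND TABLE_NAME = 'city';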
For INFORMATION_SCHEMA tables implemented as views on data dictionary tables, indexes on the underlying data dictionary tables permit the optimizer to construct efficient query execution plans. To see the choices made by the optimizer, use EXPLAIN. To also see the query used by the server to execute an INFORMATION_SCHEMA query, use SHOW WARNINGS immediately following EXPLAIN.
Consider this statement, which identifies【aɪˈdentɪfaɪz 确认;发现;鉴定;显示;找到;认出;说明身份;】 collations for the utf8mb4 character set:
mysql> SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
       WHERE CHARACTER_SET_NAME = 'utf8mb4';
+----------------------------+
| COLLATION_NAME             |
+----------------------------+
| utf8mb4_general_ci         |
| utf8mb4_bin                |
| utf8mb4_unicode_ci         |
| utf8mb4_icelandic_ci       |
| utf8mb4_latvian_ci         |
| utf8mb4_romanian_ci        |
| utf8mb4_slovenian_ci       |
...
How does the server process that statement? To find out, use EXPLAIN:
mysql> EXPLAIN SELECT COLLATION_NAME FROM
       INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
       WHERE CHARACTER_SET_NAME = 'utf8mb4'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cs
   partitions: NULL
         type: const
possible_keys: PRIMARY,name
          key: name
      key_len: 194
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: col
   partitions: NULL
         type: ref
possible_keys: character_set_id
          key: character_set_id
      key_len: 8
          ref: const
         rows: 68
     filtered: 100.00
        Extra: NULL
2 rows in set, 1 warning (0.01 sec)
To see the query used to satisfy that statement, use SHOW WARNINGS:
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `mysql`.`col`.`name` AS `COLLATION_NAME` from `mysql`.`character_sets` `cs` join `mysql`.`collations` `col` where ((`mysql`.`col`.`character_set_id` = '45') and ('utf8mb4' = 'utf8mb4'))
As indicated by SHOW WARNINGS, the server handles the query on COLLATION_CHARACTER_SET_APPLICABILITY as a query on the character_sets and collations data dictionary tables in the mysql system database.
2. Optimizing Performance Schema Queries
Applications that monitor databases may make frequent use of Performance Schema tables. To write queries for these tables most efficiently, take advantage of their indexes. For example, include a WHERE clause that restricts retrieved rows based on comparison to specific values in an indexed column.
Most Performance Schema tables have indexes. Tables that do not are those that normally contain few rows or are unlikely to be queried frequently. Performance Schema indexes give the optimizer access to execution plans other than full table scans. These indexes also improve performance for related objects, such as sys schema views that use those tables.
To see whether a given Performance Schema table has indexes and what they are, use SHOW INDEX or SHOW CREATE TABLE:
mysql> SHOW INDEX FROM performance_schema.accounts\G
*************************** 1. row ***************************
        Table: accounts
   Non_unique: 0
     Key_name: ACCOUNT
 Seq_in_index: 1
  Column_name: USER
    Collation: NULL
  Cardinality: NULL
     Sub_part: NULL
       Packed: NULL
         Null: YES
   Index_type: HASH
      Comment:
Index_comment:
      Visible: YES
*************************** 2. row ***************************
        Table: accounts
   Non_unique: 0
     Key_name: ACCOUNT
 Seq_in_index: 2
  Column_name: HOST
    Collation: NULL
  Cardinality: NULL
     Sub_part: NULL
       Packed: NULL
         Null: YES
   Index_type: HASH
      Comment:
Index_comment:
      Visible: YES

mysql> SHOW CREATE TABLE performance_schema.rwlock_instances\G
*************************** 1. row ***************************
       Table: rwlock_instances
Create Table: CREATE TABLE `rwlock_instances` (
  `NAME` varchar(128) NOT NULL,
  `OBJECT_INSTANCE_BEGIN` bigint(20) unsigned NOT NULL,
  `WRITE_LOCKED_BY_THREAD_ID` bigint(20) unsigned DEFAULT NULL,
  `READ_LOCKED_BY_COUNT` int(10) unsigned NOT NULL,
  PRIMARY KEY (`OBJECT_INSTANCE_BEGIN`),
  KEY `NAME` (`NAME`),
  KEY `WRITE_LOCKED_BY_THREAD_ID` (`WRITE_LOCKED_BY_THREAD_ID`)
) ENGINE=PERFORMANCE_SCHEMA DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
To see the execution plan for a Performance Schema query and whether it uses any indexes, use EXPLAIN:
mysql> EXPLAIN SELECT * FROM performance_schema.accounts
       WHERE (USER,HOST) = ('root','localhost')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: accounts
   partitions: NULL
         type: const
possible_keys: ACCOUNT
          key: ACCOUNT
      key_len: 278
          ref: const,const
         rows: 1
     filtered: 100.00
        Extra: NULL
The EXPLAIN output indicates that the optimizer uses the accounts table ACCOUNT index that comprises the USER and HOST columns.
Performance Schema indexes are virtual: They are a construct of the Performance Schema storage engine and use no memory or disk storage. The Performance Schema reports index information to the optimizer so that it can construct efficient execution plans. The Performance Schema in turn uses optimizer information about what to look for (for example, a particular key value), so that it can perform efficient lookups without building actual index structures. This implementation provides two important benefits:
• It entirely avoids the maintenance cost normally incurred for tables that undergo frequent updates.
• It reduces at an early stage of query execution the amount of data retrieved. For conditions on the indexed columns, the Performance Schema efficiently returns only table rows that satisfy the query conditions. Without an index, the Performance Schema would return all rows in the table, requiring that the optimizer later evaluate the conditions against each row to produce the final result.
Performance Schema indexes are predefined and cannot be dropped, added, or altered.
Performance Schema indexes are similar to hash indexes. For example:
• They are used only for equality comparisons that use the = or <=> operators.
• They are unordered. If a query result must have specific row ordering characteristics, include an ORDER BY clause.
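For example, a query like the following sketch against the standard events_statements_summary_by_digest table uses the virtual index only for the equality test and needs an explicit ORDER BY for ordering (the schema name is a placeholder):

SELECT SCHEMA_NAME, DIGEST_TEXT, COUNT_STAR
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'world'     -- equality comparison; can use the virtual index
ORDER BY COUNT_STAR DESC        -- results are unordered unless sorted explicitly
LIMIT 10;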
3. Optimizing Data Change Statements
This section explains how to speed up data change statements: INSERT, UPDATE, and DELETE. Traditional OLTP applications and modern web applications typically do many small data change operations, where concurrency is vital【ˈvaɪtl 至关重要的,必不可少的;生命的;生命统计的,生死统计的;生气勃勃的,充满生机的;致命的,生死攸关的;对…极重要的;维持生命所必需的;热情洋溢的;】. Data analysis and reporting applications typically run data change operations that affect many rows at once, where the main considerations are the I/O to write large amounts of data and to keep indexes up to date. For inserting and updating large volumes of data (known in the industry as ETL, for “extract-transform-load”), sometimes you use other SQL statements or external commands that mimic the effects of INSERT, UPDATE, and DELETE statements.
3.1 Optimizing INSERT Statements
To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.
The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:
• Connecting: (3)
• Sending query to server: (2)
• Parsing query: (2)
• Inserting row: (1 × size of row)
• Inserting indexes: (1 × number of indexes)
• Closing: (1)
This does not take into consideration the initial【ɪˈnɪʃl 始的;最初的;第一的;】 overhead to open tables, which is done once for each concurrently running query.
The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.
You can use the following methods to speed up inserts:
• If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster. (A sketch of this and the next point appears after this list.)
• When loading a table from a text file, use LOAD DATA. This is usually 20 times faster than using INSERT statements.
• Take advantage of the fact that columns have default values. Insert values explicitly【ɪkˈsplɪsətli 明确地;明白地;】 only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.
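As a sketch of the first two points above (table and file names are placeholders, and LOAD DATA is subject to the usual privilege and secure_file_priv requirements):

-- One multiple-row INSERT instead of three single-row statements.
INSERT INTO t1 (id, name) VALUES
  (1, 'a'),
  (2, 'b'),
  (3, 'c');

-- Bulk load from a text file, typically much faster than equivalent INSERTs.
LOAD DATA INFILE '/tmp/t1.txt' INTO TABLE t1;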
3.2 Optimizing UPDATE Statements
An update statement is optimized like a SELECT query with the additional overhead of a write. The speed of the write depends on the amount of data being updated and the number of indexes that are updated. Indexes that are not changed do not get updated.
Another way to get fast updates is to delay updates and then do many updates in a row later. Performing multiple updates together is much quicker than doing one at a time if you lock the table.
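A rough sketch of that batched approach, with placeholder table and column names:

-- Several delayed updates applied together under one table lock.
LOCK TABLES t1 WRITE;
UPDATE t1 SET counter = counter + 1 WHERE id = 10;
UPDATE t1 SET counter = counter + 1 WHERE id = 11;
UPDATE t1 SET counter = counter + 1 WHERE id = 12;
UNLOCK TABLES;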
For a MyISAM table that uses dynamic row format, updating a row to a longer total length may split the row. If you do this often, it is very important to use OPTIMIZE TABLE occasionally【əˈkeɪʒnəli 偶尔;偶然;有时候;偶尔地;】.
3.3 Optimizing DELETE Statements
The time required to delete individual rows in a MyISAM table is exactly proportional【prəˈpɔːrʃənl 成比例的;相称的;均衡的;】 to the number of indexes. To delete rows more quickly, you can increase the size of the key cache by increasing the key_buffer_size system variable.
To delete all rows from a MyISAM table, TRUNCATE TABLE tbl_name is faster than DELETE FROM tbl_name. Truncate operations are not transaction-safe; an error occurs when attempting one in the course of an active transaction or active table lock.
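For example (t1 is a placeholder MyISAM table; the buffer size is only illustrative):

-- A larger key cache speeds up index maintenance during bulk deletes.
SET GLOBAL key_buffer_size = 64 * 1024 * 1024;

-- Emptying the table: TRUNCATE is faster than DELETE but is not transaction-safe.
TRUNCATE TABLE t1;
-- DELETE FROM t1;   -- slower row-by-row alternative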
4. Optimizing Database Privileges
The more complex your privilege setup, the more overhead【ˌoʊvərˈhed , ˈoʊvərhed 高架的;管理费用的;经费的;头上方的;地面以上的;】 applies to all SQL statements. Simplifying the privileges established by GRANT statements enables MySQL to reduce permission-checking overhead【ˌoʊvərˈhed , ˈoʊvərhed 开销;经常费用;经常开支;(尤指飞机的)顶舱;用于高射投影器的幻灯片;】 when clients execute statements. For example, if you do not grant any table-level or column-level privileges, the server need not ever check the contents of the tables_priv and columns_priv tables. Similarly, if you place no resource limits on any accounts, the server does not have to perform resource counting. If you have a very high statement-processing load, consider using a simplified grant structure【ˈstrʌktʃər 结构;构造;体系;(尤指)建筑物;结构体;精心组织;周密安排;】 to reduce permission-checking overhead.
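For instance, a single database-level grant such as this sketch (the account and schema names are invented) avoids creating rows in tables_priv and columns_priv, whereas per-table or per-column grants do not:

-- One database-level grant: no table- or column-level privilege rows to check.
GRANT SELECT, INSERT, UPDATE, DELETE ON app_db.* TO 'app_user'@'localhost';

-- These finer-grained alternatives would force tables_priv / columns_priv lookups:
-- GRANT SELECT ON app_db.orders TO 'app_user'@'localhost';
-- GRANT SELECT (price) ON app_db.orders TO 'app_user'@'localhost';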
5. Other Optimization Tips
This section lists a number of miscellaneous【ˌmɪsəˈleɪniəs 各种各样的;混杂的;】 tips for improving query processing speed:
• If your application makes several database requests to perform related updates, combining the statements into a stored routine can help performance. Similarly, if your application computes a single result based on several column values or large volumes of data, combining the computation into a loadable function can help performance. The resulting fast database operations are then available to be reused by other queries, applications, and even code written in different programming languages. (A sketch of such a routine appears at the end of this list.)
• To fix any compression issues that occur with ARCHIVE tables, use OPTIMIZE TABLE.
• If possible, classify【ˈklæsɪfaɪ 分类;划分;将…分类;界定;】 reports as “live” or as “statistical”, where data needed for statistical reports is created only from summary tables that are generated periodically from the live data.
• If you have data that does not conform well to a rows-and-columns table structure, you can pack【pæk 包装,装(箱);(把…)打包,(把…)装箱;收拾(行李);堆积,压实;挤满,塞满;备有,含有;(用某物)保存,保藏;佩带,携带(枪支);停止,结束;加工包装 (食品),把】 and store data into a BLOB column. In this case, you must provide code in your application to pack and unpack information, but this might save I/O operations to read and write the sets of related values.
• With Web servers, store images and other binary assets as files, with the path name stored in the database rather than the file itself. Most Web servers are better at caching files than database contents, so using files is generally faster. (Although you must handle backups and storage issues yourself in this case.)
• If you need really high speed, look at the low-level MySQL interfaces. For example, by accessing the MySQL InnoDB or MyISAM storage engine directly, you could get a substantial speed increase compared to using the SQL interface.
Similarly, for databases using the NDBCLUSTER storage engine, you may wish to investigate possible use of the NDB API.
• Replication can provide a performance benefit for some operations. You can distribute【dɪˈstrɪbjuːt 分配;使分布;分发;分销;分散;使散开;】 client retrievals among replicas to split up【splɪt ʌp (使)分手;分开;(使)离婚;拆分;(使)离散;(把…)分成若干较小部分;】 the load. To avoid slowing down the source while making backups, you can make backups using a replica.
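As an illustration of the first tip in this list, several related updates might be wrapped in a stored routine along these lines (a sketch only; the procedure, table, and column names are invented for the example):

DELIMITER //
CREATE PROCEDURE transfer_credit(IN from_id INT, IN to_id INT, IN amount DECIMAL(10,2))
BEGIN
  -- Two related updates issued in a single call, one round trip from the client.
  START TRANSACTION;
  UPDATE balances SET balance = balance - amount WHERE id = from_id;
  UPDATE balances SET balance = balance + amount WHERE id = to_id;
  COMMIT;
END //
DELIMITER ;

-- Reusable from any client, application, or programming language:
CALL transfer_credit(1, 2, 25.00);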