MySQL 8.0 Reference Manual(读书笔记50节--Optimizing SELECT Statements(5))
15 IS NULL Optimization
MySQL can perform the same optimization on col_name IS NULL that it can use for col_name = constant_value. For example, MySQL can use indexes and ranges to search for NULL with IS NULL.
Examples:
SELECT * FROM tbl_name WHERE key_col IS NULL; SELECT * FROM tbl_name WHERE key_col <=> NULL; SELECT * FROM tbl_name WHERE key_col=const1 OR key_col=const2 OR key_col IS NULL;
If a WHERE clause includes a col_name IS NULL condition for a column that is declared as NOT NULL, that expression is optimized away. This optimization does not occur in cases when the column might produce NULL anyway (for example, if it comes from a table on the right side of a LEFT JOIN).
MySQL can also optimize the combination col_name = expr OR col_name IS NULL, a form that is common in resolved subqueries. EXPLAIN shows ref_or_null when this optimization is used.
This optimization can handle one IS NULL for any key part.
Some examples of queries that are optimized, assuming that there is an index on columns a and b of table t2:
SELECT * FROM t1 WHERE t1.a=expr OR t1.a IS NULL; SELECT * FROM t1, t2 WHERE t1.a=t2.a OR t2.a IS NULL; SELECT * FROM t1, t2 WHERE (t1.a=t2.a OR t2.a IS NULL) AND t2.b=t1.b; SELECT * FROM t1, t2 WHERE t1.a=t2.a AND (t2.b=t1.b OR t2.b IS NULL); SELECT * FROM t1, t2 WHERE (t1.a=t2.a AND t2.a IS NULL AND ...) OR (t1.a=t2.a AND t2.a IS NULL AND ...);
ref_or_null works by first doing a read on the reference key, and then a separate search for rows with a NULL key value.
The optimization can handle only one IS NULL level. In the following query, MySQL uses key lookups only on the expression (t1.a=t2.a AND t2.a IS NULL) and is not able to use the key part on b:
SELECT * FROM t1, t2 WHERE (t1.a=t2.a AND t2.a IS NULL) OR (t1.b=t2.b AND t2.b IS NULL);
16 ORDER BY Optimization
This section describes when MySQL can use an index to satisfy an ORDER BY clause, the filesort operation used when an index cannot be used, and execution plan information available from the optimizer about ORDER BY.
An ORDER BY with and without LIMIT may return rows in different orders.
16.1 Use of Indexes to Satisfy ORDER BY
In some cases, MySQL may use an index to satisfy an ORDER BY clause and avoid the extra sorting involved in performing a filesort operation.
The index may also be used even if the ORDER BY does not match the index exactly, as long as all unused portions of the index and all extra ORDER BY columns are constants in the WHERE clause. If the index does not contain all columns accessed by the query, the index is used only if index access is cheaper than other access methods.
Assuming that there is an index on (key_part1, key_part2), the following queries may use the index to resolve【rɪˈzɑːlv 解决(问题或困难);决心;决定;表决;作出决定;作出决议;】 the ORDER BY part. Whether the optimizer actually does so depends on whether reading the index is more efficient than a table scan if columns not in the index must also be read.
• In this query, the index on (key_part1, key_part2) enables the optimizer to avoid sorting:
SELECT * FROM t1 ORDER BY key_part1, key_part2;
However, the query uses SELECT *, which may select more columns than key_part1 and key_part2. In that case, scanning an entire index and looking up table rows to find columns not in the index may be more expensive than scanning the table and sorting the results. If so, the optimizer probably does not use the index. If SELECT * selects only the index columns, the index is used and sorting avoided.
If t1 is an InnoDB table, the table primary key is implicitly【ɪmˈplɪsətli 含蓄地;无保留地;暗中地;不明显地;无疑问地;】 part of the index, and the index can be used to resolve the ORDER BY for this query:
SELECT pk, key_part1, key_part2 FROM t1 ORDER BY key_part1, key_part2;
• In this query, key_part1 is constant, so all rows accessed through the index are in key_part2 order, and an index on (key_part1, key_part2) avoids sorting if the WHERE clause is selective【sɪˈlektɪv 选择性的;有选择的;严格筛选的;认真挑选的;】 enough to make an index range scan cheaper than a table scan:
SELECT * FROM t1 WHERE key_part1 = constant ORDER BY key_part2;
• In the next two queries, whether the index is used is similar to the same queries without DESC shown previously:
SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 DESC; SELECT * FROM t1 WHERE key_part1 = constant ORDER BY key_part2 DESC;
• Two columns in an ORDER BY can sort in the same direction (both ASC, or both DESC) or in opposite directions (one ASC, one DESC). A condition for index use is that the index must have the same homogeneity【ˌhɑːmədʒəˈniːəti 同质;同质性;同种;】, but need not have the same actual direction.
If a query mixes ASC and DESC, the optimizer can use an index on the columns if the index also uses corresponding mixed ascending and descending columns:
SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC;
The optimizer can use an index on (key_part1, key_part2) if key_part1 is descending and key_part2 is ascending. It can also use an index on those columns (with a backward scan) if key_part1 is ascending and key_part2 is descending.
• In the next two queries, key_part1 is compared to a constant. The index is used if the WHERE clause is selective enough to make an index range scan cheaper than a table scan:
SELECT * FROM t1 WHERE key_part1 > constant ORDER BY key_part1 ASC; SELECT * FROM t1 WHERE key_part1 < constant ORDER BY key_part1 DESC;
• In the next query, the ORDER BY does not name key_part1, but all rows selected have a constant key_part1 value, so the index can still be used:
SELECT * FROM t1 WHERE key_part1 = constant1 AND key_part2 > constant2 ORDER BY key_part2;
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it may still use indexes to find the rows that match the WHERE clause. Examples:
• The query uses ORDER BY on different indexes:
SELECT * FROM t1 ORDER BY key1, key2;
• The query uses ORDER BY on nonconsecutive【不相邻的;】 parts of an index:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1_part1, key1_part3;
• The index used to fetch the rows differs from the one used in the ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
• The query uses ORDER BY with an expression that includes terms other than the index column name:
SELECT * FROM t1 ORDER BY ABS(key); SELECT * FROM t1 ORDER BY -key;
• The query joins many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)
• The query has different ORDER BY and GROUP BY expressions.
• There is an index on only a prefix of a column named in the ORDER BY clause. In this case, the index cannot be used to fully resolve the sort order. For example, if only the first 10 bytes of a CHAR(20) column are indexed, the index cannot distinguish values past the 10th byte and a filesort is needed.
• The index does not store rows in order. For example, this is true for a HASH index in a MEMORY table.
Availability of an index for sorting may be affected by the use of column aliases. Suppose that the column t1.a is indexed. In this statement, the name of the column in the select list is a. It refers to t1.a, as does the reference to a in the ORDER BY, so the index on t1.a can be used:
SELECT a FROM t1 ORDER BY a;
In this statement, the name of the column in the select list is also a, but it is the alias name. It refers to ABS(a), as does the reference to a in the ORDER BY, so the index on t1.a cannot be used:
SELECT ABS(a) AS a FROM t1 ORDER BY a;
In the following statement, the ORDER BY refers to a name that is not the name of a column in the select list. But there is a column in t1 named a, so the ORDER BY refers to t1.a and the index on t1.a can be used. (The resulting sort order may be completely different from the order for ABS(a), of course.)
SELECT ABS(a) AS b FROM t1 ORDER BY a;
Previously (MySQL 5.7 and lower), GROUP BY sorted implicitly under certain conditions. In MySQL 8.0, that no longer occurs, so specifying ORDER BY NULL at the end to suppress implicit sorting (as was done previously) is no longer necessary. However, query results may differ from previous MySQL versions. To produce a given sort order, provide an ORDER BY clause.
16.2 Use of filesort to Satisfy ORDER BY
If an index cannot be used to satisfy an ORDER BY clause, MySQL performs a filesort operation that reads table rows and sorts them. A filesort constitutes【ˈkɑːnstɪtuːts 构成;组成;(合法或正式地)成立,设立;被算作;(被认为或看做)是;】 an extra sorting phase in query execution.
To obtain memory for filesort operations, as of MySQL 8.0.12, the optimizer allocates memory buffers incrementally as needed, up to the size indicated by the sort_buffer_size system variable, rather than allocating a fixed amount of sort_buffer_size bytes up front, as was done prior to MySQL 8.0.12. This enables users to set sort_buffer_size to larger values to speed up larger sorts, without concern【kənˈsɜːrn 与…有关;涉及;影响,牵涉(某人);让(某人)担忧;(对…)感兴趣;认为(做某事)重要;】 for excessive memory use for small sorts. (This benefit may not occur for multiple concurrent sorts on Windows, which has a weak multithreaded malloc.)
A filesort operation uses temporary disk files as necessary if the result set is too large to fit in memory. Some types of queries are particularly suited to completely in-memory filesort operations. For example, the optimizer can use filesort to efficiently handle in memory, without temporary files, the ORDER BY operation for queries (and subqueries) of the following form:
SELECT ... FROM single_table ... ORDER BY non_index_column [DESC] LIMIT [M,]N;
Such queries are common in web applications that display only a few rows from a larger result set. Examples:
SELECT col1, ... FROM t1 ... ORDER BY name LIMIT 10; SELECT col1, ... FROM t1 ... ORDER BY RAND() LIMIT 15;
16.3 Influencing ORDER BY Optimization
For slow ORDER BY queries for which filesort is not used, try lowering the max_length_for_sort_data system variable to a value that is appropriate【əˈproʊpriət , əˈproʊprieɪt】 to trigger a filesort. (A symptom of setting the value of this variable too high is a combination of high disk activity and low CPU activity.) This technique applies only before MySQL 8.0.20. As of 8.0.20, max_length_for_sort_data is deprecated due to optimizer changes that make it obsolete and of no effect.
To increase ORDER BY speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, try the following strategies:
• Increase the sort_buffer_size variable value. Ideally【aɪˈdiːəli 完美地;合乎理想地;作为理想的做法;最适合地;绝好地;】, the value should be large enough for the entire result set to fit in the sort buffer (to avoid writes to disk and merge passes).
Take into account that the size of column values stored in the sort buffer is affected by the max_sort_length system variable value. For example, if tuples store values of long string columns and you increase the value of max_sort_length, the size of sort buffer tuples increases as well and may require you to increase sort_buffer_size.
To monitor the number of merge passes (to merge temporary files), check the Sort_merge_passes status variable.
• Increase the read_rnd_buffer_size variable value so that more rows are read at a time.
• Change the tmpdir system variable to point to a dedicated file system with large amounts of free space. The variable value can list several paths that are used in round-robin fashion; you can use this feature to spread the load across several directories. Separate the paths by colon characters (:) on Unix and semicolon characters (;) on Windows. The paths should name directories in file systems located on different physical disks, not different partitions on the same disk.
16.4 ORDER BY Execution Plan Information Available
With EXPLAIN , you can check whether MySQL can use indexes to resolve an ORDER BY clause:
• If the Extra column of EXPLAIN output does not contain Using filesort, the index is used and a filesort is not performed.
• If the Extra column of EXPLAIN output contains Using filesort, the index is not used and a filesort is performed.
In addition, if a filesort is performed, optimizer trace output includes a filesort_summary block. For example:
"filesort_summary": { "rows": 100, "examined_rows": 100, "number_of_tmp_files": 0, "peak_memory_used": 25192, "sort_mode": "<sort_key, packed_additional_fields>" }
peak_memory_used indicates the maximum memory used at any one time during the sort. This is a value up to but not necessarily as large as the value of the sort_buffer_size system variable. Prior to MySQL 8.0.12, the output shows sort_buffer_size instead, indicating the value of sort_buffer_size. (Prior to MySQL 8.0.12, the optimizer always allocates sort_buffer_size bytes for the sort buffer. As of 8.0.12, the optimizer allocates sort-buffer memory incrementally, beginning with a small amount and adding more as necessary, up to sort_buffer_size bytes.)
The sort_mode value provides information about the contents of tuples in the sort buffer:
• <sort_key, rowid="">: This indicates that sort buffer tuples are pairs that contain the sort key value and row ID of the original table row. Tuples are sorted by sort key value and the row ID is used to read the row from the table.
• <sort_key, additional_fields="">: This indicates that sort buffer tuples contain the sort key value and columns referenced by the query. Tuples are sorted by sort key value and column values are read directly from the tuple.
• <sort_key, packed_additional_fields="">: Like the previous variant, but the additional columns are packed tightly together instead of using a fixed-length encoding.
EXPLAIN does not distinguish whether the optimizer does or does not perform a filesort in memory. Use of an in-memory filesort can be seen in optimizer trace output. Look for filesort_priority_queue_optimization.
17 GROUP BY Optimization
The most general way to satisfy a GROUP BY clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive【kənˈsekjətɪv 连续的;连续不断的;】, and then use this temporary table to discover groups and apply aggregate functions (if any). In some cases, MySQL is able to do much better than that and avoid creation of temporary tables by using index access.
The most important preconditions【ˌprikənˈdɪʃənz 前提;先决条件;】 for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order (as is true, for example, for a BTREE index, but not for a HASH index). Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions.
There are two ways to execute a GROUP BY query through index access, as detailed in the following sections. The first method applies the grouping operation together with all range predicates (if any). The second method first performs a range scan, and then groups the resulting tuples.
• Loose Index Scan • Tight Index Scan
Loose Index Scan can also be used in the absence【ˈæbsəns 缺席;缺乏;不存在;不在;】 of GROUP BY under some conditions.
17.1 Loose Index Scan
The most efficient way to process GROUP BY is when an index is used to directly retrieve the grouping columns. With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE). This property enables use of lookup groups in an index without having to consider all keys in the index that satisfy all WHERE conditions. This access method considers only a fraction【ˈfrækʃn 小部分;分数;小数;少量;一点儿;】 of the keys in an index, so it is called a Loose Index Scan. When there is no WHERE clause, a Loose Index Scan reads as many keys as the number of groups, which may be a much smaller number than that of all keys. If the WHERE clause contains range predicates , a Loose Index Scan looks up the first key of each group that satisfies the range conditions, and again reads the smallest possible number of keys. This is possible under the following conditions:
• The query is over a single table.
• The GROUP BY names only columns that form a leftmost prefix of the index and no other columns. (If, instead of GROUP BY, the query has a DISTINCT clause, all distinct attributes refer to columns that form a leftmost prefix of the index.) For example, if a table t1 has an index on (c1,c2,c3), Loose Index Scan is applicable if the query has GROUP BY c1, c2. It is not applicable if the query has GROUP BY c2, c3 (the columns are not a leftmost prefix) or GROUP BY c1, c2, c4 (c4 is not in the index).
• The only aggregate functions used in the select list (if any) are MIN() and MAX(), and all of them refer to the same column. The column must be in the index and must immediately follow the columns in the GROUP BY.
• Any other parts of the index than those from the GROUP BY referenced in the query must be constants (that is, they must be referenced in equalities with constants), except for the argument of MIN() or MAX() functions.
• For columns in the index, full column values must be indexed, not just a prefix. For example, with c1 VARCHAR(20), INDEX (c1(10)), the index uses only a prefix of c1 values and cannot be used for Loose Index Scan.
If Loose Index Scan is applicable to a query, the EXPLAIN output shows Using index for group-by in the Extra column.
Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The Loose Index Scan access method can be used for the following queries:
SELECT c1, c2 FROM t1 GROUP BY c1, c2; SELECT DISTINCT c1, c2 FROM t1; SELECT c1, MIN(c2) FROM t1 GROUP BY c1; SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2; SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2; SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2; SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;
The following queries cannot be executed with this quick select method, for the reasons given:
• There are aggregate functions other than MIN() or MAX():
SELECT c1, SUM(c2) FROM t1 GROUP BY c1;
• The columns in the GROUP BY clause do not form a leftmost prefix of the index:
SELECT c1, c2 FROM t1 GROUP BY c2, c3;
• The query refers to a part of a key that comes after the GROUP BY part, and for which there is no equality with a constant:
SELECT c1, c3 FROM t1 GROUP BY c1, c2;
Were the query to include WHERE c3 = const, Loose Index Scan could be used.
The Loose Index Scan access method can be applied to other forms of aggregate function references in the select list, in addition to the MIN() and MAX() references already supported:
• AVG(DISTINCT), SUM(DISTINCT), and COUNT(DISTINCT) are supported. AVG(DISTINCT) and SUM(DISTINCT) take a single argument. COUNT(DISTINCT) can have more than one column argument.
• There must be no GROUP BY or DISTINCT clause in the query.
• The Loose Index Scan limitations described previously still apply.
Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The Loose Index Scan access method can be used for the following queries:
SELECT COUNT(DISTINCT c1), SUM(DISTINCT c1) FROM t1; SELECT COUNT(DISTINCT c1, c2), COUNT(DISTINCT c2, c1) FROM t1;
17.2 Tight Index Scan
A Tight Index Scan may be either a full index scan or a range index scan, depending on the query conditions.
When the conditions for a Loose Index Scan are not met, it still may be possible to avoid creation of temporary tables for GROUP BY queries. If there are range conditions in the WHERE clause, this method reads only the keys that satisfy these conditions. Otherwise, it performs an index scan. Because this method reads all keys in each range defined by the WHERE clause, or scans the whole index if there are no range conditions, it is called a Tight Index Scan. With a Tight Index Scan, the grouping operation is performed only after all keys that satisfy the range conditions have been found.
For this method to work, it is sufficient that there be a constant equality condition for all columns in a query referring to parts of the key coming before or in between parts of the GROUP BY key. The constants from the equality conditions fill in any “gaps” in the search keys so that it is possible to form complete prefixes of the index. These index prefixes then can be used for index lookups. If the GROUP BY result requires sorting, and it is possible to form search keys that are prefixes of the index, MySQL also avoids extra sorting operations because searching with prefixes in an ordered index already retrieves all the keys in order.
Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The following queries do not work with the Loose Index Scan access method described previously, but still work with the Tight Index Scan access method.
• There is a gap in the GROUP BY, but it is covered by the condition c2 = 'a':
SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;
• The GROUP BY does not begin with the first part of the key, but there is a condition that provides a constant for that part:
SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;