故障描述
前几天,一个mysql数据库运维同事,在生产上用insert into select from语句,在生产上备份了一张表,结果将备份表全表锁住了,业务影响了大约10分钟。
看到这个语句,我第一反应就是select语句也能锁表,可是生产上的故障,证明确实锁表了。所以,需要将insert into select from获取锁的情况彻底研究明白。
故障复盘
创建模拟表和模拟记录
[root@localhost] 17:39:55 [testdb1]>show create table t_test_1\G;*************************** 1. row *************************** Table: t_test_1Create Table: CREATE TABLE `t_test_1` ( `id` int(11) NOT NULL, `name` char(10) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '', PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci1 row in set (0.00 sec)[root@localhost] 17:40:30 [testdb1]>select * from t_test_1;+----+-------+| id | name |+----+-------+| 1 | trest || 2 | e99e || 3 | test || 4 | fresd || 5 | fsfa |+----+-------+5 rows in set (0.00 sec)[root@localhost] 17:40:17 [testdb1]>show create table t_test_2\G;*************************** 1. row *************************** Table: t_test_2Create Table: CREATE TABLE `t_test_2` ( `id` int(11) NOT NULL, `name` char(10) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '', PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci1 row in set (0.00 sec)
模拟insert into select操作
[root@localhost] 17:41:32 [testdb1]>begin;Query OK, 0 rows affected (0.00 sec)[root@localhost] 17:41:33 [testdb1]>insert into t_test_2 select * from t_test_1 where name like 'trest';Query OK, 1 row affected (0.02 sec)Records: 1 Duplicates: 0 Warnings: 0
获取innodb的lock信息
[root@localhost] 17:42:00 [(none)]>show engine innodb status\G;TRANSACTIONS------------Trx id counter 182551Purge done for trx's n:o < 182551 undo n:o < 0 state: running but idleHistory list length 0LIST OF TRANSACTIONS FOR EACH SESSION:---TRANSACTION 421524582451936, not started0 lock struct(s), heap size 1136, 0 row lock(s)---TRANSACTION 182546, ACTIVE 20 sec3 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 1MySQL thread id 7, OS thread handle 140049254979328, query id 82 localhost root
从innodb引擎获取的lock信息,太少了,只能看到有3 lock struct(s),6 row lock(s),不清楚那表申请的锁,申请什么类型的锁,不知道这些信息,就研究不明白故障到底怎么发生的。
幸运的是,mysql数据库提供一个参数innodb_status_output_locks,可以打印更详细的lock信息。
启用innodb_status_output_locks参数
启用innodb_status_output_locks参数,默认是不开启,所以需要开启。
[root@localhost] 17:31:12 [(none)]>show variables like 'innodb_status_output_locks';+----------------------------+-------+| Variable_name | Value |+----------------------------+-------+| innodb_status_output_locks | OFF |+----------------------------+-------+1 row in set (0.00 sec)[root@localhost] 17:47:41 [(none)]>set global innodb_status_output_locks=on;Query OK, 0 rows affected (0.00 sec)[root@localhost] 17:47:41 [(none)]>show variables like 'innodb_status_output_locks';+----------------------------+-------+| Variable_name | Value |+----------------------------+-------+| innodb_status_output_locks | ON |+----------------------------+-------+1 row in set (0.00 sec)
获取innodb的lock详细信息
下面是开启innodb_status_output_locks参数之后,获取的详细lock信息
[root@localhost] 17:48:28 [(none)]>show engine innodb status\G;TRANSACTIONS------------Trx id counter 182552Purge done for trx's n:o < 182551 undo n:o < 0 state: running but idleHistory list length 0LIST OF TRANSACTIONS FOR EACH SESSION:---TRANSACTION 421524582451936, not started0 lock struct(s), heap size 1136, 0 row lock(s)---TRANSACTION 182551, ACTIVE 5 sec3 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 1MySQL thread id 11, OS thread handle 140049254979328, query id 100 localhost rootTABLE LOCK table `testdb1`.`t_test_1` trx id 182551 lock mode ISRECORD LOCKS space id 97 page no 3 n bits 72 index PRIMARY of table `testdb1`.`t_test_1` trx id 182551 lock mode SRecord lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;;Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 80000001; asc ;; 1: len 6; hex 00000002c710; asc ;; 2: len 7; hex af000000310110; asc 1 ;; 3: len 10; hex 74726573742020202020; asc trest ;;Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 80000002; asc ;; 1: len 6; hex 00000002c710; asc ;; 2: len 7; hex af00000031011c; asc 1 ;; 3: len 10; hex 65393965202020202020; asc e99e ;;Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 80000003; asc ;; 1: len 6; hex 00000002c710; asc ;; 2: len 7; hex af000000310128; asc 1 (;; 3: len 10; hex 74657374202020202020; asc test ;;Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 80000004; asc ;; 1: len 6; hex 00000002c710; asc ;; 2: len 7; hex af000000310134; asc 1 4;; 3: len 10; hex 66726573642020202020; asc fresd ;;Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 80000005; asc ;; 1: len 6; hex 00000002c710; asc ;; 2: len 7; hex af000000310140; asc 1 @;; 3: len 10; hex 66736661202020202020; asc fsfa ;;TABLE LOCK table `testdb1`.`t_test_2` trx id 182551 lock mode IX
从上面的信息,可以很清晰看到,t_test_1获取到IS锁,并且有5个Record lock信息,即锁了5条记录,而此表只有5条记录,所以锁全表。
TABLE LOCK table testdb1.t_test_1 trx id 182551 lock mode IS
锁全表解决方案
insert into t_test_2 select * from t_test_1 where name like 'trest';这个sql语句中,t_test_1表的name字段没有索引,索引走了全表扫描,如果在name字段创建索引呢,会有什么变化呢
创建索引
[root@localhost] 17:54:33 [testdb1]>alter table t_test_1 add index idx_t_test_1_name (name);Query OK, 0 rows affected (0.24 sec)Records: 0 Duplicates: 0 Warnings: 0[root@localhost] 17:54:52 [testdb1]>begin;Query OK, 0 rows affected (0.00 sec)[root@localhost] 17:54:55 [testdb1]>insert into t_test_2 select * from t_test_1 where name like 'trest';Query OK, 1 row affected (0.00 sec)Records: 1 Duplicates: 0 Warnings: 0
重新获取innodb的lock详细信息
TRANSACTIONS------------Trx id counter 182565Purge done for trx's n:o < 182565 undo n:o < 0 state: running but idleHistory list length 0LIST OF TRANSACTIONS FOR EACH SESSION:---TRANSACTION 421524582451936, not started0 lock struct(s), heap size 1136, 0 row lock(s)---TRANSACTION 182560, ACTIVE 3 sec3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1MySQL thread id 11, OS thread handle 140049254979328, query id 105 localhost rootTABLE LOCK table `testdb1`.`t_test_1` trx id 182560 lock mode ISRECORD LOCKS space id 97 page no 4 n bits 72 index idx_t_test_1_name of table `testdb1`.`t_test_1` trx id 182560 lock mode SRecord lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;;Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 10; hex 74726573742020202020; asc trest ;; 1: len 4; hex 80000001; asc ;;TABLE LOCK table `testdb1`.`t_test_2` trx id 182560 lock mode IX
看到没有,在这里,现在只有一个Record lock,不再是锁全表了。
此故障分析未完,待续。