Ask MySQL Geek LJ...
Jul. 3rd, 2010 02:15 pmOkay LJ folks. Riddle me this one:
select * from queue_encoder
where
status='ready' and
machine_name IS NULL
order by
urgent_flag desc ,
timestamp_enqueued DESC
limit 10
yacht-2:~ dbs$ time mysql -uroot cms < urgentfirst.sql
[...]
real 0m0.037s
explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE queue_encoder range index_1,index_3,compound_1,compound_2 compound_2 154 NULL 3377962 Using where
---------------
Now if I add a constraint:
select * from queue_encoder
where
status='ready' and
machine_name IS NULL AND
codec_id IN (4,9,10,11)
order by
urgent_flag desc ,
timestamp_enqueued DESC
limit 10
yacht-2:~ dbs$ time mysql -uroot cms < urgentfirst.sql
[...]
real 0m24.982s
explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE queue_encoder ref index_1,index_3,compound_1,compound_2 compound_2 154 const,const 1576779 Using where
CREATE TABLE `queue_encoder` (
`queue_encoder_id` int(11) NOT NULL AUTO_INCREMENT,
`media_type_id` int(11) NOT NULL DEFAULT '0',
`item_id` int(11) NOT NULL DEFAULT '0',
`label_id` int(11) NOT NULL DEFAULT '0',
`codec_id` int(11) DEFAULT NULL,
`urgent_flag` int(1) NOT NULL DEFAULT '0',
`machine_name` varchar(50) DEFAULT NULL,
`process_id` int(2) DEFAULT NULL,
`timestamp_enqueued` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`timestamp_started` datetime DEFAULT NULL,
`timestamp_finished` datetime DEFAULT NULL,
`status` enum('ready','success-complete','success-incomplete','error','in-process','delay') NOT NULL DEFAULT 'ready',
`status_message` varchar(255) DEFAULT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`queue_encoder_id`),
UNIQUE KEY `unique_1` (`item_id`,`media_type_id`,`codec_id`),
KEY `index_1` (`machine_name`,`process_id`,`status`),
KEY `index_2` (`timestamp_enqueued`,`label_id`,`codec_id`,`media_type_id`),
KEY `index_3` (`status`,`machine_name`,`codec_id`,`media_type_id`,`label_id`),
KEY `compound_1` (`status`,`machine_name`,`codec_id`,`media_type_id`,`label_id`,`urgent_flag`,`timestamp_enqueued`),
KEY `compound_2` (`status`,`machine_name`,`urgent_flag`,`timestamp_enqueued`)
) ENGINE=MyISAM AUTO_INCREMENT=74076324 DEFAULT CHARSET=utf8 |
(Background - this table has 3.4 million rows-ish in it. We have a process that asks the following select question about every 30 seconds:
SELECT * from queue_encoder
WHERE
status = 'ready' AND
machine_name IS NULL AND
codec_id IN (4,9,10,11) AND
media_type_id IN (1,2) AND
label_id IN (2,3,4,11,13,15,17,18,19,20,21,22,29,30,46,47)
ORDER BY
urgent_flag DESC,
timestamp_enqueued DESC
limit 20
That query, due to ISAM table locking, can take 2-4 minutes to complete (it's actually an UPDATE...WHERE clause). If I can make that be fully indexed, we will significantly increase our processing throughput.
Help?
select * from queue_encoder
where
status='ready' and
machine_name IS NULL
order by
urgent_flag desc ,
timestamp_enqueued DESC
limit 10
yacht-2:~ dbs$ time mysql -uroot cms < urgentfirst.sql
[...]
real 0m0.037s
explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE queue_encoder range index_1,index_3,compound_1,compound_2 compound_2 154 NULL 3377962 Using where
---------------
Now if I add a constraint:
select * from queue_encoder
where
status='ready' and
machine_name IS NULL AND
codec_id IN (4,9,10,11)
order by
urgent_flag desc ,
timestamp_enqueued DESC
limit 10
yacht-2:~ dbs$ time mysql -uroot cms < urgentfirst.sql
[...]
real 0m24.982s
explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE queue_encoder ref index_1,index_3,compound_1,compound_2 compound_2 154 const,const 1576779 Using where
CREATE TABLE `queue_encoder` (
`queue_encoder_id` int(11) NOT NULL AUTO_INCREMENT,
`media_type_id` int(11) NOT NULL DEFAULT '0',
`item_id` int(11) NOT NULL DEFAULT '0',
`label_id` int(11) NOT NULL DEFAULT '0',
`codec_id` int(11) DEFAULT NULL,
`urgent_flag` int(1) NOT NULL DEFAULT '0',
`machine_name` varchar(50) DEFAULT NULL,
`process_id` int(2) DEFAULT NULL,
`timestamp_enqueued` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`timestamp_started` datetime DEFAULT NULL,
`timestamp_finished` datetime DEFAULT NULL,
`status` enum('ready','success-complete','success-incomplete','error','in-process','delay') NOT NULL DEFAULT 'ready',
`status_message` varchar(255) DEFAULT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`queue_encoder_id`),
UNIQUE KEY `unique_1` (`item_id`,`media_type_id`,`codec_id`),
KEY `index_1` (`machine_name`,`process_id`,`status`),
KEY `index_2` (`timestamp_enqueued`,`label_id`,`codec_id`,`media_type_id`),
KEY `index_3` (`status`,`machine_name`,`codec_id`,`media_type_id`,`label_id`),
KEY `compound_1` (`status`,`machine_name`,`codec_id`,`media_type_id`,`label_id`,`urgent_flag`,`timestamp_enqueued`),
KEY `compound_2` (`status`,`machine_name`,`urgent_flag`,`timestamp_enqueued`)
) ENGINE=MyISAM AUTO_INCREMENT=74076324 DEFAULT CHARSET=utf8 |
(Background - this table has 3.4 million rows-ish in it. We have a process that asks the following select question about every 30 seconds:
SELECT * from queue_encoder
WHERE
status = 'ready' AND
machine_name IS NULL AND
codec_id IN (4,9,10,11) AND
media_type_id IN (1,2) AND
label_id IN (2,3,4,11,13,15,17,18,19,20,21,22,29,30,46,47)
ORDER BY
urgent_flag DESC,
timestamp_enqueued DESC
limit 20
That query, due to ISAM table locking, can take 2-4 minutes to complete (it's actually an UPDATE...WHERE clause). If I can make that be fully indexed, we will significantly increase our processing throughput.
Help?
no subject
Date: 2010-07-03 08:00 pm (UTC)i'm no mysql expert, and a little rusty, but...
compound_2 is a subset of compound_1, which has everything you need. (ISTR that keys that are a subset of another key aren't a good idea in general.) yet the engine is searching compound_2 (either because the ORDER BY clause is forcing it or because of poor optimization decisions), and needing to refer to other things to do the query. try removing compound_2, or hinting the query to use compound_1 (I forget how that works).
also remember that ORDER BY...LIMIT effectively requires a sort in there somewhere, in order to be able to get the highest-valued candidates to the top for LIMIT to be applied to. if you can change this to twenty UPDATE...LIMIT 1 queries, then the engine may just say, "oh, I just need to look at the rows that have the highest known value in urgent_flag, and of those, the rows that have the highest known value in timestamp_enqueued" (and if your inserts were slow enough, there's only one of the latter).
how big are your MySQL caches/buffers?
--jh
no subject
Date: 2010-07-03 09:50 pm (UTC)Well, this IS for work - and my company does distribute music for a living :)
how big are your MySQL caches/buffers?
We actually pursued this a bunch. Using the key_buffer_read variables in the server, we could see that indeed the entire query was using in-memory indexes. so THAT wasn't it.
A good check though :)
no subject
Date: 2010-07-06 01:30 pm (UTC)KEY `compound_2` (`status`,`machine_name`,`urgent_flag`,`timestamp_enqueued`)
The first query has filters (where clause) on
status, machine name
and sorts by:
urgent_flag
timestamp_enqueued
The second query has filters on
status, machine name, codec_id
and sorts by:
urgent_flag
timestamp_enqueued
So if you just want an index for that, you'd want an index on:
(status,machine_name,codec_id,urgent_flag,timestamp_enqueued)
-------------
HOWEVER:
MySQL uses B-trees for its underlying structure for MyISAM tables and indexes (by default, unless you specify otherwise). Therefore, range and equality matches are pretty good. Also, the prefix of indexes can be used. (I liken it to scanning through a dictionary, it's easy to find the words starting with s, sh, or she, but not really possible without a full scan of each word to find words ending in e.)
Also, because you're doing a SELECT *, the MySQL optimizer has to figure out if it's worth it to actually go to the index to find the rows you need, and THEN go to the table data to find the rest of the information. If the
*optimizer thinks* you will need more than about 20-30% of the rows, it will just do a full table scan instead. This is NOT your problem, however I am bringing this point up because both this and the previous lead to a corollary:
MySQL isn't particularly good at finding "flags", even when indexed, especially when the "flag" you are searching for is shared in common with about 20% or more of the data. For example, say you have a flag on gender with male, female, other/don't know, where 45% are male, 45% are female and 10% are other/don't know. Searching for "other" will be fast and use the index, searching for male or female probably won't bother.
Likewise, NULL adds extra time in the search. In this particular case I'd advise a machine name of '', with the field marked as NOT NULL DEFAULT '', instead of NULL. You will never have a question of "do they mean 'blank' or do they mean NULL" in this case, because all machines have names.
-----------------
In addition, if you're updating on a MyISAM table, you are locking the WHOLE table until the update completes. I would advise using InnoDB for this table. If you need a certain feature of MyISAM, it's likely a read feature (ie, exact optimizer statistics, FULLTEXT search) and you could have the table InnoDB on the master and MyISAM on the slave (which would mean your slave lag could be 2-4 minutes when such a query goes through, but at least the master isn't locking).