-
Notifications
You must be signed in to change notification settings - Fork 0
/
A Case Study in Profiling Queries in MySQL.html
439 lines (329 loc) · 17 KB
/
A Case Study in Profiling Queries in MySQL.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>A Case Study in Profiling Queries in MySQL</title>
<link rel="alternate" type="application/rss+xml" title="RSS" href="https://www.xaprb.com/index.xml">
<link rel="canonical" href="https://www.xaprb.com/blog/2006/10/15/a-case-study-in-profiling-queries-in-mysql/">
<link rel="shortcut icon" type="image/png" href="https://www.xaprb.com/apple-touch-icon-precomposed.png">
<meta name="generator" content="Hugo 0.47.1" />
<meta property="og:title" content="A Case Study in Profiling Queries in MySQL" />
<meta property="og:type" content="article" />
<meta property="og:image" content="https://www.xaprb.com/img/default-header-img.jpg" />
<meta property="og:description" content="" />
<meta property="og:url" content="https://www.xaprb.com/blog/2006/10/15/a-case-study-in-profiling-queries-in-mysql/" />
<meta property="og:site_name" content="A Case Study in Profiling Queries in MySQL" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="@xaprb" />
<link rel="stylesheet" href="https://www.xaprb.com/css/tachyons.min.css" />
<link rel="stylesheet" href="https://www.xaprb.com/css/story.css" />
<link rel="stylesheet" href="https://www.xaprb.com/css/descartes.css" />
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">
<link href="https://fonts.googleapis.com/css?family=Quattrocento+Sans:400,400i,700,700i|Quattrocento:400,700|Spectral:400,400i,700,700i&subset=latin-ext" rel="stylesheet">
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-TEMocfGvRuD1rIAacqrknm5BQZ7W7uWitoih+jMNFXQIbNl16bO8OZmylH/Vi/Ei" crossorigin="anonymous">
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-jmxIlussZWB7qCuB+PgKG1uLjjxbVVIayPJwi6cG6Zb4YKq0JIw+OMnkkEC7kYCq" crossorigin="anonymous"></script>
<script src="https://www.xaprb.com/js/auto-render.min.js"></script>
<script src="https://www.xaprb.com/js/story.js"></script>
</head>
<body class="ma0 bg-white section-blog page-kind-page is-page-true feature-math feature-figcaption feature-tablecaption feature-figcaption-hidden feature-3dbook-covers feature-hyphenate feature-justify feature-highlight feature-hrfleuron feature-qrcode feature-tweetquote feature-depth">
<header class="cover bg-top" style="background-image: url('https://www.xaprb.com/img/default-header-img.jpg'); background-position: center;">
<div class="bg-black-30 bb bt">
<nav class="hide-print sans-serif border-box pa3 ph5-l">
<a href="https://www.xaprb.com/" title="Home">
<img src="https://www.xaprb.com/img/logo.jpg" class="w2 h2 br-100" alt="Baron Schwartz's Website" />
</a>
<div class="fr h2 pv2 tr">
<a class="link f5 ml2 dim near-white" href="/archives/">Archives</a>
<a class="link f5 ml2 dim near-white" href="/talks/">Talks</a>
<a class="link f5 ml2 dim near-white" href="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/xaprb"><i class="fab fa-github-square"></i></a>
<a class="link f5 ml2 dim near-white" href="https://linkedin.com/in/xaprb"><i class="fab fa-linkedin"></i></a>
<a class="link f5 ml2 dim near-white" href="https://twitter.com/xaprb"><i class="fab fa-twitter-square"></i></a>
<a class="link f5 ml2 dim near-white fas fa-rss-square" href="https://www.xaprb.com/index.xml" title="RSS Feed"></a>
<a class="link f5 ml2 dim near-white fas fa-search" href="https://www.xaprb.com/search/" role="search" title="Search"></a>
</div>
</nav>
<div id="hdr" class="tc-l pv4-ns pv5-l pv2 ph3 ph4-ns">
<h1 class="near-white mt1-ns f2 fw3 mb0 mt0 lh-title">A Case Study in Profiling Queries in MySQL</h1>
<h2 class="near-white mt3-l mb4-l fw1 f6 f3-l measure-wide-l center lh-copy mt2 mb3">
Published
<time datetime="2006-10-15T00:00:00Z">Oct 15, 2006</time>
<span class="display-print">by Baron Schwartz</span>
in <a href="https://www.xaprb.com/categories/databases" class="no-underline category near-white dim">Databases</a>
<span class="display-print">at https://www.xaprb.com/blog/2006/10/15/a-case-study-in-profiling-queries-in-mysql/</span>
</h2>
</div>
</div>
</header>
<main role="main">
<article class="center bg-white br-3 pv1 ph4 lh-copy f5 nested-links mw7">
<p>This is the second in a series of articles on profiling queries in MySQL. In this article I’ll demonstrate the technique I described in the <a href="/blog/2006/10/12/how-to-profile-a-query-in-mysql/">first article</a>.</p>
<blockquote>
<p>Note: I wrote this article in 2006, when I didn’t have a clear understanding
of even simple concepts such as <em>what performance really is</em>. Since then I
have learned a lot from wise people such as Cary Millsap. In 2012 I founded
<a href="https://vividcortex.com/">VividCortex, the best database performance optimization and database monitoring platform</a>
to chase away the kind of ignorance I display in the article below. Enjoy this
trip down memory lane.</p>
</blockquote>
<p>I’m using 5.0.22-Debian_0ubuntu6.06.2-log for my tests.</p>
<h3 id="example-re-index-a-table-and-measure-the-performance-gain">Example: re-index a table and measure the performance gain</h3>
<p>I started with the following table, and an <code>INSERT</code> statement to populate it with a million rows of data. Aside from the fact that this data is all evenly distributed, this table and data are similar to some tables my coworkers and I redesigned earlier this year (see my article on <a href="/blog/2006/06/14/how-to-re-index-a-large-database-table/">how to re-index a large database table</a>).</p>
<pre><code class="language-sql">CREATE TABLE `tracking` (
`id` int(11) NOT NULL auto_increment,
`day` date NOT NULL,
`ad` int(11) NOT NULL,
`clicks` int(11) NOT NULL,
`impressions` int(11) NOT NULL,
`client` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ad` (`ad`,`day`),
KEY `day` (`day`),
KEY `client` (`client`)
) ENGINE=InnoDB;
insert into tracking(day, ad, clicks, impressions, client)
select date_add('2006-01-01', interval a.i - 1 day),
b.i,
rand() * 100,
rand() * 100,
rand() * 20
from numbers as a
cross join numbers as b
where a.i <= 1000 and b.i <= 1000;
</code></pre>
<p>This table is designed with a surrogate key that isn’t used at all, and prevents the table from being clustered day-first, which is how it tends to be queried. Here’s a typical query for this data, which consistently takes 0.58 seconds to run on my machine:</p>
<pre><code class="language-sql">select day, ad, sum(clicks), sum(impressions)
from tracking
where client = 11 and day between '2007-01-01' and '2007-01-31'
group by day;
</code></pre>
<h3 id="step-1-measure-the-query-before-re-indexing">Step 1: Measure the query before re-indexing</h3>
<p>I began by measuring the query against the table as currently designed. I ran <code>SHOW STATUS</code> once, ran the query, then ran <code>SHOW STATUS</code> twice more. I put all three sets of data into a spreadsheet, with columns titled <code>Before</code>, <code>After1</code>, and <code>Calibration</code>. I then added a column called <code>TotalCost</code>, whose formula is <code>After1 - Before - (Calibration - After1)</code>. You can read the first article for more on why I use this math to subtract out the effects of running <code>SHOW STATUS</code> itself.</p>
<p>I’ll show you a synopsis of the data in a bit, but in case you’re curious, <a href="/media/2006/10/status-before-indexing.csv">here’s the raw data in CSV format</a>.</p>
<h3 id="step-2-re-index-and-measure-again">Step 2: Re-index and measure again</h3>
<p>I re-indexed the table, removing the surrogate key and clustering on <code>day, ad</code>. Now I have the following table:</p>
<pre><code class="language-sql">CREATE TABLE `tracking` (
`day` date NOT NULL,
`ad` int(11) NOT NULL,
`clicks` int(11) NOT NULL,
`impressions` int(11) NOT NULL,
`client` int(11) NOT NULL,
PRIMARY KEY (`day`,`ad`),
KEY `ad` (`ad`),
KEY `client` (`client`)
) ENGINE=InnoDB;
</code></pre>
<p>The same query now consistently runs in two or three hundredths of a second. Here are the <code>SHOW STATUS</code> <a href="/media/2006/10/status-after-indexing.csv">numbers for the redesigned table</a>.</p>
<h3 id="just-the-important-numbers">Just the important numbers</h3>
<p>That’s a lot of numbers to look at, so here’s a synopsis of all the numbers that didn’t come out to zero in both cases:</p>
<table>
<thead>
<tr>
<th>Variable_name</th>
<th>Design1</th>
<th>Design2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Com_select</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>Created_tmp_tables</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>Handler_read_key</td>
<td>1574</td>
<td>2</td>
</tr>
<tr>
<td>Handler_read_next</td>
<td>49987</td>
<td>31000</td>
</tr>
<tr>
<td>Handler_read_rnd</td>
<td>31</td>
<td></td>
</tr>
<tr>
<td>Handler_read_rnd_next</td>
<td>32</td>
<td></td>
</tr>
<tr>
<td>Handler_update</td>
<td>1540</td>
<td></td>
</tr>
<tr>
<td>Handler_write</td>
<td>31</td>
<td></td>
</tr>
<tr>
<td>Innodb_buffer_pool_read_ahead_rnd</td>
<td>3</td>
<td></td>
</tr>
<tr>
<td>Innodb_buffer_pool_read_ahead_seq</td>
<td>169</td>
<td></td>
</tr>
<tr>
<td>Innodb_buffer_pool_read_requests</td>
<td>205242</td>
<td>3969</td>
</tr>
<tr>
<td>Innodb_buffer_pool_reads</td>
<td>86</td>
<td></td>
</tr>
<tr>
<td>Innodb_data_read</td>
<td>46153728</td>
<td></td>
</tr>
<tr>
<td>Innodb_data_reads</td>
<td>265</td>
<td></td>
</tr>
<tr>
<td>Innodb_pages_read</td>
<td>2817</td>
<td></td>
</tr>
<tr>
<td>Innodb_rows_read</td>
<td>49987</td>
<td>31001</td>
</tr>
<tr>
<td>Questions</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>Select_range</td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>Sort_rows</td>
<td>31</td>
<td></td>
</tr>
<tr>
<td>Sort_scan</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>Table_locks_immediate</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>
<p>I gave a high-level, hand-waving overview of interpreting results in my previous article, because there’s just too much to go over in one article. Hopefully you can sink your teeth into this example. For example, you can see the first design created a temporary table for some reason, made more index reads, made lots more requests to the buffer pool, and read a lot more bytes of data. What’s going on here?</p>
<p>As I mentioned in the first article, it really helps to</p>
<ol>
<li>time the query (been there, done that: 0.03 seconds versus 0.58)</li>
<li><code>EXPLAIN</code> the query</li>
</ol>
<p>The <code>EXPLAIN</code> is the missing link here. Here it is:</p>
<pre><code class="language-sql">*************************** Design 1
id: 1
select_type: SIMPLE
table: tracking
type: ref
possible_keys: day,client
key: client
key_len: 4
ref: const
rows: 59898
Extra: Using where; Using temporary; Using filesort
*************************** Design 2
id: 1
select_type: SIMPLE
table: tracking
type: range
possible_keys: PRIMARY,client
key: PRIMARY
key_len: 3
ref: NULL
rows: 46284
Extra: Using where
</code></pre>
<p>The query against the first table required a temporary table and filesort. It scanned the <code>client</code> key and did bookmark lookups to the clustered index. The second query just scanned a range of the clustered index. Armed with this knowledge, look back at the actual numbers; it’s pretty amazing how much extra work is caused in the first case by having to navigate a secondary index and then a clustered index. The most dramatic difference is how the InnoDB buffer pool is used. Here are some highlights:</p>
<ul>
<li>The first query made 50 times the requests to the buffer pool.</li>
<li>The first query had to go to disk 86 times to satisfy a buffer pool read request, but the second query never had a buffer pool miss at all.</li>
<li>The first query needed to read 62% more rows, because it had to shotgun through the whole table looking for data.</li>
<li>The first query had to make more than a thousand times as many key reads as the second.</li>
</ul>
<p>I want to point out that the number of times a buffer pool read request could be satisfied without going to disk—every time in the second query—is really important. It means the first query kept requesting data that turned out not to be in memory, whereas the second query kept requesting data and it was already in memory. Why is this? Well, the second query was confined to a contiguous range of the disk, so once that range was read into memory, it stayed there. The first query had to keep requesting blocks from every part of the table, and they couldn’t all fit in the buffer pool, so some of them were getting pushed out and re-read later. How much did this happen? About 44 MiB, according to <code>Innodb_data_read</code>. Remember, even if you have to get just one row, InnoDB reads an entire block from disk, which may be really inefficient—the size of a row divided by 16,384 bytes per block.</p>
<p>I hope you agree that’s much more concrete and useful than comparing execution time. Armed with this knowledge, you can understand which optimizations really make a difference.</p>
<h3 id="one-really-strange-result">One really strange result</h3>
<p>One thing I didn’t show you about those two queries was the value of <code>Last_query_cost</code>. That’s because it showed the slower, more data-intensive query actually having a <em>lower</em> cost than the faster one:</p>
<table>
<thead>
<tr>
<th>Variable_name</th>
<th>Design1</th>
<th>Design2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Last_query_cost</td>
<td>20343.599000</td>
<td>71039.632551</td>
</tr>
</tbody>
</table>
<p>That’s pretty bizarre, isn’t it? I don’t know how the query cost is calculated; I believe the optimizer calculates it in advance of actually executing the query. It definitely doesn’t match the actual cost of executing these queries. It is usually more in line with the true cost, but not always. You should not rely on it absolutely.</p>
<h3 id="conclusion">Conclusion</h3>
<p>The example I gave should make it pretty clear how much you can measure about query performance—execution time is only one data point. In the third article in this series, I’ll take the wraps off a shiny new tool that can do all this tedious math for you in the blink of an eye. Stay tuned.</p>
</article>
</main>
<div class="hide-print sans-serif f6 f5-l mt5 ph3 pb6 center nested-copy-line-height lh-copy nested-links mw-100 measure-wide">
<div class="about-the-author">
<p><a href="/about/"><img src="https://d33wubrfki0l68.cloudfront.net/843019e4d498ed7c38871b8f93da3ccfc0f91632/7e911/baron-square.jpg" alt="Baron Schwartz" /></a></p>
<p>I’m the founder and CTO of <a href="https://vividcortex.com">VividCortex</a>, author of
several books, and creator of various open-source software. I write about topics such as technology, entrepreneurship, and fitness, and I tweet at <a href="https://twitter.com/xaprb">@xaprb</a>. <a href="/about/">More about me</a>.</p>
</div>
<div class="subscribe-form cb w-100 pt3">
<form action="//xaprb.us19.list-manage.com/subscribe?u=6ee277790a99c26a414c9693f&id=703c46c578" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" target="_blank" novalidate>
<input class="w-50 dib ph3 pv2 ba b--silver br2 sans-serif f6 f5-l mr2" type="email" placeholder="Subscribe for email updates!" name="EMAIL" id="mce-EMAIL">
<input class="w-40 dib ph3 pv2 ba b--silver br2 sans-serif f6 f5-l white bg-mid-gray" type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe">
</form>
</div>
</div>
<footer class="hide-print sans-serif f6 fw1 bg-black near-white bottom-0 w-100 pa3" role="contentinfo">
<p class="w-50 fr tr">
<a class="no-underline near-white" href="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/xaprb/story"><img class="dib" title="Made with Hugo and Story" alt="Story logo" src="https://www.xaprb.com/img/story-logo-white.svg" style="width: 1.5rem; height: 1.5rem" /></a>
</p>
<p class="w-50 near-white">
© 2019 Baron Schwartz
</p>
</footer>
<script type="application/javascript">
var doNotTrack = false;
if (!doNotTrack) {
window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date;
ga('create', 'UA-101883-1', 'auto');
ga('send', 'pageview');
}
</script>
<script async src='https://www.google-analytics.com/analytics.js'></script>
</body>
</html>