Saturday, June 18, 2016

Query rewrite with ProxySQL : use case scenario

One of the interesting features of ProxySQL is the support of query rewrite.
In fact, this functionality was one of the main motivations for writing a proxy that sits between the application and the database server to rewrite queries on the fly, without the need to have a developer involved in rewriting the query ASAP (if ever possible!).


In this short post I will describe a use case scenario on how to use ProxySQL to troubleshoot performance and quickly rewrite queries if needed.
The setup in this specific blog post is Nylas platform, a sharded environment where there are over 80 ProxySQL instances running.

After noticing some odd workload originated from Nylas' open source sync engine , as per the attached graph on InnoDB rows read:


We could ask ProxySQL to get the list of the most time consuming queries:

mysql> SELECT SUM(sum_time), SUM(count_star), digest_text FROM stats_mysql_query_digest GROUP BY digest ORDER BY SUM(sum_time) DESC LIMIT 3\G
*************************** 1. row ***************************
SUM(sum_time): 1907314442459
SUM(count_star): 15033880
digest_text: SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.updated_at AS trx_updated_at, trx.deleted_at AS trx_deleted_at, trx.id AS trx_id, trx.namespace_id AS trx_namespace_id, trx.object_type AS trx_object_type, trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, trx.command AS trx_command FROM trx WHERE trx.id >= ? AND trx.namespace_id IN (?) AND trx.object_type = ? AND trx.command = ? ORDER BY trx.id ASC LIMIT ?
*************************** 2. row ***************************
SUM(sum_time): 1753837589659
SUM(count_star): 3533157
digest_text: SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.updated_at AS trx_updated_at, trx.deleted_at AS trx_deleted_at, trx.id AS trx_id, trx.namespace_id AS trx_namespace_id, trx.object_type AS trx_object_type, trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, trx.command AS trx_command FROM trx WHERE trx.id >= ? AND trx.namespace_id IN (?, ?) AND trx.object_type = ? AND trx.command = ? ORDER BY trx.id ASC LIMIT ?
*************************** 3. row ***************************
SUM(sum_time): 764491985658
SUM(count_star): 1705938
digest_text: SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.updated_at AS trx_updated_at, trx.deleted_at AS trx_deleted_at, trx.id AS trx_id, trx.namespace_id AS trx_namespace_id, trx.object_type AS trx_object_type, trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, trx.command AS trx_command FROM trx WHERE trx.id >= ? AND trx.namespace_id IN (?, ?, ?) AND trx.object_type = ? AND trx.command = ? ORDER BY trx.id ASC LIMIT ?
3 rows in set (0.35 sec)
view raw gistfile1.txt hosted with ❤ by GitHub

Or even ask which is the query with the highest average execution time:


mysql> SELECT SUM(sum_time), SUM(count_star), SUM(sum_time)/SUM(count_star) avg, digest_text FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT trx.public_id AS trx_public_id%' GROUP BY
digest ORDER BY SUM(sum_time)/SUM(count_star) DESC LIMIT 1\G
*************************** 1. row ***************************
SUM(sum_time): 1753837589659
SUM(count_star): 3533157
avg: 496393
digest_text: SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.updated_at AS trx_updated_at, trx.deleted_at AS trx_deleted_at, trx.id AS trx_id, trx.namespace_id AS trx_namespace_id, trx.object_type AS trx_object_type, trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, trx.command AS trx_command FROM trx WHERE trx.id >= ? AND trx.namespace_id IN (?, ?) AND trx.object_type = ? AND trx.command = ? ORDER BY trx.id ASC LIMIT ?
view raw gistfile1.txt hosted with ❤ by GitHub

Note that we could have get the same information from information_schema.events_statements_summary_by_digest , but we already knew that the load was coming from a specific node so we queried stats.stats_mysql_query_digest inside ProxySQL itself.

From the output above, it was obvious that the problem was with a specific query type running at that time.

Without copying again the whole query, the query could be simplified as:

SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 IN (? , ?)
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ? 


col2 and col3 are not indexes because do not provide enough cardinality, while idx_col1 does, so it is indexed.
Because of the WHERE clauses and the ORDER BY , mysqld processes it performing a range scan on the primary key.
The optimizer could be smart enough to optimize this query, but it is not.

The query can be rewritten as:

SELECT * FROM (
(SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 = v1
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ?)
UNION ALL
(SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 = v1
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ?)
) t ORDER BY PK LIMIT ?

How to do this with ProxySQL? Extremely easy!
The snippet below shows how to create a rule to rewrite the query, load the new rule into runtime, and persist the new rule on disk:


INSERT INTO mysql_query_rules (rule_id,active,flagIN,match_pattern, replace_pattern) VALUES (92,1,0,
'^SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.id AS trx_id, trx.namespace_id AS trx_namespace_id, trx.object_type AS trx_object_type, trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, trx.command AS trx_command(.*)\nFROM trx(.*)\nWHERE trx.id >= (\d+) AND trx.namespace_id IN \((\d+), (\d+)\) AND trx.object_type = (.*) ORDER BY trx.id ASC(.*)\n LIMIT (.*)$',
'SELECT * FROM (
(SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.id AS trx_id,
trx.namespace_id AS trx_namespace_id, trx.object_type AS trx_object_type,
trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, trx.command AS trx_command \1
FROM trx \2 WHERE trx.id >= \3 AND trx.namespace_id = \4 AND trx.object_type = \6 ORDER BY trx.id ASC LIMIT \8)
UNION ALL
(SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.id AS trx_id,
trx.namespace_id AS trx_namespace_id, trx.object_type AS trx_object_type,
trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, trx.command AS trx_command \1
FROM trx \2 WHERE trx.id >= \3 AND trx.namespace_id = \5 AND trx.object_type = \6 ORDER BY trx.id ASC LIMIT \8)
) t ORDER BY trx_id LIMIT \8'
);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
view raw gistfile1.txt hosted with ❤ by GitHub

From the following graph of InnoDB rows read it is immediately clear the effect of the new query rewrite:



In conclusion: query rewrite is a feature that, depending from the queries running against the database server, quickly allows to isolate and correct problematic queries and improve performance.