Saturday, June 18, 2016

Query rewrite with ProxySQL : use case scenario

One of the interesting features of ProxySQL is the support of query rewrite.
In fact, this functionality was one of the main motivations for writing a proxy that sits between the application and the database server to rewrite queries on the fly, without the need to have a developer involved in rewriting the query ASAP (if ever possible!).


In this short post I will describe a use case scenario on how to use ProxySQL to troubleshoot performance and quickly rewrite queries if needed.
The setup in this specific blog post is Nylas platform, a sharded environment where there are over 80 ProxySQL instances running.

After noticing some odd workload originated from Nylas' open source sync engine , as per the attached graph on InnoDB rows read:


We could ask ProxySQL to get the list of the most time consuming queries:


Or even ask which is the query with the highest average execution time:



Note that we could have get the same information from information_schema.events_statements_summary_by_digest , but we already knew that the load was coming from a specific node so we queried stats.stats_mysql_query_digest inside ProxySQL itself.

From the output above, it was obvious that the problem was with a specific query type running at that time.

Without copying again the whole query, the query could be simplified as:

SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 IN (? , ?)
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ? 


col2 and col3 are not indexes because do not provide enough cardinality, while idx_col1 does, so it is indexed.
Because of the WHERE clauses and the ORDER BY , mysqld processes it performing a range scan on the primary key.
The optimizer could be smart enough to optimize this query, but it is not.

The query can be rewritten as:

SELECT * FROM (
(SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 = v1
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ?)
UNION ALL
(SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 = v1
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ?)
) t ORDER BY PK LIMIT ?

How to do this with ProxySQL? Extremely easy!
The snippet below shows how to create a rule to rewrite the query, load the new rule into runtime, and persist the new rule on disk:



From the following graph of InnoDB rows read it is immediately clear the effect of the new query rewrite:



In conclusion: query rewrite is a feature that, depending from the queries running against the database server, quickly allows to isolate and correct problematic queries and improve performance.

4 comments:

  1. Is it possible to rewrite queries with a plugin like MySQL proxy? I have added a new type of plugin to my MySQL 5.7 fork which intercepts queries like MySQL proxy, but it would be nice to have the functionality in an external tool too.
    https://shardquery.com/2016/07/25/sql-injection-in-the-mysql-server-of-the-proxy-kind/

    ReplyDelete
    Replies
    1. Justin, ProxySQL allows query rewriting using regex match/replace . Regexes can be concatenated, therefore is possible to make quite complex rewrites.
      Future version of ProxySQL will probably support LUA scripting, but so far regex engine is going a great work!

      Delete
  2. Can I use the "match_digest" instead of "match_pattern" to rewrite the queries?

    ReplyDelete
    Replies
    1. Short answer: no.
      Because you can use multiple matching criteria to match a query, you can use match_digest as one of the matching criteria, but you also *must* use match_pattern.

      Delete