Sunday, October 27, 2013

Introducing ProxySQL: High Performance Proxy for MySQL

I concluded my previous post by stating that what is really missing is a proxy that:
- is as stable as HAProxy
- scales like MySQL Proxy
- is rich in features


That is what is driving my development of ProxySQL: a high-performance proxy for MySQL.



Some background first.
In the past I have worked with customers who, after I provided them with a detailed SQL review explaining how to improve performance by rewriting queries, answered with the usual "we can't modify the queries". The most common reasons for this, among others, are:
- the queries are generated by an ORM;
- they don't own the application;
- they don't have the time to dig into the code and rewrite the queries.
It is a widespread misconception that adding indexes and tuning MySQL can magically improve the performance of badly written queries, but the truth is far from that.


I am pretty confident that every DBA has been in the situation of seeing a perfectly tuned MySQL server dying under the load of poorly written SQL statements, or under the load of a high number of redundant queries sent by applications that do not implement any sort of caching.
And often this leads to downtime!
What makes it even worse is that you, as a DBA, have identified the bad queries and are telling the Devs exactly how to rewrite them, but all your work and effort becomes worthless if the Devs don't perform the rewrite. This may either never happen, or happen only after a few hours of downtime.
Why should a DBA depend on a Dev in this sort of scenario? The DBA should be able to reconfigure the system with the new, rewritten queries. The DBA should have more power over the RDBMS, power that right now is in the hands of developers.
So here is the idea behind the proxy: when a DBA finds a broken query, in case of emergency (s)he should be able to fix it immediately on the proxy, without waiting for any Dev.
MySQL Proxy can do that, but its current implementation is still in alpha and far from being production ready (see my previous post). I looked for alternatives for a while, but eventually I combined the need for a new proxy with some study of the MySQL protocol, and this led to a new project: ProxySQL.


What started as a “weekend project”, a prototype that simply analyzed MySQL protocol packets and rewrote queries using some basic regex pattern/replace, has slowly evolved into a more functional proxy. And ironically, the rewrite functionality is not implemented yet!
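To make the idea concrete, here is a minimal Python sketch of regex pattern/replace rewriting as a DBA might apply it on a proxy. It is not ProxySQL code (the feature is not implemented yet): the `RewriteRule` class, the `rewrite_query()` function and the example rule on a hypothetical `orders` table are all illustrative assumptions. The rule turns `DATE(created_at) = '...'`, which cannot use an index on `created_at`, into an equivalent range condition that can.

```python
import re
from dataclasses import dataclass


@dataclass
class RewriteRule:
    """Hypothetical pattern/replace rule, as a DBA might configure it on the proxy."""
    pattern: re.Pattern
    replacement: str


def rewrite_query(query: str, rules: list[RewriteRule]) -> str:
    """Apply the first rule whose pattern matches; otherwise pass the query through."""
    for rule in rules:
        new_query, count = rule.pattern.subn(rule.replacement, query)
        if count:
            return new_query
    return query


# Hypothetical emergency fix: the ORM wraps an indexed column in a function,
# so MySQL cannot use the index; the DBA rewrites it into an equivalent range.
rules = [
    RewriteRule(
        pattern=re.compile(
            r"WHERE\s+DATE\(created_at\)\s*=\s*'(\d{4}-\d{2}-\d{2})'",
            re.IGNORECASE,
        ),
        replacement=r"WHERE created_at >= '\1' AND created_at < '\1' + INTERVAL 1 DAY",
    ),
]

q = "SELECT id FROM orders WHERE DATE(created_at) = '2013-10-27'"
print(rewrite_query(q, rules))
```

Applying only the first matching rule keeps the behaviour predictable. A real proxy would also have to decide whether rules can chain, and should ideally match on the parsed statement rather than on raw text, since a regex can also match inside string literals.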




ProxySQL is still in alpha phase, and lots of features are not implemented yet, but before adding new features I want to ensure it is fast enough not to affect performance.
In my previous post I compared the performance of MySQL vs MySQL Proxy vs HAProxy, using both sysbench and mysqlslap.
Now it is time to add ProxySQL to the same graphs.

sysbench:
# for i in 1 2 4 8 16 32 64 128 ; do sysbench --num-threads=$i --max-requests=0 --max-time=60 --test=oltp --oltp-table-size=100000 --mysql-user=root --mysql-password=pass --mysql-host=127.0.0.1 --mysql-db=test --db-ps-mode=disable --mysql-port=6033 --oltp-read-only run | grep 'transactions:' ; done
    transactions:                        16357  (272.60 per sec.)
    transactions:                        31678  (527.95 per sec.)
    transactions:                        63219  (1053.60 per sec.)
    transactions:                        117977 (1966.17 per sec.)
    transactions:                        192814 (3213.33 per sec.)
    transactions:                        241416 (4023.18 per sec.)
    transactions:                        239564 (3992.18 per sec.)
    transactions:                        237762 (3961.61 per sec.)




mysqlslap:
# mysqlslap --create-schema=test -u root -ppass -h 127.0.0.1 -P6033 -c 1,2,4,8,16,32,64,128 -q select1.sql  | grep "Average number of seconds to run all queries"
Warning: Using a password on the command line interface can be insecure.
        Average number of seconds to run all queries: 8.189 seconds
        Average number of seconds to run all queries: 8.432 seconds
        Average number of seconds to run all queries: 8.716 seconds
        Average number of seconds to run all queries: 10.256 seconds
        Average number of seconds to run all queries: 11.808 seconds
        Average number of seconds to run all queries: 18.809 seconds
        Average number of seconds to run all queries: 34.577 seconds
        Average number of seconds to run all queries: 69.186 seconds





The above results look very promising!
ProxySQL outperforms both MySQL Proxy and HAProxy!

One of the first features I wanted to introduce in ProxySQL is caching.
This feature is available neither in HAProxy nor in MySQL Proxy, and I am pretty confident that a good implementation can't be developed with Lua scripting.
In the future, users of ProxySQL (either DBAs or Devs) will be able to decide what to cache and what not, for how long, and how to invalidate it, but for now the implementation is as simple as this: every SELECT statement that is not a SELECT FOR UPDATE is cached for a configurable amount of time (30 seconds by default). From an application point of view, it is like caching MySQL resultsets in memcached (or another caching solution), or reading from slaves that lag up to 30 seconds behind.
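The caching policy described above can be sketched in a few lines of Python. This is a hypothetical illustration, not ProxySQL's implementation: the names `is_cacheable()`, `QueryCache` and `execute()` are made up, the cache is keyed on the exact query text, and detecting SELECT and FOR UPDATE with regular expressions is a simplification. The clock is injectable so that expiry can be demonstrated without waiting 30 seconds.

```python
import re
import time
from typing import Callable, Optional

# Simplified classification (assumption): cache statements that start with SELECT
# and do not end with FOR UPDATE. A real proxy would inspect the parsed statement.
_SELECT = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
_FOR_UPDATE = re.compile(r"\bFOR\s+UPDATE\s*;?\s*$", re.IGNORECASE)


def is_cacheable(query: str) -> bool:
    return bool(_SELECT.match(query)) and not _FOR_UPDATE.search(query)


class QueryCache:
    """Hypothetical resultset cache with a fixed time-to-live, keyed on query text."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, object]] = {}

    def get(self, query: str) -> Optional[object]:
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, resultset = entry
        if self.clock() >= expires_at:
            del self._entries[query]  # expired: evict and report a miss
            return None
        return resultset

    def put(self, query: str, resultset: object) -> None:
        self._entries[query] = (self.clock() + self.ttl, resultset)


def execute(query: str, cache: QueryCache, backend: Callable[[str], object]) -> object:
    """Serve cacheable SELECTs from the cache; forward everything else to the backend."""
    if not is_cacheable(query):
        return backend(query)
    resultset = cache.get(query)
    if resultset is None:  # miss or expired: ask the backend and remember the answer
        resultset = backend(query)
        cache.put(query, resultset)
    return resultset


# Usage with a fake clock and a fake backend that records what reaches "MySQL".
now = [0.0]
cache = QueryCache(ttl_seconds=30, clock=lambda: now[0])
calls = []


def fake_backend(query: str) -> object:
    calls.append(query)
    return [(111,)]


execute("SELECT 111", cache, fake_backend)  # miss: forwarded to the backend
execute("SELECT 111", cache, fake_backend)  # hit: served from the cache
now[0] = 31.0                                # 31 s later the entry has expired
execute("SELECT 111", cache, fake_backend)  # forwarded again
print(len(calls))
```

Keying on the exact text means `SELECT 111` and `select 111` are cached separately; normalizing the query first would raise the hit rate at the cost of some extra CPU per query.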

In the following graphs I reran the same benchmarks, but with caching enabled.

sysbench:
 
# for i in 1 2 4 8 16 32 64 128 ; do sysbench --num-threads=$i --max-requests=0 --max-time=60 --test=oltp --oltp-table-size=100000 --mysql-user=root --mysql-password=pass --mysql-host=127.0.0.1 --mysql-db=test --db-ps-mode=disable --mysql-port=6033 --oltp-read-only run | grep 'transactions:' ; done
    transactions:                        43855  (730.91 per sec.)
    transactions:                        100656 (1677.58 per sec.)
    transactions:                        204358 (3405.90 per sec.)
    transactions:                        404678 (6744.51 per sec.)
    transactions:                        717239 (11953.78 per sec.)
    transactions:                        960561 (16008.91 per sec.)
    transactions:                        947388 (15788.89 per sec.)
    transactions:                        932289 (15536.14 per sec.)





mysqlslap:

# mysqlslap --create-schema=test -u root -ppass -h 127.0.0.1 -P6033 -c 1,2,4,8,16,32,64,128 -q select1.sql  | grep "Average number of seconds to run all queries"
Warning: Using a password on the command line interface can be insecure.
        Average number of seconds to run all queries: 3.106 seconds
        Average number of seconds to run all queries: 3.300 seconds
        Average number of seconds to run all queries: 3.569 seconds
        Average number of seconds to run all queries: 4.197 seconds
        Average number of seconds to run all queries: 5.028 seconds
        Average number of seconds to run all queries: 7.188 seconds
        Average number of seconds to run all queries: 14.405 seconds
        Average number of seconds to run all queries: 28.475 seconds
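To put the two sets of runs side by side, the following Python snippet (my own helper, not part of sysbench or mysqlslap) computes the speedup of the cached runs over the uncached ones, using only the numbers printed above: for sysbench it is the ratio of transactions per second, for mysqlslap the ratio of average run times.

```python
threads = [1, 2, 4, 8, 16, 32, 64, 128]

# sysbench OLTP read-only through ProxySQL: transactions per second (higher is better)
tps_no_cache = [272.60, 527.95, 1053.60, 1966.17, 3213.33, 4023.18, 3992.18, 3961.61]
tps_cache = [730.91, 1677.58, 3405.90, 6744.51, 11953.78, 16008.91, 15788.89, 15536.14]

# mysqlslap through ProxySQL: average seconds to run all queries (lower is better)
secs_no_cache = [8.189, 8.432, 8.716, 10.256, 11.808, 18.809, 34.577, 69.186]
secs_cache = [3.106, 3.300, 3.569, 4.197, 5.028, 7.188, 14.405, 28.475]

# In both lists a speedup > 1 means the cached run did better.
sysbench_speedup = [c / n for n, c in zip(tps_no_cache, tps_cache)]
mysqlslap_speedup = [n / c for n, c in zip(secs_no_cache, secs_cache)]

print(f"{'threads':>7} {'sysbench':>9} {'mysqlslap':>10}")
for t, s, m in zip(threads, sysbench_speedup, mysqlslap_speedup):
    print(f"{t:>7} {s:>8.2f}x {m:>9.2f}x")
```

With these numbers, caching gives roughly 2.7x (1 thread) up to about 4x (32 threads) more sysbench transactions per second, while mysqlslap runs about 2.3x to 2.6x faster.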



With caching enabled, the performance of ProxySQL is way better than the performance of MySQL accessed directly!
I expect the question: "what about MySQL with the query cache?"
The ProxySQL cache performs better than the MySQL query cache, and this is easy to demonstrate without running any benchmark: the MySQL query cache does not cache statements that use no tables, like the "SELECT 111" used in the mysqlslap benchmark. Here it is therefore of no use, and the QPS of mysqlslap against MySQL cannot grow.


Given these really interesting and promising results, I have good reasons to continue the development of ProxySQL: adding features, improving performance and stability, and fixing bugs.


