Saturday, January 18, 2014

Presenting ProxySQL at FOSDEM and Percona Live

In October I briefly introduced ProxySQL , a prototype to perform caching, load balancing and query rewrite. In light of its initial promising performance, in the months I redesigned and re-architectured it almost completely, in a way that is now easier to solve many important issues in the MySQL ecosystem:
- load balancing;
- read/write split;
- caching reads outside the database server;
- on-the-fly rewrite of queries;
- query routing;
All of these features are completely transparent to the application layer, and gives numerous benefits:
- increased throughput and reduced latency, for both write and read workload;
- simplified clients configuration;
- optimize applications that generate bloated queries (most of the ORMs do that).
This results in faster setups, improved users experience, and optimized usage of resources (hardware, software, but also man-hours).


ProxySQL is still under active development and some of its features are still either incomplete or not very user friendly, but I strongly advise you to try it: you will be impressed by its original and unique features, and from its astonishing performance! 
The software is available at https://github.com/renecannao/proxysql .


I will be presenting ProxySQL at FOSDEM and at Percona Live / MySQL Conference , and I will soon publish a series of small tutorials to describe how to use.


As a preface for the following small tutorial, below are the results of some simple benchmarks I run using a small cluster on AWS with 2 DB servers running MariaDB+Galera on c3.large instances, and 1 client server running sysybench+proxysql on c3.2xlarge instance.


After an installation and minimal configuration , these are the results running sysbench directly against Galera cluster , or against ProxySQL connected to Galera:


sysbench --report-interval=10 --test=oltp.lua \
--oltp-tables-count=1 --oltp-table-size=100000 --rand-init=on \
--oltp-read-only=off --rand-type=uniform --max-requests=0 \
--mysql-user=root --mysql-password=password \
--mysql-port=3306 --mysql-host=server1,server2 \
--mysql-table-engine=innodb --max-time=300 --num-threads=64 \
--oltp-auto-inc=off --db-ps-mode=disable \
--oltp-order-ranges=50 --oltp-distinct-ranges=50 run


sysbench --report-interval=10 --test=oltp.lua \
--oltp-tables-count=1 --oltp-table-size=100000 --rand-init=on \
--oltp-read-only=off --rand-type=uniform --max-requests=0 \
--mysql-user=root --mysql-password=password \
--mysql-socket=/tmp/proxysql.sock \
--mysql-table-engine=innodb --max-time=300 --num-threads=64 \
--oltp-auto-inc=off --db-ps-mode=disable \
--oltp-order-ranges=50 --oltp-distinct-ranges=50 run






The capability of ProxySQL to cache result sets improves performance out of the box!
But that is not all ProxySQL can do! “on-the-fly rewrite of queries” is perhaps the feature that can improve performance even more than caching result sets.


sysbench run many of queries like the follow:

SELECT c FROM sbtest WHERE id between N and M ORDER BY c
SELECT DISTINCT c FROM sbtest WHERE id BETWEEN N and M ORDER BY c


As an example, assume sysbench is one of these applications that run not optimal queries, and after an analysis of the code you note that the application is requesting data in which sorting and filtering aren’t useful for the final purpose. Nonetheless they are included in the code, generating a lot of load to the database system bringing down its performance.
Rewriting these queries is not an easy task because they are generated by ORM, or because you don’t khow to edit the code itself.


ProxySQL can easily solve this problem. In fact, in this example, I configured ProxySQL to rewrite the above queries in:

SELECT c FROM sbtest WHERE id between N and M



Running again benchmark with query rewrite enabled:




The graphs don’t need further comments: caching can be a huge performance boost, but rewriting queries on-the-fly is one of the most interesting features of ProxySQL !
This is a huge value for any setup using ORM or third party softwares that automatically generated bogus or bloated queries.


Don’t miss the coming tutorials, and come and to my talks at FOSDEM and Percona Live .

9 comments:

  1. Impressive.

    Is it stable enough to be used in a production environment?

    You didn't include a link to the project: https://github.com/renecannao/proxysql

    ReplyDelete
    Replies
    1. Thanks Yermo.

      It is still under active development and I consider it still in alpha.
      In the next few days I will publish some tutorial on how to configure and use it: please feel free to try it in your testing environment.

      Link added

      Delete
  2. Very nice and promise great!
    In a simple scenario where there are instances of Application, two instances of proxySQL and the database , how the HA of proxySQL will be guarantee in case one instance of the proxy fail?

    thanks

    ReplyDelete
    Replies
    1. The answer to this question needs an article on its own.
      ProxySQL is developed with caching as part of its core, like many other caching proxies (HTTP and DNS just to name the most popular).
      For this reason, the best location to run ProxySQL is as close as possible to the application. Therefore you can run ProxySQL on the same server where the application is running: this improves communication between application and proxy, and makes application+proxysql a single unit. The problem of HA is now solved.
      But you can also have multiple layers of ProxySQL ...

      Delete
    2. That is okay for when the software itself is stable..the problem is that proxysql is so unstable it crashes every 15 minutes right now (seems like whenever I run a prepare-exec for a specific write query) in a batch process..and then the whole app stack just hangs..

      Delete
    3. @Anonymous:
      there is a known issue with prepared statements and connection pooling.
      Please try the Hebe branch (https://github.com/renecannao/proxysql/tree/Hebe) that should solve this problem.
      If the problem isn't solved in Hebe branch, please open an issue on https://github.com/renecannao/proxysql/issues I will make sure I follow up on this issue.
      Thanks!

      Delete
  3. Is ProxySQL giving me the option to split my records from a-f in HG0, g-k in HG1, etc.. to reduce load on servers?

    ReplyDelete
  4. Hi Rene
    Does ProxySQL is still extending? or It had been put away?
    Because I had not seen any commits from last year to now

    ReplyDelete
    Replies
    1. Hi J G C,
      the old version is on code freeze.
      New development is on https://github.com/sysown/proxysql-0.2/ : a version built from scratch and still in alpha stage

      Delete