Tuesday, December 29, 2015

On ProxySQL, MaxScale, Persistent Connection, response time, and bugs

A few days ago I came across the announcement that MaxScale 1.3 finally supports Persistent Connections.
ProxySQL has supported persistent connections since it was a prototype (sometime back in 2013), therefore I am very happy that the MaxScale team finally introduced this long-awaited feature.
However, MaxScale's implementation of persistent connections has a serious drawback that I would consider a serious bug (more details at the end of the article). A bug so serious that it shouldn't reach any GA release.

Since I like running benchmarks, and given the new feature in MaxScale, I thought it was a good time to compare ProxySQL and MaxScale again, around 6 months after a previous benchmark.

Benchmark: ProxySQL vs MaxScale


Benchmark setup


The benchmark setup is very simple:
- a physical server (20 CPU cores) running sysbench, proxysql and maxscale
- a physical server (20 CPU cores) running 3 mysqld instances (1 master and 2 slaves) launched using MySQL Sandbox (see the sketch below)
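
For reference, a replication sandbox like this one can be created with a single MySQL Sandbox command; a minimal sketch, assuming a MySQL 5.6.28 binary tarball (the path below is illustrative, not the one used here):

$ make_replication_sandbox --how_many_slaves=2 \
    /path/to/mysql-5.6.28-linux-glibc2.5-x86_64.tar.gz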

Software version used:
- MySQL 5.6.28
- sysbench 0.4.12
- ProxySQL v1.1.0 stable
- MaxScale 1.3.0-beta

Configuration files can be found at the following URLs:
ProxySQL
MaxScale

ProxySQL configuration was completed by running these commands through the admin interface:

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply) VALUES(1,1,'^SELECT.*FOR UPDATE$', 1, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply) VALUES(2,1,'^SELECT', 2, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
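
These two rules route any SELECT ... FOR UPDATE to hostgroup 1 (the master) and every other SELECT to hostgroup 2 (the slaves), while all remaining traffic follows the user's default hostgroup. For completeness, here is a minimal sketch of the backend definitions such rules assume; the hostname and ports are illustrative, not the actual sandbox ones used in this benchmark:

INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (1,'10.1.1.163',21891);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (2,'10.1.1.163',21892);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (2,'10.1.1.163',21893);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;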


Preliminary benchmarks confirmed the conclusion of my previous blog post: MaxScale is very CPU intensive, therefore, to make a fair comparison between the two proxies, I ran both of them with only 1 worker thread.
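
For reference, this is where the number of worker threads is set in each proxy. This is only a sketch, and the exact syntax may differ between versions; note that in ProxySQL mysql-threads is a startup variable, so changing it requires a restart:

# maxscale.cnf
[maxscale]
threads=1

-- ProxySQL admin interface
UPDATE global_variables SET variable_value='1' WHERE variable_name='mysql-threads';
SAVE MYSQL VARIABLES TO DISK;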


Benchmarks were executed by running sysbench against the local proxy (either ProxySQL or MaxScale), using this command:

sysbench --max-requests=0 --test=oltp --mysql-user=rcannao --mysql-password=rcannao \
--mysql-db=test --oltp-table-size=1000000 --oltp-read-only=on --oltp-point-selects=1 \
--oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 \
--oltp-skip-trx=off --db-ps-mode=disable --max-time=60 \
--oltp-reconnect-mode=transaction --mysql-host=10.1.1.164 \
--num-threads=$thr  --mysql-port=$port run

What is important to note is that the workload is read-only, it performs only point selects, and it reconnects at the end of each transaction: this workload is meant to measure the performance benefit of Persistent Connections.


The benchmark will compare:
- ProxySQL configured with read/write split
- MaxScale with readwritesplit module (RW)
- MaxScale with readconnroute module (RR)
- MaxScale with readwritesplit module and persistent connections (RW-PC)
- MaxScale with readconnroute module and persistent connections (RR-PC); see the configuration sketch below
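
In MaxScale 1.3, persistent connections are enabled per server in maxscale.cnf through the persistpoolmax and persistmaxtime parameters. A minimal sketch for the PC variants; the address, port and values below are illustrative, not the exact ones used in this benchmark:

[server1]
type=server
address=10.1.1.163
port=21891
protocol=MySQLBackend
persistpoolmax=250
persistmaxtime=3600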


Benchmark result



Here is the graph of the throughput results:



There is a lot of information here, but also some unanswered questions.
At very low concurrency, ProxySQL is slightly slower.
At 64 connections, ProxySQL and MaxScale RW-PC have very similar throughput, and that is great since these two configurations have similar behaviors.
Also at 64 connections, it seems that MaxScale without Persistent Connections has reached its maximum throughput: since throughput with Persistent Connections is higher, we can already conclude that this feature is indeed useful and improves performance. MaxScale RR-PC continues to give more throughput than the others, but this is expected, as this routing module is very simple.

At 256 connections, ProxySQL and MaxScale RR-PC are the only two configurations whose throughput continues to grow. That means the other configurations have saturated 1 core and are unable to scale any further, while ProxySQL keeps providing all its features and scales with just 1 core.

At 1024 connections, all proxy configurations show a drop in performance. While the drop in ProxySQL is marginal, the drop in MaxScale is severe.
This confirms ProxySQL's ability to scale.


What about response time?





From this graph of response time we can note that at high concurrency ProxySQL provides the best response time. Let's remove the response time for 1024 connections and compare at lower concurrency:



What about maximum response time? This is really interesting:



No blue columns ... did I forget to add the response time of ProxySQL? No, I didn't forget: the max response time of MaxScale is simply too high for a proper comparison.
Starting at 64 connections, the maximum response time of MaxScale becomes so high that it reaches 60 seconds: this matches the max time in sysbench, therefore we should conclude that at least one connection created by sysbench (or perhaps more) wasn't able to complete a transaction until the end of the benchmark.
This needs to be further validated by its developers, but it seems that at high concurrency (even if 64 connections shouldn't be considered "high") MaxScale only processes a subset of connections while completely ignoring others. If that's correct, this should be considered a serious bug.

For further testing, I reran sysbench with 256 connections (not a lot, at all!) against MaxScale for 600 seconds, and the max response time was 600 seconds: that is, at least one transaction (maybe more) wasn't processed until all the other transactions were completed.
I am sure nobody wants a transaction stuck for an undefined amount of time because the proxy is busy processing other transactions.

For reference, here is the graph of max response times without the off-the-charts values:





Severe bugs in Persistent Connections


The release notes of MaxScale 1.3.0 hide a very important detail that is available only in the Administration Tutorial, which I report here for reference:

Please note that because persistent connections have previously been in use, they may give a different environment from a fresh connection. For example, if the previous use of the connection issued "use mydatabase" then this setting will be carried over into the reuse of the same connection. [...] In exceptional cases this feature could be a problem.

If I read it correctly, this is not a feature but a series of severe bugs.
More details below.

MaxScale doesn't track current schema


$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT DATABASE()" test
+------------+
| DATABASE() |
+------------+
| test       |
+------------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT DATABASE()" mysql
+------------+
| DATABASE() |
+------------+
| test       |
+------------+

I assume nobody wants this to happen: the second client believes it is connected to schema "mysql", but in reality it is connected to schema "test".
Unless your application uses only one schema, I strongly discourage the use of persistent connections.

MaxScale doesn't track charset and returns incorrect encoding


Let's try to identify another possible problem:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT DATABASE(), @@session.character_set_client" mysql --default-character-set=utf8
+------------+--------------------------------+
| DATABASE() | @@session.character_set_client |
+------------+--------------------------------+
| mysql      | utf8                           |
+------------+--------------------------------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SET NAMES latin1" mysql

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT DATABASE(), @@session.character_set_client" mysql --default-character-set=utf8
+------------+--------------------------------+
| DATABASE() | @@session.character_set_client |
+------------+--------------------------------+
| mysql      | latin1                         |
+------------+--------------------------------+

In this example the current database is always incorrect (as already pointed out above), but the character set is also compromised. This can be a serious issue for many applications, as MaxScale ignores the charset specified by the client.
That is, MaxScale ignores both the schema name and the charset specified during the initial handshake.


MaxScale doesn't track autocommit


The same applies to autocommit ...

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SET autocommit=0"

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | OFF   |
+---------------+-------+


Here we can see another major issue/bug: an application could issue statements assuming autocommit=ON (the default) while in reality another client could have changed it.

MaxScale doesn't track transactions


I think this is perhaps the most serious bug in how Persistent Connections are implemented in MaxScale.
Without MaxScale, when a client disconnects, its transaction is rolled back by the server.
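
As a sanity check, here is the behavior one would expect when connecting directly to the backend; a sketch, where the IP and port of the sandbox master are hypothetical:

$ mysql -u rcannao -prcannao -h 10.1.1.163 -P21891 -e "SET autocommit=0; DELETE FROM test.sbtest WHERE id < 1000"

$ mysql -u rcannao -prcannao -h 10.1.1.163 -P21891 -e "SHOW ENGINE INNODB STATUS\G" | grep ACTIVE
$

The uncommitted DELETE is rolled back as soon as the first client disconnects, so the second client finds no active transaction.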
Let's see what happens with MaxScale and Persistent Connections.

First, we create a transaction the way many applications do: SET autocommit=0, followed by any DML:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+
$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SET autocommit=0"

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | OFF   |
+---------------+-------+

So far, we have modified autocommit in a persistent connection.
Next, let's run some SELECT statements:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT COUNT(*) FROM sbtest" test
ERROR 1046 (3D000) at line 1: No database selected

Oops, an error ... I forgot that MaxScale ignores my request for a default schema ... Now I must specify it in the query itself!

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT COUNT(*) FROM test.sbtest" test
+----------+
| COUNT(*) |
+----------+
|  1000000 |
+----------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT COUNT(*) FROM test.sbtest WHERE id < 1000" test
+----------+
| COUNT(*) |
+----------+
|      999 |
+----------+

All looks good so far. Let me check if there are active transactions:
 
$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE
$

No active transactions, that's good. Now, let's run a DML statement ...
 
$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "DELETE FROM test.sbtest WHERE id < 1000"
$

Let me check again if there are active transactions ...

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE
---TRANSACTION 2253315, ACTIVE 29 sec
$

Here is the bug! The client that issued the DML statement and started the transaction is gone, yet MaxScale is holding the transaction open.
The bad news is that MaxScale doesn't track transactions, no matter whether they are started implicitly because of autocommit=0 or with an explicit START TRANSACTION. Here is an example (after restarting MaxScale):

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "START TRANSACTION; SELECT 1 FROM test.sbtest LIMIT 1" test
+---+
| 1 |
+---+
| 1 |
+---+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE
---TRANSACTION 2253317, ACTIVE 2 sec

$


ProxySQL provides a safe environment with Persistent Connections


By comparison, ProxySQL has a more mature implementation of persistent connections: it keeps track of the environment set by the client, ensuring that the environments of backend and frontend match.

ProxySQL tracks current schema


ProxySQL isn't affected by the same bug as MaxScale, and correctly tracks the schema specified by the client:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SELECT DATABASE()" test
+------------+
| DATABASE() |
+------------+
| test       |
+------------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SELECT DATABASE()" mysql
+------------+
| DATABASE() |
+------------+
| mysql      |
+------------+

ProxySQL tracks character set


ProxySQL isn't affected by the same bug as MaxScale, and correctly tracks the character set specified by the client:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SELECT DATABASE(), @@session.character_set_client" mysql --default-character-set=utf8
+------------+--------------------------------+
| DATABASE() | @@session.character_set_client |
+------------+--------------------------------+
| mysql      | utf8                           |
+------------+--------------------------------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SET NAMES latin1" mysql

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SELECT DATABASE(), @@session.character_set_client" mysql --default-character-set=utf8
+------------+--------------------------------+
| DATABASE() | @@session.character_set_client |
+------------+--------------------------------+
| mysql      | utf8                           |
+------------+--------------------------------+

ProxySQL tracks autocommit


Also in this case, ProxySQL isn't affected by the same bug as MaxScale, and correctly tracks the value of autocommit as set by each client:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SET autocommit=0"

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+  

ProxySQL tracks transactions


Also in this case, ProxySQL isn't affected by the same bug as MaxScale, and correctly tracks transactions, terminating them if required:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "START TRANSACTION; SELECT 1 FROM test.sbtest LIMIT 1" test
+---+
| 1 |
+---+
| 1 |
+---+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE

$

 

Why does ProxySQL implement Persistent Connections better?


Since its initial implementation 2 years ago, ProxySQL has been designed to handle frontends (clients) and backends (servers) as different entities, connecting them only when needed and removing the link between them as soon as possible.
When a client connects to ProxySQL, no connection to any backend is established. When a client issues a request, ProxySQL determines whether the request needs a connection to a backend, and only if required does it forward the request to one. As soon as the request is completed, ProxySQL determines whether the backend connection is still required, and if not, it returns it to a connection pool.
That is: the connection to the backend is returned to the connection pool not when the client disconnects, but when the request is completed, so that it can be reused by other clients (for example, if there are no active transactions).
Similarly, when a backend connection is linked to a client connection, ProxySQL ensures that the environment is set correctly: schema, charset, autocommit, etc.
In other words, ProxySQL doesn't just implement Persistent Connections, but also Connection Multiplexing. In fact, ProxySQL can handle hundreds of thousands of clients, yet forward all their traffic to a few backend connections.
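
The effect of multiplexing can be observed from the admin interface. A sketch, assuming the default admin credentials and the stats_mysql_global counters of recent ProxySQL versions (names may differ in v1.1.0):

$ mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT * FROM stats.stats_mysql_global WHERE variable_name LIKE '%Connections%'"

With many clients connected, Client_Connections_connected is expected to be far higher than Server_Connections_connected.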

As a final note, I invite everybody to try ProxySQL, now GA, and feel free to contact me with any questions.

11 comments:

  1. Hi René,

    Thanks for the post and for the time you spent in testing ProxySQL and MaxScale - this is extremely valuable!

    It is indeed sad to see some of the issues you described in MaxScale. But before throwing everything away and jumping on alternative bandwagons, I would consider a few points:

    1. What you describe is mostly handled in plugins. The RWSplit plugin has always been a kind of "Cinderella plugin", whilst the Read Conn Router has been consistently developed, mostly by Massimiliano Pinto, one of the members of the team.

    2. I still believe the core of MaxScale is the right approach to a proxy system. I discussed some of these points with the team at Oracle and I reiterated the obvious: you may want to use a proxy to avoid bottlenecks, but if you are not careful you will add a bigger bottleneck instead. This is the main reason behind some technical and architectural choices in MaxScale.

    3. Based on point 2, I tried to compare your previous benchmarks, but it looks like most of the tests are processed differently (or at least presented differently). It would be good to take an older version of MaxScale - I would go back 12-18 months - and test it with your last testbed (i.e. the one you have used in your blog post). At least the Read Conn Router.

    4. I do not think the value of MaxScale (and ProxySQL indeed) resides in its use with 1 Master and 2 Slaves. You should have a much bigger cluster and a way heavier workload to appreciate the value of a proxy. This is a key point, because the use of proxies, at least for r/w splitting and connection routing, is highly exaggerated by marketing fluff. I believe proxies are essential components of a modern DB infrastructure, but I do not think the way they are usually proposed makes sense. To prove/disprove my point: test a single MySQL (and a 1M/1S for HA) and compare it against a cluster with 1M/1S, 1M/2S, 1M/3S and so on. At some point you will see some benefits in performance (highly dependent on the workloads, and sysbench is indeed not the best example to reproduce typical workloads), then it is a matter of needs and price/performance.

    Thanks!
    -ivan

    Replies
    1. Hi Ivan,

      Thank you for your feedback.

      The issues I described can, I think, be grouped into two categories:
      a) performance
      b) functionality

      About performance:
      I know the architecture of MaxScale is structured as a core with several plugins around it. Without digging into the code or debugging it, I cannot identify where the issues are, but I believe that if a transaction isn't able to complete for 10 minutes, there is something wrong in the core, not in the plugin. This is just a hypothesis; I am not sure of it.

      Still about performance, I absolutely agree that a proxy can become a bottleneck if not well designed, and that architectural choices are very important.
      I also fully agree that sysbench is not the best way to reproduce a typical workload, but as pointed out at the beginning of the post, I wanted to benchmark the benefit of Persistent Connections for an extreme (yet not unusual) case: an application that connects, sends 1-2 queries and disconnects. As said, I am happy to know that MaxScale finally has Persistent Connections and that this feature can really boost performance for a specific type of workload.
      However, this performance boost comes with a not-small set of bugs, in my opinion quite severe ones.
      These are functionality issues, and assuming that the core doesn't track any connection state, these are bugs in the plugin(s).

      Finally, I fully agree with your point #4.
      Proxies (no matter which one) should indeed be essential components of a modern DB infrastructure, able to handle a very large number of clients and servers.
      In fact, this is what drives my development of ProxySQL: being able to handle a very large infrastructure.
      For reference, a single instance of ProxySQL was tested to handle up to 150,000 client connections and 60,000 backend connections to 1,200 servers.
      More details on this deserve their own blog post.

      Thanks!

    2. This comment has been removed by the author.

    3. Ivan,

      while answering Vilho I checked perf for both proxies.

      ProxySQL:
      # Overhead Shared Object
      # ........ ...................
      #
      66.80% [kernel.kallsyms]
      25.53% proxysql
      4.93% libc-2.15.so
      1.47% libpthread-2.15.so
      0.59% libcrypto.so.1.0.0
      0.26% perf-24134.map
      0.21% librt-2.15.so
      0.21% libstdc++.so.6.0.16
      0.00% ld-2.15.so


      MaxScale:
      # Overhead Shared Object
      # ........ ............................
      #
      87.82% maxscale
      7.90% [kernel.kallsyms]
      1.65% libc-2.15.so
      1.40% libreadwritesplit.so
      0.56% libpthread-2.15.so
      0.19% libMySQLClient.so
      0.15% libquery_classifier.so.1.0.0
      0.11% perf-22137.map
      0.09% libMySQLBackend.so
      0.08% liblog_manager.so.1.0.0
      0.05% libcrypto.so.1.0.0
      0.02% librt-2.15.so
      0.00% ld-2.15.so
      0.00% libmysqlmon.so
      0.00% libnss_files-2.15.so


      I think we can conclude that the bottleneck is not in the plugin (libreadwritesplit.so), but in the core.

  2. Hi René, thanks for a very interesting post.
    I wonder why MaxScale only used a single server thread? It is designed to handle multiple requests in parallel; thus, one would assume that using multiple threads in a multi-core environment would perform better. MaxScale is CPU intensive, but that's a good thing as long as the cycles are used for something useful. High CPU usage should be expected when multiple clients are served simultaneously. Thus, high CPU usage often hints at efficient use of resources.
    I have no experience with the version you tested but, in general, assigning at least two cores to MaxScale should give better results both in terms of throughput and latency when tested with multiple concurrent clients.

    Regards

    Vilho Raatikka

    Replies
    1. Hi Vilho,

      I agree that high CPU usage itself isn't something bad, but for this specific workload I noticed that MaxScale tends to use more CPU than sysbench itself.

      I ran a few more tests with multiple worker threads in both ProxySQL and MaxScale (using readwritesplit and persistent connections); here are the results:

      With 2 threads and 1024 connections:
      - MaxScale uses 200% cpu, sysbench 50% cpu : throughput is 2533 trx/s
      - ProxySQL uses 133% cpu, sysbench 133% cpu : throughput is 7695 trx/s

      With 4 threads and 1024 connections:
      - MaxScale uses 400% cpu, sysbench 90% cpu : throughput is 3293 trx/s
      - ProxySQL uses 266% cpu, sysbench 266% cpu : throughput is 14863 trx/s

      With 8 threads and 1024 connections:
      - MaxScale uses 795% cpu, sysbench 85% cpu : throughput is 3425 trx/s
      - ProxySQL uses 510% cpu, sysbench 466% cpu : throughput is 26109 trx/s

      As you can see from these results, with 2 threads ProxySQL is 3 times faster than MaxScale, while at 8 threads ProxySQL is 7.6x faster than MaxScale, and no matter the number of threads ProxySQL uses less CPU to achieve these results.

      Interestingly, MaxScale uses as many CPU cores as the number of threads.
      According to perf, this is probably caused by a spin loop (63.2% in spinlock_acquire).

      Thank you.

  3. Hi René, thanks. It is indeed interesting and, without knowing better, it looks a bit like lock collision.

  4. Doesn't work right of course, but they are more than happy to sell it to you. Monty and company never change.

  5. Does ProxySQL have its own HA, so as to prevent it from being a single point of failure, please?
    Thanks

    Replies
    1. James,
      ProxySQL is meant to be an agent, not a centralized manager.
      That means you can install it everywhere, even one instance on each application host, therefore removing any single point of failure.
      You can also deploy multiple layers of ProxySQL: for example, one instance on each application host, with all these instances connecting to a cluster of proxies. Because ProxySQL automatically detects when a backend fails, the proxy on the application host will automatically redirect the traffic to another proxy.
      Please have a look at http://www.slideshare.net/renecannao/proxysql-high-performance-and-ha-proxy-for-mysql/ , starting from slide 44.
      Thanks

    2. Hi René,

      Thanks a lot for the advice. I shall try ProxySQL some time. I am also learning MaxScale and Fabric; it looks to me like they both lack HA.

      With best regards
      James
