Saturday, July 16, 2016

Galera awareness and ProxySQL Scheduler

One of the most common requests for ProxySQL is to support Galera Cluster.
Supporting Galera requires adding many features to ProxySQL: checking whether a node is in the Synced state, detecting whether it is a garbd process, auto-detecting new nodes, sending writes to only one node, remote DC support, and more.
Furthermore, since Galera is a clustering solution and not all setups have the same requirements, supporting all the possible configurations seemed too complex a task.

Support for Galera in all its possible configurations wasn't the only request that seemed too complex to implement in the core of ProxySQL. There were also requests unrelated to Galera, like changing the weight of a node while it is overloaded (as measured by different metrics: CPU usage, disk I/O, number of threads) or removing a node from a pool while a backup is running.

I have always advised that ProxySQL is easily configurable, and that anybody can write a script that configures it at runtime through the Admin interface. The script can implement any complex logic or algorithm, and then configure ProxySQL accordingly. In short, I have always advised that ProxySQL should be a core component, with tools built around it.
But there was still a missing piece: the glue to link ProxySQL and the external script(s).
For this reason, ProxySQL recently gained a new feature: a job scheduler.
The scheduler is a cron-like solution embedded directly inside ProxySQL. For more details, please have a look at the documentation.

The scheduler can run any sort of external executable, so we can write a script that monitors all the backends of a specific hostgroup (a Galera cluster), removes hosts that are not in sync, and re-adds them once they are in sync again.
A ready-to-use script is distributed with ProxySQL itself: proxysql_galera_checker.sh .
The script is inspired by Percona's clustercheck, and accepts three mandatory arguments and one optional argument:

$ ./proxysql_galera_checker.sh
Usage: ./proxysql_galera_checker.sh <hostname> <port> <hostgroup_id> <log_file>

When running the script, we need to pass the hostname and the port where ProxySQL's Admin interface is listening. The script connects (using hardcoded credentials) to the Admin interface, retrieves the credentials needed to connect to the backends and the list of backends in the given hostgroup, checks the value of wsrep_local_state on each of these backends, and automatically reconfigures ProxySQL if any backend changes status.
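The core decision the script makes can be sketched as a tiny shell function (a hypothetical simplification; the real proxysql_galera_checker.sh also logs every check and reloads the servers at runtime):

```shell
# Sketch of the checker's decision logic. wsrep_local_state 4 means
# Synced; any other value (Joining, Donor/Desynced, ...) or an
# unreachable node means the backend should stop receiving traffic.
galera_status() {
  case "$1" in
    4) echo "ONLINE" ;;
    *) echo "OFFLINE_SOFT" ;;
  esac
}

galera_status 4   # -> ONLINE
galera_status 2   # -> OFFLINE_SOFT (Donor/Desynced)
```

This matches the behavior visible in the log excerpts below: nodes reporting wsrep_local_state 4 stay ONLINE, while a node in state 2 or an unreachable node is set to OFFLINE_SOFT.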

As an example, we have a Galera cluster with 3 nodes:



Admin> SELECT hostgroup_id,hostname,port,status FROM mysql_servers;
+--------------+-------------+------+--------+
| hostgroup_id | hostname    | port | status |
+--------------+-------------+------+--------+
| 0            | 10.10.10.11 | 3306 | ONLINE |
| 0            | 10.10.10.12 | 3306 | ONLINE |
| 0            | 10.10.10.13 | 3306 | ONLINE |
+--------------+-------------+------+--------+
3 rows in set (0.00 sec)




Next, we need to configure the scheduler to run the script at a regular interval. In this example, we will run the job every 10000 milliseconds (10 seconds):



Admin> SHOW CREATE TABLE scheduler\G
*************************** 1. row ***************************
       table: scheduler
Create Table: CREATE TABLE scheduler (
id INTEGER NOT NULL,
interval_ms INTEGER CHECK (interval_ms>=100 AND interval_ms<=100000000) NOT NULL,
filename VARCHAR NOT NULL,
arg1 VARCHAR,
arg2 VARCHAR,
arg3 VARCHAR,
arg4 VARCHAR,
arg5 VARCHAR,
PRIMARY KEY(id))
1 row in set (0.00 sec)

Admin> INSERT INTO scheduler(id,interval_ms,filename,arg1,arg2,arg3,arg4) VALUES
(1,'10000','/var/lib/proxysql/proxysql_galera_checker.sh','127.0.0.1','6032','0',
'/var/lib/proxysql/proxysql_galera_checker.log');
Query OK, 1 row affected (0.00 sec)

Admin> select * from scheduler\G
*************************** 1. row ***************************
         id: 1
interval_ms: 10000
   filename: /var/lib/proxysql/proxysql_galera_checker.sh
       arg1: 127.0.0.1
       arg2: 6032
       arg3: 0
       arg4: /var/lib/proxysql/proxysql_galera_checker.log
       arg5: NULL
1 row in set (0.00 sec)

Admin> SELECT * FROM runtime_scheduler;
Empty set (0.00 sec)


At this stage the job is not loaded at runtime yet, so next we must run LOAD SCHEDULER TO RUNTIME:

Admin> LOAD SCHEDULER TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

Admin> select * from runtime_scheduler\G
*************************** 1. row ***************************
         id: 1
interval_ms: 10000
   filename: /var/lib/proxysql/proxysql_galera_checker.sh
       arg1: 127.0.0.1
       arg2: 6032
       arg3: 0
       arg4: /var/lib/proxysql/proxysql_galera_checker.log
       arg5: NULL
1 row in set (0.00 sec)
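Because the scheduler is itself configured through the Admin interface, the check interval can be changed later without restarting ProxySQL. For example (the 5-second interval here is just an illustration):

```sql
-- Run the check every 5 seconds instead of every 10
UPDATE scheduler SET interval_ms=5000 WHERE id=1;
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;
```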


The first test we will run is to shut down a node:

vagrant@n3:~$ sudo service mysql stop
 * Stopping MySQL (Percona XtraDB Cluster) mysqld


In the script's log file we can see that it detected that the node is not reachable, and set it to OFFLINE_SOFT:

vagrant@n1:~$ sudo tail /var/lib/proxysql/proxysql_galera_checker.log -n 20
Sat Jul 16 18:41:33 UTC 2016 Check server 10.10.10.12:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:41:34 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:41:44 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:41:44 UTC 2016 Check server 10.10.10.12:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:41:44 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:41:54 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:41:54 UTC 2016 Check server 10.10.10.12:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:41:54 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:42:04 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:42:04 UTC 2016 Check server 10.10.10.12:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:42:04 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:42:14 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:42:14 UTC 2016 Check server 10.10.10.12:3306 , status ONLINE , wsrep_local_state
Sat Jul 16 18:42:14 UTC 2016 Changing server 10.10.10.12:3306 to status OFFLINE_SOFT
Sat Jul 16 18:42:14 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:42:25 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:42:25 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:42:25 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4


The mysql_servers table also confirms the change:

Admin> SELECT hostgroup_id,hostname,port,status FROM mysql_servers;
+--------------+-------------+------+--------------+
| hostgroup_id | hostname    | port | status       |
+--------------+-------------+------+--------------+
| 0            | 10.10.10.11 | 3306 | ONLINE       |
| 0            | 10.10.10.12 | 3306 | OFFLINE_SOFT |
| 0            | 10.10.10.13 | 3306 | ONLINE       |
+--------------+-------------+------+--------------+
3 rows in set (0.00 sec)


To continue our tests, we will set a host in desync mode:

mysql> show global variables like 'wsrep_desync';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wsrep_desync  | OFF   |
+---------------+-------+
1 row in set (0.00 sec)

mysql> set global wsrep_desync=ON;
Query OK, 0 rows affected (0.05 sec)

mysql> show global variables like 'wsrep_desync';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wsrep_desync  | ON    |
+---------------+-------+
1 row in set (0.00 sec)


Also in this case, the script notices it:

vagrant@n1:~$ sudo tail /var/lib/proxysql/proxysql_galera_checker.log -n 20
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:48:01 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:48:01 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:48:11 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:48:11 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:48:11 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:48:21 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:48:21 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:48:21 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:48:31 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 2
Sat Jul 16 18:48:31 UTC 2016 Changing server 10.10.10.11:3306 to status OFFLINE_SOFT
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:48:31 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:48:31 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:48:41 UTC 2016 Check server 10.10.10.11:3306 , status OFFLINE_SOFT , wsrep_local_state 2
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:48:42 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:48:42 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4


And mysql_servers will confirm the change:

Admin> SELECT hostgroup_id,hostname,port,status FROM mysql_servers;
+--------------+-------------+------+--------------+
| hostgroup_id | hostname    | port | status       |
+--------------+-------------+------+--------------+
| 0            | 10.10.10.11 | 3306 | OFFLINE_SOFT |
| 0            | 10.10.10.12 | 3306 | OFFLINE_SOFT |
| 0            | 10.10.10.13 | 3306 | ONLINE       |
+--------------+-------------+------+--------------+
3 rows in set (0.00 sec)


To continue this proof of concept, we will now put the desynced node back in sync:

mysql> show global variables like 'wsrep_desync';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wsrep_desync  | ON    |
+---------------+-------+
1 row in set (0.00 sec)

mysql> set global wsrep_desync=OFF;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like 'wsrep_desync';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wsrep_desync  | OFF   |
+---------------+-------+
1 row in set (0.00 sec)


The script will detect that the node is back in sync and set it ONLINE:

vagrant@n1:~$ sudo tail /var/lib/proxysql/proxysql_galera_checker.log -n 20
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:49:53 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:49:53 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:50:03 UTC 2016 Check server 10.10.10.11:3306 , status OFFLINE_SOFT , wsrep_local_state 2
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:50:03 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:50:03 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:50:13 UTC 2016 Check server 10.10.10.11:3306 , status OFFLINE_SOFT , wsrep_local_state 2
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:50:13 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:50:13 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:50:23 UTC 2016 Check server 10.10.10.11:3306 , status OFFLINE_SOFT , wsrep_local_state 2
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:50:24 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:50:24 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:50:34 UTC 2016 Check server 10.10.10.11:3306 , status OFFLINE_SOFT , wsrep_local_state 4
Sat Jul 16 18:50:34 UTC 2016 Changing server 10.10.10.11:3306 to status ONLINE
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:50:34 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:50:34 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4


And it correctly reconfigures mysql_servers:

Admin> SELECT hostgroup_id,hostname,port,status FROM mysql_servers;
+--------------+-------------+------+--------------+
| hostgroup_id | hostname    | port | status       |
+--------------+-------------+------+--------------+
| 0            | 10.10.10.11 | 3306 | ONLINE       |
| 0            | 10.10.10.12 | 3306 | OFFLINE_SOFT |
| 0            | 10.10.10.13 | 3306 | ONLINE       |
+--------------+-------------+------+--------------+
3 rows in set (0.01 sec)


Finally, we will restart the node we shut down previously:

vagrant@n3:~$ sudo service mysql start
 * Starting MySQL (Percona XtraDB Cluster) database server mysqld  [ OK ]


Once again, the script will detect that a node is now healthy:

vagrant@n1:~$ sudo tail /var/lib/proxysql/proxysql_galera_checker.log -n 20
Sat Jul 16 18:51:15 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:51:25 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:51:25 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:51:25 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:51:35 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:51:35 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:51:35 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:51:45 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
ERROR 2003 (HY000): Can't connect to MySQL server on '10.10.10.12' (111)
Sat Jul 16 18:51:45 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state
Sat Jul 16 18:51:45 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:51:55 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:51:55 UTC 2016 Check server 10.10.10.12:3306 , status OFFLINE_SOFT , wsrep_local_state 4
Sat Jul 16 18:51:55 UTC 2016 Changing server 10.10.10.12:3306 to status ONLINE
Sat Jul 16 18:51:55 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:52:06 UTC 2016 Check server 10.10.10.11:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:52:06 UTC 2016 Check server 10.10.10.12:3306 , status ONLINE , wsrep_local_state 4
Sat Jul 16 18:52:06 UTC 2016 Check server 10.10.10.13:3306 , status ONLINE , wsrep_local_state 4


And it correctly reconfigures mysql_servers:

Admin> SELECT hostgroup_id,hostname,port,status FROM mysql_servers;
+--------------+-------------+------+--------+
| hostgroup_id | hostname    | port | status |
+--------------+-------------+------+--------+
| 0            | 10.10.10.11 | 3306 | ONLINE |
| 0            | 10.10.10.12 | 3306 | ONLINE |
| 0            | 10.10.10.13 | 3306 | ONLINE |
+--------------+-------------+------+--------+
3 rows in set (0.00 sec)


Finally, do not forget to save the job to disk:

Admin> SAVE SCHEDULER TO DISK;
Query OK, 0 rows affected (0.01 sec)


Conclusion

Complete support for Galera Cluster in ProxySQL is not ready yet.
However, ProxySQL's ability to be easily reconfigured at runtime, together with its scheduler, makes it quick to extend ProxySQL's capabilities, including better integration with Galera in complex setups.

Saturday, June 18, 2016

Query rewrite with ProxySQL : use case scenario

One of the interesting features of ProxySQL is its support for query rewriting.
In fact, this functionality was one of the main motivations for writing a proxy that sits between the application and the database server: the ability to rewrite queries on the fly, without needing a developer to rewrite them in the application right away (if that is possible at all!).


In this short post I will describe a use case scenario of using ProxySQL to troubleshoot performance and quickly rewrite queries when needed.
The setup in this specific blog post is the Nylas platform, a sharded environment where over 80 ProxySQL instances are running.

We noticed some odd workload originating from Nylas' open source sync engine, as shown in the attached graph of InnoDB rows read:


We could ask ProxySQL for a list of the most time-consuming queries:


Or even ask which query has the highest average execution time:



Note that we could have gotten the same information from performance_schema.events_statements_summary_by_digest , but we already knew that the load was coming from a specific node, so we queried stats.stats_mysql_query_digest inside ProxySQL itself.
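For reference, this kind of information can be retrieved with queries of this shape against stats_mysql_query_digest (a sketch; count_star and sum_time are the real column names of that table, and times are reported in microseconds):

```sql
-- Top queries by total execution time
SELECT digest_text, count_star, sum_time
FROM stats_mysql_query_digest
ORDER BY sum_time DESC LIMIT 5;

-- Query with the highest average execution time
SELECT digest_text, sum_time/count_star AS avg_time
FROM stats_mysql_query_digest
ORDER BY avg_time DESC LIMIT 1;
```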

From the output above, it was obvious that the problem was with a specific query type running at that time.

Without copying the whole query again, it can be simplified as:

SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 IN (? , ?)
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ? 


col2 and col3 are not indexed because they do not provide enough cardinality, while idx_col1 does, so it is indexed.
Because of the WHERE clauses and the ORDER BY, mysqld processes the query by performing a range scan on the primary key.
The optimizer could be smart enough to optimize this query, but it is not.

The query can be rewritten as:

SELECT * FROM (
(SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 = v1
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ?)
UNION ALL
(SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 = v1
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ?)
) t ORDER BY PK LIMIT ?

How to do this with ProxySQL? Extremely easy!
The snippet below shows how to create a rule that rewrites the query, load the new rule at runtime, and persist the new rule on disk:
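Using the simplified shape of the query above, such a rule would look roughly like this (a sketch: tablename, the column placeholders, and the rule_id are taken from the simplified query, not the real one; the actual match_pattern must of course match the real SQL text):

```sql
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
VALUES (5, 1,
  '^SELECT (.*) FROM tablename WHERE PK >= (.*) AND idx_col1 IN \((.*),(.*)\) AND (.*) ORDER BY PK ASC LIMIT (.*)$',
  'SELECT * FROM ((SELECT \1 FROM tablename WHERE PK >= \2 AND idx_col1 = \3 AND \5 ORDER BY PK ASC LIMIT \6) UNION ALL (SELECT \1 FROM tablename WHERE PK >= \2 AND idx_col1 = \4 AND \5 ORDER BY PK ASC LIMIT \6)) t ORDER BY PK LIMIT \6',
  1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```

Each branch of the UNION ALL filters on a single idx_col1 value, so it can use the index on idx_col1 instead of a range scan on the primary key.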



The effect of the query rewrite is immediately clear from the following graph of InnoDB rows read:



In conclusion: query rewriting is a feature that, depending on the queries running against the database server, lets you quickly isolate and correct problematic queries and improve performance.

Tuesday, December 29, 2015

On ProxySQL, MaxScale, Persistent Connection, response time, and bugs

A few days ago I came across the announcement that MaxScale 1.3 finally supports persistent connections.
ProxySQL has supported persistent connections since it was a prototype (sometime back in 2013), so I am very happy that the MaxScale team finally introduced this long-awaited feature.
However, MaxScale's implementation of persistent connections has a serious drawback that I would consider a serious bug (more details at the end of the article): a bug so serious that it shouldn't reach any GA release.

Since I like running benchmarks, and given this new feature in MaxScale, I thought it was a good time to compare ProxySQL and MaxScale again, around 6 months after a previous benchmark.

Benchmark : ProxySQL vs MaxScale


Benchmark setup


The benchmark setup is very simple:
- a physical server (20 CPU cores) running sysbench , proxysql and maxscale
- a physical server (20 CPU cores) running 3 mysqld instances (1 master and 2 slaves) launched using MySQL Sandbox

Software versions used:
- MySQL 5.6.28
- sysbench 0.4.12
- ProxySQL v1.1.0 stable
- MaxScale 1.3.0-beta

Configuration files can be found at the following URLs:
ProxySQL
MaxScale

ProxySQL's configuration was completed by running these commands through the admin interface:

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply) VALUES(1,1,'^SELECT.*FOR UPDATE$', 1, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply) VALUES(2,1,'^SELECT', 2, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;


Preliminary benchmarks confirmed the conclusion of my previous blog post: MaxScale is very CPU intensive. Therefore, to make a fair comparison between the two proxies, I ran both of them with only 1 worker thread.


Benchmarks were executed by running sysbench against the local proxy (either ProxySQL or MaxScale), using this command:

sysbench --max-requests=0 --test=oltp --mysql-user=rcannao --mysql-password=rcannao \
--mysql-db=test --oltp-table-size=1000000 --oltp-read-only=on --oltp-point-selects=1 \
--oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 \
--oltp-skip-trx=off --db-ps-mode=disable --max-time=60 \
--oltp-reconnect-mode=transaction --mysql-host=10.1.1.164 \
--num-threads=$thr  --mysql-port=$port run

What is important to note is that the workload is read-only, performs only point selects, and reconnects at the end of each transaction: a workload meant to measure the performance benefit of persistent connections.


The benchmark will compare:
- ProxySQL configured with read/write split
- MaxScale with readwritesplit module (RW)
- MaxScale with readconnroute module (RR)
- MaxScale with readwritesplit module and persistent connection (RW-PC)
- MaxScale with readconnroute module and persistent connection (RR-PC)
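For the MaxScale persistent-connection configurations, the pooling is enabled per server section (a sketch; persistpoolmax and persistmaxtime are the parameter names documented for MaxScale 1.3, while the address and values here are illustrative, not the ones used in this benchmark):

```ini
# maxscale.cnf: keep up to 100 idle backend connections to this server
# in the pool, reusing each one for at most 3600 seconds
[server1]
type=server
address=10.0.0.1
port=3306
persistpoolmax=100
persistmaxtime=3600
```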


Benchmark result



Here is the graph of the benchmark results for throughput:



There is a lot of information here, but also some unanswered questions.
At very low concurrency, ProxySQL is slightly slower.
At 64 connections, ProxySQL and MaxScale RW-PC have very similar throughput, which is great since these two configurations have similar behaviors.
Still at 64 connections, it seems that MaxScale without persistent connections has reached its maximum throughput: since throughput with persistent connections is higher, we can already conclude that this feature is indeed useful and improves performance. MaxScale RR-PC continues to give more throughput than the others, but this is expected, as this routing module is very simple.

At 256 connections, ProxySQL and MaxScale RR-PC are the only two configurations whose throughput continues to grow. That means the other configurations have saturated one core and are unable to scale further, while ProxySQL keeps providing all its features and scales on just one core.

At 1024 connections, all proxy configurations show a drop in performance. While the drop is marginal for ProxySQL, it is severe for MaxScale.
This confirms ProxySQL's ability to scale.


What about response time?





From this graph of response time we can see that at high concurrency ProxySQL provides the best response time. Let's remove the response times for 1024 connections and compare at lower concurrency:



What about maximum response time? This is really interesting:



No blue columns... did I forget to add ProxySQL's response time? No, I didn't forget: the max response time of MaxScale is simply too high for a proper comparison.
Starting at 64 connections, the maximum response time of MaxScale becomes so high that it reaches 60 seconds: this matches the max time in sysbench, so we must conclude that at least one (or perhaps more) of the connections created by sysbench wasn't able to complete a transaction until the end of the benchmark.
This needs to be further validated by its developers, but it seems that at high concurrency (even if 64 connections shouldn't be considered "high") MaxScale processes only a subset of connections while completely ignoring others. If that's correct, this should be considered a serious bug.

For further testing, I reran sysbench with 256 connections (not a lot, at all!) against MaxScale for 600 seconds, and the max response time was 600 seconds: that is, at least one transaction (maybe more) wasn't processed until all the other transactions were completed.
I am sure nobody wants a transaction stuck for an undefined amount of time because the proxy is busy processing other transactions.

For reference, here is the graph of max response times without the off-the-charts values:





Severe bug in Persistent Connection


The release notes of MaxScale 1.3.0 hide a very important detail that is available only in the Administration Tutorial, which I report here for reference:

Please note that because persistent connections have previously been in use, they may give a different environment from a fresh connection. For example, if the previous use of the connection issued "use mydatabase" then this setting will be carried over into the reuse of the same connection. [...] In exceptional cases this feature could be a problem.

If I read it correctly, this is not a feature but a series of severe bugs.
More details below.

MaxScale doesn't track current schema


$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT DATABASE()" test
+------------+
| DATABASE() |
+------------+
| test       |
+------------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT DATABASE()" mysql
+------------+
| DATABASE() |
+------------+
| test       |
+------------+

I assume nobody wants this to happen: the second client believes it is connected to schema "mysql" , but in reality it is connected to schema "test" .
Unless your application uses only one schema, I strongly discourage the use of persistent connections.

MaxScale doesn't track charset and returns incorrect encoding


Let's try to identify another possible problem:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT DATABASE(), @@session.character_set_client" mysql --default-character-set=utf8
+------------+--------------------------------+
| DATABASE() | @@session.character_set_client |
+------------+--------------------------------+
| mysql      | utf8                           |
+------------+--------------------------------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SET NAMES latin1" mysql

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT DATABASE(), @@session.character_set_client" mysql --default-character-set=utf8
+------------+--------------------------------+
| DATABASE() | @@session.character_set_client |
+------------+--------------------------------+
| mysql      | latin1                         |
+------------+--------------------------------+

In this example the current database is always incorrect (as already pointed out previously), but the character set is also compromised. This can be a serious issue for many applications, as MaxScale ignores the charset specified by the client.
That is, MaxScale ignores both the schema name and the charset specified during the initial handshake.


MaxScale doesn't track autocommit


The same applies to autocommit ...

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SET autocommit=0"

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | OFF   |
+---------------+-------+


Here we can see another major issue: an application could issue statements assuming autocommit=ON (the default), while in reality another client had changed it.

MaxScale doesn't track transactions


I think this is perhaps the most serious bug in how persistent connections are implemented in MaxScale.
Without MaxScale, when a client disconnects, its transaction should be rolled back.
Let's see what happens with MaxScale and persistent connections.

First, we create a transaction the way many applications do: SET autocommit=0 , followed by any DML:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+
$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SET autocommit=0"

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | OFF   |
+---------------+-------+

So far, we have modified autocommit on a persistent connection.
Second, let's run some SELECT statements:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT COUNT(*) FROM sbtest" test
ERROR 1046 (3D000) at line 1: No database selected

Oops, an error ... I forgot that MaxScale ignores my request for a default schema ... Now I must specify it in the query itself!

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT COUNT(*) FROM test.sbtest" test
+----------+
| COUNT(*) |
+----------+
|  1000000 |
+----------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SELECT COUNT(*) FROM test.sbtest WHERE id < 1000" test
+----------+
| COUNT(*) |
+----------+
|      999 |
+----------+

All looks good so far. Let me check whether there are active transactions:
 
$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE
$

No active transactions, that's good. Now, let's run a DML statement ...
 
$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "DELETE FROM test.sbtest WHERE id < 1000"
$

Let me check again if there are active transactions ...

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE
---TRANSACTION 2253315, ACTIVE 29 sec
$

Here is the bug! The client that issued the DML statement and started the transaction is gone, yet MaxScale is holding the transaction open.
The bad news is that MaxScale doesn't track transactions, no matter whether they were started because of autocommit=0 or an explicit START TRANSACTION . Here is an example (after restarting MaxScale):

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "START TRANSACTION; SELECT 1 FROM test.sbtest LIMIT 1" test
+---+
| 1 |
+---+
| 1 |
+---+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P3307 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE
---TRANSACTION 2253317, ACTIVE 2 sec

$


ProxySQL provides a safe environment with Persistent Connections


By comparison, ProxySQL has a more mature implementation of persistent connections: it keeps track of the session environment set by the client, ensuring that the backend and frontend environments match.

ProxySQL tracks current schema


ProxySQL isn't affected by the same bug as MaxScale, and correctly tracks the schema specified by the client:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SELECT DATABASE()" test
+------------+
| DATABASE() |
+------------+
| test       |
+------------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SELECT DATABASE()" mysql
+------------+
| DATABASE() |
+------------+
| mysql      |
+------------+

ProxySQL tracks character set


ProxySQL isn't affected by the same bug as MaxScale, and correctly tracks the character set as specified by the client:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SELECT DATABASE(), @@session.character_set_client" mysql --default-character-set=utf8
+------------+--------------------------------+
| DATABASE() | @@session.character_set_client |
+------------+--------------------------------+
| mysql      | utf8                           |
+------------+--------------------------------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SET NAMES latin1" mysql

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SELECT DATABASE(), @@session.character_set_client" mysql --default-character-set=utf8
+------------+--------------------------------+
| DATABASE() | @@session.character_set_client |
+------------+--------------------------------+
| mysql      | utf8                           |
+------------+--------------------------------+

ProxySQL tracks autocommit


Also in this case, ProxySQL isn't affected by the same bug as MaxScale, and correctly tracks the value of autocommit as specified by the client:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SET autocommit=0"

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SHOW VARIABLES LIKE 'autocommit'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+  

ProxySQL tracks transactions


Also in this case, ProxySQL isn't affected by the same bug as MaxScale, and correctly tracks transactions, terminating them if required:

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "START TRANSACTION; SELECT 1 FROM test.sbtest LIMIT 1" test
+---+
| 1 |
+---+
| 1 |
+---+

$ mysql -u rcannao -prcannao -h 10.1.1.164 -P6033 -e "SHOW ENGINE INNODB STATUS\G" test | grep ACTIVE

$

 

Why does ProxySQL implement Persistent Connections better?


Since its initial implementation 2 years ago, ProxySQL was designed to handle frontends (clients) and backends (servers) as different entities, connecting them only when needed and removing the link between them as soon as possible.
When a client connects to ProxySQL, no connection to any backend is established. When a client issues a request, ProxySQL determines whether the request needs a backend connection, and forwards it to a backend only if required. As soon as the request is completed, ProxySQL determines whether the backend connection is still required, and if not returns it to the connection pool.
That is: the connection to the backend is returned to the connection pool not when the client disconnects, but when the request is completed, assuming that it can be reused by other clients (for example if there are no active transactions).
Similarly, when a backend connection is linked to a client connection, ProxySQL will ensure that the environment is set correctly: schema, charset, autocommit, etc.
In other words, ProxySQL doesn't just implement Persistent Connections, but also Connection Multiplexing. In fact, ProxySQL can handle hundreds of thousands of clients, yet forward all their traffic to a few backend connections.
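A rough way to observe multiplexing in action is to compare the number of connected clients with the number of backend connections in use. The ratio helper below is my own sketch; the commented queries assume the `stats_mysql_global` and `stats_mysql_connection_pool` tables of the Admin interface (verify the exact names against your ProxySQL version):

```shell
# Compute how many clients are being served per backend connection in use.
multiplexing_ratio() {
  # $1 = client connections, $2 = backend connections in use
  # prints the integer ratio clients/backends (0 if no backend is in use)
  if [ "$2" -eq 0 ]; then echo 0; else echo $(( $1 / $2 )); fi
}

# Against a live ProxySQL Admin interface (assumed credentials admin/admin):
# CLIENTS=$(mysql -u admin -padmin -h 127.0.0.1 -P6032 -N -B -e \
#   "SELECT variable_value FROM stats_mysql_global WHERE variable_name='Client_Connections_connected'")
# USED=$(mysql -u admin -padmin -h 127.0.0.1 -P6032 -N -B -e \
#   "SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool")
# multiplexing_ratio "$CLIENTS" "$USED"

multiplexing_ratio 100000 50   # 100k clients over 50 backend connections: prints 2000
```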

As a final note, I invite everybody to try ProxySQL, now GA, and to contact me with any questions.

Tuesday, September 22, 2015

ProxySQL Announcement : version 0.2 is stable and production ready

A long time has passed since ProxySQL was first announced in October 2013 .
A lot has changed since that day: ProxySQL was initially just a prototype, with many limitations.
These limitations quickly became a showstopper for further development, but that was expected from a simple prototype.
Since then it has been completely rewritten, and after almost 2 years I am glad to announce that ProxySQL is now a proxy with a long list of features, stable and production ready.
Please join me and attend my session at Percona Live, where I will present the exciting new features of ProxySQL, a high availability proxy designed to solve real issues of MySQL setups, from small to very large production environments.

Tuesday, September 15, 2015

ProxySQL tutorial : seamless replication switchover in less than one second

In the previous blog post we set up a replication topology with one master and 2 slaves, and installed proxysql on a fourth server that the application connects to.
The post was quite long, and we covered several topics: from installation and configuration of the cluster to installation of proxysql and creation of users and backends, from configuration of query rules for routing and their statistics to the monitoring module and replication lag.

But ProxySQL can do more than this.
Among the most interesting features of proxysql is the ability to hold traffic until a backend is available (within a configurable time limit).
For instance, if a query should be sent to hostgroup 0 but there are no servers in hostgroup 0, ProxySQL will wait until a server becomes available in hostgroup 0 or a timeout expires.

This feature has a very interesting effect: using ProxySQL, it is possible to perform a planned failover without returning any error to the application.
In a nutshell, the process is the following:
  1. ProxySQL is configured to disable the master from hostgroup 0 ;
  2. the failover is performed ;
  3. ProxySQL is configured to demote the old master to become a slave, and the new master is moved to hostgroup 0.
Let's see these steps in detail.

1. ProxySQL is configured to disable the master from hostgroup 0


To make an example, we can run something like this:

vagrant@app:~$ export OldMaster='192.168.124.101'
vagrant@app:~$ export NewMaster='192.168.124.102' 
vagrant@app:~$ mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "UPDATE mysql_servers SET STATUS='OFFLINE_SOFT' WHERE hostname=\"$OldMaster\" AND hostgroup_id=0; LOAD MYSQL SERVERS TO RUNTIME;"

What the above command does is simple: the master is disabled (status=OFFLINE_SOFT).
At this stage, no more queries can be sent to the old master.
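The same two Admin statements can be generated by a small wrapper, which makes them easy to reuse in the scripts later on. This is a hypothetical helper of mine: it just prints the SQL, to be piped into the mysql client as shown above:

```shell
# Print the Admin SQL that soft-disables a server in hostgroup 0
# (status goes to OFFLINE_SOFT, then the runtime config is reloaded).
disable_server_sql() {
  # $1 = hostname of the server to disable
  cat <<EOF
UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostname="$1" AND hostgroup_id=0;
LOAD MYSQL SERVERS TO RUNTIME;
EOF
}

# Usage: disable_server_sql "$OldMaster" | mysql -u admin -padmin -h 127.0.0.1 -P6032
disable_server_sql 192.168.124.101
```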


2. Perform the failover


The reason why we configured GTID for replication is to simplify the failover process. In fact, we can use MySQL Utilities to perform the failover. Here is an example:

vagrant@app:~$ mysqlrpladmin --discover-slaves-login=root:root --rpl-user=repl:replpass --master=root:root@$OldMaster --new-master=root:root@$NewMaster --demote-master switchover
# Discovering slaves for master at 192.168.124.101:3306
# Discovering slave at 192.168.124.102:3306
# Found slave: 192.168.124.102:3306
# Discovering slave at 192.168.124.103:3306
# Found slave: 192.168.124.103:3306
# Checking privileges.
# Performing switchover from master at 192.168.124.101:3306 to slave at 192.168.124.102:3306.
# Checking candidate slave prerequisites.
# Checking slaves configuration to master.
# Waiting for slaves to catch up to old master.
# Stopping slaves.
# Performing STOP on all slaves.
# Demoting old master to be a slave to the new master.
# Switching slaves to new master.
# Starting all slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Switchover complete.
#
# Replication Topology Health:
+------------------+-------+---------+--------+------------+---------+
| host             | port  | role    | state  | gtid_mode  | health  |
+------------------+-------+---------+--------+------------+---------+
| 192.168.124.102  | 3306  | MASTER  | UP     | ON         | OK      |
| 192.168.124.101  | 3306  | SLAVE   | UP     | ON         | OK      |
| 192.168.124.103  | 3306  | SLAVE   | UP     | ON         | OK      |
+------------------+-------+---------+--------+------------+---------+
# ...done.
vagrant@app:~$


3. ProxySQL is configured to demote the old master to become a slave, and the new master is moved to hostgroup 0


Similarly, we can quickly reconfigure ProxySQL to switch the traffic: the old master is reconfigured as a slave and re-enabled, and the new master is moved to hostgroup 0.

vagrant@app:~$ (
> echo "UPDATE mysql_servers SET STATUS='ONLINE', hostgroup_id=1 WHERE hostname=\"$OldMaster\" AND hostgroup_id=0;"
> echo "UPDATE mysql_servers SET hostgroup_id=0 WHERE hostname=\"$NewMaster\" AND hostgroup_id=1;"
> echo "LOAD MYSQL SERVERS TO RUNTIME;"
> ) | mysql -u admin -padmin -h 127.0.0.1 -P6032
Warning: Using a password on the command line interface can be insecure.
vagrant@app:~$



Seems very easy, right?
Let's try to put everything together in a failover script.

vagrant@app:~$ cat ./switchover.sh
#!/bin/sh
if [ $# -ne 2 ]; then
  echo "Usage: ./switchover.sh oldmaster newmaster"
  exit 1
fi
OldMaster=$1
NewMaster=$2

(
echo "UPDATE mysql_servers SET STATUS='OFFLINE_SOFT' WHERE hostname=\"$OldMaster\" AND hostgroup_id=0;"
echo "LOAD MYSQL SERVERS TO RUNTIME;"
) | mysql -u admin -padmin -h 127.0.0.1 -P6032

sleep 1 # let's give some time to current transactions to complete

mysqlrpladmin --discover-slaves-login=root:root --rpl-user=repl:replpass \
  --master=root:root@$OldMaster --new-master=root:root@$NewMaster \
  --demote-master switchover

(
echo "UPDATE mysql_servers SET STATUS='ONLINE', hostgroup_id=1 WHERE hostname=\"$OldMaster\" AND hostgroup_id=0;"
echo "UPDATE mysql_servers SET hostgroup_id=0 WHERE hostname=\"$NewMaster\" AND hostgroup_id=1;"
echo "LOAD MYSQL SERVERS TO RUNTIME;"
) | mysql -u admin -padmin -h 127.0.0.1 -P6032

Note that the script above doesn't perform any sanity check or error handling. This is intentional, to keep the script short and easy to understand.


Seamless replication switchover


At this point we have a script (switchover.sh) that is able to perform a switch in a very short period of time.
Let's put it in action, and while running sysbench (below) in another terminal I will be running:
./switchover.sh 192.168.124.102 192.168.124.101
and then:
./switchover.sh 192.168.124.101 192.168.124.102

Here is the output of sysbench while performing 2 switchovers :


vagrant@app:~$ sysbench --report-interval=1 \
--num-threads=4 --max-requests=0 --max-time=20 \
--test=/home/vagrant/sysbench-0.5/sysbench/tests/db/oltp.lua \
--mysql-user=utest --mysql-password=ptest \
--mysql-host=127.0.0.1 --mysql-port=6033 --mysql-db=test \
--oltp-read-only=off run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored


Threads started!

[   1s] threads: 4, tps: 137.94, reads: 1958.22, writes: 551.78, response time: 42.52ms (95%), errors: 0.00, reconnects:  0.00
[   2s] threads: 4, tps: 156.01, reads: 2205.11, writes: 635.03, response time: 32.96ms (95%), errors: 0.00, reconnects:  0.00
[   3s] threads: 4, tps: 155.93, reads: 2172.96, writes: 612.71, response time: 36.52ms (95%), errors: 0.00, reconnects:  0.00
[   4s] threads: 4, tps: 156.05, reads: 2200.76, writes: 628.22, response time: 32.25ms (95%), errors: 0.00, reconnects:  0.00
[   5s] threads: 4, tps: 75.00, reads: 996.06, writes: 296.02, response time: 30.29ms (95%), errors: 0.00, reconnects:  0.00
[   6s] threads: 4, tps: 12.00, reads: 197.01, writes: 52.00, response time: 1493.45ms (95%), errors: 0.00, reconnects:  0.00
[   7s] threads: 4, tps: 160.99, reads: 2230.92, writes: 639.98, response time: 33.70ms (95%), errors: 0.00, reconnects:  0.00
[   8s] threads: 4, tps: 158.99, reads: 2275.84, writes: 649.95, response time: 30.82ms (95%), errors: 0.00, reconnects:  0.00
[   9s] threads: 4, tps: 160.02, reads: 2240.22, writes: 642.06, response time: 31.89ms (95%), errors: 0.00, reconnects:  0.00
[  10s] threads: 4, tps: 155.99, reads: 2170.91, writes: 612.97, response time: 34.46ms (95%), errors: 0.00, reconnects:  0.00
[  11s] threads: 4, tps: 132.00, reads: 1825.05, writes: 523.01, response time: 39.40ms (95%), errors: 0.00, reconnects:  0.00
[  12s] threads: 4, tps: 157.00, reads: 2207.00, writes: 628.00, response time: 34.01ms (95%), errors: 0.00, reconnects:  0.00
[  13s] threads: 4, tps: 160.99, reads: 2271.83, writes: 643.95, response time: 31.08ms (95%), errors: 0.00, reconnects:  0.00
[  14s] threads: 4, tps: 164.01, reads: 2286.18, writes: 660.05, response time: 30.51ms (95%), errors: 0.00, reconnects:  0.00
[  15s] threads: 4, tps: 102.01, reads: 1405.07, writes: 405.02, response time: 34.48ms (95%), errors: 1.00, reconnects:  0.00
[  16s] threads: 4, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  17s] threads: 4, tps: 152.99, reads: 2187.93, writes: 618.98, response time: 32.53ms (95%), errors: 0.00, reconnects:  0.00
[  18s] threads: 4, tps: 162.00, reads: 2277.96, writes: 652.99, response time: 30.47ms (95%), errors: 0.00, reconnects:  0.00
[  19s] threads: 4, tps: 160.78, reads: 2250.98, writes: 640.14, response time: 33.41ms (95%), errors: 0.00, reconnects:  0.00
[  20s] threads: 4, tps: 160.22, reads: 2225.01, writes: 631.85, response time: 32.45ms (95%), errors: 0.00, reconnects:  0.00
OLTP test statistics:
    queries performed:
        read:                            37604
        write:                           10741
        other:                           5371
        total:                           53716
    transactions:                        2685   (134.13 per sec.)
    read/write requests:                 48345  (2415.02 per sec.)
    other operations:                    5371   (268.30 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          20.0185s
    total number of events:              2685
    total time taken by event execution: 80.0231s
    response time:
         min:                                 13.86ms
         avg:                                 29.80ms
         max:                               1505.34ms
         approx.  95 percentile:              34.40ms

Threads fairness:
    events (avg/stddev):           671.2500/4.15
    execution time (avg/stddev):   20.0058/0.00

The provided script, using proxysql and mysqlrpladmin, is able to perform a switch in a very short period of time (roughly 1.5 seconds) without stopping the application and without generating any error.
The application (sysbench in this case) doesn't report any error or failure!
Yet, it is possible to notice that sysbench pauses at seconds 5-6 and again at second ~16. Furthermore, sysbench reports a maximum response time of 1505ms: this is exactly the time it took to perform a switch. The trade-off for no errors is increased latency, within a configurable threshold.

ProxySQL is able to give us some more information about what happened during the switch:

mysql> SELECT max_time, digest_text FROM stats_mysql_query_digest ORDER BY max_time DESC LIMIT 3;
+----------+-------------------------------------+
| max_time | digest_text                         |
+----------+-------------------------------------+
| 1523629  | BEGIN                               |
| 157867   | COMMIT                              |
| 151759   | UPDATE sbtest1 SET k=k+? WHERE id=? |
+----------+-------------------------------------+
3 rows in set (0.00 sec)

BEGIN was the slowest statement of all (1523ms). Let's see why.
When a server is put into OFFLINE_SOFT mode, new incoming connections aren't accepted anymore, while the existing connections are used until they become inactive. In other words, connections are kept in use until the current transaction is completed. This allows a backend to be detached gracefully.
With the sysbench executed above, all statements are executed within a transaction that starts with BEGIN , therefore during the failover only the BEGIN statements were blocked: the other statements weren't sent yet.


Even faster seamless replication switchover


In the previous switchover script there is a completely arbitrary "sleep 1" to wait for transactions to complete.
This means that any transaction taking longer than 1 second is killed, but also that the switchover may pause longer than necessary and could be faster.

ProxySQL tracks the number of active connections (in use) in stats_mysql_connection_pool.ConnUsed .
Therefore, if SUM(ConnUsed)==0 it means that no connections are in use.
Let's use this information to improve the script, replacing "sleep 1" with a loop that waits up to 1 second for all connections to be freed.

vagrant@app:~$ cat ./switchover.sh
#!/bin/sh
if [ $# -ne 2 ]; then
  echo "Usage: ./switchover.sh oldmaster newmaster"
  exit 1
fi
OldMaster=$1
NewMaster=$2

(
echo "UPDATE mysql_servers SET STATUS='OFFLINE_SOFT' WHERE hostname=\"$OldMaster\" AND hostgroup_id=0;"
echo "LOAD MYSQL SERVERS TO RUNTIME;"
) | mysql -u admin -padmin -h 127.0.0.1 -P6032


CONNUSED=`mysql -h 127.0.0.1 -P6032 -uadmin -padmin -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND hostgroup=0' -B -N 2> /dev/null`
TRIES=0
while [ $CONNUSED -ne 0 -a $TRIES -ne 20 ]
do
  CONNUSED=`mysql -h 127.0.0.1 -P6032 -uadmin -padmin -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND hostgroup=0' -B -N 2> /dev/null`
  TRIES=$(($TRIES+1))
  if [ $CONNUSED -ne "0" ]; then
    sleep 0.05
  fi
done

mysqlrpladmin --discover-slaves-login=root:root --rpl-user=repl:replpass \
  --master=root:root@$OldMaster --new-master=root:root@$NewMaster \
  --demote-master switchover

(
echo "UPDATE mysql_servers SET STATUS='ONLINE', hostgroup_id=1 WHERE hostname=\"$OldMaster\" AND hostgroup_id=0;"
echo "UPDATE mysql_servers SET hostgroup_id=0 WHERE hostname=\"$NewMaster\" AND hostgroup_id=1;"
echo "LOAD MYSQL SERVERS TO RUNTIME;"
) | mysql -u admin -padmin -h 127.0.0.1 -P6032


We will now re-run sysbench, while on another terminal we perform 2 switchovers:

vagrant@app:~$ sysbench --report-interval=1 --num-threads=4 --max-requests=0 --max-time=20 --test=/home/vagrant/sysbench-0.5/sysbench/tests/db/oltp.lua --mysql-user=utest --mysql-password=ptest --mysql-host=127.0.0.1 --mysql-port=6033 --mysql-db=test --oltp-read-only=off run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored


Threads started!

[   1s] threads: 4, tps: 132.01, reads: 1885.13, writes: 532.04, response time: 40.72ms (95%), errors: 0.00, reconnects:  0.00
[   2s] threads: 4, tps: 139.95, reads: 1955.36, writes: 555.82, response time: 37.97ms (95%), errors: 0.00, reconnects:  0.00
[   3s] threads: 4, tps: 130.06, reads: 1820.88, writes: 524.25, response time: 43.67ms (95%), errors: 0.00, reconnects:  0.00
[   4s] threads: 4, tps: 138.02, reads: 1935.22, writes: 550.06, response time: 38.49ms (95%), errors: 0.00, reconnects:  0.00
[   5s] threads: 4, tps: 75.00, reads: 1013.98, writes: 297.99, response time: 41.33ms (95%), errors: 0.00, reconnects:  0.00
[   6s] threads: 4, tps: 103.00, reads: 1474.96, writes: 417.99, response time: 46.80ms (95%), errors: 0.00, reconnects:  0.00
[   7s] threads: 4, tps: 140.00, reads: 1955.00, writes: 557.00, response time: 37.95ms (95%), errors: 0.00, reconnects:  0.00
[   8s] threads: 4, tps: 137.99, reads: 1938.81, writes: 548.95, response time: 38.18ms (95%), errors: 0.00, reconnects:  0.00
[   9s] threads: 4, tps: 135.89, reads: 1919.42, writes: 543.55, response time: 36.65ms (95%), errors: 0.00, reconnects:  0.00
[  10s] threads: 4, tps: 133.09, reads: 1846.30, writes: 536.38, response time: 41.17ms (95%), errors: 0.00, reconnects:  0.00
[  11s] threads: 4, tps: 141.03, reads: 1970.43, writes: 560.12, response time: 35.89ms (95%), errors: 0.00, reconnects:  0.00
[  12s] threads: 4, tps: 137.00, reads: 1926.02, writes: 548.01, response time: 39.55ms (95%), errors: 0.00, reconnects:  0.00
[  13s] threads: 4, tps: 141.99, reads: 1961.80, writes: 567.94, response time: 37.37ms (95%), errors: 0.00, reconnects:  0.00
[  14s] threads: 4, tps: 125.01, reads: 1777.13, writes: 504.04, response time: 43.84ms (95%), errors: 0.00, reconnects:  0.00
[  15s] threads: 4, tps: 95.88, reads: 1336.31, writes: 379.52, response time: 59.99ms (95%), errors: 0.00, reconnects:  0.00
[  16s] threads: 4, tps: 122.16, reads: 1716.26, writes: 495.65, response time: 46.88ms (95%), errors: 0.00, reconnects:  0.00
[  17s] threads: 4, tps: 136.99, reads: 1917.90, writes: 543.97, response time: 42.31ms (95%), errors: 0.00, reconnects:  0.00
[  18s] threads: 4, tps: 125.01, reads: 1734.07, writes: 499.02, response time: 45.23ms (95%), errors: 0.00, reconnects:  0.00
[  19s] threads: 4, tps: 135.00, reads: 1912.97, writes: 541.99, response time: 44.89ms (95%), errors: 0.00, reconnects:  0.00
[  20s] threads: 4, tps: 127.00, reads: 1773.07, writes: 512.02, response time: 48.10ms (95%), errors: 0.00, reconnects:  0.00
OLTP test statistics:
    queries performed:
        read:                            35784
        write:                           10224
        other:                           5112
        total:                           51120
    transactions:                        2556   (127.71 per sec.)
    read/write requests:                 46008  (2298.83 per sec.)
    other operations:                    5112   (255.43 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          20.0137s
    total number of events:              2556
    total time taken by event execution: 79.9988s
    response time:
         min:                                 16.54ms
         avg:                                 31.30ms
         max:                                687.98ms
         approx.  95 percentile:              43.43ms

Threads fairness:
    events (avg/stddev):           639.0000/5.48
    execution time (avg/stddev):   19.9997/0.01

The switchovers are at seconds ~5 and ~15. Each complete switchover took around 0.58s, of which ~0.51s was spent running mysqlrpladmin and ~0.07s reconfiguring proxysql.

Conclusion


In this tutorial we described how it is possible to use ProxySQL to perform a master switchover and reconfiguration of all the slaves in less than one second.
Using ProxySQL to perform a master switchover provides extremely high availability without returning any error to the application. That is: the application won't be aware that a switchover happened, as it doesn't need to be reconfigured, nor will it get any error. All the application will notice is a subsecond latency spike while the switchover is running.


Last, a friendly reminder that I will be speaking at PerconaLive Amsterdam on 23rd September about ProxySQL: please come to my session and ask all the questions you have!

Monday, September 14, 2015

ProxySQL Tutorial : setup in a MySQL replication topology

With the imminent release of ProxySQL as a stable version, and its presentation at PerconaLive Amsterdam, I decided to write a few blog posts describing how to install, configure and use it without digging too much through the documentation.
Moreover, practical examples are often easier to understand.

This tutorial is a long one, although none of the topics are discussed in detail, to avoid making it even longer (or boring).

For this tutorial, we will use a small setup with 4 nodes:
- node1 (192.168.124.101) , mysql master
- node2 (192.168.124.102) , mysql slave
- node3 (192.168.124.103) , mysql slave
- app (192.168.124.104) , application server where proxysql will also run


Replication setup


Without going into details on how to configure mysqld on the nodes (that is outside the scope of this tutorial), the nodes already have mysqld up and running, configured to support GTID, but replication is not running yet. We will configure replication from "app" during this tutorial.

Create root user, replication user, setup replication, and create application user

First, let's create the root user to connect from app:
vagrant@app:~$ for i in 1 2 3 ; do
ssh node$i -- 'mysql -uroot -proot -e "GRANT ALL PRIVILEGES ON *.* TO root@app IDENTIFIED BY \"root\" WITH GRANT OPTION" -NB'
done
Warning: Using a password on the command line interface can be insecure.
Warning: Using a password on the command line interface can be insecure.
Warning: Using a password on the command line interface can be insecure.

These warnings are annoying, let's remove them and try our new user:
vagrant@app:~$ echo -e "[client]\npassword=root" > .my.cnf
vagrant@app:~$ for i in 1 2 3 ; do mysql -u root -h node$i -e "SELECT CURRENT_USER()" -NB ; done
root@app
root@app
root@app

Create replication user:

vagrant@app:~$ mysql -u root -h node1 -e "GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.124.%' IDENTIFIED BY 'replpass'"

Setup and start replication:

vagrant@app:~$ for i in 2 3 ; do mysql -u root -h node$i -e "CHANGE MASTER TO MASTER_HOST='192.168.124.101' , MASTER_USER='repl', MASTER_PASSWORD='replpass', MASTER_AUTO_POSITION=1; START SLAVE;" ; done 

At this stage, we assume replication is up and running (please double check this in your setup).
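That double check can itself be scripted. Below is a minimal sketch of mine that only inspects the IO and SQL thread flags in the SHOW SLAVE STATUS output; a production check would also look at errors and lag:

```shell
# Succeed only if both replication threads report "Yes".
replication_ok() {
  # $1 = output of SHOW SLAVE STATUS\G for one slave
  printf '%s\n' "$1" | grep -q 'Slave_IO_Running: Yes' && \
  printf '%s\n' "$1" | grep -q 'Slave_SQL_Running: Yes'
}

# Usage against the slaves configured above:
# for i in 2 3 ; do
#   STATUS=$(mysql -u root -h node$i -e "SHOW SLAVE STATUS\G")
#   replication_ok "$STATUS" && echo "node$i: replication OK" || echo "node$i: NOT replicating"
# done
```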
The application needs a user to connect to the database. We will create a simple user with credentials "utest"/"ptest":

vagrant@app:~$ mysql -u root -h node1 -e "GRANT ALL PRIVILEGES ON test.* TO utest@app IDENTIFIED BY \"ptest\""
vagrant@app:~$ mysql -u root -h node1 -e "CREATE DATABASE IF NOT EXISTS test"

Now that the user is created, we can run some tests to verify that we can connect everywhere and that replication is working, using sysbench to generate data on the master and to run a read-only workload on a slave:

vagrant@app:~$ sysbench --test=/home/vagrant/sysbench-0.5/sysbench/tests/db/oltp.lua --mysql-user=utest --mysql-password=ptest --mysql-host=node1 --mysql-db=test prepare
sysbench 0.5:  multi-threaded system evaluation benchmark

Creating table 'sbtest1'...
Inserting 10000 records into 'sbtest1'

vagrant@app:~$ sysbench --max-requests=0 --max-time=10 --test=/home/vagrant/sysbench-0.5/sysbench/tests/db/oltp.lua --mysql-user=utest --mysql-password=ptest --mysql-host=node2 --mysql-db=test --oltp-read-only=on run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored


Threads started!

OLTP test statistics:
    queries performed:
        read:                            22638
        write:                           0
        other:                           3234
        total:                           25872
    transactions:                        1617   (161.63 per sec.)
    read/write requests:                 22638  (2262.88 per sec.)
    other operations:                    3234   (323.27 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          10.0041s
    total number of events:              1617
    total time taken by event execution: 9.9958s
    response time:
         min:                                  4.96ms
         avg:                                  6.18ms
         max:                                212.22ms
         approx.  95 percentile:               6.66ms

Threads fairness:
    events (avg/stddev):           1617.0000/0.00
    execution time (avg/stddev):   9.9958/0.00



Download and install ProxySQL


Now that we know that replication is up and running correctly and users have been created, it is time to download and install ProxySQL on the app server.
Precompiled binaries and packages of ProxySQL are available at https://github.com/sysown/proxysql-binaries , for CentOS 7, Ubuntu 12 and Ubuntu 14.

This is an Ubuntu 14 system, so once we download the package we can install it with dpkg:

vagrant@app:~$ wget -q https://github.com/sysown/proxysql-binaries/raw/master/binaries/Ubuntu14/proxysql_0.2.0902-ubuntu14_amd64.deb

vagrant@app:~$ sudo dpkg -i proxysql_0.2.0902-ubuntu14_amd64.deb
Selecting previously unselected package proxysql.
(Reading database ... 61067 files and directories currently installed.)
Preparing to unpack proxysql_0.2.0902-ubuntu14_amd64.deb ...
Unpacking proxysql (0.2.0902) ...
Setting up proxysql (0.2.0902) ...
update-rc.d: warning: /etc/init.d/proxysql missing LSB information
update-rc.d: see 
 Adding system startup for /etc/init.d/proxysql ...
   /etc/rc0.d/K20proxysql -> ../init.d/proxysql
   /etc/rc1.d/K20proxysql -> ../init.d/proxysql
   /etc/rc6.d/K20proxysql -> ../init.d/proxysql
   /etc/rc2.d/S20proxysql -> ../init.d/proxysql
   /etc/rc3.d/S20proxysql -> ../init.d/proxysql
   /etc/rc4.d/S20proxysql -> ../init.d/proxysql
   /etc/rc5.d/S20proxysql -> ../init.d/proxysql
Processing triggers for ureadahead (0.100.0-16) ...

vagrant@app:~$ 


ProxySQL installs a basic config file in /etc/proxysql.cnf , and uses /var/run/proxysql as its datadir (working directory).
However, /etc/proxysql.cnf is not configured with any mysql servers to be used as backends, nor with any mysql users to connect to such backends. Therefore we either edit the config file before starting proxysql, or we configure users and servers directly from within proxysql.
The config file does have one user configured, though: the user required to connect to the admin interface (details later).
Note that proxysql doesn't parse its config file after the first time it was executed (more details in https://github.com/sysown/proxysql-0.2/blob/master/doc/configuration_system.md), therefore the preferred way to configure proxysql is not through its config file but through its admin interface (more details in https://github.com/sysown/proxysql-0.2/blob/master/doc/configuration_system.md and https://github.com/sysown/proxysql-0.2/blob/master/doc/admin_tables.md) .


Start ProxySQL

We can start proxysql using service:

vagrant@app:~$ sudo service proxysql start
Starting ProxySQL: DONE!

Checking the content of proxysql's datadir, we can identify a database file, a log file and a pid file:

vagrant@app:~$ ls -l /var/run/proxysql/
total 36
-rw------- 1 root root 25600 Sep 12 22:01 proxysql.db
-rw------- 1 root root   892 Sep 12 22:01 proxysql.log
-rw-r--r-- 1 root root     5 Sep 12 22:01 proxysql.pid

Note that the database file stores all of proxysql's configuration. If this file is present, the config file is not parsed, because the database file is the preferred source of configuration.
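The precedence rule just described can be expressed as a tiny check. This is my sketch of the rule, not ProxySQL's actual startup code, and it assumes the default datadir configured above:

```shell
# Mimic ProxySQL's startup decision: the config file is parsed only
# when no database file exists yet in the datadir.
config_will_be_parsed() {
  # $1 = proxysql datadir
  [ ! -f "$1/proxysql.db" ]
}

# config_will_be_parsed /var/run/proxysql \
#   && echo "/etc/proxysql.cnf will be read" \
#   || echo "configuration will be loaded from proxysql.db"
```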

From the list of processes running we can see two proxysql processes:
vagrant@app:~$ ps aux | grep proxysql | grep -v grep
root      3589  0.0  0.6  35700  5036 ?        S    22:01   0:00 proxysql -c /etc/proxysql.cnf -D /var/run/proxysql
root      3590  0.0  1.3  85148 10112 ?        Sl   22:01   0:00 proxysql -c /etc/proxysql.cnf -D /var/run/proxysql

The reason behind this is that by default proxysql forks at startup, and the parent process is nothing more than an angel process that restarts proxysql if it crashes.

 

Connect to ProxySQL Admin Interface

As said previously, proxysql.cnf is read only the very first time proxysql is started.
When proxysql.cnf was read for the first time, the ProxySQL admin interface was configured as:
admin_variables=
{
        admin_credentials="admin:admin"
        mysql_ifaces="127.0.0.1:6032;/tmp/proxysql_admin.sock"
}

Therefore, to connect to the ProxySQL admin interface we should use that information.
Let's connect to proxysql and run a few commands:

vagrant@app:~$ mysql -u admin -padmin -h 127.0.0.1 -P6032
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 0
Server version: 5.1.30 (ProxySQL Admin Module)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
mysql> SHOW DATABASES;
+-----+---------+-------------------------------+
| seq | name    | file                          |
+-----+---------+-------------------------------+
| 0   | main    |                               |
| 2   | disk    | /var/run/proxysql/proxysql.db |
| 3   | stats   |                               |
| 4   | monitor |                               |
+-----+---------+-------------------------------+
4 rows in set (0.00 sec)

mysql> SHOW TABLES;
+-------------------+
| tables            |
+-------------------+
| mysql_servers     |
| mysql_users       |
| mysql_query_rules |
| global_variables  |
| mysql_collations  |
+-------------------+
5 rows in set (0.01 sec)

The "main" database represents the in-memory configuration, while "disk" represents the on-disk configuration. More details here: https://github.com/sysown/proxysql-0.2/blob/master/doc/configuration_system.md
The "mysql_*" tables are described at https://github.com/sysown/proxysql-0.2/blob/master/doc/admin_tables.md , and at this stage we can verify that they are empty:

mysql> SELECT * FROM mysql_users;
Empty set (0.00 sec)

mysql> SELECT * FROM mysql_servers;
Empty set (0.01 sec)

mysql> SHOW CREATE TABLE mysql_users\G
*************************** 1. row ***************************
       table: mysql_users
Create Table: CREATE TABLE mysql_users (
username VARCHAR NOT NULL,
password VARCHAR,
active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
use_ssl INT CHECK (use_ssl IN (0,1)) NOT NULL DEFAULT 0,
default_hostgroup INT NOT NULL DEFAULT 0,
default_schema VARCHAR,
schema_locked INT CHECK (schema_locked IN (0,1)) NOT NULL DEFAULT 0,
transaction_persistent INT CHECK (transaction_persistent IN (0,1)) NOT NULL DEFAULT 0,
fast_forward INT CHECK (fast_forward IN (0,1)) NOT NULL DEFAULT 0,
backend INT CHECK (backend IN (0,1)) NOT NULL DEFAULT 1,
frontend INT CHECK (frontend IN (0,1)) NOT NULL DEFAULT 1,
max_connections INT CHECK (max_connections >=0) NOT NULL DEFAULT 10000,
PRIMARY KEY (username, backend),
UNIQUE (username, frontend))
1 row in set (0.01 sec)

mysql> SHOW CREATE TABLE mysql_servers\G
*************************** 1. row ***************************
       table: mysql_servers
Create Table: CREATE TABLE mysql_servers (
hostgroup_id INT NOT NULL DEFAULT 0,
hostname VARCHAR NOT NULL,
port INT NOT NULL DEFAULT 3306,
status VARCHAR CHECK (UPPER(status) IN ('ONLINE','SHUNNED','OFFLINE_SOFT', 'OFFLINE_HARD')) NOT NULL DEFAULT 'ONLINE',
weight INT CHECK (weight >= 0) NOT NULL DEFAULT 1,
compression INT CHECK (compression >=0 AND compression <= 102400) NOT NULL DEFAULT 0,
max_connections INT CHECK (max_connections >=0) NOT NULL DEFAULT 1000,
max_replication_lag INT CHECK (max_replication_lag >= 0 AND max_replication_lag <= 126144000) NOT NULL DEFAULT 0,
PRIMARY KEY (hostgroup_id, hostname, port) )
1 row in set (0.00 sec)

Configure backends

We can now start configuring the backend.
As described in previous articles, ProxySQL uses the concept of hostgroup: a hostgroup is a group of hosts that share a logical role.
For example, you can have the production master in hostgroup0, all the production slaves in hostgroup1, reporting slaves in hostgroup2, DR slaves in hostgroup3, and so on.
For this tutorial, we will use just 2 hostgroups:
  • hostgroup0 for the master
  • hostgroup1 for the slaves
Configuring backends in ProxySQL is as easy as inserting records into mysql_servers representing such backends, specifying the correct hostgroup_id based on their roles:

mysql> INSERT INTO mysql_servers(hostgroup_id, hostname, port) VALUES (0,'192.168.124.101',3306);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO mysql_servers(hostgroup_id, hostname, port) VALUES (1,'192.168.124.102',3306);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO mysql_servers(hostgroup_id, hostname, port) VALUES (1,'192.168.124.103',3306);
Query OK, 1 row affected (0.01 sec)

mysql> SELECT * FROM mysql_servers;
+--------------+-----------------+------+--------+--------+-------------+-----------------+---------------------+
| hostgroup_id | hostname        | port | status | weight | compression | max_connections | max_replication_lag |
+--------------+-----------------+------+--------+--------+-------------+-----------------+---------------------+
| 0            | 192.168.124.101 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   |
| 1            | 192.168.124.102 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   |
| 1            | 192.168.124.103 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   |
+--------------+-----------------+------+--------+--------+-------------+-----------------+---------------------+
3 rows in set (0.00 sec)

Note that at this stage the backends are not yet active, they are only configured. To load this configuration at runtime we need to issue a LOAD command:

mysql> LOAD MYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.01 sec)

These configuration changes won't persist after proxysql is shut down, because they are all in-memory. To persist them we need to issue the correct SAVE command, which saves the changes to the on-disk configuration database:

mysql> SELECT * FROM disk.mysql_servers;
Empty set (0.00 sec)

mysql> SAVE MYSQL SERVERS TO DISK;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM disk.mysql_servers;
+--------------+-----------------+------+--------+--------+-------------+-----------------+---------------------+
| hostgroup_id | hostname        | port | status | weight | compression | max_connections | max_replication_lag |
+--------------+-----------------+------+--------+--------+-------------+-----------------+---------------------+
| 0            | 192.168.124.101 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   |
| 1            | 192.168.124.102 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   |
| 1            | 192.168.124.103 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   |
+--------------+-----------------+------+--------+--------+-------------+-----------------+---------------------+
3 rows in set (0.00 sec)


Configure users

This is very similar to how we configured backends:

mysql> INSERT INTO mysql_users (username,password) VALUES ('utest','ptest');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM mysql_users\G
*************************** 1. row ***************************
              username: utest
              password: ptest
                active: 1
               use_ssl: 0
     default_hostgroup: 0
        default_schema: NULL
         schema_locked: 0
transaction_persistent: 0
          fast_forward: 0
               backend: 1
              frontend: 1
       max_connections: 10000
1 row in set (0.00 sec) 
 
mysql> LOAD MYSQL USERS TO RUNTIME;
Query OK, 0 rows affected (0.00 sec) 
 
mysql> SELECT * FROM disk.mysql_users;
Empty set (0.00 sec)

mysql> SAVE MYSQL USERS TO DISK;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM disk.mysql_users\G
*************************** 1. row ***************************
              username: utest
              password: ptest
                active: 1
               use_ssl: 0
     default_hostgroup: 0
        default_schema: NULL
         schema_locked: 0
transaction_persistent: 0
          fast_forward: 0
               backend: 1
              frontend: 1
       max_connections: 10000
1 row in set (0.00 sec)
mysql> exit
Bye



Testing connectivity via proxysql


To test if the application is able to send queries to the backend through proxysql we can run any query against proxysql. For example:

vagrant@app:~$ mysql -u utest -pptest -h 127.0.0.1 -P6033 -e "SELECT @@hostname"
Warning: Using a password on the command line interface can be insecure.
+------------+
| @@hostname |
+------------+
| node1      |
+------------+

In this example, the SELECT statement is sent to node1, and it will always be sent to node1: there is no query rule that defines routing for this query (the mysql_query_rules table is still empty), so the default_hostgroup for user "utest" applies (0, according to mysql_users), and the query is sent to a host with hostgroup_id=0 ("node1", according to mysql_servers).

We can run a bit more complex test using sysbench pointing to proxysql (host=127.0.0.1, port=6033) :

vagrant@app:~$ sysbench --num-threads=4 --max-requests=0 --max-time=10 \
> --test=/home/vagrant/sysbench-0.5/sysbench/tests/db/oltp.lua \
> --mysql-user=utest --mysql-password=ptest \
> --mysql-host=127.0.0.1 --mysql-port=6033 \
> --mysql-db=test \
> --oltp-read-only=off \
> run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4
Random number generator seed is 0 and will be ignored


Threads started!

OLTP test statistics:
    queries performed:
        read:                            11592
        write:                           3312
        other:                           1656
        total:                           16560
    transactions:                        828    (82.15 per sec.)
    read/write requests:                 14904  (1478.62 per sec.)
    other operations:                    1656   (164.29 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          10.0797s
    total number of events:              828
    total time taken by event execution: 40.2186s
    response time:
         min:                                 17.03ms
         avg:                                 48.57ms
         max:                                307.84ms
         approx.  95 percentile:              96.42ms

Threads fairness:
    events (avg/stddev):           207.0000/3.08
    execution time (avg/stddev):   10.0546/0.02

vagrant@app:~$


Read write split

As already said in other posts, ProxySQL doesn't implement any dedicated read/write algorithm, but treats read/write split as a subset of query routing. Therefore, to implement read/write split, we need to correctly configure query routing.
A very basic read/write split can be implemented by sending to the slaves all SELECTs without FOR UPDATE, and everything else to the master. We can configure this through the admin interface:



vagrant@app:~$ mysql -u admin -padmin -h 127.0.0.1 -P6032
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 19
Server version: 5.1.30 (ProxySQL Admin Module)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SELECT * FROM mysql_query_rules\G
Empty set (0.00 sec)

mysql> SHOW CREATE TABLE mysql_query_rules\G
*************************** 1. row ***************************
       table: mysql_query_rules
Create Table: CREATE TABLE mysql_query_rules (
rule_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 0,
username VARCHAR,
schemaname VARCHAR,
flagIN INT NOT NULL DEFAULT 0,
match_pattern VARCHAR,
negate_match_pattern INT CHECK (negate_match_pattern IN (0,1)) NOT NULL DEFAULT 0,
flagOUT INT,
replace_pattern VARCHAR,
destination_hostgroup INT DEFAULT NULL,
cache_ttl INT CHECK(cache_ttl > 0),
reconnect INT CHECK (reconnect IN (0,1)) DEFAULT NULL,
timeout INT UNSIGNED,
delay INT UNSIGNED,
apply INT CHECK(apply IN (0,1)) NOT NULL DEFAULT 0)
1 row in set (0.00 sec)

mysql> INSERT INTO mysql_query_rules(active,match_pattern,destination_hostgroup,apply) VALUES(1,'^SELECT.*FOR UPDATE$',0,1);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO mysql_query_rules(active,match_pattern,destination_hostgroup,apply) VALUES(1,'^SELECT',1,1);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM mysql_query_rules\G
*************************** 1. row ***************************
              rule_id: 1
               active: 1
             username: NULL
           schemaname: NULL
               flagIN: 0
        match_pattern: ^SELECT.*FOR UPDATE$
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: 0
            cache_ttl: NULL
            reconnect: NULL
              timeout: NULL
                delay: NULL
                apply: 1
*************************** 2. row ***************************
              rule_id: 2
               active: 1
             username: NULL
           schemaname: NULL
               flagIN: 0
        match_pattern: ^SELECT
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: 1
            cache_ttl: NULL
            reconnect: NULL
              timeout: NULL
                delay: NULL
                apply: 1
2 rows in set (0.00 sec)


mysql> SELECT rule_id, match_pattern,destination_hostgroup hg_id, apply FROM mysql_query_rules WHERE active=1;
+---------+----------------------+-------+-------+
| rule_id | match_pattern        | hg_id | apply |
+---------+----------------------+-------+-------+
| 1       | ^SELECT.*FOR UPDATE$ | 0     | 1     |
| 2       | ^SELECT              | 1     | 1     |
+---------+----------------------+-------+-------+
2 rows in set (0.00 sec)

mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

mysql> SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.00 sec)

mysql> quit
Bye

We created rules for SELECTs and SELECT ... FOR UPDATE: what about all the other queries? When the Query Processor scans the query rules and reaches the end without finding a match, it applies the default_hostgroup of the specific user, according to its mysql_users entry.
In our case, user "utest" has default_hostgroup=0, therefore any query not matching the above rules will be sent to hostgroup 0.

Also note that by default new servers are configured as ONLINE and new users are configured as active. However, for query rules the default is active=0. This is intentional.
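The matching logic just described can be modeled in a few lines of Python (a simplified sketch, not ProxySQL's actual implementation): rules are scanned in rule_id order, the first matching active rule with apply=1 decides the hostgroup, and if nothing matches the user's default_hostgroup is used.

```python
import re

# Simplified model of the two rules configured above.
rules = [
    {"rule_id": 1, "active": 1, "match_pattern": r"^SELECT.*FOR UPDATE$",
     "destination_hostgroup": 0, "apply": 1},
    {"rule_id": 2, "active": 1, "match_pattern": r"^SELECT",
     "destination_hostgroup": 1, "apply": 1},
]

def route(query, default_hostgroup=0):
    """Return the hostgroup a query would be routed to."""
    for rule in sorted(rules, key=lambda r: r["rule_id"]):
        if rule["active"] and re.search(rule["match_pattern"], query):
            if rule["apply"]:
                return rule["destination_hostgroup"]
    return default_hostgroup   # no match: fall back to the user's default

print(route("SELECT c FROM sbtest1 WHERE id=5"))        # → 1 (slaves)
print(route("SELECT * FROM t WHERE id=1 FOR UPDATE"))   # → 0 (master)
print(route("UPDATE sbtest1 SET k=k+1 WHERE id=3"))     # → 0 (default_hostgroup)
```

The sketch glosses over flagIN/flagOUT chaining and replace_pattern, but it captures why every non-SELECT ends up on hostgroup 0 here.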

Now that we have configured ProxySQL with query routing, let's try it by running new queries against proxysql. SELECT ... FOR UPDATE will be sent to the master, while all other SELECTs will be sent to one of the slaves, chosen randomly.

vagrant@app:~$ mysql -u utest -pptest -h 127.0.0.1 -P6033 -e "SELECT @@hostname"
Warning: Using a password on the command line interface can be insecure.
+------------+
| @@hostname |
+------------+
| node2      |
+------------+
vagrant@app:~$ mysql -u utest -pptest -h 127.0.0.1 -P6033 -e "SELECT @@hostname"
Warning: Using a password on the command line interface can be insecure.
+------------+
| @@hostname |
+------------+
| node3      |
+------------+
vagrant@app:~$ mysql -u utest -pptest -h 127.0.0.1 -P6033 -e "SELECT @@hostname FOR UPDATE"
Warning: Using a password on the command line interface can be insecure.
+------------+
| @@hostname |
+------------+
| node1      |
+------------+

And now sysbench:

vagrant@app:~$ sysbench --num-threads=4 --max-requests=0 --max-time=10 \
> --test=/home/vagrant/sysbench-0.5/sysbench/tests/db/oltp.lua \
> --mysql-user=utest --mysql-password=ptest \
> --mysql-host=127.0.0.1 --mysql-port=6033 \
> --mysql-db=test \
> --oltp-read-only=off \
> run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4
Random number generator seed is 0 and will be ignored


Threads started!

OLTP test statistics:
    queries performed:
        read:                            21112
        write:                           6032
        other:                           3016
        total:                           30160
    transactions:                        1508   (150.51 per sec.)
    read/write requests:                 27144  (2709.16 per sec.)
    other operations:                    3016   (301.02 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          10.0193s
    total number of events:              1508
    total time taken by event execution: 40.0503s
    response time:
         min:                                 15.87ms
         avg:                                 26.56ms
         max:                                215.16ms
         approx.  95 percentile:              38.74ms

Threads fairness:
    events (avg/stddev):           377.0000/1.87
    execution time (avg/stddev):   10.0126/0.00

vagrant@app:~$

sysbench ran successfully, but from this report we cannot tell whether read/write split worked.
No problem, proxysql can tell us!

vagrant@app:~$ mysql -u admin -padmin -h 127.0.0.1 -P6032
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 32
Server version: 5.1.30 (ProxySQL Admin Module)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SHOW TABLES FROM stats;
+--------------------------------+
| tables                         |
+--------------------------------+
| stats_mysql_query_rules        |
| stats_mysql_commands_counters  |
| stats_mysql_processlist        |
| stats_mysql_connection_pool    |
| stats_mysql_query_digest       |
| stats_mysql_query_digest_reset |
| stats_mysql_global             |
+--------------------------------+
7 rows in set (0.00 sec)


mysql> SELECT * FROM stats_mysql_connection_pool;
+-----------+-----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+
| hostgroup | srv_host        | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv |
+-----------+-----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+
| 0         | 192.168.124.101 | 3306     | ONLINE | 0        | 4        | 4      | 0       | 32884   | 1991796         | 47602338        |
| 1         | 192.168.124.102 | 3306     | ONLINE | 0        | 4        | 4      | 0       | 9499    | 410383          | 25702366        |
| 1         | 192.168.124.103 | 3306     | ONLINE | 0        | 4        | 4      | 0       | 9457    | 410141          | 26360288        |
+-----------+-----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+
3 rows in set (0.00 sec)

mysql> SELECT hostgroup, SUM(Queries), SUM(Bytes_data_sent), SUM(Bytes_data_recv) FROM stats_mysql_connection_pool GROUP BY hostgroup\G
*************************** 1. row ***************************
           hostgroup: 0
        SUM(Queries): 32884
SUM(Bytes_data_sent): 1991796
SUM(Bytes_data_recv): 47602338
*************************** 2. row ***************************
           hostgroup: 1
        SUM(Queries): 18956
SUM(Bytes_data_sent): 820524
SUM(Bytes_data_recv): 52062654
2 rows in set (0.01 sec)

From stats_mysql_connection_pool it seems queries are distributed across hosts and hostgroups. Yet, this is not enough to say that read/write split is working.
Let's try a read-only workload, and see what changes.

vagrant@app:~$ sysbench --num-threads=4 --max-requests=0 --max-time=10 --test=/home/vagrant/sysbench-0.5/sysbench/tests/db/oltp.lua --mysql-user=utest --mysql-password=ptest --mysql-host=127.0.0.1 --mysql-port=6033 --mysql-db=test --oltp-read-only=on --oltp-skip-trx=on run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4
Random number generator seed is 0 and will be ignored


Threads started!

OLTP test statistics:
    queries performed:
        read:                            32984
        write:                           0
        other:                           0
        total:                           32984
    transactions:                        0      (0.00 per sec.)
    read/write requests:                 32984  (3293.70 per sec.)
    other operations:                    0      (0.00 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          10.0143s
    total number of events:              2356
    total time taken by event execution: 40.0156s
    response time:
         min:                                  7.62ms
         avg:                                 16.98ms
         max:                                 73.30ms
         approx.  95 percentile:              33.39ms

Threads fairness:
    events (avg/stddev):           589.0000/146.87
    execution time (avg/stddev):   10.0039/0.01


vagrant@app:~$ mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT hostgroup, SUM(Queries), SUM(Bytes_data_sent), SUM(Bytes_data_recv) FROM stats_mysql_connection_pool GROUP BY hostgroup\G"
Warning: Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
           hostgroup: 0
        SUM(Queries): 32884
SUM(Bytes_data_sent): 1991796
SUM(Bytes_data_recv): 47602338
*************************** 2. row ***************************
           hostgroup: 1
        SUM(Queries): 51940
SUM(Bytes_data_sent): 2248260
SUM(Bytes_data_recv): 142653210

It all seems correct. The number of queries sent to hostgroup=0 didn't change, and all the new queries were sent to hostgroup=1.

More granular read/write split

We previously said that read/write split is a subset of query routing.
In fact, we can decide the destination of each group of queries. Maybe you don't want to send all SELECTs to the slaves, but only some specific ones. ProxySQL allows this granularity.

If you don't know what queries are generated by your application, you can check which queries are passing through proxysql using the stats_mysql_query_digest table. For example:

mysql> SELECT * FROM stats_mysql_query_digest;
+------------+----------+--------------------+----------------------------------------------------------------------+------------+------------+------------+----------+----------+----------+
| schemaname | username | digest             | digest_text                                                          | count_star | first_seen | last_seen  | sum_time | min_time | max_time |
+------------+----------+--------------------+----------------------------------------------------------------------+------------+------------+------------+----------+----------+----------+
| test       | utest    | 0x4760CBDEFAD1519E | BEGIN                                                                | 2592       | 1442148405 | 1442148446 | 1997120  | 0        | 49046    |
| test       | utest    | 0x10634DACE52A0A02 | INSERT INTO sbtest1 (id, k, c, pad) VALUES (?, ?, ?, ?)              | 2592       | 1442148405 | 1442148446 | 2517386  | 377      | 35926    |
| test       | utest    | 0xE75DB8313E268CF3 | SELECT SUM(K) FROM sbtest1 WHERE id BETWEEN ? AND ?+?                | 4948       | 1442148405 | 1442149117 | 5694972  | 365      | 35860    |
| test       | utest    | 0xCCB481C7C198E52B | UPDATE sbtest1 SET k=k+? WHERE id=?                                  | 2592       | 1442148405 | 1442148446 | 4131398  | 531      | 19211    |
| test       | utest    | 0x55319B9EE365BEB5 | DELETE FROM sbtest1 WHERE id=?                                       | 2592       | 1442148405 | 1442148446 | 2694171  | 409      | 14392    |
| test       | utest    | 0xADF3DDF2877EEAAF | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ?+? ORDER BY c          | 4948       | 1442148405 | 1442149117 | 7315147  | 456      | 32245    |
| test       | utest    | 0x704822A0F7D3CD60 | SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN ? AND ?+? ORDER BY c | 4948       | 1442148405 | 1442149117 | 10310160 | 677      | 40372    |
| test       | utest    | 0x13781C1DBF001A0C | SELECT c FROM sbtest1 WHERE id=?                                     | 49480      | 1442148405 | 1442149117 | 40873691 | 321      | 33556    |
| test       | utest    | 0x7DD56217AF7A5197 | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ?+?                     | 4948       | 1442148405 | 1442149117 | 5989877  | 365      | 54172    |
| test       | utest    | 0x5A23CA36FB239BC9 | UPDATE sbtest1 SET c=? WHERE id=?                                    | 2592       | 1442148405 | 1442148446 | 3006402  | 519      | 21786    |
| test       | utest    | 0x5DBEB0DD695FBF25 | COMMIT                                                               | 2592       | 1442148405 | 1442148446 | 8335739  | 1209     | 55347    |
+------------+----------+--------------------+----------------------------------------------------------------------+------------+------------+------------+----------+----------+----------+
11 rows in set (0.00 sec)
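The digest_text column shows queries with their literals collapsed into ? placeholders. A rough approximation of this fingerprinting (a hypothetical sketch, much simpler than ProxySQL's real digest algorithm):

```python
import re

def fingerprint(query):
    """Collapse string and numeric literals into '?' placeholders."""
    q = re.sub(r"'(?:[^'\\]|\\.)*'", "?", query)   # quoted strings first
    q = re.sub(r"\b\d+\b", "?", q)                 # then standalone integers
    return q

print(fingerprint("SELECT c FROM sbtest1 WHERE id=4242"))
# → SELECT c FROM sbtest1 WHERE id=?
```

Note that the \b word boundaries leave identifiers like sbtest1 intact: only literals standing on their own are replaced.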

We can run any sort of clause against this table for a more interesting result. For example, let's assume we want to identify the SELECT statements that take the most time:

mysql> SELECT count_star, sum_time, digest_text FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT%' ORDER BY sum_time DESC;
+------------+----------+----------------------------------------------------------------------+
| count_star | sum_time | digest_text                                                          |
+------------+----------+----------------------------------------------------------------------+
| 49480      | 40873691 | SELECT c FROM sbtest1 WHERE id=?                                     |
| 4948       | 10310160 | SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN ? AND ?+? ORDER BY c |
| 4948       | 7315147  | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ?+? ORDER BY c          |
| 4948       | 5989877  | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ?+?                     |
| 4948       | 5694972  | SELECT SUM(K) FROM sbtest1 WHERE id BETWEEN ? AND ?+?                |
+------------+----------+----------------------------------------------------------------------+
5 rows in set (0.00 sec)

The first type of SELECT is clearly the one taking the most time, ~40.9s in total (times in stats_mysql_query_digest are in microseconds).
Once we have identified these, we can decide to disable our generic read/write split and send to the slaves only the queries that take the most time. This is just an example of how proxysql allows fine-tuning of query routing, and of how read/write split is just a subcase of query routing.
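Since the counters are in microseconds, deriving the totals and the average latency per execution is simple arithmetic; for the heaviest digest above:

```python
count_star = 49480
sum_time_us = 40873691            # from stats_mysql_query_digest, in microseconds

total_s = sum_time_us / 1_000_000   # total time spent in this query shape
avg_us = sum_time_us / count_star   # average latency per execution

print(round(total_s, 1))   # → 40.9  (seconds)
print(round(avg_us))       # → 826   (microseconds per query)
```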

vagrant@app:~$ mysql -u admin -padmin -h 127.0.0.1 -P6032
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 74
Server version: 5.1.30 (ProxySQL Admin Module)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> DELETE FROM mysql_query_rules;
Query OK, 2 rows affected (0.01 sec)

mysql> INSERT INTO mysql_query_rules(active,match_pattern,destination_hostgroup,apply) VALUES(1,'^SELECT c FROM sbtest1 WHERE id=\d+$',1,1);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO mysql_query_rules(active,match_pattern,destination_hostgroup,apply) VALUES(1,'^SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN \d+ AND \d+\+\d+ ORDER BY c$',1,1);
Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

mysql> quit
Bye

Once again, note that while you change the content of the mysql_query_rules table, the rules aren't effective until you run LOAD MYSQL QUERY RULES TO RUNTIME. This gives you not only the time to review them before loading them at runtime, but also the ability to create complex chains of rules. While you are changing the content of mysql_query_rules, the rules at runtime aren't changed.
As soon as LOAD MYSQL QUERY RULES TO RUNTIME is executed, the current rules are wiped and replaced with the new ones, which immediately take effect.

How can we ensure that our new rules are matching queries, and routing them as we expect?
Did we write the regular expressions correctly? (Regexes can be hard to write and debug.)
The stats_mysql_query_rules table provides a counter of how many times each rule matched a query.
For example:

mysql> SELECT rule_id, hits, destination_hostgroup hg, match_pattern pattern FROM mysql_query_rules NATURAL JOIN stats_mysql_query_rules\G
*************************** 1. row ***************************
rule_id: 6
   hits: 19340
     hg: 1
pattern: ^SELECT c FROM sbtest1 WHERE id=\d+$
*************************** 2. row ***************************
rule_id: 7
   hits: 1934
     hg: 1
pattern: ^SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN \d+ AND \d+\+\d+ ORDER BY c$
2 rows in set (0.00 sec)

We have hits, which means queries are matching the rules and being routed accordingly.
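Before loading new rules at runtime, it can also be worth verifying the regular expressions offline against sample queries. A quick Python check for the two rules above (the patterns are exactly the ones inserted into mysql_query_rules; Python's re is close enough for a sanity check, though ProxySQL's regex engine may differ in edge cases):

```python
import re

patterns = [
    r"^SELECT c FROM sbtest1 WHERE id=\d+$",
    r"^SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN \d+ AND \d+\+\d+ ORDER BY c$",
]

samples = [
    "SELECT c FROM sbtest1 WHERE id=5004",
    "SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 5 AND 5+99 ORDER BY c",
    "SELECT c FROM sbtest1 WHERE id BETWEEN 5 AND 5+99",  # should match neither rule
]

for q in samples:
    matched = any(re.match(p, q) for p in patterns)
    print(matched)
# → True, True, False
```

The third sample falls through both rules, so at runtime it would be routed to the user's default_hostgroup.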



Backends monitoring

To route MySQL traffic between clients and backends, ProxySQL doesn't need to proactively monitor the health of the backends. When a connection to a backend fails, either during the connect phase or during the execution of a query, ProxySQL reacts to the event by trying to establish a new connection (possibly to a different node) and eventually retrying the query. This is a quite complex topic that deserves a blog post of its own, but the main point here is that ProxySQL doesn't need proactive monitoring of the backends to determine their availability and health.
Nonetheless, there is a Monitor module that monitors the health of the backends and stores such information in some logging tables.

Without going into all the details of the Monitor module, for the scope of this tutorial we will look only at a few important variables.

A small aside before proceeding further: ProxySQL stores configuration variables in a table called global_variables, reminiscent of mysqld. The big difference is that this table is writable, and writing to it is the way to reconfigure ProxySQL.

Back to the Monitor module...

mysql> SELECT * FROM global_variables WHERE variable_name IN ('mysql-monitor_username','mysql-monitor_password','mysql-monitor_connect_interval','mysql-monitor_ping_interval','mysql-monitor_history');
+--------------------------------+----------------+
| variable_name                  | variable_value |
+--------------------------------+----------------+
| mysql-monitor_connect_interval | 120000         |
| mysql-monitor_history          | 600000         |
| mysql-monitor_password         | monitor        |
| mysql-monitor_ping_interval    | 60000          |
| mysql-monitor_username         | monitor        |
+--------------------------------+----------------+
5 rows in set (0.00 sec)

This means that the Monitor module will try to connect to all backends once every 2 minutes (120000ms) and ping them once every minute (60000ms), keeping statistics and logs for 10 minutes (600000ms), using the user/password "monitor"/"monitor".

We never created such a user, so the Monitor module is currently failing. To solve this we can either create the "monitor" user with the right privileges, or reconfigure the Monitor module to use "root".
As an exercise, instead of creating the user we will reconfigure the Monitor module.
This is as easy as updating the records in the global_variables table and issuing LOAD MYSQL VARIABLES TO RUNTIME.
Again, don't forget to persist the changes to disk with SAVE MYSQL VARIABLES TO DISK.

mysql> UPDATE global_variables SET variable_value="root" WHERE variable_name="mysql-monitor_username";
Query OK, 1 row affected (0.00 sec)

mysql> UPDATE global_variables SET variable_value="root" WHERE variable_name="mysql-monitor_password";
Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL VARIABLES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

mysql> SAVE MYSQL VARIABLES TO DISK;
Query OK, 46 rows affected (0.00 sec)


You can now check the results from Monitor module in tables mysql_server_connect_log and mysql_server_ping_log .

mysql> SHOW TABLES FROM monitor;
+----------------------------------+
| tables                           |
+----------------------------------+
| mysql_server_connect             |
| mysql_server_connect_log         |
| mysql_server_ping                |
| mysql_server_ping_log            |
| mysql_server_replication_lag_log |
+----------------------------------+
5 rows in set (0.00 sec)

mysql> SELECT * FROM monitor.mysql_server_ping_log ORDER BY time_start DESC;
+-----------------+------+------------------+-------------------+------------+
| hostname        | port | time_start       | ping_success_time | ping_error |
+-----------------+------+------------------+-------------------+------------+
| 192.168.124.101 | 3306 | 1442153413013369 | 513               | NULL       |
| 192.168.124.102 | 3306 | 1442153413013369 | 400               | NULL       |
| 192.168.124.103 | 3306 | 1442153413013369 | 353               | NULL       |
| 192.168.124.101 | 3306 | 1442153353012568 | 695               | NULL       |
| 192.168.124.102 | 3306 | 1442153353012568 | 580               | NULL       |
| 192.168.124.103 | 3306 | 1442153353012568 | 520               | NULL       |
| 192.168.124.101 | 3306 | 1442153293011283 | 856               | NULL       |
| 192.168.124.102 | 3306 | 1442153293011283 | 683               | NULL       |
| 192.168.124.103 | 3306 | 1442153293011283 | 550               | NULL       |
| 192.168.124.101 | 3306 | 1442153233009628 | 225               | NULL       |
| 192.168.124.102 | 3306 | 1442153233009628 | 347               | NULL       |
| 192.168.124.103 | 3306 | 1442153233009628 | 347               | NULL       |
+-----------------+------+------------------+-------------------+------------+
12 rows in set (0.01 sec)

mysql> SELECT * FROM monitor.mysql_server_connect_log ORDER BY time_start DESC;
+-----------------+------+------------------+----------------------+---------------+
| hostname        | port | time_start       | connect_success_time | connect_error |
+-----------------+------+------------------+----------------------+---------------+
| 192.168.124.101 | 3306 | 1442153412970870 | 1334                 | NULL          |
| 192.168.124.102 | 3306 | 1442153412970870 | 1606                 | NULL          |
| 192.168.124.103 | 3306 | 1442153412970870 | 1555                 | NULL          |
| 192.168.124.101 | 3306 | 1442153292970677 | 1523                 | NULL          |
| 192.168.124.102 | 3306 | 1442153292970677 | 1386                 | NULL          |
| 192.168.124.103 | 3306 | 1442153292970677 | 1283                 | NULL          |
+-----------------+------+------------------+----------------------+---------------+
6 rows in set (0.00 sec)

Note that time_start is not a unix timestamp but a monotonic time expressed in microseconds.
Similarly, "connect_success_time" and "ping_success_time" are expressed in microseconds.

You can run more complex queries against these tables. For example, the average connection time per host is given by this query:

mysql> SELECT hostname,port, AVG(connect_success_time)/1000 avg_ms
    -> FROM monitor.mysql_server_connect_log
    -> WHERE connect_error IS NULL
    -> GROUP BY hostname,port;
+-----------------+------+---------+
| hostname        | port | avg_ms  |
+-----------------+------+---------+
| 192.168.124.101 | 3306 | 1.43025 |
| 192.168.124.102 | 3306 | 1.5365  |
| 192.168.124.103 | 3306 | 1.61875 |
+-----------------+------+---------+
3 rows in set (0.00 sec)
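
The same technique works for the ping log. As a sketch using the same columns shown above, the average ping time per host in milliseconds can be computed with:

```sql
-- Average ping time per backend, in milliseconds,
-- ignoring failed pings (ping_error IS NOT NULL)
SELECT hostname, port, AVG(ping_success_time)/1000 avg_ms
FROM monitor.mysql_server_ping_log
WHERE ping_error IS NULL
GROUP BY hostname, port;
```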



Replication lag monitoring


In the previous paragraph we said that ProxySQL doesn't perform any proactive monitoring. This is true with only one exception so far: replication lag.
In fact, ProxySQL can be configured to monitor replication lag and temporarily shun slaves when they lag beyond a certain threshold.
As reported in other posts, ProxySQL is not replication aware, and therefore it monitors replication lag only for hosts where a maximum replication lag was specified through mysql_servers.max_replication_lag.

To configure replication lag monitoring we need to:
a) ensure the monitor user (as specified in mysql-monitor_username) has the SUPER or REPLICATION CLIENT privilege (we are already using "root");
b) set mysql_servers.max_replication_lag to a value greater than zero;
c) tune mysql-monitor_replication_lag_interval accordingly: as the name suggests, this is how often replication lag is checked.

Since mysql_servers.max_replication_lag applies only to slaves, we can safely configure it on all hosts, slaves and masters alike (on masters it has no effect).

mysql> SELECT * FROM mysql_servers WHERE max_replication_lag>0;
Empty set (0.00 sec)

mysql> UPDATE mysql_servers SET max_replication_lag=5;
Query OK, 3 rows affected (0.00 sec)

mysql> LOAD MYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.01 sec)

mysql> SAVE MYSQL SERVERS TO DISK;
Query OK, 0 rows affected (0.00 sec)

mysql> UPDATE global_variables SET variable_value="1500"
    -> WHERE variable_name="mysql-monitor_replication_lag_interval";
Query OK, 1 row affected (0.00 sec)

mysql> SAVE MYSQL VARIABLES TO DISK;
Query OK, 47 rows affected (0.00 sec)

mysql> LOAD MYSQL VARIABLES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)


After this configuration, ProxySQL will check replication lag on every host every 1500ms, and if a host's replication lag exceeds 5 seconds the host will be temporarily shunned.

Please be very cautious about how you configure this, because a misconfiguration could lead to unexpected behavior.
For example, if all the slaves are lagging and reads go to the slaves, ProxySQL will wait until some slave catches up and only then execute the query, resulting in increased latency. Or, if no slave catches up within mysql-connect_timeout_server_max milliseconds, ProxySQL will return an error saying it can't find any host.
A workaround is to configure the master in the same hostgroup as the slaves, but with a much lower weight: this way, while the slaves are up and running, queries are sent to the slaves most of the time and only occasionally to the master, while if the slaves are not available queries are sent to the master.
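
As a sketch of this workaround (the hostgroup id, hostname, and weight values here are assumptions for illustration: hostgroup 1 holds the readers, 192.168.124.101 is the master, and the slaves get a much higher weight), the master could be added to the reader hostgroup like this:

```sql
-- Add the master to the reader hostgroup with a very low weight
-- (hostgroup 1, host 192.168.124.101 and the weights are assumed values)
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES (1, '192.168.124.101', 3306, 1);
-- Give the slaves a much higher weight than the master
UPDATE mysql_servers SET weight=1000
WHERE hostgroup_id=1 AND hostname <> '192.168.124.101';
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```

With weights of 1000 versus 1, roughly one read in a thousand reaches the master while the slaves are healthy; if all the slaves get shunned, all reads fall back to the master.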

We can check the status using the monitor table mysql_server_replication_lag_log:

mysql> SELECT * FROM monitor.mysql_server_replication_lag_log
    -> ORDER BY time_start DESC LIMIT 10;
+-----------------+------+------------------+--------------+----------+-------+
| hostname        | port | time_start       | success_time | repl_lag | error |
+-----------------+------+------------------+--------------+----------+-------+
| 192.168.124.102 | 3306 | 1442156260020471 | 830          | 0        | NULL  |
| 192.168.124.103 | 3306 | 1442156260020471 | 784          | 0        | NULL  |
| 192.168.124.102 | 3306 | 1442156258518415 | 537          | 0        | NULL  |
| 192.168.124.103 | 3306 | 1442156258518415 | 448          | 0        | NULL  |
| 192.168.124.102 | 3306 | 1442156257015337 | 844          | 0        | NULL  |
| 192.168.124.103 | 3306 | 1442156257015337 | 722          | 0        | NULL  |
| 192.168.124.102 | 3306 | 1442156255514011 | 837          | 0        | NULL  |
| 192.168.124.103 | 3306 | 1442156255514011 | 691          | 0        | NULL  |
| 192.168.124.102 | 3306 | 1442156254012744 | 726          | 0        | NULL  |
| 192.168.124.103 | 3306 | 1442156254012744 | 605          | 0        | NULL  |
+-----------------+------+------------------+--------------+----------+-------+
10 rows in set (0.00 sec)

mysql> SELECT hostname,repl_lag,COUNT(*)
    -> FROM monitor.mysql_server_replication_lag_log
    -> WHERE error IS NULL GROUP BY hostname\G
*************************** 1. row ***************************
hostname: 192.168.124.102
repl_lag: 0
COUNT(*): 400
*************************** 2. row ***************************
hostname: 192.168.124.103
repl_lag: 0
COUNT(*): 400
2 rows in set (0.00 sec)

Note that there are 400 entries per host: the check is performed every 1500ms and the log table holds entries for 600s, and 600s / 1.5s = 400 checks.
As soon as an entry becomes older than 600s it is deleted.



This tutorial has come to an end.
In this tutorial we described:
  • how to install ProxySQL
  • how to perform some basic configuration adding users and backends
  • the concept of hostgroups
  • how to configure query routing and read/write split
  • how to retrieve some of the many information and statistics available from within ProxySQL
  • how to configure global variables
  • how to configure the Monitor module
  • how to monitor replication lag
Please stay tuned for the next tutorial, titled "Seamless replication switchover in less than one second".