The post was quite long, and we covered several topics, from installation and configuration of the cluster to installation of proxysql and creation of users and backends, from configuration of query rules for routing and their statistics to monitoring module and replication lag.
But ProxySQL can do more than this.
Among the most interesting features of proxysql is the ability to hold traffic until a backend is available (within a configurable time limit).
For instance, if a query should be send to hostgroup 0 but there is no servers in hostgroup 0, ProxySQL will wait until a server is available in hostgroup 0 or a timeout expires.
This feature has the very interesting effect that using ProxySQL it is possible to perform a planned failover without any error to the application.
In a nutshell, the process is the following:
- ProxySQL is configured to disable the master from hostgroup 0 ;
- the failover is performed ;
- ProxySQL is configured to demote the old master to become a slave, and the new master is moved to hostgroup 0.
1. ProxySQL is configured to disable the master from hostgroup 0
To make an example, we can run something like this:
vagrant@app:~$ export OldMaster='192.168.124.101' vagrant@app:~$ export NewMaster='192.168.124.102' vagrant@app:~$ mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "UPDATE mysql_servers SET STATUS='OFFLINE_SOFT' WHERE hostname=\"$OldMaster\" AND hostgroup_id=0; LOAD MYSQL SERVERS TO RUNTIME;"
What the above command is simple: the master is disabled (status=OFFLINE_SOFT).
At this stage, no more queries can be sent to old master.
2. Perform the failover
The reason why we configured GTID for replication is to simplify the failover process. In fact, we can use MySQL Utilities to perform the failover. Here an example:
vagrant@app:~$ mysqlrpladmin --discover-slaves-login=root:root --rpl-user=repl:replpass --master=root:root@$OldMaster --new-master=root:root@$NewMaster --demote-master switchover # Discovering slaves for master at 192.168.124.101:3306 # Discovering slave at 192.168.124.102:3306 # Found slave: 192.168.124.102:3306 # Discovering slave at 192.168.124.103:3306 # Found slave: 192.168.124.103:3306 # Checking privileges. # Performing switchover from master at 192.168.124.101:3306 to slave at 192.168.124.102:3306. # Checking candidate slave prerequisites. # Checking slaves configuration to master. # Waiting for slaves to catch up to old master. # Stopping slaves. # Performing STOP on all slaves. # Demoting old master to be a slave to the new master. # Switching slaves to new master. # Starting all slaves. # Performing START on all slaves. # Checking slaves for errors. # Switchover complete. # # Replication Topology Health: +------------------+-------+---------+--------+------------+---------+ | host | port | role | state | gtid_mode | health | +------------------+-------+---------+--------+------------+---------+ | 192.168.124.102 | 3306 | MASTER | UP | ON | OK | | 192.168.124.101 | 3306 | SLAVE | UP | ON | OK | | 192.168.124.103 | 3306 | SLAVE | UP | ON | OK | +------------------+-------+---------+--------+------------+---------+ # ...done. vagrant@app:~$
3. ProxySQL is configured to demote the old master to become a slave, and the new master is moved to hostgroup 0
Similarly, we can quickly reconfigure ProxySQL to perform the traffic switch configuring the old master as a slave and enabling it, and configuring the new master.
vagrant@app:~$ ( > echo "UPDATE mysql_servers SET STATUS='ONLINE', hostgroup_id=1 WHERE hostname=\"$OldMaster\" AND hostgroup_id=0;" > echo "UPDATE mysql_servers SET hostgroup_id=0 WHERE hostname=\"$NewMaster\" AND hostgroup_id=1;" > echo "LOAD MYSQL SERVERS TO RUNTIME;" > ) | mysql -u admin -padmin -h 127.0.0.1 -P6032 Warning: Using a password on the command line interface can be insecure. vagrant@app:~$
Seems very easy, right?
Let's try to put everything together in a failover script.
vagrant@app:~$ cat ./switchover.sh
#!/bin/sh if [ $# -ne 2 ]; then echo "Usage: ./switchover.sh oldmaster newmaster" exit 1 fi OldMaster=$1 NewMaster=$2 ( echo "UPDATE mysql_servers SET STATUS='OFFLINE_SOFT' WHERE hostname=\"$OldMaster\" AND hostgroup_id=0;" echo "LOAD MYSQL SERVERS TO RUNTIME;" ) | mysql -u admin -padmin -h 127.0.0.1 -P6032 sleep 1 # let's give some time to current transactions to complete mysqlrpladmin --discover-slaves-login=root:root --rpl-user=repl:replpass \ --master=root:root@$OldMaster --new-master=root:root@$NewMaster \ --demote-master switchover ( echo "UPDATE mysql_servers SET STATUS='ONLINE', hostgroup_id=1 WHERE hostname=\"$OldMaster\" AND hostgroup_id=0;" echo "UPDATE mysql_servers SET hostgroup_id=0 WHERE hostname=\"$NewMaster\" AND hostgroup_id=1;" echo "LOAD MYSQL SERVERS TO RUNTIME;" ) | mysql -u admin -padmin -h 127.0.0.1 -P6032
Note that the script above doesn't perform any sanity check or error handling. This is intentionally, to make the script short and easy to understand.
Seamless replication switchover
At this point we have a script (switchover.sh) that is able to perform a switch in a very short period of time.
Let's put it in action, and while running sysbench (below) in another terminal I will be running:
./switchover.sh 192.168.124.102 192.168.124.101
./switchover.sh 192.168.124.101 192.168.124.102
Here is the output of sysbench while performing 2 switchovers :
vagrant@app:~$ sysbench --report-interval=1 \ --num-threads=4 --max-requests=0 --max-time=20 \ --test=/home/vagrant/sysbench-0.5/sysbench/tests/db/oltp.lua \ --mysql-user=utest --mysql-password=ptest \ --mysql-host=127.0.0.1 --mysql-port=6033 --mysql-db=test \ --oltp-read-only=off run sysbench 0.5: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 4 Report intermediate results every 1 second(s) Random number generator seed is 0 and will be ignored Threads started! [ 1s] threads: 4, tps: 137.94, reads: 1958.22, writes: 551.78, response time: 42.52ms (95%), errors: 0.00, reconnects: 0.00 [ 2s] threads: 4, tps: 156.01, reads: 2205.11, writes: 635.03, response time: 32.96ms (95%), errors: 0.00, reconnects: 0.00 [ 3s] threads: 4, tps: 155.93, reads: 2172.96, writes: 612.71, response time: 36.52ms (95%), errors: 0.00, reconnects: 0.00 [ 4s] threads: 4, tps: 156.05, reads: 2200.76, writes: 628.22, response time: 32.25ms (95%), errors: 0.00, reconnects: 0.00 [ 5s] threads: 4, tps: 75.00, reads: 996.06, writes: 296.02, response time: 30.29ms (95%), errors: 0.00, reconnects: 0.00 [ 6s] threads: 4, tps: 12.00, reads: 197.01, writes: 52.00, response time: 1493.45ms (95%), errors: 0.00, reconnects: 0.00 [ 7s] threads: 4, tps: 160.99, reads: 2230.92, writes: 639.98, response time: 33.70ms (95%), errors: 0.00, reconnects: 0.00 [ 8s] threads: 4, tps: 158.99, reads: 2275.84, writes: 649.95, response time: 30.82ms (95%), errors: 0.00, reconnects: 0.00 [ 9s] threads: 4, tps: 160.02, reads: 2240.22, writes: 642.06, response time: 31.89ms (95%), errors: 0.00, reconnects: 0.00 [ 10s] threads: 4, tps: 155.99, reads: 2170.91, writes: 612.97, response time: 34.46ms (95%), errors: 0.00, reconnects: 0.00 [ 11s] threads: 4, tps: 132.00, reads: 1825.05, writes: 523.01, response time: 39.40ms (95%), errors: 0.00, reconnects: 0.00 [ 12s] threads: 4, tps: 157.00, reads: 2207.00, writes: 628.00, response time: 34.01ms (95%), errors: 0.00, reconnects: 0.00 [ 13s] threads: 4, tps: 160.99, reads: 2271.83, writes: 643.95, response time: 31.08ms (95%), errors: 0.00, reconnects: 0.00 [ 14s] threads: 4, tps: 164.01, reads: 2286.18, writes: 660.05, response time: 30.51ms (95%), errors: 0.00, reconnects: 0.00 [ 15s] threads: 4, tps: 102.01, reads: 1405.07, writes: 405.02, response time: 34.48ms (95%), errors: 1.00, reconnects: 0.00 [ 16s] threads: 4, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [ 17s] threads: 4, tps: 152.99, reads: 2187.93, writes: 618.98, response time: 32.53ms (95%), errors: 0.00, reconnects: 0.00 [ 18s] threads: 4, tps: 162.00, reads: 2277.96, writes: 652.99, response time: 30.47ms (95%), errors: 0.00, reconnects: 0.00 [ 19s] threads: 4, tps: 160.78, reads: 2250.98, writes: 640.14, response time: 33.41ms (95%), errors: 0.00, reconnects: 0.00 [ 20s] threads: 4, tps: 160.22, reads: 2225.01, writes: 631.85, response time: 32.45ms (95%), errors: 0.00, reconnects: 0.00 OLTP test statistics: queries performed: read: 37604 write: 10741 other: 5371 total: 53716 transactions: 2685 (134.13 per sec.) read/write requests: 48345 (2415.02 per sec.) other operations: 5371 (268.30 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 20.0185s total number of events: 2685 total time taken by event execution: 80.0231s response time: min: 13.86ms avg: 29.80ms max: 1505.34ms approx. 95 percentile: 34.40ms Threads fairness: events (avg/stddev): 671.2500/4.15 execution time (avg/stddev): 20.0058/0.00
The provided script, using proxysql and mysqlrpladmin, is able to perform a switch in a very short period of time (roughly 1.5 second) without stopping the application and without generating any error.
The application (sysbench in this case) doesn't report any error/failure!
Yet, it is possible to note that sysbench pauses at seconds 5-6 and again at second ~16 . Furthermore, sysbench also reports that the maximum response time is 1505ms : this is exactly the time it took to perform a switch. The trade off for no error is increased latency, within a configurable threshold.
ProxySQL is able to give us some more information about what happened during the switch:
mysql> SELECT max_time, digest_text FROM stats_mysql_query_digest ORDER BY max_time DESC LIMIT 3; +----------+-------------------------------------+ | max_time | digest_text | +----------+-------------------------------------+ | 1523629 | BEGIN | | 157867 | COMMIT | | 151759 | UPDATE sbtest1 SET k=k+? WHERE id=? | +----------+-------------------------------------+ 3 rows in set (0.00 sec)
BEGIN is the statement that was the slowest among all (1523ms). Let's see why.
When a server is put into OFFLINE_SOFT mode, new incoming connections aren't accepted anymore, while the existing connections are used until they became inactive. In other words, connections are kept in use until the current transaction is completed. This allows to gracefully detach a backend.
With the sysbench executed above, all statements are executed within a transaction that starts with BEGIN , therefore during the failover only the BEGIN statements were blocked: the other statements weren't sent yet.
Even faster seamless replication switchover
In the previous switchover script there is a completely arbitrary "sleep 1" to wait transactions to complete.
This means that if any transaction takes longer than 1 second it is killed, but also that it is possible that the switchover if paused for too long and could be faster.
ProxySQL tracks the number of active connections (in use) in stats_mysql_connection_pool.ConnUsed .
Therefore, if SUM(ConnUsed)==0 it means that no connections are in use.
Let's use this information to improve the script, replacing "sleep 1" with a loop waiting that all connections get free within 1 second.
vagrant@app:~$ cat ./switchover.sh
#!/bin/sh if [ $# -ne 2 ]; then echo "Usage: ./switchover.sh oldmaster newmaster" exit 1 fi OldMaster=$1 NewMaster=$2 ( echo "UPDATE mysql_servers SET STATUS='OFFLINE_SOFT' WHERE hostname=\"$OldMaster\" AND hostgroup_id=0;" echo "LOAD MYSQL SERVERS TO RUNTIME;" ) | mysql -u admin -padmin -h 127.0.0.1 -P6032 CONNUSED=`mysql -h 127.0.0.1 -P6032 -uadmin -padmin -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND hostgroup=0' -B -N 2> /dev/null` TRIES=0 while [ $CONNUSED -ne 0 -a $TRIES -ne 20 ] do CONNUSED=`mysql -h 127.0.0.1 -P6032 -uadmin -padmin -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND hostgroup=0' -B -N 2> /dev/null` TRIES=$(($TRIES+1)) if [ $CONNUSED -ne "0" ]; then sleep 0.05 fi done mysqlrpladmin --discover-slaves-login=root:root --rpl-user=repl:replpass \ --master=root:root@$OldMaster --new-master=root:root@$NewMaster \ --demote-master switchover ( echo "UPDATE mysql_servers SET STATUS='ONLINE', hostgroup_id=1 WHERE hostname=\"$OldMaster\" AND hostgroup_id=0;" echo "UPDATE mysql_servers SET hostgroup_id=0 WHERE hostname=\"$NewMaster\" AND hostgroup_id=1;" echo "LOAD MYSQL SERVERS TO RUNTIME;" ) | mysql -u admin -padmin -h 127.0.0.1 -P6032
We will now re-run sysbench, while on another terminal we perform 2 switchover:
vagrant@app:~$ sysbench --report-interval=1 --num-threads=4 --max-requests=0 --max-time=20 --test=/home/vagrant/sysbench-0.5/sysbench/tests/db/oltp.lua --mysql-user=utest --mysql-password=p test --mysql-host=127.0.0.1 --mysql-port=6033 --mysql-db=test --oltp-read-only=off run sysbench 0.5: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 4 Report intermediate results every 1 second(s) Random number generator seed is 0 and will be ignored Threads started! [ 1s] threads: 4, tps: 132.01, reads: 1885.13, writes: 532.04, response time: 40.72ms (95%), errors: 0.00, reconnects: 0.00 [ 2s] threads: 4, tps: 139.95, reads: 1955.36, writes: 555.82, response time: 37.97ms (95%), errors: 0.00, reconnects: 0.00 [ 3s] threads: 4, tps: 130.06, reads: 1820.88, writes: 524.25, response time: 43.67ms (95%), errors: 0.00, reconnects: 0.00 [ 4s] threads: 4, tps: 138.02, reads: 1935.22, writes: 550.06, response time: 38.49ms (95%), errors: 0.00, reconnects: 0.00 [ 5s] threads: 4, tps: 75.00, reads: 1013.98, writes: 297.99, response time: 41.33ms (95%), errors: 0.00, reconnects: 0.00 [ 6s] threads: 4, tps: 103.00, reads: 1474.96, writes: 417.99, response time: 46.80ms (95%), errors: 0.00, reconnects: 0.00 [ 7s] threads: 4, tps: 140.00, reads: 1955.00, writes: 557.00, response time: 37.95ms (95%), errors: 0.00, reconnects: 0.00 [ 8s] threads: 4, tps: 137.99, reads: 1938.81, writes: 548.95, response time: 38.18ms (95%), errors: 0.00, reconnects: 0.00 [ 9s] threads: 4, tps: 135.89, reads: 1919.42, writes: 543.55, response time: 36.65ms (95%), errors: 0.00, reconnects: 0.00 [ 10s] threads: 4, tps: 133.09, reads: 1846.30, writes: 536.38, response time: 41.17ms (95%), errors: 0.00, reconnects: 0.00 [ 11s] threads: 4, tps: 141.03, reads: 1970.43, writes: 560.12, response time: 35.89ms (95%), errors: 0.00, reconnects: 0.00 [ 12s] threads: 4, tps: 137.00, reads: 1926.02, writes: 548.01, response time: 39.55ms (95%), errors: 0.00, reconnects: 0.00 [ 13s] threads: 4, tps: 141.99, reads: 1961.80, writes: 567.94, response time: 37.37ms (95%), errors: 0.00, reconnects: 0.00 [ 14s] threads: 4, tps: 125.01, reads: 1777.13, writes: 504.04, response time: 43.84ms (95%), errors: 0.00, reconnects: 0.00 [ 15s] threads: 4, tps: 95.88, reads: 1336.31, writes: 379.52, response time: 59.99ms (95%), errors: 0.00, reconnects: 0.00 [ 16s] threads: 4, tps: 122.16, reads: 1716.26, writes: 495.65, response time: 46.88ms (95%), errors: 0.00, reconnects: 0.00 [ 17s] threads: 4, tps: 136.99, reads: 1917.90, writes: 543.97, response time: 42.31ms (95%), errors: 0.00, reconnects: 0.00 [ 18s] threads: 4, tps: 125.01, reads: 1734.07, writes: 499.02, response time: 45.23ms (95%), errors: 0.00, reconnects: 0.00 [ 19s] threads: 4, tps: 135.00, reads: 1912.97, writes: 541.99, response time: 44.89ms (95%), errors: 0.00, reconnects: 0.00 [ 20s] threads: 4, tps: 127.00, reads: 1773.07, writes: 512.02, response time: 48.10ms (95%), errors: 0.00, reconnects: 0.00 OLTP test statistics: queries performed: read: 35784 write: 10224 other: 5112 total: 51120 transactions: 2556 (127.71 per sec.) read/write requests: 46008 (2298.83 per sec.) other operations: 5112 (255.43 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 20.0137s total number of events: 2556 total time taken by event execution: 79.9988s response time: min: 16.54ms avg: 31.30ms max: 687.98ms approx. 95 percentile: 43.43ms Threads fairness: events (avg/stddev): 639.0000/5.48 execution time (avg/stddev): 19.9997/0.01
The switchover are at second ~5 and ~15 . The complete failover switches took around 0.58s , of which ~0.51s running mysqlrpladmin while ~0.07s to reconfigure proxysql .
Conclusion
In this tutorial we described how it is possible to use ProxySQL to perform a master switchover and reconfiguration of all the slaves in less than one second.
Using ProxySQL to perform a master switchover provides extremely high availability and without returning any error to the application. That is: the application won't be aware that a switchover happened as it doesn't need to be reconfigured, and neither it will get any error. All the application will notice is a subsecond latency while the switchover is running.
Last, a friendly reminder that I will be speaking at PerconaLive Amsterdam on 23rd September about ProxySQL : please come to me session and ask all the questions you have!
Great writeup -- last "ProxySQL" link is broken
ReplyDeleteThanks Shlomi. Link fixed
Deleteproxysql there is a BUG, can not judge IO_THREAD and SQL THREAD, not automated OFFLINE_SOFT
ReplyDeleteProxySQL doesn't monitor the status of IO_THREAD and SQL_THREAD. This article shows how to perform switchover using mysqlrpladmin.
DeleteHi René,
ReplyDeleteFirst of all, thanks for ProxySQL. I'm just studying it and sounds interesting to use it on our environment. The switchover script could help a lot for us on our 1 master 1 slave database replication. I'm just thinking can we make switchover.sh to run automatically somehow when ProxySQL detects the master is down?
Thanks in advance.