Verifying our XtraDB Cluster

In my last blog post, I showed you how easy it is to set up a three-node Percona XtraDB Cluster. But how do we know whether the cluster is working or not? In this post I will go over the built-in status variables that ship with Galera, which you can use to check on the state of your cluster. Before we dive in, it is important to realize that Galera is a multi-master cluster. This means that you can read from and write to all the nodes. One of the implications of the magic that makes Galera work is that each node can be independently configured. As such, it is important that you check all the nodes, not just one, to determine the overall health of your cluster. For the rest of this post I will only be running the checks on one of the nodes for brevity, but you should really run these checks on all the nodes to ensure a fully functioning cluster.

Probably the most important variable is wsrep_cluster_status. This variable tells you whether the node is up and fully part of the cluster. If it is, the variable returns a status of Primary. Any other response means the node is not fully part of the cluster. You can check the status by running:

node1 mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+
1 row in set (0.00 sec)

But checking this variable alone is not enough, because you could bootstrap all three nodes and end up with three separate clusters. In that case, all three would return Primary, but you would still not have a three-node cluster. To tell whether the node has actually joined the others you can use wsrep_cluster_size, which reports the number of nodes in the cluster.

node1 mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.00 sec)

For our simple three-node example, this is probably good enough. But what if you work in an environment with multiple clusters? How can you tell if the node is part of the right cluster? To be sure, we can use wsrep_cluster_state_uuid. All the nodes fully participating in the same cluster should return the same value.

node1 mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_state_uuid';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_state_uuid | c0803766-24db-11e5-ac7a-abcded9c6987 |
+--------------------------+--------------------------------------+
1 row in set (0.00 sec)
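
If you would rather grab all of these in one shot, the same information comes back from a single LIKE pattern; this is just a convenience, the individual checks above are equivalent. Run it on every node and compare the values.

node1 mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster%';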

For more information on how to use the wsrep variables to check on the health of your cluster, consult the Galera documentation. In future posts we will dive deeper into Galera.

Installing Percona XtraDB Cluster on CentOS 6 using VirtualBox

In this post I will be walking you through installing a three-node test Percona XtraDB Cluster (PXC) environment using VirtualBox. The version of VirtualBox I will be using is 4.3.26, and I will be using CentOS 6.6 as the operating system on the three nodes. To get started you will want to perform a minimal install of CentOS 6.6 on three virtual machines. I named mine pxc-node1, pxc-node2, and pxc-node3. I also set up port forwarding in VirtualBox to allow me to ssh to the machines via a local port on the host: 22201 points to pxc-node1, 22202 to pxc-node2, and 22203 to pxc-node3. The last thing I did was add a second host-only network interface to each of the nodes and ensure the same network name was set for all the nodes. You can do all of this work yourself, or simply download an already configured appliance from here.

Now that we have the images created, it is time to discuss the plan for installing our Percona XtraDB Cluster. The actual installation process is very similar to the process I used for installing Percona Server, but once we have the binaries laid down, the configuration of the instances will be different and we will need to start the first node of the cluster in a special way. While this quick overview does not give a lot of detail, I think it is best to learn by doing. With that in mind, here we go.

First, ensure that all of the nodes are running. To keep things simple we will be installing as root, but I have done installs at work on RHEL 6 using sudo and the process is pretty much identical. The first thing we need to do is enable the network interfaces on each machine and disable SELinux. Galera does not work with SELinux, and since XtraDB Cluster is just Percona's Galera implementation, we will need to disable it. In the console for each of the nodes you will need to run the following commands.

[root@localhost ~]$ vi /etc/sysconfig/network-scripts/ifcfg-eth0

You will want to change ONBOOT to yes. The file should look like this when finished.

DEVICE=eth0
HWADDR=08:00:27:24:FD:19
TYPE=Ethernet
UUID=b3c128fe-335e-437d-bf3b-32f3706d9273
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=dhcp

Now save the file. We also need to add a config for the secondary NIC, which we do not want to use DHCP; it will use a static IP address instead. We will use 192.168.70.11 for pxc-node1, 192.168.70.12 for pxc-node2, and 192.168.70.13 for pxc-node3. The contents of /etc/sysconfig/network-scripts/ifcfg-eth1 for pxc-node1 would look like this:

NM_CONTROLLED=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.70.11
NETMASK=255.255.255.0
DEVICE=eth1
PEERDNS=no

To disable SELinux we need to run:

[root@localhost ~]$ vi /etc/selinux/config

You will want to set SELINUX to disabled. The file should look like this when finished.

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Now save the file. We will also be stopping the firewall and setting it not to restart. In production we would want to add rules to allow the cluster traffic through, but this is just a test environment, so we'll take the shortcut.

[root@localhost ~]$ service iptables stop
[root@localhost ~]$ chkconfig iptables off
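
For the record, if you did want to leave iptables running, a rule along these lines would open the ports PXC needs: 3306 for MySQL clients, 4567 for Galera group communication, 4568 for IST, and 4444 for SST. The 192.168.70.0/24 source is the host-only network we set up above; adjust it to your own environment.

[root@localhost ~]$ iptables -I INPUT -p tcp -s 192.168.70.0/24 -m multiport --dports 3306,4444,4567,4568 -j ACCEPT
[root@localhost ~]$ service iptables save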

To make it easier to know which system you are on, I would modify the shell prompt to show the node name. To do that, edit the bash configuration file using:

vi /etc/bashrc

Find the line that says

[ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\u@\h \W]\\$ "

Comment that line out and add the following

PS1='\u@pxc-node1:\w\$ '

The section should look like this when finished; then save the file.

# Turn on checkwinsize
  shopt -s checkwinsize
#  [ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\u@\h \W]\\$ "
  PS1='\u@pxc-node1:\w\$ '

After making these changes you will want to restart the node using the following command and then perform the same steps on the other two nodes.

[root@localhost ~]$ shutdown -r 0

Once we are finished with the prep of the three nodes, we are ready to start the installation of the PXC cluster. First we want to make sure that the MySQL libraries are not already on the box.

root@pxc-node1:~# yum -y remove mysql-libs

Then we need to install the EPEL and Percona repository packages.

root@pxc-node1:~# rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
root@pxc-node1:~# rpm -Uvh http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm

Now that we have the repositories registered, it is time to lay down the binaries.

root@pxc-node1:~# yum install -y socat
root@pxc-node1:~# yum install -y Percona-XtraDB-Cluster-server-56 Percona-XtraDB-Cluster-client-56 Percona-XtraDB-Cluster-shared-56 percona-toolkit percona-xtrabackup
root@pxc-node1:~# touch /etc/my.cnf
root@pxc-node1:~# /usr/bin/mysql_install_db --defaults-file=/etc/my.cnf --force --datadir=/var/lib/mysql --basedir=/usr/ --user=mysql
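
Before moving on, it does not hurt to confirm that the packages landed and that mysql_install_db populated the data directory; this is just a sanity check.

root@pxc-node1:~# rpm -qa | grep -i percona
root@pxc-node1:~# ls /var/lib/mysql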

You will need to run these same steps on all three nodes to install the binaries. This will also create the data directory and populate it with the system tables.

Now that we have the binaries on each machine, it is time to set up the cluster. This involves editing the /etc/my.cnf file. To start with, let's configure the first node in our cluster. On pxc-node1, edit the /etc/my.cnf file so that it looks like this.

[mysqld]
datadir                         = /var/lib/mysql
log_error                       = error.log

log-bin
server-id = 1

query_cache_size=0
query_cache_type=0

innodb_buffer_pool_size                 = 48M
innodb_log_file_size            = 24M
innodb_flush_method                             = O_DIRECT
innodb_file_per_table
innodb_flush_log_at_trx_commit  = 0

performance_schema=OFF

binlog_format = ROW

# galera settings
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_cluster_name = mycluster
wsrep_cluster_address = gcomm://192.168.70.11,192.168.70.12,192.168.70.13
wsrep_node_name = node1
wsrep_node_address = 192.168.70.11
wsrep_sst_auth = sst:secret
innodb_autoinc_lock_mode = 2
innodb_locks_unsafe_for_binlog = ON


[mysql]
prompt                          = "node1 mysql> "

[client]
user                            = root

If we try to start the cluster now using the same method as for a normal MySQL instance, it will fail.

root@pxc-node1:~# service mysql start
Starting MySQL (Percona XtraDB Cluster)................................... ERROR! The server quit without updating PID file (/var/lib/mysql/localhost.localdomain.pid).
 ERROR! MySQL (Percona XtraDB Cluster) server startup failed!

This is because the first node in a cluster has to be forced online as a safety feature, since you cannot have quorum with only one node. To force it online, run the following.

root@pxc-node1:~# service mysql bootstrap-pxc
Bootstrapping PXC (Percona XtraDB Cluster)Starting MySQL (Percona XtraDB Cluster)...... SUCCESS!

You may see a message similar to the following since we just tried to start the cluster without bootstrapping first.

root@pxc-node1:~# service mysql bootstrap-pxc
Bootstrapping PXC (Percona XtraDB Cluster) ERROR! MySQL (Percona XtraDB Cluster) is not running, but lock file (/var/lock/subsys/mysql) exists
Starting MySQL (Percona XtraDB Cluster)......... SUCCESS!
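
Before adding anything else, you can confirm the bootstrap really produced a one-node primary cluster using the wsrep status variables covered in the verification section above; with only this node started, wsrep_cluster_size should be 1 and wsrep_cluster_status should be Primary.

root@pxc-node1:~# mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster%';"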

Either way you now have a cluster up and running with one node. Before we add any more nodes, let’s put some things in place to verify the cluster once it is up. We will be installing sysbench to allow us to drive a load at the machine, and we will be installing myq_tools to check the flow of transactions in the cluster. Run this on all three nodes to set up sysbench and myq_tools and get them ready for our use.

root@pxc-node1:~# yum install -y sysbench
root@pxc-node1:~# mysql
node1 mysql> CREATE USER 'test'@'localhost' IDENTIFIED BY 'test';
node1 mysql> GRANT ALL PRIVILEGES ON test.* TO 'test'@'localhost';
node1 mysql> \q
root@pxc-node1:~# yum install -y wget
root@pxc-node1:~# cd /usr/local/bin
root@pxc-node1:~# wget https://github.com/jayjanssen/myq-tools/releases/download/v0.5/myq_tools.tgz
root@pxc-node1:~# tar -xzvf myq_tools.tgz
root@pxc-node1:~# cd bin
root@pxc-node1:~# mv * ../
root@pxc-node1:~# cd ..
root@pxc-node1:~# rm -rf bin
root@pxc-node1:~# ln -s /usr/local/bin/myq_status.linux-amd64 myq_status
root@pxc-node1:~# vi /usr/local/bin/run_sysbench_oltp.sh

sysbench --db-driver=mysql --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-user=test --mysql-password=test --mysql-db=test --mysql-host=localhost --mysql-ignore-errors=all --oltp-tables-count=1 --oltp-table-size=250000 --oltp-auto-inc=off --num-threads=1 --report-interval=1 --max-requests=0 --tx-rate=10 run | grep tps

root@pxc-node1:~# chmod +x run_sysbench_oltp.sh
root@pxc-node1:~# sysbench --db-driver=mysql --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-user=test --mysql-password=test --mysql-db=test --mysql-host=localhost --mysql-ignore-errors=all --oltp-table-size=250000 --num-threads=1 prepare

You can verify that you have sysbench set up correctly by running the following. You should not see any errors, and there should be numbers in the reads and writes columns.

root@pxc-node1:/usr/local/bin# run_sysbench_oltp.sh
[   1s] threads: 1, tps: 10.99, reads: 153.85, writes: 43.96, response time: 12.64ms (95%), errors: 0.00, reconnects:  0.00
[   2s] threads: 1, tps: 17.02, reads: 238.28, writes: 68.08, response time: 12.84ms (95%), errors: 0.00, reconnects:  0.00
[   3s] threads: 1, tps: 10.00, reads: 153.02, writes: 40.01, response time: 10.10ms (95%), errors: 0.00, reconnects:  0.00
[   4s] threads: 1, tps: 19.00, reads: 253.00, writes: 76.00, response time: 10.26ms (95%), errors: 0.00, reconnects:  0.00
[   5s] threads: 1, tps: 9.00, reads: 125.99, writes: 36.00, response time: 11.42ms (95%), errors: 0.00, reconnects:  0.00
[   6s] threads: 1, tps: 6.00, reads: 83.98, writes: 23.99, response time: 12.19ms (95%), errors: 0.00, reconnects:  0.00
[   7s] threads: 1, tps: 6.00, reads: 84.01, writes: 24.00, response time: 63.33ms (95%), errors: 0.00, reconnects:  0.00

Now that we have the cluster running, we need to add the other two nodes. But before we do that, we need to create a user with permissions for the SST process. We'll talk more about SST in the future, but for now it's important to know that this process allows the other nodes to get a copy of the database when they join the cluster. To do that we need to run the following on node 1.

root@pxc-node1:/usr/local/bin# mysql
node1 mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sst'@'localhost' IDENTIFIED BY 'secret';
node1 mysql> \q
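
If you want to double-check the account before bringing up the other nodes, SHOW GRANTS is enough; this step is purely optional.

root@pxc-node1:/usr/local/bin# mysql -e "SHOW GRANTS FOR 'sst'@'localhost';"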

Since we already have the binaries installed, we just need to configure the other two instances and start mysql on those nodes. The instructions for node 2 are below. Notice that we do not bootstrap these instances; in fact, if we did bootstrap them we would end up with three clusters instead of one.

root@pxc-node2:~# vi /etc/my.cnf

[mysqld]
datadir                         = /var/lib/mysql
log_error                       = error.log

log-bin
server-id = 2

query_cache_size=0
query_cache_type=0

innodb_buffer_pool_size                 = 48M
innodb_log_file_size            = 24M
innodb_flush_method                             = O_DIRECT
innodb_file_per_table
innodb_flush_log_at_trx_commit  = 0

performance_schema=OFF

binlog_format = ROW

# galera settings
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_cluster_name = mycluster
wsrep_cluster_address = gcomm://192.168.70.11,192.168.70.12,192.168.70.13
wsrep_node_name = node2
wsrep_node_address = 192.168.70.12
wsrep_sst_auth = sst:secret
innodb_autoinc_lock_mode = 2
innodb_locks_unsafe_for_binlog = ON


[mysql]
prompt                          = "node2 mysql> "

[client]
user                            = root

root@pxc-node2:~# service mysql restart
Shutting down MySQL (Percona XtraDB Cluster).. SUCCESS!
Starting MySQL (Percona XtraDB Cluster).....State transfer in progress, setting sleep higher
... SUCCESS!
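
If the state transfer completed, both nodes should now agree on the member count; a quick check on either node (shown here on node 2) should report a wsrep_cluster_size of 2.

root@pxc-node2:~# mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"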

At this point we have a two-node cluster. To add the last node, modify /etc/my.cnf on that node and then start the mysql service. You can use the configuration file for node 2 as an example; you will need to change server-id, wsrep_node_name, wsrep_node_address, and the prompt to the values for pxc-node3 (the lines that change are sketched below). If you have any issues let me know.
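
For reference, here is a sketch of just the lines that differ from the node 2 file, using the addresses we picked for pxc-node3 earlier; everything else in /etc/my.cnf stays the same.

server-id = 3

wsrep_node_name = node3
wsrep_node_address = 192.168.70.13

[mysql]
prompt                          = "node3 mysql> "

Once the file is in place, start the service and the node will join the cluster via SST.

root@pxc-node3:~# service mysql start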

In a future post, I will show you how to set up HAProxy to provide a single connection (actually we will use 2 connections but more on that later) for your applications. We will also play around with some of the more interesting features of Galera and Percona XtraDB Cluster.

Big Transactions on Galera Cluster – Percona Live 2015

I attended the session "Big Transactions on Galera Cluster" presented by Seppo Jaakola from Codership (the folks who write Galera).

Huge transactions are:

  • Large read set – no problem
  • Large write set – roughly >10K rows modified can be slow
  • Long-running transaction – OK, but vulnerable to multi-master conflicts

A large transaction can cause flow control to kick in and prevent any other transaction from committing before it finishes. This is due to the strict ordering of transactions needed to keep certification deterministic. Long-running transactions are also vulnerable to multi-master conflicts. Galera 3 and earlier can use wsrep_max_ws_rows and wsrep_max_ws_size to limit the size of transactions; 2 GB is the upper limit for transaction size.
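
For reference, those limits are ordinary wsrep options that go in my.cnf alongside the other Galera settings; the values below are purely illustrative, not recommendations.

wsrep_max_ws_rows = 131072        # reject write sets touching more rows than this
wsrep_max_ws_size = 1073741824    # reject write sets larger than 1 GB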

If you have a huge transaction, be careful killing it because of the rollback. You may also want to adjust flow control to keep the other nodes from freezing. Seppo did a demo showing the impact of huge transactions on the cluster. One workaround is to skip flow control using wsrep_desync; this fixes the master, but the slaves will just lag further and further behind. You could also relax the commit order and allow the nodes to be temporarily out of sync (not supported by Galera, just an idea).
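
For what it's worth, desyncing a node is a one-liner; the important part is remembering to turn it back off once the big transaction is done (a sketch, run on the node doing the work).

mysql> SET GLOBAL wsrep_desync = ON;
mysql> -- run the large transaction here, then:
mysql> SET GLOBAL wsrep_desync = OFF;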

Streaming replication helps, and it is being added in Galera 4. wsrep_trx_fragment_unit and wsrep_trx_fragment_size are used to configure it. This basically divides a huge transaction into a series of smaller transactions that will not interfere with Galera flow control. Seppo set up streaming replication and then reran his demo, showing the improvement in cluster performance. This will allow Galera to break the 2 GB write-set limit by breaking large transactions into smaller chunks. These changes also help Galera handle hot-spot transactions by setting the transaction fragment size to one statement.
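
Based on the session, the configuration would look something like the lines below once Galera 4 ships; treat the names and values as a sketch of what was demoed rather than final released syntax.

wsrep_trx_fragment_unit = statements   # how the write set is split into fragments
wsrep_trx_fragment_size = 10000        # replicate a fragment every 10,000 units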

There are still issues with certification when large and small transactions are being written to multiple nodes, but this appears to be a huge improvement for Galera. I look forward to playing with it in the future. Galera 4 should be out in the first half of this year. Additional features include non-blocking DDL, streaming replication, and optimized consistency alert shutdown (a protocol to avoid cluster crashes).

Managing MySQL with Puppet – Percona Live 2015

I attended the “Managing MySQL with Puppet” session at Percona Live 2015 with Jaako Pesonen from Spil Games.  He is using Diffy, Sheldon (provides information about environment, host, etc to Puppet), OpenStack, MySQL, Percona XtraDB Cluster, and Puppet.  In this session he concentrated on Puppet and Percona XtraDB Cluster.  Puppetlabs mysql is provided by Puppet labs.  It used to not be that good, but is getting better.  Puppet contains 4 mysql providors (server, client, bindings,and backups).  Here are my notes:

  • He recommends using Hiera to make puppet more useful and readable
  • https://github.com/spilgames/spilgames-mysql to get access to their module you can modify as needed (not set up for general use)
  • Nothing special needed for puppet and MHA or regular replication
  • Galera is special in that the first node is different
  • They use different users between production and staging for backups to avoid having production credentials in staging

There were a lot of Puppet configuration slides that I will need to dive into more. My takeaway is that we can manage Galera with Puppet, but it will take some customization. I look forward to diving into Jaako's GitHub repository and figuring out what will work best for us.

Percona XtraDB Cluster in Practice – Percona Live 2015

I knew right off that this was going to be an awesome session. When you sit down and the slide on the screen has instructions on how to set up an actual environment in VirtualBox, you know it is going to be good. The session was presented by Jay Janssen of Percona and was broken up into two parts: first we migrated a master-slave environment to Galera, and then we went over Galera issues and some admin information.

Most of the morning was spent setting up the environment by migrating from traditional replication to Galera. I am sure there were some in the room who thought this was too basic, but it was great for me. My team has been working on Galera for several weeks, but this was my first time putting my hands on it. First we started with a three-node asynchronous replication environment (one master and two slaves). We then replaced XtraDB Server with XtraDB Cluster on one of the slaves. Then we changed the configuration on that node to make it a single-node cluster and started it with bootstrap. At this point we had a one-node Galera cluster getting transactions from the master. Next we did the same on the second slave and added it to the Galera cluster. We then had to add a grant in MySQL on the first Galera node to allow backups to run so that the new Galera node could receive SST. Then we did the same for the last node. The tutorial is open source and available at https://github.com/percona/xtradb-cluster-tutorial.

After the hands-on part we dove into Galera-specific issues. Highlights from that part of the talk were:

  • You need to look up the grant for /usr/bin/clustercheck and run it to allow the script to work
  • wsrep_cluster_address is the list of addresses the node looks at to find and join the cluster named in wsrep_cluster_name
  • Setting up gmcast.segment doesn't magically convert the traffic between the two segments to async. You will need to use traditional replication between one of the nodes in the cluster and a standalone MySQL server or a node in another cluster instead.
  • The synchronous part means that all nodes have the transaction and acknowledge that it has been received. The transaction is then certified on each node, and the commit is finalized node by node. Commits are certified asynchronously on the other nodes. Race conditions are handled by issuing a deadlock error while certifying.
  • Certification is deterministic.  If it passes on one node it will pass on all nodes.  Certification happens in order as commits happen on each node.
  • The application needs to test for deadlocks on commit
  • The first commit wins
  • Locks are not replicated to the other nodes because replication only happens on commit
  • There is slave lag; this is not two-phase commit. However, there is no data loss on node failure.
  • Slow nodes cause an issue because of flow control.
  • Replication stops during DDL to ensure that certification remains deterministic
  • Rolling Schema Upgrade mode allows you to upgrade one node at a time. It takes the node out of flow control, lets it get behind, and does not replicate the change; the node then catches back up. This is the same idea as making the change on the slave first in a master-slave setup. You cannot drop a column, rename a column, or insert a column in the middle. All the issues with ALTER TABLE on replicated slaves exist with rolling schema upgrades. pt-online-schema-change avoids these issues, but it can cause a problem with foreign keys because of the renaming done by pt-osc, which can cause foreign keys not to fire properly for a short period of time.
  • All Galera settings are node specific
  • Galera.cache does not persist across restarts, so having multiple nodes down can cause SST if you don't keep a node up while they are down.
  • On some Linux versions you can tell a node which donor to use by running: service mysql start --wsrep-sst-donor=node1, or you can set the option in the config file, start, and then modify the config file afterwards.
  • The suspect timeout allows nodes to be evicted; a new quorum vote is then taken based on the previous node count. If the result is greater than 50%, the cluster is still primary (P in myq_status); otherwise it is non-primary (n in myq_status).
  • You can connect to a non-primary node, but you cannot perform selects, DML, or DDL with the connection; you can only look at variables. You can bootstrap the node, but then you have two clusters.
  • Writes are frozen during the timeout (10 seconds) when a node fails
  • The arbitrator node gets all the replication traffic. This allows it to forward the traffic if the link between the nodes in the other data centers is down. You need to take this into account when selecting the DC for the arbitrator node, as the latency and security of this traffic matter.
  • If all nodes go down you will have to bootstrap the node with the highest GTID (cat /var/lib/mysql/grastate.dat); see the sketch after this list. The only way to shut the cluster down cleanly is to stop the application from writing before you shut down.
  • If the shutdown was not clean you will not have a GTID in grastate.dat. You will need to use "mysqld_safe --wsrep-recover"; the GTID is the value after the colon on the recovery position line.
  • You can avoid SST by using a backup to initialize a node, provided your galera.cache is sized large enough to contain the transactions written since the backup was taken. You will need to have used the --galera-info option with your backup and then use the file created by the backup to create a grastate.dat before you start mysql.
  • If cat /var/lib/mysql/grastate.dat shows 00000000-0000-00000-0000000 then it will SST just as if the file were missing.
  • If a node won't SST and the data dir is corrupt, you need to remove the data dir, run mysql_install_db --user=mysql to create a clean empty data dir, and then start mysql to SST. Make sure you remove the .sst directory as well.
  • FLUSH TABLES WITH READ LOCK will stall your cluster because of flow control.
  • You can tune flow control using gcs.fc_limit to adjust how deep the queue gets before flow control kicks in.
  • The problem node during flow control is the one with the non-zero queue depth. Kill it or fix it and the cluster will resume.
  • wsrep_sync_wait will cause reads to wait for pending applies to finish before the read runs. This slows down reads but ensures reads on all nodes are consistent.
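
Pulling the shutdown-and-recovery notes above into one place, the rough flow looks like this; the paths and service commands are the same ones used earlier in this series, so adapt them to your own layout.

root@pxc-node1:~# cat /var/lib/mysql/grastate.dat     # check the seqno/GTID on every node
root@pxc-node1:~# mysqld_safe --wsrep-recover         # only if the shutdown was not clean and grastate.dat has no GTID
root@pxc-node1:~# service mysql bootstrap-pxc         # run this on the node with the highest GTID
root@pxc-node1:~# service mysql start                 # then start the remaining nodes normally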

I was able to keep up all morning and for most of the afternoon, but Jay lost me with the cluster check part, so I had to stop doing the hands-on parts. But we plan on using an F5 and I understand the use case, so I will play with it more when I get back home. I was able to fix the issue on the break: I had misheard which option to put in the mysqlchk xinetd config file for the parameter type. It needed to be UNLISTED, not UNCHECKED as I had heard. This allowed me to get it running on all the nodes.

So my brain is fried from all this. I will definitely be playing with this more when I get back from the conference, so you might see more posts around this content. Stay tuned for more from Percona Live and on Galera.