The Froogle DBA – Management Data Warehouse Part 1

Originally Posted 8/26/2014

So far we have discussed two of the three core components of the Froogle DBA’s toolbox (Central Management Servers and Extended Properties).  With these two features we can manage systems at scale by improving our ability to apply scripts to all or a portion of our environment, make our environment more self-documenting, and improve communication between the members of our DBA team.  In my last post I showed you how to use PowerShell to populate folders in CMS based on the Extended Properties you have defined on your databases.  There is an issue with this approach, however.  It makes it more difficult (not impossible) to answer the question “Which databases don’t have Extended Properties set?”  This is a limitation that the Management Data Warehouse can help us overcome.

The Management Data Warehouse (MDW) is a database that collects data sent by data collectors running on the same, or different, instance(s) of SQL Server.  That is to say that it is really just like any other database, except this database is populated by Data Collectors configured in Instances of SQL Server to gather information of interest.  Out of the box, SQL Server provides Data Collectors that can be configured to collect information of interest in troubleshooting performance problems in SQL Server.  It also ships with some reports designed to run inside of SQL Server Management Studio to analyze this data.

Setting up Management Data Warehouse involves two steps.  First, we need to create the Management Data Warehouse database on the instance where we want to consolidate data for our environment.  For the Froogle DBA we will use the same instance of SQL Server that hosts our Central Management Server to host the Management Data Warehouse database.  We will do this to facilitate some of the other integrations that make the Froogle DBA framework easier to support and maintain.  Second, on each of the instances you want to gather data on, you will need to configure Data Collection to report to the Management Data Warehouse database.

In order to configure the Management Data Warehouse database you will need to follow the images below.

Now that we have a place to put the data we collect, we need to tell SQL Server to collect data we can later use.  In order to do this we need to configure Data Collection.  Data Collection runs as jobs on the instances of SQL Server that are collecting.  As such you need to be aware of the account the SQL Server Agent is running as on each of the machines and ensure those accounts have the appropriate permissions in the MDW database, or you can configure an account for Data Collection to run as and use proxies to run Data Collection as that account.  Some data collection relies on caching data in a folder in the file system.  Go ahead and create a folder for this purpose (i.e., C:\MDWCache) and then make sure that the account that is going to perform Data Collection has full control on the folder.  In order to configure collection follow the steps in the following images.
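
As a rough sketch of the permissions piece, assuming your warehouse database is named MDW and the Agent service account is a Windows login called DOMAIN\SQLAgentSvc (both names are placeholders), the grant on the warehouse side would look something like this:

USE MDW;

-- Map the Agent service account (assumed to already exist as a login) into the warehouse database
CREATE USER [DOMAIN\SQLAgentSvc] FOR LOGIN [DOMAIN\SQLAgentSvc];

-- mdw_writer is one of the roles created when the warehouse is configured; collectors need it to upload data
EXEC sp_addrolemember N'mdw_writer', N'DOMAIN\SQLAgentSvc';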

At this point you have Management Data Warehouse up and running and the default collectors collecting data.  Don’t forget to configure Data Collection following the second set of images on all the instances you wish to manage.  We will show how to use the data collected and some custom collectors important to the Froogle DBA later.

Gathering Extended Properties into your Management Data Warehouse database

Originally Posted 8/27/2014

In the first post on the Management Data Warehouse we looked at how to set it up and start collecting default data using Data Collection.  In this post, we will see how to create a custom collector to gather all Extended Properties at the database level on our Instances configured for Data Collection.   This shows how easy it is to create a custom collection set and collect information about our SQL Server instances.

Before we dive into the process, let’s discuss a little more about what we are trying to accomplish.  If you have been following along with the Froogle DBA posts, then you know that I recommend you document your databases inside the databases themselves using Extended Properties.  By doing so, you can use Central Management Servers to report on your environment.  But sometimes it is just easier and faster to query this information from a single source.  Since we are already leveraging the Management Data Warehouse, why don’t we use it to consolidate this information from across our environment?  We are going to gather all Extended Properties set on databases once a day and store them in the Management Data Warehouse database.  We don’t need to keep a lot of history on this information, so we will just keep one week’s worth of data.
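
If you have not added any database-level Extended Properties yet, they can be set with sp_addextendedproperty.  The database, property name, and value below are just placeholders:

USE YourDatabase;

-- With no level parameters specified, the property is attached to the database itself (class 0)
EXEC sys.sp_addextendedproperty
      @name  = N'Application'
    , @value = N'Payroll';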

With this information let’s start working on building our collector.  To start with we need to understand that the way data collection works is that a collector is going to run on each of our instances and report back to the Management Data Warehouse.  This is a distributed collection model and means we will eventually need to “install” this collector on each of our instances.  The implication of this is we only need to worry about writing a collector to gather information for one instance and then deploy that collector to all of our instances.  We can also further simplify the process of creating the collector by using data collection’s ability to run a script against all user databases on an instance.  With this we only need to write a query to gather the database-level Extended Properties for a given database.  The query we will use is shown below.


WITH NULLRECORD AS (
SELECT 0 AS class
)
SELECT
sys.extended_properties.name
, sys.extended_properties.value
FROM NULLRECORD
LEFT JOIN sys.extended_properties ON NULLRECORD.class = sys.extended_properties.class;
The reason we perform the left join is to ensure that we get at least one row per database.  The default collectors that come with SQL Server are gathering data at the instance level and not the database level.  We could use the query below, but we would not be able to answer the question in MDW of which databases are missing Extended Properties.

SELECT
sys.extended_properties.name
, sys.extended_properties.value
FROM sys.extended_properties
WHERE sys.extended_properties.class = 0;
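
Once this collector is deployed (we will build it through the rest of this post), that question becomes a simple query against the warehouse.  The exact column names depend on what the Generic T-SQL Query collector appends to the output table, but assuming the usual database_name column is added, it would look something like this:

USE MDW;

-- Databases with no Extended Properties show up as the NULL row produced by the left join
SELECT DISTINCT database_name
FROM custom_snapshots.ExtendedProperties
WHERE name IS NULL;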

So now that we have our query, let’s dive into the process of creating a Data Collection Set.  A Data Collection Set is made up of a Collection Schedule and a series of Collection Items.  We will have just the one collection item, which will gather the Extended Properties.  If we had other things we would like to collect once a day, it is best practice to add them as additional items to a single Collection Set vs. having individual Collection Sets for each.  This greatly improves the scalability of Management Data Warehouse.  So let’s address each of these one at a time.  We’ll tackle them in reverse order, starting with the Collection Set.

How we build the Collection Set is determined by the Data Collector Type we wish to use.  There are four Data Collector Types provided by SQL Server: Generic T-SQL Query, Generic SQL Trace, Performance Counters, and Query Activity.  Since the information we wish to gather is based on a T-SQL query we will be using the Generic T-SQL Query Collector Type.  Each collector type has a different set of information it needs, which is provided in an XML segment to the Data Collection engine.

In this case we need to tell it the Query to run, the Table to load the data into, and what database(s) to run the query against.  You can look at the Generic T-SQL Query Collector Type link for more specifics, but the XML segment we want to use is as follows.

<ns:TSQLQueryCollector xmlns:ns="DataCollectorType">
<Query>
<Value>
WITH NULLRECORD AS (
SELECT 0 AS class
)
SELECT
sys.extended_properties.name
, sys.extended_properties.value
FROM NULLRECORD
LEFT JOIN sys.extended_properties ON NULLRECORD.class = sys.extended_properties.class
</Value>
<OutputTable>ExtendedProperties</OutputTable>
</Query>
<Databases UseUserDatabases="true" />
</ns:TSQLQueryCollector>

This segment says to run the query against all user databases (<Databases UseUserDatabases="true" />) and store the results in a table in the custom_snapshots schema in the Management Data Warehouse called “ExtendedProperties” (<OutputTable>ExtendedProperties</OutputTable>).  We will need to wrap this XML in a stored procedure call later, but before we put this together we need to address the issue of scheduling.

SQL Server ships with 6 schedules for Data Collection Sets.  They are all designed to run multiple times a day.  There are collection schedules available for collecting every 5, 10, 15 and 30 minutes.  There are also collection schedules available to collect every 1 and 6 hours.  We could just use the 6 hour collection and collect multiple times a day.  But since we don’t have a business need to do so, we will take advantage of the fact that we can create our own schedules.  All of the schedules created for data collection start with “CollectorSchedule_”.  To avoid confusion (and the chance that SQL Server adds a daily collection schedule in the future) we will avoid using that convention.  We’ll use “Collector_Schedule_” instead.

In order to create a new schedule we need to use the sp_add_schedule stored procedure in the msdb database.  If you are interested in all the options available you can refer to the information in BOL.  But for our purposes, to create a schedule to run daily at 01:00 we would use the following script.  We will intentionally not include the time of day in the name of the schedule, because as our environment grows (or if you have a large environment already) we may need to stagger the times that instances report into the Management Data Warehouse to improve performance.

USE msdb;

IF NOT EXISTS(
SELECT schedule_uid
FROM [msdb].[dbo].[sysschedules]
WHERE name = N'Collector_Schedule_Daily'
)
EXEC sp_add_schedule @schedule_name = N'Collector_Schedule_Daily',
@enabled = 1
, @freq_type = 4
, @freq_interval = 1
, @freq_subday_type = 1
, @freq_recurrence_factor = 1
, @active_start_time = 010000
, @owner_login_name = 'sa';

So now that we have the XML segment for the collector and the schedule, let’s put it all together into a single script which will create the schedule if needed, drop the collection set if it exists, create the collection set, and then finally start up the collection set.  This is the script that we can use with Central Management Server to get all of our Instances collecting Extended Properties and reporting them to the Management Data Warehouse.

USE msdb;

DECLARE @collectorname VARCHAR(128);
SET @collectorname = 'Database Extended Properties';

IF NOT EXISTS(
SELECT schedule_uid
FROM [msdb].[dbo].[sysschedules]
WHERE name = N'Collector_Schedule_Daily'
)
EXEC sp_add_schedule @schedule_name = N'Collector_Schedule_Daily',
@enabled = 1
, @freq_type = 4
, @freq_interval = 1
, @freq_subday_type = 1
, @freq_recurrence_factor = 1
, @active_start_time = 010000
, @owner_login_name = 'sa';

IF EXISTS(
SELECT name
FROM [syscollector_collection_items_internal]
WHERE name = @collectorname

)
EXEC sp_syscollector_delete_collection_set @name = @collectorname;

DECLARE @collector_type UNIQUEIDENTIFIER;
SELECT @collector_type = collector_type_uid
FROM [msdb].[dbo].[syscollector_collector_types]
WHERE name = N'Generic T-SQL Query Collector Type';

DECLARE @schedule UNIQUEIDENTIFIER;
SELECT @schedule = schedule_uid
FROM [msdb].[dbo].[sysschedules]
WHERE name = N'Collector_Schedule_Daily';

DECLARE @collectionsetid INT;
DECLARE @collectionsetuid UNIQUEIDENTIFIER;
EXEC [msdb].[dbo].[sp_syscollector_create_collection_set]
@name=@collectorname,
@collection_mode=1, -- Non-cached mode
@description=N'Records Database Extended Properties',
@days_until_expiration=7,  -- 1 week
@schedule_uid=@schedule,
@collection_set_id=@collectionsetid OUTPUT,
@collection_set_uid=@collectionsetuid OUTPUT;

DECLARE @parameters XML;
SET @parameters = N'<ns:TSQLQueryCollector xmlns:ns="DataCollectorType">
<Query>
<Value>
WITH NULLRECORD AS (
SELECT 0 AS class
)
SELECT
sys.extended_properties.name
, sys.extended_properties.value
FROM NULLRECORD
LEFT JOIN sys.extended_properties ON NULLRECORD.class = sys.extended_properties.class
</Value>
<OutputTable>ExtendedProperties</OutputTable>
</Query>
<Databases UseUserDatabases="true" />
</ns:TSQLQueryCollector>';

DECLARE @collectionitem INT;
EXEC [msdb].[dbo].[sp_syscollector_create_collection_item]
@name=@collectorname,
@parameters=@parameters,
@collection_item_id=@collectionitem OUTPUT,
@collection_set_id=@collectionsetid,
@collector_type_uid=@collector_type;

EXEC sp_syscollector_start_collection_set @name = @collectorname;

So now we have a collector we can use to collect Database Extended Properties for all our user databases.  Once we deploy this to all of our SQL Server instances we will have a central repository of this information we can use.  There are many other collectors we could write to gather information on our environment.  We will save those for another time.

I hope you have been enjoying the Froogle DBA series.  I would love to hear any feedback you might have.

Verifying our XtraDB Cluster

In my last blog post, I showed you how easy it is to set up a three node Percona XtraDB Cluster.  But how do we know if the cluster is working or not?  In this post I will go over the built-in variables that ship with Galera you can use to check on the status of your cluster.  Before we dive in, it is important to realize that Galera is a multi-master cluster.  This means that you can read and write to all the nodes.  One of the implications of the magic that makes Galera work is that each node can be independently configured.  As such, it is important that you check all the nodes, not just one, to determine the overall health of your cluster.  For the rest of this post I will only be running the checks on one of the nodes for brevity.  But you should really run these checks on all the nodes to ensure a fully functioning cluster.

Probably the most important variable is wsrep_cluster_status. This variable will tell you if the node is up and fully part of the cluster.  If the node is fully part of the cluster this variable will return the status of Primary.  Any other response and the node is not fully part of the cluster.  You can check the status by running:

node1 mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+
1 row in set (0.00 sec)

But checking this variable alone is not enough, because you can bootstrap all three nodes and end up with three separate clusters. In this case, all three would return Primary but you would still not have a three node cluster. To tell whether the node has actually joined the others you can use wsrep_cluster_size. This will tell you the number of nodes in the cluster.

node1 mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.00 sec)

For our simple three node example, this is probably good enough. But what if you work in an environment with multiple clusters? How can you tell if the node is part of the right cluster? To be sure we can use wsrep_cluster_state_uuid. All the nodes fully participating in the cluster should return the same value.

node1 mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_state_uuid';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_state_uuid | c0803766-24db-11e5-ac7a-abcded9c6987 |
+--------------------------+--------------------------------------+
1 row in set (0.00 sec)
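
If you want to check all three values in one pass (handy in a monitoring script, for example), you can combine them with a WHERE clause:

node1 mysql> SHOW GLOBAL STATUS WHERE Variable_name IN
    -> ('wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_cluster_state_uuid');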

For more information on how to use the wsrep variables to check on the health of your cluster you can refer to the Galera documentation.  In future posts we will dive more into Galera.

Installing Percona XtraDB Cluster on CentOS 6 using VirtualBox

In this post I will be walking you through installing a 3 node test Percona XtraDB Cluster (PXC) environment using VirtualBox.  The version of VirtualBox I will be using is 4.3.26, and I will be using CentOS 6.6 for the operating system on the three nodes. To get started you will want to perform a minimal install of CentOS 6.6 on 3 virtual machines. I named mine pxc-node1, pxc-node2, and pxc-node3. I also set up port forwarding in VirtualBox to allow me to ssh to the machines via a local port on the host. I set up 22201 to point to pxc-node1, 22202 to point to pxc-node2, and 22203 to point to pxc-node3. The last thing that I did was to add a second host-only network interface to each of the nodes and ensure the same name was set for all the nodes.  You can do all of this work yourself, or simply download an already configured appliance from here.

Now that we have the images created, it is time to discuss the plan for installing our Percona XtraDB Cluster. The actual installation process is very similar to the process that I used for installing Percona Server. But once we have the binaries laid down the configuration of the instances will be different and we will need to start the first node of the cluster a special way. While this quick overview does not give a lot of detail, I think it best to learn by doing. With that in mind, here we go.

First ensure that all of the nodes are running. To keep things simple we will be installing as root, but I have done installs at work on RHEL 6 using sudo and the process is pretty much identical. The first thing we need to do is enable the network interfaces on each machine and disable SELinux. Galera does not work with SELinux, and since XtraDB Cluster is just Percona’s Galera implementation we will need to disable it. In the console for each of the nodes you will need to run the following commands.

[root@localhost ~]$ vi /etc/sysconfig/network-scripts/ifcfg-eth0

You will want to modify the ONBOOT to be yes. The file should look like this when finished.

DEVICE=eth0
HWADDR=08:00:27:24:FD:19
TYPE=Ethernet
UUID=b3c128fe-335e-437d-bf3b-32f3706d9273
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=dhcp

Now save the file.  We also need to add a config for the secondary NIC, which we do not want to set up for DHCP; instead it will use a static IP address.  We will use 192.168.70.11 for pxc-node1, 192.168.70.12 for pxc-node2, and 192.168.70.13 for pxc-node3.  The contents of /etc/sysconfig/network-scripts/ifcfg-eth1 for pxc-node1 would look like this.

NM_CONTROLLED=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.70.11
NETMASK=255.255.255.0
DEVICE=eth1

PEERDNS=no

To disable SELinux we need to run:

[root@localhost ~] vi /etc/selinux/config

You will want to set it to disabled. The file should look like this when finished.

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Now save the file. We will also be stopping the firewall and setting it not to restart.  In production we would want to add rules to allow traffic, but this is just a test environment so we’ll take the shortcut.

[root@localhost ~]$ service iptables stop
[root@localhost ~]$ chkconfig iptables off

To make it easier to know what system you are on, I would modify the shell to show the node name. To do that, edit the bash configuration file using:

vi /etc/bashrc

Find the line that says

[ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\u@\h \W]\\$ "

Comment that line out and add the following

PS1='\u@pxc-node1:\w\$ '

The section should look like this when finished. Then save the file.

# Turn on checkwinsize
  shopt -s checkwinsize
#  [ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\u@\h \W]\\$ "
  PS1='\u@pxc-node1:\w\$ '

After making these changes you will want to restart the node using the following command and then perform the same steps on the other two nodes.

[root@localhost ~] shutdown -r 0

Once we are finished with the prep of the three nodes we are ready to start the installation of the PXC cluster. First we want to make sure the MySQL libraries are not already on the box.

root@pxc-node1:~# yum -y remove mysql-libs

Then we want to import the EPEL and the Percona Repositories.

root@pxc-node1:~# rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
root@pxc-node1:~# rpm -Uvh http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm

Now that we have the repository registered it is time to lay down the binaries.

root@pxc-node1:~# yum install -y socat
root@pxc-node1:~# yum install -y Percona-XtraDB-Cluster-server-56 Percona-XtraDB-Cluster-client-56 Percona-XtraDB-Cluster-shared-56 percona-toolkit percona-xtrabackup
root@pxc-node1:~# touch /etc/my.cnf
root@pxc-node1:~# /usr/bin/mysql_install_db --defaults-file=/etc/my.cnf --force --datadir=/var/lib/mysql --basedir=/usr/ --user=mysql

You will need to run these same steps on all three nodes to install the binaries. This will also create the data directory and populate it with the system tables.

Now that we have the binaries on each machine it is time to set up the cluster. This involves editing the /etc/my.cnf file. To start with, let’s configure the first node in our cluster. On pxc-node1 edit the /etc/my.cnf file so that it looks like this.

[mysqld]
datadir                         = /var/lib/mysql
log_error                       = error.log

log-bin
server-id = 1

query_cache_size=0
query_cache_type=0

innodb_buffer_pool_size                 = 48M
innodb_log_file_size            = 24M
innodb_flush_method                             = O_DIRECT
innodb_file_per_table
innodb_flush_log_at_trx_commit  = 0

performance_schema=OFF

binlog_format = ROW

# galera settings
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_cluster_name = mycluster
wsrep_cluster_address = gcomm://192.168.70.11,192.168.70.12,192.168.70.13
wsrep_node_name = node1
wsrep_node_address = 192.168.70.11
wsrep_sst_auth = sst:secret
innodb_autoinc_lock_mode = 2
innodb_locks_unsafe_for_binlog = ON


[mysql]
prompt                          = "node1 mysql> "

[client]
user                            = root

If we try to start the cluster now using the same method as for a normal MySQL instance, it will fail.

root@pxc-node1:~# service mysql start
Starting MySQL (Percona XtraDB Cluster)................................... ERROR! The server quit without updating PID file (/var/lib/mysql/localhost.localdomain.pid).
 ERROR! MySQL (Percona XtraDB Cluster) server startup failed!

This is because the first node in a cluster has to be forced online as a safety feature since you cannot have quorum with only one node. To force it online run the following.

root@pxc-node1:~# service mysql bootstrap-pxc
Bootstrapping PXC (Percona XtraDB Cluster)Starting MySQL (Percona XtraDB Cluster)...... SUCCESS!

You may see a message similar to the following since we just tried to start the cluster without bootstrapping first.

root@pxc-node1:~# service mysql bootstrap-pxc
Bootstrapping PXC (Percona XtraDB Cluster) ERROR! MySQL (Percona XtraDB Cluster) is not running, but lock file (/var/lock/subsys/mysql) exists
Starting MySQL (Percona XtraDB Cluster)......... SUCCESS!

Either way you now have a cluster up and running with one node. Before we add any more nodes, let’s put some things in place to verify the cluster once it is up. We will be installing sysbench to allow us to drive a load at the machine, and we will be installing myq_tools to check the flow of transactions in the cluster. Run this on all three nodes to set up sysbench and myq_tools and get them ready for our use.

root@pxc-node1:~# yum install -y sysbench
root@pxc-node1:~# mysql
node1 mysql> CREATE USER 'test'@'localhost' IDENTIFIED BY 'test';
node1 mysql> GRANT ALL PRIVILEGES ON test.* TO 'test'@'localhost';
node1 mysql> \q
root@pxc-node1:~# yum install -y wget
root@pxc-node1:~# cd /usr/local/bin
root@pxc-node1:~# wget https://github.com/jayjanssen/myq-tools/releases/download/v0.5/myq_tools.tgz
root@pxc-node1:~# tar -xzvf myq_tools.tgz
root@pxc-node1:~# cd bin
root@pxc-node1:~# mv * ../
root@pxc-node1:~# cd ..
root@pxc-node1:~# rm -rf bin
root@pxc-node1:~# ln -s /usr/local/bin/myq_status.linux-amd64 myq_status
root@pxc-node1:~# vi /usr/local/bin/run_sysbench_oltp.sh

sysbench --db-driver=mysql --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-user=test --mysql-password=test --mysql-db=test --mysql-host=localhost --mysql-ignore-errors=all --oltp-tables-count=1 --oltp-table-size=250000 --oltp-auto-inc=off --num-threads=1 --report-interval=1 --max-requests=0 --tx-rate=10 run | grep tps

root@pxc-node1:~# chmod +x run_sysbench_oltp.sh
root@pxc-node1:~# sysbench --db-driver=mysql --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-user=test --mysql-password=test --mysql-db=test --mysql-host=localhost --mysql-ignore-errors=all --oltp-table-size=250000 --num-threads=1 prepare

You can verify that you have sysbench set up correctly by running the following.  You should not see any errors.  There should also be numbers in the reads and writes columns.

root@pxc-node1:/usr/local/bin# run_sysbench_oltp.sh
[   1s] threads: 1, tps: 10.99, reads: 153.85, writes: 43.96, response time: 12.64ms (95%), errors: 0.00, reconnects:  0.00
[   2s] threads: 1, tps: 17.02, reads: 238.28, writes: 68.08, response time: 12.84ms (95%), errors: 0.00, reconnects:  0.00
[   3s] threads: 1, tps: 10.00, reads: 153.02, writes: 40.01, response time: 10.10ms (95%), errors: 0.00, reconnects:  0.00
[   4s] threads: 1, tps: 19.00, reads: 253.00, writes: 76.00, response time: 10.26ms (95%), errors: 0.00, reconnects:  0.00
[   5s] threads: 1, tps: 9.00, reads: 125.99, writes: 36.00, response time: 11.42ms (95%), errors: 0.00, reconnects:  0.00
[   6s] threads: 1, tps: 6.00, reads: 83.98, writes: 23.99, response time: 12.19ms (95%), errors: 0.00, reconnects:  0.00
[   7s] threads: 1, tps: 6.00, reads: 84.01, writes: 24.00, response time: 63.33ms (95%), errors: 0.00, reconnects:  0.00

Now that we have the cluster running we need to add the other two nodes. But before we do that, we need to create a user with permissions for the SST process. We’ll talk more about SST in the future, but for now it’s important to know this process allows the other nodes to get a copy of the database when they join the cluster. To do that we need to run the following on node 1.

root@pxc-node1:/usr/local/bin# mysql
node1 mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sst'@'localhost' IDENTIFIED BY 'secret';
node1 mysql> \q

Since we already have the binaries installed we just need to configure the other two instances and start mysql on the nodes. The instructions for node 2 are below. Notice that we don’t need to bootstrap these instances. In fact, if we do bootstrap them we would end up with 3 clusters instead of just one.

root@pxc-node2:~# vi /etc/my.cnf

[mysqld]
datadir                         = /var/lib/mysql
log_error                       = error.log

log-bin
server-id = 2

query_cache_size=0
query_cache_type=0

innodb_buffer_pool_size                 = 48M
innodb_log_file_size            = 24M
innodb_flush_method                             = O_DIRECT
innodb_file_per_table
innodb_flush_log_at_trx_commit  = 0

performance_schema=OFF

binlog_format = ROW

# galera settings
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_cluster_name = mycluster
wsrep_cluster_address = gcomm://192.168.70.11,192.168.70.12,192.168.70.13
wsrep_node_name = node2
wsrep_node_address = 192.168.70.12
wsrep_sst_auth = sst:secret
innodb_autoinc_lock_mode = 2
innodb_locks_unsafe_for_binlog = ON


[mysql]
prompt                          = "node2 mysql> "

[client]
user                            = root

root@pxc-node2:~# service mysql restart
Shutting down MySQL (Percona XtraDB Cluster).. SUCCESS!
Starting MySQL (Percona XtraDB Cluster).....State transfer in progress, setting sleep higher
... SUCCESS!

At this point we now have a two node cluster. To add the last node you need to modify /etc/my.cnf on that node and then start the mysql service. You can use the configuration file for node 2 as an example. You will need to modify the server-id, wsrep_node_name, wsrep_node_address, and prompt to the values for pxc-node3, as shown in the fragment below. If you have any issues let me know.
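
For reference, the values that change for the third node would look something like this (everything else stays the same as on node 2):

server-id = 3

# galera settings
wsrep_node_name = node3
wsrep_node_address = 192.168.70.13

[mysql]
prompt                          = "node3 mysql> "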

In a future post, I will show you how to set up HAProxy to provide a single connection (actually we will use 2 connections but more on that later) for your applications. We will also play around with some of the more interesting features of Galera and Percona XtraDB Cluster.

More MySQL Auto Increment Fun

Originally Posted 10/21/2014

Last blog post I started to explore some of what I found to be the interesting behaviors surrounding the AUTO_INCREMENT attribute in MySQL.  With strict mode not set it does some really strange things, like allowing you to add the max value in the range over and over again as long as you delete the record that used to hold that value.  This could cause some pretty interesting issues in applications that are not storing their foreign key relationships in the database.  But I wanted to see how strict mode would impact the behavior.  So I rebuilt the environment using the following script file.

CREATE DATABASE IF NOT EXISTS moderndba;

 USE moderndba;

 DROP TABLE IF EXISTS autofun;

 CREATE TABLE IF NOT EXISTS autofun
 (
       employee_id        TINYINT SIGNED NOT NULL AUTO_INCREMENT
     , employee_name      VARCHAR(150) NOT NULL
     , employee_level     TINYINT UNSIGNED NULL
     , PRIMARY KEY (employee_id)
 );

 INSERT INTO autofun (employee_name, employee_level)
 VALUES
  ('Joe Smith', 0)
, ('Tim Bradford', 1)
, ('Betty Stafford', 1);

Notice, I changed the data type for the employee_id field from an unsigned int to a signed tiny int. This was just to make the math a little easier for me, as this brings the max value in the range down to an easy to remember (and type) 127. Last time we saw that if you inserted a row and provided a value for employee_id at the end of the range, you cannot insert another row without specifying a value for employee_id that is not already used in the table. If you insert a row and do not provide an employee_id, or provide 0, or provide NULL, the row will not insert. But the error is not that the value is out of range. Instead the error is that there is a conflict in the primary key. This is because MySQL is noticing that the next value in the sequence is outside the range and picking the closest value in the range and trying to insert it. But I know that STRICT_ALL_TABLES mode makes MySQL less forgiving with range values as well as in other ways. So I ran the following to see what would happen with strict mode turned on.

mysql> USE moderndba;
 Reading table information for completion of table and column names
 You can turn off this feature to get a quicker startup with -A

 Database changed
 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 +-------------+----------------+----------------+
 3 rows in set (0.00 sec)

 mysql> SET sql_mode='STRICT_ALL_TABLES';
 Query OK, 0 rows affected (0.04 sec)

 mysql>  INSERT INTO autofun
     -> (employee_id, employee_name)
     -> VALUES
     -> (127, 'John Wiley');
 Query OK, 1 row affected (0.03 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |         127 | John Wiley     |           NULL |
 +-------------+----------------+----------------+
 4 rows in set (0.00 sec)

 mysql> INSERT INTO autofun (employee_name)
     -> VALUES ('Amanda King');
 ERROR 1062 (23000): Duplicate entry '127' for key 'PRIMARY'

Oh well. So much for strict mode helping us get a better error message. It is still only choking because of the duplicate key. I was hoping we would get a message that the next value was out of range, but no such luck.

Last time we saw that if you delete the last row and then perform an insert without specifying the employee_id, you will get the max value again. But while reading up more on the AUTO_INCREMENT attribute, I found this documentation on how it works on an InnoDB table. I thought this was interesting. So to test it out I ran the following statement to remove the last row in the table.

mysql> DELETE FROM autofun WHERE employee_id = 127;
 Query OK, 1 row affected (0.09 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 +-------------+----------------+----------------+
 3 rows in set (0.00 sec)

So last time when we performed the insert we would get an employee_id of 127. But what happens if the server bounces between the delete and the subsequent insert? To find out I exited out of the MySQL prompt and ran the following from the shell prompt.

[student@percona56 ~]$ sudo service mysql restart

Then after the restart I ran the following.

mysql> INSERT INTO autofun (employee_name)
     -> VALUES ('Amanda King');
 Query OK, 1 row affected (0.03 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |           4 | Amanda King    |           NULL |
 +-------------+----------------+----------------+
 4 rows in set (0.00 sec)
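
As a quick sanity check, you can also see where the counter currently sits without inserting a row by looking at the table metadata:

mysql> SELECT AUTO_INCREMENT
     -> FROM information_schema.TABLES
     -> WHERE TABLE_SCHEMA = 'moderndba' AND TABLE_NAME = 'autofun';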

So if the server is restarted, the next value in the sequence is recalculated based on the maximum value stored in the column. This could be either beneficial (if you need to resolve a situation where you have removed some values you want to reuse) or not so beneficial (if you are not managing your foreign key relationships carefully). I still have a lot more questions about the AUTO_INCREMENT attribute in MySQL, but I think this is enough for the moment. I hope you are finding this behavior just as interesting as I am.

MySQL Auto Increment Fun

Originally Posted 10/21/2014

Coming from a heavy Microsoft SQL Server background, the concept of the AUTO_INCREMENT attribute in MySQL is very familiar.  In SQL Server we call it the IDENTITY attribute, but they are conceptually very similar.  When a row is inserted into the table, the column with the AUTO_INCREMENT attribute automatically gets the next value in a sequence.  But while they are conceptually very similar, there are a number of differences that a SQL Server Database Administrator should be aware of.  To start with, let’s create a database and a table we will use as we explore the AUTO_INCREMENT attribute.  You can copy the following script to a file on your system named “auto_increment.sql”.  Or you can simply run the commands from the mysql prompt to generate the objects and seed the table with data.

CREATE DATABASE IF NOT EXISTS moderndba;

 USE moderndba;

 CREATE TABLE IF NOT EXISTS autofun
 (
       employee_id        INT UNSIGNED NOT NULL AUTO_INCREMENT
     , employee_name      VARCHAR(150) NOT NULL
     , employee_level     TINYINT UNSIGNED NULL
     , PRIMARY KEY (employee_id)
 );

 INSERT INTO autofun (employee_name, employee_level)
 VALUES
  ('Joe Smith', 0)
, ('Tim Bradford', 1)
, ('Betty Stafford', 1);

If you chose to copy the script to a file, you can run it as follows.

[student@percona56 ~]$ mysql -u root -p < auto_increment.sql

In SQL Server you can insert directly into an IDENTITY column, but you have to turn on IDENTITY_INSERT first. You can only have it turned on for one table at a time and you need elevated permissions to do so.
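
For comparison, a rough sketch of what that looks like on the SQL Server side (the table here is hypothetical):

SET IDENTITY_INSERT dbo.autofun ON;

INSERT INTO dbo.autofun (employee_id, employee_name)
VALUES (10, 'John Wiley');

SET IDENTITY_INSERT dbo.autofun OFF;

Let’s see what happens when we try to insert directly into an AUTO_INCREMENT column in MySQL. Run the following statements from the MySQL prompt.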

[student@percona56 ~]$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.24-72.2-log Percona Server (GPL), Release 72.2, Revision 8d0f85b

Copyright (c) 2009-2015 Percona LLC and/or its affiliates
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> USE moderndba;
 Reading table information for completion of table and column names
 You can turn off this feature to get a quicker startup with -A

 Database changed
 mysql> INSERT INTO autofun
     -> (employee_id, employee_name)
     -> VALUES
     -> (10, 'John Wiley');
 Query OK, 1 row affected (0.03 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |          10 | John Wiley     |           NULL |
 +-------------+----------------+----------------+
 4 rows in set (0.00 sec)

 mysql>

As you can see, in MySQL you can insert directly into the AUTO_INCREMENT column by default. But what happens when we go to add the next row? Will the employee_id be a 4, an 11, or some other number? Let’s see.

mysql> INSERT INTO autofun (employee_name)
     -> VALUES ('Amanda King');
 Query OK, 1 row affected (0.03 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |          10 | John Wiley     |           NULL |
 |          11 | Amanda King    |           NULL |
 +-------------+----------------+----------------+
 5 rows in set (0.00 sec)

 mysql>

Ok. That makes sense. The last row inserted was a 10, so the safest choice for the next row is obviously the last value plus one. But if this is the case, surely when you insert a value into the column explicitly it must be greater than the largest value in the column, correct? Let’s find out. We can run the following commands and test out this theory.

mysql> INSERT INTO autofun
     -> (employee_id, employee_name)
     -> VALUES
     -> (5, 'Sarah Wagoner');
 Query OK, 1 row affected (0.03 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |           5 | Sarah Wagoner  |           NULL |
 |          10 | John Wiley     |           NULL |
 |          11 | Amanda King    |           NULL |
 +-------------+----------------+----------------+
 6 rows in set (0.00 sec)

 mysql>

Alright, so you can insert into the middle of the series. Well, for safety it must increment the column from the largest value used and not the last one then, right?

mysql> INSERT INTO autofun (employee_name)
     -> VALUES ('Mia Brown');
 Query OK, 1 row affected (0.03 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |           5 | Sarah Wagoner  |           NULL |
 |          10 | John Wiley     |           NULL |
 |          11 | Amanda King    |           NULL |
 |          12 | Mia Brown      |           NULL |
 +-------------+----------------+----------------+
 7 rows in set (0.00 sec)

 mysql>

Yes it does. That seems pretty safe, since you could easily run into problems using the number after the last number used if you allowed inserts into the middle of the series. But it brings up a specific risk that DBAs and Developers need to be aware of. Remember from the script above we made the employee_id field an unsigned integer. The maximum value we can store in an unsigned integer in MySQL is 4,294,967,295. The risk is probably a lot lower on an unsigned integer than on a signed tiny integer (1 byte with a maximum value of 127), but what will happen if someone accidentally inserts a row with a value for employee_id towards the end of the range? Let’s take a look and see what happens.

mysql> INSERT INTO autofun
     -> (employee_id, employee_name)
     -> VALUES
     -> (4294967294, 'Brooke Allen');
 Query OK, 1 row affected (0.03 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |           5 | Sarah Wagoner  |           NULL |
 |          10 | John Wiley     |           NULL |
 |          11 | Amanda King    |           NULL |
 |          12 | Mia Brown      |           NULL |
 |  4294967294 | Brooke Allen   |           NULL |
 +-------------+----------------+----------------+
 8 rows in set (0.00 sec)

 mysql> INSERT INTO autofun (employee_name)
     -> VALUES ('Chris Walters');
 Query OK, 1 row affected (0.03 sec)


 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |           5 | Sarah Wagoner  |           NULL |
 |          10 | John Wiley     |           NULL |
 |          11 | Amanda King    |           NULL |
 |          12 | Mia Brown      |           NULL |
 |  4294967294 | Brooke Allen   |           NULL |
 |  4294967295 | Chris Walters  |           NULL |
 +-------------+----------------+----------------+
 9 rows in set (0.00 sec)

 mysql>

So everything is cool so far, but we are now up against the limit of the range for an unsigned integer, and we got there with only two inserts. (We could have gotten there with one insert if we really wanted.) So what happens when we go to add the next auto incrementing row to the table? Let’s find out.

mysql> INSERT INTO autofun (employee_name)
     -> VALUES ('Ben Forest');
 ERROR 1062 (23000): Duplicate entry '4294967295' for key 'PRIMARY'
 mysql>

So we have reached the end of the range, and when MySQL tries to grab the next value larger than the largest value in the column it can’t, and therefore grabs the maximum value for the data type again. This leads to the error of duplicate primary keys. But what if we remove the rows from the table, then what? Is it really looking at the maximum value in the table, or the maximum value ever inserted? Let’s see.

mysql> DELETE FROM autofun WHERE employee_name = 'Chris Walters';
 Query OK, 1 row affected (0.05 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |           5 | Sarah Wagoner  |           NULL |
 |          10 | John Wiley     |           NULL |
 |          11 | Amanda King    |           NULL |
 |          12 | Mia Brown      |           NULL |
 |  4294967294 | Brooke Allen   |           NULL |
 +-------------+----------------+----------------+
 8 rows in set (0.00 sec)

 mysql> INSERT INTO autofun (employee_name)
     -> VALUES ('Ben Forest');
 Query OK, 1 row affected (0.03 sec)

 mysql>  SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |           5 | Sarah Wagoner  |           NULL |
 |          10 | John Wiley     |           NULL |
 |          11 | Amanda King    |           NULL |
 |          12 | Mia Brown      |           NULL |
 |  4294967294 | Brooke Allen   |           NULL |
 |  4294967295 | Ben Forest     |           NULL |
 +-------------+----------------+----------------+
 9 rows in set (0.00 sec)

 mysql> DELETE FROM autofun WHERE employee_name IN ('Ben Forest', 'Brooke Allen');
 Query OK, 2 rows affected (0.02 sec)

 mysql> INSERT INTO autofun (employee_name)
     -> VALUES ('Nina Gardner');
 Query OK, 1 row affected (0.03 sec)

 mysql>  SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |           5 | Sarah Wagoner  |           NULL |
 |          10 | John Wiley     |           NULL |
 |          11 | Amanda King    |           NULL |
 |          12 | Mia Brown      |           NULL |
 |  4294967295 | Nina Gardner   |           NULL |
 +-------------+----------------+----------------+
 8 rows in set (0.00 sec)

 mysql>

So it is not using the largest value in the column plus one. Instead it really is using the next value after the largest value ever inserted. The only issue is that when you get to the end of the range, instead of raising an error that the next value is out of range, it takes the closest value in the range and tries to use it. This can be an issue since we are dealing with the primary key on the table: what does this behavior mean for other tables with relationships back to this key?

I understand that this is an edge case, and most of the time you will not need to worry about this. But it is an interesting behavior and one that I have not seen in other relational database management systems. I hope you found this as interesting as I have.

Installing Percona Server 5.6 on CentOS 6

Installing Percona Server on CentOS 6 couldn’t be easier using yum and the Percona repository.  We start with a minimal install of CentOS 6. I am using CentOS 6 because I am using RHEL 6 at work and want to stay closer to that environment when I work on testing things at home.

First we need to remove the MySQL libraries that ship with CentOS.

[student@percona56 ~]$ sudo yum -y remove mysql-libs

Next import the Percona repository using the following command.

[student@percona56 ~]$ sudo yum -y install http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm

Now that we have the repository installed we can verify by running:

[student@percona56 ~]$ sudo yum list | grep percona

This should return a listing of the yum packages in the Percona repository. For this example we will be installing not only the server but the client and some other packages that are helpful to have around when running mysql. We will be installing the 5.6 version of Percona Server, since other versions are also available in the repository at this time. To install our packages we will run:

[student@percona56 ~]$ sudo yum install Percona-Server-shared-56.x86_64 Percona-Server-client-56.x86_64 Percona-Server-server-56.x86_64 percona-toolkit percona-xtrabackup

Now that we have the binaries laid down we will need to create the data directory and start MySQL. To do that we run:

[student@percona56 ~]$ sudo touch /etc/my.cnf
[student@percona56 ~]$ sudo /usr/bin/mysql_install_db --defaults-file=/etc/my.cnf --force --datadir=/var/lib/mysql --basedir=/usr/ --user=mysql
[student@percona56 ~]$ sudo service mysql start

We now have MySQL running, and the first thing I like to do is secure the installation using the built-in script provided. We can run the script using the command below. Just hit Enter when it asks for the current root password. We will want to set the root password, remove anonymous users, disallow remote login by root, and remove the test database. We will also want to reload privileges when finished.

[student@percona56 ~]$ /usr/bin/mysql_secure_installation

I also like to install the sys schema. The sys schema is a set of objects that help you analyze data in the Performance Schema. This schema is included by default in 5.7, and is needed by Workbench reports. If you don’t install it now, you can do so through Workbench. I like to go ahead and get it out of the way.

[student@percona56 ~]$ sudo yum -y install git
[student@percona56 ~]$ cd /tmp
[student@percona56 tmp]$ git clone https://github.com/MarkLeith/mysql-sys
[student@percona56 tmp]$ cd mysql-sys
[student@percona56 mysql-sys]$ mysql -u root -p < ./sys_56.sql
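
To confirm the install worked, you can query the schema's version view (this assumes the view name used by the mysql-sys project):

[student@percona56 mysql-sys]$ mysql -u root -p -e "SELECT * FROM sys.version;"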

But we still can’t use the sys schema because we have not turned on Performance Schema on our instance. In fact we haven’t configured our instance at all. Lets do that now by editing our /etc/my.cnf file. We want our file to look like this when finished.

[mysql]

# CLIENT #
port                           = 3306
socket                         = /var/lib/mysql/mysql.sock

[mysqld]

# GENERAL #
user                           = mysql
default-storage-engine         = InnoDB
socket                         = /var/lib/mysql/mysql.sock
pid-file                       = /var/lib/mysql/mysql.pid

# MyISAM #
key-buffer-size                = 32M
myisam-recover                 = FORCE,BACKUP

# SAFETY #
max-allowed-packet             = 16M
max-connect-errors             = 1000000
sql-mode                       = STRICT_ALL_TABLES,NO_AUTO_CREATE_USER,NO_AUTO_VALUE_ON_ZERO,NO_ENGINE_SUBSTITUTION

# DATA STORAGE #
datadir                        = /var/lib/mysql/

# BINARY LOGGING #
log-bin                        = /var/lib/mysql/mysql-bin
expire-logs-days               = 14
sync-binlog                    = 1

# CACHES AND LIMITS #
tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0
query-cache-size               = 0
max-connections                = 500
thread-cache-size              = 50
open-files-limit               = 65535
table-definition-cache         = 4096
table-open-cache               = 4096

# INNODB #
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 256M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 1536M     #see https://www.percona.com/blog/2007/11/03/choosing-innodb_buffer_pool_size/ for guidance

# LOGGING #
log-error                      = /var/lib/mysql/mysql-error.log
log-queries-not-using-indexes  = 1
slow-query-log                 = 1
slow-query-log-file            = /var/lib/mysql/mysql-slow.log

# PERFORMANCE SCHEMA #
performance_schema
performance_schema_instrument = '%=on'
performance_schema_consumer_events_stages_current = ON
performance_schema_consumer_events_stages_history = ON
performance_schema_consumer_events_stages_history_long = ON
performance_schema_consumer_events_statements_current = ON
performance_schema_consumer_events_statements_history = ON
performance_schema_consumer_events_statements_history_long = ON
performance_schema_consumer_events_waits_current = ON
performance_schema_consumer_events_waits_history = ON
performance_schema_consumer_events_waits_history_long = ON
performance_schema_consumer_global_instrumentation = ON
performance_schema_consumer_thread_instrumentation = ON
performance_schema_consumer_statements_digest = ON

At this point we should be ready to restart MySQL and get started.

sudo service mysql restart

I will use this base image going forward as the starting point for all my MySQL explorations.

Percona Live 2015 Wrap Up

With Percona Live 2015 in the books I thought I should take the time to gather my thoughts.  There was a ton of information shared at the conference.  We sent 3 people to the main conference and one to the new MySQL 101 conference and we still had to leave excellent presentations on the table.  But we got what we came to get.  We feel more comfortable with MySQL in general and with Galera in particular.  We are rolling out a Percona XtraDB Cluster soon and not only did this conference clear up some questions, it also helped us tweak our architecture to provide better availability for our customers.

I could go on and on about all the technical knowledge we gained here, but I have already posted notes on each of the sessions I attended.  But beyond the technical, I was impressed by the vibrant and innovative community surrounding MySQL.  We spoke to people solving some of the most challenging OLTP workloads out there.  Not only did they show us that MySQL can support these workloads, they showed us how MySQL is better suited than proprietary database platforms.  MySQL gives you the ability to scale at a cost point that proprietary RDBMSs can’t.

I know some will say that it takes application changes in order to reach these levels of scale with MySQL.  This is true today, but I saw several presentations where the MySQL community is working on solving this problem and developing a layer of abstraction that “traditional” applications can connect to without having to worry about the complexity.  In fact, MariaDB’s MaxScale is already providing this type of functionality.

Heading back home and thinking about how to use all this knowledge for good I realize that I will have a lot of resistance.  We have a lot invested in our existing proprietary database platforms.  But talking to folks who have been where I sit, I realized this week that if we don’t adapt to speed up the rate of innovation we will become irrelevant.  I for one plan to evolve!

My Session Notes

JSON support in MySQL 5.7 – Percona Live 2015

I attended the “JSON support in MySQL 5.7” session given by Evgeny Potemkin and Manyi Lu from Oracle.  This was my last session of the conference and my head was swimming with knowledge at this point.  In 5.7 they are adding a JSON data type, JSON functions, and the ability to index JSON data.  Before, your options were to store JSON as either text or binary, but you had to manage all the validation, parsing, etc.  The new JSON datatype is:

  • optimized for read intensive workload
  • parse and validation on insert only
  • dictionary (sorted object keys, fast access to array cells by index)
  • In-place updated
  • Smart size 64K & 4G
  • UTF8 encoding
  • Supports all native JSON types (number, string, bool, object, arrays, etc.)
  • Also supports Date, time, datetime, timestamp, and others

They gave some sample inserts that I will need to grab from their slides.  The JSON comparator has polymorphic behavior and provides seamless and consistent comparison.  So you can compare JSON to JSON or JSON to SQL.  Different data types will always be non-equal, so no automatic type conversion.  They then showed a sample select using the JSON datatype.  They also introduced the concept of JSON path, which is a way to identify how to get at JSON data.  The details are on the slides and it was too much to record, so I will have to grab it later for the developers.  But $ is used for the document root, [] defines an array, and they use * for a search (a rough sketch follows the function list below).  Other functions include:

  • JSN_VALID()
  • JSN_TYPE()
  • JSN_KEYS()
  • JSN_LENGTH()
  • JSN_DEPTH()
  • JSN_CONTAINS_PATH()
  • JSN_REMOVE()
  • JSN_APPEND()
  • JSN_SET()
  • JSN_INSERT()
  • JSN_REPLACE()
  • JSN_MERGE()
  • JSN_ARRAY()
  • JSN_OBJECT()
  • JSN_EXTRACT()
  • JSN_SEARCH()
  • JSN_QUOTE()
  • JSN_UNQUOTE()
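
Based on my notes, a rough sketch of how this might look is below.  These were the pre-release function names shown in the session (they later shipped as JSON_*), so treat this as illustrative only:

CREATE TABLE employee_docs (
      id  INT UNSIGNED NOT NULL AUTO_INCREMENT
    , doc JSON
    , PRIMARY KEY (id)
);

INSERT INTO employee_docs (doc)
VALUES ('{"name": "Joe Smith", "level": 0, "skills": ["mysql", "galera"]}');

-- $ is the document root, [0] addresses the first array cell
SELECT JSN_EXTRACT(doc, '$.name')      AS employee_name
     , JSN_EXTRACT(doc, '$.skills[0]') AS first_skill
FROM employee_docs;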

They then showed how you can use a select to return the data as JSON using these functions (again, need to refer to slides).  They will support both Stored and Virtual indexes on JSON data.  The primary index must be stored.  Stored indexes can be BTREE, FTS, or GIS, and Virtual can only be BTREE.  They plan to add Instant ALTER to Virtual indexes soon.  They have a bunch more with JSON on the roadmap, like full-text indexing and GIS indexes.

Incremental backups available with Xtrabackup and Percona Server – Percona Live 2015

In the session “Incremental backups available with Xtrabackup and Percona Server” Vladislav Lesin and George Lorch from Percona went over incremental backups.  There are three ways to get the delta: a full scan, using the redo logs, or a combination of the two.  For a full scan you scan all pages and back up any pages with a last change LSN newer than a specific LSN.  The advantages are that you only store changed pages and there are no server changes required.  The downside is the overhead of a full scan.  The redo logs can be used to restore a database to/from a specific point since they contain the change data.  The advantage of using the redo logs is that there is no full scan.  The downside is the space consumed.
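
As a rough illustration of the incremental workflow with xtrabackup (the directory names and the timestamped subdirectory below are placeholders), a full backup followed by an incremental based on it looks something like this:

innobackupex /data/backups/full
innobackupex --incremental /data/backups/inc --incremental-basedir=/data/backups/full/2015-04-20_10-00-00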

In the combined method they store the LSN information for pages in the server in an RB-tree, using a separate thread that checks the redo logs on checkpoint.  This is to avoid the full scan.  They shared an edge case where size can be an issue with this approach.