More MySQL Auto Increment Fun

Originally Posted 10/21/2014

Last blog post I started to explore some of, what I found to be, the interesting behaviors surrounding the AUTO_INCREMENT attribute in MySQL.  With strict mode not set it does some really strange things.  Like allowing you to add the max value in the range over and over again as long as you delete the record that used to hold that value.  This could cause some pretty interesting issues in applications that are not storing their foreign key relationships in the database.  But I wanted to see how strict mode would impact the behavior.  So I rebuilt the environment using the following script file.

CREATE DATABASE IF NOT EXISTS moderndba;

 USE moderndba;

 DROP TABLE IF EXISTS autofun;

 CREATE TABLE IF NOT EXISTS autofun
 (
       employee_id        TINYINT SIGNED NOT NULL AUTO_INCREMENT
     , employee_name      VARCHAR(150) NOT NULL
     , employee_level     TINYINT UNSIGNED NULL
     , PRIMARY KEY (employee_id)
 );

 INSERT INTO autofun (employee_name, employee_level)
 VALUES
  ('Joe Smith', 0)
, ('Tim Bradford', 1)
, ('Betty Stafford', 1);

Notice, I changed the data type for the employee_id field from an unsigned int to a signed tiny int. This was just to make the math a little easier for me as this brings the max value in the range down to a easy to remember (and type) 127. Last time we saw that if you inserted a row and provided a value for employee_id at the end of the range you can not insert another row without specifying a value for employee_id that is not already used in the table. If you insert a row and do not provide an employee_id, or provide 0, or provide NULL, the row will not insert. But the error is not that the value is out of range. Instead the error is that there is a conflict in the primary key. This is because MySQL is noticing that the next value in the sequence is outside the range and picking the closest value in the range and trying to insert it. But, I know that STRICT_ALL_TABLES mode makes MySQL less forgiving with range values as well as in other ways. So, I ran the following to see what would happen with script mode turned on.

mysql> USE moderndba;
 Reading table information for completion of table and column names
 You can turn off this feature to get a quicker startup with -A

 Database changed
 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 +-------------+----------------+----------------+
 3 rows in set (0.00 sec)

 mysql> SET sql_mode='STRICT_ALL_TABLES';
 Query OK, 0 rows affected (0.04 sec)

 mysql>  INSERT INTO autofun
     -> (employee_id, empoyee_name)
     -> VALUES
     -> (127, 'John Wiley');
 Query OK, 1 row affected (0.03 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |         127 | John Wiley     |           NULL |
 +-------------+----------------+----------------+
 4 rows in set (0.00 sec)

 mysql> INSERT INTO autofun (empoyee_name)
     -> VALUES ('Amanda King');
 ERROR 1062 (23000): Duplicate entry '127' for key 'PRIMARY'

Oh well. So much for strict mode helping us get a better error message. It still is only chocking because of the duplicate key. I was hoping we would get a message that the next value was out of range, but no such luck.

Last time we saw that if you delete the last row and then perform an insert without specifying the employee_id that you will get the max value again. But while reading up more on the AUTO_INCREMENT attribute, I found this documentation on how it works on an InnoDB table. I thought this was interesting. So to test it our I ran the following statement to remove the last row in the table.

mysql> DELETE FROM autofun WHERE employee_id = 127;
 Query OK, 1 row affected (0.09 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 +-------------+----------------+----------------+
 3 rows in set (0.00 sec)

So last time when we performed the insert we would get an employee_id of 127. But what happens if the server bounces between the delete and the subsequent insert. To find out I exited out of the MySQL prompt and ran the following from the shell prompt.

[student@percona56 ~]$ sudo service mysql restart

Then after the restart I ran the following.

mysql> INSERT INTO autofun (empoyee_name)
     -> VALUES ('Amanda King');
 Query OK, 1 row affected (0.03 sec)

 mysql> SELECT *
     -> FROM autofun;
 +-------------+----------------+----------------+
 | employee_id | employee_name  | employee_level |
 +-------------+----------------+----------------+
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |           4 | Amanda King    |           NULL |
 +-------------+----------------+----------------+
 4 rows in set (0.00 sec)

So if the server is restarted the next value in the sequence is recalculated based on the maximum value stored in the column. This could be both beneficial (if you need to resolve the situation where you have removed some values you want to reuse) or not so beneficial (if you are not managing your foreign key relationships carefully). I still have a lot more questions about the AUTO_INCREMENT attribute in MySQL, but I think this is enough for the moment. I hope you are finding this behavior just as interesting as I am.

The Great Wall of Maintenance

Originally Posted 9/23/2104

1411458148The Great Wall. A symbol of strength that has lasted for centuries. But how did it last? Was it the engineering that allowed it to weather the test of time? Was it the native rock that allowed it to survive? Was it the loving care of individuals that allowed it to stay intact?

I never really thought about these questions before I visited the Mutianyu section of the Great Wall on a recent trip to Beijing. It was my first time to Beijing and I knew that a trip to the wall was a must. I had not had time to research before my trip (which is not like me), so I was not sure what to expect. I thought I would be impressed by the engineering, or the scenery. Instead, as I walked the wall and saw all the new bricks replacing the original granite, I started to think about the power of maintenance.

What does this have to do with the modern DBA? It is another reminder that all things need maintenance. As the pace of IT increases, and Database Administrators focus more and more on delivering systems, we need to remember that long term success only comes with the care and feeding of what has been entrusted to us. As we deliver database systems we need to ensure that maintenance plans are part of the deliverable. All systems need to have regular index and statistical maintenance to ensure that the database engine has the information it needs to generate efficient plans and minimize the resources needed to execute them.

And we can’t deliver these maintenance plans in a set it and forget it mind set. The Mutianyu section of the Great Wall follows the ridge line of a mountain chain. It was originally built using the granite quarried near by. But, along the way they switched from using the granite to bricks made from what appeared to be concrete. Why did they make the switch? I don’t know for sure. But I know that if they continued to repair the wall using the mountain itself, they would eventually erode the foundation on which the wall was built. Similarly, if you put plans in place and do not monitor them the ever shifting environment of IT can make the foundation of your maintenance unsound.

To ensure that you avoid this problem, you need to monitor how long your maintenance plans are taking. Are they still completing in your allotted windows? Also, you need to check in with your customers periodically to ensure that the windows for maintenance haven’t shifted or, as more likely the case, shrunk. Also, how effective is your maintenance? Do you track run times for important quires throughout the day? If not, this information can help you decide if maintenance can be run less frequently if your windows do shrink.

Spend the time to build an effective maintenance strategy and a plan to monitor your maintenance and who knows, your system may still be running in 600 years. You’ll be out of support, but it might just still be running.

The Changing Role of the Database Administrator

Originally Published 10/31/2014

I have been doing this Database Administration gig, in one form or fashion, for more years than I care to admit.  In that time I have seen a lot of change.  Change in the tools and platforms I have supported, and change in the role that the Database Administrator plays in the development and operation of software systems.  But, it has felt like the pace of change has really sped up over the last 2-3 years.  I thought it would be good to go over some of the changes I think will have the largest impact on the Database Administrator and what I think it will mean to our role.

The Cloud

Probably the biggest change is the adoption of the cloud.  Whether the cloud is public, private, or a hybrid cloud infrastructure is fundamentally changing the way that applications are designed and built.  A lot of people looked at the cloud early on, through the eyes of old development models, and felt like it would just be a way for small companies to compete with larger ones without having to have the financial resources to build and maintain their own data centers.  While that is one advantage of the cloud (one equally advantages to large companies), it is not the largest advantage the cloud has to offer.  The smart companies are realizing that if you rethink the way you architect and build applications you can achieve levels of elasticity in your application which allows you to meet the ever changing market without having to maintain infrastructure to meet peak demand all the time.

There are a lot of things that go into making applications capable of achieving this level of elasticity.  If you are interested in learning more, a great starting point is the The Twelve-Factor App.  But a key piece is to integrate via API and not in the database.  A lot of legacy applications rely on the database as an integration point.  But modern applications see the database as just a place to persist data.  In many ways this reduces the complexity of databases and makes it easier for contributors, other than the Database Administrator, to serve as the expert on the data storage.

Most cloud vendors have also added Database as a Service offerings to make it easier for companies without Database Administrators on staff to leverage database platforms.  These services offer installation, patching, backups, monitoring and other services that would traditionally be performed by Database Administrators.  These services combined with the change in how applications are being built make it easier for a project to get off the ground without the help of a Database Administrator.

NoSQL

NoSQL, or Not Only SQL, is the name for a class of data stores that do not store data in the tabular relationships associated with traditional Relational Database Management Systems (RDBMS).  Some popular examples include Redis,MongoDB, Couchbase, and Cassandra.  Each serves a specific data storage niche, and allow you to meet your data storage needs without the complexities of a full blown RDBMS.  But while you give up some of the features of a RDBMS you gain a lot in simplicity.  NoSQL platforms are typically much easier to maintain and have fewer “moving parts” than a traditional RDBMS.  This, along with their ability to easily scale horizontally, makes them very popular when building applications for the cloud.

I have not seen the whole sale movement of data from Relational Database Management Systems to NoSQL platforms, but I have seen a lot of new applications leverage NoSQL as the only, or at least primary, data store.  As a side note, I have also seen a lot of noise lately about applications moving off of NoSQL as the needs of the application change.  Like any other tool NoSQL has its role and the industry is still figuring out where it is best leveraged.

Big Data

Unless you have been hiding under a rock, you have probably heard the term Big Data.  What exactly is Big Data is still open for some debate.  But you can think of Big Data as problems involving data that is either too large or complex to analyse using traditional techniques.  The data is “big” because of it’s volume, velocity, variability, or complexity.  Volume is the term used to describe the quantity of observations being analyzed.  Velocity is the speed with which new observations are being generated and need to be processed.  Variability is what is used to describe the level of inconsistency in the observations being analyzed.  And complexity is the term used to describe the overall complexity in gathering the observations and combining them to make useful information.

The tool that most companies are moving to for Big Data problems is Hadoop.  One of the big advantages of Hadoop over traditional database platforms is that you need not define the structure of the data prior to it being stored.  Instead, you can define its structure at the time of analysis.  This makes the collection and storage of data much quicker from a process perspective than with a traditional data warehouse.  It also is easily scaled out by adding additional nodes to a cluster.  These capabilities easily solve the volume, variability, and complexity associated with Big Data.  Schema design has historically been a a service most Database Administrators have been involved with.  With Hadoop, these services are less in demand.

Much like NoSQL, Hadoop is a platform that does not require a Database Administrator in order for it to run.  Large environments do benefit from administration however, so if a Database Administrator is not responsible someone will probably need to step in.

The Future DBA

So, what does all of this mean for the Database Administrator of the future?  Well, I don’t have a crystal ball, but I think it is clear that the role will have to change.  First, many tasks that have traditionally been staples of the DBA role will either cease to be needed, or move primarily to developers or general system administrators.  Backups, monitoring, and patching will be either fully automated, or so easy that anyone will be able to “click the button” to perform the actions.  Database code deployment will no longer be the purview of the Database Administrator as the deployment process will be fully automated in order to allow for the elasticity that the cloud offers.  So, what will be left for the Database Administrator?

First off, I see all the major database platforms coming out with much better scale out stories over the next several years.  Developers are moving to NoSQL for the simplicity of development, but one big reason they stay is because of the ability to scale out.  I see the market moving back more towards traditional database platforms once the scale out story is a better one.  Current NoSQL platforms are good for certain thing, but non of them can compete with traditional database platforms for flexibility of handling diverse workloads.  Here is a great article comparing some common NoSQL platforms and how they server different purposes at the Weather Co.

Secondly, no matter the future of NoSQL, Database Administrators will still be needed to address the hard to automate aspects of managing data platforms for the foreseeable future.  The biggest one of these is performance.  Database Management Systems are complex pieces of software and it is still easy for performance to change without any changes to application or database code.  Ad systems grow in size or use the performance characteristics shift.  Database Administrators will still be needed to help developers and system administrators understand and solve these issues.

I hope you have found this post interesting and I would love to hear your feedback.  What services do you see the Database Administrator of the future offering?  What other factors do you see shaping the role of the DBA?

Moving to WordPress

So I started the ModernDBA blog on Weebly and it served me well in the beginning.  But I wanted a little more control than they provided, so I am trying out WordPress to see how I like it.  I’ll be working on moving my existing blog posts over to the new site over the next week or two.  I apologize in advance for any inconvenience as I work through this migration or content and domain name.