OpenWorld 2017: MySQL Automatic Diagnostics: System, Mechanism, and Usage

Shangshun Lei and Lixun Peng from Alibaba Cloud discussed in this session a system they have built called CloudDBA to automate a lot of traditional DBA roles.  The system sounds pretty cool and is what my team should be aspiring to do.  But there was not a lot of information on the how, and a lot of the what for CloudDBA.  My notes from the session are:

  • Why CloudDBA
    • Reduce Costs –
      • 80% of time spent on finding root cause, optimizing performance, scaling hardware and resources
      • 20% on database platform
    • Focus your resources on business
    • Provide best technology
  • Architecture
    • Kafka/JStorm for log collection
    • Offline Data repository
      • Error log
      • slow log
      • audit log
      • cpu/ios/status
    • Offline diagnostics
      • Top SQL Analyss
      • Trx Analysis
      • SQL Review
      • Deadlock Analysis
    • Online diagnostics
      • Knowledge base – rule engine
      • Inference Engine – matches for conditions and runs execution to resolve or provide advise to users
    • Realtime Event and advise
      • Slave delay
      • Config Tuning
      • Active Session
      • Lock and Transaction
      • Resource
  • Rule Engine
    • Immediate detection of useful changes with low cost
    • Choose correct inference model
      • Database global status is mature and easy to get
      • High frequency monitoring to make sure no useful info is missed
      • Real time state change detection algorithms
      • Importance of database experience
  • Knowledge Base and inference engine
    • Ability to accumulate DBA experts’ experience in short time
    • Accurate issue detection & corresponding advice
  • Offline diagnosis
    • Audit log does matter
    • Record full SQLs for database
    • A feature of AliSQL, no performance impact
  • Transaction analyiss
    • uncommitted transactions
    • long transactions
    • long interval between transactions statements
    • big transactions
  • SQL review
    • how many types of sql
    • how many types of transactions
    • sqls or sequence in transaction is expected or not
    • scan rows, return rows, elapsed time and sql advise
  • Top SQL
    • top sql before optimize
    • help explain questions such as why my cpu is 100%
    • different statistics dimensions and performance metrics
  • SQL Advisor
    • Not a database optimizer
    • built outside of MySQL kernel
    • query rewriter
    • follow rules to create indexes that works for the lowest cost

OpenWorld 2017: Kafka, Data Streaming and Analytic Microservices

My company has been wrestling with the need for a more scalable and standardized method of integrating applications for a while.  We have taken steps toward moving to an enterprise data bus to improve how we do this.  Stewart Bryson’s session was a great example of how Kafka can be used to do this.  While his focus was on BI and data integration, he did a great job of showing how Kafka improves the story for other use cases as well.   My notes from his session are:


  • History Lesson
    • Traditional Data Warehouse
      • ETL -> Data Warehouse -> Analytics
    • All analytic problems can’t be solved by this paradigm
      • Realtime events
      • Mobile Analytics
      • Search
      • Machine Learning – Old BI prejudices pattern matching
  • Analytics is not just people sitting in front of dashboards making decisions
  • Microservices
    • “The microservice architecture is an approach to developing a single application as a suite of small services, each running its own process and communicating with lightweight mechanisms, often an HTTP resource API.  These services are built around business capabilities and independently deployable by fully automated deployment machinery.” – Martin Fowler
    • Opposite of the Monolith
    • separate but connected
    • API contracts
  • Proposed Kafka as single place to ingest your data
  • Use Kafka as our data bus
  • “Analytics applications” are different than “BI platforms”
    • Data is the product
    • Needs SLAs and up times (internal or external)
  • Apache Kafka
    • Kafka – not state of the table but more like the redo log (true data of the database)
    • capturing events not state
    • sharded big data solution
    • topics – schemaless “table”
    • schema on read not on write
      • If you have schema at all
      • just bytes on a message
      • you choose the serialization
    • topics replicated by partitions across multiple nodes (named broker in kafka)
    • confluent is open sourced kafka with add ons (also have paid versions)
    • producers put data into kafak
    • consumers read data from kafka
    • producers and consumers are decoupled (no need to know who consumes the data to put it in)
  • kafka connect
    • framework for writing connectors for kafka
    • sources and sinks (producers and consumers)
  • kafka knows where you are at in the stream using a consumer group
  • Can reset your offsets to reload data
  • makes reloading data warehouse much easier
  • makes testing new platforms easier
  • Event Hub Cloud service – Oracle’s kafka in the cloud

Thanks to Robin Moffatt (@rmoff) for correcting the notes around schema.  He pointed out that with Kafka it is just bytes on a message, so you choose the serialization format and thus whetehr there is a schema.

OpenWorld 2017: Getting the Most out of Oracle Data Guard!

We have been working to roll out Data Guard in our Oracle environment for a few years now.   The issues have not been technical, so much as other priorities getting in the way.  This session grabbed my attention as I was interested to see other use cases (other than HA and DR) that we could solve with Data Guard.  Ludovico Caldara not only put on a great session, but he did an outstanding job of making the materials available prior to the session on his blog.

Because he did such a great job, I won’t bore you with a lot of the details.  But he shows you how to use Data Guard to perform online database migrations, use the standby for reporting, and perform database clones.  It is all really cool stuff that my team could leverage if we finish rolling our Data Guard.


OpenWorld 2017: Best Practices for Getting Started with Oracle Database In-Memory 12c

Kicked off Oracle Open World with a great session on Oracle Database In-Memory.  The speaker (xinghua wei) did a great job of presenting both the technology itself as well as use cases and things to watch out for.  My team has started to look at the future of Exadata inside of our support model.  We have loved the performance we have gotten out of the appliances, but the coordination required to organize patching with Oracle and the hyper consolidation needed to justify the additional costs of running Exadata.

The biggest push back that I get from my team (and customer) is concerns with analytic workloads moving off the Exadata.  While it may not be the solution, Oracle’s In-Memory is one tool we have that could help with those concerns.  In this session I learned some new things that I did not know about In-Memory that will help us to better understand how it might play in our environment.  First, the feature is an extra sku, and a quick search gave $23K per CPU unit as the book price.  While not exactly cheap, it sure beats the heck out buying a whole Exadata.  More notes are included below.  I’m still not sure if this is something that we will use, but it’s nice to have a better understanding of the technology as we look at options other than Exadata.


  • In-Memory stores data in Columnar Format
  • Data exists along with traditional row based tables
  • 20% impact in TPS on row-based table associated with In-memory tables
    • Can be partially mitigated by removal of indexes on the tables that were used to support OLAP workloads
    • Results will vary based on a number of factors: design, workloads, etc.
  • Works with RAC and Data Guard
  • Differentiator vs. SAP HANA, SQL Server, DB2
    • 2 formats in 1 database (row and columnar)
    • 100% application transparency

Percona Live Data Performance Conference 2016

I’m sitting in the San Fransisco airport trying to get enough caffein in me to feel alive again.  A sure sign of a great conference.  Last year I live blogged the conference but this year there was so much stuff that I wanted to absorb that I failed to make time for that.  There were a ton of great sessions.  It did feel like there were fewer folks in attendance, but that only made the learning feel more personal.  This year the folks at Percona opened up the conference to encourage content across all platforms.  I wonder if that made some of the MySQL folks worry that the conference didn’t have anything to offer them.  I can tell you that there was plenty of great MySQL content.

Working in an enterprise database administration shop, the expanded content was perfect for me.  While we have historically only supported 2 platforms, we currently support 4 (about to be 5) and are constantly being pushed to support more.  This is a natural extension of modern application design and have a conference where we can hit on several platforms really helps.  We all have tight budgets these days and its almost like having 3-4 conferences in one.  But the soul of the conference is still very much MySQL.  With content geared towards first time DBAs through sessions on the InnoDB and the new MySQL Document Store, you can’t go wrong with this conference to expand your MySQL knowledge.

I am coming home with a bunch of great ideas.  The hard part now is to pick which things to work on implementing first.  Some will be changes my team can make ourselves and others will need to involve change in other teams as well.  I was very intrigued by the “chatops” idea.  I will try to see how we can integrate the concept into our operations.  I similarly was interested in the rise of integrating chat bots into database automation.  We are hitting database automation real hard right now.  Thus far our focus has been on self service APIs, but I think we will add a natural text interface into these APIs to further improve the experience for our customers.

Well, I am out of coffee so I need to run and refill the cup.  More to come on Percona Live and all the projects that come out of it.

It’s Been Awhile

We’ll life happens, and like everyone else I’ve been busy.  I have, however, been working on some exciting stuff so I thought it was time to get back to blogging to share all the cool stuff going on.  First, I have been working in management.  This means that I have to steal cycles to do the techie stuff these days.  Second, my son has started playing travel soccer which stills most of my early evenings and a good part of my weekends.  I do love to see him compete however, so no complaints here.  All this is to say that I have had to be more deliberate in what I have been working on.

I plan for this to be the first in a series of blogs and I thought it best to go over some of the topics that have been taking up my time.  I will list out both technical and non-technical issues that I have been wrestling with and provide a little context.  My hope is this information with help me to prioritize my subsequent blog posts as well as give some context to them when they come out.


Last year, my team stepped up to the plate and went from zero to fully supporting Highly Available MySQL in about 9 months.  We spent time learning the platform, sharing knowledge and testing different configurations.  We also worked with Percona to have an independent expert come out and verify that we were on the right track.  When it was all said and done we not only have a handful of production applications on stand-alone MySQL instances and an application running on top of Galera, we also realize that MySQL provides us the ability to control data platform much better than our primary platforms (Oracle and Microsoft SQL Server).  My technical contribution to this project was providing a one day hands on Galera training for the team so they could better understand that technology.


After spending last year rolling out MySQL,  I had planned to spend this year cleaning up some remaining issues and working on standardizing our Oracle and SQL Server environments more.  Life has a way of changing my priorities.  Our company had a major initiative that required the use of MongoDB as the data store.  I had been reading up on MongoDB and had attended some local user group meetings, but as a team we were basically at ground zero as far as MongoDB was concerned.  Once again, we leaned on Percona to help us get some replica sets up in support of the aggressive project goals, and I had two of my folks work through the wonderful online training provided by MongoDB Inc.

We are still very much in the middle of this project, but so far I am excited by what I have seen.  I think that developers here are going to love somethings about MongoDB, and it will help us solve some issues we have in being more agile with our data stores.

Database as a Service

Another big initiative my team has going on right now is transitioning to a true Database as a Service provider.  We currently have a self-help website I built years ago that lets our customers spin up development and test databases, copy databases between instances, as well as deploy code without worrying about the documentation needed in the CMDB (the application handles all that).  It has served us well, but it was built without an API.  In order to integrate with other automation initiatives within the company we need to provide all this and more via an API.

We looked at Cloud Foundry and Trove as options to “buy” our way out to this.  But with all the other change going on we decided that it would be best to implement our own API and allowing others to plug in as needed.  This allows us to better control the back-end services and keep our existing investments in monitoring, backups, and other processes.  I am working with another member of my team to build a Restful API for this.  We have chosen to leverage NodeJS as our development platform.  For a “.Net” guy this is a steep learning curve, but so far I am digging it.

I’ll try to keep you up to speed on these areas.  I apologize in advance if post come at you in a seemingly random way from here on out.  I just hope to make the time to share and with all this going on the topics are bound to blend together.


More MySQL Auto Increment Fun

Originally Posted 10/21/2014

Last blog post I started to explore some of, what I found to be, the interesting behaviors surrounding the AUTO_INCREMENT attribute in MySQL.  With strict mode not set it does some really strange things.  Like allowing you to add the max value in the range over and over again as long as you delete the record that used to hold that value.  This could cause some pretty interesting issues in applications that are not storing their foreign key relationships in the database.  But I wanted to see how strict mode would impact the behavior.  So I rebuilt the environment using the following script file.


 USE moderndba;


     , employee_name      VARCHAR(150) NOT NULL
     , employee_level     TINYINT UNSIGNED NULL
     , PRIMARY KEY (employee_id)

 INSERT INTO autofun (employee_name, employee_level)
  ('Joe Smith', 0)
, ('Tim Bradford', 1)
, ('Betty Stafford', 1);

Notice, I changed the data type for the employee_id field from an unsigned int to a signed tiny int. This was just to make the math a little easier for me as this brings the max value in the range down to a easy to remember (and type) 127. Last time we saw that if you inserted a row and provided a value for employee_id at the end of the range you can not insert another row without specifying a value for employee_id that is not already used in the table. If you insert a row and do not provide an employee_id, or provide 0, or provide NULL, the row will not insert. But the error is not that the value is out of range. Instead the error is that there is a conflict in the primary key. This is because MySQL is noticing that the next value in the sequence is outside the range and picking the closest value in the range and trying to insert it. But, I know that STRICT_ALL_TABLES mode makes MySQL less forgiving with range values as well as in other ways. So, I ran the following to see what would happen with script mode turned on.

mysql> USE moderndba;
 Reading table information for completion of table and column names
 You can turn off this feature to get a quicker startup with -A

 Database changed
 mysql> SELECT *
     -> FROM autofun;
 | employee_id | employee_name  | employee_level |
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 3 rows in set (0.00 sec)

 mysql> SET sql_mode='STRICT_ALL_TABLES';
 Query OK, 0 rows affected (0.04 sec)

 mysql>  INSERT INTO autofun
     -> (employee_id, empoyee_name)
     -> VALUES
     -> (127, 'John Wiley');
 Query OK, 1 row affected (0.03 sec)

 mysql> SELECT *
     -> FROM autofun;
 | employee_id | employee_name  | employee_level |
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |         127 | John Wiley     |           NULL |
 4 rows in set (0.00 sec)

 mysql> INSERT INTO autofun (empoyee_name)
     -> VALUES ('Amanda King');
 ERROR 1062 (23000): Duplicate entry '127' for key 'PRIMARY'

Oh well. So much for strict mode helping us get a better error message. It still is only chocking because of the duplicate key. I was hoping we would get a message that the next value was out of range, but no such luck.

Last time we saw that if you delete the last row and then perform an insert without specifying the employee_id that you will get the max value again. But while reading up more on the AUTO_INCREMENT attribute, I found this documentation on how it works on an InnoDB table. I thought this was interesting. So to test it our I ran the following statement to remove the last row in the table.

mysql> DELETE FROM autofun WHERE employee_id = 127;
 Query OK, 1 row affected (0.09 sec)

 mysql> SELECT *
     -> FROM autofun;
 | employee_id | employee_name  | employee_level |
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 3 rows in set (0.00 sec)

So last time when we performed the insert we would get an employee_id of 127. But what happens if the server bounces between the delete and the subsequent insert. To find out I exited out of the MySQL prompt and ran the following from the shell prompt.

[student@percona56 ~]$ sudo service mysql restart

Then after the restart I ran the following.

mysql> INSERT INTO autofun (empoyee_name)
     -> VALUES ('Amanda King');
 Query OK, 1 row affected (0.03 sec)

 mysql> SELECT *
     -> FROM autofun;
 | employee_id | employee_name  | employee_level |
 |           1 | Joe Smith      |              0 |
 |           2 | Tim Bradford   |              1 |
 |           3 | Betty Stafford |              1 |
 |           4 | Amanda King    |           NULL |
 4 rows in set (0.00 sec)

So if the server is restarted the next value in the sequence is recalculated based on the maximum value stored in the column. This could be both beneficial (if you need to resolve the situation where you have removed some values you want to reuse) or not so beneficial (if you are not managing your foreign key relationships carefully). I still have a lot more questions about the AUTO_INCREMENT attribute in MySQL, but I think this is enough for the moment. I hope you are finding this behavior just as interesting as I am.