OpenWorld 2017: MySQL Automatic Diagnostics: System, Mechanism, and Usage

Shangshun Lei and Lixun Peng from Alibaba Cloud used this session to discuss CloudDBA, a system they built to automate many traditional DBA tasks.  The system sounds pretty cool and is what my team should be aspiring to build.  But the session covered a lot of the what for CloudDBA and not much of the how.  My notes from the session are:

  • Why CloudDBA
    • Reduce Costs –
      • 80% of time spent on finding root cause, optimizing performance, scaling hardware and resources
      • 20% on database platform
    • Focus your resources on business
    • Provide best technology
  • Architecture
    • Kafka/JStorm for log collection
    • Offline Data repository
      • Error log
      • slow log
      • audit log
      • cpu/ios/status
    • Offline diagnostics
      • Top SQL Analysis
      • Trx Analysis
      • SQL Review
      • Deadlock Analysis
    • Online diagnostics
      • Knowledge base – rule engine
      • Inference Engine – matches conditions and runs actions to resolve the issue or provide advice to users
    • Realtime Event and advice
      • Slave delay
      • Config Tuning
      • Active Session
      • Lock and Transaction
      • Resource
  • Rule Engine
    • Immediate detection of useful changes with low cost
    • Choose correct inference model
      • Database global status is mature and easy to get
      • High frequency monitoring to make sure no useful info is missed
      • Real time state change detection algorithms
      • Importance of database experience
  • Knowledge Base and inference engine
    • Ability to accumulate DBA experts’ experience in short time
    • Accurate issue detection & corresponding advice
  • Offline diagnosis
    • Audit log does matter
    • Record full SQLs for database
    • A feature of AliSQL, no performance impact
  • Transaction analysis
    • uncommitted transactions
    • long transactions
    • long interval between transactions statements
    • big transactions
  • SQL review
    • how many types of sql
    • how many types of transactions
    • sqls or sequence in transaction is expected or not
    • scan rows, return rows, elapsed time and sql advice
  • Top SQL
    • top sql before optimize
    • help explain questions such as why my cpu is 100%
    • different statistics dimensions and performance metrics
  • SQL Advisor
    • Not a database optimizer
    • built outside of MySQL kernel
    • query rewriter
    • follow rules to create indexes that work for the lowest cost
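
The session did not show how the rule engine and inference engine are built, so the following is only a minimal sketch of the general idea from the notes: compare two samples of database status, match rules against the change, and return advice.  The rule names, metric names, thresholds, and advice strings are all hypothetical and are not taken from CloudDBA.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# One sample of database status, e.g. values polled at high frequency.
Status = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """A knowledge-base entry: a condition on (previous, current) plus advice."""
    name: str
    condition: Callable[[Status, Status], bool]
    advice: str


def infer(rules: List[Rule], previous: Status, current: Status) -> List[str]:
    """Return the advice for every rule whose condition matches the two samples."""
    return [
        f"{rule.name}: {rule.advice}"
        for rule in rules
        if rule.condition(previous, current)
    ]


# Hypothetical rules; the thresholds are illustrative assumptions only.
RULES = [
    Rule(
        name="slave_delay",
        condition=lambda prev, cur: cur["seconds_behind_master"] > 60,
        advice="check for long-running statements on the replica",
    ),
    Rule(
        name="active_session_spike",
        # State-change detection: fire only when the value at least doubles
        # between samples and is above a floor.
        condition=lambda prev, cur: cur["threads_running"] > 20
        and cur["threads_running"] >= 2 * prev["threads_running"],
        advice="review the top SQL for the current sessions",
    ),
]

previous = {"seconds_behind_master": 0, "threads_running": 8}
current = {"seconds_behind_master": 5, "threads_running": 40}
findings = infer(RULES, previous, current)
print(findings)
```

Keeping the rules as data separate from the `infer` loop is what would let DBAs add their experience to the knowledge base without touching the engine itself.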

OpenWorld 2017: Kafka, Data Streaming and Analytic Microservices

My company has been wrestling with the need for a more scalable and standardized method of integrating applications for a while.  We have taken steps toward moving to an enterprise data bus to improve how we do this.  Stewart Bryson’s session was a great example of how Kafka can be used to do this.  While his focus was on BI and data integration, he did a great job of showing how Kafka improves the story for other use cases as well.   My notes from his session are:


  • History Lesson
    • Traditional Data Warehouse
      • ETL -> Data Warehouse -> Analytics
    • Not all analytic problems can be solved by this paradigm
      • Realtime events
      • Mobile Analytics
      • Search
      • Machine Learning – Old BI prejudices pattern matching
  • Analytics is not just people sitting in front of dashboards making decisions
  • Microservices
    • “The microservice architecture is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API.  These services are built around business capabilities and independently deployable by fully automated deployment machinery.” – Martin Fowler
    • Opposite of the Monolith
    • separate but connected
    • API contracts
  • Proposed Kafka as single place to ingest your data
  • Use Kafka as our data bus
  • “Analytics applications” are different than “BI platforms”
    • Data is the product
    • Needs SLAs and up times (internal or external)
  • Apache Kafka
    • Kafka – not state of the table but more like the redo log (true data of the database)
    • capturing events not state
    • sharded big data solution
    • topics – schemaless “table”
    • schema on read not on write
      • If you have schema at all
      • just bytes on a message
      • you choose the serialization
    • topics are split into partitions that are replicated across multiple nodes (called brokers in kafka)
    • confluent is open source kafka with add-ons (paid versions are also available)
    • producers put data into kafka
    • consumers read data from kafka
    • producers and consumers are decoupled (no need to know who consumes the data to put it in)
  • kafka connect
    • framework for writing connectors for kafka
    • sources and sinks (producers and consumers)
  • kafka knows where you are at in the stream using a consumer group
  • Can reset your offsets to reload data
  • makes reloading data warehouse much easier
  • makes testing new platforms easier
  • Event Hub Cloud service – Oracle’s kafka in the cloud
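
To make the offset idea from these notes concrete, here is a toy, in-memory model of a single topic written in plain Python.  It is not the Kafka client API; the class and method names (`ToyTopic`, `produce`, `poll`, `reset_offset`) are made up for illustration, and a real topic has partitions and brokers that this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, List


class ToyTopic:
    """An append-only log of byte messages with one offset per consumer group."""

    def __init__(self) -> None:
        self._log: List[bytes] = []                           # events, never updated in place
        self._offsets: Dict[str, int] = defaultdict(int)      # group -> next offset to read

    def produce(self, message: bytes) -> int:
        """Append a message and return its offset; the producer knows nothing about consumers."""
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, group: str, max_messages: int = 10) -> List[bytes]:
        """Return the next batch for this group and advance only that group's offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch

    def reset_offset(self, group: str, offset: int = 0) -> None:
        """Rewind (or move) a group's position so the data can be re-read."""
        if not 0 <= offset <= len(self._log):
            raise ValueError("offset out of range")
        self._offsets[group] = offset


topic = ToyTopic()
for event in (b"order-created", b"order-paid", b"order-shipped"):
    topic.produce(event)

first_load = topic.poll("warehouse")     # the warehouse group reads everything
nothing_new = topic.poll("warehouse")    # no new events, so an empty batch
topic.reset_offset("warehouse")          # rewind to reload the warehouse
reload = topic.poll("warehouse")
```

Because each group tracks its own position, a second group (say a new platform under test) can read the same events from the start without affecting the warehouse load.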

Thanks to Robin Moffatt (@rmoff) for correcting the notes around schema.  He pointed out that with Kafka it is just bytes on a message, so you choose the serialization format and thus whether there is a schema.

OpenWorld 2017: Getting the Most out of Oracle Data Guard!

We have been working to roll out Data Guard in our Oracle environment for a few years now.   The issues have not been technical, so much as other priorities getting in the way.  This session grabbed my attention as I was interested to see other use cases (other than HA and DR) that we could solve with Data Guard.  Ludovico Caldara not only put on a great session, but he did an outstanding job of making the materials available prior to the session on his blog.

Because he did such a great job, I won’t bore you with a lot of the details.  But he shows you how to use Data Guard to perform online database migrations, use the standby for reporting, and perform database clones.  It is all really cool stuff that my team could leverage if we finish rolling out Data Guard.


OpenWorld 2017: Best Practices for Getting Started with Oracle Database In-Memory 12c

Kicked off Oracle OpenWorld with a great session on Oracle Database In-Memory.  The speaker (Xinghua Wei) did a great job of presenting both the technology itself as well as use cases and things to watch out for.  My team has started to look at the future of Exadata inside of our support model.  We have loved the performance we have gotten out of the appliances, but the coordination required to organize patching with Oracle and the hyper-consolidation needed to justify the additional costs of running Exadata have been pain points.

The biggest pushback that I get from my team (and customers) is concern about analytic workloads moving off the Exadata.  While it may not be the solution, Oracle’s In-Memory is one tool we have that could help with those concerns.  In this session I learned some new things about In-Memory that will help us to better understand how it might play in our environment.  First, the feature is an extra SKU, and a quick search gave $23K per CPU unit as the book price.  While not exactly cheap, it sure beats the heck out of buying a whole Exadata.  More notes are included below.  I’m still not sure if this is something that we will use, but it’s nice to have a better understanding of the technology as we look at options other than Exadata.


  • In-Memory stores data in Columnar Format
  • Data exists along with traditional row based tables
  • 20% impact in TPS on row-based table associated with In-memory tables
    • Can be partially mitigated by removal of indexes on the tables that were used to support OLAP workloads
    • Results will vary based on a number of factors: design, workloads, etc.
  • Works with RAC and Data Guard
  • Differentiator vs. SAP HANA, SQL Server, DB2
    • 2 formats in 1 database (row and columnar)
    • 100% application transparency
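
As a rough illustration of why keeping both formats helps, the toy Python below holds the same data once as rows and once as columns.  It says nothing about how Oracle actually implements In-Memory; the table, the column names, and the `to_columns` helper are made up for this sketch.

```python
from typing import Any, Dict, List

# Row format: each record is stored together, which suits OLTP lookups and updates.
rows: List[Dict[str, Any]] = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 50.0},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into one list per column (a columnar copy of the same data)."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# OLTP-style access: fetch one whole record by key from the row format.
order = next(r for r in rows if r["order_id"] == 2)

# Analytic-style access: aggregate one column without touching the others.
total_amount = sum(columns["amount"])
east_amount = sum(
    amount
    for region, amount in zip(columns["region"], columns["amount"])
    if region == "east"
)
```

Every change to a row has to be reflected in the columnar copy as well, which gives some intuition for the TPS impact noted above.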

Percona Live Data Performance Conference 2016

I’m sitting in the San Francisco airport trying to get enough caffeine in me to feel alive again.  A sure sign of a great conference.  Last year I live blogged the conference but this year there was so much stuff that I wanted to absorb that I failed to make time for that.  There were a ton of great sessions.  It did feel like there were fewer folks in attendance, but that only made the learning feel more personal.  This year the folks at Percona opened up the conference to encourage content across all platforms.  I wonder if that made some of the MySQL folks worry that the conference didn’t have anything to offer them.  I can tell you that there was plenty of great MySQL content.

Working in an enterprise database administration shop, I found the expanded content perfect for me.  While we have historically only supported 2 platforms, we currently support 4 (about to be 5) and are constantly being pushed to support more.  This is a natural extension of modern application design, and having a conference where we can hit on several platforms really helps.  We all have tight budgets these days and it’s almost like having 3-4 conferences in one.  But the soul of the conference is still very much MySQL.  With content ranging from sessions geared towards first-time DBAs through sessions on InnoDB and the new MySQL Document Store, you can’t go wrong with this conference to expand your MySQL knowledge.

I am coming home with a bunch of great ideas.  The hard part now is to pick which things to work on implementing first.  Some will be changes my team can make ourselves and others will need to involve change in other teams as well.  I was very intrigued by the “chatops” idea.  I will try to see how we can integrate the concept into our operations.  I similarly was interested in the rise of integrating chat bots into database automation.  We are hitting database automation real hard right now.  Thus far our focus has been on self service APIs, but I think we will add a natural text interface into these APIs to further improve the experience for our customers.

Well, I am out of coffee so I need to run and refill the cup.  More to come on Percona Live and all the projects that come out of it.

It’s Been Awhile

Well, life happens, and like everyone else I’ve been busy.  I have, however, been working on some exciting stuff so I thought it was time to get back to blogging to share all the cool stuff going on.  First, I have been working in management.  This means that I have to steal cycles to do the techie stuff these days.  Second, my son has started playing travel soccer, which steals most of my early evenings and a good part of my weekends.  I do love to see him compete however, so no complaints here.  All this is to say that I have had to be more deliberate in what I have been working on.

I plan for this to be the first in a series of blogs and I thought it best to go over some of the topics that have been taking up my time.  I will list out both technical and non-technical issues that I have been wrestling with and provide a little context.  My hope is this information will help me to prioritize my subsequent blog posts as well as give some context to them when they come out.


Last year, my team stepped up to the plate and went from zero to fully supporting Highly Available MySQL in about 9 months.  We spent time learning the platform, sharing knowledge and testing different configurations.  We also worked with Percona to have an independent expert come out and verify that we were on the right track.  When it was all said and done we not only had a handful of production applications on stand-alone MySQL instances and an application running on top of Galera, we also realized that MySQL gives us the ability to control our data platform much better than our primary platforms (Oracle and Microsoft SQL Server).  My technical contribution to this project was providing a one day hands-on Galera training for the team so they could better understand that technology.


After spending last year rolling out MySQL,  I had planned to spend this year cleaning up some remaining issues and working on standardizing our Oracle and SQL Server environments more.  Life has a way of changing my priorities.  Our company had a major initiative that required the use of MongoDB as the data store.  I had been reading up on MongoDB and had attended some local user group meetings, but as a team we were basically at ground zero as far as MongoDB was concerned.  Once again, we leaned on Percona to help us get some replica sets up in support of the aggressive project goals, and I had two of my folks work through the wonderful online training provided by MongoDB Inc.

We are still very much in the middle of this project, but so far I am excited by what I have seen.  I think that developers here are going to love some things about MongoDB, and it will help us solve some issues we have in being more agile with our data stores.

Database as a Service

Another big initiative my team has going on right now is transitioning to a true Database as a Service provider.  We currently have a self-help website I built years ago that lets our customers spin up development and test databases, copy databases between instances, as well as deploy code without worrying about the documentation needed in the CMDB (the application handles all that).  It has served us well, but it was built without an API.  In order to integrate with other automation initiatives within the company we need to provide all this and more via an API.

We looked at Cloud Foundry and Trove as options to “buy” our way out of this.  But with all the other change going on we decided that it would be best to implement our own API and allow others to plug in as needed.  This allows us to better control the back-end services and keep our existing investments in monitoring, backups, and other processes.  I am working with another member of my team to build a RESTful API for this.  We have chosen to leverage NodeJS as our development platform.  For a “.Net” guy this is a steep learning curve, but so far I am digging it.

I’ll try to keep you up to speed on these areas.  I apologize in advance if posts come at you in a seemingly random way from here on out.  I just hope to make the time to share, and with all this going on the topics are bound to blend together.


The Froogle DBA – Management Data Warehouse Part 1

Originally Posted 8/26/2014

So far we have discussed two of the three core components of the Froogle DBA’s toolbox (Central Management Servers and Extended Properties).  With these two features we can manage systems at scale by improving our ability to apply scripts to all or a portion of our environment, make our environment more self-documenting, and improve communication between the members of our DBA team.  In my last post I showed you how you can populate folders in CMS using the Extended Properties you have defined on your databases using PowerShell.  There is an issue with this approach however.  It makes it more difficult (not impossible) to answer the question “Which databases don’t have Extended Properties set?”.  This is a limitation that the Management Data Warehouse can help us overcome.

The Management Data Warehouse (MDW) is a database that collects data sent by data collectors running on the same, or different, instance(s) of SQL Server.  That is to say that it is really just like any other database, except this database is populated by Data Collectors configured in Instances of SQL Server to gather information of interest.  Out of the box, SQL Server provides Data Collectors that can be configured to collect information of interest in troubleshooting performance problems in SQL Server.  It also ships with some reports designed to run inside of SQL Server Management Studio to analyze this data.

Setting up Management Data Warehouse involves two steps.  First, we need to create the Management Data Warehouse database on the server where we want to consolidate the data from our environment.  For the Froogle DBA we will use the same instance of SQL Server we use to host Central Management Server to host the Management Data Warehouse database.  We will do this to facilitate some of the other integrations that make the Froogle DBA framework easier to support and maintain.  Second, on each of the instances you want to gather data on, you will need to configure Data Collection to report to the Management Data Warehouse database.

In order to configure the Management Data Warehouse database you will need to follow the images below.

Now that we have a place to put the data we collect, we need to tell SQL Server to collect data we can later use.  In order to do this we need to configure Data Collection.  Data Collection runs as jobs on the instances of SQL Server that are collecting.  As such you need to be aware of the account that the SQL Server Agent is running as on each of the machines and ensure the accounts have the appropriate permissions in the MDW database, or you can configure an account for Data Collection to run as and use proxies to run Data Collection as that account.  Some data collection relies on caching data in a folder in the file system.  Go ahead and create a folder for this purpose (e.g., C:\MDWCache) and then make sure that the account that is going to perform Data Collection has full control on the folder.  In order to configure collection follow the steps in the following images.

At this point you have Management Data Warehouse up and running and the default collectors collecting data.  Don’t forget to configure Data Collection following the second set of images on all the instances you wish to manage.  We will show how to use the data collected and some custom collectors important to the Froogle DBA later.