OpenWorld 2017: MySQL Automatic Diagnostics: System, Mechanism, and Usage

Shangshun Lei and Lixun Peng from Alibaba Cloud discussed in this session a system they have built called CloudDBA to automate a lot of traditional DBA roles.  The system sounds pretty cool and is what my team should be aspiring to do.  But there was not a lot of information on the how, and a lot of the what for CloudDBA.  My notes from the session are:

  • Why CloudDBA
    • Reduce Costs –
      • 80% of time spent on finding root cause, optimizing performance, scaling hardware and resources
      • 20% on database platform
    • Focus your resources on business
    • Provide best technology
  • Architecture
    • Kafka/JStorm for log collection
    • Offline Data repository
      • Error log
      • slow log
      • audit log
      • cpu/ios/status
    • Offline diagnostics
      • Top SQL Analyss
      • Trx Analysis
      • SQL Review
      • Deadlock Analysis
    • Online diagnostics
      • Knowledge base – rule engine
      • Inference Engine – matches for conditions and runs execution to resolve or provide advise to users
    • Realtime Event and advise
      • Slave delay
      • Config Tuning
      • Active Session
      • Lock and Transaction
      • Resource
  • Rule Engine
    • Immediate detection of useful changes with low cost
    • Choose correct inference model
      • Database global status is mature and easy to get
      • High frequency monitoring to make sure no useful info is missed
      • Real time state change detection algorithms
      • Importance of database experience
  • Knowledge Base and inference engine
    • Ability to accumulate DBA experts’ experience in short time
    • Accurate issue detection & corresponding advice
  • Offline diagnosis
    • Audit log does matter
    • Record full SQLs for database
    • A feature of AliSQL, no performance impact
  • Transaction analyiss
    • uncommitted transactions
    • long transactions
    • long interval between transactions statements
    • big transactions
  • SQL review
    • how many types of sql
    • how many types of transactions
    • sqls or sequence in transaction is expected or not
    • scan rows, return rows, elapsed time and sql advise
  • Top SQL
    • top sql before optimize
    • help explain questions such as why my cpu is 100%
    • different statistics dimensions and performance metrics
  • SQL Advisor
    • Not a database optimizer
    • built outside of MySQL kernel
    • query rewriter
    • follow rules to create indexes that works for the lowest cost

An Introduction to Database as a Service with an emphasis on OpenStack using Trove – Percona Live 2015

My first real session of the conference was “An introduction to Database as a Service with an emphasis on OpenStack using Trove” by Amrith Kumar (Tesora) and Matt Griffin (Percona).  I had originally planned on attending “MySQL and Openstack deep dive” by Peter Boros (Percona), but after speaking with him last night in the vendor hall I realized that this session could have more immediate applicability to my team.

The session started with an overview of the cloud landscape which showed AWS and Microsoft at the top of the cloud market for Enterprises.  But there are a lot of other clouds that are getting evaluated and the market is growing fast and changing.  68% of enterprises have less than 20% of their application in a public cloud so a lot of room for growth.

Clouds provide a lot of value but this session concentrated on Speed, Low Cost, and Flexibility.  They pointed out that most of the cost savings come in the early phases of a project/application because you don’t have to oversize your infrastructure.  Database as a Service provides simplified database provisioning and the ability to manage at scale.  OpenStack and Trove help companies provide Database as a Service.

Next they provided a good overview of OpenStack and the services it provides.  Then they introduced Sahara (Hadoop as a Service) and Trove (Database as s Service).  Since this was a session on Trove there was very little information on Sahara.  Trove has to run ontop of OpenStack today, but there are discussions of breaking that dependency in the future.  Trove supports multiple databases (MySQL, PostgreSQL, MongoDB, Counchbase, ….).  It provides framework for higher level operations like backup and recovery, replication, clustering, HA, etc.  They are adding new database platforms with just about every release (twice a year release cycle).  The real value of Trove is that the configurations contributed are from industry experts.  Trove does not remove the need for expertise, but it frees organizations to roll out quicker because you can get expertise as you go.  This helps with dev ops because you don’t have to have a platform expert on each project.

Trove supports a number of different database platforms, but they are all at different levels of maturity.  There is some loss of control because o the standardization of instances, but Trove is open source so you can modify it as needed.  Configuration groups in Trove is one way you can modify the default behaviors.  Trove will add Galera clustering in Liberty.  Oracle support in Trove can be purchased.  Biggest advantage provided by Trove and OpenStack is a common API to provision resources from private, to public, to hybrid cloud deployments.

All in all this looks like a good project that we should dive into to see how it can increase our effectiveness.