Percona Live 2015 Wrap Up

With Percona Live 2015 in the books I thought I should take the time to gather my thoughts.  There was a ton of information shared at the conference.  We sent 3 people to the main conference and one to the new MySQL 101 conference and we still had to leave excellent presentations on the table.  But we got what we came to get.  We feel more comfortable with MySQL in general and with Galera in particular.  We are rolling out a Percona XtraDB Cluster soon and not only did this conference clear up some questions, it also helped us tweak our architecture to provide better availability for our customers.

I could go on and on about all the technical knowledge we gained here, but I have already posted notes on each of the sessions I attended.  But beyond the technical, I was impressed by the vibrant and innovated community surrounding MySQL.  We spoke to people solving some of the most challenging OLTP workloads out there.  Not only did they show us that MySQL can support these workloads, they showed us how MySQL is better suited than proprietary database platforms.  MySQL gives you the ability to scale at a cost point that proprietary RDBMs can’t.

I know some will say that it takes application changes in order reach these levels of scale with MySQL.  This is true today, but I saw several presentations where the MySQL community is working on solving this problem and developing a layer of abstraction that “traditional” applications can connect to and not have to worry about the complexity.  In fact, MariaDB‘s MaxScale is already providing this type of functionality.

Heading back home and thinking about how to use all this knowledge for good I realize that I will have a lot of resistance.  We have a lot invested in our existing proprietary database platforms.  But talking to folks who have been where I sit, I realized this week that if we don’t adapt to speed up the rate of innovation we will become irrelevant.  I for one plan to evolve!

My Session Notes

JSON support in MySQL 5.7 – Percona Live 2015

I attended the “JSON support in MySQL 5.7” session given by Evgeny Potemkin and Manyi Lu from Oracle.  This was my last session of the conference and my head was swimming with knowledge at this point.  In 5.7 they are adding json data type, json functions and the ability to index json data.  Before your options were to store json as either text or binary but you had to manage all the validation, parsing, etc.  The new JSON datatype is

  • optimized for read intensive workload
  • parse and validation on insert only
  • dictionary (sorted object keys, fast access to array cells by index)
  • In-place updated
  • Smart size 64K & 4G
  • UTF8 encoding
  • Supports all native JSON types (numer, string, bool, object, arrays, etc.)
  • Also supports Date, time, datetime, timestamp, and others

They gave some sample inserts that I will need to grab from their slides.  JSON comparator has polymorphic behavior and provides seamless and consistent comparison.  So you can compare JSON to JSON or JSON to SQL.  Different data types will always be non-equal so no automatic type conversion.  They then showed a sample select using the JSON datatype.  They also have introduced the concept of JSON path which is a way to identify how to get JSON data.  The details are on the slides and it was too much to record so I will have to grab it later for the developers.  But $ is used for the document root, [] define an array, and they use * for a search.  Other functions

  • JSN_VALID()
  • JSN_TYPE()
  • JSN_KEYS()
  • JSN_LENGTH()
  • JSN_DEPTH()
  • JSN_CONTAINS_PATH()
  • JSN_REMOVE()
  • JSN_APPEND()
  • JSN_SET()
  • JSN_INSERT()
  • JSN_REPLACE()
  • JSN_MERGE()
  • JSN_ARRAY()
  • JSN_OBJECT()
  • JSN_EXTRACT()
  • JSN_SEARCH()
  • JSN_QUOTE()
  • JSN_UNQUOTE()

They then showed how you can use a select to return the data as JSON using these function (again need to refer to slides).  They will support both Stored and Virtual indexes on JSON data.  Primary index must be stored.  Stored indexes can be BTREE, FTs, or GIS and Virtual can only be BTREE.  They plan to add Instant ALTER to Virtual indexes soon.  They have a bunch more with JSON on the roadmap like Full-text indexing and GIS indexes.

Incremental backups available with Xtrabackup and Percona Server – Percona Live 2015

In the session “Incremental backups available with Xtrabackup and Percona Server” Vladislav Lesin and George Lorch from Percona went over differential backups.  There are three ways to get the delta; full scan, use redo logs, or a combination of the two.  For full scan you scan all pages and backup any pages with last change LSN newer than a specific LSN.  Advantages are that you only store changed pages and there are no server changes required.  The downside if you have the overhead of a full scan. The redo logs can be used to restore a database to/from a specific point since it has the change data.  Advantage of using redo log are that there is no full scan.  The downsize is the space consumed.

In the combined method they store the LSN information on pages in the server in a RB-tree utilizing a separate thread that checks the redo logs on checkpoint.  This is to avoid the full scan.  They shared an edge case where size can be an issue with this approach.

How DevOps Met Gaming Needs – Percona Live 2015

I attended the “How DevOps Met Gaming Needs” session by Sean Chighizola and Peter Garbes at Big Fish Games.  They talked about the migration process to a DevOps model.  They went over the history of Big Fish Games which started in 2002 with PC/Mac games.  They use Akimai for their delivery.  They use MySQL to run their site with physicall partitioning by feature/service.  They used a the following technologies, Nagios, cacti, php, perl, subversion, bash scripts, etc.  Their releases were site down deployments with all hands on deck.

The transition was hard to manage.  Change was difficult, there has to be a driver.  The addition of the free to play market was the driver for Big Fish.  They used Fairway Solitaire as an example of a game the made the transition from the digital download to the free to play model.  Even after the migration it would take 20-120 minutes to push out a change.  A change was pushed every day.

Big Fish moved their daily courses to a self-service process.  Was very successful.  But it was not all roses.  They had a lot of non standard changes that did not match their expected workflow.  They saw the DevOps model as a way to solve these residual pains.  They leveraged Jez Humble’s Continuous Delivery book as a guide.  They were able to increase velocity while maintaining team size.  They also were able to decrease failure rate.

They discovered that giving developers operational responsibility will make them think like operators.  They use MEM 3.0 to provide self-service with Query Analyzer being a key driver for that decision.  IT is not in the business of provisioning, we are in the business of engineering processes which put developers in control.  They also went over some MySQL issues they have run into.  I’ll need to grab their slides to get the links to dive in deeper.

They have been moving from Cacti to Graphite.  They have not yet implemented database changes but have a proof of concept in place.  They restrict the types of changes allowed and framework for where to put scripts.

Encrypting MySQL data at Google – Percona Live 2015

I attended the “Encrypting MySQL data at Google” session with Jeremy Cole and Jonas Oreland both from Google.  They started with the why of encrypting data.  The threat they are trying to protect from are:

  • access through network APIs (mysql client …) — not protecting against this
  • Access within from a running server (ptrace, memory dumping, etc.)– not protecting against this
  • Lost or misplaced disks — ARE PROTECTING
  • Backups — ARE PROTECTING

Not all threats are feasible to protect against.  An attacker with unlimited network access and time will be able own you.  And if they can get root they can own you.  You could encrypt data in the columns from the application but a lot of work and you cannot access your data via SQL.  It is also incompatible with 3rd party products.  You can purchase middleware products to provide column encryption.  (MyDiamo and CryptDB were given as examples).  Indexing because an issue with these approaches.  You could just encrypt all the disks but when mounted it would be unencrypted.

In their approach they wanted to encrypt all user data to include temporary tables and InnoDB data and log files.  They wanted to make drive-by data exfiltration impossible.  An attacker will need data and keys to decrypt.  InnoDB organized in 16K pages and they encrypt all pages except for page 0.  They do not encrypt page headers or page trailers.  They also do not use the same key to encrypt multiple similar pieces of data to avoid being able to make guesses on known attributes of InnoDB pages (i.e. lots of zeros in the middle of a page due to initialization).  They encrypt all log blocks but the first 4.  Each block is encrypted with a different key.

They have a Google key management infrastructure so you will need to role your own as theirs is not open source.  Keys are never stored on disk.  Only used in memory.  They did however public a key management plugin interface to allow for someone else to write an open source solution.  They rotate keys on the redo logs by writing dummy records to age out blocks as needed.  Temporary Tables should age themselves out within MySQL.  Binary and relay logs are encrypted by the latest key and similarly age out.  For InnoDB data they keep the key version in the page header and they have background threads that re-encrypt pages as needed to rotate keys.  The number of threads and how much IOPs should be used for rotation are both configurable.

The encryption works with replication as well.  Code available at code.google.com/p/google-mysql.

Key Notes Day 3 – Percona Live 2015

They announced Percona Live Europe in Amsterdam 21- 22nd September.  Call for papers is now open.  They also announced the Percona Live dates for next year as April 18-21 2016.  The conference will be back in Santa Clara.

Rob Young from Percona came up and discussed Percona’s Take on Smart Data Management (What’s Next for Products and Services).  They announced the formation of a customer advisory board to provide feedback to Percona.  They see that we need to evolve or die in the MySQL space.  Data has to perform and must provide flexibility.  It is no longer acceptable for the data to shape us.  We have to shape the data.  The most important thing is how compelling the application is not how maintainable it is.

Today’s data problems do not fit a single solution shaped hole.  The solution requires hard to find talent and skillsets.  Customers have to be the focus, not the technology.  Technology is just a means to an end.  He drew attention to the jbs board and how hard it is to find talent.  An example is craigslist which is using MySQL for active listings and MongoDB for their archives.  No longer is one solution able to effectively handle all data workloads.  Then Rob shared statistics showing the explosive growth of MongoDB.  He then announced the Percona is becoming the first MySQL and MongoDB Data Performance Company.  Their acquisition of TokuDB puts them in a great position to pull this off.  For information on TokuMX see http://www.tokutek.com/tokumx-for-mongodb/.

Rob then went over a high level overview of Percona Server 5.7.  They plan to integrate TokuDB into the next version of Percona XtraDB Cluster.  Then he went over Percona’s commitment to open source solutions for the MySQL and soon the MongoDB platforms.  Rob shared Percona’s plans to leverage Percona Cloud Tools to speed up and improve their consulting and support solutions.

Amrith Kumar from Tesora came up to discuss Building Databases for the “-as-a-Service” World.  I had attended his session on Trove earlier in the week.  And Trove has a lot of promise to improve DBAs scale of influence.  Amrith compared the conflict of design and assembly line workers that build cars with the development and operation groups inside of IT.  If you have two organizations in your organization and you have conflict then combine them.  This leads to DevOps.  But hiring for DevOps is like trying to hire a superhero.  The answer is automation.  Automation improves quality and drives down costs.  This frees up your engineers for higher level issues.  It allows DBAs to work more closely with developers and provide what they need to innovate.

OpenStack and Trove allow you to provide this type of Database automation.  Not only can you use Trove to automate it can also be used to provide self service.  IT also helps automate backups and maintenance.  Trove’s intends to be a full lifecycle automation framework for database services.  Tesora works with experts on different platforms to develop their automations.  Today they support a number of different data platforms both relational and nosql.  Amrith discussed the collaboration between Percona and Tesora to make Percona products easier to automate.  He really shared a compelling presentation on why automation is important and we will look at Trove more closely.

MySQL at Twitter – 2015 – Percona Live 2015

My last session of the day was “MySQL at Twitter – 2015” by 3 engineers at Twitter (Calvin Sun, Inaam Rana, and Anton Kuraev).  They mainly use MySQL for Timeline, Social graphs, users, and other uses for internal operations.  They use Master-Master replication but mostly Master-Slave.  They are running on Twitter WebScale 5.6.23.  They went over a lot of their improvements they have put into MySQL5.5.  Two of interest were auditing and query throttling.

Some of the changes are not practical for our workloads, but they have ported a lot of 5.7 features back into WebScale 5.6.  They showed that in an I/O bound cluster the node upgraded to 5.6 had much fewer page flushes.  They are working on a copy of write to improve incremental backups and provide consistent snapshots.  They also announced MySQL on Mesos named Mysos.  Mesos is a distributed kernel that makes for very large clusters of computers.  This looks pretty cool and is worth a deep dive after I get back from conference.

The Copy on Write has the potential to really help the MySQL backup story.