Percona Live 2015 Wrap Up

With Percona Live 2015 in the books I thought I should take the time to gather my thoughts.  There was a ton of information shared at the conference.  We sent 3 people to the main conference and one to the new MySQL 101 conference and we still had to leave excellent presentations on the table.  But we got what we came to get.  We feel more comfortable with MySQL in general and with Galera in particular.  We are rolling out a Percona XtraDB Cluster soon and not only did this conference clear up some questions, it also helped us tweak our architecture to provide better availability for our customers.

I could go on and on about all the technical knowledge we gained here, but I have already posted notes on each of the sessions I attended.  Beyond the technical, I was impressed by the vibrant and innovative community surrounding MySQL.  We spoke to people solving some of the most challenging OLTP workloads out there.  Not only did they show us that MySQL can support these workloads, they showed us how MySQL is better suited to them than proprietary database platforms.  MySQL gives you the ability to scale at a cost point that proprietary RDBMSs can’t.

I know some will say that it takes application changes in order to reach these levels of scale with MySQL.  This is true today, but I saw several presentations where the MySQL community is working on solving this problem by developing a layer of abstraction that “traditional” applications can connect to without having to worry about the complexity.  In fact, MariaDB’s MaxScale is already providing this type of functionality.

Heading back home and thinking about how to put all this knowledge to good use, I realize that I will face a lot of resistance.  We have a lot invested in our existing proprietary database platforms.  But talking to folks who have been where I sit, I realized this week that if we don’t adapt and speed up our rate of innovation, we will become irrelevant.  I for one plan to evolve!

My Session Notes

JSON support in MySQL 5.7 – Percona Live 2015

I attended the “JSON support in MySQL 5.7” session given by Evgeny Potemkin and Manyi Lu from Oracle.  This was my last session of the conference and my head was swimming with knowledge at this point.  In 5.7 they are adding a JSON data type, JSON functions, and the ability to index JSON data.  Before, your options were to store JSON as either text or binary, and you had to manage all the validation, parsing, etc. yourself.  The new JSON datatype is:

  • Optimized for read-intensive workloads
  • Parsed and validated on insert only
  • Stored as a dictionary (sorted object keys, fast access to array cells by index)
  • Updated in place
  • Smart size handling (64K & 4G)
  • UTF8 encoded
  • Supports all native JSON types (number, string, bool, object, arrays, etc.)
  • Also supports date, time, datetime, timestamp, and others

They gave some sample inserts that I will need to grab from their slides.  The JSON comparator has polymorphic behavior and provides seamless and consistent comparison, so you can compare JSON to JSON or JSON to SQL.  Different data types will always compare as non-equal, so there is no automatic type conversion.  They then showed a sample select using the JSON datatype.  They also introduced the concept of a JSON path, which is a way to address data inside a JSON document.  The details are on the slides and it was too much to record, so I will have to grab it later for the developers.  In short, $ is the document root, [] indexes into an array, and * is used for a search.  Other functions:

  • JSN_VALID()
  • JSN_TYPE()
  • JSN_KEYS()
  • JSN_LENGTH()
  • JSN_DEPTH()
  • JSN_CONTAINS_PATH()
  • JSN_REMOVE()
  • JSN_APPEND()
  • JSN_SET()
  • JSN_INSERT()
  • JSN_REPLACE()
  • JSN_MERGE()
  • JSN_ARRAY()
  • JSN_OBJECT()
  • JSN_EXTRACT()
  • JSN_SEARCH()
  • JSN_QUOTE()
  • JSN_UNQUOTE()

They then showed how you can use a select to return the data as JSON using these functions (again, need to refer to the slides); a rough sketch of the syntax is below.  They will support both stored and virtual indexes on JSON data.  The primary index must be stored.  Stored indexes can be BTREE, FTS, or GIS, and virtual indexes can only be BTREE.  They plan to add instant ALTER for virtual indexes soon.  They have a bunch more for JSON on the roadmap, like full-text indexing and GIS indexes.
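
Here is a minimal sketch of the datatype and path syntax as I understood it from the session.  The table, columns, and values are hypothetical, and the JSN_* function names follow the session slides (they may ship under different names).

    CREATE TABLE products (
      id    INT PRIMARY KEY AUTO_INCREMENT,
      attrs JSON                      -- parsed and validated on insert
    );

    INSERT INTO products (attrs)
      VALUES ('{"name": "widget", "tags": ["red", "small"], "price": 9.99}');

    -- $ is the document root, [] indexes into an array
    SELECT JSN_EXTRACT(attrs, '$.name')    AS name,
           JSN_EXTRACT(attrs, '$.tags[0]') AS first_tag
    FROM   products
    WHERE  JSN_EXTRACT(attrs, '$.price') > 5;   -- JSON-to-SQL comparison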

Incremental backups available with Xtrabackup and Percona Server – Percona Live 2015

In the session “Incremental backups available with Xtrabackup and Percona Server,” Vladislav Lesin and George Lorch from Percona went over incremental backups.  There are three ways to get the delta: a full scan, using the redo logs, or a combination of the two.  For the full scan, you scan all pages and back up any page whose last-change LSN is newer than a specific LSN.  The advantages are that you only store changed pages and no server changes are required; the downside is the overhead of a full scan.  The redo logs can be used to restore a database to/from a specific point since they contain the change data.  The advantage of using the redo log is that there is no full scan; the downside is the space consumed.

In the combined method, the server stores page LSN information in an RB-tree, maintained by a separate thread that reads the redo logs at each checkpoint.  This avoids the full scan.  They shared an edge case where size can be an issue with this approach.

How DevOps Met Gaming Needs – Percona Live 2015

I attended the “How DevOps Met Gaming Needs” session by Sean Chighizola and Peter Garbes from Big Fish Games.  They talked about their migration to a DevOps model.  They went over the history of Big Fish Games, which started in 2002 with PC/Mac games.  They use Akamai for delivery.  They use MySQL to run their site, with physical partitioning by feature/service.  They used the following technologies: Nagios, Cacti, PHP, Perl, Subversion, bash scripts, etc.  Their releases were site-down deployments with all hands on deck.

The transition was hard to manage.  Change is difficult; there has to be a driver, and for Big Fish the driver was the addition of the free-to-play market.  They used Fairway Solitaire as an example of a game that made the transition from digital download to the free-to-play model.  Even after the migration it would take 20-120 minutes to push out a change, and a change was pushed every day.

Big Fish moved their daily releases to a self-service process, which was very successful.  But it was not all roses: they had a lot of non-standard changes that did not match their expected workflow.  They saw the DevOps model as a way to solve these residual pains and leveraged Jez Humble’s Continuous Delivery book as a guide.  They were able to increase velocity while maintaining team size, and they were also able to decrease their failure rate.

They discovered that giving developers operational responsibility makes them think like operators.  They use MEM 3.0 to provide self-service, with Query Analyzer being a key driver for that decision.  Their view: IT is not in the business of provisioning; it is in the business of engineering processes that put developers in control.  They also went over some MySQL issues they have run into.  I’ll need to grab their slides to get the links and dive in deeper.

They have been moving from Cacti to Graphite.  They have not yet implemented self-service database changes, but they have a proof of concept in place.  They restrict the types of changes allowed and provide a framework for where to put scripts.

Encrypting MySQL data at Google – Percona Live 2015

I attended the “Encrypting MySQL data at Google” session with Jeremy Cole and Jonas Oreland, both from Google.  They started with the why of encrypting data.  The threats they are trying to protect against are:

  • Access through network APIs (mysql client …) — not protecting against this
  • Access from within a running server (ptrace, memory dumping, etc.) — not protecting against this
  • Lost or misplaced disks — ARE PROTECTING
  • Backups — ARE PROTECTING

Not all threats are feasible to protect against.  An attacker with unlimited network access and time will be able to own you, and if they can get root they can own you.  You could encrypt data in the columns from the application, but that is a lot of work and you cannot access your data via SQL; it is also incompatible with 3rd party products.  You can purchase middleware products to provide column encryption (MyDiamo and CryptDB were given as examples), but indexing becomes an issue with these approaches.  You could just encrypt all the disks, but once mounted the data would be unencrypted.

In their approach they wanted to encrypt all user data, including temporary tables and the InnoDB data and log files.  They wanted to make drive-by data exfiltration impossible: an attacker would need both the data and the keys to decrypt it.  InnoDB is organized in 16K pages and they encrypt all pages except for page 0.  They do not encrypt page headers or page trailers.  They also do not use the same key to encrypt multiple similar pieces of data, to avoid an attacker making guesses based on known attributes of InnoDB pages (i.e. lots of zeros in the middle of a page due to initialization).  They encrypt all log blocks but the first 4, and each block is encrypted with a different key.

They have a Google key management infrastructure, so you will need to roll your own as theirs is not open source.  Keys are never stored on disk; they are only used in memory.  They did, however, publish a key management plugin interface to allow someone else to write an open source solution.  They rotate keys on the redo logs by writing dummy records to age out blocks as needed.  Temporary tables should age themselves out within MySQL.  Binary and relay logs are encrypted with the latest key and similarly age out.  For InnoDB data they keep the key version in the page header and have background threads that re-encrypt pages as needed to rotate keys.  The number of threads and how many IOPS should be used for rotation are both configurable.

The encryption works with replication as well.  Code is available at code.google.com/p/google-mysql.

Keynotes Day 3 – Percona Live 2015

They announced Percona Live Europe in Amsterdam, September 21-22; the call for papers is now open.  They also announced the Percona Live dates for next year as April 18-21, 2016.  The conference will be back in Santa Clara.

Rob Young from Percona came up and discussed Percona’s Take on Smart Data Management (What’s Next for Products and Services).  They announced the formation of a customer advisory board to provide feedback to Percona.  They see that we need to evolve or die in the MySQL space.  Data has to perform and must provide flexibility.  It is no longer acceptable for the data to shape us; we have to shape the data.  The most important thing is how compelling the application is, not how maintainable it is.

Today’s data problems do not fit a single solution-shaped hole.  The solutions require hard-to-find talent and skill sets.  Customers have to be the focus, not the technology; technology is just a means to an end.  He drew attention to the jobs board and how hard it is to find talent.  An example is Craigslist, which uses MySQL for active listings and MongoDB for its archives.  No longer is one solution able to effectively handle all data workloads.  Rob then shared statistics showing the explosive growth of MongoDB and announced that Percona is becoming the first MySQL and MongoDB Data Performance Company.  Their acquisition of Tokutek puts them in a great position to pull this off.  For information on TokuMX see http://www.tokutek.com/tokumx-for-mongodb/.

Rob then gave a high-level overview of Percona Server 5.7.  They plan to integrate TokuDB into the next version of Percona XtraDB Cluster.  He then went over Percona’s commitment to open source solutions for the MySQL and, soon, MongoDB platforms.  Rob shared Percona’s plans to leverage Percona Cloud Tools to speed up and improve their consulting and support solutions.

Amrith Kumar from Tesora came up to discuss Building Databases for the “-as-a-Service” World.  I had attended his session on Trove earlier in the week, and Trove has a lot of promise to expand a DBA’s scale of influence.  Amrith compared the conflict between the designers and assembly-line workers who build cars to the conflict between development and operations groups inside IT.  If you have two groups in your organization that are in conflict, combine them; this leads to DevOps.  But hiring for DevOps is like trying to hire a superhero.  The answer is automation.  Automation improves quality and drives down costs, which frees up your engineers for higher-level issues.  It allows DBAs to work more closely with developers and provide what they need to innovate.

OpenStack and Trove allow you to provide this type of database automation.  Not only can you use Trove to automate, it can also be used to provide self-service.  It also helps automate backups and maintenance.  Trove intends to be a full lifecycle automation framework for database services.  Tesora works with experts on different platforms to develop their automations, and today they support a number of different data platforms, both relational and NoSQL.  Amrith discussed the collaboration between Percona and Tesora to make Percona products easier to automate.  He shared a really compelling presentation on why automation is important, and we will look at Trove more closely.

MySQL at Twitter – 2015 – Percona Live 2015

My last session of the day was “MySQL at Twitter – 2015” by three engineers at Twitter (Calvin Sun, Inaam Rana, and Anton Kuraev).  They mainly use MySQL for timelines, social graphs, users, and other internal operations.  They use master-master replication, but mostly master-slave.  They are running on Twitter WebScale 5.6.23.  They went over a lot of the improvements they have put into MySQL 5.5, two of interest being auditing and query throttling.

Some of the changes are not practical for our workloads, but they have ported a lot of 5.7 features back into WebScale 5.6.  They showed that in an I/O-bound cluster the node upgraded to 5.6 had far fewer page flushes.  They are working on copy-on-write to improve incremental backups and provide consistent snapshots.  They also announced MySQL on Mesos, named Mysos.  Mesos is a distributed systems kernel for running very large clusters of computers.  This looks pretty cool and is worth a deep dive after I get back from the conference.

Copy-on-write has the potential to really help the MySQL backup story.

Advanced MySQL Query Tuning – Percona Live 2015

I attended the “Advanced MySQL Query Tuning” session by Alexander Rubin from Percona.  Alex just finished helping us out with a Galera evaluation and he is top notch.  He also helps organize the Triangle MySQL Meetup, which I attend.  The talk was specific to tuning queries and did not cover configuration.  He went over GROUP BY, ORDER BY, and DISTINCT queries, as well as calculated fields (introduced in 5.7).

An index can be traversed down to a leaf node for an equality search, and it can also be scanned at the leaf level for range searches.  MySQL can use the leftmost part of a combined index (pt-duplicate-key-checker can be used to look for duplicate indexes).  Alex showed how you can tell in the EXPLAIN output whether MySQL is only using part of the index, based on key_len and the columns in the index.  MySQL cannot use a combined index if you are not using the leftmost part of the index.  A covering index covers all the columns in a query (WHERE and SELECT) and avoids row lookups.  A rough sketch follows.
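
This is a minimal sketch of the idea with a hypothetical orders table and index; the key_len and Extra values in the comments are what I would expect to see, not output from the session.

    CREATE TABLE orders (
      id          INT PRIMARY KEY AUTO_INCREMENT,
      customer_id INT NOT NULL,
      status      VARCHAR(16) NOT NULL,
      total       DECIMAL(10,2) NOT NULL,
      INDEX idx_cust_status_total (customer_id, status, total)
    );

    -- Uses only the leftmost part of the combined index;
    -- key_len should reflect just customer_id (4 bytes for INT).
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    -- Covering: the WHERE and SELECT columns are all in the index, so key_len
    -- grows to include status and Extra should show "Using index" (no row lookups).
    EXPLAIN SELECT total FROM orders WHERE customer_id = 42 AND status = 'paid';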

MySQL cannot use an index if you apply a calculated expression to the column; you need to remove the expression to use an index.  You can also add a field to the table and create a trigger to keep it up to date, and then use an index on that column.  In 5.7.7 they added GENERATED (derived) columns, and these can be indexed.  The insert speed was 2x in Alex’s example, and he showed that the select does in fact use the index.  You can also define the derived column to be stored, but that increases the size of your table.
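
A hedged sketch of a derived (generated) column used to index an expression; the table and columns are hypothetical, and on some 5.7 builds the column may need to be defined as STORED rather than VIRTUAL to be indexable.

    CREATE TABLE events (
      id          INT PRIMARY KEY AUTO_INCREMENT,
      payload     VARCHAR(255),
      created_at  DATETIME NOT NULL,
      created_day DATE AS (DATE(created_at)) VIRTUAL,  -- derived column
      INDEX idx_created_day (created_day)
    );

    -- WHERE DATE(created_at) = '2015-04-15' could not use an index on
    -- created_at; filtering on the derived column can.
    EXPLAIN SELECT COUNT(*) FROM events WHERE created_day = '2015-04-15';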

Temporary tables are slow and are created when you have a GROUP BY, a range plus ORDER BY, and in a few other cases.  There are both in-memory and on-disk temporary tables.  The tmp_table_size and max_heap_table_size configuration options are used to control spillage: if the table is larger than either, it will spill to disk.  In-memory temporary tables do not support BLOB/TEXT.  A covering combined index can avoid temporary table creation.
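
A quick sketch of the two limits; the values here are illustrative, not recommendations, and the effective in-memory cap is the smaller of the two.

    SET SESSION tmp_table_size      = 64 * 1024 * 1024;
    SET SESSION max_heap_table_size = 64 * 1024 * 1024;

    -- Created_tmp_tables vs. Created_tmp_disk_tables shows how often
    -- temporary tables spill to disk.
    SHOW SESSION STATUS LIKE 'Created_tmp%';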

With a range scan the covering index will not help avoid the temporary table.  You can work around it by converting the query into a UNION of multiple queries that each use a covering index.  This still creates a temporary table, but a much, much smaller one.  A rough sketch of the idea follows.
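
My notes did not capture the exact rewrite from the slides, so this is only an illustration of the idea, reusing the hypothetical orders table and index from the earlier sketch; whether the first query actually materializes a temporary table can vary by version and data.

    -- A range (IN) on the leftmost index column combined with GROUP BY can
    -- force a temporary table:
    SELECT customer_id, status, COUNT(*)
    FROM   orders
    WHERE  customer_id IN (1, 2, 3)
    GROUP  BY customer_id, status;

    -- Rewritten as a UNION ALL of equality queries, each of which can group in
    -- index order; any remaining temporary table holds only the small grouped
    -- results.
    SELECT 1 AS customer_id, status, COUNT(*) FROM orders WHERE customer_id = 1 GROUP BY status
    UNION ALL
    SELECT 2, status, COUNT(*) FROM orders WHERE customer_id = 2 GROUP BY status
    UNION ALL
    SELECT 3, status, COUNT(*) FROM orders WHERE customer_id = 3 GROUP BY status;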

When creating a covering index to avoid a filesort, you need to start with the WHERE condition columns and then add the GROUP BY columns.

A loose index scan considers only a fraction of the keys in an index.  To qualify, the query cannot have joins, the GROUP BY must be a leftmost prefix of the index, and only the MIN() and MAX() aggregate functions are allowed.  A small sketch follows.
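
A minimal sketch, again against the hypothetical orders table, with an index shaped for the loose index scan: GROUP BY on the leftmost prefix, MIN()/MAX() on the column that immediately follows it, and no joins.

    CREATE INDEX idx_cust_total ON orders (customer_id, total);

    -- When the loose index scan applies, EXPLAIN should show
    -- "Using index for group-by" in the Extra column.
    EXPLAIN SELECT customer_id, MAX(total)
    FROM   orders
    GROUP  BY customer_id;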

Joining can cause duplicates, and adding DISTINCT to remove them can cause a slowdown.  By moving the DISTINCT to a separate query level so that it does not include BLOB/TEXT columns, you can avoid spilling to disk.  A sketch of the idea follows.
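
This is just an illustration of the idea with hypothetical tables (an orders table with a TEXT notes column and an order_items table): the duplicates come from the join, and the TEXT column is what pushes the implicit temporary table to disk.

    -- DISTINCT over a join that selects a TEXT column forces the implicit
    -- temporary table onto disk (in-memory temp tables do not support BLOB/TEXT):
    SELECT DISTINCT o.id, o.status, o.notes          -- notes is TEXT
    FROM   orders o
    JOIN   order_items i ON i.order_id = o.id;

    -- Deduplicating on the non-TEXT column in a derived table and fetching the
    -- TEXT column afterwards keeps the temporary table in memory:
    SELECT o.id, o.status, o.notes
    FROM   orders o
    JOIN  (SELECT DISTINCT order_id FROM order_items) AS di
           ON di.order_id = o.id;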

In 5.7, GROUP BY will no longer automatically imply an ORDER BY, so developers will need to add an explicit ORDER BY if sorted results are desired.
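
For example (illustrative query), a grouped query that previously relied on the implicit sort should now state the ordering explicitly:

    SELECT status, COUNT(*)
    FROM   orders
    GROUP  BY status
    ORDER  BY status;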

Big Transactions on Galera Cluster – Percona Live 2015

I attended the session “Big Transactions on Galera Cluster” given by Seppo Jaakola from Codership (the folks who write Galera).

Huge Transactions are:

  • Large read set – no problems
  • Large write set – roughly >10K rows modified can be slow
  • Long-running transaction – OK, but vulnerable to multi-master conflicts

A large transaction can cause flow control to kick in and prevent any other transaction from committing before it finishes.  This is due to the strict ordering of transactions that keeps certification deterministic.  Long-running transactions are vulnerable to multi-master write conflicts.  In Galera 3 and earlier you can use wsrep_max_ws_rows and wsrep_max_ws_size to limit the size of transactions; 2 GB is the upper limit on transaction size.
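
A quick sketch of those limits; the values are illustrative, not recommendations.

    -- Reject transactions that modify more than ~128K rows or produce a
    -- write set larger than 1 GB.
    SET GLOBAL wsrep_max_ws_rows = 131072;
    SET GLOBAL wsrep_max_ws_size = 1073741824;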

If you have a huge transaction, be careful killing it, because the rollback is expensive.  You may want to adjust flow control to keep the other nodes from freezing.  Seppo did a demo showing the impact of huge transactions on the cluster.  One workaround is to skip flow control using wsrep_desync; this will fix the master, but the slaves will just lag further and further behind.  You could also relax the commit order and allow the nodes to be temporarily out of sync (not supported by Galera, just an idea).
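
A hedged sketch of the wsrep_desync workaround mentioned above: a desynced node stops participating in flow control, so it is allowed to fall behind without pausing the rest of the cluster.

    -- On the node that would otherwise throttle the cluster:
    SET GLOBAL wsrep_desync = ON;
    -- ... let the huge transaction replicate and apply ...
    SET GLOBAL wsrep_desync = OFF;   -- node resynchronizes once it catches up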

Streaming replication helps with this and is being added in Galera 4.  wsrep_trx_fragment_unit and wsrep_trx_fragment_size are used to configure it.  This basically divides a huge transaction into a series of smaller fragments that will not interfere with Galera flow control.  Seppo set up streaming replication and then reran his demo, showing the improvement in cluster performance.  This will allow Galera to break the 2 GB write-set limit by breaking large transactions into smaller chunks.  These changes also help Galera handle hot-spot transactions by setting the transaction fragment size to 1 statement.
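
A hedged sketch of the streaming-replication settings as described in the session (Galera 4 was not yet released, so names and units may change); here a transaction would replicate in fragments of 10,000 rows.

    SET SESSION wsrep_trx_fragment_unit = 'rows';
    SET SESSION wsrep_trx_fragment_size = 10000;

    -- For hot-spot transactions, replicate after every statement:
    -- SET SESSION wsrep_trx_fragment_unit = 'statements';
    -- SET SESSION wsrep_trx_fragment_size = 1;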

There are still issues with certification when large and small transactions are being written to multiple nodes, but this appears to be a huge improvement for Galera.  I look forward to playing with it in the future.  Galera 4 should be out in the first half of this year.  Additional features include non-blocking DDL, streaming replication, and an optimized consistency-alert shutdown (a protocol to avoid cluster crashes).

Managing MySQL with Puppet – Percona Live 2015

I attended the “Managing MySQL with Puppet” session at Percona Live 2015 with Jaako Pesonen from Spil Games.  He is using Diffy, Sheldon (which provides information about the environment, host, etc. to Puppet), OpenStack, MySQL, Percona XtraDB Cluster, and Puppet.  In this session he concentrated on Puppet and Percona XtraDB Cluster.  The puppetlabs-mysql module is provided by Puppet Labs; it used to not be very good, but it is getting better.  It contains four MySQL providers (server, client, bindings, and backups).  Here are my notes:

  • He recommends using Hiera to make puppet more useful and readable
  • See https://github.com/spilgames/spilgames-mysql for their module, which you can modify as needed (not set up for general use)
  • Nothing special needed for puppet and MHA or regular replication
  • Galera is special in that the first node is different
  • They use different users between production and staging for backups to avoid having production credentials in staging

There were a lot of Puppet configuration slides that I will need to dive into more.  My takeaway is that we can manage Galera with Puppet, but it will take some customization.  I look forward to diving into Jaako’s GitHub repository and trying to figure out what will work best for us.