MySQL at Twitter – 2015 – Percona Live 2015

My last session of the day was “MySQL at Twitter – 2015” by 3 engineers at Twitter (Calvin Sun, Inaam Rana, and Anton Kuraev).  They mainly use MySQL for Timeline, Social graphs, users, and other uses for internal operations.  They use Master-Master replication but mostly Master-Slave.  They are running on Twitter WebScale 5.6.23.  They went over a lot of their improvements they have put into MySQL5.5.  Two of interest were auditing and query throttling.

Some of the changes are not practical for our workloads, but they have ported a lot of 5.7 features back into WebScale 5.6.  They showed that in an I/O bound cluster the node upgraded to 5.6 had much fewer page flushes.  They are working on a copy of write to improve incremental backups and provide consistent snapshots.  They also announced MySQL on Mesos named Mysos.  Mesos is a distributed kernel that makes for very large clusters of computers.  This looks pretty cool and is worth a deep dive after I get back from conference.

The Copy on Write has the potential to really help the MySQL backup story.

Advanced MySQL Query Tuning – Percona Live 2015

I attended the “Advanced MySQL Query Tuning” session by Alexander Rubin from Percona.  Alex just finished helping us out with a Galera evaluation and he is just top notch.  He also helps organize the Triangle MySQL Meetup which I attend.  The talk was specific to tuning Queries and did not cover configuration.  He went over GROUP BY, ORDER BY, DISTINCT queries and calculated fields (introduced in 5.7).

Indexes can be used to scan to leaf node on equality search.  It can also be scanned at the leaf level for range searches.  MySQL can use the leftmost part of combined indexes (pt_duplicate_index_checker can be used to look for duplicate indexes).  Alex showed how you can tell in the EXPLAIN if MySQL is only using part of the index based on key_len and the columns in the index.  MySQL cannot use a combined index if you are not using the leftmost part of the index.  A covered index covers all the columns in a query (WHERE and SELECT).  A covering index avoids row lookups.

MySQL cannot use an index if you use a calculated expression.  You will need to remove the expression to use an index.  You can also add a field to the table and create a trigger to keep it up to date and then you can use an index on that columns.  In 5.7.7 they add GATHERED field to create a derived column and they can be indexed.  The insert speed is 2x based on Alex’s example.  And he showed that the select does in fact use the index.  You can also define the derived table to be stored, but it increases the size of your table.

Temporary tables are slow and are created when you have group by, range + order by and in a few other cases.  There are both in memory and on disk temporary tables.  The tmp_table_size and max_heap_table_size configuration options are used to control spillage.  If the table is larger than either it will spill to disk.  Memory temporary table do not support BLOB/TEXT.  If you have a covering combined index it can avoid temporary table creation.

With a range scan the covering index will not help.  You can hack it by converting it to a union of multiple queries with covering indexes.  Still creates a temporary table but much much smaller.

When creating a covering index for filesort you need to start with the where condition and then the group by.

A loose index scan considers only a fraction of the key in an index.  The query cannot have joins, the group by should be the leftmost prefix of the index, only allows MIN() and MAX() aggregate functions.

Joining can cause duplicates and adding the DISTINCT can cause a slowdown.  By moving the DISTINCT to an outer query you can avoid spilling to disk if your select does not contain the columns with BLOG/TEXT.

In 5.7 Group By will no longer automatically cause an ORDER BY.  So Developers will need to add the ORDER BY if that is the desired result.

Big Transactions on Galera Cluster – Percona Live 2015

I attended the session “Big Transactions on Galera Cluster” Seppo Jaakola from Codership (folks that write Galera).

Huge Transactions are:

  • Large read set – no problems
  • Large Write Set roughly >10K rows modified can be slow
  • Long term transaction – ok but vulnerable for multi-master conflicts

A large transaction can cause flow control to kick in and prevent any transaction to commit before it finishes.  This is due to strict ordering of transactions to keep certification working deterministically.  Long term transactions are vulnerable to multi master writes.  Galera 3 and earlier can use wsrep_max_ws_rows and _size to limit the size of transactions.  2GB is upper limit for transaction size.

If you have a huge transaction be careful killing it because of rollback.  You may want to adjust flow control to avoid the other nodes from freezing.  Seppo did a demo showing the impact of huge transactions on the cluster.  Solutions could be to skip flow control using wsrep_desync.  This will fix the master, but slaves will just lag further and further behind.  You can also relax the commit order and allow the nodes to be temporarily out of sync (not supported by Galera just an idea).

Streaming replication helps and is added in Galera 4.  wsrep_trx_fragment_unit and wsrep_trx_fragement_size are used to configure.  This basically divides a huge transaction into a series of smaller transactions that will not interfere with Galera flow control.  Seppo set up streaming replication and then reran his demo showing the improvement of the cluster performance.  This will allow Galera to break the 2 GB write set limit by breaking up large transactions into smaller chunks.  These changes also help Galera to handle hot-spot transactions by setting transaction fragment size to 1 statement.

There are still issues with certification is large and small transactions are being written to multiple nodes, but this appears to be a huge improvement for Galera.  I look forward to playing with it in the future.  Galera 4 should be out first half of this year. Additional features Non blocking DDL. streaming replication, and optimized consistency alert shutdown (protocol to avoid cluster crash).

Managing MySQL with Puppet – Percona Live 2015

I attended the “Managing MySQL with Puppet” session at Percona Live 2015 with Jaako Pesonen from Spil Games.  He is using Diffy, Sheldon (provides information about environment, host, etc to Puppet), OpenStack, MySQL, Percona XtraDB Cluster, and Puppet.  In this session he concentrated on Puppet and Percona XtraDB Cluster.  Puppetlabs mysql is provided by Puppet labs.  It used to not be that good, but is getting better.  Puppet contains 4 mysql providors (server, client, bindings,and backups).  Here are my notes:

  • He recommends using Hiera to make puppet more useful and readable
  • to get access to their module you can modify as needed (not set up for general use)
  • Nothing special needed for puppet and MHA or regular replication
  • Galera is special as that the first node is different
  • They use different users between production and staging for backups to avoid having production credentials in staging

There was a lot of puppet configuration slides that I will need to dive into more.  My takeaways is that we can manage Galera with Puppet, but it will take some customization.  I look forward to diving into Jaako’s GitHub repository and trying to figure out what will work best for us.

Better DevOps with MySQL and Docker – Percona Live 2015

So I ran into the Docker session “Better DevOps with MySQL and Docker” by Sunny Gleason from SunnyCloud late.  Docker can help you lay down images of your system but it is not the be all end all for DevOps.  Docker is a replacement for your Hypervisor.  Docker instead is a way to isolate workloads on a single Operating system.  You can run docker on top of a Hypervisor but it is not required.

Need to be careful with public images.  Hard to tell the good from the bad.  Docker working on a process to improve this.  Some of the continuing issues are managing the connectivity between containers and the physical environment.  Cross platform is also very new.  Based on Linux containers so security is still an issue.  Not really for multi tenant but for running multiple trusted solutions on the same hardware.  Deployment can be I/O intensive.

InnoDB Change Buffering – Percona Live 2015

My first real session of the day today at Percona Live was “InnoDB Change Buffering” with Davi Arnaut from LinkedIn.

The change buffer is used to batch changes to non-unique secondary indexes to avoid random I/O. This is a feature that has to be enabled and you can use innodb_change_buffering to

  • inserts
  • deleted
  • purges
  • changes (inserts and delete-mark)
  • all (default)
  • none

If the page is not in the buffer pool then the changes is buffered and it marks the buffer bitmap that the page has pending changes and then the changes are merged on read from disk.  The change buffer is actually just a separate index not exposed with page number as the index.  It is also smart enough to know that a change  will cause a page to be empty or to split and it will not buffer those changes.

Need to grab these slides and go over it as to keep up and take notes was not working very well.

Key Notes Day 2 – Percona Live 2015

After the standard house cleaning items, Steaphen Greene from Facebook came out to discuss WebScaleSQL.  WebScaleSQL is a collaboration project between a number of companies supporting large Web applications on MySQL (Alibaba, FaceBook, Google, LinkedIn, and Twitter).  This is a branch of MySQL modified by all the companies.  They also work with MariaDB, Oracle, and Percona on their development.  He gave a lot of praise to Oracle for their work on MySQL.  Some interesting features are:

  • Asynchronous MySQL Client
  • Lower InnoDB buffer pool memory footprint
  • Reduced double-write mode
  • Server-side query management features
    • Query throttling etc.
  • Logical read-ahead (faster backups)

Coming features

  • On-line InnoDB rebuilds
  • DocStore
  • MySQL and RocksDB

He then wrapped up with a discussion on the legal challenges of trying to get WebScale changes into the main MySQL code branch.

Then Patrik Sallner from MariaDB came up and gave a talk on how open source can help solve data management problems.  His slide on the data storage subway map was very interesting and worth trying to find.  He points out that open source provides for greater collaboration between companies and reduce the duplication of efforts.  He showed how MariaDB has included changes from a large list of companies into their code base.  One feature discussed of interest was the Connect Storage Engine which allows federation of data from other data platforms.  MariaDB 10.1 will have Galera in the base code so no separate binaries, activation is based on configuration.  10.1 will also have tablespace and table level encryption built in. My take away was that I need to look into Webyog and Sea Lion for monitoring.

After Patrik, Tomas Ulin from Oracle came out to celebrate the 20th anniversary of MySQL.  It is also been 10 years that Oracle has owned InnoDB and 5 owning MySQL.  5.7 Release candidate was announced last week.  Highlights of 5.7 are:

  • 2 x faster then 5.6
  • Enhanced InnoDB: faster online and bulk load operations
  • Replication Improvements: multi master multi threaded slaves (8-10x faster slaves)
  • New optimizer cost model: greater user control and better query performance
  • performance schema improvements
  • Query Rewrite Plugin
  • JSON support added (labs now but planned for GA)
  • Functional indexes (wasn’t clear if this was new)
  • SYS Schema improvements to help with monitoring and administration
  • GIS improvements
  • InnoDB improvements
    • Native partitioning
    • configurable fill factor
    • Buffer Pool resize online
    • Separate UNDO tablespace
  • Group Replication (probably not available for GA see

Then he talked about MySQL Fabric 1.5.  The product sounds promising and we will take a look at this.  Also, Workbench 6.3 will have support for Fabric and many of these other features (just went RC).  Call for papers ends in two weeks for Open World.

Last but not least Robert Hodges from VMWare came out to talk about the cloud.  He talked about how VMWare is looking to build a public cloud that better integrates with private clouds that Enterprises already have.  The talk was basically a overview of the features of the newest version of VMWare.

Performance Birds of a Feather – Percona Live 2015

This session was a talk with Rick James where we just talked about performance issues we in the audience were interested in.  ( My Notes are:

  • Start with InnoDB buffer pool at 70% of memory unless really small VM
  • Partitioning should not be done for performance
  • Stored procedures can provide better performance over views
  • If you are replicating then break up your deletes into small batches
  • We talked about ways to create shards to minimize impact
  • Slow Query log at 2 seconds or less and use pt-query-digest to analyze
  • Batch your inserts to have one insert statement with up to a 1,000 rows to add
  • For Galera you can make an argument that you can run innodb to 0 or 2 because a node can be rebuilt with SST
  • You can use INFORMATION_SCHEMA tables to build a blocking chain
  • subqueries can be poor performing because the optimizer is not smart enough to handle them
  • Try your best to turn them into a join unless you have a group by in the subquery to limit the rows
  • Also can create a temp table in memory and join to it instead of a subquery

I need to grab Rick’s information on Galera for developers.

Scaling MySQL in the cloud with Vitess and Kubernetes – Percona Live 2015

Scaling MySQL in the cloud with Vitess and Kubernetes” by Anthony Yeh from Google.  He works on the YouTube team at Google.  The talk was basically how they use Vitess and their container platform (Kubernetes).  Vitess provides

  • Performance
  • Protection
  • Monitoring/Analysis
  • Mangement
  • Sharding

It sits between the application and database layer. It leverages a vtgate which abstracts the data layer and what shard to go get the information from.  Can separate live traffic from analytic traffic.  Vitess can cache based on primary key identified rows.  This is different from the block caching performed by MySQL.  Vitess will kill poorly performing queries automatically to avoid replicating bad queries.  It performs transparent rerouting.  It also performs live resharding.

Vitess is a container cluster.  Kubernates is a Container Cluster OS built on top of Docker.  For more info see  Looks very promising and will need to see how this fits with Trove.

Sharding, horizontal and vertical scaling – Percona Live 2015

In “Sharding, horizontal and vertical scaling“, Tom Christ from Tumblr discussed sharding for scale.  He discussed scale dimensions of read, write, and size.  While challenges are often specific to data sets and/or applications, there are ways to categorize and reason about them.  Read challenges include query complexity, query volume, and buffer pool effectiveness.  To fix read issues you can scale horizontally with replica fanout, or you can add caching.  You can also scale vertical with bigger servers, or leverage smaller concurrent queries.

He discussed the options for write and size scaling and this is where sharding comes into help.  He discussed the trade off between costs, complexity, write amplification, and concurrency when deciding how to scale and if sharding should be part of your solution.  Tom discussed the types of sharding lookup, range, and  mod sharding.  He also went over forms of caching: key/value (memcache) and structured (redis).

All in all a good overview of issues with scale and how sharding and cache can help and the pitfalls.