Nicolai Plum from Booking.com gave a presentation on how to split a database into multiple databases while minimizing/managing the impact to applications using it. There is no voodoo magic, but it was a great session covering how to work with the application developers to split a database.
- Stated with sales DB
- Customers
- Transactions
- Delivery
- Catalog
- Inventory
- Reviews
- Breaking out the Product information
- Catalog
- Inventory
- Reviews
- Good idea to change schema name as you go to avoid confusion
- Why
- size
- query time
- regulatory requirements
- business/architecture reasons
- “GDPR best practice is to split private information away from non private information”
- How to choose tables
- minimize joins
- Maximise gains
- split out bigger, fastest growing, business, most regulated
- size and activity use: ps-top (utility to help figure this out)
- Code changes needed
- may be able to use ProxySQL (has not tried it)
- try to separate read and write traffic
- “Everyone has a test environment, some people have a production one as well”
- Use binlog to help
- Get rid of joins in updates
- joins move from sql to app code
- foreign keys across servers will not work – relational integrity checks move up to app layer
- uses a translation master to move data between the two schema
- uses replicate-rewrite-db
- uses replicate-wild-do-table
- Steps to create translation master
- Make a slave, with binlog enabled
- shut down MySQL
- configuration: Add
- skip-slave=start
- replicate-rewrite-db=’sales=>product’
- replicate-wild-do-table…..
- Create new schema
- Rename all tables that are being split to the new schema
- Drop old schema from translation master
- start replication slave
- Remove skip-slave-start
- If replication stops with missing table errors, find the bad query, fix up (skip) or make a new translation master
- Add product schema on sales master to allow product to read and write to the same schema name
- use sys;
- call create_synonym_db(‘sales’, ‘product’);
- Need views in slaves also
- Now ready to remove translation master and cut over to the new product replication topology
- Need to fix up grants – they don’t replicate through translation master
- use different user names to avoid collisions/confusion
- Now cut the name over – either push config or use DNS CNAME or A record update
- Tradeoffs with cut over
- Disconnect first to have cleanest cut over but with a little downtime
- Disconnect last to keep application up but with risk of collisions