OpenWorld 2017: MySQL Document Store

This was a great session centered on the updates to JSON support in MySQL 8. But to paint the full picture of how MySQL 8 is making JSON a first-class citizen in the platform, it also covered some updates to InnoDB Cluster and the new MySQL Shell. The session was presented by Alfredo Kojima, Miguel Araujo, and Mike Zinner, all from Oracle. Mike is responsible for client tools – shell, workbench, router, connectors, etc. Alfredo was there to talk about the changes in InnoDB Cluster, and Miguel covered the Document Store itself.

These changes make it practical to use MySQL as a NoSQL document store. With MySQL, however, you can retain ACID compliance while doing most (if not all) of what NoSQL data platforms like MongoDB provide. It will be interesting to get my hands on this to see what tradeoffs they make to retain the ACID model, especially compared to other NoSQL data engines that take the BASE approach.

My notes from the session are:

  • MySQL Document Store Intro
    • What is a document?
      • structured data
      • They are using JSON (=JavaScript Object Notation)
      • Others are XML, YAML, etc.
    • Relational Model vs. Document Model
      • Higher upfront effort during design vs. lower upfront effort
      • Database ensures integrity vs. Application responsible for integrity
      • Schema updates can be problematic vs. Simple schema change
    • Now MySQL supports both relational and document models
    • Exposes new CRUD interface, while keeping SQL
    • Standard plugin for MySQL Server (since MySQL 5.7.12)
    • Store JSON documents in MySQL inside tables (JSON type column)
    • Store JSON outside of relational tables (collection not table)
      • Collections belong to a database/schema just like a table, so you can have collections of the same name in different schemas on the same instance
      • Collections are implemented as tables (with an _id and a doc column of type JSON)
    • Key takeaways
      • InnoDB
      • transactions and locking
      • etc.
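
Since the notes above say collections are really just InnoDB tables underneath, here is a minimal sketch (not from the session) of how you could verify that from the new shell. The schema and collection names are examples only, and the exact generated column definitions vary by server version.

// JavaScript mode in MySQL Shell; schema/collection names are examples only
var db = session.getSchema('test');
var products = db.createCollection('products');
products.add({ name: 'Test1', price: 42 }).execute();

// The collection is an InnoDB table with a JSON doc column and a generated _id key,
// so plain SQL can inspect it
session.sql('SHOW CREATE TABLE test.products').execute();
session.sql('SELECT * FROM test.products').execute();
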
  • Components of the MySQL Document Store
    • MySQL Shell – new command line interface
    • Works with InnoDB Cluster
    • X Protocol – new protocol to deal with documents (more efficient)
    • Connectors for
      • .NET
      • Node.JS
      • Python
      • Java
      • etc..
    • Document Store is tightly connected to InnoDB Cluster
      • for scale out
  • InnoDB Cluster
    • basically turning MySQL into a scale-out architecture (looks like MongoDB or other NoSQL system architectures to me)
  • The X DevAPI – A modern CRUD App Programming Interface
    • To create a collection – var products = db.createCollection('products');
    • Add document – products.add({"_id": 123, "name": "Test1"}).execute();
    • retrieve/select – products.find("price > 10 and price < 100").fields(["_id", "name"]).sort(["name"]).execute();
    • change – products.modify("_id": 123 <missed it>
    • delete – products.remove(" <missed it> (a guess at both calls is sketched below)
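
I missed the exact modify and remove calls during the session, so the following is only a guess at what they looked like, based on the X DevAPI documentation linked further down in these notes; the field name and values are made up.

// Hypothetical versions of the modify/remove calls I missed (not verbatim from the session)
products.modify("_id = 123").set("name", "Test2").execute();
products.remove("_id = 123").execute();
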
  • Node.js example – async call to MySQL with function (doc) as the callback

var schema = session.getSchema('mySchema');
var coll = schema.getCollection('myColl');
var query = "$.name == :name";

coll.find(query).bind('name', 'Alfredo').execute(function (doc) {
  console.log(doc);
})
.catch(function (err) {
  console.log(err.message);
  console.log(err.stack);
});

  • Demo of shell interface
    • port 33060 is the X DevAPI port
    • new shell defaults to JavaScript mode
    • can switch to SQL mode with \sql
    • \js to switch back to JavaScript mode
    • the dba object is used to manage the instance with JavaScript (see the sketch after this list)
    • session.sql("SELECT * FROM TEST1;").execute(); – for SQL in the JavaScript shell
    • session.sql("select ?").bind("'Hello World'").execute(); – by using bind it escapes everything for you
    • much more powerful interface even for DBAs
    • scripts require the .execute(), but not needed while using the shell
    • new shell documentation – https://dev.mysql.com/doc/x-devapi-userguide/en/
    • new UUID format with the timestamp at the end so values are more sequential for performance (within a session)
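
The dba object mentioned above is the AdminAPI entry point for managing InnoDB Cluster from the shell. A rough sketch of what that looks like in JavaScript mode; the instance URIs and cluster name are made up, and the exact calls and options vary by shell version.

// AdminAPI from MySQL Shell JavaScript mode; URIs and names are examples only
dba.checkInstanceConfiguration('root@mysql1:3306');
var cluster = dba.createCluster('testCluster');
cluster.addInstance('root@mysql2:3306');
cluster.addInstance('root@mysql3:3306');
cluster.status();   // returns a JSON document describing cluster health
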
  • Documents via SQL
  • Behind the Scenes
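
On the “Documents via SQL” point: because a collection is just a table with a JSON column, the regular SQL layer and the JSON functions work against the same documents. A small sketch (not from the session) run through session.sql() from the JavaScript shell; the names and JSON paths are examples only.

// The statements are plain SQL run from JavaScript mode; names/paths are examples only
session.sql("SELECT doc->>'$.name' AS name, doc->>'$.price' AS price " +
            "FROM test.products WHERE doc->>'$.price' > 10").execute();

// The JSON functions also work for in-place updates of the same documents
session.sql("UPDATE test.products SET doc = JSON_SET(doc, '$.name', 'Test2') " +
            "WHERE doc->>'$.name' = 'Test1'").execute();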

OpenWorld 2017: Kafka, Data Streaming and Analytic Microservices

My company has been wrestling with the need for a more scalable and standardized method of integrating applications for a while.  We have taken steps toward moving to an enterprise data bus to improve how we do this.  Stewart Bryson’s session was a great example of how Kafka can be used to do this.  While his focus was on BI and data integration, he did a great job of showing how Kafka improves the story for other use cases as well.   My notes from his session are:

 

  • History Lesson
    • Traditional Data Warehouse
      • ETL -> Data Warehouse -> Analytics
    • Not all analytic problems can be solved by this paradigm
      • Realtime events
      • Mobile Analytics
      • Search
      • Machine Learning – Old BI prejudices pattern matching
  • Analytics is not just people sitting in front of dashboards making decisions
  • Microservices
    • “The microservice architecture is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery.” – Martin Fowler
    • Opposite of the Monolith
    • separate but connected
    • API contracts
  • Proposed Kafka as single place to ingest your data
  • Use Kafka as our data bus
  • “Analytics applications” are different than “BI platforms”
    • Data is the product
    • Needs SLAs and uptime commitments (internal or external)
  • Apache Kafka
    • Kafka – not the state of a table but more like the redo log (the true data of the database)
    • capturing events not state
    • sharded big data solution
    • topics – schemaless “table”
    • schema on read not on write
      • If you have schema at all
      • just bytes on a message
      • you choose the serialization
    • topics are replicated by partition across multiple nodes (called brokers in Kafka)
    • Confluent is open source Kafka with add-ons (paid versions are also available)
    • producers put data into Kafka
    • consumers read data from Kafka
    • producers and consumers are decoupled (no need to know who consumes the data to put it in) – see the sketch after this list
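
That decoupling is easiest to see in code. The session didn't show any, so the following is a rough sketch using the kafkajs Node.js client; the broker address, topic, and group id are made up. The producer writes events without knowing who will read them, and a consumer group reads the same topic independently.

// Hypothetical producer/consumer pair using the kafkajs client; all names are examples
const { Kafka } = require('kafkajs');
const kafka = new Kafka({ clientId: 'orders-app', brokers: ['localhost:9092'] });

// Producer: publishes an event to a topic without knowing who consumes it
async function publishOrder(order) {
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: 'orders',
    messages: [{ key: String(order.id), value: JSON.stringify(order) }],
  });
  await producer.disconnect();
}

// Consumer: a separate process with its own group id reads the topic at its own pace
async function runAnalyticsConsumer() {
  const consumer = kafka.consumer({ groupId: 'analytics' });
  await consumer.connect();
  await consumer.subscribe({ topics: ['orders'], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log(topic, partition, message.offset, message.value.toString());
    },
  });
}
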
  • kafka connect
    • framework for writing connectors for kafka
    • sources and sinks (producers and consumers)
  • Kafka knows where you are in the stream using consumer group offsets
  • Can reset your offsets to reload data (a rough sketch is at the end of these notes)
  • makes reloading the data warehouse much easier
  • makes testing new platforms easier
  • Event Hub Cloud Service – Oracle’s Kafka in the cloud
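
On the offset/reload bullets above: since a consumer group's position is just a stored offset, reloading a downstream warehouse can be as simple as re-reading the topic from the beginning. A rough sketch, again with kafkajs and made-up names; Kafka also ships a kafka-consumer-groups tool that can reset an existing group's offsets.

// Hypothetical reload job: read the whole topic again under a fresh group id
const { Kafka } = require('kafkajs');
const kafka = new Kafka({ clientId: 'reload-job', brokers: ['localhost:9092'] });

async function reloadWarehouse() {
  const consumer = kafka.consumer({ groupId: 'warehouse-reload-20171005' });
  await consumer.connect();
  // fromBeginning makes a new group start at offset 0 instead of the latest offset
  await consumer.subscribe({ topics: ['orders'], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      // re-apply every historical event to the rebuilt warehouse target
      console.log('replaying', partition, message.offset);
    },
  });
}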

Thanks to Robin Moffatt (@rmoff) for correcting the notes around schema. He pointed out that with Kafka it is just bytes on a message, so you choose the serialization format and thus whether there is a schema.