OpenWorld 2017: Kafka, Data Streaming and Analytic Microservices

My company has been wrestling with the need for a more scalable and standardized method of integrating applications for a while.  We have taken steps toward moving to an enterprise data bus to improve how we do this.  Stewart Bryson’s session was a great example of how Kafka can be used to do this.  While his focus was on BI and data integration, he did a great job of showing how Kafka improves the story for other use cases as well.   My notes from his session are:

 

  • History Lesson
    • Traditional Data Warehouse
      • ETL -> Data Warehouse -> Analytics
    • All analytic problems can’t be solved by this paradigm
      • Realtime events
      • Mobile Analytics
      • Search
      • Machine Learning – Old BI prejudices pattern matching
  • Analytics is not just people sitting in front of dashboards making decisions
  • Microservices
    • “The microservice architecture is an approach to developing a single application as a suite of small services, each running its own process and communicating with lightweight mechanisms, often an HTTP resource API.  These services are built around business capabilities and independently deployable by fully automated deployment machinery.” – Martin Fowler
    • Opposite of the Monolith
    • separate but connected
    • API contracts
  • Proposed Kafka as single place to ingest your data
  • Use Kafka as our data bus
  • “Analytics applications” are different than “BI platforms”
    • Data is the product
    • Needs SLAs and up times (internal or external)
  • Apache Kafka
    • Kafka – not state of the table but more like the redo log (true data of the database)
    • capturing events not state
    • sharded big data solution
    • topics – schemaless “table”
    • schema on read not on write
      • If you have schema at all
      • just bytes on a message
      • you choose the serialization
    • topics replicated by partitions across multiple nodes (named broker in kafka)
    • confluent is open sourced kafka with add ons (also have paid versions)
    • producers put data into kafak
    • consumers read data from kafka
    • producers and consumers are decoupled (no need to know who consumes the data to put it in)
  • kafka connect
    • framework for writing connectors for kafka
    • sources and sinks (producers and consumers)
  • kafka knows where you are at in the stream using a consumer group
  • Can reset your offsets to reload data
  • makes reloading data warehouse much easier
  • makes testing new platforms easier
  • Event Hub Cloud service – Oracle’s kafka in the cloud

Thanks to Robin Moffatt (@rmoff) for correcting the notes around schema.  He pointed out that with Kafka it is just bytes on a message, so you choose the serialization format and thus whetehr there is a schema.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s