Druid is an open source data store designed for real-time exploratory analytics on large data sets. The system combines a column-oriented storage layout, a distributed, shared-nothing architecture, and an advanced indexing structure to allow for the arbitrary exploration of billion-row tables with sub-second latencies.


Problem and motivation

  • The goal is to provide a multi-tenant, highly available system which serves the functionality of low latency data ingestion and interactive query platform. [main novelty]

Design Decision

  • Low latency
    • Specialize node servers to process real-time information with Message bus for parallelism
  • High Availability
    • Make node servers query-able when external failure happened

Overall Architecture

alt text

System Architecture

  • Real-time node
    • Only concerned with events for some small time range
    • Persists real-time query data in memory and on disk
    • Periodically hand off immutable batches of events to “Deep Storage” (such as HDFS, S3)
  • Message Bus such as Kafka

    alt text

    • Collaborated with real-time node and serve as a producer for event stream.
    • Functionality
      1. Acts as a buffer => fast recover for real-time node
      2. Acts as a single end-point => enables data partition and data replication
  • Historical node
    • Functions to only load, drop, serve immutable data blocks created by real-time nodes
    • Shared-nothing architecture, connect Zookeeper for segment info, cache for fast recovery
    • Features
      • Supports read consistency
      • Available when Zookeeper down(only serve what it have)
      • Grouped into different “Tiers” [with different importance]
  • Broker node
    • Functionality
      • Act as a query router, to route query request to real-time node or historical node, merge partial result before return to client
      • cache results for no re-computation at historical node, never cache real-time result
    • Keeps query-able when Zookeeper down, use last knowledge of clusters to route
  • Coordinator node
    • Functionality
      • In charge of data management and distribution on historical node
    • Use multi-version concurrency control swapping protocol to manage immutable segments
    • Connect to MySQL to maintain segments distribution info
    • Functions as a load balancer , segments replication manager for historical node.
    • Unable to work when Zookeeper down, but not affect data availability

Storage format

  • Data tables in Druid are collections of timestamped(required) events and partitioned into a set of segments
  • Stored in a column orientation for CPU efficiency, associate with different compression methods

References

[1]Druid: A Real-time Analytical Data Store