Druid for real-time analysis

Yann Esposito

7 April 2016

Druid the Sales Pitch

Intro

Experience

Real Time?

Demand

Reality

Origin (PHP)

OH NOES PHP! 

1st Refactoring (Node.js)

Lessons Learned

MongoDB the destroyer 

Lessons Learned

Too Slow, Bored

2nd Refactoring

2nd Refactoring (FTW!)

Now we’re talking 

2nd Refactoring: Lessons Learned

Thanks Druid!

Demo

DEMO Time

Pre Considerations

Discovered vs Invented

Try to conceptualize such a system

Analytics: timeseries, alerting system, top N, etc…

In the End

Druid's concepts always emerge naturally

Druid

Who?

Metamarkets

Powered by Druid

Goal

Druid is an open source store designed for real-time exploratory analytics on large data sets.

A hosted dashboard that would allow users to arbitrarily explore and visualize event streams.

Concepts

Key Features

Right for me?

High Level Architecture

Inspiration

Index / Immutability

Druid indexes data to create mostly immutable views.

Storage

Stores data in a custom column format highly optimized for aggregation and filtering.

Specialized Nodes

Druid vs X

Elasticsearch

Key/Value Stores (HBase/Cassandra/OpenTSDB)

Spark

SQL-on-Hadoop (Impala/Drill/Spark SQL/Presto)

Data

Concepts

Indexing

Loading

Querying

Segments

Roll-up

Example

timestamp             page    ... added  deleted
2011-01-01T00:01:35Z  Cthulhu     10      65
2011-01-01T00:03:53Z  Cthulhu     15      62
2011-01-01T01:04:51Z  Cthulhu     32      45
2011-01-01T01:01:00Z  Azatoth     17      87
2011-01-01T01:02:00Z  Azatoth     43      99
2011-01-01T02:03:00Z  Azatoth     12      53

timestamp             page    ... nb added deleted
2011-01-01T00:00:00Z  Cthulhu      2 25    127
2011-01-01T01:00:00Z  Cthulhu      1 32    45
2011-01-01T01:00:00Z  Azatoth      2 60    186
2011-01-01T02:00:00Z  Azatoth      1 12    53

as SQL

SELECT DATE_TRUNC('hour', timestamp) AS timestamp,
       page,
       COUNT(*)     AS nb,
       SUM(added)   AS added,
       SUM(deleted) AS deleted
FROM   events
GROUP BY DATE_TRUNC('hour', timestamp), page

In practice, roll-up can dramatically reduce data size (up to 100x).
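The roll-up above is just a grouped aggregation; a minimal Python sketch of it, using the same rows and columns as the example:

```python
from collections import defaultdict

# Raw events from the example above: (timestamp, page, added, deleted).
rows = [
    ("2011-01-01T00:01:35Z", "Cthulhu", 10, 65),
    ("2011-01-01T00:03:53Z", "Cthulhu", 15, 62),
    ("2011-01-01T01:04:51Z", "Cthulhu", 32, 45),
    ("2011-01-01T01:01:00Z", "Azatoth", 17, 87),
    ("2011-01-01T01:02:00Z", "Azatoth", 43, 99),
    ("2011-01-01T02:03:00Z", "Azatoth", 12, 53),
]

def rollup(rows):
    """Group by (hour, page); count rows and sum each metric."""
    acc = defaultdict(lambda: [0, 0, 0])  # [nb, added, deleted]
    for ts, page, added, deleted in rows:
        hour = ts[:13] + ":00:00Z"  # truncate timestamp to HOUR granularity
        cell = acc[(hour, page)]
        cell[0] += 1
        cell[1] += added
        cell[2] += deleted
    return {key: tuple(cell) for key, cell in acc.items()}

summary = rollup(rows)
print(summary[("2011-01-01T00:00:00Z", "Cthulhu")])  # (2, 25, 127)
```

Six raw rows become four rolled-up rows here; on real traffic with many events per (hour, page) pair the reduction is far larger.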

Segments

Sharding

sampleData_2011-01-01T01:00:00Z_2011-01-01T02:00:00Z_v1_0

timestamp             page    ... nb added deleted
2011-01-01T01:00:00Z  Cthulhu      1 20    45
2011-01-01T01:00:00Z  Azatoth      1 30    106

sampleData_2011-01-01T01:00:00Z_2011-01-01T02:00:00Z_v1_1

timestamp             page    ... nb added deleted
2011-01-01T01:00:00Z  Cthulhu      1 12    45
2011-01-01T01:00:00Z  Azatoth      2 30    80
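At query time Druid scans every shard of an interval and merges their partial aggregates; since counts and sums are additive, that merge is element-wise addition, sketched here for the two shards above:

```python
# Partial aggregates per page from each shard: (nb, added, deleted).
shard_0 = {"Cthulhu": (1, 20, 45), "Azatoth": (1, 30, 106)}
shard_1 = {"Cthulhu": (1, 12, 45), "Azatoth": (2, 30, 80)}

# Merging shards = adding their partial aggregates component-wise.
merged = {
    page: tuple(a + b for a, b in zip(shard_0[page], shard_1[page]))
    for page in shard_0
}
print(merged)  # {'Cthulhu': (2, 32, 90), 'Azatoth': (3, 60, 186)}
```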

Core Data Structure

Segment 

Example

dictionary: { "Cthulhu": 0
            , "Azatoth": 1 }

column data: [0, 0, 1, 1]

bitmaps (one for each value of the column):
value="Cthulhu": [1,1,0,0]
value="Azatoth": [0,0,1,1]

Example (multiple matches)

dictionary: { "Cthulhu": 0
            , "Azatoth": 1 }

column data: [0, [0,1], 1, 1]

bitmaps (one for each value of the column):
value="Cthulhu": [1,1,0,0]
value="Azatoth": [0,1,1,1]
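Filtering on a dimension is then just bitwise operations over these bitmaps; a small sketch using the multi-value example above (Druid actually stores the bitmaps compressed, e.g. as Roaring bitmaps):

```python
# Bitmap indexes from the multiple-matches example above (4 rows).
bitmaps = {
    "Cthulhu": [1, 1, 0, 0],
    "Azatoth": [0, 1, 1, 1],
}

def rows_matching(*values):
    """OR together the bitmaps of the requested values; return row ids."""
    combined = [0] * 4
    for value in values:
        combined = [a | b for a, b in zip(combined, bitmaps[value])]
    return [i for i, bit in enumerate(combined) if bit]

print(rows_matching("Cthulhu"))             # [0, 1]
print(rows_matching("Cthulhu", "Azatoth"))  # [0, 1, 2, 3]
```

An AND filter works the same way with `&` instead of `|`, which is why arbitrary boolean filters stay cheap.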

Real-time ingestion

Batch Ingestion

Real-time Ingestion

Task 1: [   Interval          ][ Window ]
Task 2:                        [                     ]
----------------------------------------------------->
                                                  time
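In the (classic) real-time ingestion spec, the interval and window in the diagram map to `segmentGranularity` and `windowPeriod`; an abridged example, with illustrative values:

```json
{
  "granularitySpec": { "type": "uniform", "segmentGranularity": "HOUR" },
  "tuningConfig": { "type": "realtime", "windowPeriod": "PT10M" }
}
```

Events arriving more than `windowPeriod` after their interval are rejected by the real-time task and have to be picked up later by batch ingestion.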

Querying

Query types

Example(s)

{"queryType": "groupBy",
 "dataSource": "druidtest",
 "granularity": "all",
 "dimensions": [],
 "aggregations": [
     {"type": "count", "name": "rows"},
     {"type": "longSum", "name": "imps", "fieldName": "impressions"},
     {"type": "doubleSum", "name": "wp", "fieldName": "wp"}
 ],
 "intervals": ["2010-01-01T00:00/2020-01-01T00:00"]}

Result

[ {
  "version" : "v1",
  "timestamp" : "2010-01-01T00:00:00.000Z",
  "event" : {
    "imps" : 5,
    "wp" : 15000.0,
    "rows" : 5
  }
} ]
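Queries like the one above are plain JSON POSTed to the Broker over HTTP; a minimal Python sketch (the endpoint and port are Druid's defaults, adjust to your setup):

```python
import json

# The groupBy query from the example above.
query = {
    "queryType": "groupBy",
    "dataSource": "druidtest",
    "granularity": "all",
    "dimensions": [],
    "aggregations": [
        {"type": "count", "name": "rows"},
        {"type": "longSum", "name": "imps", "fieldName": "impressions"},
        {"type": "doubleSum", "name": "wp", "fieldName": "wp"},
    ],
    "intervals": ["2010-01-01T00:00/2020-01-01T00:00"],
}

payload = json.dumps(query)
# POST it to the Broker (default port 8082), e.g. with the requests library:
# import requests
# r = requests.post("http://localhost:8082/druid/v2/", data=payload,
#                   headers={"Content-Type": "application/json"})
# print(r.json())
print(payload[:40])
```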

Caching

Druid Components

Druid

Also

Coordinator

When not to choose Druid

Graphite (metrics)

Graphite

Graphite

Pivot (exploring data)

Pivot 

Pivot

Caravel

Caravel

Caravel

Conclusions

Precompute your time series?

You’re doing it wrong 

Don’t reinvent it

The Druid way is the right way!

  1. Push in kafka
  2. Add the right dimensions
  3. Push in druid
  4. ???
  5. Profit!
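Steps 1-3 amount to emitting one JSON event per fact, carrying a timestamp, the dimensions you will filter and group on, and the raw metrics. A sketch, where the topic, broker address, and field names are all illustrative, and the commented producer lines assume the kafka-python library:

```python
import json

# One event = timestamp + dimensions + raw (un-aggregated) metrics.
event = {
    "timestamp": "2016-04-07T12:00:00Z",
    "page": "Cthulhu",  # dimension: you will filter/group on it
    "added": 10,        # metric: Druid sums it at roll-up time
    "deleted": 2,       # metric
}

payload = json.dumps(event).encode("utf-8")

# With kafka-python (assumption: broker on localhost:9092):
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="localhost:9092")
# producer.send("druid-events", payload)
print(payload)
```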