An In-Depth Look at Time Series Databases for Metrics and Monitoring

Hello friends! Today we are going to take a deep dive into time series databases and how they enable powerful monitoring and analytics solutions.


What is Time Series Data

A time series is a sequence of data points consisting of numeric measurements and the times at which they occurred. For example:

Server CPU utilization every minute
Daily sales for the last 5 years
Temperature readings from an IoT sensor

This time-indexed data is generated from a wide variety of sources – servers, applications, user interactions, IoT devices, networks, finance systems and more.

Analyzing time series makes it possible to spot trends, performance issues, usage patterns and anomalies. For most monitoring and analytics solutions, time series data is the primary source of truth.

Purpose-Built for Sequential Data

General-purpose relational databases aren't optimized for the append-heavy writes and time-range scans that time-stamped records generate. Time series databases use specialized schemas, data structures and indexing to store and retrieve such records efficiently.

As data ages, many time series databases apply retention policies that automatically roll older data up to coarser resolutions while keeping the most recent data at full resolution. Their design centers on making writes and time-range reads of sequential data fast.
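The roll-up idea can be sketched in a few lines of Python. This is a minimal illustration, assuming points arrive as (timestamp, value) pairs and that averaging is the roll-up function; real databases run this as a background job and often keep min, max and count alongside the mean. The function name `apply_rollup` and its parameters are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def apply_rollup(points, now, raw_window, bucket_seconds):
    """Keep points newer than `raw_window` seconds at full resolution and
    replace older points with one averaged point per `bucket_seconds` bucket.

    `points` is a list of (unix_timestamp, value) tuples (hypothetical format).
    """
    cutoff = now - raw_window
    recent = [(ts, v) for ts, v in points if ts >= cutoff]
    buckets = defaultdict(list)
    for ts, v in points:
        if ts < cutoff:
            # Align each old point to the start of its bucket.
            buckets[ts - ts % bucket_seconds].append(v)
    rolled = [(start, mean(vals)) for start, vals in sorted(buckets.items())]
    return rolled + sorted(recent)
```

For example, with 60-second buckets, two old points at t=0 and t=30 collapse into one point at t=0 holding their average, while points inside the raw window are returned untouched.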

Here are the key requirements that time series databases are designed for:

High ingestion throughput – Ability to add a high volume of new records concurrently without impacting read performance

Low latency queries – Retrieve recent or historical records very fast for real-time monitoring

Data compression – Store more metric history using less storage by efficient encoding

Durability – Preserve data through crashes and node failures, typically via write-ahead logging and replication

Rollups – Automatically aggregate older data to coarser resolution

Scalability – Grow storage and throughput smoothly by adding nodes
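To make the compression requirement concrete, here is a sketch of delta-of-delta timestamp encoding, the technique popularized by Facebook's Gorilla paper and used for timestamps in Prometheus's chunk format. It is a simplified illustration, assuming integer timestamps in increasing order, and it stops at the integer stage; production encoders then pack these small numbers into variable-length bit fields. The function names are hypothetical.

```python
def encode_timestamps(timestamps):
    """Delta-of-delta encode a list of increasing integer timestamps.

    Returns (first timestamp, first delta, list of delta-of-deltas).
    Regularly spaced samples produce runs of zeros, which a bit-level
    encoder can store in very few bits.
    """
    if len(timestamps) < 2:
        return (timestamps[0] if timestamps else None), None, []
    first = timestamps[0]
    first_delta = timestamps[1] - timestamps[0]
    dods = []
    prev_delta = first_delta
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        dods.append(delta - prev_delta)
        prev_delta = delta
    return first, first_delta, dods


def decode_timestamps(first, first_delta, dods):
    """Invert encode_timestamps."""
    if first is None:
        return []
    if first_delta is None:
        return [first]
    out = [first, first + first_delta]
    delta = first_delta
    for dod in dods:
        delta += dod
        out.append(out[-1] + delta)
    return out
```

Scraping every 60 seconds with one late sample, `[1000, 1060, 1120, 1180, 1241]` encodes to `(1000, 60, [0, 0, 1])`: mostly zeros, which is why metrics collected on a fixed interval compress so well.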

Next, let's explore leading open source and commercial time series databases available today.

1. InfluxDB

InfluxDB is a popular open source time series database optimized for fast reads and writes of metrics and events data. Its storage engine was built specifically for time-stamped data (the Time-Structured Merge Tree, or TSM, in the 1.x and 2.x releases).

Key Features

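High ingestion rates, reportedly exceeding 500K points per second on suitable hardware with batched writes
Millisecond response for queries on recent data
Web-based dashboarding and graphing (built into 2.x; via Chronograf in 1.x)
SQL-like InfluxQL query language (plus Flux in 2.x)
HTTP APIs for writing metrics (plus a UDP listener in 1.x)
Built-in data retention and rollup policies
Horizontal scalability through clustering (InfluxDB Enterprise and InfluxDB Cloud)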

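Here is an InfluxDB 1.x write example using line protocol over the HTTP API (mydb is a placeholder database name, and precision=s tells InfluxDB the timestamp is in seconds rather than the default nanoseconds):

# Write a web traffic metric (integer field "value") to database mydb
curl -i -XPOST "http://localhost:8086/write?db=mydb&precision=s" --data-binary 'web.home.traffic value=720i 1501352144'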
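Line protocol is easy to generate programmatically. Below is a minimal Python formatter, assuming InfluxDB's documented rules: tags follow the measurement as comma-separated key=value pairs, integer fields take an `i` suffix, string fields are double-quoted, and commas, spaces and equals signs in names are backslash-escaped. The function name `to_line_protocol` is hypothetical, and it skips edge cases (such as NaN or infinite floats) that the official client libraries handle.

```python
def escape(text, specials):
    """Backslash-escape each character in `specials`."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement, tags, fields, timestamp=None):
    """Format one point as an InfluxDB line protocol string (sketch)."""
    meas = escape(measurement, ", ")
    # Sorting tags by key is recommended by InfluxDB for write performance.
    tag_part = "".join(
        f",{escape(k, ',= ')}={escape(v, ',= ')}" for k, v in sorted(tags.items())
    )
    field_strs = []
    for key, value in fields.items():
        if isinstance(value, bool):  # check bool before int: bool subclasses int
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"  # integer fields need the i suffix
        elif isinstance(value, float):
            rendered = repr(value)  # assumes a finite float
        else:
            rendered = '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'
        field_strs.append(f"{escape(key, ',= ')}={rendered}")
    line = f"{meas}{tag_part} {','.join(field_strs)}"
    if timestamp is not None:
        line += f" {timestamp}"
    return line
```

Calling `to_line_protocol("web.home.traffic", {}, {"value": 720}, 1501352144)` reproduces the payload from the curl command above.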

InfluxDB is purpose-built for recording metrics and events data and is well suited to monitoring IT infrastructure and applications. With clustering in its commercial editions, it can also scale to large environments.

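The open source edition runs as a single node; clustering for high availability and horizontal scale-out, along with some advanced security features, is reserved for InfluxDB Enterprise and InfluxDB Cloud. InfluxDB is a good choice where an open source workload is preferred.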

2. Prometheus

Prometheus consists of multiple components that together form a monitoring pipeline: scraping, storage, querying and alerting (with notifications routed by Alertmanager). Dashboards are usually built in Grafana.

At its core is a local time series database that stores multi-dimensional metrics efficiently, using compressed chunks of samples and explicit staleness handling.

Key Capabilities

Multi-dimensional data model with metrics + labels
A rich query language called PromQL
Pull-based collection of metrics from exporters and instrumented applications
Graphing and dashboarding through Grafana integration
Aimed primarily at machine and infrastructure monitoring
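Pull-based collection means each target exposes its current values over HTTP and Prometheus scrapes them on a schedule. As a rough sketch using only the Python standard library, here is a handler that serves one gauge in the Prometheus text exposition format. The metric name, host label and port 8000 are placeholders; in practice you would normally use the official `prometheus_client` library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value):
    # The text format requires escaping backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge metric family in the text exposition format.

    `samples` maps a tuple of (label, value) pairs to a number.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_gauge(
            "cpu_temp_celsius", "CPU temperature.", {(("host", "web01"),): 61.5}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Running `serve()` and pointing a Prometheus scrape job at `127.0.0.1:8000` would let Prometheus pull `cpu_temp_celsius{host="web01"} 61.5` on each scrape.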

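Here is a sample PromQL query that averages each matching host's CPU temperature over the last hour and then averages those results across hosts (note the =~ regex matcher; a plain = would match the literal string):

avg(avg_over_time(cpu_tempCelsius{host=~"[a-z]+"}[1h]))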
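The two-step semantics of that query (average over time within each series, then average across series) can be mimicked in plain Python. This is an illustrative sketch, not Prometheus code: it assumes samples are (timestamp, value) pairs keyed by host, treats the range as left-open as recent Prometheus releases do, and uses a hypothetical function name.

```python
import re
from statistics import mean


def avg_over_time_then_avg(series, host_regex, end, range_seconds):
    """Mimic avg(avg_over_time(metric{host=~REGEX}[range])) evaluated at `end`.

    `series` maps a host label value to a list of (timestamp, value) samples.
    """
    per_series = []
    for host, samples in series.items():
        # Prometheus regex matchers are fully anchored, hence fullmatch.
        if not re.fullmatch(host_regex, host):
            continue
        window = [v for ts, v in samples if end - range_seconds < ts <= end]
        if window:  # a series with no samples in the range produces no result
            per_series.append(mean(window))
    return mean(per_series) if per_series else None
```

Because matchers are anchored, a host named `host1` does not match `[a-z]+`, which is an easy mistake to make when translating regexes from other tools.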

Prometheus works well for cloud-native setups using short-lived instances and containers. It integrates nicely with Kubernetes environments. The various exporters and integrations provide good coverage for infrastructure layer metrics.

3. TimescaleDB

TimescaleDB is a PostgreSQL extension that adds time series capabilities, including automatic partitioning of tables by time. This allows running complex SQL queries on time-stamped records while optimizing performance under the covers.

TimescaleDB supports PostgreSQL's full SQL interface while providing fast analytics. Features such as native compression and parallel query execution help it make efficient use of modern hardware on very large datasets from sources like IoT sensors, monitoring stats or financial records.
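TimescaleDB calls its time-partitioned tables hypertables and their partitions chunks, with a default chunk interval of seven days. The toy Python model below only borrows that idea, not TimescaleDB's internals: rows land in fixed-width time chunks, and a time-range query skips every chunk whose range cannot overlap it. The class name and its methods are hypothetical.

```python
from collections import defaultdict

CHUNK_SECONDS = 7 * 24 * 3600  # seven-day chunks, mirroring the default interval


class TimePartitionedTable:
    """Toy model of time-based partitioning: rows land in fixed-width chunks."""

    def __init__(self, chunk_seconds=CHUNK_SECONDS):
        self.chunk_seconds = chunk_seconds
        self.chunks = defaultdict(list)  # chunk start -> list of (ts, row)

    def insert(self, ts, row):
        start = ts - ts % self.chunk_seconds
        self.chunks[start].append((ts, row))

    def query(self, t_from, t_to):
        """Return rows with t_from <= ts < t_to and the number of chunks scanned."""
        scanned = 0
        result = []
        for start in sorted(self.chunks):
            # Chunk exclusion: skip chunks whose time range cannot overlap the query.
            if start + self.chunk_seconds <= t_from or start >= t_to:
                continue
            scanned += 1
            result.extend(row for ts, row in self.chunks[start] if t_from <= ts < t_to)
        return result, scanned
```

A query over the last three days touches only the one or two most recent chunks, no matter how many years of history the table holds.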

Key Highlights

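Full SQL interface for advanced queries across time
Faster time-range queries than plain PostgreSQL tables (speedups of up to 100x are claimed for some workloads)
Native columnar compression, applied automatically through compression policies
Multi-node scale-out in earlier releases (since deprecated)
Specialized indexing for time-based data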

Here is a sample SQL query in TimescaleDB that averages a metric per region over the last three days:

SELECT
  region, avg(requests) AS avg_requests
FROM
  metrics
WHERE
  time > now() - interval '3 days'
GROUP BY
  region;

TimescaleDB works very well for storing metrics from applications, microservices, IoT devices and similar sources. Because it speaks standard PostgreSQL SQL, existing reporting and analysis tools can query it directly.

4. Graphite

Graphite consists of 3 software modules:

Carbon – Daemon process that listens for time-series metrics
Whisper – Database library that stores metrics data
Graphite-Web – Renders graphs and dashboards for visualization
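Carbon's plaintext protocol is one line per data point: a dot-separated metric path, a value and a Unix timestamp. A minimal Python sender, assuming a Carbon daemon listening on its usual plaintext port 2003 on localhost; the function names and metric paths are placeholders.

```python
import socket
import time


def format_plaintext(path, value, timestamp=None):
    """Format one metric as '<path> <value> <timestamp>' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    if " " in path:
        raise ValueError("metric paths cannot contain spaces")
    return f"{path} {value} {timestamp}\n"


def send_plaintext(lines, host="localhost", port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))
```

For example, `send_plaintext([format_plaintext("servers.web01.requests", 42)])` pushes a single point; batching many lines per connection is far cheaper than reconnecting for each one.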

It follows a traditional metrics processing pipeline:

Collect – Receive metrics over simple ingestion protocols
Aggregate – Apply rollup rules and per-metric precision and retention schemas
Visualize – Render charts and dashboards
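Whisper stores each metric in a fixed-size file whose layout is set up front by retention rules written as precision:retention pairs (for example `60s:1d,5m:30d` in Carbon's storage-schemas.conf). The Python sketch below parses such a string and estimates the resulting file size, assuming Whisper's on-disk layout of a 16-byte header, a 12-byte descriptor per archive and 12 bytes per point. The function names are hypothetical.

```python
# Seconds per unit suffix (a year counted as 365 days).
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec):
    """Convert a duration such as '60s', '5m' or '30d' to seconds."""
    return int(spec[:-1]) * UNITS[spec[-1]]


def parse_archive(definition):
    """Parse one 'precision:retention' pair into (seconds_per_point, points)."""
    precision, retention = definition.split(":")
    step = int(precision) if precision.isdigit() else to_seconds(precision)
    # A bare number for retention is a point count; a duration is divided by the step.
    points = int(retention) if retention.isdigit() else to_seconds(retention) // step
    return step, points


def whisper_archives(retentions):
    """Parse a comma-separated retention string like '60s:1d,5m:30d'."""
    return [parse_archive(part.strip()) for part in retentions.split(",")]


def whisper_file_size(archives):
    """Estimate file size in bytes from (seconds_per_point, points) archives.

    Assumes a 16-byte header, a 12-byte descriptor per archive and
    12 bytes per point (4-byte timestamp + 8-byte double).
    """
    return 16 + 12 * len(archives) + 12 * sum(points for _, points in archives)
```

With `60s:1d,5m:30d` the file holds 1,440 + 8,640 points and comes to 121,000 bytes, allocated when the metric is first created regardless of how many points actually arrive.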

Graphite focuses mainly on rendering performance metrics visually. It has rich graph rendering capabilities allowing flexible charting with transformations.

Key Features

Flexible visualization with ability to transform, aggregate metrics
Great support for creating custom dashboards
Accepts metrics over simple plaintext and pickle protocols, with optional AMQP support
Minimal dependencies and lightweight footprint

For example, the following target smooths each server's request metric with a moving average over the previous 60 data points:

movingAverage(server*.requests, 60)
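The idea behind that function is a trailing average. The Python sketch below is a simplified version over a plain list of points; Graphite's own implementation also handles the start of the requested range and null values by its own rules, so treat this as an illustration rather than a reimplementation. The function name is hypothetical.

```python
def moving_average(values, window):
    """Trailing simple moving average over the last `window` points.

    None entries (gaps) are skipped; a position whose window holds no
    values yields None.
    """
    out = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        out.append(sum(chunk) / len(chunk) if chunk else None)
    return out
```

For instance, `moving_average([1, 2, 3, 4], 2)` yields `[1.0, 1.5, 2.5, 3.5]`; a larger window smooths more but reacts more slowly to real changes.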

Graphite scales horizontally by running multiple Carbon daemons behind carbon-relay, which routes metrics across them (for example, by consistent hashing). It is a great open source platform for visualizing metrics from varied sources.

5. OpenTSDB

OpenTSDB is a distributed time series database built on top of HBase for storing and serving massive amounts of time series metrics without losing granularity.

It stores data points in HBase tables, using a lookup table of unique IDs (UIDs) to map metric names, tag keys and tag values to compact identifiers in each row key. HBase allows OpenTSDB to scale massively by distributing metrics across nodes.
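As a hypothetical illustration of OpenTSDB's documented HBase schema (3-byte UIDs by default, a 4-byte base timestamp aligned to the hour, then tag key/value UID pairs sorted by tag key), here is how a row key might be assembled in Python. The optional salt prefix and the flag bits in column qualifiers are left out, and the UID numbers are made up; OpenTSDB assigns real ones through its UID table.

```python
import struct

UID_WIDTH = 3  # assumed default width of metric, tag key and tag value UIDs


def uid(n):
    return n.to_bytes(UID_WIDTH, "big")


def row_key(metric_uid, timestamp, tag_uids):
    """Build an OpenTSDB-style row key (sketch).

    Layout: metric UID | base hour timestamp (4 bytes) | (tagk UID, tagv UID)
    pairs sorted by tag key UID. Returns the key and the in-row offset in
    seconds that identifies the data point within the hour.
    """
    base = timestamp - timestamp % 3600
    key = uid(metric_uid) + struct.pack(">I", base)
    for tagk, tagv in sorted(tag_uids):
        key += uid(tagk) + uid(tagv)
    return key, timestamp - base
```

Since every data point for one series within the same hour shares a row key, HBase stores them side by side, and a time-range scan for a metric becomes a contiguous range of rows.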

Features

Designed to store billions of data points
Millisecond timestamp precision
Data retention managed through HBase TTL settings
Horizontal scaling with an HBase cluster
Support for high-cardinality tags
Storage savings through HBase compression codecs (e.g., LZO or Snappy)

This design makes OpenTSDB suitable for recording metrics at massive scale. Unlike classic RRD-style monitoring tools, it keeps data in raw, unaltered form by default and performs rollups and aggregations at query time.

Here is a sample data point written with OpenTSDB's telnet-style put command (metric, Unix timestamp, value, then tags):

put sys.cpu.user 1356998400 42.5 host=webserver01

OpenTSDB makes good use of inexpensive commodity hardware for building enterprise monitoring systems, at the cost of operating an HBase cluster.

6. QuestDB

QuestDB is an open source time series database optimized for ingesting, analyzing and storing massive amounts of machine-generated time series data.

It implements a relational data model while storing data in a custom columnar format optimized for time series workloads. The project reports ingestion rates exceeding 1 million rows per second on a single node.
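The advantage of a columnar, time-ordered layout is that a query reading one column over a time range touches only that column's values and can locate the range by binary search on the timestamp column. Here is a toy Python model of that idea; it is not QuestDB's storage format (which, among other things, has its own handling for out-of-order rows), and the class and method names are hypothetical.

```python
from bisect import bisect_left


class ColumnarTable:
    """Toy column store: one Python list per column, rows ordered by timestamp."""

    def __init__(self, columns):
        self.ts = []
        self.cols = {name: [] for name in columns}

    def append(self, ts, **values):
        # Time-ordered appends keep the timestamp column sorted.
        if self.ts and ts < self.ts[-1]:
            raise ValueError("out-of-order timestamp")
        self.ts.append(ts)
        for name, col in self.cols.items():
            col.append(values.get(name))

    def scan(self, column, t_from, t_to):
        """Values of one column for t_from <= ts < t_to, found by binary search."""
        lo = bisect_left(self.ts, t_from)
        hi = bisect_left(self.ts, t_to)
        return self.cols[column][lo:hi]
```

Averaging `value` over the last hour only reads one list slice, while a row-oriented layout would have to read every column of every row in the range.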

Key highlights:

Relational data model with SQL querying
Columnar storage format for fast analytical scans
Highly efficient data compression
Time-partitioned, append-oriented storage for fast inserts and time-range queries
Optimized for latest generation NVMe SSDs
Embeddable library for edge computing use cases

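Here is an example query returning the last hour of temperature readings (the sensors table and its columns are illustrative):

SELECT * FROM sensors
   WHERE sensor = 'temp' AND timestamp >= dateadd('h', -1, now());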

QuestDB offers excellent write performance coupled with a familiar SQL interface. It works very well for industrial scale telemetry applications.

7. Amazon Timestream

Amazon Timestream is a fast, scalable, serverless time series database offered as a fully managed cloud service. It is purpose-built to collect, store and process the massive amounts of time series data produced by IoT devices, applications, infrastructure and more.

It uses a two-tier storage architecture, a memory store for recent data and a magnetic store for history, which AWS says supports ingesting trillions of events per day and petabyte-scale history storage while keeping queries fast.

Key capabilities:

Fully managed DBaaS removes operational heavy lifting
Native support for time series data structures
Scales writes and storage automatically
Analytics via familiar SQL queries
Data retention policies
Encryption at rest and in transit

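Here is an example query that averages CPU utilization per minute over the last five minutes (the database, table and measure names are placeholders):

SELECT bin(time, 1m) AS minute, avg(measure_value::double) AS avg_cpu
FROM "monitoring"."metrics"
WHERE measure_name = 'cpu'
  AND region = 'us-west'
  AND time > ago(5m)
GROUP BY bin(time, 1m)
ORDER BY minute;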

Amazon Timestream offers a serverless, purpose-built time series database for teams that want AWS to handle capacity planning and operations.

Comparing Platforms

Here is a quick comparison across some popular options:

Database	Data model	License	Implementation language	Compression	Dashboards
InfluxDB	Time series	Open source (MIT) + commercial editions	Go (1.x/2.x), Rust (3.x)	Type-specific encodings	Built-in UI (2.x), Chronograf, Grafana
Prometheus	Multi-dimensional metrics	Apache 2	Go	Gorilla-style chunk encoding	Grafana integration
TimescaleDB	Relational (PostgreSQL extension)	Apache 2 core + Timescale License	C	Multiple algorithms	Grafana (PostgreSQL data source)
Graphite	Metrics pipeline	Apache 2	Python	None (fixed-size Whisper files)	Built-in graphing functions
OpenTSDB	Time series	LGPLv2.1+ / GPLv3+	Java	HBase-level compression	Built-in GUI, Grafana data source
QuestDB	Relational + time series	Apache 2	Java	Filesystem-level (e.g., ZFS)	Web Console, Grafana
Timestream	Time series	Proprietary AWS service	Not published	Managed by AWS	Grafana, Amazon QuickSight

These options range from self-managed open source projects to fully managed cloud services. Beyond the feature list, the main trade-offs are cost, operational complexity and how much control you need.

InfluxDB and TimescaleDB are among the most widely adopted open source time series databases, and both offer Kubernetes deployment options and cloud-hosted services that simplify management.

Architecting Time Series Infrastructure

While individual capabilities vary across offerings, architecting performant, durable and agile time series infrastructure requires bringing together:

Seamless data ingestion pipeline
Storage tier optimized for cost, scale and analytics
Accessibility through standards like SQL and REST
Consuming applications for monitoring like dashboards, notebooks and workflow automation

In the IoT and DevOps world, time series data now drives much of the decision making. Purpose-built time series databases unlock the potential of this vital business data.

Conclusion

Time series databases provide the critical base for building enterprise-wide monitoring and analytics platforms. Their ability to ingest, store and serve large volumes of temporal data makes real-time monitoring practical across very large numbers of data sources and measurements.

Integrating a dedicated time series database unlocks the full potential of metrics-based monitoring in DevOps, IoT, applications or infrastructure. Compared with general-purpose databases or log stores, their time-oriented storage, compression and downsampling typically reduce storage costs, speed up queries and provide deeper visibility.

Leading open source options combined with emerging fully managed cloud services make it possible to focus on business value rather than complex data plumbing. Graphing integrations and SQL access empower entire organizations to benefit from metrics monitoring.

I hope this guide gave you clarity in navigating the time series database landscape. Please let me know if you have any other questions!
