April 2, 2025 · 8 min read

ClickHouse vs InfluxDB for time-series metrics: what we learned

Real comparison of ClickHouse and InfluxDB for storing time-series metrics. Compression ratios, query performance, cost at scale — from building Kernus on ClickHouse.

clickhouse · influxdb · time-series · database · engineering

When we were building Kernus, one of the first decisions we had to make was how to store time-series metric data. Container metrics — CPU, memory, network, restart counts — are an obvious time-series workload: high write volume, time-range queries, aggregation at multiple resolutions. We evaluated InfluxDB, TimescaleDB, and ClickHouse before committing to ClickHouse. This post is the honest account of what we found — including benchmarks, tradeoffs, and when InfluxDB is actually the better choice.

Why time-series storage is different from regular OLTP

A relational database like PostgreSQL works well for most application data — user records, transactions, config. But time-series data has characteristics that make naive OLTP design perform poorly:

Insert-heavy, rarely updated — metric data is written once and never modified. No updates, no deletes (usually). Just appends.
High cardinality dimensions — imagine storing CPU per container per host per org. You might have 10,000+ unique container IDs, each generating a data point every 30 seconds.
Time-range queries — almost every query is "give me metrics for container X between time A and time B at resolution Y." These queries need to skip non-matching time ranges quickly.
Aggregation at query time — you rarely want the raw data points; you want averages, max values, p95s over rolling windows.
Retention and downsampling — old high-resolution data should be compressed or deleted. You might keep 1-minute resolution for 7 days and 1-hour resolution for 30 days.
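To get a feel for the write volume these characteristics imply, here is a back-of-the-envelope sketch. The 10,000 containers and the 30-second interval come from the example above; the figure of four metrics per container is an assumption for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_day(containers: int, interval_seconds: int, metrics_per_container: int = 1) -> int:
    """Number of data points written per day for a fleet of containers."""
    samples_per_container = SECONDS_PER_DAY // interval_seconds
    return containers * samples_per_container * metrics_per_container


per_metric = points_per_day(containers=10_000, interval_seconds=30)

# Assumed for illustration: four metrics per container (CPU, memory, network, restarts).
all_metrics = points_per_day(containers=10_000, interval_seconds=30, metrics_per_container=4)

print(f"{per_metric:,} points/day for a single metric")
print(f"{all_metrics:,} points/day across four metrics")
```

Even a modest fleet lands in the tens of millions of rows per day, which is why append speed and on-disk size dominate the design.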

General-purpose databases can handle this, but their storage format and query optimizer aren't tuned for it. You end up with slow queries and much larger-than-necessary storage.

InfluxDB: purpose-built, developer-friendly

InfluxDB from InfluxData is the most widely-used purpose-built time-series database. It was designed from the ground up for this use case.

What InfluxDB does well

The data model is intuitive for time-series:

measurement: cpu_usage
  tags: container_id=abc123, host=prod-01, org_id=org_456
  fields: value=72.5
  timestamp: 1705312200000000000

Tags are indexed (fast filtering by container or host), fields are the numeric values you measure.

Flux query language (InfluxDB 2.x) is powerful for time-series operations:

from(bucket: "metrics")
  |> range(start: -24h)
  |> filter(fn: (r) => r.container_id == "abc123")
  |> aggregateWindow(every: 5m, fn: mean)
  |> yield(name: "mean")

This is more readable than PromQL for many queries, and the built-in time-window aggregation functions are excellent.

Native retention policies — define automatic data expiry per bucket. Old data is deleted without manual maintenance.

Easy setup and operation — InfluxDB has a clean UI for writing queries, visualizing data, and managing retention policies. For a smaller team getting started quickly, the onboarding experience is better than ClickHouse.

InfluxDB's limitations

Cost at scale: InfluxDB Cloud pricing is usage-based with costs that compound quickly. Their write/read credit model can get expensive for high-cardinality workloads. InfluxDB OSS (self-hosted) v2 works fine for small deployments but requires operational overhead.

Storage efficiency: In our benchmarks, InfluxDB compressed 1 GB of raw container metrics to approximately 180-240 MB (roughly 4-6x compression). That is excellent by OLTP standards, but ClickHouse did significantly better.

High cardinality limits: InfluxDB's indexed tags have limits. As cardinality explodes (many unique container IDs, org IDs, host IDs), query performance degrades and storage efficiency drops.

Flux adoption: Flux is InfluxDB-specific. Engineers who know SQL need to learn a new language. InfluxDB 3.x is moving back toward SQL, which helps.
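The cardinality point is easier to reason about with a concrete count. In InfluxDB, each distinct combination of measurement and tag values is its own series. The sketch below counts series for a handful of made-up tag sets; the container and host names are illustrative, not production data.

```python
def series_cardinality(measurement: str, tag_sets: list) -> int:
    """Count distinct (measurement, tag set) combinations."""
    return len({(measurement, tuple(sorted(tags.items()))) for tags in tag_sets})


# Illustrative tag sets: a redeployed container gets a new ID, and therefore a new series.
observed = [
    {"container_id": "abc123", "host": "prod-01", "org_id": "org_456"},
    {"container_id": "abc123", "host": "prod-01", "org_id": "org_456"},  # same series again
    {"container_id": "def456", "host": "prod-01", "org_id": "org_456"},
    {"container_id": "def789", "host": "prod-02", "org_id": "org_456"},
]

cardinality = series_cardinality("cpu_usage", observed)
print(f"{cardinality} series from {len(observed)} points")
```

Because container IDs churn on every redeploy, the series count keeps growing even when the number of running containers stays flat.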

ClickHouse: columnar, fast, and surprisingly cheap

ClickHouse is a columnar OLAP database originally developed at Yandex for analytics workloads. It's not a time-series database by design, but its columnar architecture is an excellent fit for time-series data.

The columnar advantage

Traditional row-based databases store each row contiguously on disk:

[timestamp1, container_id_1, cpu_1, memory_1, ...]
[timestamp2, container_id_2, cpu_2, memory_2, ...]
[timestamp3, container_id_1, cpu_3, memory_3, ...]

When you query "average CPU for container_id_1 over the last day," a row store has to read every column of every row it scans, even the columns the query doesn't need.

ClickHouse stores each column separately:

timestamps:     [t1, t2, t3, t4, ...]
container_ids:  [id1, id2, id1, id1, ...]
cpu_values:     [72.5, 45.2, 80.1, 78.9, ...]
memory_values:  [512, 384, 519, 492, ...]

To compute average CPU for container_id_1, ClickHouse reads only the timestamps, container_ids, and cpu_values columns — skipping memory_values, network_rx, and all other columns entirely. For time-series queries that typically access 2-3 of 20+ columns, this is a dramatic I/O reduction.
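The difference can be mimicked in a few lines of Python. This is a toy model of the two layouts, not how either engine is implemented: it counts how many values each layout has to touch to answer the same query over the four sample rows above. The integer timestamps and function names are illustrative.

```python
# Row layout: each record keeps all of its columns together.
rows = [
    {"ts": 1, "container_id": "id1", "cpu": 72.5, "memory": 512},
    {"ts": 2, "container_id": "id2", "cpu": 45.2, "memory": 384},
    {"ts": 3, "container_id": "id1", "cpu": 80.1, "memory": 519},
    {"ts": 4, "container_id": "id1", "cpu": 78.9, "memory": 492},
]

# Column layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def avg_cpu_row_store(rows, container_id, since):
    values_read = 0
    matching = []
    for row in rows:
        values_read += len(row)  # the whole row is read, memory included
        if row["container_id"] == container_id and row["ts"] >= since:
            matching.append(row["cpu"])
    return sum(matching) / len(matching), values_read


def avg_cpu_column_store(columns, container_id, since):
    needed = ("ts", "container_id", "cpu")  # the memory column is never touched
    values_read = sum(len(columns[name]) for name in needed)
    matching = [
        cpu
        for ts, cid, cpu in zip(columns["ts"], columns["container_id"], columns["cpu"])
        if cid == container_id and ts >= since
    ]
    return sum(matching) / len(matching), values_read


row_avg, row_reads = avg_cpu_row_store(rows, "id1", since=1)
col_avg, col_reads = avg_cpu_column_store(columns, "id1", since=1)
print(f"row store:    avg={row_avg:.2f}, values read={row_reads}")
print(f"column store: avg={col_avg:.2f}, values read={col_reads}")
```

With only four columns the saving is small; the gap widens as the table gains columns the query never references.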

ClickHouse compression: the real differentiator

ClickHouse's default LZ4 compression achieves exceptional ratios on columnar time-series data because:

Similar values cluster together in columns (consecutive CPU readings are usually close to each other)
LZ4 combined with delta encoding compresses timestamps and near-constant metric streams extremely well. Delta encoding is not applied by default; it is enabled per column with a codec such as Delta or DoubleDelta.
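The effect of delta encoding is easy to reproduce. The sketch below is an illustration of the idea, not ClickHouse's codec implementation: it uses `zlib` from the Python standard library as a stand-in for LZ4 (which the standard library does not ship), and the 30-second spacing and start timestamp are assumptions.

```python
import struct
import zlib


def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def packed(values):
    # Serialize as little-endian signed 64-bit integers, like a fixed-width column.
    return struct.pack(f"<{len(values)}q", *values)


# Assumed workload: 10,000 timestamps (in seconds), one every 30 seconds.
timestamps = [1705312200 + 30 * i for i in range(10_000)]
deltas = delta_encode(timestamps)

raw_size = len(zlib.compress(packed(timestamps)))
delta_size = len(zlib.compress(packed(deltas)))

print(f"compressed without delta encoding: {raw_size} bytes")
print(f"compressed with delta encoding:    {delta_size} bytes")
```

After delta encoding, the column is almost entirely the constant 30, which a general-purpose compressor reduces to nearly nothing.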

In our production data:

Dataset	Raw size	InfluxDB	ClickHouse	ClickHouse compression ratio
1M CPU metric points	48 MB	~9 MB	~4.8 MB	10:1
1M memory metric points	48 MB	~8 MB	~4.1 MB	11.7:1
1M restart counts	48 MB	~10 MB	~3.2 MB	15:1
Mixed container metrics	1 GB	~180 MB	~95 MB	10.5:1

The 10:1 compression ratio is what made ClickHouse the right choice for Kernus. Storing 30 days of metrics for a Business plan organization costs us roughly $5-15/organization/month in storage — not $50-100.
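The ratios in the table follow directly from the sizes. A quick check, using the table's approximate figures and assuming 1 GB means 1000 MB:

```python
# Sizes in MB as (raw, InfluxDB, ClickHouse), copied from the table above.
# Assumption: the 1 GB mixed dataset is treated as 1000 MB.
datasets = {
    "1M CPU metric points": (48, 9, 4.8),
    "1M memory metric points": (48, 8, 4.1),
    "1M restart counts": (48, 10, 3.2),
    "Mixed container metrics": (1000, 180, 95),
}


def ratio(raw_mb: float, compressed_mb: float) -> float:
    """Compression ratio, rounded to one decimal place."""
    return round(raw_mb / compressed_mb, 1)


for name, (raw, influx, clickhouse) in datasets.items():
    print(f"{name}: InfluxDB {ratio(raw, influx)}:1, ClickHouse {ratio(raw, clickhouse)}:1")
```

On the mixed dataset, the ClickHouse footprint is roughly half of what InfluxDB needed for the same data.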

ClickHouse query performance

For aggregation queries on time-range windows, ClickHouse is extremely fast:

-- Average CPU per 5-minute bucket for a container, last 24 hours
SELECT
    toStartOfFiveMinutes(collected_at) AS bucket,
    avg(cpu_percent) AS avg_cpu,
    max(cpu_percent) AS max_cpu
FROM container_metrics
WHERE
    org_id = 'org_abc123'
    AND container_id = 'ctr_xyz456'
    AND collected_at >= now() - INTERVAL 24 HOUR
GROUP BY bucket
ORDER BY bucket ASC

On a table with 500M rows, this query returns in under 100ms. InfluxDB would be competitive here but ClickHouse typically wins in raw aggregation speed at scale, especially with high-cardinality dimensions.
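For readers less familiar with ClickHouse SQL, the query boils down to "round each timestamp down to its 5-minute window, then aggregate per window". A plain-Python sketch of that logic, with made-up sample points:

```python
from collections import defaultdict
from datetime import datetime, timezone


def five_minute_bucket(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)


def aggregate(points):
    """points: iterable of (timestamp, cpu_percent) -> {bucket: (avg_cpu, max_cpu)}."""
    grouped = defaultdict(list)
    for ts, cpu in points:
        grouped[five_minute_bucket(ts)].append(cpu)
    # Sorting mirrors ORDER BY bucket ASC.
    return {b: (sum(v) / len(v), max(v)) for b, v in sorted(grouped.items())}


def at(hour, minute, second=0):
    return datetime(2025, 4, 2, hour, minute, second, tzinfo=timezone.utc)


# Illustrative samples, not production data.
points = [
    (at(12, 0, 30), 70.0),
    (at(12, 3), 80.0),
    (at(12, 5), 50.0),
    (at(12, 9, 59), 60.0),
]

result = aggregate(points)
for bucket, (avg_cpu, max_cpu) in result.items():
    print(bucket.isoformat(), avg_cpu, max_cpu)
```

ClickHouse performs the same grouping, but over compressed column data and only for the rows that survive the `WHERE` filter.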

ClickHouse's costs and tradeoffs

It's not purpose-built for time-series: You have to build retention yourself with TTL expressions on tables and downsampling with materialized views, and there's no native "continuous query" concept.

-- TTL for automatic data expiry (7 days)
ALTER TABLE container_metrics
MODIFY TTL collected_at + INTERVAL 7 DAY;

The learning curve: ClickHouse is SQL, which helps. But its distributed mode, replication model, and optimization options require deeper expertise than InfluxDB's UI-guided experience.

Operational complexity: ClickHouse clusters are more complex to operate than InfluxDB OSS. For a single-node deployment (what we use), it's straightforward. Multi-node distributed ClickHouse is a different beast.

Grafana support (an outdated concern): ClickHouse used to lack a native Grafana data source, but it now has an official Grafana plugin and solid community support. This is less of a barrier than it was in 2022.

When to choose InfluxDB

You prioritize developer experience over raw storage efficiency — Flux is excellent for time-series queries once you learn it
You need built-in time-series features — retention policies and scheduled downsampling (continuous queries in InfluxDB 1.x, tasks in 2.x) are first-class in InfluxDB
Smaller scale — under 100M data points/month, InfluxDB Cloud is straightforward and affordable
You're already using Telegraf — InfluxData's ecosystem (Telegraf collectors) is mature and plugs into InfluxDB cleanly
IoT and sensor data — InfluxDB was built with IoT in mind; the edge-to-cloud pipeline tools are excellent

When to choose ClickHouse

Storage cost is a primary constraint — roughly 10:1 compression against raw data, about half the footprint we measured with InfluxDB for the same metrics
You have massive data volume — ClickHouse scales to petabytes; InfluxDB OSS doesn't
Your team knows SQL — ClickHouse's SQL dialect is familiar, no new language to learn
You need real-time analytics alongside metrics — ClickHouse can handle complex analytical queries that InfluxDB's Flux isn't designed for
You're building a multi-tenant SaaS — columnar partitioning and aggressive compression make multi-tenant data cost manageable

What we'd do differently

We chose ClickHouse early and it was the right call. The 10:1 compression ratio is the primary reason Kernus can offer pricing at $29-99/month for multi-host monitoring while remaining profitable.

If we were starting today, we'd make the same choice. The one thing we underestimated was the complexity of implementing downsampling via materialized views. With InfluxDB, this is a few lines of configuration. With ClickHouse, it's a set of materialized views with AggregatingMergeTree engines that need careful design.

We use three resolution levels:

Raw data: one point per collection interval (kept for 7-30 days depending on plan)
5-minute averages: kept for 90 days
1-hour averages: kept for 90 days (Enterprise tier; aligns with plan retention)
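To see why the tiers are worth the trouble, it helps to count how many rows each one retains per metric series. The retention windows come from the list above; the 30-second collection interval is an assumption borrowed from the earlier example, not a statement about Kernus's actual interval.

```python
SECONDS_PER_DAY = 86_400


def rows_retained(interval_seconds: int, retention_days: int) -> int:
    """Rows kept per series for a given resolution and retention window."""
    return retention_days * SECONDS_PER_DAY // interval_seconds


# (label, interval in seconds, retention in days)
tiers = [
    ("raw, 7 days (assumed 30 s interval)", 30, 7),
    ("raw, 30 days (assumed 30 s interval)", 30, 30),
    ("5-minute averages, 90 days", 300, 90),
    ("1-hour averages, 90 days", 3600, 90),
]

tier_rows = {label: rows_retained(interval, days) for label, interval, days in tiers}
for label, count in tier_rows.items():
    print(f"{label}: {count:,} rows per series")
```

Under these assumptions, 90 days of 5-minute averages takes about as many rows as 9 days of raw data.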

Implementing this in ClickHouse correctly — without gaps, without double-counting, with efficient storage — took a few days of engineering that we wouldn't have needed with InfluxDB.
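A large part of the subtlety is that averages cannot be combined by averaging them again. ClickHouse addresses this by storing mergeable aggregate states (for example `avgState` written into an AggregatingMergeTree table and read back with `avgMerge`). The sketch below is a conceptual model of such a state in Python, not our actual materialized views; the `AvgState` class and the sample batches are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvgState:
    """Partial state of an average: enough information to merge correctly later."""
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> "AvgState":
        return AvgState(self.total + value, self.count + 1)

    def merge(self, other: "AvgState") -> "AvgState":
        return AvgState(self.total + other.total, self.count + other.count)

    def result(self) -> float:
        return self.total / self.count


def state_of(values) -> AvgState:
    state = AvgState()
    for v in values:
        state = state.add(v)
    return state


# Two insert batches that land in the same 5-minute bucket.
batch_a = [10.0, 20.0, 30.0]
batch_b = [90.0]

# Wrong: averaging the per-batch averages weights both batches equally.
naive = (state_of(batch_a).result() + state_of(batch_b).result()) / 2

# Right: merge the states, then finalize once.
merged = state_of(batch_a).merge(state_of(batch_b)).result()

print(f"average of averages: {naive}")
print(f"merged states:       {merged}")
```

Storing the sum and count separately is what lets partial results from different inserts be combined in any order without skewing the final average.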

For how this storage architecture translates to product pricing: How we built a monitoring SaaS with 90% gross margins.
