ClickHouse

Last updated: Nov 14, 2022

Welcome to PostHog's ClickHouse manual.

About this manual

PostHog uses ClickHouse to power our data analytics tooling, and we've learned a lot about it over the years. The goal of this manual is to share that knowledge externally and help people who are starting to work with ClickHouse get up to speed.

If you have extensive ClickHouse experience and want to contribute thoughts or tips of your own, please do so by opening a PR or issue on GitHub!

Consider this manual a companion to other great resources out there:

  • ClickHouse Docs and Knowledge Base
  • Altinity's ClickHouse Knowledge Base
  • Tinybird's curated ClickHouse Knowledge Base

Why ClickHouse

In 2020, we had launched PostHog for the first time and were getting great early traction, but we were struggling with scaling.

To solve this problem, we looked at a wide range of OLAP solutions, including Pinot, Presto, Druid, TimescaleDB, CitusDB, and ClickHouse. Some of our team had used these tools before at other companies, such as Uber, where Pinot and Presto are both used extensively.

While assessing each tool, we looked at three main factors:

  • Speed: Our users want results in real-time, so our new database needed to scale well and give fast results. Ideally, it wouldn’t be too expensive either.
  • Complexity: PostHog users can self-host and install our product themselves, so we didn’t want it to be too complicated for users to manage or deploy. We didn’t want users to have to install an entire Hadoop stack, for example.
  • Query interface: We like standardised tools. We eliminated tools such as Druid because, while it does have a SQL wrapper around it, it’s not exactly SQL. That can get messy.

ClickHouse was a good fit for all of these factors, so we started a more thorough investigation. We read up on benchmarks and researched the experience of companies such as Cloudflare, which uses ClickHouse to process 6 million requests per second. Eventually, we set up a test cluster to run our own benchmarks.

ClickHouse repeatedly performed an order of magnitude better than other tools we considered. We also discovered other perks, such as the fact that it is column-orientated and written in C++. We found these to be the key benefits of ClickHouse:

  • Compression: ClickHouse has excellent compression and the size-on-disk was incredible. ClickHouse even beat out serialization formats such as ORC and Parquet.
  • Process from disk: Some OLAP solutions, like Presto, require data to live in memory. That’s fast, but you need to have a lot of memory for big datasets. ClickHouse processes from disk, which is better for smaller instances too.
  • Real-time data updates: ClickHouse processes data as it arrives, so there’s no need to pre-aggregate data. It’s faster for us and for our users.

Eventually, we decided we knew enough to proceed, so we turned our test cluster into an actual production cluster. It’s just part of how we like to bias for speed.

Now, ClickHouse powers all of our analytics features and we're happy with the path taken.

However, knowledge of how to build on and maintain it is more important than ever, which brings us to this manual.

Manual sections

  • Data storage or what is a MergeTree
  • Data replication and distributed queries
  • Data ingestion
  • Working with JSON
  • Query performance
  • Operations
  • Schema case studies
    • sharded_events
    • app_metrics
    • person_distinct_id

Authors

  • Karl-Aksel Puulmann
  • Ian Vanagas

© 2022 PostHog, Inc.