• Product
  • Pricing
  • Docs
  • Using PostHog
  • Community
  • Company
  • Login
  • Docs

  • Overview
    • Quickstart with PostHog Cloud
    • Overview
      • AWS
      • Azure
      • DigitalOcean
      • Google Cloud Platform
      • Hobby
      • EU Hosting Companies
      • Other platforms
      • Instance settings
      • Environment variables
      • Securing PostHog
      • Monitoring with Grafana
      • Running behind a proxy
      • Configuring email
      • Helm chart configuration
      • Deploying ClickHouse using Altinity.Cloud
      • Configuring Slack
      • Overview
        • Overview
        • Upgrade notes
        • Overview
        • 0001-events-sample-by
        • 0002_events_sample_by
        • 0003_fill_person_distinct_id2
        • ClickHouse
          • Backup
          • Debug hanging / freezing process
          • Horizontal scaling (Sharding & replication)
          • Kafka Engine
          • Resize disk
          • Restore
          • Vertical scaling
        • Kafka
          • Resize disk
          • Log retention
        • PostgreSQL
          • Resize disk
          • Troubleshooting long-running migrations
        • Plugin server
        • MinIO
        • Redis
        • Zookeeper
      • Disaster recovery
    • Troubleshooting and FAQs
    • Support for self-hosting (open-source and enterprise)
    • Managing hosting costs
    • Overview
    • Ingest live data
    • Ingest historical data
    • Identify users
    • User properties
    • Deploying a reverse proxy
    • Library comparison
    • Badge
    • Browser Extensions
      • Snippet installation
      • Android
      • iOS
      • JavaScript
      • Flutter
      • React Native
      • Node.js
      • Go
      • Python
      • Rust
      • Java
      • PHP
      • Ruby
      • Elixir
      • Docusaurus v2
      • Gatsby
      • Google Tag Manager
      • Next.js
      • Nuxt.js
      • Retool
      • RudderStack
      • Segment
      • Sentry
      • Slack
      • Shopify
      • WordPress
      • Message formatting
      • Microsoft Teams
      • Slack
      • Discord
    • To another self-hosted instance
    • To PostHog from Amplitude
    • To PostHog Cloud EU
    • Between Cloud and self-hosted
    • Overview
    • Tutorial
    • Troubleshooting
    • Developer reference
    • Using the PostHog API
    • Jobs
    • Testing
    • TypeScript types
    • Overview
    • POST-only public endpoints
    • Actions
    • Annotations
    • Cohorts
    • Dashboards
    • Event definitions
    • Events
    • Experiments
    • Feature flags
    • Funnels
    • Groups
    • Groups types
    • Insights
    • Invites
    • Members
    • Persons
    • Plugin configs
    • Plugins
    • Projects
    • Property definitions
    • Session recordings
    • Trends
    • Users
    • Data model
    • Overview
    • Data model
    • Ingestion pipeline
    • ClickHouse
    • Querying data
    • Overview
    • GDPR guidance
    • HIPAA guidance
    • CCPA guidance
    • Data egress & compliance
    • Data deletion
    • Overview
    • Code of conduct
    • Recognizing contributions
  • Using PostHog

  • Table of contents
      • Dashboards
      • Funnels
      • Group Analytics
      • Insights
      • Lifecycle
      • Path analysis
      • Retention
      • Stickiness
      • Trends
      • Heatmaps
      • Session Recording
      • Correlation Analysis
      • Experimentation
      • Feature Flags
      • Actions
      • Annotations
      • Cohorts
      • Data Management
      • Events
      • Persons
      • Sessions
      • UTM segmentation
      • Team collaboration
      • Organizations & projects
      • Settings
      • SSO & SAML
      • Toolbar
      • Notifications & alerts
    • Overview
      • Amazon Kinesis Import
      • BitBucket Release Tracker
      • Event Replicator
      • GitHub Release Tracker
      • GitHub Star Sync
      • GitLab Release Tracker
      • Heartbeat
      • Ingestion Alert
      • Email Scoring
      • n8n Connector
      • Orbit Connector
      • Redshift Import
      • Segment Connector
      • Shopify Connector
      • Twitter Followers Tracker
      • Zendesk Connector
      • Airbyte Exporter
      • Amazon S3 Export
      • BigQuery Export
      • Customer.io Connector
      • Databricks Export
      • Engage Connector
      • GCP Pub/Sub Connector
      • Google Cloud Storage Export
      • Hubspot Connector
      • Intercom Connector
      • Migrator 3000
      • PagerDuty Connector
      • PostgreSQL Export
      • Redshift Export
      • RudderStack Export
      • Salesforce Connector
      • Sendgrid Connector
      • Sentry Connector
      • Snowflake Export
      • Twilio Connector
      • Variance Connector
      • Zapier Connector
      • Downsampler
      • Event Sequence Timer
      • First Time Event Tracker
      • Property Filter
      • Property Flattener
      • Schema Enforcer
      • Taxonomy Standardizer
      • Unduplicator
      • Automatic Cohort Creator
      • Currency Normalizer
      • GeoIP Enricher
      • Timestamp Parser
      • URL Normalizer
      • User Agent Populator
  • Tutorials
    • All tutorials
    • Actions
    • Apps
    • Cohorts
    • Dashboards
    • Feature flags
    • Funnels
    • Heatmaps
    • Path analysis
    • Retention
    • Session recording
    • Trends
  • Support
  • Glossary
  • Docs

  • Overview
    • Quickstart with PostHog Cloud
    • Overview
      • AWS
      • Azure
      • DigitalOcean
      • Google Cloud Platform
      • Hobby
      • EU Hosting Companies
      • Other platforms
      • Instance settings
      • Environment variables
      • Securing PostHog
      • Monitoring with Grafana
      • Running behind a proxy
      • Configuring email
      • Helm chart configuration
      • Deploying ClickHouse using Altinity.Cloud
      • Configuring Slack
      • Overview
        • Overview
        • Upgrade notes
        • Overview
        • 0001-events-sample-by
        • 0002_events_sample_by
        • 0003_fill_person_distinct_id2
        • ClickHouse
          • Backup
          • Debug hanging / freezing process
          • Horizontal scaling (Sharding & replication)
          • Kafka Engine
          • Resize disk
          • Restore
          • Vertical scaling
        • Kafka
          • Resize disk
          • Log retention
        • PostgreSQL
          • Resize disk
          • Troubleshooting long-running migrations
        • Plugin server
        • MinIO
        • Redis
        • Zookeeper
      • Disaster recovery
    • Troubleshooting and FAQs
    • Support for self-hosting (open-source and enterprise)
    • Managing hosting costs
    • Overview
    • Ingest live data
    • Ingest historical data
    • Identify users
    • User properties
    • Deploying a reverse proxy
    • Library comparison
    • Badge
    • Browser Extensions
      • Snippet installation
      • Android
      • iOS
      • JavaScript
      • Flutter
      • React Native
      • Node.js
      • Go
      • Python
      • Rust
      • Java
      • PHP
      • Ruby
      • Elixir
      • Docusaurus v2
      • Gatsby
      • Google Tag Manager
      • Next.js
      • Nuxt.js
      • Retool
      • RudderStack
      • Segment
      • Sentry
      • Slack
      • Shopify
      • WordPress
      • Message formatting
      • Microsoft Teams
      • Slack
      • Discord
    • To another self-hosted instance
    • To PostHog from Amplitude
    • To PostHog Cloud EU
    • Between Cloud and self-hosted
    • Overview
    • Tutorial
    • Troubleshooting
    • Developer reference
    • Using the PostHog API
    • Jobs
    • Testing
    • TypeScript types
    • Overview
    • POST-only public endpoints
    • Actions
    • Annotations
    • Cohorts
    • Dashboards
    • Event definitions
    • Events
    • Experiments
    • Feature flags
    • Funnels
    • Groups
    • Groups types
    • Insights
    • Invites
    • Members
    • Persons
    • Plugin configs
    • Plugins
    • Projects
    • Property definitions
    • Session recordings
    • Trends
    • Users
    • Data model
    • Overview
    • Data model
    • Ingestion pipeline
    • ClickHouse
    • Querying data
    • Overview
    • GDPR guidance
    • HIPAA guidance
    • CCPA guidance
    • Data egress & compliance
    • Data deletion
    • Overview
    • Code of conduct
    • Recognizing contributions
  • Using PostHog

  • Table of contents
      • Dashboards
      • Funnels
      • Group Analytics
      • Insights
      • Lifecycle
      • Path analysis
      • Retention
      • Stickiness
      • Trends
      • Heatmaps
      • Session Recording
      • Correlation Analysis
      • Experimentation
      • Feature Flags
      • Actions
      • Annotations
      • Cohorts
      • Data Management
      • Events
      • Persons
      • Sessions
      • UTM segmentation
      • Team collaboration
      • Organizations & projects
      • Settings
      • SSO & SAML
      • Toolbar
      • Notifications & alerts
    • Overview
      • Amazon Kinesis Import
      • BitBucket Release Tracker
      • Event Replicator
      • GitHub Release Tracker
      • GitHub Star Sync
      • GitLab Release Tracker
      • Heartbeat
      • Ingestion Alert
      • Email Scoring
      • n8n Connector
      • Orbit Connector
      • Redshift Import
      • Segment Connector
      • Shopify Connector
      • Twitter Followers Tracker
      • Zendesk Connector
      • Airbyte Exporter
      • Amazon S3 Export
      • BigQuery Export
      • Customer.io Connector
      • Databricks Export
      • Engage Connector
      • GCP Pub/Sub Connector
      • Google Cloud Storage Export
      • Hubspot Connector
      • Intercom Connector
      • Migrator 3000
      • PagerDuty Connector
      • PostgreSQL Export
      • Redshift Export
      • RudderStack Export
      • Salesforce Connector
      • Sendgrid Connector
      • Sentry Connector
      • Snowflake Export
      • Twilio Connector
      • Variance Connector
      • Zapier Connector
      • Downsampler
      • Event Sequence Timer
      • First Time Event Tracker
      • Property Filter
      • Property Flattener
      • Schema Enforcer
      • Taxonomy Standardizer
      • Unduplicator
      • Automatic Cohort Creator
      • Currency Normalizer
      • GeoIP Enricher
      • Timestamp Parser
      • URL Normalizer
      • User Agent Populator
  • Tutorials
    • All tutorials
    • Actions
    • Apps
    • Cohorts
    • Dashboards
    • Feature flags
    • Funnels
    • Heatmaps
    • Path analysis
    • Retention
    • Session recording
    • Trends
  • Support
  • Glossary
  • Docs
  • How PostHog works
  • ClickHouse

ClickHouse

Last updated: Aug 29, 2022

On this page

  • Events
  • kafka_events table
  • events_mv Materialized View
  • writable_events table
  • sharded_events table
  • events table
  • Persons

ClickHouse is our main analytics backend.

Instead of data being inserted directly into ClickHouse, it itself pulls data from Kafka. This makes our ingestion pipeline more resilient towards outages.

The following sections go more into depth in how this works exactly.

Events

In order to make PostHog easy to scale, we use a sharded ClickHouse setup.

clickhouse_events_json topic
reads from
pushes data to
pushes data to
reads from
kafka_events table
(Kafka table engine)
events_mv table
(Materialized view)
writable_events table
(Distributed table engine)
events table
(Distributed table engine)
sharded_events table
(ReplicatedReplacingMergeTree table engine)
Kafka

kafka_events table

kafka_events table uses the Kafka table engine

Tables using this engine set up Kafka consumers that consume data on read queries to the table, advancing the offset for the consumer group in Kafka.

events_mv Materialized View

events_mv table is a Materialized View.

In this case it acts as a data pipe which periodically pulls data from kafka_events and pushes it into the target (events) table.

writable_events table

writable_events table uses the Distributed table engine.

The schema looks something like as follows:

SQL
CREATE TABLE posthog.writable_events (
`uuid` UUID,
`event` String,
`properties` String,
`timestamp` DateTime64(6, 'UTC'),
`team_id` Int64,
`distinct_id` String,
`elements_hash` String,
`created_at` DateTime64(6, 'UTC'),
`_timestamp` DateTime,
`_offset` UInt64,
`elements_chain` String
) ENGINE = Distributed('posthog', 'posthog', 'sharded_events', sipHash64(distinct_id))

This table:

  • Gets pushed rows from events_mv table.
  • For every row, it calculates a hash based on the distinct_id column.
  • Based on the hash, sends the row to the right shard on the posthog cluster into the posthog.sharded_events table.
  • Does not contain materialized columns as they would hinder INSERT queries.

sharded_events table

sharded_events table uses the ReplicatedReplacingMergeTree.

This table:

  • Stores the event data.
  • Is sharded and replicated.
  • Is queried indirectly via the events table.

events table

Similar to writable_events, the events table uses the Distributed table engine.

This table is being queried from app and for every query, figures out what shard(s) to query and aggregates the results from shards.

Note that even though the ReplacingMergeTree engine is used, we should avoid writing duplicate data into the table, as deduplication is not a guarantee.

Persons

The source of truth for person info and person to distinct_id mappings is in PostgreSQL, but to speed up queries we replicate it to ClickHouse. Both tables use the ReplacingMergeTree and collapse by the version column, which is incremented every time a person is updated.

Note that querying both tables requires handling duplicated rows. Check out PersonQuery code for an example of how it's done.

In sharded setups, person and person_distinct_id tables are not sharded and instead replicated onto each node to avoid JOINs over the network.

Questions?

Was this page useful?

Next article

Querying data

This page provides a high-level overview of how queries are run when creating insights in PostHog. Note: This page does not cover all the intricacies of how queries are run in PostHog. Insights counting unique persons This section covers how PostHog determines the number of unique users who performed a certain action, such as when creating a Trends or Funnel insight. In this case, PostHog determines this number by counting the total number of unique person_id 's on events that match your…

Read next article

Authors

  • Tiina Turban
    Tiina Turban
  • Paul Hultgren
    Paul Hultgren

Share

Jump to:

  • Events
  • Persons
  • Questions?
  • Edit this page
  • Raise an issue
  • Toggle content width
  • Toggle dark mode
  • Product

  • Overview
  • Pricing
  • Product analytics
  • Session recording
  • A/B testing
  • Feature flags
  • Apps
  • Customer stories
  • PostHog vs...
  • Docs

  • Quickstart guide
  • Self-hosting
  • Installing PostHog
  • Building an app
  • API
  • Webhooks
  • How PostHog works
  • Data privacy
  • Using PostHog

  • Product manual
  • Apps manuals
  • Tutorials
  • Community

  • Questions?
  • Product roadmap
  • Contributors
  • Partners
  • Newsletter
  • Merch
  • PostHog FM
  • PostHog on GitHub
  • Handbook

  • Getting started
  • Company
  • Strategy
  • How we work
  • Small teams
  • People & Ops
  • Engineering
  • Product
  • Design
  • Marketing
  • Customer success
  • Company

  • About
  • Team
  • Investors
  • Press
  • Blog
  • FAQ
  • Support
  • Careers
© 2022 PostHog, Inc.
  • Code of conduct
  • Privacy policy
  • Terms