• Product
  • Pricing
  • Docs
  • Using PostHog
  • Community
  • Company
  • Login
  • Table of contents

  • Handbook
    • Start here
    • Meetings
    • Story
    • Team
    • Investors
    • Strategy overview
    • Business model
    • Objectives
    • Roadmap
    • Brand
    • Culture
    • Values
    • Small teams
    • Goal setting
    • Diversity and inclusion
    • Communication
    • Management
    • Offsites
    • Security
    • Brand assets
    • Team structure
    • Customer Success
    • Exec
    • Experimentation
    • Growth
    • Infrastructure
    • Marketing
    • People & Ops
    • Pipeline
    • Product Analytics
    • Session Recording
    • Website & Docs
    • Compensation
    • Share options
    • Benefits
    • Time off
    • Spending money
    • Progression
    • Training
    • Side gigs
    • Feedback
    • Onboarding
    • Offboarding
      • Product Manager ramp up
    • Merch store
      • Overview
      • How to interview
      • Engineering hiring
      • Marketing hiring
      • Operations hiring
      • Design hiring
      • Exec hiring
      • Developing locally
      • Tech stack
      • Project structure
      • How we review PRs
      • Frontend coding
      • Backend coding
      • Support hero
      • Feature ownership
      • Working with product design
      • Releasing a new version
      • Handling incidents
      • Bug prioritization
      • Event ingestion explained
      • Making schema changes safely
      • How to optimize queries
      • How to write an async migration
      • How to run migrations on PostHog Cloud
      • Working with ClickHouse materialized columns
      • Deployments support
      • Working with cloud providers
      • How-to access PostHog Cloud infra
      • Developing the website
      • MDX setup
      • Markdown
      • Jobs
      • Overview
      • Data storage or what is a MergeTree
      • Data replication
      • Data ingestion
      • Working with JSON
      • Query performance
      • Operations
        • Overview
        • sharded_events
        • app_metrics
        • person_distinct_id
    • Shipping things, step by step
    • Feature flags specification
    • Setting up SSL locally
    • Tech talks
    • Overview
    • Product metrics
    • User feedback
    • Paid features
    • Releasing as beta
    • Our philosophy
    • Product design process
    • Designing posthog.com
    • Overview
    • Personas
    • Testimonials
    • Value propositions
      • Content & SEO
      • Sponsorship
      • Paid ads
      • Email
      • Press
    • Growth strategy
    • Customer support
    • Inbound sales model
    • Sales operations
      • Managing our CRM
      • YC onboarding
      • Demos
      • Billing
      • Who we do business with
    • Growth reviews
  • Table of contents

  • Handbook
    • Start here
    • Meetings
    • Story
    • Team
    • Investors
    • Strategy overview
    • Business model
    • Objectives
    • Roadmap
    • Brand
    • Culture
    • Values
    • Small teams
    • Goal setting
    • Diversity and inclusion
    • Communication
    • Management
    • Offsites
    • Security
    • Brand assets
    • Team structure
    • Customer Success
    • Exec
    • Experimentation
    • Growth
    • Infrastructure
    • Marketing
    • People & Ops
    • Pipeline
    • Product Analytics
    • Session Recording
    • Website & Docs
    • Compensation
    • Share options
    • Benefits
    • Time off
    • Spending money
    • Progression
    • Training
    • Side gigs
    • Feedback
    • Onboarding
    • Offboarding
      • Product Manager ramp up
    • Merch store
      • Overview
      • How to interview
      • Engineering hiring
      • Marketing hiring
      • Operations hiring
      • Design hiring
      • Exec hiring
      • Developing locally
      • Tech stack
      • Project structure
      • How we review PRs
      • Frontend coding
      • Backend coding
      • Support hero
      • Feature ownership
      • Working with product design
      • Releasing a new version
      • Handling incidents
      • Bug prioritization
      • Event ingestion explained
      • Making schema changes safely
      • How to optimize queries
      • How to write an async migration
      • How to run migrations on PostHog Cloud
      • Working with ClickHouse materialized columns
      • Deployments support
      • Working with cloud providers
      • How-to access PostHog Cloud infra
      • Developing the website
      • MDX setup
      • Markdown
      • Jobs
      • Overview
      • Data storage or what is a MergeTree
      • Data replication
      • Data ingestion
      • Working with JSON
      • Query performance
      • Operations
        • Overview
        • sharded_events
        • app_metrics
        • person_distinct_id
    • Shipping things, step by step
    • Feature flags specification
    • Setting up SSL locally
    • Tech talks
    • Overview
    • Product metrics
    • User feedback
    • Paid features
    • Releasing as beta
    • Our philosophy
    • Product design process
    • Designing posthog.com
    • Overview
    • Personas
    • Testimonials
    • Value propositions
      • Content & SEO
      • Sponsorship
      • Paid ads
      • Email
      • Press
    • Growth strategy
    • Customer support
    • Inbound sales model
    • Sales operations
      • Managing our CRM
      • YC onboarding
      • Demos
      • Billing
      • Who we do business with
    • Growth reviews
  • Handbook
  • Engineering
  • ClickHouse manual
  • Working with JSON

Working with JSON

Last updated: Nov 14, 2022

On this page

  • JSON Strings
  • Compressing JSON
  • Materialized columns
  • Operational notes
  • Alternative solutions
  • Arrays
  • Semi-structured data / JSON data type

At PostHog, we store arbitrary payloads users send us for further analysis as JSON. As such, it's critical we do a good job at storing and analyzing this data.

This document covers:

  • Storing JSON in Strings and operations on them
  • How and why to compress this data
  • Materialized columns
  • Alternative solutions: JSON data type, arrays

JSON Strings

At PostHog, we store JSON data as VARCHAR (or String) columns.

Relevant properties are then parsed out from query-time using JSONExtract functions.

This has the following problems:

  1. These columns end up really large even after compression, meaning slow I/O
  2. It requires CPU parse properties
  3. Data is not stored optimally. As an example, JSON keys are frequently repeated and numbers are stored as strings.

Compressing JSON

Luckily, JSON compresses really well, speeding up reading this data from disk.

By default our JSON columns are compressed by the LZ4 algorithm. See benchmarks for more information and benchmarks.

Materialized columns

ClickHouse has support for Materialized columns which are columns calculated dynamically based off of other columns.

We leverage them to dynamically create new columns for frequently-queried JSON keys to speed up queries as each materialized column is stored the same way as normal columns and requires less resources to read and parse.

Read more in our blog and in this guide for PostHog specific details.

Operational notes

After adding a materialized column, it is only populated for new data and on merges. When querying old data, this can introduce performance regressions, so forcing the column to be written to disk, even for historical data, is recommended.

Materialized columns may cause issues during operations - e.g. they can make copying data between tables painful. It's sometimes worth considering dropping them before large operations.

Alternative solutions

Arrays

Uber published an article on their logging, popularizing the idea to store JSON data as arrays: one for keys, one for values.

However internal benchmarking showed that in our use-case the improvement wasn't big enough to be worth the investment (yet)..

Semi-structured data / JSON data type

In 2022, ClickHouse released support for semi-structured data.

However after testing we encountered several fundamental problems which make this feature unusable in our case until they are resolved: 1, 2, 3, 4 and 5

Next in the ClickHouse manual: Query performance

Questions?

Was this page useful?

Next article

Query performance

This document goes over: what tools are available to understand and measure query performance importance of page cache general tips and tricks for performant queries Tooling clickhouse-client clickhouse-client is a command-line application for running queries against ClickHouse. When executing queries, it details progress, execution time, how many rows and gigabytes of data were processed, and how much CPU was used. system.query_log ClickHouse saves all queries it runs into system.query_log…

Read next article

Authors

  • Karl-Aksel Puulmann
    Karl-Aksel Puulmann
  • Ian Vanagas
    Ian Vanagas

Share

Jump to:

  • JSON Strings
  • Compressing JSON
  • Materialized columns
  • Alternative solutions
  • Arrays
  • Semi-structured data / JSON data type
  • Questions?
  • Edit this page
  • Raise an issue
  • Toggle content width
  • Toggle dark mode
  • Product

  • Overview
  • Pricing
  • Product analytics
  • Session recording
  • A/B testing
  • Feature flags
  • Apps
  • Customer stories
  • PostHog vs...
  • Docs

  • Quickstart guide
  • Self-hosting
  • Installing PostHog
  • Building an app
  • API
  • Webhooks
  • How PostHog works
  • Data privacy
  • Using PostHog

  • Product manual
  • Apps manuals
  • Tutorials
  • Community

  • Questions?
  • Product roadmap
  • Contributors
  • Partners
  • Newsletter
  • Merch
  • PostHog FM
  • PostHog on GitHub
  • Handbook

  • Getting started
  • Company
  • Strategy
  • How we work
  • Small teams
  • People & Ops
  • Engineering
  • Product
  • Design
  • Marketing
  • Customer success
  • Company

  • About
  • Team
  • Investors
  • Press
  • Blog
  • FAQ
  • Support
  • Careers
© 2022 PostHog, Inc.
  • Code of conduct
  • Privacy policy
  • Terms