Query performance

Last updated: Nov 14, 2022

On this page

  • Tooling
  • clickhouse-client
  • system.query_log
  • EXPLAIN
  • Flame graphs
  • Importance of the page cache
  • Effect on benchmarking
  • Tips for achieving well-performing queries

This document goes over:

  • what tools are available to understand and measure query performance
  • the importance of the page cache
  • general tips and tricks for writing performant queries

Tooling

clickhouse-client

clickhouse-client is a command-line application for running queries against ClickHouse.

When executing queries, it details progress, execution time, how many rows and gigabytes of data were processed, and how much CPU was used.

clickhouse-client progress reporting
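
For example, you can connect to a node and run queries interactively (the host below is a placeholder):

Terminal
# Connect to a ClickHouse node; progress, elapsed time, and rows/bytes processed
# are reported for each query run in the interactive session
clickhouse-client --host localhost

A single query can also be passed non-interactively via the --query flag.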

system.query_log

ClickHouse saves every query it runs into the system.query_log table.

It includes information on:

  • What query was run and when
  • How long it took to execute
  • How many resources it consumed: memory, rows and bytes read
  • Exception information, in case of errors
  • Whether it was part of a distributed query (is_initial_query distinguishes the query sent by the client from queries forwarded to other nodes)

At PostHog, we also often add metadata to the query as a comment to make it easier to analyze later.
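
As a minimal sketch, a comment embedded in the query text is stored verbatim in the query column of system.query_log, so it can be filtered on later (the comment format and table below are illustrative, not PostHog's exact convention):

SQL
/* metadata: team:2, insight:trends */
SELECT count()
FROM events
WHERE team_id = 2

Such queries can then be found with, for example, query LIKE '%insight:trends%'.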

An example query to get recent slow queries:

SQL
SELECT
    query_duration_ms,
    query,
    event_time,
    read_rows,
    formatReadableSize(read_bytes) AS read_size,
    result_rows,
    formatReadableSize(result_bytes) AS result_size,
    formatReadableSize(memory_usage) AS memory,
    columns,
    query_id
FROM system.query_log
WHERE
    query NOT LIKE '%query_log%'
    AND type = 'QueryFinish'
    AND event_time > now() - INTERVAL 3 DAY
    AND query_duration_ms > 30000
ORDER BY query_duration_ms DESC
LIMIT 10

Note that this table is not distributed: in a cluster setting, you might need to run the query against each node separately or do an ad-hoc distributed query.
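
One way to run such an ad-hoc distributed query is the clusterAllReplicas table function (the cluster name below is a placeholder):

SQL
SELECT hostName() AS host, query_duration_ms, query_id
FROM clusterAllReplicas('posthog', system.query_log)
WHERE type = 'QueryFinish'
  AND query_duration_ms > 30000
ORDER BY query_duration_ms DESC
LIMIT 10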

EXPLAIN

Previous pages in this manual showed various examples of using the ClickHouse EXPLAIN statement to your advantage.

Various forms of EXPLAIN can show:

  • Whether, and how much, data ClickHouse was able to avoid processing thanks to the schema setup
  • Whether and how ClickHouse "optimizes" the query by moving columns to PREWHERE
  • How the query is planned to be executed

Read more about EXPLAIN in ClickHouse's EXPLAIN Statement docs.
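
For example, in recent ClickHouse versions EXPLAIN indexes = 1 shows which parts and granules the partition key and primary key allowed ClickHouse to skip (the table and filter below are illustrative):

SQL
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE team_id = 2
  AND timestamp > now() - INTERVAL 7 DAY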

Flame graphs

For CPU-bound calculations, flame graphs can help visualize what ClickHouse worked on during query execution.

We've built flame graph support into PostHog. You can find tools to generate flame graphs for queries under PostHog instance settings.
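
If you need the raw samples instead, ClickHouse's sampling query profiler can be enabled per query; it writes stack traces to system.trace_log, which is the kind of data flame graph tooling is typically built on (the sampling periods and table below are illustrative):

SQL
SELECT count()
FROM events
WHERE team_id = 2
SETTINGS
    query_profiler_real_time_period_ns = 10000000,
    query_profiler_cpu_time_period_ns = 10000000

The collected samples can then be read back from system.trace_log for the query's query_id.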

Importance of the page cache

When running queries, you might encounter an odd artifact: the first time you run a query it's really slow, but it speeds up significantly when run again.

This behavior is due to the page cache in Linux. In broad terms, the operating system caches recently read files in memory, speeding up subsequent reads of the same data.

Since most ClickHouse queries depend on fast I/O to execute quickly, this can have a significant effect on query performance. It's one reason why our ClickHouse nodes at PostHog have a lot of memory available.

Effect on benchmarking

This behavior can be a problem for profiling: users constructing new queries might not hit the page cache and so get a worse experience than benchmarks suggest.

This means it's often important to wipe the page cache when benchmarking queries. This can be done with the following command on a ClickHouse node:

Terminal
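# Drops the kernel page cache (plus dentries and inodes); run on the node being benchmarked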
sudo sh -c "/usr/bin/echo 3 > /proc/sys/vm/drop_caches"

Tips for achieving well-performing queries

Previous pages in the ClickHouse manual have highlighted the importance of setting up the correct schema and detailing how queries work in a distributed setting.

This section highlights some general rules of thumb that can help speed up queries:

  • Make sure the query is supported by the table's ORDER BY key. This is frequently the single biggest factor in query performance.
  • Avoid JOINs and sub-queries whenever possible. Denormalizing data can help here.
  • Partition data so old partitions can be skipped.
  • Push WHERE conditions down into sub-queries as deep as possible, as ClickHouse won't do it for you.
  • Avoid reading columns that are not needed for the query.
  • Test if moving filters on small columns to PREWHERE helps performance (see the sketch after this list).
  • Avoid reading large columns in PREWHERE.
  • Compress columns with appropriate codecs.
  • Make use of the extensive catalogue of aggregate functions available in ClickHouse.
  • Remember that looking up single rows is frequently slow.
  • ORDER BY and LIMIT/OFFSET can be slow in sharded settings.
  • Cheat: pre-aggregate data or leverage materialized columns to pull data out of large columns.
  • Leverage query settings in special cases.
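
As a minimal sketch of the column-pruning and PREWHERE tips above (the table and columns are illustrative, and whether PREWHERE actually helps should be verified with EXPLAIN or system.query_log):

SQL
-- Read only what's needed and filter on a small column in PREWHERE, so the
-- large properties column is only decompressed for rows that pass the filter
SELECT count()
FROM events
PREWHERE team_id = 2
WHERE timestamp > now() - INTERVAL 7 DAY
  AND JSONExtractString(properties, 'plan') = 'enterprise'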

Also, always do your benchmarking in a realistic setting: on large datasets and on powerful machines.

Next in the ClickHouse manual: Operations

Authors

  • Karl-Aksel Puulmann
  • Ian Vanagas
