People
- Yakko Majuri (Full Stack Engineer)
- Karl-Aksel Puulmann (Full Stack Engineer)
- Harry Waye (Full Stack Engineer)
- Tiina Turban (Full Stack Engineer)
Mission
Provide the best events pipeline in the world.
Objectives
Objective: Events are ingested and processed reliably, accurately, and quickly on all deployments
- Key results:
- End-to-end P95 time to ingest events is under 1m30s for "normal events" and 2m30s for "buffer events" (note: includes 60s error bars)
- We ingest 99.999% of valid events
- Limit dead letter queue usage
- Ingestion on self-hosted deployments is horizontally scalable
- Why?
- For customers to trust our product they need to be confident in our ability to handle their volumes reliably
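As a rough illustration of how the latency and ingestion-rate key results above could be checked, here is a minimal Python sketch. It is not our actual monitoring code: the function names, the nearest-rank percentile method, and the input shapes are all assumptions made for the example, and it does not model the 60s error bars.

```python
import math

# Targets copied from the key results above, in seconds.
# The dictionary keys are hypothetical labels for the two event kinds.
P95_TARGET_SECONDS = {"normal": 90, "buffer": 150}


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_target(latencies_seconds, event_kind):
    """True if the P95 end-to-end latency is within the target for this event kind."""
    p95 = percentile_nearest_rank(latencies_seconds, 95)
    return p95 <= P95_TARGET_SECONDS[event_kind]


def meets_ingestion_target(valid_events, ingested_events):
    """True if at least 99.999% of valid events were ingested.

    Uses integer arithmetic to avoid floating-point rounding at the boundary.
    """
    return ingested_events * 100_000 >= valid_events * 99_999
```

For example, `meets_ingestion_target(1_000_000, 999_990)` holds, which makes the 99.999% target concrete: at most 10 events lost per million valid events.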
Objective: MVP for PostHog Customer Data Platform (CDP) with 5 happy customers
- Key results:
- Nail data exports
- Guaranteed job execution (99.999% of jobs processed)
- (Automated) testing framework for apps
- CDP UX
- Pipeline metrics
- Destination apps event filtering
- (Stretch) CDP-like UI
- Why?
- By removing the need for other CDPs, PostHog can save customers money and potentially create a new revenue line.
Responsibilities
Team Ingestion owns our ingestion pipeline end-to-end. That means we own the Django server ingestion API, the ingestion (plugin) server, and our client libraries, as well as the Kafka and ClickHouse setup where it pertains to event ingestion.
Our work generally falls into one of three categories:
Scaffolding to support core PostHog features
In order to achieve company goals or introduce new features (often owned by other teams), changes to our ingestion pipeline may be required.
An example of this is the work to remodel our events to store person and group data, which is essential to ensuring we can provide fast querying for users.
While querying data is not owned by this team, the change to enable faster queries inevitably requires a large restructuring of our events pipeline, and thus we are owners of that component of the project.
In short, a core responsibility of our team is to enable other teams to be successful.
Ingestion robustness
On the road to providing the best events pipeline in the world, we need to build a system that is robust.
To do so, we must ensure:
- Reliability: We should not lose events, and the events we ingest should be correct
- Scalability: We should be able to scale to massive event volumes
- Maintainability: It should be easy to debug and contribute to our ingestion pipeline
Thus, it is our responsibility to consistently revise our past decisions and improve processes where we see fit, from client library behaviors to ClickHouse schemas.
Extensibility
Our ingestion pipeline is powerful because it allows plugins to be built on top of it to do things like transform and export events, as well as import data from third parties.
It is our responsibility to ensure that the extensibility of the pipeline does not interfere with ingestion robustness, as well as:
- Build new features to support plugin developers in building more powerful tools
- Ensure a delightful experience for plugin developers
How do we work?
We run a quick 15-minute standup on Mondays, Wednesdays, and Fridays, and extend the slot if we feel the need to have a longer synchronous discussion about a specific topic. We document every standup on this doc.
We are happy to sync anytime if we feel it is important to do so. This is generally coordinated on Slack where someone will spontaneously drop a Zoom link. Some of the reasons we sync include: debugging outages, sharing context (including shadowing), making decisions when there's been a deadlock, and pairing sessions.
We work as a team. Our priorities are owned by the team, and we work together towards the same overall goal every sprint. It is inevitable that sometimes tasks will fall on one person or another, but we try hard to share context and collaborate as much as possible.