Name: Techfellow

Observability Platform Engineer(s)

Europe, United Kingdom, London

Permanent

Job ID: 2390

Job Description

[Up to c. £300k Comp Package | Office-Led Working - 3 Days Remote Per Month]

Role Overview

We’re working with a research-driven quantitative technology firm on two remaining opportunities within their Observability platform team. The team is responsible for how telemetry is produced, transported, enriched and consumed across a highly complex, large-scale engineering environment - making Observability a core platform capability, not just a set of tools used after something breaks.

The two roles sit within the same team but carry slightly different emphasis. One is more focused on owning Observability across the full pipeline - OpenTelemetry, Prometheus, telemetry ingestion, backends, SaaS tooling and operational reliability. The other is more software/platform-led, focused on the producer side of Observability - building SDKs, libraries, collectors, integrations and shared engineering patterns that help teams emit high-quality metrics, logs and traces by default...

Key Responsibilities

Design, build and evolve Observability infrastructure across metrics, logs and traces, from telemetry production through to ingestion and backend consumption
Own and improve OpenTelemetry components, including SDKs, collectors, exporters, shared libraries and integrations used across engineering teams
Build reliable telemetry pipelines and data paths that improve consistency, routing, signal quality and long-term operability
Develop shared instrumentation patterns, APIs and “golden paths” that make it easier for teams to emit useful telemetry by default
Work with Prometheus-based systems, including writing and maintaining PromQL queries and improving metric quality
Support Observability platform deployments, migrations and integrations, including modern SaaS Observability tooling where relevant
Deploy and manage code and infrastructure using DevOps practices, including scripting, infrastructure as code and container-based delivery
Partner closely with software, platform and infrastructure teams to embed Observability expectations into service design
Improve incident diagnosis and recovery by expanding coverage, correlation, SLI/SLO thinking and failure analysis across telemetry sources
Contribute to future-looking work around streaming telemetry, event-based architectures, profiling, deeper signal collection and AI-assisted Observability
Take part in a measured on-call rota supporting critical Observability services as the team continues moving towards a stronger SRE model

What You’ll Bring...

Core experience across both roles:

5-10 years’ experience across Observability, platform engineering, DevOps, SRE or software engineering roles in distributed production environments
Genuine Observability depth - not just experience using dashboards or monitoring tools at a surface level
Hands-on OpenTelemetry experience, ideally across SDKs, collectors, instrumentation, libraries, exporters or pipeline design
Strong understanding of metrics, logs and traces, including how telemetry is produced, transported, stored and consumed at scale
Kubernetes experience, including deploying workloads, working with Helm or understanding container-based application patterns
Comfort with DevOps practices, including infrastructure as code, deployment automation and operating production services
Exposure to SRE concepts such as SLIs, SLOs, error budgets, incident reduction and operational resilience
A pragmatic engineering mindset - focused on usability, reliability, adoption and long-term maintainability

For the Observability/platform-focused role:

Strong experience with Prometheus and PromQL, including practical use of Prometheus-based systems in production
Experience owning telemetry pipelines from producers through to ingestion, backend routing and ongoing platform management
Ability to deploy, operate or migrate Observability platforms, including modern SaaS Observability tools
Strong scripting ability, ideally with Python, alongside infrastructure tooling such as Terraform, Ansible or similar
Solid understanding of distributed systems, failure modes, performance bottlenecks and production reliability

For the software/platform-focused role:

Strong software engineering ability in C# and/or Python, with comfort working across both where needed
Experience building shared libraries, SDKs, APIs, collectors or integrations used by multiple engineering teams
Good understanding of software architecture and system design, beyond isolated coding tasks
Ability to work closely with application teams to improve telemetry quality and embed Observability patterns into services
Interest in shaping future tooling direction as the organisation continues to move towards Python

Preferred experience:

Experience with Kafka, event streaming or telemetry pipeline tooling
Exposure to profiling, eBPF-based visibility tooling, synthetic monitoring or deeper runtime Observability
Familiarity with AI-assisted Observability, automated signal analysis or intelligent incident diagnosis

...
