Kubernetes Platform Engineer

Europe, United Kingdom, London
Permanent
Job ID: 2325

Job Description


[Up to c. £350k Comp Package | Hybrid Working - 3 Days in Office]


Role Overview

We’re representing a leading quantitative research and technology firm that’s investing heavily in its next-generation infrastructure. As part of this expansion, they’re hiring Kubernetes specialists in London to design, operate, and evolve container platforms that power large-scale research and trading workloads. This is a hands-on role working with complex, high-demand compute environments - from GPU-heavy machine learning pipelines to latency-sensitive trading applications. You’ll be responsible for building secure, automated, and resilient Kubernetes platforms that scale globally, collaborating with engineering and research teams to deliver both immediate impact and long-term technical innovation. The team is open to both engineering-focused and operations-focused backgrounds, with roles available across design, automation, and day-to-day platform reliability.


Key Responsibilities

  • Design, build, and manage Kubernetes-based platforms, ensuring performance, reliability, and security at scale
  • Develop and maintain custom operators, controllers, and automation to extend Kubernetes beyond standard capabilities
  • Implement secure multi-tenant environments through namespace isolation, RBAC, and policy enforcement frameworks
  • Build and maintain CI/CD pipelines with GitOps tooling (e.g. Argo CD, Flux) to ensure safe and auditable changes
  • Use Infrastructure as Code (Terraform, Helm) to deliver consistent and repeatable infrastructure deployments
  • Embed observability into systems from the ground up using Prometheus, Grafana, and OpenTelemetry
  • Monitor and improve platform performance, troubleshooting complex issues across workloads and infrastructure layers
  • Collaborate closely with internal stakeholders - refining requirements, improving developer experience, and challenging assumptions where needed
  • Participate in on-call rotations, owning incidents end-to-end and sharing learnings openly in a blameless culture
  • Capture and share knowledge through clear documentation, runbooks, and post-incident reviews


What You’ll Bring...

  • Strong Linux systems background (5+ years) with practical experience managing containerised platforms
  • Proficiency in Go or Python (experience writing Kubernetes controllers/operators highly valued)
  • Proven ability to implement Kubernetes security best practices, including network segmentation and policy enforcement
  • Strong understanding of Kubernetes internals, including CRDs, RBAC, scheduling, and custom controller patterns
  • Hands-on experience with Helm and GitOps workflows in production environments
  • Experience troubleshooting complex performance and availability challenges across clusters and workloads
  • Familiarity with observability and monitoring stacks such as Prometheus, Grafana, and OpenTelemetry
  • Excellent communication and collaboration skills, comfortable engaging with developers and platform users to deliver improvements
  • Experience writing process documentation, runbooks, or postmortems for production environments
  • Familiarity with cloud or virtualised platforms (AWS EKS, OpenStack, VMware)
  • (Preferred) Exposure to CNIs (e.g. Cilium) or container runtimes
  • (Preferred) Understanding of service-level objectives (SLOs), error budgets, and reliability engineering practices
  • (Preferred) Experience supporting GPU-intensive or HPC-style workloads (e.g. ML pipelines, LLMs, scientific computing)


...



