Kubernetes Platform Engineer
Job Description
[Up to c. £350k Comp Package | Hybrid Working - 3 Days in Office]
Role Overview
We’re representing a leading quantitative research and technology firm that’s investing heavily in its next-generation infrastructure. As part of this expansion, they’re hiring Kubernetes specialists in London to design, operate, and evolve container platforms that power large-scale research and trading workloads. This is a hands-on role working with complex, high-demand compute environments - from GPU-heavy machine learning pipelines to latency-sensitive trading applications. You’ll be responsible for building secure, automated, and resilient Kubernetes platforms that scale globally, collaborating with engineering and research teams to deliver both immediate impact and long-term technical innovation. The team is open to both engineering-focused and operations-focused backgrounds, with roles available across design, automation, and day-to-day platform reliability.
Key Responsibilities
- Design, build, and manage Kubernetes-based platforms, ensuring performance, reliability, and security at scale
- Develop and maintain custom operators, controllers, and automation to extend Kubernetes beyond standard capabilities
- Implement secure multi-tenant environments through namespace isolation, RBAC, and policy enforcement frameworks
- Build and maintain CI/CD pipelines with GitOps tooling (e.g. Argo CD, Flux) to ensure safe and auditable changes
- Use Infrastructure as Code (Terraform, Helm) to deliver consistent and repeatable infrastructure deployments
- Embed observability into systems from the ground up using Prometheus, Grafana, and OpenTelemetry
- Monitor and improve platform performance, troubleshooting complex issues across workloads and infrastructure layers
- Collaborate closely with internal stakeholders - refining requirements, improving developer experience, and challenging assumptions where needed
- Participate in on-call rotations, owning incidents end-to-end and sharing learnings openly in a blameless culture
- Capture and share knowledge through clear documentation, runbooks, and post-incident reviews
What You’ll Bring
- Strong Linux systems background (5+ years) with practical experience managing containerised platforms
- Proficiency in Go or Python (experience writing Kubernetes controllers/operators highly valued)
- Proven ability to implement Kubernetes security best practices, including network segmentation and policy enforcement
- Strong understanding of Kubernetes internals, including CRDs, RBAC, scheduling, and custom controller patterns
- Hands-on experience with Helm and GitOps workflows in production environments
- Experience troubleshooting complex performance and availability challenges across clusters and workloads
- Familiarity with observability and monitoring stacks such as Prometheus, Grafana, and OpenTelemetry
- Excellent communication and collaboration skills, comfortable engaging with developers and platform users to deliver improvements
- Experience documenting processes, runbooks, or postmortems in production environments
- Familiarity with cloud or virtualised platforms (AWS EKS, OpenStack, VMware)
- (Preferred) Exposure to CNIs (e.g. Cilium) or container runtimes
- (Preferred) Understanding of service-level objectives (SLOs), error budgets, and reliability engineering practices
- (Preferred) Experience supporting GPU-intensive or HPC-style workloads (e.g. ML pipelines, LLMs, scientific computing)