SRE Lead

United States, New York
Permanent
Job ID: 2495

Job Description


[Up to c. $500k Comp Package | Hybrid Working - 3 Days in Office]


Role Overview

We’re representing a global multi-strategy investment firm seeking an SRE Lead to take ownership of reliability engineering across a business-critical technology estate. This role will lead a distributed team across New York and London, improving production stability, observability, operational discipline and reliability standards across demanding front-office and firmwide platforms.

This is a hands-on technical leadership role, not a purely managerial position. The team is experienced, but the next phase requires someone who can bring structure, cohesion and strategic direction - moving the function from a DevOps-leaning model towards a more mature SRE discipline. You’ll need the technical gravitas to command respect from senior engineers, while working constructively with demanding business stakeholders to deliver a high-quality service. Longer term, this is a strong progression opportunity for someone capable of growing into broader platform engineering leadership.


Key Responsibilities

  • Bring structure to planning, prioritisation, delivery tracking and ownership across the team
  • Establish consistent SRE standards across monitoring, incident response, operational readiness and service ownership
  • Improve observability, alert quality, routing, metrics and performance visibility across the environment
  • Move the team towards a more proactive reliability model, reducing repeat issues and reactive support
  • Partner closely with business users, platform teams and engineering groups to improve service quality and resilience
  • Lead improvements across Kubernetes operations, including reliability, upgrades, capacity, networking and workload stability
  • Own reliability practices around critical distributed systems, including Kafka or similar messaging platforms
  • Strengthen automation, CI/CD and GitOps practices using Terraform, Ansible, GitLab and ArgoCD
  • Drive technical debt reduction and ensure recurring issues are addressed with durable fixes
  • Participate in on-call as a senior escalation point for high-severity production incidents
  • Track utilisation, cost and vendor performance across relevant SRE-owned services


What You’ll Bring…

  • 8-15 years’ experience across SRE, production engineering, platform reliability or infrastructure engineering
  • Proven experience leading senior engineers, either as a formal manager or technical lead
  • Strong technical credibility, with the ability to operate at or above the level of an experienced SRE team
  • Deep hands-on Kubernetes expertise across production operations, troubleshooting, upgrades, networking, RBAC, capacity and workload reliability
  • Strong automation and Infrastructure-as-Code experience using Terraform, Ansible or similar
  • Practical coding ability, ideally in Python, for tooling, automation and workflow improvement
  • Strong observability background, including monitoring standards, alert quality and incident response processes
  • Experience operating distributed systems, ideally Kafka or similar streaming/messaging platforms
  • Familiarity with CI/CD and GitOps workflows, ideally with GitLab, ArgoCD or comparable tooling
  • Experience across hybrid infrastructure environments, with AWS or similar public cloud exposure
  • Strong Linux systems knowledge and broader infrastructure troubleshooting capability
  • Opinionated technical judgement, balanced with the ability to bring others along constructively
  • Service-oriented mindset, with the ability to support demanding business needs while improving long-term platform quality
  • (Preferred) Experience with multi-region or multi-cluster reliability patterns, disaster recovery testing, or continuous service validation
  • (Preferred) Background in financial services, trading, large-scale SaaS or other production-critical environments


...

