Site Reliability Engineer

Europe, United Kingdom, London
Job ID: 1474

Job Description

Our client is a disruptive, innovative trading firm with over $50bn of AUM. The firm is passionate about technology and data and has built bleeding-edge hardware and software solutions.

They require an enthusiastic Site Reliability Engineer to join a core team responsible for the technology that underpins everything the business does. This is a position where you’ll have a direct impact on the success of the company. From scaling for the huge volumes of data that drive the research process to improving the reliability and speed of a rapidly evolving application estate, there is a relentless focus on automation and efficiency at scale.

Successful candidates will be those who want to find unique solutions for optimising efficiency and performance in an environment where they are key enablers. Candidates will need deep knowledge of Kubernetes, as the platform is a growing presence and is critical to many parts of the business.

Role Responsibilities:

  • Collaboratively architect a rock-solid and secure Kubernetes platform that can handle the huge volumes of data and load of a diverse technology estate
  • Accelerate the migration strategy to more cloud-native, distributed applications
  • Enhance and simplify the on-prem stack and its integrations with a hybrid Kubernetes setup
  • Create, implement, and evangelise the "Infrastructure as Code" mindset and best practices across the environment
  • Eliminate the toil that emerges with large, distributed systems by automating where possible
  • Work both as an individual contributor and collaboratively to find new ways of improving the reliability, availability, and performance of the infrastructure

Technical Experience and Qualifications Required:

  • Expert level scripting/coding skills in one or more languages (Python/Golang/Shell etc.)
  • Expert in cloud-native and containerisation technologies (Kubernetes/Docker)
  • Excellent Linux systems knowledge (experience with RHEL desirable)
  • Experience with configuration management tools (Ansible/Puppet/Kapitan/Terraform etc.)
  • Broad knowledge across network technologies, server virtualisation and storage
  • Experience with observability systems (Prometheus/ELK/Jaeger etc.)
  • Experience with distributed data platforms (Kafka/Flink/Airflow etc.)
  • Self-starter, able to quickly pick up concepts, implement new ideas and think outside the box
  • Focused on improving system availability, security and resilience through testing, standardisation and automation
