Site Reliability Engineer (SRE)

Europe, United Kingdom, London, APAC, Singapore
Permanent
Job ID: 1474

Job Description

[c. £150-250k Comp Package (or equivalent), Flexible Working Options]

Our client, a leading global alternative investment firm with more than $63 billion in assets, is seeking a Site Reliability Engineer (SRE) for each of its technology teams in Singapore and London. Both roles offer the chance to significantly influence a business that treats technology and data as core to its operations.

In Singapore, the successful candidate will be the first SRE hire in the Asia-Pacific region, with a unique opportunity to shape the technological foundation and contribute directly to the company's success. This role suits an adaptable technologist eager to tackle a variety of challenges, and it requires a versatile set of skills to navigate and optimise a complex technology environment.

By contrast, the London position calls for a more focused skill set, with an emphasis on platform-centric responsibilities.


Key Responsibilities:

  • Architecting a secure and robust Kubernetes platform to handle very large data volumes and workloads from a diverse technology estate
  • Leading cloud-native, distributed application migration strategies
  • Streamlining on-prem stack integrations with hybrid Kubernetes setups
  • Advocating for and implementing "Infrastructure as Code" principles
  • Automating processes to reduce manual efforts in managing large systems
  • Independently and collaboratively seeking improvements in infrastructure reliability, availability, and performance


Key Requirements:

  • Advanced scripting/coding skills (Python, Golang, Shell)
  • Expertise in Kubernetes, Docker, and cloud-native technologies
  • Strong Linux systems knowledge, RHEL experience preferred
  • Proficiency with configuration management and infrastructure provisioning tools (Ansible, Puppet, Terraform, etc.)
  • Broad understanding of network, server virtualization, and storage technologies
  • Experience with observability systems (Prometheus, ELK, Jaeger)
  • Knowledge of distributed data platforms (Kafka, Flink, Airflow)
  • Proven ability to innovate, grasp new concepts quickly, and think outside the box
  • Commitment to improving system availability, security, and resilience
  • Excellent communication skills for building positive, collaborative relationships across teams and regions
