SRE Team Lead

Europe, United Kingdom, London
Job ID: 1754

Job Description

Our client provides integrated front-to-back-office services to investment managers and financial institutions worldwide. They are looking for a highly talented SRE to lead their UNIX/SRE team. This team supports the global Linux and AWS estate and assists development teams in architecting and automating applications. The successful candidate will lead the team in supporting existing services while evolving and growing it into a modern, highly effective SRE function.

Role Responsibilities:

  • Creation and development of an SRE function from existing sys admin and operations functions
  • Build, manage, and maintain the Linux/Unix-based on-premises and AWS production environments by monitoring availability and taking a holistic view of system health
  • Build and maintain software and systems to manage platform infrastructure and applications
  • Improve reliability, quality, and time-to-market of their suite of software solutions
  • Provide primary operational support and engineering for multiple in-house and COTS software applications
  • Gather and analyse metrics from both operating systems and applications to assist in performance tuning and fault finding
  • Partner with development teams to improve services through rigorous testing and release procedures
  • Participate in system design consulting, platform management, and capacity planning
  • Create sustainable systems and services through automation and uplifts
  • Balance feature development speed and reliability with well-defined service level objectives
  • Provisioning and management of physical and virtual Linux servers, both on-premises and in EC2
  • Investigation and resolution of hardware and software issues, liaising with third parties where necessary
  • Assisting developers with application architecture, AWS-hosted infrastructure, automation and CI/CD
  • Management of AWS infrastructure via Terraform
  • Installation, configuration and management of a number of mostly developer-facing applications such as GitLab, Nexus and Graylog
  • Management and evolution of application monitoring systems
  • Ongoing patching of Linux operating systems, applications and appliances
  • Management of hardware and software refresh initiatives, including scoping, procurement, implementation, migration and decommissioning
  • Monitoring, capacity planning and reporting as required

Technical Experience and Qualifications Required:

  • RHEL system administration and patching
  • Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript
  • Unix shell scripting and automation
  • Strong Python scripting skills
  • Experience with configuration management and infrastructure-as-code tools such as Puppet and Terraform
  • Good working knowledge of core AWS services
  • Good general networking knowledge: TCP/IP, DNS, FTP, SFTP, NTP, iSCSI, etc.
  • Experience with distributed storage technologies such as NFS, HDFS, Ceph and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
  • Nagios (or similar) administration
  • Strong working knowledge of and administrative experience with GitLab
  • NetApp administration
  • Veritas NetBackup
  • IBM WebSphere MQ
