Site Reliability Engineer
Here we have an opportunity to work with a fintech giant on their site reliability engineering team, which is responsible for building a world-class, fault-tolerant system that supports the entire organisation. This team solves the most challenging engineering problems, builds massively scalable software and systems, and architects low-latency infrastructure solutions.
Joining some of the original SRE pioneers, engineers in this team are responsible for the availability and reliability of the company’s most critical platforms and services, and for ensuring they meet the requirements of internal and external users. You will collaborate with various business and technical teams to build and maintain sustainable production systems.
- Balance feature development velocity and reliability with well-defined SLOs.
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Drive the incident management process and support a blameless post-mortem culture.
- Partner with development teams to improve services via rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Create sustainable systems and services through automation and uplift.
Technical Experience and Knowledge
- BS degree in Computer Science or a related technical field involving coding and/or systems engineering.
- Proficiency in one or more of the following: Go, Python, C, C++, Java, Perl, Ruby, or shell scripting.
- Experience with algorithms, data structures, and software design.
- Experience with UNIX operating system internals and/or networking.
- Experience with distributed systems design, maintenance, and troubleshooting.
- Hands-on experience with debugging and optimising code, as well as automation.
- Coding beyond simple scripts; solving novel problems from first principles.