You will play a pivotal role in developing and supporting application solutions, resolving production issues, and performing testing activities. Collaborating closely with client personnel and other teams, you will contribute to identifying functional requirements and designing tailored solutions.
Responsibilities:
1. Design, build, and maintain highly reliable and scalable systems in a hybrid multi-cloud environment based on Red Hat OpenShift.
2. Collaborate with cross-functional teams to analyze business needs, advise, and design well-engineered information systems.
3. Focus on the automation, reliability, performance, and security of services to ensure a flawless experience for users.
4. Develop and implement strategies for system resilience, disaster recovery, and business continuity.
5. Contribute to the creation of learning plans and training materials that share SRE best practices and practical experience.
6. Mentor and guide junior SREs, helping them broaden their skills and gain visibility into their career paths.
7. Stay updated with the latest trends and advancements in SRE, cloud technologies, and container development.
Requirements:
1. Proven experience as a Site Reliability Engineer or in a similar role, with a strong focus on Infrastructure as Code, public cloud architecture, and container development.
2. Deep skills in multiple domains, including software, programming, data structures, algorithms, and systems.
3. Experience operating software on internal and external infrastructure at scale.
4. Strong understanding of automation, reliability, performance, and security principles.
5. Familiarity with the IBM Cloud platform and its services is a plus.
6. Excellent problem-solving skills and the ability to think holistically about application availability and reliability.
7. Strong collaboration skills and the ability to work effectively with cross-functional teams.
8. Willingness to share knowledge and experiences, contributing to the SRE Center of Excellence.
Technical skills:
- Proficiency in cloud architecture concepts (AWS, Azure, GCP).
- Extensive experience in Infrastructure as Code using tools like Terraform, Ansible, or CloudFormation.
- Expertise in containerization and orchestration using Docker and Kubernetes.
- Advanced knowledge of Red Hat OpenShift or similar container orchestration platforms.
- Strong command of scripting languages such as Python, Bash, or Go.
- Familiarity with monitoring and log aggregation tools such as Prometheus, Grafana, and the ELK Stack.