A career in IBM Consulting is rooted in long-term relationships and close collaboration with clients across the globe. You'll work with visionaries across multiple industries to improve the hybrid cloud and AI journey for the most innovative and valuable companies in the world. Your ability to accelerate impact and make meaningful change for your clients is enabled by our strategic partner ecosystem and our robust technology platforms across the IBM portfolio.
As an SRE in IBM Consulting, you'll serve as a leader in defining solutions for clients. You'll identify insights and tasks that can be automated.
You'll have the opportunity to identify areas for improvement in technical processes and propose new ways of working through automation, help our clients resolve their pain points, and, through co-creation, define solutions that improve the efficiency of their operations.
Your primary responsibilities include:
Strategic Design and Analysis of Distributed Systems: Design, analyze, and troubleshoot large-scale distributed systems.
Proactive Reliability Management and Incident Response: Participate in on-call rotation, engage with product teams to fix production outages, and carry forward action items to improve ongoing reliability.
Empowering Tools and Automation for Enhanced Reliability: Develop effective tooling, alerts, and response procedures to identify and address reliability risks, including automated problem detection and mitigation.
A Site Reliability Engineer (SRE) Level 2 is responsible for ensuring the reliability and availability of Cloud's services, both internal and external. They focus on designing, developing, testing, deploying, and maintaining software solutions that improve system reliability and uptime. SREs also automate operations, monitor systems for issues, and quickly address and resolve problems.
Previous experience in:
1. Application monitoring, including telemetry collection, alerting, and SLOs, using tools such as Azure Monitor or equivalent.
2. Configuring and deploying Azure cloud resources, e.g. Azure Kubernetes Service, Cosmos DB, Azure Logic Apps, Azure Functions, Key Vault, Redis Cache, Storage, Service Bus, and Application Gateway.
3. Programming in Bash, PowerShell, and Python.
4. Coding microservices in Java with Spring Boot.
5. Kubernetes and DevOps best practices, including configuring and maintaining CI/CD pipelines using Azure DevOps or equivalent.
- Experience with Grafana and Prometheus
- Troubleshooting skills
- Experience querying and aggregating logs to identify issues
- Team player, able to communicate effectively with multiple teams