Site Reliability Engineer, IBM Corporation, Detroit, MI (Up to 40% telecommuting permitted)
- Manage cloud environments for stability, security, and satisfaction to ensure optimal customer experience.
- Set up and maintain multiple cloud staging/production environments in AWS and other major cloud providers.
- Collaborate with application and development teams and the Global Support Organization.
- Establish and enforce standards and procedures for the installation and maintenance of systems and data.
- Oversee and operate global customer environments to meet industry-leading targets for availability and quality.
- Coordinate with cross-functional teams (e.g., product management, engineering, solution architects) to deliver cloud-readiness capabilities and cross-product architectures.
- Observe and understand relevant cloud market trends and services to support the transformation and operation of full-stack enterprise applications in the cloud.
- Collect and review customer requirements to translate them into feature backlogs managed by product management.
- Partner with cloud hyperscalers (Azure, AWS, GCP) to jointly engineer and document cloud architectures, guiding customers on developing, deploying, and running application environments under mission-critical conditions, including security, high availability, recovery, sizing, scalability, and performance.
- Monitor cloud infrastructure, applications, and services to ensure high availability and performance.
- Manage backups, disaster recovery planning, and execution to protect data integrity.
- Develop and maintain scripts to automate cloud operations tasks such as provisioning, configuration, and scaling.
- Diagnose and troubleshoot cloud infrastructure and service issues to maintain reliability.
- Build and manage CI/CD pipelines to streamline software delivery.
- Generate reports on cloud cost trends and provide recommendations to assist stakeholders in decision-making.
- Utilize Kubernetes, API Gateway, Developer Portal, Cloudflare, monitoring tools (Prometheus, Grafana, and Elasticsearch), Terraform, Python, and cloud technology.
Required: Master’s degree or equivalent in Computer Science, Computer Engineering or related (employer will accept a Bachelor's degree plus five (5) years of progressive experience in lieu of a Master’s degree) and one (1) year of experience as an ETL Developer or related. One (1) year of experience must include utilizing Kubernetes, API Gateway, Developer Portal, Cloudflare, monitoring tools (Prometheus, Grafana, and Elasticsearch), Terraform, Python, and cloud technology. $167,421 per year. Full time. V214.