A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.
Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.
IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
As a Senior Engineer (DevOps / SRE), you will be at the forefront of shaping our technological landscape, driving innovation in software delivery and operational excellence. This role allows you to directly impact our engineering practices, helping to ensure our critical systems operate smoothly and continuously improve. Your focus will primarily be on enhancing our internal products and infrastructure, working collaboratively with various teams.
CI/CD Pipeline Management: Design, implement, and manage resilient, scalable, and stable CI/CD pipelines using tools like Tekton, GitLab CI, or GitHub Actions.
Kubernetes Operations: Lead the management and optimization of software deployments on Kubernetes with Helm, ensuring efficient resource utilization, high availability, and fault tolerance.
Observability Strategy: Develop and implement comprehensive observability solutions (logging, monitoring, alerting) for rapid root cause analysis and proactive issue resolution.
SRE Principles: Drive the integration of Site Reliability Engineering (SRE) best practices throughout the software lifecycle, fostering a culture of reliability and operational excellence.
To excel in this multifaceted role, we are seeking individuals who possess a robust foundation in software engineering coupled with a proven track record in building and maintaining resilient, high-performance systems.
Kubernetes Expertise: Experience with Kubernetes, including deployment, management with Helm, troubleshooting, and designing for high availability and resilience.
CI/CD Tooling: Experience with CI/CD tools such as Tekton, GitLab CI, or GitHub Actions for building and releasing software.
Observability Solutions: Experience implementing and managing comprehensive observability solutions (e.g., Prometheus, Grafana, Loki, SignalFx, Splunk) for system health and troubleshooting.
Containerization: Good understanding of containerization technologies (e.g., Docker) and their strategic application in scalable environments.
Programming Skills: Experience with modern programming languages such as Java or Python, with the ability to understand and contribute to existing codebases for operational needs and automation.
Database Experience: Experience with database systems, particularly PostgreSQL (including Amazon Aurora PostgreSQL), covering operational management and troubleshooting.
Beyond the core requirements, candidates with the following experiences will be particularly well-suited to contribute to our advanced initiatives and evolving technological ecosystem.
Cloud Platforms: Experience with major cloud platforms (e.g., IBM Cloud, AWS) and their services for cloud-native application deployment.
Infrastructure as Code: Familiarity with infrastructure as code tools (e.g., Terraform, Ansible) for automating infrastructure.
Network & Security: Strong understanding of network fundamentals, security best practices, and compliance in a cloud-native environment.