A career in IBM CIO means you’ll be part of a team that transforms IBM's capability to deliver to the marketplace. You will seek new possibilities and remain curious. We are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.
IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
As a Junior Site Reliability Engineer, you will work in an agile, collaborative, and dynamic team to build, deploy, configure, and maintain systems for the IBM Internal Developer Experience. In this role, you will lead the problem resolution process for our developers, from analysis and troubleshooting to deploying the latest software updates and fixes, which will grow your expertise in cloud-native operations and DevSecOps practices.
Your primary responsibilities include:
•24x7 Observability: Be part of a worldwide team that monitors the health of production systems and services around the clock, ensuring continuous reliability and optimal customer experience.
•Cross-Functional Troubleshooting: Collaborate with engineering teams to provide initial assessments and possible workarounds for production issues. Troubleshoot and resolve production issues effectively.
•Deployment and Configuration: Leverage Continuous Integration and Continuous Delivery (CI/CD) tools to deploy services and configuration changes at enterprise scale.
•Security and Compliance Implementation: Implement security measures that meet or exceed industry standards and regulations such as GDPR, SOC 2, ISO 27001, PCI, HIPAA, and FBA.
•Maintenance and Support: Apply GitHub Enterprise and Linux security patches and upgrades, support the on-call rotation, and collaborate with GitHub Product Support and other suppliers to resolve issues.
•System Monitoring and Troubleshooting: Strong skills in monitoring/observability, issue response, and troubleshooting for optimal system performance.
•Automation Proficiency: Proficiency in automating production environment changes, streamlining processes for efficiency, and reducing toil (Python, Bash).
•Experience in CI/CD creation and maintenance: Proven track record of configuring pipelines, testing, and vulnerability scanning (Jenkins, GitHub Actions).
•Enterprise-grade platform experience: Experience supporting enterprise-grade platforms (e.g., GitHub Enterprise).
•Source Code Management: Proficiency with github.com and the Git command-line interface (CLI).
•Linux Proficiency: Strong knowledge of Linux operating systems (Debian / RHEL).
•Operation and Support Experience: Demonstrated experience in handling day-to-day operations, alert management, incident support, migration tasks, and break-fix support.
•English: Fluent in written and spoken English.
•Kubernetes/OpenShift: Experience working with production Kubernetes/OpenShift environments is strongly preferred.
•Automation/Scripting: In-depth experience with Ansible, Python, Terraform, and CI/CD tools such as Jenkins and GitHub Actions.
•Monitoring/Observability: Hands-on experience crafting alerts and dashboards with tools such as Instana, New Relic, and Grafana/Prometheus, and querying aggregated logs.