A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.
Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their careers.
IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting to deploying the latest software updates and fixes.
Your primary responsibilities include:
•24x7 Observability: Be part of a worldwide team that monitors the health of production systems and services around the clock, ensuring continuous reliability and optimal customer experience.
•Cross-Functional Troubleshooting: Collaborate with engineering teams to provide initial assessments and possible workarounds for production issues. Troubleshoot and resolve production issues effectively.
•Deployment and Configuration: Use Continuous Integration/Continuous Delivery (CI/CD) tools to deploy services and configuration changes at enterprise scale.
•Security and Compliance Implementation: Implement security measures that meet or exceed industry standards and regulations such as GDPR, SOC 2, ISO 27001, PCI, HIPAA, and FBA.
•Maintenance and Support: Apply Couchbase security patches and upgrades, support Cassandra and MongoDB as part of the pager duty rotation, and collaborate with Couchbase product support to resolve issues.
- 10+ years working in a high-performance engineering team
- Experience in cloud server management and troubleshooting, networking, Windows Server management, AWS cloud and automation, cloud monitoring, GitHub, Kubernetes, and Linux
- 10+ years of working knowledge of one or more operating systems: RHEL, CentOS Linux, or Windows Server
- Working knowledge of ServiceNow, JIRA, Confluent, and GitHub
- In-depth understanding and working knowledge of server technologies
- Working knowledge of how virtualization, network, and storage technologies work in data center and cloud environments
- ITIL Foundation V4 certification is a plus
- Excellent verbal and written communication skills
- Highly responsible, motivated, and able to work with little direction
- Ability to troubleshoot complex problems and customer issues