We are seeking a highly skilled Site Reliability Engineer (SRE) to manage and maintain a large-scale production IT environment that supports IBM product development. The ideal candidate will have a strong background in systems engineering, automation, and troubleshooting, with a focus on ensuring the high availability, scalability, and performance of our critical systems.
Responsibilities
- Manage, maintain, and support data storage solutions and environments.
- Oversee storage provisioning and day-to-day maintenance tasks.
- Provision virtual machines, apply patches and upgrade software.
- Analyze performance data, debug operational issues, perform capacity planning, and ensure data is secure.
- Automate repetitive day-to-day IT tasks.
- Collaborate with a global team to help provide 24x7 support for large-scale production IT environments.
- Think and act like a Site Reliability Engineer (SRE).
- Automate tasks, monitor systems, and respond to incidents to improve system reliability and overall quality.
Qualifications
- Background in computer science (or a related field), or equivalent experience.
- Strong skills in Linux administration/IT operations.
- Experience with structured IT environments and processes (networking, SAN storage, compute, etc.).
- Solid scripting and automation abilities.
- Excellent communication skills for collaborating with clients, customers, third-party support and supply vendors, internal stakeholders, and team members.
- Ability to provide on-call support as needed.
- Curiosity, an inquisitive mindset, and the ability to debug issues down to their root cause.
- Experience with general IT security compliance.
- Experience with IT virtualization: PowerVM, KVM, VMware, and OpenShift Virtualization.
- Experience with IT Storage: Clustered Filesystems, NFS, Backup Systems, SAN Storage, and SAN Fabrics.
- Experience with IBM Products: Storage Scale (GPFS), Storage Protect (TSM), SAN Volume Controller, and POWER Hardware / HMC.
- Experience with Red Hat Products: Enterprise Linux, Directory Server, OpenShift, High Availability, and Ansible.
- Experience with scripting languages such as Python, Perl, and shell.
- Programming/development experience using Git/GitHub for source control.