IBM is seeking a Site Reliability Engineer (SRE) to be a key player in ensuring the seamless operation of our quantum computing systems. Working closely with our development teams, you will design, build, and maintain critical systems, creating software and systems to manage, monitor, and scale our quantum computing platforms. Your expertise will contribute to the high availability, optimal performance, and efficient problem resolution of our technology.
· Ensure the reliability, scalability, and high availability of our quantum computing systems.
· Collaborate with development teams to design, deploy, and maintain quantum systems.
· Implement and maintain CI/CD pipelines using modern tools like Concourse, Tekton, and GitLab CI/CD.
· Monitor system performance using Grafana, Sysdig, LogDNA, Datadog, and other tools; troubleshoot and resolve issues.
· Develop and execute monitoring, load, and stress tests to verify system resilience.
· Implement security measures to safeguard system integrity, leveraging tools like Vault.
· Respond to system alerts using PagerDuty and similar tools to ensure swift issue resolution.
· Create and maintain comprehensive system documentation, using GitHub for version control and collaboration.
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience.
- Proven experience as a Site Reliability Engineer or in a similar role in a software development setting.
- In-depth knowledge of at least one of the following languages: Python, Go (Golang), JavaScript, TypeScript, C++, or Rust.
- Proficiency with Kubernetes and familiarity with service mesh technologies like Istio.
- Experience with GitOps and infrastructure-as-code tools such as Argo CD, Ansible, and Terraform.
- Familiarity with cloud platforms like IBM Cloud, AWS, GCP, or Azure.
- Master's degree in Computer Science, Engineering, or a related field.
- Experience in a quantum computing environment.
- Advanced knowledge of Quantum Information Science principles and technologies.
- Experience with Helm for managing Kubernetes applications.
- Familiarity with the principles and practices of DevOps and Agile methodologies.
- Experience automating manual processes and customizing and optimizing CI/CD pipelines.
- Knowledge of database technologies such as PostgreSQL, MySQL, MongoDB, and InfluxDB.
- Certifications related to Kubernetes, Red Hat, or other relevant technologies.