IBM Cloudant NoSQL DB is a globally distributed Database-as-a-Service (DBaaS) designed for developers of large and fast-growing applications. Hundreds of organisations around the world rely on Cloudant to keep their data online 24/7. This includes IBM Cloud itself, where Cloudant underpins many critical Cloud components. We help customers achieve their goals by providing a rock-solid, highly available database that can scale with their needs and replicate their data to many global locations, keeping their applications performant and resilient to failure.
Learn more about IBM Cloudant at https://www.ibm.com/products/cloudant.
Cloudant is looking for a talented Infrastructure Engineer to help manage, evolve and operate our global service infrastructure. The infrastructure team’s role is to keep our bare metal and Kubernetes infrastructure secure, healthy and performant. We play a key role across the product by providing a solid foundation to deliver Cloudant’s serverless database as a service.
As an engineer in the infrastructure team, you’ll be able to develop deep expertise in the technologies that keep a large-scale cloud database online and available. You’ll help build automation to reduce the manual effort of managing our machines, contribute to the day-to-day maintenance of the systems, and ensure our infrastructure provides the right support for Cloudant’s key customer features and security standards. We prioritise engineer growth and have a lot of in-house experience to learn from.
We code primarily in Python and Ruby. Our infrastructure is a mixture of bare metal machines running Debian and Kubernetes clusters, all hosted on IBM’s Cloud. We manage it using Chef and Terraform, along with a lot of homegrown automation to tie it all together.
Over time, you will become a subject matter expert in our infrastructure and help debug and fix service issues. This role involves on-call responsibilities.
· Some experience with managing Linux machines using SSH or configuration management / Infrastructure as Code tooling (eg
· Skill in writing code in a modern backend language (eg, Python, Go, Ruby).
· A focus on creating reliable code using techniques like unit testing and staged rollout.
· Comfortable working with pull requests and continuous integration.
· Experience with observability tooling (eg, Graphite, Prometheus, Grafana).
· Strong written English skills and an ability to work in a distributed team.
· Experience maintaining systems within a compliance environment (eg, financial services, tools such as Auditree).
· Previous experience as an SRE for a large-scale service, especially maintaining database and observability systems.
· Significant experience with Linux, including networking and storage debugging.
· Comfortable working with open-source tools, contributing fixes where needed.