The IBM Software SaaS SRE team is seeking a full-stack performance and monitoring engineer to help develop solutions that affect the success of a wide range of cloud offerings. The mission involves providing performance and monitoring support and developing software tools that serve hundreds of clients and a large Hybrid Cloud infrastructure. This is an important technical role that requires participating in an evolving culture and working with different teams to deliver software solutions into a continuously available environment.
- Deploy and support system health and performance monitoring solutions
- Design and implement new software features to support the tooling and framework
- Fix bugs reported by numerous internal stakeholders
- Communicate cross-functionally with impacted stakeholders
- Co-manage agents, servers, and solutions used by multiple teams
- Experience with a monitoring solution like Instana or New Relic
- Experience with an RDBMS like MySQL and a time-series database, preferably InfluxDB
- Working knowledge of Red Hat OpenShift (RHOS) or similar containerization technologies
- Working knowledge of Red Hat UNIX/Linux system administration
- Strong knowledge of UNIX/Linux systems shell scripting
- Experience with SCM systems like Git
- Working knowledge of and/or experience with Agile methodologies
- Experience with microservices architecture, regression testing, and continuous deployment
- Software development experience in Java and Python and/or Golang
- Experience with AI-driven automation
- Strong documentation skills
- Experience with Grafana and Site Reliability Engineering (SRE)
- Familiarity with Maximo and/or Tririga would be a plus