This role is responsible for designing, deploying, and maintaining SQL- and NoSQL-based infrastructure that supports high-volume, complex data transactions for specific services or groups of services. The position will be part of the SRE Data team and will work primarily with R&D, SRE, and data analysts. The work includes designing, building, and deploying highly available, robust, resilient, and supportable database solutions that handle large volumes of data transactions. With a focus on the infrastructure and operational aspects of designing and deploying database solutions, the SRE Data Engineer must ensure that the databases are highly available, have sufficient capacity, and are fully resilient across multiple data centres and cloud architectures.
- Manage MySQL and NoSQL databases in Development/QA/Production environments, including installation, configuration, backup, recovery, replication, upgrades, schema changes, etc.
- Perform database health monitoring and diagnostics.
- Integrate monitoring, auditing, and alert systems for databases with existing monitoring infrastructure
- Design, implement, maintain and automate the appropriate backup and recovery architecture as required
- Occasional off-shift availability to resolve Production issues
- Work closely with other members of the SRE and R&D teams
- Responsible for system performance and reliability
- Maintain up-to-date knowledge of developments in database administration and apply them to other major projects and initiatives
- Ensure proactive engagement in the Incident Management process, working with Operational teams to minimize the impact of database outages
- Present recommendations on best practices for data management, data architecture, and data design
- Bachelor’s degree (Computer Science preferred)
- 3+ years’ experience as an SRE Data Engineer working with the following databases:
  - Cassandra
  - MySQL
  - Elasticsearch, OpenSearch, or Couchbase
  - Redshift
- Experience with AWS Cloud environment
- 3+ years of experience with Python
- Experience with configuration management, infrastructure-as-code, and CI/CD tools such as Terraform and Jenkins
- Experience working in production environments requiring 99.99% availability
- Experience deploying, administering, tuning, monitoring, and maintaining database technologies
- Solid experience in database tuning, design, security, backup, recovery, and archival concepts and procedures
- Experience with monitoring systems
- Ability to write ETL jobs in Python or other languages
- Excellent communication skills, including the ability to communicate effectively with technical and non-technical employees and vendors
- Strong problem-solving, testing, and network troubleshooting skills