At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let's talk.
At IBM Quantum Software, we are looking for an experienced Data Engineer to help build the foundation of our data lake, which powers data insights on our Quantum Computing platform. The candidate must demonstrate experience in designing and implementing data pipelines, data orchestration platforms, and data catalogs for data warehouses and data lakes. The candidate should be familiar with the principles of data governance, scalable computing, and data quality monitoring, and should have experience building data pipelines that connect to multiple services in a complex organization while adhering to best practices for data privacy and security. This role requires strong communication skills to work with data owners and data consumers, such as business and technical teams. The candidate should also have the organizational skills to define processes and methods for curating and presenting data in an easily consumable fashion, facilitating a self-serve data access pattern in a governed and secure manner.
To be successful as a data engineer, you should have a working knowledge of modern data lake and lakehouse concepts, and be well versed in data orchestration tools that support the extraction, loading, and transformation of data.
* 5+ years of experience building data pipelines on large scale data warehouses or data lake solutions
* Strong experience with data orchestration tools like Apache Airflow, Prefect, Dagster, or DBT
* Strong command of relational databases (PostgreSQL preferred), data modeling, and database design
* Experience delivering complex data pipelines end-to-end, including collaboration with stakeholders to understand the nature of data sources, building scalable processes to persist data in a data lake, and ensuring continued success of pipelines through monitoring and observability
* Experience with data security, access control, and data governance
* Ability to document processes and data, and familiarity with modern data catalog solutions
* Strong command of Python and Python scripting, with experience using Python to build data pipelines
* Familiarity with modern data warehouses and data lakes, including technologies like Presto, Trino, Spark, Iceberg, and/or Hive
* Strong communication skills for interacting with technical and business teams to prepare data assets
- Experience with data pipeline observability
- Experience with streaming technologies like Kafka
- Experience ingesting data from APIs such as GitHub and Airtable
- Infrastructure and CI/CD pipeline experience with technologies like Kubernetes, Helm, ArgoCD, and Terraform
- SQL/NoSQL experience with other databases like MongoDB, IBM Db2, or similar
- Experience with cloud object storage and formats such as Parquet and Avro
- Ability to design and implement a complete lifecycle for data operations (DataOps)