A career in IBM Consulting is rooted in long-term relationships and close collaboration with clients across the globe. You'll work with visionaries across multiple industries to improve the hybrid cloud and AI journey for the most innovative and valuable companies in the world. Your ability to accelerate impact and make meaningful change for your clients is
In this role, you'll work in one of our IBM Consulting Client Innovation Centers (Delivery Centers), where we deliver deep technical and industry expertise to a wide range of public and private sector clients around the world. Our delivery centers offer our clients locally based skills and technical expertise to drive innovation and adoption of new technology.
A Data Engineer with expertise in Google's data platforms is responsible for designing, building, and maintaining data engineering solutions on the Google Cloud ecosystem. This includes using services such as Dataproc, Dataflow, Pub/Sub, BigQuery, Bigtable, Cloud Spanner, Cloud SQL, and AlloyDB for batch and real-time data pipelines, data migration, and data layer design.
The Data Engineer should be proficient in Google Cloud Storage, Bigtable, BigQuery, Dataproc with Spark and Hadoop, Dataflow with Apache Beam or Python, and open source technologies such as Apache Airflow, dbt, Spark/Python, or Spark/Scala. Experience developing and managing batch and real-time data pipelines for data warehouses and data lakes, as well as scheduling and managing the data platform with Google Cloud Scheduler and Cloud Composer (Airflow), is essential.
GCP services: Cloud Storage, Pub/Sub, Workflows, Dataflow, Cloud Composer, Firestore, Dataproc.
Data modeling: experience building data models and applying best practices; star and snowflake schemas and their effect on performance; cardinality.
Advanced SQL: window (analytic) functions, derived tables, pivot, rolling sum, dense rank, the difference between ROW_NUMBER and DENSE_RANK, RANK over PARTITION BY, and similar scenarios.
BigQuery: transformations (advanced SQL/PL/SQL, query tuning and optimization, stored procedures); when the query cache cannot be used, for example with non-deterministic functions; partitioning versus clustering, including when to use each and the purpose of each; partitioning on INT64, DATE, and TIMESTAMP columns, and how to partition when no such column is present.
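As an illustration of the ranking and rolling-sum scenarios named in the SQL requirements, the sketch below contrasts ROW_NUMBER, RANK, and DENSE_RANK on tied values. It is only a sketch under assumptions: it runs against an in-memory SQLite database from Python's standard library rather than BigQuery (SQLite 3.25 or later is assumed for window-function support), and the `pay` table and its rows are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample data: (name, dept, salary). Two "eng" rows tie on salary.
rows = [
    ("ana", "eng", 300),
    ("bob", "eng", 300),
    ("cy", "eng", 200),
    ("dee", "ops", 150),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pay (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO pay VALUES (?, ?, ?)", rows)

query = """
SELECT name, dept, salary,
       -- ROW_NUMBER always counts 1, 2, 3...; name breaks the tie deterministically
       ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, name) AS row_num,
       -- RANK gives ties the same value and then skips (1, 1, 3)
       RANK()       OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk,
       -- DENSE_RANK gives ties the same value without skipping (1, 1, 2)
       DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dense_rnk,
       -- Rolling (running) sum of salary within each department
       SUM(salary)  OVER (PARTITION BY dept ORDER BY salary DESC, name
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM pay
ORDER BY dept, salary DESC, name
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

For the tied salaries in the "eng" department, ROW_NUMBER still assigns distinct values, RANK leaves a gap after the tie, and DENSE_RANK does not. The same window syntax is broadly portable to BigQuery standard SQL, though that is not verified here.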