A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.
Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.
We are seeking a skilled Spark Developer to join our IBM Software team. As part of our team, you will be responsible for developing and maintaining high-quality software products, working with a variety of technologies and programming languages.
IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
· Design, implement, and maintain distributed data processing pipelines using Apache Spark (Core, SQL, Streaming) and Scala (a brief illustrative sketch follows this list).
· Work closely with data architects and business teams to develop efficient, scalable, and high-performance data solutions.
· Write clean, testable, and well-documented code that meets enterprise-level standards.
· Perform Spark performance tuning and optimization across batch and streaming jobs.
· Integrate data from multiple sources, including relational databases, APIs, and real-time data streams (Kafka, Flume, etc.).
· Collaborate in Agile development environments, participating in sprint planning, reviews, and retrospectives.
· Troubleshoot production issues and provide timely fixes and improvements.
· Create unit and integration tests to ensure solution integrity and stability.
· Mentor junior developers and help enforce coding standards and best practices.
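To illustrate the kind of pipeline work described above, here is a minimal sketch of a Spark batch job in Scala. The application name, input/output paths, and column names (customer_id, order_ts, amount) are placeholder assumptions for illustration only and are not part of the role description.

// Minimal sketch of a Spark batch pipeline in Scala, assuming a Spark 3.x
// runtime; all paths and column names are hypothetical placeholders.
import org.apache.spark.sql.{SparkSession, functions => F}

object OrdersPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-pipeline")
      .getOrCreate()

    // Read raw records (path is a placeholder for illustration only).
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/raw/orders.csv")

    // Aggregate daily revenue per customer using Spark SQL functions.
    val dailyRevenue = orders
      .groupBy(F.col("customer_id"), F.to_date(F.col("order_ts")).as("order_date"))
      .agg(F.sum("amount").as("daily_revenue"))

    // Write the result as Parquet, partitioned by date (placeholder path).
    dailyRevenue.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("/data/curated/daily_revenue")

    spark.stop()
  }
}

In practice, jobs in this role would also cover streaming sources, testing, and performance tuning, as outlined in the responsibilities above.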
- 9+ years of experience in software development with a strong background in Scala and functional programming.
- 4-5+ years of recent hands-on experience in Apache Spark development (RDD, DataFrames, Datasets, Spark SQL); a short illustrative example of these APIs appears after this list.
- Experience with data storage solutions such as HDFS, Hive, Parquet, ORC, or NoSQL databases.
- Solid understanding of data modeling, data wrangling, and data quality best practices.
- Strong understanding of distributed systems, big data architecture, and performance tuning.
- Hands-on experience with at least one cloud platform (AWS, Azure, or GCP).
- Familiarity with CI/CD tools and version control systems such as Git, as well as Kafka, Airflow, or other ETL/streaming tools.
- Experience with Databricks, Delta Lake, or AWS EMR.
- Knowledge of SQL and experience working with RDBMS like PostgreSQL or MySQL.
- Exposure to containerization and orchestration tools like Docker and Kubernetes.
- Experience with agile methodologies and tools such as JIRA and Confluence.
- Understanding of data security, encryption, and governance practices.
- Understanding of data lake and Lakehouse architectures.
- Knowledge of Python, Java, or other backend languages is a plus.
- Contributions to open-source big data projects are a plus.
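For context on the Spark APIs named above (RDD, DataFrames, Datasets, Spark SQL), here is a minimal, self-contained Scala sketch; the Sale case class, the sample values, and the local master setting are assumptions made purely for illustration.

// Tiny illustration of the Spark APIs listed above (RDD, DataFrame, Dataset,
// Spark SQL); all data and names are made up for illustration.
import org.apache.spark.sql.SparkSession

object SparkApiTour {
  case class Sale(customerId: String, amount: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-api-tour")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Dataset: typed API over a case class.
    val sales = Seq(Sale("c1", 10.0), Sale("c2", 25.5), Sale("c1", 4.5)).toDS()

    // DataFrame: untyped, column-based API.
    val totals = sales.toDF().groupBy("customerId").sum("amount")

    // Spark SQL: register a temporary view and query it with SQL.
    sales.createOrReplaceTempView("sales")
    val viaSql = spark.sql("SELECT customerId, SUM(amount) AS total FROM sales GROUP BY customerId")

    // RDD: the lower-level API underneath the Dataset.
    val rddCount = sales.rdd.filter(_.amount > 5.0).count()

    totals.show()
    viaSql.show()
    println(s"sales over 5.0: $rddCount")

    spark.stop()
  }
}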