Spark Developer (San Jose, CA)
IBM
**Introduction**
A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.
Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.
We are seeking a skilled Software Developer to join our IBM Software team. As part of our team, you will be responsible for developing and maintaining high-quality software products, working with a variety of technologies and programming languages.
IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
**Your role and responsibilities**
We are seeking an experienced and highly skilled Spark Scala Developer. The candidate will have a deep understanding of distributed computing, data pipelines, and real-time and batch data processing architecture.
Key Responsibilities:
* Design, develop, and optimize big data applications using Apache Spark and Scala (see the illustrative sketch after this list).
* Architect and implement scalable data pipelines for both batch and real-time processing.
* Collaborate with data engineers, analysts, and architects to define data strategies.
* Optimize Spark jobs for performance and cost-effectiveness on distributed clusters.
* Build and maintain reusable code and libraries for future use.
* Work with various data storage systems such as HDFS, Hive, HBase, Cassandra, Kafka, and Parquet.
* Implement data quality checks, logging, monitoring, and alerting for ETL jobs.
* Mentor junior developers and lead code reviews to ensure best practices.
* Ensure security, governance, and compliance standards are adhered to in all data processes.
* Troubleshoot and resolve performance issues and bugs in big data solutions.
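To give candidates a concrete sense of the day-to-day work, here is a minimal, illustrative sketch of a Spark/Scala batch pipeline with a simple data quality gate. This is not IBM code: the S3 paths, table layout, and column names (`order_id`, `amount`, `customer_id`, `event_ts`) are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object OrdersBatchPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-batch-pipeline")
      .getOrCreate()

    // Read raw events from a Parquet data lake path (path is hypothetical).
    val raw: DataFrame = spark.read.parquet("s3a://example-bucket/raw/orders/")

    // Simple data quality gate: drop rows missing a key or amount,
    // and fail fast if too much of the batch is bad.
    val clean = raw.filter(col("order_id").isNotNull && col("amount").isNotNull)
    val rawCount = raw.count()
    val badRatio =
      if (rawCount == 0) 0.0
      else (rawCount - clean.count()).toDouble / rawCount
    require(badRatio < 0.05,
      f"Data quality check failed: ${badRatio * 100}%.2f%% of rows rejected")

    // Daily revenue per customer, written back partitioned by date.
    clean
      .groupBy(col("customer_id"), to_date(col("event_ts")).as("event_date"))
      .agg(sum("amount").as("daily_revenue"))
      .write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3a://example-bucket/curated/daily_revenue/")

    spark.stop()
  }
}
```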
**Required technical and professional expertise**
* 12+ years of total software development experience.
* 5+ years of hands-on experience with Apache Spark and Scala.
* Strong experience with distributed computing, parallel data processing, and cluster computing frameworks.
* Proficiency in Scala with deep knowledge of functional programming.
* Solid understanding of Spark tuning, partitions, joins, broadcast variables, and performance optimization techniques (see the sketch after this list).
* Experience with cloud platforms such as AWS, Azure, or GCP (especially EMR, Databricks, or HDInsight).
* Hands-on experience with Kafka, Hive, HBase, NoSQL databases, and data lake architectures.
* Familiarity with CI/CD pipelines, Git, Jenkins, and automated testing.
* Strong problem-solving skills and the ability to work independently or as part of a team.
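As an illustration of the Spark tuning topics listed above (broadcast variables, joins, partitions), the sketch below shows a broadcast join and an explicit repartition. The datasets and column names are hypothetical; this is a generic example, not project code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-join-example")
      .getOrCreate()
    import spark.implicits._

    // A large fact table and a small dimension table (hypothetical data).
    val events = Seq((1, "click"), (2, "view"), (1, "click")).toDF("user_id", "action")
    val users  = Seq((1, "US"), (2, "DE")).toDF("user_id", "country")

    // The broadcast hint ships the small table to every executor,
    // turning a shuffle join into a map-side join.
    val joined = events.join(broadcast(users), Seq("user_id"))

    // Repartitioning before a wide aggregation controls shuffle parallelism.
    joined
      .repartition(200, $"country")
      .groupBy("country")
      .count()
      .show()

    spark.stop()
  }
}
```

Broadcasting the small `users` table lets each executor join locally instead of shuffling the large `events` table, which is often the biggest single win when one join side fits in executor memory.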
**Preferred technical and professional experience**
* Experience with Databricks, Delta Lake, or Apache Iceberg.
* Exposure to machine learning pipelines using Spark MLlib or integration with ML frameworks.
* Experience with data governance tools (e.g., Apache Atlas, Collibra).
* Contributions to open-source big data projects are a plus.
* Excellent communication and leadership skills.
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.