Officer - Big Data Engineer - C11 - Hybrid - Chennai
Citigroup
Responsible for designing, developing, and optimizing data processing solutions using a combination of Big Data technologies. The role focuses on building scalable, efficient data pipelines that handle large datasets and enable batch and real-time data streaming and processing.
Responsibilities:
> Develop Spark applications using Scala or Python (PySpark) for data transformation, aggregation, and analysis.
> Develop and maintain Kafka-based data pipelines: Designing Kafka Streams, setting up Kafka clusters, and ensuring efficient data flow.
> Create and optimize Spark applications using Scala and PySpark: Leveraging these languages to process large datasets and implement data transformations and aggregations.
> Integrate Kafka with Spark for real-time processing: Building systems that ingest real-time data from Kafka and process it using Spark Streaming or Structured Streaming.
> Collaborate with data teams: Working with data engineers, data scientists, and DevOps to design and implement data solutions.
> Tune and optimize Spark and Kafka clusters: Ensuring high performance, scalability, and efficiency of data processing workflows.
> Write clean, functional, and optimized code: Adhering to coding standards and best practices.
> Troubleshoot and resolve issues: Identifying and addressing any problems related to Kafka and Spark applications.
> Maintain documentation: Creating and maintaining documentation for Kafka configurations, Spark jobs, and other processes.
> Stay updated on technology trends: Continuously learning and applying new advancements in functional programming, big data, and related technologies.
Proficiency in:
**Hadoop** ecosystem big data tech stack (HDFS, YARN, MapReduce, Hive, Impala).
**Spark (Scala, Python)** for data processing and analysis.
Kafka for real-time data ingestion and processing.
ETL processes and data ingestion tools.
Deep hands-on expertise in PySpark, Scala, and Kafka.
Programming Languages:
Scala, Python, or Java for developing Spark applications.
SQL for data querying and analysis.
Other Skills:
Data warehousing concepts.
Linux/Unix operating systems.
Problem-solving and analytical skills.
Version control systems.
------------------------------------------------------
**Job Family Group:**
Technology
------------------------------------------------------
**Job Family:**
Applications Development
------------------------------------------------------
**Time Type:**
Full time
------------------------------------------------------
**Most Relevant Skills**
Please see the requirements listed above.
------------------------------------------------------
**Other Relevant Skills**
For complementary skills, please see above and/or contact the recruiter.
------------------------------------------------------
_Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law._
_If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi (https://www.citigroup.com/citi/accessibility/application-accessibility.htm)._
_View Citi’s EEO Policy Statement (https://www.citigroup.com/global/eeo-aa-policy) and the Know Your Rights (https://www.eeoc.gov/sites/default/files/2023-06/22-088\_EEOC\_KnowYourRights6.12ScreenRdr.pdf) poster._
Citi is an equal opportunity and affirmative action employer.
Minority/Female/Veteran/Individuals with Disabilities/Sexual Orientation/Gender Identity.