Trivandrum
Lead I - Senior Data Engineer - PySpark - GCP or Any Cloud - SQL
Roles & Responsibilities:

Development & Implementation

Design, build, and maintain large-scale batch and real-time data pipelines using PySpark, Spark, Hive, and related big data tools.

Write clean, efficient, and scalable code aligned with application design and coding standards.

Create and maintain technical documentation including design documents, test cases, and configurations.

Technical Leadership

Contribute to high-level design (HLD), low-level design (LLD), and data architecture documents.

Review and validate designs and code from peers and junior developers.

Lead technical discussions and decisions with cross-functional teams.

Data Management & Optimization

Optimize data processing workflows for efficiency, cost, and performance.

Manage data quality and ensure data accuracy, lineage, and governance across the pipeline.

Stakeholder Collaboration

Collaborate with product managers, data stewards, and business stakeholders to translate data requirements into robust engineering solutions.

Clarify requirements and propose design options to customers.

Testing & Quality Assurance

Write and review unit tests and integration tests to ensure data integrity and performance.

Monitor and troubleshoot data pipeline issues and ensure minimal downtime.

Agile Project Contribution

Participate in sprint planning, estimation, and daily stand-ups.

Ensure on-time delivery of user stories and bug fixes.

Drive release planning and execution processes.

Team Mentorship & Leadership

Set FAST goals and provide timely feedback to team members.

Mentor junior engineers, contribute to a positive team environment, and drive continuous improvement.

Compliance & Documentation

Ensure adherence to compliance standards such as SOX, HIPAA, and organizational coding standards.

Contribute to knowledge repositories, project wikis, and best practice documents.

Must-Have Skills:

6+ years of experience as a Data Engineer.

Hands-on expertise in PySpark and SQL.

Experience in Google Cloud Platform (GCP) or similar cloud environments (AWS, Azure).

Proficient in Big Data technologies such as Spark, Hadoop, Hive.

Solid understanding of ETL/ELT frameworks, data warehousing, and data modeling.

Strong knowledge of CI/CD tools (Jenkins, Git, Ansible, etc.).

Excellent problem-solving and analytical skills.

Strong written and verbal communication skills.

Experience with Agile/Scrum methodologies.

Good-to-Have Skills:

Experience with data orchestration tools (Airflow, Control-M).

Familiarity with modern data platforms such as Snowflake, DataRobot, Denodo.

Experience in containerized environments (Kubernetes, Docker).

Exposure to data security, governance, and compliance frameworks.

Hands-on with Terraform, ARM Templates, or similar infrastructure-as-code tools for infrastructure automation.

Domain knowledge in banking, healthcare, or retail industries.
