Trivandrum
17 days ago
Data Engineer – GCP, Airflow, PySpark & BigQuery
Key Responsibilities:

Design, develop, and automate scalable data processing workflows using Apache Airflow, PySpark, and Dataproc on Google Cloud Platform (GCP).

Build and maintain robust ETL pipelines to handle structured and unstructured data from multiple sources and formats.

Manage and provision GCP resources including Dataproc clusters, serverless batches, Vertex AI instances, GCS buckets, and custom images.

Provide platform and pipeline support for analytics and product teams, resolving issues related to Spark, BigQuery, Airflow DAGs, and serverless workflows.

Collaborate with data scientists, data analysts, and other stakeholders to understand data requirements and deliver reliable solutions.

Deliver prompt and effective technical support to internal users for data-related queries and challenges.

Optimize and fine-tune data systems for performance, cost-efficiency, and reliability.

Conduct root cause analysis for recurring pipeline/platform issues and work with cross-functional teams to implement long-term solutions.
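The responsibilities above center on building fault-tolerant ETL workflows and diagnosing recurring pipeline failures. As a minimal sketch of that pattern (pure-Python stdlib only; the task names, retry budget, and backoff values are illustrative, not from this posting), a robust pipeline step wraps its extract/transform/load phases with retries and exponential backoff so transient failures don't fail the whole workflow:

```python
import time

def run_with_retries(task, max_attempts=3, base_delay=0.1):
    """Run a pipeline task, retrying transient failures with
    exponential backoff (the same per-task pattern Airflow applies)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # surface the error for root-cause analysis / alerting
            time.sleep(base_delay * 2 ** (attempt - 1))

# Hypothetical extract -> transform -> load over in-memory records.
def extract():
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "3.2"}]

def transform(rows):
    # Normalize types before loading (amounts arrive as strings).
    return [{**r, "amount": float(r["amount"])} for r in rows]

def load(rows, sink):
    sink.extend(rows)
    return len(rows)

sink = []
loaded = run_with_retries(lambda: load(transform(extract()), sink))
print(loaded)  # → 2
```

In an Airflow deployment the retry budget and backoff would normally be declared on the task itself rather than hand-rolled; the sketch just makes the control flow explicit.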

Must-Have Skills:

Strong programming expertise in Python and SQL

Deep hands-on experience with Apache Airflow (including Astronomer)

Strong experience with PySpark, SparkSQL, and Dataproc

Proven knowledge and implementation experience with GCP data services: BigQuery, Vertex AI, Pub/Sub, Cloud Functions, and GCS

Strong troubleshooting skills related to data pipelines, Spark job failures, and cloud data environments

Familiarity with data modeling, ETL best practices, and distributed systems

Ability to support and optimize large-scale batch and streaming data processes
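The Python-and-SQL expectation above can be illustrated with a small self-contained example (stdlib `sqlite3` stands in for a warehouse like BigQuery here; the `events` table and its columns are hypothetical): the aggregation is pushed down to SQL rather than looped over in Python, which is the same habit that keeps large BigQuery or SparkSQL jobs efficient:

```python
import sqlite3

# sqlite3 stands in for the warehouse; schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, source TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("u1", "web", 10.0), ("u1", "app", 5.0), ("u2", "web", 7.5)],
)

# Per-user totals computed in SQL, not in application code.
rows = conn.execute(
    """
    SELECT user_id, SUM(amount) AS total
    FROM events
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
print(rows)  # → [('u1', 15.0), ('u2', 7.5)]
```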

Good-to-Have Skills:

Experience with SQL dialects such as HiveQL, PL/SQL, and SparkSQL

Exposure to serverless data processing and ML model deployment workflows (using Vertex AI)

Familiarity with Terraform or Infrastructure-as-Code (IaC) for provisioning GCP resources

Knowledge of data governance, monitoring, and cost control best practices on GCP

Previous experience in healthcare, retail, or BFSI domains involving large-scale data platforms

Educational Qualification:

Bachelor's or Master's degree in Computer Science, Information Technology, or a related field

Certifications such as GCP Professional Data Engineer, GCP Professional Cloud Architect, or Apache Spark are a plus
