About the Role:
We are looking for a highly skilled and results-driven Data Engineer to join our dynamic data engineering team.
In this role, you will be responsible for designing, building, and maintaining real-time data pipelines and high-performance ETL workflows across cloud platforms such as AWS and Azure.
You will play a key role in managing large-scale data integration, ensuring data quality and security, and enabling advanced analytics across the telecom and financial domains.
Key Responsibilities:
Design and deploy scalable, real-time data streaming pipelines using AWS Glue, Kinesis, and Teradata VantageCloud Lake.
Build and maintain robust ETL workflows using Azure Data Factory and Synapse Analytics to process telecom and financial data exceeding 10TB.
Optimize SQL performance with advanced techniques like secondary indexing, partition pruning, and temporary table usage in Teradata and Synapse.
Automate data validation using Python to enforce schema integrity, null handling, and transformation audits.
Implement data security measures including masking and role-based access control (RBAC) to ensure compliance with GDPR and internal policies.
Develop web scraping tools using Selenium and BeautifulSoup to ingest semi-structured and dynamic web data into ETL pipelines.
Automate deployment and server-side operations using Bash scripting, reducing manual intervention and streamlining engineering workflows.
Collaborate with cross-functional teams to implement and monitor data solutions using CI/CD pipelines and modern orchestration tools.
Key Projects:
CDN Data Processing Pipeline: Ingested and processed over 500GB daily using AWS Lambda, Kinesis, and Spark SQL with real-time monitoring and auto-error recovery.
LLM-Powered QA Validator: Designed an AI-based quality assurance pipeline using LangChain, vector stores, and LLM evaluation to reduce manual QA efforts by 60%.
Smart Migration Assistant: Built an LLM-based code migration tool to convert legacy workflows (Informatica, SAS) into modern frameworks like DBT and PySpark.
Qualifications:
Bachelor’s degree in Computer Science or a related field from a reputable university.
2+ years of experience in data engineering or backend development, with hands-on expertise in cloud-native ETL systems.
Strong programming skills in Python and SQL, with a deep understanding of data structures and algorithms.
Proficiency in working with AWS (Glue, Kinesis, Lambda), Azure (ADF, Synapse), and Teradata VantageCloud.
Experience with LLM tooling such as LangChain and vector databases, and with building AI-powered automation systems, is a plus.
Knowledge of data orchestration and DevOps tools such as Airflow, Docker, Git, Kubernetes, Terraform, and Jenkins.
Preferred Skills:
Exposure to the GCP ecosystem (BigQuery, Dataflow).
Understanding of data governance, compliance standards, and secure data handling (PII, GDPR).
Familiarity with REST API integration and event-driven data architectures.
Why Join Us?
Work with cutting-edge technologies in a cloud-first, AI-augmented environment.
Contribute to high-impact projects for global telecom and financial clients.
Enjoy flexible work arrangements and a strong culture of learning and innovation.