We are seeking an experienced Data Engineer with 4+ years of hands-on experience in data pipeline development, ETL processes, and cloud-based data platforms. The ideal candidate will have strong coding skills in Python, PySpark, and SQL, along with expertise in ETL tools such as Informatica, AWS Glue, Databricks, and GCP Dataproc. This role requires end-to-end ownership of data engineering tasks, including designing, coding, testing, and deploying scalable pipelines while ensuring high-quality, secure, and accessible data across the organization.
Key Responsibilities
Design, develop, and maintain data pipelines for ingestion, wrangling, transformation, and integration from multiple sources (databases, APIs, cloud services, third-party providers).
Implement ETL processes to support efficient data movement, transformation, and storage.
Develop and manage data storage solutions, including relational databases, NoSQL databases, and data lakes.
Establish and maintain data quality checks and validation procedures to ensure accuracy, completeness, and consistency.
Collaborate with data analysts, data scientists, and business stakeholders to deliver reliable and accessible data solutions.
Ensure adherence to data warehousing principles, including Slowly Changing Dimensions (SCD) concepts.
Conduct unit testing and validate pipeline performance and accuracy.
Document source-to-target mappings, test cases, results, and operational processes.
Troubleshoot, debug, and resolve pipeline and production issues quickly.
Stay updated on the latest tools, cloud technologies, and best practices in big data and DevOps.
Required Skills & Qualifications
Bachelor’s degree in Computer Science, Information Technology, or a related field.
4+ years of experience across the full development lifecycle, with a strong focus on data engineering.
Proficiency in SQL, Python (preferred), and PySpark for data manipulation and pipeline development.
Hands-on experience with ETL and orchestration tools such as Informatica, Talend, AWS Glue, Databricks, Apache Airflow, GCP Dataproc, and Azure Data Factory (ADF).
Experience with cloud platforms (AWS, Azure, GCP) and services such as AWS Glue, BigQuery, Dataflow, and ADLS.
Strong understanding of data warehousing principles and schemas.
Experience with DevOps and Infrastructure-as-Code (IaC) tooling: AWS CDK, Terraform, CloudFormation, Git, and CI/CD pipelines.
Familiarity with AWS services such as IAM, Lake Formation, Glue, and EC2.
Knowledge of automated testing methodologies, including test-driven development (TDD) and behavior-driven development (BDD).
Ability to explain solutions to technical and non-technical stakeholders.
Strong problem-solving, debugging, and performance tuning skills.
Desired Skills
AWS Certified Solutions Architect certification.
Experience with RESTful APIs and microservice architectures.
Familiarity with SonarQube, Veracode, or similar tools for code quality and security.
Exposure to Agile methodologies and working in collaborative DevOps environments.
Hands-on experience with data engineering practices in large-scale enterprise environments.