Job Description:
Seeking a data engineer to develop and manage data pipelines for ingesting, transforming, and joining data from diverse sources. The role requires expertise in ETL tools such as Informatica, Glue, and Databricks, along with strong coding skills in Python, PySpark, and SQL. The ideal candidate will be proficient in data warehousing solutions such as Snowflake and BigQuery and capable of optimizing performance and managing costs. The position also involves collaborating on architecture design, documenting project milestones, and mentoring team members. A solid understanding of DevOps, infrastructure needs, and data security is essential, as is the ability to proactively identify and resolve defects while ensuring adherence to quality standards and timelines. Strong communication skills are needed to interface with customers and project teams.
Mandatory Skillset:
ETL Tools: Informatica, AWS Glue, Databricks, GCP Dataproc/Dataflow, Azure Data Factory (ADF).
Programming Languages: Python, PySpark, and SQL (with strong knowledge of windowing functions).
Cloud Platforms: AWS, Azure, or Google Cloud (specifically data-related services).
Data Warehousing: Snowflake, BigQuery, Delta Lake, or Lakehouse architecture.
Databases: Relational and NoSQL databases, with strong Oracle SQL/PL/SQL experience.
DevOps & CI/CD: Understanding of infrastructure needs, DevOps practices, and CI/CD tools such as Jenkins.
Data Concepts: Data modeling, schema design, performance tuning, and cost optimization.
Problem-Solving: Defect root cause analysis and proactive mitigation.
Additional Comments:
Strong Oracle SQL/PL/SQL experience on enterprise databases.
Data engineering for ETL of large datasets.
Java and/or Python knowledge.
Knowledge of AWS Cloud for data storage and API hosting.
CoaaS / Kubernetes.
CI/CD tools, e.g., Jenkins, Harness, Artifactory.