Role Proficiency:
This role requires proficiency in data pipeline development, including coding and testing pipelines that ingest, wrangle, transform, and join data from various sources. Must be skilled in ETL tools such as Informatica, Glue, Databricks, and DataProc, with coding expertise in Python, PySpark, and SQL. Works independently and has a deep understanding of data warehousing solutions, including Snowflake, BigQuery, Lakehouse, and Delta Lake. Capable of calculating costs and understanding performance issues related to data solutions.
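As a rough illustration of the ingest, wrangle, transform, and join work described above, the following PySpark sketch shows one possible shape of such a pipeline; the paths, schemas, and column names are placeholders rather than part of the role definition:

# Minimal PySpark sketch of an ingest -> wrangle -> transform/join -> load flow.
# All paths and column names below are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Ingest: raw CSV orders and a curated customer dimension.
orders = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")
customers = spark.read.parquet("s3://example-bucket/curated/customers/")

# Wrangle: cast types and drop rows without a key.
orders = (orders
          .withColumn("order_ts", F.to_timestamp("order_ts"))
          .withColumn("amount", F.col("amount").cast("double"))
          .filter(F.col("order_id").isNotNull()))

# Transform and join: enrich with customer attributes, then aggregate daily revenue.
daily_revenue = (orders.join(customers, on="customer_id", how="left")
                 .groupBy(F.to_date("order_ts").alias("order_date"), "customer_segment")
                 .agg(F.sum("amount").alias("revenue")))

# Load: write to a warehouse staging area as Parquet.
daily_revenue.write.mode("overwrite").parquet("s3://example-bucket/marts/daily_revenue/")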
Outcomes:
Act creatively to develop pipelines and applications by selecting appropriate technical options, optimizing application development, maintenance, and performance through design patterns and reuse of proven solutions. Interpret requirements to create optimal architecture and design, developing solutions in accordance with specifications. Document and communicate milestones/stages for end-to-end delivery. Code adhering to best coding standards; debug and test solutions to deliver best-in-class quality. Perform performance tuning of code and align it with the appropriate infrastructure to optimize efficiency. Validate results with user representatives, integrating the overall solution seamlessly. Develop and manage data storage solutions, including relational databases, NoSQL databases, and data lakes. Stay updated on the latest trends and best practices in data engineering, cloud technologies, and big data tools. Influence and improve customer satisfaction through effective data solutions.
Measures of Outcomes:
Adherence to engineering processes and standards. Adherence to schedule/timelines. Adherence to SLAs where applicable. Number of defects post delivery. Number of non-compliance issues. Reduction in recurrence of known defects. Quick turnaround of production bugs. Completion of applicable technical/domain certifications. Completion of all mandatory training requirements. Efficiency improvements in data pipelines (e.g., reduced resource consumption, faster run times). Average time to detect, respond to, and resolve pipeline failures or data issues. Number of data security incidents or compliance breaches.
Outputs Expected:
Code Development:
Develop data processing code independently, ensuring it meets performance and scalability requirements. Define coding standards, templates, and checklists. Review code for team members and peers.
Documentation:
Create and review checklists, guidelines, and standards for design, processes, and development. Create and review deliverable documents, including design documents, architecture documents, infrastructure costing, business requirements, source-target mappings, test cases, and results.
Configuration:
Testing:
Create test scenarios and execution plans. Review the test plan and test strategy developed by the testing team. Provide clarifications and support to the testing team as needed.
Domain Relevance:
Develop solutions demonstrating a deeper understanding of business needs. Learn about customer domains to identify opportunities for value addition. Complete relevant domain certifications to enhance expertise.
Project Management:
Defect Management:
Estimation:
Knowledge Management:
Contribute to knowledge repositories such as SharePoint, libraries, and client universities. Review reusable documents created by the team.
Release Management:
Design Contribution:
Contribute to low-level design (LLD) and system architecture for applications, business components, and data models.
Customer Interface:
Team Management:
Certifications:
Skill Examples:
Proficiency in SQL, Python, or other programming languages used for data manipulation. Experience with ETL tools such as Apache Airflow, Talend, Informatica, AWS Glue, Dataproc, and Azure ADF. Hands-on experience with cloud platforms such as AWS, Azure, or Google Cloud, particularly with data-related services (e.g., AWS Glue, BigQuery). Ability to conduct tests on data pipelines and evaluate results against data quality and performance specifications. Experience in performance tuning of data processes. Expertise in designing and optimizing data warehouses for cost efficiency. Ability to apply and optimize data models for efficient storage, retrieval, and processing of large datasets. Capacity to clearly explain and communicate design and development aspects to customers. Ability to estimate time and resource requirements for developing and debugging features or components.
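As a sketch of how pipeline outputs might be evaluated against data quality specifications, the hypothetical helper below applies a few simple rules in PySpark; the column names and rules are illustrative assumptions, not a prescribed framework:

# Illustrative data-quality checks for a pipeline output (column names are hypothetical).
from pyspark.sql import functions as F

def check_output_quality(df):
    """Evaluate a DataFrame against simple quality rules and return the results as a dict."""
    total = df.count()
    null_keys = df.filter(F.col("order_id").isNull()).count()
    duplicate_keys = total - df.dropDuplicates(["order_id"]).count()
    return {
        "row_count_ok": total > 0,                # output is not empty
        "null_key_ok": null_keys == 0,            # primary key is never null
        "duplicate_key_ok": duplicate_keys == 0,  # primary key is unique
    }

# Typical use inside a pipeline test:
# assert all(check_output_quality(result_df).values())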
Knowledge Examples:
Knowledge of various ETL services offered by cloud providers, including Apache PySpark, AWS Glue, GCP DataProc/DataFlow, Azure ADF, and ADLF. Proficiency in SQL for analytics, including windowing functions. Understanding of data schemas and models relevant to various business contexts. Familiarity with domain-related data and its implications. Expertise in data warehousing optimization techniques. Knowledge of data security concepts and best practices. Familiarity with design patterns and frameworks in data engineering.
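A minimal example of the windowing functions mentioned above, written against the PySpark window API; the toy data and column names are placeholders:

# Windowing sketch: rank each customer's orders by amount and compute a running total.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("window_demo").getOrCreate()
orders = spark.createDataFrame(
    [("c1", "2024-01-01", 100.0), ("c1", "2024-01-03", 250.0), ("c2", "2024-01-02", 80.0)],
    ["customer_id", "order_date", "amount"])

by_amount = Window.partitionBy("customer_id").orderBy(F.col("amount").desc())
by_date = (Window.partitionBy("customer_id").orderBy("order_date")
           .rowsBetween(Window.unboundedPreceding, Window.currentRow))

result = (orders
          .withColumn("rank_in_customer", F.row_number().over(by_amount))  # 1 = largest order
          .withColumn("running_total", F.sum("amount").over(by_date)))     # cumulative spend
result.show()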
Additional Comments:
Mandatory Skills: AWS Glue. Skill to Evaluate: AWS Glue. Experience: 6 to 8 years. Location: Bengaluru.

Job Description: Minimum of 6 years of experience in building, optimizing, and maintaining scalable data pipelines as an ETL Engineer. Hands-on experience in coding techniques with a proven record. Hands-on experience in end-to-end data workflows, including pulling data from third-party and in-house tools via APIs, transforming and loading it into data warehouses, and improving performance across the ETL lifecycle. Hands-on experience with scripting (Python, shell scripting), relational databases (PostgreSQL, Redshift), REST APIs (OAuth, JWT, Basic Auth), job scheduling (cron), version control (Git), and the AWS environment. Hands-on experience in integrating data from various data sources. Understanding of Agile processes and principles. Good communication and presentation skills. Good documentation skills.

Preferred: Ability to understand business problems and customer needs and provide data solutions. Hands-on experience in working with Qualys and its APIs. Understanding of business intelligence tools such as PowerBI. Knowledge of data security and privacy.

Education Qualification: Bachelor's degree. Job Title: C&S ETL Engineer.

Roles & Responsibilities: Design, develop, implement, and maintain robust and scalable ETL pipelines using Python and SQL as well as AWS Glue and AWS Lambda for data ingestion, transformation, loading into various data targets (e.g., PostgreSQL, Amazon S3, Redshift, Aurora), and structured data management. Translate business requirements and data models into efficient and secure data processing solutions. Pull data using APIs and handle large datasets efficiently. Integrate data from various sources, including REST APIs (OAuth, JWT, Basic Auth), CSVs, and relational databases, ensuring accurate configuration, optimization, and schema definitions. Optimize code and reduce ETL execution time through performance tuning and error-handling best practices. Write, test, and deploy AWS Glue ETL scripts (Python/PySpark or Scala) to perform complex data transformations, cleansing, aggregation, and enrichment. Implement security best practices for data in transit and at rest. Automate job execution using cron or other scheduling tools and monitor pipeline health. Use Git for version control, maintaining CI/CD hygiene for data scripts and managing ETL code through various branching strategies. Troubleshoot and debug complex ETL workflows and provide RCA for failures. Work on Linux-based systems with shell scripting for operational support. Collaborate effectively with business users and data engineers through Pull Request (PR) / Merge Request (MR) workflows to gather requirements and document workflows. Ensure code quality, data quality, consistency, and timely delivery of all scheduled data loads while adhering to branching policies to keep changes isolated and prevent conflicts with ongoing work. Participate in code reviews, providing constructive feedback and ensuring that proposed changes align with data integrity, performance, and security standards before merging into shared branches. Manage the promotion of ETL code across environments (e.g., Development, Staging, Production) by adhering to defined branching and release strategies, ensuring smooth and controlled deployments. Participate in Agile rituals, such as daily stand-up meetings and other project meetings, to contribute to achieving project deliverables within the required timeframe.
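As a hedged sketch of the AWS Glue ETL scripts referenced in the responsibilities above, the skeleton below reads from the Glue Data Catalog, applies a column mapping, and writes Parquet to S3; the database, table, mapping, and path values are placeholders, and the awsglue modules are available only inside the Glue job runtime:

# Skeleton AWS Glue job (Python/PySpark): catalog read -> column mapping -> S3 write.
# Database, table, and path names are placeholders, not project values.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Ingest from the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_orders")

# Transform: rename and cast columns to the target schema.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("order_id", "string", "order_id", "string"),
              ("amount", "string", "amount", "double"),
              ("order_ts", "string", "order_ts", "timestamp")])

# Load to S3 as Parquet (Redshift or PostgreSQL targets would use JDBC connection options instead).
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet")

job.commit()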
Success Factors: Maintain regular communication with the supervisor and continually update the supervisor on needs, priorities, and risks. Critical analytical and problem-solving skills are essential. Communicate clearly in both oral and written form. Possess an assertive communication style while maintaining a positive relationship with all team members and stakeholders. Strong focus on continuous learning and improvement.

Project Details: The person in this role works in the Data Intelligence team, maintains a strong people network within the Compliance & Security team and stakeholders, is conversant with key assets and their operations, and is instrumental in the design, development, optimization, and maintenance of ETL procedures.