Role & Responsibilities
The Senior Data Engineer will:
· Create and maintain optimal data pipeline architecture; assemble large, complex data sets that meet functional and non-functional requirements.
· Design schemas that support the functional requirements and consumption patterns.
· Design and build production data pipelines from ingestion to consumption.
· Build the data marts and data warehouses required for optimal extraction, transformation, and loading of data from a wide variety of data sources.
· Create the preprocessing and postprocessing steps needed for various forms of data used in training, retraining, and inference ingestion.
· Create data visualization and business intelligence tools that give stakeholders and data scientists the business and solution insights they need.
· Identify, design, and implement internal process improvements: automating manual data processes, optimizing data delivery, etc.
· Ensure our data is separated and secure across national boundaries through multiple data centers and AWS regions.
Technical Skill Set
· Hands-on experience with Informatica PowerCenter/IICS as an ETL tool.
· Experience with AWS cloud services: EC2, EMR, RDS, Redshift, S3, and Athena, plus familiarity with the various AWS log formats.
· Experience with AWS Glue ETL, AWS Glue crawlers, AWS Lambda, Glue Data Catalog, and AWS Glue Studio.
· Hands-on experience with Python programming, Spark, and shell scripting.
· Knowledge of database concepts (indexing, partitioning) is a must.
· Knowledge of data warehousing concepts (normalization, denormalization, star/snowflake schemas).
· Strong hands-on experience with table creation and with DDL, DML, and TCL statements.
· Experience with big data tools: Hadoop, Spark, Kafka, etc.
· Experience with data pipeline and workflow management tools: Airflow, Luigi, etc.