Data Wrangling & Processing
• Clean, transform, and normalize structured and unstructured data (images, video, tabular data).
• Perform data annotation quality checks and support the development of labeled datasets for ML/CV tasks.
Dockerization & Environment Setup
• Build and manage Docker containers to standardize data processing environments and ensure reproducibility.
• Collaborate with DevOps/ML/CV engineers to containerize preprocessing pipelines and visualization tools.
• Optimize Docker images for deployment on local machines, cloud instances, and edge devices.
Data Visualization & Analysis
• Develop visualizations to understand data distributions, detect anomalies, and communicate data stories using tools like Matplotlib, Seaborn, Plotly, or Power BI.
• Create dashboards or visual reports for stakeholders to assess data readiness and model inputs.
• Work closely with cross-functional teams to evaluate, design, implement, and integrate customer requirements.
ML & Computer Vision Integration
• Understand the role of data in ML/CV pipelines, including how bias, noise, or imbalance can impact model performance.
• Assist in defining data-centric metrics for model training and evaluation.
• Contribute to dataset versioning, documentation, and lineage tracking for traceability in experiments.
Required Skills & Qualifications
• 5–7 years of experience working with data pipelines and ML-ready datasets.
• Strong skills in Python, especially in libraries like Pandas, NumPy, and OpenCV.
• Hands-on experience with Docker, including containerizing scripts or microservices.
• Proficiency in data visualization tools (e.g., Matplotlib, Seaborn, Plotly, Dash, or Power BI).
• Exposure to lidar point clouds and radar scans.
• Familiarity with ML and CV workflows (e.g., model input formats, augmentation, dataset split strategies).
• Strong analytical skills and a good sense of data quality and its impact on machine learning models.