San Jose, CRI
Data Scientist
**Company description**

Re:Sources is the backbone of Publicis Groupe, the world's third-largest communications group. Formed in 1998 as a small team to service a few Publicis Groupe firms, Re:Sources has grown to 4,000+ people servicing a global network of prestigious advertising, public relations, media, healthcare and digital marketing agencies. We provide technology solutions and business services including finance, accounting, legal, benefits, procurement, tax, real estate, treasury and risk management to help Publicis Groupe agencies do what they do best: create and innovate for their clients.

In addition to providing essential, everyday services to our agencies, Re:Sources develops and implements platforms, applications and tools to enhance productivity, encourage collaboration and enable professional and personal development. We continually transform to keep pace with our ever-changing communications industry and thrive on a spirit of innovation felt around the globe. With our support, Publicis Groupe agencies continue to create and deliver award-winning campaigns for their clients.

**Overview**

We are looking for a Data Scientist to help drive the application of business intelligence, advanced analytics, and machine learning across our internal platforms. In this role, you will collaborate closely with data scientists, engineers, and product owners to ensure high-quality, timely delivery of data-driven features and solutions. You will work end-to-end with feature teams to manage the entire data lifecycle, enforce SLAs, and ensure seamless integration into production. Additionally, you will be responsible for the setup, monitoring, and ongoing maintenance of development, testing, staging, and production environments in partnership with DevOps, infrastructure, and product teams. This is a hands-on role at the intersection of data science, operations, and platform scalability.
**Responsibilities**

**Natural Language Processing (NLP) & Large Language Models (LLMs)**

+ Design and develop advanced data science and machine learning algorithms, with a strong emphasis on Natural Language Processing (NLP) for personalized content, user understanding, and recommendation systems
+ Lead the design, development, deployment, and integration of NLP models into platform features such as search, recommendations, and content curation
+ Work on end-to-end LLM-driven features, including fine-tuning pre-trained models (e.g., BERT, GPT), prompt engineering, vector embeddings, and retrieval-augmented generation (RAG)
+ Build robust models on diverse datasets to solve for semantic similarity, user intent detection, entity recognition, and content summarization/classification
+ Architect and implement LLM-based pipelines using Hugging Face Transformers, LangChain, or similar libraries
+ Regularly assess and maintain model accuracy and relevance through evaluation, retraining, and continuous improvement processes
+ Architect scalable solutions for deploying and monitoring language models within platform services, ensuring performance and interpretability
+ Champion and establish best practices and standards in building NLP/ML systems, including ethical AI principles
+ Actively contribute to knowledge sharing and support the team in keeping pace with state-of-the-art NLP/LLM research
+ Contribute to a culture of innovation and experimentation, continuously exploring new techniques in the rapidly evolving NLP/LLM space

**Machine Learning & Data Science**

+ Design and deliver end-to-end solutions, from prototype to production, with continuous performance tracking
+ Analyze user behaviour through data and derive actionable insights for platform feature improvements using experimentation (A/B testing, multivariate testing)
+ Own model documentation, including architecture, training data, evaluation metrics, and deployment workflow

**Software Engineering & ML Ops**

+ Write modular, maintainable code with unit testing and CI/CD practices integrated into the ML lifecycle
+ Write clean, well-documented code in notebooks and scripts, following best practices for version control, testing, and deployment
+ Collaborate with Data Engineering to ensure scalable and reliable data pipelines for training and inference

**Cross-functional Collaboration & Communication**

+ Collaborate cross-functionally with engineers, product managers, and designers to translate business needs into NLP/ML solutions
+ Communicate findings and solutions effectively across stakeholders, from technical peers to executive leadership
+ Handle ambiguous or loosely defined problems and bring clarity through exploration and communication with domain experts

**Qualifications**

**Education**

+ Bachelor's degree in engineering, computer science, statistics, mathematics, information systems, or a related field from an accredited college or university; a Master's degree from an accredited college or university is preferred.
**Natural Language Processing (NLP) & Large Language Models (LLMs)**

+ Proficiency in Python and NLP frameworks: spaCy, NLTK, Hugging Face Transformers, OpenAI, LangChain
+ Strong understanding of LLMs, embedding techniques (e.g., SBERT, FAISS), RAG architecture, prompt engineering, and model evaluation
+ Experience in text classification, summarization, topic modeling, named entity recognition, and intent detection
+ Experience building data science models for front-end, user-facing applications, such as recommendation models

**Machine Learning & Data Science**

+ Advanced knowledge of data science techniques, and experience building, maintaining, and documenting models
+ Strong experience with data science/ML libraries in Python (SciPy, NumPy, TensorFlow, scikit-learn, etc.)
+ Experience deploying ML models in production and working with orchestration tools such as Airflow and MLflow
+ Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
+ Strong analytic skills related to working with unstructured datasets
+ A successful history of manipulating, processing, and extracting value from large, disconnected datasets

**Cloud & Data Engineering**

+ Comfortable working in cloud environments (Azure preferred) and with tools such as Docker, Kubernetes (AKS), and Git
+ Strong experience working in cloud development environments, especially Azure, ADF, PySpark, Databricks, and SQL
+ Experience building and optimizing ADF- and PySpark-based data pipelines, architectures, and datasets on graph stores and Azure Data Lake
+ Understanding of Jenkins and CI/CD processes using Git for cloud configurations and standard code repositories, such as ADF configs and Databricks
+ Experience building processes that support data transformation, data structures, metadata, dependency, and workload management

**Databases & Data Integration**

+ Advanced working knowledge of SQL and experience with relational databases and query authoring, as well as working familiarity with a variety of databases (graph databases preferred)
+ Strong understanding of RDBMS data structures, Azure Tables, Blob storage, and other data sources
+ Understanding of graph data; Neo4j is a plus
+ Experience with REST APIs, JSON, and streaming datasets

**Project & Team Management**

+ Strong project management and organizational skills
+ Experience supporting and working with cross-functional teams in a dynamic environment