AI Development Engineer
Lenovo
General Information
Req # WD00084136
Career area: Information Technology
Country/Region: China
State: Tianjin
City: 天津(Tianjin)
Date: Thursday, June 12, 2025
Working time: Full-time
Additional Locations:
* China - Tianjin - 天津(Tianjin)

Why Work at Lenovo
We are Lenovo. We do what we say. We own what we do. We WOW our customers.
Lenovo is a US$57 billion revenue global technology powerhouse, ranked #248 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world’s largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services. Lenovo’s continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).
This transformation, together with Lenovo’s world-changing innovation, is building a more inclusive, trustworthy, and smarter future for everyone, everywhere. To find out more, visit www.lenovo.com and read about the latest news via our StoryHub.

Description and Requirements
Responsibilities:
1. Architecture design and core feature development for the model training and inference platform
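Lead the top-level design of the integrated model training and inference platform (including training framework adaptation, distributed training support, inference engine optimization, and the experiment management module), and define the technology roadmap (short-term iterations such as distributed training optimization; long-term evolution such as multimodal training support). Own the custom development and adaptation of deep learning frameworks (PyTorch/TensorFlow/Hugging Face), implement efficient scheduling of training jobs (e.g., data parallelism, model parallelism, pipeline parallelism), and support stable training of large models at the hundred-billion-parameter scale. Develop the inference engine module, integrating acceleration tools such as TensorRT and ONNX Runtime to optimize model inference latency (e.g., through dynamic quantization, pruning, and distillation) and throughput, with support for multiple hardware targets including GPUs, CPUs, and domestic (Chinese-made) chips. Design experiment management and model lifecycle management features (e.g., hyperparameter tuning, version control, A/B testing, model compression), and provide a visual interface and APIs to lower the barrier to AI R&D.
2. Agent design and deployment across multiple scenarios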
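Lead the core architecture design of agents, covering modules such as multimodal perception (text/image/speech), decision-making and planning (reinforcement learning/symbolic reasoning), and memory (long-term memory storage and retrieval), to support autonomous decision-making and interaction in complex scenarios. Building on large language model (LLM) capabilities, develop an LLM-based agent framework (e.g., an extension of AutoGPT) that closes the loop across intent understanding, task decomposition, and tool invocation (e.g., API calls or database queries), and supports rapid customization for vertical scenarios (e.g., customer service, code generation, data analysis). Drive deep integration between agents and the training and inference platform, supporting online learning for agents (continuous policy optimization), multi-agent collaboration (Multi-Agent System), and integration with external systems (e.g., RPA, business systems).
3. Coordination with and optimization of intelligent computing infrastructure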
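Integrate deeply with the Kubernetes container orchestration platform to achieve efficient scheduling of training/inference jobs (e.g., GPU resource partitioning, elastic scaling), Pod coordination for distributed training (e.g., parameter server communication, gradient synchronization), and job monitoring (e.g., resource utilization, training time). Integrate with intelligent computing storage systems (e.g., object storage, parallel file systems) and optimize read/write efficiency for models and data (e.g., distributed caching, I/O priority scheduling) to resolve data loading bottlenecks in large model training (e.g., efficient sharding and prefetching of TB-scale datasets). Work with the intelligent computing network team to optimize the communication performance of the AI training network (e.g., low-latency RDMA/RoCEv2 transport, NCCL communication library tuning), meeting the high-bandwidth, low-latency needs of multi-node distributed training (e.g., AllReduce optimization).
4. Solving hard technical problems and performance optimization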
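Solve key technical problems in large model training/inference (e.g., training stability, out-of-memory errors, inference latency), and improve training efficiency and resource utilization through mixed-precision training, gradient checkpointing, model parallelism, and similar techniques. Optimize model adaptation for industry scenarios (e.g., autonomous driving, medical imaging, industrial quality inspection), supporting few-shot learning, transfer learning, and domain fine-tuning to reduce model customization costs for industry customers. Lead platform load testing and bottleneck analysis, and improve platform throughput (e.g., thousand-card-scale training on a single cluster) and reliability (e.g., a 99.9% training job success rate) through distributed architecture tuning (e.g., task queue optimization, resource isolation) and hardware acceleration (e.g., GPU operator optimization, FPGA co-processing).
5. Cross-team collaboration and scenario deployment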
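Work with algorithm teams (NLP/CV/Multi-modal) to align requirements for agents and large models, and drive the engineering deployment of cutting-edge algorithms (e.g., Agent Fine-tuning, multimodal reasoning). Interface with customers and pre-sales teams, take part in POC (proof of concept) projects, and deliver technical solution documents, scenario solutions, and implementation plans for customer-specific requirements (e.g., industry-specific agent templates). Track AI technology trends (e.g., multimodal large models, embodied intelligence, autonomous AI agents), drive technical innovation and product iteration, and maintain the company’s technical leadership in AI R&D toolchains.
6. Technical documentation and knowledge retention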
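Build the technical documentation system for the training and inference platform and agents (e.g., architecture design documents, API specifications, operations manuals, best practices), and promote knowledge sharing and technical continuity within the team. Lead internal technical talks (e.g., distributed training principles, LLM agent development practices, Kubernetes tuning techniques for AI workloads) to deepen the team’s overall technical expertise.
Requirements:
Core technical skills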
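Proficient in custom development on deep learning frameworks such as PyTorch/TensorFlow
Familiar with large model training optimization techniques (mixed precision/gradient checkpointing/model parallelism)
Proficient with inference acceleration tools such as TensorRT/ONNX Runtime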
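Agent development experience
Experience designing and developing agent systems
Familiar with developing and optimizing large language model (LLM) applications
Knowledge of reinforcement learning, multi-agent systems, and related techniques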
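Distributed systems skills
Familiar with using Kubernetes for large-scale AI training
Proficient with distributed training optimization technologies (NCCL, RDMA, etc.)
Experience in performance tuning of intelligent computing infrastructure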
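Problem-solving skills
Able to independently solve hard technical problems in large model training/inference
Capable of system performance analysis and optimization
Familiar with mainstream AI monitoring and diagnostic tools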
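Teamwork skills
Strong cross-team collaboration and communication skills
Able to engage effectively with business requirements and propose technical solutions
Capable of writing technical documentation and sharing knowledge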