AI Computing Performance Architect, Perf Analysis and Kernel Dev
NVIDIA
NVIDIA is developing processor and system architectures that accelerate machine learning, automotive, and high-performance computing (HPC) applications. We are seeking a strong candidate to do performance analysis and kernel development for NVIDIA's new architectures. Your work will play a critical role in shaping the future of deep learning hardware and software, ensuring optimal performance for next-generation AI applications. This position offers the opportunity to make a meaningful impact in a fast-moving, technology-focused company.
What you'll be doing:
+ Design, develop, and optimize major layers in LLMs (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
+ Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
+ Conduct in-depth performance analysis of GPU kernels, including attention and other critical operations.
+ Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
+ Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
+ Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
+ Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.
What we need to see:
+ MS or PhD in a relevant discipline (CS, EE, Math).
+ 3+ years of industry experience in GPU programming or performance optimization for DL applications.
+ Demonstrated experience in analyzing and improving the performance of GPU kernels, with measurable results (e.g., performance improvements, efficiency gains).
+ Strong programming skills in C, C++, Perl, or Python.
+ Strong background in computer architecture.
+ Excellent communication skills, both written and verbal.
+ Strong organizational and time management abilities, with the ability to prioritize tasks effectively.
Ways to stand out from the crowd:
+ Development or optimization experience related to LLM FMHA or GEMM.
+ Expertise in CUDA programming for GPU acceleration.
+ Expertise in GPU/CPU Core or MemSys architecture modeling.
#deeplearning