Deep Learning Software Engineer, TensorRT Performance - New College Grad 2026
United States
Full-time
Remote
$124K/yr - $196K/yr
Entry Level
NVIDIA, a leading company in AI computing, is seeking a Deep Learning Software Engineer focused on TensorRT performance. The role involves analyzing and improving the performance of NVIDIA’s inference ecosystem while collaborating with diverse teams to develop innovative inference solutions.
Establish groundbreaking performance benchmarking methodologies and analysis workflows, and identify performance issues and opportunities across NVIDIA’s inference ecosystem (e.g. TensorRT, TensorRT-EdgeLLM, Torch-TensorRT)
Contribute features and code to NVIDIA and open-source inference frameworks, including but not limited to TensorRT, TensorRT-EdgeLLM, and Torch-TensorRT
Develop new performance-optimized model pipelines for NVIDIA’s inference ecosystem, in areas including but not limited to quantization, scheduling, memory management, and distributed inference, to set the gold standard for Gen AI performance
Work with cross-functional teams inside and outside of NVIDIA across generative AI, automotive, robotics, image understanding, and speech understanding to set direction and develop innovative inference solutions
Scale performance of deep learning models across different architectures and types of NVIDIA accelerators
Qualifications
Required
Bachelor’s, Master’s, or PhD degree, or equivalent experience, in a relevant field (Computer Science, Computer Engineering, EECS, AI)
2 years of relevant software development experience
Strong C++ and Python programming skills and solid software engineering skills
Experience with DL frameworks (e.g. PyTorch, JAX, TensorFlow, ONNX) and inference libraries (e.g. TensorRT, TensorRT-LLM, vLLM, SGLang, FlashInfer)
Experience with performance analysis and performance optimization
Preferred
Strong foundation and architectural knowledge of GPUs
Deep understanding of modern deep learning models and workloads (e.g. Transformers, Recommenders, ASR, TTS, Visual Understanding)
Proficiency in at least one domain-specific language or programming model for deep learning (e.g. CUDA, TileIR, CuTeDSL, CUTLASS, Triton)
Prior contributions to major LLM inference frameworks (e.g. vLLM) or prior experience with graph compilers in deep learning inference (e.g. TorchDynamo/TorchInductor)
Prior experience optimizing performance for low-latency, resource-constrained systems or embedded AI pipelines (e.g. Jetson systems or other edge AI accelerators)
Benefits
You will also be eligible for equity and benefits.
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.