Generative Artificial Intelligence (Generative AI), Large Language Models (LLMs), and Machine Learning (ML) are rapidly transforming many aspects of AI hardware accelerators and Systems-on-Chip (SoCs). The high computational demands and characteristics of emerging AI/ML workloads are dramatically impacting the architecture, VLSI implementation, and circuit design tradeoffs of hardware accelerators. Furthermore, as we reach the end of Moore’s Law, straightforward technology scaling offers limited opportunities for improved energy efficiency and performance. Instead, we must rely more on domain-specific architectural features and software/hardware design for AI model inference and training. In this talk, we will provide an overview of NVIDIA’s technology innovations, from circuits to software to the entire datacenter, needed to enable today’s latest supercomputers for GenAI. Next, we will highlight recent work from NVIDIA Research into energy-efficient deep learning inference acceleration, including optimized accelerator micro-architectures, SW/HW co-design for low-precision quantization, and LLM compression techniques. We will also highlight recent testchips targeting Transformer neural network inference, including a 5nm deep learning inference accelerator testchip that achieves up to 95.6 TOPS/W and a low-power accelerator for always-on vision.
Brucek Khailany joined NVIDIA in 2009 and is the Senior Director of the Accelerators and VLSI Research group. He leads research projects in energy-efficient AI accelerators, innovative VLSI design methodologies, ML- and GPU-assisted EDA, and quantum computing. Over 14 years at NVIDIA, he has contributed to many projects in research and product groups spanning computer architecture and VLSI design. Prior to NVIDIA, Dr. Khailany was a Co-Founder and Principal Architect at Stream Processors, Inc., where he led R&D related to parallel processor architectures. At Stanford University, he led the VLSI implementation of the Imagine processor, which introduced the concepts of stream processing and partitioned register organizations. He received his PhD in Electrical Engineering from Stanford University and BSE degrees in Electrical and Computer Engineering from the University of Michigan.