NVIDIA’s AutoGaze makes AI video understanding up to 19x faster by mimicking human attention

NVIDIA found a way to make AI video understanding up to 19x faster by teaching models to ignore the parts of a video that do not matter.

Image: NVIDIA

Processing video is one of the most computationally expensive things you can ask an AI model to do. Standard pipelines feed every pixel into the model whether it matters or not, and for long, high-resolution video, that adds up fast.

AutoGaze, published by NVIDIA and UC Berkeley researchers, solves this the same way your eyes do. Instead of processing everything, it learns to identify which parts of a video actually changed or matter, and throws out the rest.

The module is surprisingly small at just 3 million parameters, and it plugs into existing video models without rebuilding them. (A rough sketch of the idea follows the list below.)

  • How much it cuts: 4x to 100x fewer tokens depending on the video content, typically keeping 5 to 20% of patches.
  • Speed gains: Up to 19x faster for standard vision transformers, 10x for NVIDIA’s NVILA-8B model.
  • What it enables: Real-time processing of 4K video with over 1,000 frames on hardware that would previously struggle with it.
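
The announcement doesn't spell out the exact architecture, but the core idea, a learned saliency score per patch followed by top-k pruning before the expensive transformer backbone, fits in a few lines. Here is a minimal PyTorch sketch; the PatchGate name, the MLP scorer, and the keep_ratio knob are illustrative assumptions, not NVIDIA's actual code. It also shows why the speedups are plausible: self-attention cost grows quadratically with token count, so keeping 10 percent of patches cuts attention compute by up to roughly 100x, with smaller end-to-end gains because other stages still touch every frame.

```python
# Illustrative sketch of learned token pruning for video transformers.
# AutoGaze's real design is not public at this level of detail; the
# scorer, keep_ratio, and shapes below are assumptions for clarity.
import torch
import torch.nn as nn

class PatchGate(nn.Module):
    """Tiny scoring head: predicts a saliency score for each patch token."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        # Far smaller than the backbone it feeds, in the spirit of the
        # ~3M-parameter module described in the article.
        self.scorer = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, tokens: torch.Tensor, keep_ratio: float = 0.1):
        # tokens: (batch, num_patches, dim) patch embeddings for a clip
        scores = self.scorer(tokens).squeeze(-1)          # (B, N)
        k = max(1, int(tokens.shape[1] * keep_ratio))     # e.g. keep 10%
        top = scores.topk(k, dim=1).indices               # most salient patches
        idx = top.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        kept = torch.gather(tokens, 1, idx)               # (B, k, dim)
        return kept, top

# Prune tokens before they hit the transformer backbone. With
# keep_ratio=0.1, self-attention (quadratic in token count) sees
# roughly 100x less work; end-to-end speedups are smaller because
# patch embedding and decoding still touch every frame.
gate = PatchGate(dim=768)
clip = torch.randn(2, 4096, 768)   # a long 4K clip yields many patch tokens
kept, indices = gate(clip, keep_ratio=0.1)
print(kept.shape)                  # torch.Size([2, 409, 768])
```

Because the gate only selects among tokens the host model already produces, it can sit in front of an existing vision transformer without retraining the backbone, which is what makes the plug-in claim credible.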

NVIDIA already shipped a model using it. NVILA-8B-HD-Video incorporates AutoGaze natively and handles long-form 4K video question answering significantly better than previous versions.

On a new benchmark called HLVid, which tests five-minute 4K videos with fine-detail questions:

  • NVILA-8B-HD-Video with AutoGaze: 52.6% accuracy
  • Improvement over baseline: +10.1% over the same model without AutoGaze
  • Improvement over previous best: +4.5% over the prior state of the art

The limitations are worth knowing. Heavy camera movement still causes problems, and pushing compression too far produces smearing artifacts. It is not perfect, but it removes a real bottleneck.

This is not the only video bet NVIDIA is making right now. The company is also partnering with Runway to build a model that generates HD video in real time, a story we covered recently.

The Bottom Line: AutoGaze will not make headlines the way a new language model does. But making AI video understanding up to 19x faster with a 3-million-parameter plugin is exactly the kind of unglamorous infrastructure work that actually moves the field forward.

Source: NVIDIA / UC Berkeley

RunPod

If you need on-demand GPUs for training, fine-tuning, inference, or running open-source models, give RunPod a try.

  • Available hardware: H100, H200, A100, L40S, RTX 4090, RTX 5090, and 30+ more
  • Cost: significantly cheaper than AWS or GCP, billed per second, no contracts
  • Setup: spins up in under a minute, 30+ regions worldwide
Try RunPod →
Affiliate disclosure: We may earn a commission if you sign up via our link, at no extra cost to you.
Efficienist Newsletter

Get the core business tech news delivered straight to your inbox. We track AI, automation, SaaS, and cybersecurity so you don't have to.

Just read what you want, and be done with it.
