
PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

HONG KONG SAR - Media OutReach Newswire - 10 June 2025 - While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle with understanding long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating humans' way of thinking.

The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to world-leading AI conferences.

A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent VideoMind that allows AI models to perform long video reasoning and question-answering tasks by emulating humans’ way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.

Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand the video content, AI models therefore need not only to identify the objects present, but also to take into account how they change throughout the video. Because the visuals in a video occupy a large number of tokens, video understanding requires vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.

Prof. Changwen CHEN, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they made reference to a human-like process of video understanding, and introduced a role-based workflow. The four roles included in the framework are: the Planner, to coordinate all other roles for each query; the Grounder, to localise and retrieve relevant moments; the Verifier, to validate the information accuracy of the retrieved moments and select the most reliable one; and the Answerer, to generate the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models face.
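The division of labour among the four roles can be pictured with a short Python sketch. It is an illustration only, not the team's code: the `Roles` container, the `run_query` function, the step names and the toy role functions are all hypothetical, and the moments and scores are made up for the demonstration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Moment = Tuple[float, float]  # (start_seconds, end_seconds) of a video segment


@dataclass
class Roles:
    """Hypothetical callables standing in for the four roles."""
    planner: Callable[[str], List[str]]               # decides which steps to run
    grounder: Callable[[str], List[Moment]]           # proposes candidate moments
    verifier: Callable[[str, Moment], float]          # scores one candidate moment
    answerer: Callable[[str, Optional[Moment]], str]  # writes the final answer


def run_query(roles: Roles, question: str) -> str:
    """Run the steps chosen by the planner, in order, for one question."""
    candidates: List[Moment] = []
    best: Optional[Moment] = None
    for step in roles.planner(question):
        if step == "ground":
            candidates = roles.grounder(question)
        elif step == "verify" and candidates:
            # Keep the candidate the verifier scores highest.
            best = max(candidates, key=lambda m: roles.verifier(question, m))
        elif step == "answer":
            return roles.answerer(question, best)
    raise ValueError("plan ended without an 'answer' step")


# Toy roles with made-up moments and scores, for illustration only.
demo_roles = Roles(
    planner=lambda q: ["ground", "verify", "answer"],
    grounder=lambda q: [(10.0, 20.0), (300.0, 320.0)],
    verifier=lambda q, m: 0.9 if m[0] >= 300.0 else 0.2,
    answerer=lambda q, m: "no moment selected" if m is None
    else f"answer drawn from {m[0]:.0f}s to {m[1]:.0f}s",
)

print(run_query(demo_roles, "When does the goal happen?"))
```

In this sketch the planner may also return a shorter plan, for example only the answer step, in which case the answerer receives no selected moment.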

Another core innovation of the VideoMind framework lies in its adoption of a Chain-of-LoRA strategy. LoRA is a fine-tuning technique that has emerged in recent years. It adapts AI models for specific uses without full-parameter retraining. The innovative Chain-of-LoRA strategy pioneered by the team involves applying four lightweight LoRA adapters in a unified model, each designed for calling a specific role. With this strategy, the model can dynamically activate role-specific LoRA adapters during inference via self-calling to switch seamlessly among these roles, eliminating the need for, and cost of, deploying multiple models while enhancing the efficiency and flexibility of the single model.
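The idea of one shared model with switchable adapters can be shown with a toy Python example. It relies only on the standard LoRA formulation, in which a frozen weight matrix W is adjusted by a low-rank product B·A, so the output becomes W·x + B·(A·x). Everything else is an assumption made for illustration: the class name, the role names used as keys, and the tiny 2×2 matrices are hypothetical. A real system attaches adapters to many layers of a large model.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SharedModelWithAdapters:
    """One frozen base weight plus several low-rank adapters (toy example)."""

    def __init__(self, base: Matrix, adapters: Dict[str, Tuple[Matrix, Matrix]]):
        self.base = base          # shared weight W, never modified
        self.adapters = adapters  # role name -> (B, A), with rank r = len(A)
        self.active: Optional[str] = None

    def activate(self, role: str) -> None:
        """Switch roles by choosing which adapter is applied."""
        if role not in self.adapters:
            raise KeyError(f"unknown role: {role}")
        self.active = role

    def forward(self, x: List[float]) -> List[float]:
        out = matvec(self.base, x)  # W x
        if self.active is not None:
            b, a = self.adapters[self.active]
            delta = matvec(b, matvec(a, x))  # B (A x), the low-rank update
            out = [o + d for o, d in zip(out, delta)]
        return out


# Illustrative rank-1 adapters on a 2x2 identity base weight.
model = SharedModelWithAdapters(
    base=[[1.0, 0.0], [0.0, 1.0]],
    adapters={
        "grounder": ([[1.0], [0.0]], [[0.0, 1.0]]),
        "verifier": ([[0.0], [1.0]], [[1.0, 0.0]]),
    },
)

x = [1.0, 2.0]
print(model.forward(x))  # base model only
model.activate("grounder")
print(model.forward(x))  # same base weight, grounder adapter applied
model.activate("verifier")
print(model.forward(x))  # same base weight, verifier adapter applied
```

The base matrix is stored once and each role adds only its small B and A matrices, which is why switching adapters is cheaper than keeping a separate full model per role.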

VideoMind is open source on GitHub and Hugging Face. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available. Comparing VideoMind with some state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that VideoMind's grounding accuracy exceeded that of all competitors in challenging tasks involving videos with an average duration of 27 minutes. Notably, the team included two versions of VideoMind in the experiments: one with a smaller, 2 billion (2B) parameter model, and another with a bigger, 7 billion (7B) parameter model. The results showed that, even at the 2B size, VideoMind still yielded performance comparable with that of many of the other 7B-size models.

Prof. Chen said, "Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient with the human brain using only about 25 watts of power, which is about a million times lower than that of a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like human, while leveraging the chain-of-LoRA strategy to minimise the need for computing power and memory in this process."

AI is at the core of global technological development. The advancement of AI models is, however, constrained by insufficient computing power and excessive power consumption. Built upon the unified, open-source model Qwen2-VL and augmented with additional optimisation tools, the VideoMind framework has lowered the technological cost and the threshold for deployment, offering a feasible solution to the power-consumption bottleneck in AI models.

Prof. Chen added, "VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more."



The issuer is solely responsible for the content of this announcement.
