Modern Australian

PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

HONG KONG SAR - Media OutReach Newswire - 10 June 2025 - While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle with understanding long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating humans' way of thinking.

The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to world-leading AI conferences.

A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent VideoMind that allows AI models to perform long video reasoning and question-answering tasks by emulating humans’ way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.

Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand video content, AI models therefore need not only to identify the objects present but also to take into account how they change throughout the video. Because the visuals in a video occupy a large number of tokens, video understanding requires vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.
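To give a rough sense of why token counts grow so quickly with video length, the arithmetic can be sketched as below. The sampling rate and the number of tokens per frame are hypothetical values chosen purely for illustration; they are not figures reported by the research team, and real models differ widely.

```python
def visual_token_count(duration_s: float, sample_fps: float, tokens_per_frame: int) -> int:
    """Estimate visual tokens as (sampled frames) x (tokens per frame)."""
    frames = int(duration_s * sample_fps)
    return frames * tokens_per_frame


# Hypothetical settings: 1 sampled frame per second, 256 tokens per frame.
short_clip = visual_token_count(60, 1.0, 256)        # a 1-minute clip
long_video = visual_token_count(27 * 60, 1.0, 256)   # a 27-minute video

print(short_clip, long_video)
```

Under these assumed settings the token count scales linearly with duration, so a 27-minute video needs 27 times the visual tokens of a one-minute clip.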

Prof. Changwen CHEN, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they drew on the way humans understand videos and introduced a role-based workflow. The framework comprises four roles: the Planner, which coordinates all other roles for each query; the Grounder, which localises and retrieves relevant moments; the Verifier, which validates the accuracy of the retrieved moments and selects the most reliable one; and the Answerer, which generates the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models face.
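The division of labour among the four roles can be illustrated with a toy sketch. This is not the VideoMind implementation: the "video" here is just a list of captioned segments, relevance is approximated by word overlap, and every function name and rule is a hypothetical stand-in for what the real model does with learned components.

```python
from typing import List, Optional, Tuple

# (start_s, end_s, caption): a text stand-in for real video content.
Segment = Tuple[float, float, str]


def _overlap(query: str, caption: str) -> int:
    """Count words shared by the query and a caption (toy relevance score)."""
    return len(set(query.lower().split()) & set(caption.lower().split()))


def planner(query: str) -> List[str]:
    """Planner: decide which roles to call. This toy always runs the full chain."""
    return ["grounder", "verifier", "answerer"]


def grounder(query: str, video: List[Segment]) -> List[Segment]:
    """Grounder: retrieve candidate moments that look relevant to the query."""
    return [seg for seg in video if _overlap(query, seg[2]) > 0]


def verifier(query: str, candidates: List[Segment]) -> Optional[Segment]:
    """Verifier: re-check the candidates and keep the most reliable one."""
    if not candidates:
        return None
    return max(candidates, key=lambda seg: _overlap(query, seg[2]))


def answerer(query: str, moment: Optional[Segment]) -> str:
    """Answerer: produce an answer based on the verified moment."""
    if moment is None:
        return "No relevant moment found."
    start, end, caption = moment
    return f"Between {start:.0f}s and {end:.0f}s: {caption}"


def answer_query(query: str, video: List[Segment]) -> str:
    """Run the roles in the order chosen by the planner."""
    candidates: List[Segment] = []
    moment: Optional[Segment] = None
    for role in planner(query):
        if role == "grounder":
            candidates = grounder(query, video)
        elif role == "verifier":
            moment = verifier(query, candidates)
        elif role == "answerer":
            return answerer(query, moment)
    return "Planner did not schedule an answer."


video = [
    (0, 30, "a chef chops onions"),
    (30, 90, "the chef fries onions in a pan"),
    (90, 120, "guests eat dinner"),
]
query = "when does the chef fry onions in the pan"
print(answer_query(query, video))
```

In this toy run the Grounder proposes the first two segments, the Verifier keeps the second because it matches the query best, and the Answerer reports that moment.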

Another core innovation of the VideoMind framework lies in its adoption of a Chain-of-LoRA strategy. LoRA is a fine-tuning technique that has emerged in recent years. It adapts AI models for specific uses without full-parameter retraining. The Chain-of-LoRA strategy pioneered by the team applies four lightweight LoRA adapters to a unified model, each designed for a specific role. With this strategy, the model can dynamically activate role-specific LoRA adapters during inference via self-calling to switch seamlessly among these roles, eliminating the need for, and cost of, deploying multiple models while enhancing the efficiency and flexibility of the single model.
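The idea of one shared model with swappable low-rank adapters can be sketched in a few lines. This is a toy illustration under simplifying assumptions, not the team's code: a single weight matrix stands in for the whole model, each adapter is a pair of small matrices (B, A) whose product is added to the frozen base weights, and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ChainOfLoRAToy:
    """One frozen base weight matrix shared by several low-rank adapters."""

    def __init__(self, base: Matrix, adapters: Dict[str, Tuple[Matrix, Matrix]]):
        self.base = base          # shared weights W (d x k), never modified
        self.adapters = adapters  # role -> (B: d x r, A: r x k), with r small
        self.active: Optional[str] = None

    def activate(self, role: str) -> None:
        """Switch roles by selecting an adapter; the base weights are not copied."""
        if role not in self.adapters:
            raise KeyError(f"unknown role: {role}")
        self.active = role

    def forward(self, x: Vector) -> Vector:
        """Compute W x + B (A x) for the active adapter, or W x if none is active."""
        y = matvec(self.base, x)
        if self.active is not None:
            b, a = self.adapters[self.active]
            delta = matvec(b, matvec(a, x))
            y = [yi + di for yi, di in zip(y, delta)]
        return y


# Toy 2x2 base model with four rank-1 adapters (values chosen arbitrarily).
model = ChainOfLoRAToy(
    base=[[1.0, 0.0], [0.0, 1.0]],
    adapters={
        "planner": ([[1.0], [0.0]], [[1.0, 0.0]]),
        "grounder": ([[0.0], [1.0]], [[0.0, 1.0]]),
        "verifier": ([[1.0], [1.0]], [[1.0, 1.0]]),
        "answerer": ([[1.0], [-1.0]], [[0.0, 1.0]]),
    },
)

x = [1.0, 2.0]
for role in ["planner", "grounder", "verifier", "answerer"]:
    model.activate(role)
    print(role, model.forward(x))
```

Each adapter stores far fewer numbers than a full copy of the base weights would, which is why keeping four adapters on one model is cheaper than deploying four separate models.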

VideoMind is open source on GitHub and Hugging Face. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available. Comparing VideoMind with state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that the grounding accuracy of VideoMind outperformed all competitors in challenging tasks involving videos with an average duration of 27 minutes. Notably, the team included two versions of VideoMind in the experiments: one with a smaller, 2 billion (2B) parameter model, and another with a larger, 7 billion (7B) parameter model. The results showed that, even at the 2B size, VideoMind still yielded performance comparable to that of many of the other 7B models.

Prof. Chen said, "Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient, with the human brain using only about 25 watts of power, which is about a million times lower than that of a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like humans, while leveraging the Chain-of-LoRA strategy to minimise the need for computing power and memory in this process."

AI is at the core of global technological development. The advancement of AI models is, however, constrained by insufficient computing power and excessive power consumption. Built upon the unified, open-source model Qwen2-VL and augmented with additional optimisation tools, the VideoMind framework has lowered the technological cost and the threshold for deployment, offering a feasible way to ease the power-consumption bottleneck in AI models.

Prof. Chen added, "VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more."


Hashtag: #PolyU #AI #LLMs #VideoAnalysis #IntelligentSurveillance #VideoSearch

The issuer is solely responsible for the content of this announcement.
