Modern Australian
Men's Weekly

.

PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

HONG KONG SAR - Media OutReach Newswire - 10 June 2025 - While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle with understanding long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating humans' way of thinking.

The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to the world-leading AI conferences.

A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent VideoMind that allows AI models to perform long video reasoning and question-answering tasks by emulating humans’ way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.
A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent VideoMind that allows AI models to perform long video reasoning and question-answering tasks by emulating humans’ way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.

Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand the video content, AI models therefore need not only to identify the objects present, but also take into account how they change throughout the video. As visuals in videos occupy a large number of tokens, video understanding requires vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.

Prof. Changwen CHEN, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they made reference to a human-like process of video understanding, and introduced a role-based workflow. The four roles included in the framework are: the Planner, to coordinate all other roles for each query; the Grounder, to localise and retrieve relevant moments; the Verifier, to validate the information accuracy of the retrieved moments and select the most reliable one; and the Answerer, to generate the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models face.

Another core innovation of the VideoMind framework lies in its adoption of a Chain-of-LoRA strategy. LoRA is a finetuning technique emerged in recent years. It adapts AI models for specific uses without performing full-parameter retraining. The innovative chain-of-LoRA strategy pioneered by the team involves applying four lightweight LoRA adapters in a unified model, each of which is designed for calling a specific role. With this strategy, the model can dynamically activate role-specific LoRA adapters during inference via self-calling to seamlessly switch among these roles, eliminating the need and cost of deploying multiple models while enhancing the efficiency and flexibility of the single model.

VideoMind is open source on GitHub and Huggingface. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available. Comparing VideoMind with some state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that the grounding accuracy of VideoMind outperformed all competitors in challenging tasks involving videos with an average duration of 27 minutes. Notably, the team included two versions of VideoMind in the experiments: one with a smaller, 2 billion (2B) parameter model, and another with a bigger, 7 billion (7B) parameter model. The results showed that, even at the 2B size, VideoMind still yielded performance comparable with many of the other 7B size models.

Prof. Chen said, "Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient with the human brain using only about 25 watts of power, which is about a million times lower than that of a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like human, while leveraging the chain-of-LoRA strategy to minimise the need for computing power and memory in this process."

AI is at the core of global technological development. The advancement of AI models is however constrained by insufficient computing power and excessive power consumption. Built upon a unified, open-source model Qwen2-VL and augmented with additional optimisation tools, the VideoMind framework has lowered the technological cost and the threshold for deployment, offering a feasible solution to the bottleneck of reducing power consumption in AI models.

Prof. Chen added, "VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more."


Hashtag: #PolyU #AI #LLMs #VideoAnalysis #IntelligentSurveillance #VideoSearch

The issuer is solely responsible for the content of this announcement.

How to Save Smart: Cheapest Travel Insurance for Schengen Visa without Cutting Corners

Picture this: you’ve found a last-minute flight to Milan, your hotel booking comes with breakfast and a rooftop view, and your itinerary is ready ...

Keeping Lone and Remote Workers Safe: Employer Duties and Practical Solutions

In Australia, thousands of employees work alone, in remote locations, or in direct contact with the public every day. While these roles are critical...

How Your General Dentist Supports Your Smile Over a Lifetime

A healthy grin is more than just a desirable feature; it reflects overall health, well-being, and self-esteem. Our oral health needs evolve from chi...

A Brighter Smile in Sydney: Expert Cosmetic Dentists and Veneers Solutions

A confident smile can open doors, boost your self-esteem, and leave a lasting impression. In Sydney, more people than ever are turning to cosmetic den...

How To Keep Vase Flowers Fresh Through Australia’s Coldest Months

Winter flowers develop slowly, which gives them stronger structure and longer vase life Heat from indoor environments is the biggest threat to th...

Artificial Intelligence is Powering the Growth of Australian Telehealth Services

Many Australians have traditionally experienced difficulties in accessing timely and quality healthcare, especially those who live in rural or remot...

VR Training in Australia – Customer Service Risk Management

In today’s rapidly evolving workplaces, Australian organisations are turning to immersive learning tools like VR to handle specialised needs such ...

Powering Shepparton’s Businesses: Expert Commercial Electrical Services You Can Count On

When it comes to running a successful business, having reliable, compliant, and efficient electrical systems is non-negotiable. From small retail ou...

Maximise Efficiency: Cleaner Solar Panels for Optimal Performance

Solar panels are a smart investment in energy efficiency, sustainability, and long-term savings—especially here in Cairns, where the tropical sun ...

7 Common Air Conditioner Issues in Melbourne – And How to Fix Them

Image by freepik Living in Melbourne, we all know how unpredictable the weather can be. One moment it’s cold and windy, the next it’s a scorchin...

Powering Palm QLD with Reliable Electrical Solutions

Image by pvproductions on Freepik When it comes to finding a trustworthy electrician Palm QLD locals can count on, the team at East Coast Sparkies s...

The Smart Way to Grow Online: SEO Management Sydney Businesses Can Rely On

If you’re a Sydney-based business owner, you already know the digital space is crowded. But with the right strategy, you don’t need to shout the...

What Your Car Says About You: The Personality Behind the Vehicle

You can tell a lot about someone by the car they drive—or at least, that’s what people think. True Blue Mobile Mechanics reckon the car says a l...

The Confidence Curve: Why Boudoir Photography Is the Empowerment Trend You Didn’t Know You Needed

Boudoir photography has been quietly taking over social feeds, Pinterest boards, and personal milestones—and for good reason. It’s not just abou...

Understanding Level 2 Electricians: Why Sydney Residents Need Licenced Experts for Complex Electrical Work

When it comes to electrical work around the home or business, not all electricians are created equal. In Sydney, particularly when you're dealing wi...

Retirement Anchored in Model Boat Building for Waterford’s Doug Unsold

WATERFORD — When Doug Unsold sees his ship come in, it’s usually one he’s crafted with his own hands. The 67-year-old retiree from Waterford ...

The Science Behind Alarm Clocks and Your Circadian Rhythm

Waking up on time isn’t just about setting an alarm—it’s about working with your body, not against it. At the heart of every restful night and...

How to Use Plants to Create a Calming Atmosphere in Your Home

In today’s fast-paced world, cultivating a calm, soothing environment at home has never been more important. Whether you live in a busy urban apar...