This article outlines the OW‑VISCap framework, which jointly detects, segments, and captions both seen and unseen objects within a video.

Teaching AI to See and Speak: Inside the OW‑VISCap Approach

Abstract and 1 Introduction

  2 Related Work

    2.1 Open-world Video Instance Segmentation

    2.2 Dense Video Object Captioning and 2.3 Contrastive Loss for Object Queries

    2.4 Generalized Video Understanding and 2.5 Closed-World Video Instance Segmentation

  3 Approach

    3.1 Overview

    3.2 Open-World Object Queries

    3.3 Captioning Head

    3.4 Inter-Query Contrastive Loss and 3.5 Training

  4 Experiments and 4.1 Datasets and Evaluation Metrics

    4.2 Main Results

    4.3 Ablation Studies and 4.4 Qualitative Results

  5 Conclusion, Acknowledgements, and References

Supplementary Material

A. Additional Analysis

B. Implementation Details

C. Limitations

3 Approach

Given a video, our goal is to jointly detect, segment, and caption the object instances present in it. Importantly, the object instance categories may not be part of the training set (e.g., the parachutes shown in Fig. 3, top row), which places our goal in an open-world setting. To achieve this goal, we first break a given video into short clips, each consisting of T frames, and process each clip with our approach, OW-VISCap. We discuss how the per-clip results are merged in Sec. 4.
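
As a minimal sketch of the clip-splitting step, the Python function below groups a video's frames into consecutive clips of T frames. It assumes non-overlapping clips and lets the last clip be shorter when the video length is not a multiple of T; the helper name `split_into_clips` is hypothetical, and the merging of per-clip results (Sec. 4) is not shown.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_into_clips(frames: Sequence[Frame], T: int) -> List[List[Frame]]:
    """Break a video (a sequence of frames) into consecutive clips of T frames.

    Hypothetical helper: clips do not overlap, and the last clip may hold
    fewer than T frames if len(frames) is not a multiple of T.
    """
    if T <= 0:
        raise ValueError("T must be a positive number of frames")
    return [list(frames[start:start + T]) for start in range(0, len(frames), T)]


if __name__ == "__main__":
    video = [f"frame_{i}" for i in range(7)]  # stand-in for decoded frames
    for clip in split_into_clips(video, T=3):
        print(clip)  # each clip would be processed independently
```

With seven frames and T = 3, this yields clips of 3, 3, and 1 frames. Whether the actual pipeline pads, overlaps, or handles a short final clip differently is not stated here, so treat that choice as an assumption.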

We provide an overview of how OW-VISCap processes each clip in Sec. 3.1. We then discuss our contributions: (a) the introduction of open-world object queries in Sec. 3.2, (b) the use of masked attention for object-centric captioning in Sec. 3.3, and (c) the use of an inter-query contrastive loss to ensure that the object queries differ from each other in Sec. 3.4. In Sec. 3.5, we discuss the final training objective.

3.1 Overview

Both open- and closed-world object queries are processed by three heads: our specifically designed captioning head, which yields an object-centric caption; a classification head, which yields a category label; and a detection head, which yields either a segmentation mask or a bounding box.
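
The captioning head makes its captions object-centric through masked attention (Sec. 3.3). As a minimal, hedged sketch of that idea, the Python function below computes single-head scaled dot-product attention from one query vector over per-pixel features, setting the logits of positions outside the object's mask to negative infinity so they receive zero weight. The function name, the single-head setup, and the flat feature layout are illustrative assumptions; the actual head operates on learned embeddings inside a larger captioning model.

```python
import math
from typing import List, Sequence

Vector = List[float]


def masked_attention(query: Sequence[float],
                     features: Sequence[Sequence[float]],
                     mask: Sequence[bool]) -> Vector:
    """Single-head scaled dot-product attention restricted to masked positions.

    query:    d-dim vector (e.g., a hypothetical object-query embedding).
    features: N per-pixel (or per-patch) d-dim feature vectors.
    mask:     N booleans; True marks positions inside the object's mask.
    Returns the attention-weighted average of the features inside the mask.
    """
    if len(features) != len(mask):
        raise ValueError("features and mask must have the same length")
    if not any(mask):
        raise ValueError("mask selects no positions")
    d = len(query)
    scale = 1.0 / math.sqrt(d)
    # Positions outside the mask get a logit of -inf, so softmax ignores them.
    logits = [scale * sum(q * f for q, f in zip(query, feat)) if keep else -math.inf
              for feat, keep in zip(features, mask)]
    max_logit = max(logits)  # finite, because at least one position is kept
    weights = [math.exp(z - max_logit) for z in logits]  # exp(-inf) == 0.0
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * feat[k] for w, feat in zip(weights, features)) for k in range(d)]
```

Because masked-out positions receive exactly zero weight, changing features outside the mask leaves the output unchanged, which is what keeps the attended features tied to a single object.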

We introduce an inter-query contrastive loss that encourages the object queries to differ from each other; details are provided in Sec. 3.4. For closed-world objects, this loss helps remove highly overlapping false positives. For open-world objects, it helps discover new objects.
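
To illustrate what "encouraging queries to differ" can mean numerically, here is a minimal Python sketch of a simple repulsion penalty: the average hinge on pairwise cosine similarity between query embeddings. This is a generic stand-in, not the paper's inter-query contrastive loss, whose exact form belongs to Sec. 3.4; the function names and the `margin` parameter are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def inter_query_penalty(queries: Sequence[Sequence[float]], margin: float = 0.0) -> float:
    """Average hinge penalty on pairwise cosine similarity between queries.

    Pairs whose similarity exceeds `margin` are penalized, so minimizing this
    value pushes the query embeddings apart. Hypothetical stand-in, not the
    paper's exact inter-query contrastive loss.
    """
    pairs = list(combinations(range(len(queries)), 2))
    if not pairs:
        return 0.0
    total = sum(max(0.0, cosine(queries[i], queries[j]) - margin) for i, j in pairs)
    return total / len(pairs)
```

Two identical queries give a penalty of 1.0 with the default margin, while orthogonal queries give 0.0, so minimizing the penalty pushes near-duplicate queries apart; near-duplicate queries are the kind that would otherwise yield highly overlapping detections.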

Finally, we provide the full training objective in Sec. 3.5.

3.2 Open-World Object Queries

We first match the ground-truth objects with the open-world predictions by minimizing a matching cost with the Hungarian algorithm [34]. The optimal matching is then used to compute the final open-world loss.
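
The matching step can be illustrated with a self-contained Python implementation of the Hungarian algorithm (the Kuhn-Munkres variant with row and column potentials), applied to a hypothetical cost of 1 minus mask IoU, with masks represented as sets of pixel coordinates. The cost, the mask representation, and the "loss" (mean cost over matched pairs) are illustrative assumptions; the paper's actual matching cost and open-world loss terms are not reproduced here. In practice one would typically call an existing solver such as SciPy's `scipy.optimize.linear_sum_assignment`.

```python
from typing import List, Sequence, Set, Tuple

Pixel = Tuple[int, int]


def hungarian(cost: Sequence[Sequence[float]]) -> List[int]:
    """Minimum-cost assignment of every row to a distinct column (rows <= cols).

    Returns `assignment`, where assignment[i] is the column matched to row i.
    """
    if not cost:
        return []
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many predictions (cols) as ground truths (rows)")
    INF = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j]: row currently matched to column j (0 = free)
    way = [0] * (m + 1)  # way[j]: previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def mask_iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_and_loss(gt_masks: Sequence[Set[Pixel]],
                   pred_masks: Sequence[Set[Pixel]]) -> Tuple[List[int], float]:
    """Match ground truth to predictions; return (assignment, mean matched cost)."""
    cost = [[1.0 - mask_iou(g, p) for p in pred_masks] for g in gt_masks]
    assignment = hungarian(cost)
    if not assignment:
        return assignment, 0.0
    loss = sum(cost[i][j] for i, j in enumerate(assignment)) / len(assignment)
    return assignment, loss


if __name__ == "__main__":
    gt = [{(0, 0), (0, 1)}, {(5, 5)}]
    preds = [{(5, 5)}, {(9, 9)}, {(0, 0), (0, 1), (1, 1)}]
    print(match_and_loss(gt, preds))  # expected assignment: [2, 0]
```

Unmatched predictions (here the one at index 1) receive no ground-truth target; how such predictions are treated in the open-world loss is not specified in this excerpt.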

3.3 Captioning Head

3.4 Inter-Query Contrastive Loss

3.5 Training

Our total training loss is

Table 1: Open-world tracking accuracy (OWTA) on the BURST validation and test sets for all, common (comm.), and uncommon (unc.) categories of objects. Onl. refers to online frame-by-frame processing. The best scores are highlighted in bold, and the second-best scores are underlined.

Table 2: Dense video object captioning results on the VidSTG [57] dataset. Off. indicates offline methods and onl. indicates online methods.

:::info Authors:

(1) Anwesa Choudhuri, University of Illinois at Urbana-Champaign;

(2) Girish Chowdhary, University of Illinois at Urbana-Champaign;

(3) Alexander G. Schwing, University of Illinois at Urbana-Champaign.

:::


:::info This paper is available on arXiv under a CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
