Details MIVPG experiments across single- and multi-image scenarios. Model uses frozen LLM and Visual Encoder, updating only the MIVPG for efficiency.

Evaluating Visual Adapters: MIVPG Performance on Single and Multi-Image Inputs

2025/11/15 11:12

Abstract and 1 Introduction

  2. Related Work

    2.1. Multimodal Learning

    2.2. Multiple Instance Learning

  3. Methodology

    3.1. Preliminaries and Notations

    3.2. Relations between Attention-based VPG and MIL

    3.3. MIVPG for Multiple Visual Inputs

    3.4. Unveiling Instance Correlation in MIVPG for Enhanced Multi-instance Scenarios

  4. Experiments and 4.1. General Setup

    4.2. Scenario 1: Samples with Single Image

    4.3. Scenario 2: Samples with Multiple Images, with Each Image as a General Embedding

    4.4. Scenario 3: Samples with Multiple Images, with Each Image Having Multiple Patches to be Considered and 4.5. Case Study

  5. Conclusion and References

Supplementary Material

A. Detailed Architecture of QFormer

B. Proof of Proposition

C. More Experiments

4. Experiments

To assess the effectiveness of our proposed approach, we conduct evaluations across various scenarios:

  1. where each sample comprises a single image, and patches are naturally considered as instances;

  2. where each sample includes multiple images, but we use a general embedding for each image;

  3. where each sample contains multiple images, with each image containing multiple patches.
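
To make the three scenarios concrete, the minimal sketch below counts how many instances end up in one sample's bag under each setting. The helper name `instances_per_sample` and the example sizes are illustrative assumptions, not taken from the paper. Scenario 3 is shown as a flat count, although the patches and images may in practice be aggregated in separate steps.

```python
def instances_per_sample(num_images: int, patches_per_image: int, scenario: int) -> int:
    """Return the number of instances in one sample's bag (hypothetical helper)."""
    if num_images < 1 or patches_per_image < 1:
        raise ValueError("num_images and patches_per_image must be positive")

    if scenario == 1:
        # Scenario 1: a single image; every patch is treated as an instance.
        if num_images != 1:
            raise ValueError("scenario 1 expects exactly one image per sample")
        return patches_per_image
    if scenario == 2:
        # Scenario 2: several images; each image contributes one general embedding.
        return num_images
    if scenario == 3:
        # Scenario 3: several images; every patch of every image is considered.
        return num_images * patches_per_image
    raise ValueError("scenario must be 1, 2, or 3")


if __name__ == "__main__":
    # Example sizes are assumptions chosen only for illustration.
    for scenario, n_images in [(1, 1), (2, 5), (3, 5)]:
        count = instances_per_sample(n_images, 256, scenario)
        print(f"scenario {scenario}: {count} instances")
```

With these example sizes, the bag grows from a few instances in scenario 2 to over a thousand in scenario 3, which is one reason the scenarios are evaluated separately.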

4.1. General Setup

We initialize our model using BLIP2 [22] with FLAN-T5-XL. MIVPG is initialized with weights from QFormer. The model consists of a frozen language model and a frozen visual model; during training, we update only the MIVPG. The visual encoder, ViT-G, encodes each image into patch embeddings, and images are resized to 224 × 224. In our experiments, we observed that unfreezing the visual encoder does not lead to additional improvements on small datasets. Further details can be found in Supplementary C.1.
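
As a rough worked example of what the visual encoder produces at this resolution, the sketch below counts the tokens a ViT emits for a 224 × 224 input. The patch size of 14 and the extra class token are assumptions based on the common ViT-G/14 configuration; this section does not state them. The function name `vit_token_count` is likewise hypothetical.

```python
def vit_token_count(image_size: int = 224, patch_size: int = 14,
                    class_token: bool = True) -> int:
    """Count the tokens a ViT produces for a square image (illustrative only)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    # The image is split into a square grid of non-overlapping patches.
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2

    # Many ViT variants prepend one class token to the patch sequence (assumed here).
    return num_patches + (1 if class_token else 0)


if __name__ == "__main__":
    print(vit_token_count())                   # patches plus class token
    print(vit_token_count(class_token=False))  # patches only
```

Under these assumptions, a single image yields a 16 × 16 grid of patches, so even the single-image scenario presents the MIVPG with a few hundred instances.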


:::info Authors:

(1) Wenliang Zhong, The University of Texas at Arlington ([email protected]);

(2) Wenyi Wu, Amazon ([email protected]);

(3) Qi Li, Amazon ([email protected]);

(4) Rob Barton, Amazon ([email protected]);

(5) Boxin Du, Amazon ([email protected]);

(6) Shioulin Sam, Amazon ([email protected]);

(7) Karim Bouyarmane, Amazon ([email protected]);

(8) Ismail Tutar, Amazon ([email protected]);

(9) Junzhou Huang, The University of Texas at Arlington ([email protected]).

:::


:::info This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
