This article reviews the development and application of Vision-Large-Language-Models, focusing on their integration into autonomous driving systems.

The Integration of Vision-LLMs into AD Systems: Capabilities and Challenges

2025/09/28 04:00

Abstract and 1. Introduction

  2. Related Work

    2.1 Vision-LLMs

    2.2 Transferable Adversarial Attacks

  3. Preliminaries

    3.1 Revisiting Auto-Regressive Vision-LLMs

    3.2 Typographic Attacks in Vision-LLMs-based AD Systems

  4. Methodology

    4.1 Auto-Generation of Typographic Attack

    4.2 Augmentations of Typographic Attack

    4.3 Realizations of Typographic Attacks

  5. Experiments

  6. Conclusion and References

2 Related Work

2.1 Vision-LLMs

After Large Language Models (LLMs) demonstrated proficiency in reasoning across various natural language benchmarks, researchers extended them with visual encoders to support multimodal understanding. This integration has given rise to various forms of Vision-LLMs, which can reason over combined visual and language inputs.

Vision-LLMs Pre-training. Connecting LLMs to pre-trained vision models involves pre-training unimodal encoders individually on their respective domains, followed by large-scale vision-language joint training [17, 18, 19, 20, 2, 1]. Trained on an interleaved visual-language corpus (e.g., MMC4 [21] and M3W [22]), auto-regressive models learn to process images by converting them into visual tokens, combining these with textual tokens, and feeding the combined sequence into the LLM. Visual inputs are thus treated as a foreign language, which extends traditional text-only LLMs with visual understanding while retaining their language capabilities. Consequently, a straightforward pre-training strategy may not be designed to handle cases where the input text is significantly more aligned with text appearing in an image than with the visual context of that image.
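To make the token pipeline described above concrete, the following is a minimal sketch, not the implementation of any cited model. It assumes a toy grayscale image, a fixed linear projection standing in for a learned visual encoder, and a small lookup table standing in for the LLM's text embeddings. All function and parameter names (`split_into_patches`, `project`, `build_input_sequence`, `proj_weights`, `text_embeddings`) are hypothetical. Placing every visual token before the text is only one possible arrangement; interleaved corpora mix images and text throughout a sequence.

```python
from typing import List, Sequence

Vector = List[float]


def split_into_patches(image: Sequence[Sequence[float]], patch: int) -> List[Vector]:
    """Cut a 2D image into non-overlapping patch x patch blocks, each flattened row-major."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return patches


def project(vec: Vector, weights: Sequence[Sequence[float]]) -> Vector:
    """Linear projection of a flattened patch into the (assumed) LLM embedding space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def build_input_sequence(
    image: Sequence[Sequence[float]],
    text_ids: Sequence[int],
    patch: int,
    proj_weights: Sequence[Sequence[float]],
    text_embeddings: Sequence[Sequence[float]],
) -> List[Vector]:
    """Turn an image into visual tokens and append the embedded text tokens."""
    visual_tokens = [project(p, proj_weights) for p in split_into_patches(image, patch)]
    text_tokens = [list(text_embeddings[i]) for i in text_ids]
    # The LLM would consume this single sequence auto-regressively.
    return visual_tokens + text_tokens


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    proj_weights = [[1, 0, 0, 0], [0, 0, 0, 1], [0.25, 0.25, 0.25, 0.25]]
    text_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
    sequence = build_input_sequence(image, [0, 2], 2, proj_weights, text_embeddings)
    print(len(sequence), "tokens of dimension", len(sequence[0]))
```

Because visual and textual tokens end up in one shared sequence, the language model has no built-in way to tell text rendered inside the image apart from the surrounding visual context, which is the situation the paragraph above flags.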

Vision-LLMs in AD Systems. Vision-LLMs have proven useful for perception, planning, reasoning, and control in autonomous driving (AD) systems [6, 7, 9, 5]. For example, existing works have quantitatively benchmarked the linguistic capabilities of Vision-LLMs in terms of their trustworthiness in explaining AD decision-making processes [7]. Others have explored the use of Vision-LLMs for vehicular maneuvering [8, 5], and [6] even validated an approach in controlled physical environments. Because AD systems operate in safety-critical situations, comprehensive analyses of their vulnerabilities are crucial for reliable deployment and inference. However, proposed adoptions of Vision-LLMs into AD have been straightforward, which means that existing weaknesses of such models (e.g., vulnerability to typographic attacks) are likely carried over without proper countermeasures.


:::info Authors:

(1) Nhat Chung, CFAR and IHPC, A*STAR, Singapore and VNU-HCM, Vietnam;

(2) Sensen Gao, CFAR and IHPC, A*STAR, Singapore and Nankai University, China;

(3) Tuan-Anh Vu, CFAR and IHPC, A*STAR, Singapore and HKUST, HKSAR;

(4) Jie Zhang, Nanyang Technological University, Singapore;

(5) Aishan Liu, Beihang University, China;

(6) Yun Lin, Shanghai Jiao Tong University, China;

(7) Jin Song Dong, National University of Singapore, Singapore;

(8) Qing Guo, CFAR and IHPC, A*STAR, Singapore and National University of Singapore, Singapore.

:::


:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::
