Object detection has evolved from hand-crafted features to deep CNNs with much higher accuracy, but most production systems are still stuck with fixed label sets that are expensive to update. New open-vocabulary, vision-language detectors (like Grounding DINO) let you detect arbitrary, prompt-defined concepts and achieve strong zero-shot performance on benchmarks, even without dataset-specific labels. The most practical approach today is hybrid: use these promptable models as teachers and auto-annotators, then distill their knowledge into small, closed-set detectors you can reliably deploy on edge devices.

From Fixed Labels to Prompts: How Vision-Language Models Are Re-Wiring Object Detection

2025/12/04 13:50

Object detection has become the backbone of many important products: safety systems that know when hands are near dangerous machinery, retail analytics that count products and people, autonomous vehicles, warehouse robots, ergonomic assessment tools, and more. Traditionally, those systems all shared one big assumption: you must decide up front which objects matter, hard-code that label set, and then spend a lot of time, money, and human effort annotating data for those classes.

Vision-language models (VLMs) and open-vocabulary object detectors (OVD) eliminate this assumption completely. Instead of baking labels into the weights, you pass them in as prompts: “red mug”, “overhead luggage bin”, “safety helmet”, “tablet on the desk.” And surprisingly, the best of these models now match or even beat strong closed-set detectors without ever seeing that dataset’s labels.

In my day job, I work on real-time, on-device computer vision for ergonomics and workplace safety: think iPads or iPhones checking posture, reach, and PPE in warehouses and aircraft cabins. For a long time, every new “Can we detect X?” request meant another round of data collection, labeling, and retraining. When we started experimenting with open-vocabulary detectors, the workflow flipped: we could prompt for new concepts, see if the signals looked promising in real video, and only then decide whether it was worth investing in a dedicated closed-set model.

This article walks through:

  • How we got from HOG + SIFT to modern deep detectors
  • Why closed-set object detection is painful in production
  • What open-vocabulary / VLM-based detectors actually do
  • Benchmarks comparing classical, deep closed-set, and open-vocabulary models
  • A practical pattern: use OVDs as annotators, then distill to efficient closed-set models

1. Object detection 101: what and why?

Object detection tries to answer two questions for each image (or video frame):

  1. What is in the scene? (class labels)
  2. Where is it? (bounding boxes, sometimes masks)

Unlike plain image classification (one label per image), detection says “two people, one laptop, one chair, one cup” with coordinates. That’s what makes it useful for:

  • Safety – detecting people, PPE, vehicles, tools
  • Automation – robots localizing objects to pick or avoid
  • Analytics – counting products, tracking usage, analyzing posture
  • Search – “find all images where someone is holding a wrench”

In traditional pipelines, the object catalog (your label set) is fixed, for example, 80 COCO classes, or 1,203 LVIS classes. Adding “blue cardboard box”, “broken pallet”, or a specific SKU later is where things start to hurt.
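To make the "what" and "where" concrete, here is a minimal sketch of what a detector's output for one frame might look like. The `Detection` class, the box convention (pixel corners), and all the numbers are illustrative assumptions, not the output format of any particular model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Detection:
    """One detected object: what it is, where it is, and how confident the model is."""
    label: str                               # class name from the fixed label set
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
    score: float                             # confidence in [0, 1]


# Hypothetical output for a single office frame (all values are made up).
frame_detections = [
    Detection("person", (34.0, 50.0, 210.0, 470.0), 0.97),
    Detection("person", (320.0, 62.0, 480.0, 465.0), 0.93),
    Detection("laptop", (180.0, 260.0, 330.0, 360.0), 0.88),
    Detection("chair", (300.0, 240.0, 500.0, 478.0), 0.81),
    Detection("cup", (150.0, 300.0, 178.0, 340.0), 0.64),
]

# A classifier returns one label per image; a detector returns one entry
# per object, so counting and localization come for free.
counts = Counter(d.label for d in frame_detections)
print(dict(counts))
```

Every `label` here has to come from the catalog the model was trained on, which is exactly the constraint the rest of this article is about.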


2. A very quick history: from HOG to deep nets

2.1 Pre-deep learning: HOG, DPM, Regionlets

Before deep learning, detectors used hand-crafted features like HOG (Histograms of Oriented Gradients) and part-based models. You’d slide a window over the image, compute features, and run a classifier.
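The sliding-window step can be sketched in a few lines. This is only the window enumeration, under assumed sizes (a 64x128 window, stride 8, a 640x480 frame); the feature extraction and classifier that would run on each window are left out.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(image_w: int, image_h: int,
                    win_w: int, win_h: int, stride: int) -> Iterator[Window]:
    """Yield every window position that fits fully inside the image."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


# Illustrative settings: pedestrian-sized window on a VGA frame.
windows = list(sliding_windows(640, 480, 64, 128, 8))
print(len(windows))  # number of windows at this single scale
```

With these settings there are 3,285 windows at one scale alone, and a classical detector would repeat this over several image scales. That cost is part of why per-window features had to be cheap and hand-designed.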

Two representative classical systems on PASCAL VOC 2007:

  • Deformable Part Models – a landmark part-based detector; later versions reached 33.7% mAP on VOC 2007 (no context).
  • Regionlets – richer region-based features plus boosted classifiers; achieved 41.7% mAP on VOC 2007.

VOC 2007 has 5,011 train+val images and 4,952 test images (9,963 total).

2.2 Deep learning arrives: R-CNN, Fast/Faster R-CNN

Then came CNNs:

  • Fast R-CNN (VGG16 backbone) trained on VOC 2007+2012 (“07+12”) improved VOC 2007 mAP to 70.0%.
  • Faster R-CNN (RPN + Fast R-CNN with VGG16) pushed that further to 73.2% mAP on VOC 2007 test using the same 07+12 training split.

The 07+12 setup uses VOC 2007 trainval (5,011 images) + VOC 2012 trainval, giving about 16.5k training images.

So on the same dataset, going from hand-crafted to CNNs roughly doubled performance:

Table 1 – Classical vs deep detectors on PASCAL VOC 2007

| Dataset | Model | # training images (VOC) | mAP @ 0.5 |
|----|----|----|----|
| VOC 2007 test | DPM voc-release5 (no context) | 5,011 (VOC07 trainval) | 33.7% |
| VOC 2007 test | Regionlets | 5,011 (VOC07 trainval) | 41.7% |
| VOC 2007 test | Fast R-CNN (VGG16, 07+12) | ≈16.5k (VOC07+12) | 70.0% |
| VOC 2007 test | Faster R-CNN (VGG16, 07+12) | ≈16.5k (VOC07+12) | 73.2% |
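The "mAP @ 0.5" column in Table 1 rests on intersection-over-union (IoU): a predicted box counts as a correct detection only if its IoU with a ground-truth box of the same class is at least 0.5. A minimal sketch of that check, with made-up boxes:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0.0, 0.0, 100.0, 100.0)
prediction = (20.0, 0.0, 120.0, 100.0)   # same size, shifted 20 px to the right

overlap = iou(ground_truth, prediction)  # 8000 / 12000
is_true_positive = overlap >= 0.5        # VOC-style matching threshold
print(round(overlap, 3), is_true_positive)
```

The COCO-style AP@[0.5:0.95] used later in this article averages over IoU thresholds from 0.5 to 0.95, so it is a stricter number and not directly comparable with the VOC figures above.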

That’s the story we’ve been telling for a decade: deep learning crushed classical detection.

But all of these are closed-set: you pick a fixed label list, and the model can’t recognize anything outside it.


3. Why closed-set deep detectors are painful in production

Closed-set detectors (Faster R-CNN, YOLO, etc.) are great if:

  • You know your label set in advance
  • It won’t change much
  • You can afford a full collect → annotate → train → validate → deploy loop each time you tweak it

In practice, especially in enterprise settings:

  • Stakeholders constantly invent new labels (“Can we detect and track this new tool?”).
  • Data is expensive – bounding box or mask annotation for niche industrial objects costs real money.
  • Model teams end up with a backlog of “can we add this label?” tickets that require yet another retrain.

Technically, closed-set detectors are optimized for one label space:

  • Classification heads have fixed size (e.g., 80 COCO classes, or 1,203 LVIS classes).
  • Adding classes often means changing the last layer and re-training or at least fine-tuning on freshly annotated data.
  • If you’re running on-device (phones, tablets, edge boxes), you also need those models to stay small and fast, which constrains how often you can change them.
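The fixed-size head is easy to see in a toy example. The sketch below is a plain-Python stand-in for a linear classification layer, not code from any real detector; `ClosedSetHead`, the feature size, and the class names are all hypothetical.

```python
import random
from typing import List, Sequence


class ClosedSetHead:
    """Toy linear classification head: one weight row per class."""

    def __init__(self, class_names: Sequence[str], feature_dim: int, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.feature_dim = feature_dim
        self.class_names: List[str] = []
        self.weights: List[List[float]] = []
        for name in class_names:
            self.add_class(name)

    def add_class(self, name: str) -> None:
        """Append an output row. It is random noise until trained on labeled data."""
        self.class_names.append(name)
        self.weights.append(
            [self._rng.uniform(-0.1, 0.1) for _ in range(self.feature_dim)]
        )

    def logits(self, feature: Sequence[float]) -> List[float]:
        """One score per known class; nothing outside class_names can be predicted."""
        return [sum(w * f for w, f in zip(row, feature)) for row in self.weights]


head = ClosedSetHead(["person", "forklift", "helmet"], feature_dim=4)
region_feature = [0.5, -0.2, 0.1, 0.9]   # made-up feature for one region

before = head.logits(region_feature)     # 3 scores
head.add_class("broken pallet")          # the new stakeholder request
after = head.logits(region_feature)      # 4 scores; the last one is untrained
```

Growing the layer is the trivial part. The new row only becomes useful after you collect and annotate examples of the new class and retrain, which is the loop described above.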

This is where open-vocabulary detectors and vision-language models become interesting.


4. Open-vocabulary object detection: prompts instead of fixed labels

Open-vocabulary detectors combine two ideas:

  1. Vision backbone – like a detector / transformer that proposes regions.
  2. Language backbone – text encoder (often CLIP-style) that turns prompts like “red cup” or “overhead bin” into embeddings.

Instead of learning a classifier over a fixed set of one-hot labels, the detector learns to align region features and text embeddings in a shared space. At inference time, you can pass any string: “steel toe boot”, “forklift”, “wrench”, “coffee stain”, and the model scores regions against those text prompts.
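A toy version of that alignment step, assuming cosine similarity as the scoring function and using hand-written 3-dimensional vectors in place of real encoder outputs. Actual models use learned, much higher-dimensional embeddings and their own matching heads, so treat this as a sketch of the idea only.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors in the shared embedding space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


# Hypothetical text embeddings, one per prompt (a real text encoder would produce these).
text_embeddings: Dict[str, List[float]] = {
    "forklift": [0.9, 0.1, 0.0],
    "wrench": [0.1, 0.9, 0.1],
    "coffee stain": [0.0, 0.2, 0.9],
}

# Hypothetical region features from the vision side, one per proposed box.
region_features: List[List[float]] = [
    [0.8, 0.2, 0.1],
    [0.1, 0.1, 0.95],
]


def score_regions(regions: List[List[float]],
                  prompts: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Score every region against every prompt; the prompt list is free to change."""
    return [{p: cosine(r, e) for p, e in prompts.items()} for r in regions]


scores = score_regions(region_features, text_embeddings)
best = [max(s, key=s.get) for s in scores]
print(best)
```

The label set lives in the `text_embeddings` dictionary, not in the model weights: adding a prompt means adding an entry, with no change to the scoring code.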

Examples:

  • Grounding DINO – text-conditioned detector that achieves 52.5 AP on COCO detection in zero-shot transfer, i.e., without any COCO training data. After fine-tuning on COCO, it reaches 63.0 AP.
  • YOLO-World – a YOLO-style open-vocabulary detector that reaches 35.4 AP on LVIS in zero-shot mode at 52 FPS on a V100 GPU.

These models are usually pre-trained on millions of image–text pairs from the web, then sometimes fine-tuned on detection datasets with large vocabularies.

Visual comparison: promptable Grounding DINO vs. closed-set Fast R-CNN

In the side-by-side image below, the open-vocabulary Grounding DINO model is prompted with fine-grained phrases like “armrests,” “mesh backrest,” “seat cushion,” and “chair,” and it correctly identifies each region, not just the overall object. This works because Grounding DINO connects image regions with text prompts during inference, enabling it to recognize categories that weren’t in its original training list. In contrast, the closed-set Fast R-CNN model is trained on a fixed set of categories (such as those in the PASCAL VOC or COCO label space), so it can only detect the broader “chair” class and misses the finer parts. This highlights the real-world advantage of promptable detectors: they can adapt to exactly what you ask for without retraining, while still maintaining practical performance. It also shows why open-vocabulary models are so promising for dynamic environments where new items, parts, or hazards appear regularly.

Promptable vs. closed-set detection on the same scene. Grounding DINO (left) identifies armrests, mesh backrest, seat cushion, and the overall chair; Fast R-CNN (right) detects only the chair. Photo: © 2025 Balaji Sundareshan (original photo by the author).


5. Benchmarks: closed-set vs open-vocabulary on COCO

Let’s look at COCO 2017, the standard 80-class detection benchmark. COCO train2017 has about 118k training images and 5k val images.

A strong closed-set baseline:

  • EfficientDet-D7, a fully supervised detector, achieves 52.2 AP (COCO AP@[0.5:0.95]) on test-dev with 52M parameters.

Now compare that to Grounding DINO:

  • 52.5 AP zero-shot on COCO detection without any COCO training data.
  • 63.0 AP after fine-tuning on COCO.

Table 2 – COCO closed-set vs open-vocabulary

| Dataset | Model | # training images from COCO | AP@[0.5:0.95] |
|----|----|----|----|
| COCO 2017 test-dev | EfficientDet-D7 (closed-set) | 118k (train2017) | 52.2 AP |
| COCO det. (zero-shot) | Grounding DINO (open-vocab, zero-shot) | 0 (no COCO data) | 52.5 AP |
| COCO det. (supervised) | Grounding DINO (fine-tuned) | 118k (train2017) | 63.0 AP |

You can fairly say:

An open-vocabulary detector, trained on other data, matches a strong COCO-trained detector on COCO, and then beats it once you fine-tune.

That’s a strong argument for reusability: with OVDs, you get decent performance on new domains without painstaking dataset-specific labeling.

In our own experiments on an office ergonomics product, we’ve seen a similar pattern: a promptable detector gets us to a usable baseline quickly, and a small fine-tuned model does the heavy lifting in production.


6. Benchmarks on LVIS: long-tail, large vocabulary

COCO has 80 classes. LVIS v1.0 is more realistic for enterprise: ~100k train images, ~20k val, and 1,203 categories with a long-tailed distribution.

6.1 Closed-set LVIS

The Copy-Paste paper benchmarks strong instance/detection models on LVIS v1.0. With EfficientNet-B7 NAS-FPN and a two-stage training scheme, they report:

  • 41.6 Box AP on LVIS v1.0 using ~100k training images plus advanced augmentation.

Another line of work, Detic, hits 41.7 mAP on the standard LVIS benchmark across all classes, using LVIS annotations plus additional image-level labels.

6.2 Zero-shot open-vocabulary on LVIS

Two representative OVDs:

  • YOLO-World: 35.4 AP on LVIS in zero-shot mode at 52 FPS.
  • Grounding DINO 1.5 Edge: 36.2 AP on LVIS-minival in zero-shot transfer, while running at 75.2 FPS with TensorRT.

These models use no LVIS training images; they rely on large-scale pre-training with grounding annotations and text labels, and are then evaluated on LVIS as a new domain.

Table 3 – LVIS: closed-set vs open-vocabulary

| Dataset / split | Model | # training images from LVIS | AP (box) |
|----|----|----|----|
| LVIS v1.0 (val) | Eff-B7 NAS-FPN + Copy-Paste (closed-set) | 100k (LVIS train) | 41.6 AP |
| LVIS v1.0 (all classes) | Detic (open-vocab-friendly, LVIS-trained) | 100k (LVIS train) | 41.7 mAP |
| LVIS v1.0 (zero-shot) | YOLO-World (open-vocab, zero-shot) | 0 (no LVIS data) | 35.4 AP |
| LVIS-minival (zero-shot) | Grounding DINO 1.5 Edge (open-vocab, edge-optimized) | 0 (no LVIS data) | 36.2 AP |

Takeaway that you can safely emphasize:

On LVIS, the best open-vocabulary detectors reach ~35–36 AP in pure zero-shot mode, not far behind strong closed-set models in the low-40s AP that use 100k fully annotated training images.

That’s a powerful trade-off story for enterprises: ~10k+ human hours of annotation vs zero LVIS labels for a ~5–6 AP gap.
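The size of that gap follows directly from the four numbers in Table 3. A quick check, using only the figures quoted above:

```python
# AP values as quoted in Table 3 (box AP on LVIS).
closed_set = {"Eff-B7 NAS-FPN + Copy-Paste": 41.6, "Detic": 41.7}
zero_shot = {"YOLO-World": 35.4, "Grounding DINO 1.5 Edge": 36.2}

# Gap for every closed-set / zero-shot pairing, rounded to one decimal.
gaps = {
    (c_name, z_name): round(c_ap - z_ap, 1)
    for c_name, c_ap in closed_set.items()
    for z_name, z_ap in zero_shot.items()
}

for (c_name, z_name), gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{c_name} vs {z_name}: {gap} AP")
```

The pairings span 5.4 to 6.3 AP. Keep in mind that the rows use different evaluation splits (val vs. minival), so these differences are indicative and not a controlled comparison.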

In one of our internal pilots, we used an open-vocab model to sweep through a few hundred hours of warehouse video with prompts like “forklift”, “ladder”, and “cardboard boxes on the floor.” The raw detections were noisy, but they gave our annotators a huge head start: instead of hunting for rare events manually, they were editing candidate boxes. That distilled into a compact closed-set model we could actually ship on edge hardware, and it only existed because the open-vocab model gave us a cheap way to explore the long tail.


7. Limitations of open-vocabulary detection

Open-vocabulary detectors aren’t magic. They introduce new problems:

  1. Prompt sensitivity & hallucinations
    • “cup” vs “mug” vs “coffee cup” can change detections.
    • If you prompt with something that isn’t there (“giraffe” in an office), the model may still confidently hallucinate boxes.
  2. Calibration & thresholds
    • Scores aren’t always calibrated across arbitrary text prompts, so you may need prompt-specific thresholds or re-scoring.
  3. Latency & compute
    • Foundation-scale models (big backbones, large text encoders) can be heavy for edge devices.
    • YOLO-World and Grounding DINO 1.5 Edge show this is improving: 35.4 AP at 52 FPS, 36.2 AP at 75 FPS, but you’re still in GPU/accelerator territory.
  4. Governance & safety
    • Because they’re text-driven, you have to think about who controls the prompts and how to log/approve them in safety-critical systems.

So while OVDs are amazing for exploration, prototyping, querying, and rare-class detection, you might not always want to ship them directly to every edge device.
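One simple way to handle the calibration issue is a per-prompt threshold table with a fallback default. The sketch below assumes you have already tuned thresholds on a small validation set; the threshold values, prompt names, and the `PromptDetection` type are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class PromptDetection(NamedTuple):
    prompt: str    # the text prompt that produced this box
    score: float   # raw model confidence, not comparable across prompts


DEFAULT_THRESHOLD = 0.35                  # hypothetical fallback
PROMPT_THRESHOLDS: Dict[str, float] = {   # hypothetical tuned values
    "safety helmet": 0.30,                # small objects tend to score lower
    "giraffe": 0.80,                      # unlikely in this domain, demand strong evidence
}


def filter_detections(dets: List[PromptDetection],
                      thresholds: Dict[str, float],
                      default: float) -> List[PromptDetection]:
    """Keep detections whose score clears the threshold for their own prompt."""
    return [d for d in dets if d.score >= thresholds.get(d.prompt, default)]


raw = [
    PromptDetection("safety helmet", 0.33),
    PromptDetection("giraffe", 0.55),
    PromptDetection("forklift", 0.40),
    PromptDetection("forklift", 0.20),
]
kept = filter_detections(raw, PROMPT_THRESHOLDS, DEFAULT_THRESHOLD)
print(kept)
```

Keeping the thresholds in a versioned table also helps with the governance point: prompt changes and their thresholds can be reviewed and logged together.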


8. A practical recipe: OVD as annotator, closed-set as worker

A pattern that makes sense for many enterprises:

  1. Use an open-vocabulary detector as a “labeling assistant”
    • Run Grounding DINO / YOLO-World over your video/image streams with prompts like “pallet”, “fallen pallet”, “phone in hand”, “ladder”.
    • Let your annotators edit rather than draw boxes from scratch.
    • This creates a large, high-quality task-specific labeled dataset cheaply.
  2. Train a lean closed-set detector
    • Define the final label set you actually need in production.
    • Train an EfficientDet / YOLO / RetinaNet / lightweight transformer on your auto-bootstrapped dataset.
    • You now get fast, small, hardware-friendly models that are easy to deploy on edge devices (iPads, Jetsons, on-prem boxes).
  3. Iterate by “querying” the world with prompts
    • When product asks, “Can we also track X?” you don’t need to re-instrument hardware:
      • First, run an OVD with new prompts to mine candidate instances of X.
      • Curate + clean those labels.
      • Fine-tune or extend your closed-set detector with the new class.

This gives you the best of both worlds:

  • Open-vocabulary detectors act as a flexible, promptable teacher.
  • Closed-set detectors become the fast, robust, cheap workers that actually run everywhere.
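The hand-off between the two stages is mostly bookkeeping: map free-text prompts onto the final class IDs, drop weak or off-catalog boxes, and write candidate labels for annotators to review. Here is a minimal sketch that emits YOLO-style text labels (class ID plus normalized center, width, and height), which is one common choice. The class list, the synonym prompt, the score cutoff, and the helper names are assumptions for illustration.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class RawDetection(NamedTuple):
    prompt: str                              # text prompt that produced the box
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
    score: float


# Final closed-set label list (prompts taken from the recipe above).
CLASS_NAMES = ["pallet", "fallen pallet", "phone in hand", "ladder"]
PROMPT_TO_CLASS: Dict[str, int] = {name: i for i, name in enumerate(CLASS_NAMES)}
PROMPT_TO_CLASS["step ladder"] = PROMPT_TO_CLASS["ladder"]   # hypothetical synonym prompt
MIN_SCORE = 0.4                                              # hypothetical cutoff


def to_yolo_line(det: RawDetection, img_w: int, img_h: int) -> Optional[str]:
    """Convert one detection to 'class cx cy w h' (normalized), or None if rejected."""
    if det.score < MIN_SCORE or det.prompt not in PROMPT_TO_CLASS:
        return None
    x0, y0, x1, y1 = det.box
    cx, cy = (x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h
    w, h = (x1 - x0) / img_w, (y1 - y0) / img_h
    return f"{PROMPT_TO_CLASS[det.prompt]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


detections = [
    RawDetection("ladder", (100.0, 200.0, 300.0, 400.0), 0.72),
    RawDetection("step ladder", (500.0, 100.0, 700.0, 500.0), 0.55),
    RawDetection("pallet", (0.0, 0.0, 50.0, 50.0), 0.12),         # too weak
    RawDetection("giraffe", (10.0, 10.0, 90.0, 90.0), 0.90),      # not in the label set
]

label_lines: List[str] = []
for det in detections:
    line = to_yolo_line(det, img_w=1000, img_h=800)
    if line is not None:
        label_lines.append(line)

print("\n".join(label_lines))
```

These lines are candidates, not ground truth: the point of the recipe is that a human still reviews and corrects them before they are used to train the closed-set model.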

9. Where this leaves us

If you zoom out over the last 15 years:

  • HOG + DPM and friends gave us ~30–40 mAP on VOC 2007.
  • CNN detectors like Fast/Faster R-CNN doubled that to ~70+ mAP on the same benchmark.
  • Large-scale detectors like EfficientDet hit 52.2 AP on COCO; open-vocabulary models like Grounding DINO match that without COCO labels and surpass it when fine-tuned.
  • On LVIS, zero-shot OVDs are only a few AP behind fully supervised large-vocab detectors that rely on 100k densely annotated images.

The story for readers is simple:

  • Yesterday: you picked a label set, paid a lot for labels, and got a good closed-set detector.
  • Today: you can prompt a detector with natural language, get decent zero-shot performance on new domains, and use that to cheaply bootstrap specialized detectors.
  • Tomorrow: the line between “object detection” and “ask a question about the scene” will blur even more, as vision-language models continue to eat classical CV tasks.

If you’re building enterprise systems, it’s a good time to start treating prompts as the new label files and vision-language detectors as your first stop for exploration, before you commit to yet another closed-set training cycle.

Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact [email protected] for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

The Adoption of Web3 in Europe: Current Status, Opportunities, and Challenges

The Adoption of Web3 in Europe: Current Status, Opportunities, and Challenges

How decentralization technologies are advancing in the Old Continent.
Share
The Cryptonomist2025/12/06 15:00
Wang Yongli, former vice president of the Bank of China: Why did China resolutely halt stablecoins?

Wang Yongli, former vice president of the Bank of China: Why did China resolutely halt stablecoins?

Written by: Wang Yongli , former Vice President of Bank of China China's policy orientation of accelerating the development of the digital yuan and resolutely curbing virtual currencies, including stablecoins, is now fully clear. This is based on a comprehensive consideration of factors such as China's leading global advantages in mobile payments and the digital yuan, the sovereignty and security of the yuan, and the stability of the monetary and financial system. Since May 2025, the United States and Hong Kong have been racing to advance stablecoin legislation, which has led to a surge in global legislation on stablecoins and crypto assets (also known as "cryptocurrencies" or "virtual currencies"). A large number of institutions and capital are flocking to issue stablecoins and invest in crypto assets, which has also sparked heated debate on whether China should fully promote stablecoin legislation and the development of RMB stablecoins (including offshore ones). Furthermore, after the United States legislated to prohibit the Federal Reserve from issuing digital dollars, whether China should continue to promote digital RMB has also become a hot topic of debate. For China, this involves the direction and path of national currency development. With the global spread of stablecoins and the increasingly acute and complex international relations and fiercer international currency competition, this has a huge and far-reaching impact on how the RMB innovates and develops, safeguards national security, and achieves the strategic goals of a strong currency and a financial power. We must calmly analyze, accurately grasp, and make decisions early. We cannot be indifferent or hesitant, nor can we blindly follow the trend and make directional and subversive mistakes. Subsequently, the People's Bank of China announced that it would optimize the positioning of the digital yuan within the monetary hierarchy (adjusting the previously determined M0 positioning. 
This is a point I have repeatedly advocated from the beginning; see Wang Yongli's WeChat public account article "Digital Yuan Should Not Be Positioned as M0" dated January 6, 2021), further optimize the digital yuan management system (establishing an international digital yuan operations center in Shanghai, responsible for cross-border cooperation and use of the digital yuan; and establishing a digital yuan operations management center in Beijing, responsible for the construction, operation, and maintenance of the digital yuan system), and promote and accelerate the development of the digital yuan . On November 28, the People's Bank of China and 13 other departments jointly convened a meeting of the coordination mechanism for combating virtual currency trading and speculation. The meeting pointed out that due to various factors, virtual currency speculation has recently resurfaced, and related illegal and criminal activities have occurred frequently, posing new challenges to risk prevention and control. It emphasized that all units should deepen coordination and cooperation, continue to adhere to the prohibitive policy on virtual currencies, and persistently crack down on illegal financial activities related to virtual currencies. It clarified that stablecoins are a form of virtual currency , and their issuance and trading activities are also illegal and subject to crackdown. This has greatly disappointed those who believed that China would promote the development of RMB stablecoins and correspondingly relax the ban on virtual currency (crypto asset) trading. Therefore, China's policy orientation of accelerating the development of the digital yuan and resolutely curbing virtual currencies, including stablecoins, is now fully clear . Of course, this policy orientation remains highly debated both domestically and internationally, and there is no consensus among the public. So, how should we view this major policy direction of China? 
This article will first answer why China resolutely halted stablecoins; how to accelerate the innovative development of the digital yuan will be discussed in another article . There is little room or opportunity for the development of non-USD stablecoins. Since Tether launched USDT, a stablecoin pegged to the US dollar, in 2014 , USD stablecoins have been operating for over a decade and have formed a complete international operating system. They have basically dominated the entire crypto asset trading market, accounting for over 99% of the global fiat stablecoin market capitalization and trading volume . This situation arises from two main factors. First, the US dollar is the most liquid and has the most comprehensive supporting system of international central currencies, making stablecoins pegged to the dollar the easiest to accept globally. Second, it is also a result of the US's long-standing tolerant policy towards crypto assets like Bitcoin and dollar-denominated stablecoins, rather than leading the international community to strengthen necessary regulation and safeguard the fundamental interests of all humanity. Even this year, when the US pushed for legislation on stablecoins and crypto assets, it was largely driven by the belief that dollar-denominated stablecoins would increase global demand for the dollar and dollar-denominated assets such as US Treasury bonds, reduce the financing costs for the US government and society, and strengthen the dollar's international dominance. This was a choice made to enhance US support for dollar-denominated stablecoins and control their potential impact on the US, prioritizing the maximization of national interests while giving little consideration to mitigating the international risks of stablecoins. 
With the US strongly promoting dollar-denominated stablecoins, other countries or regions launching non-dollar fiat currency stablecoins will find it difficult to compete with dollar-denominated stablecoins on an international level, except perhaps within their own sovereign territory or on the issuing institution's own e-commerce platform. Their development potential and practical significance are limited . Lacking a strong ecosystem and application scenarios, and lacking distinct characteristics compared to dollar-denominated stablecoins, as well as the advantage of attracting traders and transaction volume, the return on investment for issuing non-dollar fiat currency stablecoins is unlikely to meet expectations, and they will struggle to survive in an environment of increasingly stringent legislation and regulation in various countries. The legislation on stablecoins in the United States still faces many problems and challenges. Following President Trump's second election victory, his strong advocacy for crypto assets such as Bitcoin fueled a new international frenzy in cryptocurrency trading, driving the rapid development of dollar-denominated stablecoin trading and a surge in stablecoin market capitalization. This not only increased demand for the US dollar and US Treasury bonds, strengthening the dollar's international status, but also brought huge profits to the Trump family and their cryptocurrency associates. However, this also posed new challenges to the global monitoring of the dollar's circulation and the stability of the traditional US financial system. Furthermore, the trading and transfer of crypto assets backed by dollar-denominated stablecoins has become a new and more difficult-to-prevent tool for the US to harvest global wealth, posing a serious threat to the monetary sovereignty and wealth security of other countries . 
This is why the United States has accelerated legislation on stablecoins, but its legislation is more about prioritizing America and maximizing American and even group interests, at the expense of the interests of other countries and the common interests of the world. After the legislation on US dollar stablecoins came into effect, institutions that have not obtained approval and operating licenses from US regulators will find it difficult to issue and operate US dollar stablecoins in the United States (for this reason, Tether has announced that it will apply for US-issued USDT). Stablecoin issuers subject to US regulation must meet regulatory requirements such as Know Your Customer (KYC), Anti-Money Laundering (AML), and Counter-Terrorist Financing (FTC). They must be able to screen customers against government watchlists and report suspicious activities to regulators. Their systems must have the ability to freeze or intercept specific stablecoins when ordered by law enforcement agencies. Stablecoin issuers must have reserves of no less than 100% US dollar assets (including currency assets, short-term Treasury bonds, and repurchase agreements backed by Treasury bonds) approved by regulators, and must keep US customer funds in US banks and not transfer them overseas. They are prohibited from paying interest or returns on stablecoins, and strict control must be exercised over-issuance and self-operation. Reserve assets must be held in custody by an independent institution approved by regulators and must be audited by an auditing firm at least monthly and an audit report must be issued. This will greatly enhance the value stability of stablecoins relative to the US dollar, strengthen their payment function and compliance, while weakening their investment attributes and illegal use; it will also significantly increase the regulatory costs of stablecoins, thereby reducing their potential for exorbitant profits in an unregulated environment. 
The US stablecoin legislation officially took effect on July 18, but it still faces numerous challenges : While it stipulates the scope of reserve assets for stablecoin issuance (bank deposits, short-term Treasury bonds, repurchase agreements backed by Treasury bonds, etc.), since it primarily includes Treasury bonds with fluctuating trading prices, even if reserve assets are sufficient at the time of issuance, a subsequent decline in Treasury bond prices could lead to insufficient reserves; if the reserve asset structures of different issuing institutions are not entirely consistent, and there is no central bank guarantee, it means that the issued dollar stablecoins will not be the same, creating arbitrage opportunities and posing challenges to relevant regulation and market stability; even if there is no over-issuance of stablecoins at the time of issuance, allowing decentralized finance (DeFi) to engage in stablecoin lending could still lead to stablecoin derivation and over-issuance, unless it is entirely a matchmaking between lenders and borrowers rather than proprietary trading; getting stablecoin issuers outside of financial institutions to meet regulatory requirements is not easy, and regulation also presents significant challenges. More importantly, the earliest and most fundamental requirement for stablecoins is the borderless, decentralized, 24/7 pricing and settlement of crypto assets on the blockchain. It is precisely because crypto assets like Bitcoin cannot fulfill the fundamental requirement of currency as a measure of value and a value token—that the total amount of currency must change in line with the total value of tradable wealth requiring monetary pricing and settlement—that their price relative to fiat currency fluctuates wildly (therefore, using crypto assets like Bitcoin as collateral or strategic reserves carries significant risks), making it difficult to become a true circulating currency. 
This has led to the development of fiat stablecoins pegged to fiat currencies. (Therefore, Bitcoin and similar crypto assets can only be considered crypto assets; calling them "cryptocurrency" or "virtual currency" is inaccurate; translating the English word "Token" as "币" or "币" is also inappropriate; it should be directly transliterated as "通证" and clearly defined as an asset, not currency.) The emergence and development of fiat-backed stablecoins have brought fiat currencies and more real-world assets (RWAs) onto the blockchain, strongly supporting on-chain cryptocurrency trading and development. They serve as a channel connecting the on-chain cryptocurrency world with the off-chain real-world, thereby strengthening the integration and influence of the cryptocurrency world on the real world. This will significantly enhance the scope, speed, scale, and volatility of global wealth financialization and financial transactions, accelerating the transfer and concentration of global wealth in a few countries or groups. In this context, failing to strengthen global joint regulation of stablecoins and cryptocurrency issuance and trading poses extremely high risks and dangers . Therefore, the surge in stablecoin and cryptocurrency development driven by the Trump administration in the United States has already revealed a huge bubble and potential risks, making it unsustainable. The international community must be highly vigilant about this! Stablecoin legislation could severely backfire on stablecoins. One unexpected outcome of stablecoin legislation is that the inclusion of fiat-backed stablecoins in legislative regulation will inevitably lead to legislative regulation of crypto asset transactions denominated and settled using fiat-backed stablecoins, including blockchain-generated assets such as Bitcoin and on-chain real-world assets (RWA). This will have a profound impact on stablecoins. 
Before crypto assets receive legislative regulation and compliance protection, licensed financial institutions such as banks find it difficult to directly participate in crypto asset trading, clearing, custody, and other related activities, thus ceding opportunities to private organizations outside of financial institutions. Due to the lack of regulation and the absence of regulatory costs, existing stablecoin issuers and crypto asset trading platforms have become highly profitable and attractive entities, exerting an increasing impact on banks and the financial system, forcing governments and monetary authorities in countries like the United States to accelerate legislative regulation of stablecoins. However, once crypto assets receive legislative regulation and compliance protection, banks and other financial institutions will undoubtedly participate fully. Payment institutions such as banks can directly promote the on-chain operation of fiat currency deposits (deposit tokenization), completely replacing stablecoins as a new channel and hub connecting the crypto world and the real world . Similarly, existing stock, bond, money market fund, and ETF exchanges can promote the on-chain trading of these relatively standardized financial products through RWA (Real-Time Asset Exchange). Having adequately regulated financial institutions such as banks act as the main entities connecting the crypto world and the real world on the blockchain is more conducive to implementing current legislative requirements for stablecoins, upholding the principle of "equal regulation for the same business" for all institutions, and reducing the impact and risks of crypto asset development on the existing monetary and financial system. This trend has already emerged in the United States and is rapidly intensifying, proving difficult to stop . 
Therefore, stablecoin legislation may seriously backfire on or subvert stablecoins (see Wang Yongli's WeChat public account article "Stablecoin Legislation May Seriously Backfire on Stablecoins", September 3, 2025). In this situation, it is not a reasonable choice for other countries to follow the US lead and vigorously promote stablecoin legislation and development.
China should not follow the path of stablecoins taken by the United States
China already has a leading global advantage in mobile payments and the digital yuan. Promoting a stablecoin for the yuan has no advantage domestically, and it will have little room for development and influence internationally. It should not follow the path of the US dollar stablecoin, but should instead focus on promoting the development of stablecoins for the yuan, both domestically and offshore. More importantly, crypto assets such as Bitcoin, and stablecoins, can be traded and cleared globally around the clock through borderless blockchains and crypto asset trading platforms. While this significantly improves efficiency, their highly anonymous, high-frequency global flow, without coordinated international oversight, makes it difficult to meet regulatory requirements such as KYC, AML, and CFT. This poses a clear risk, and real-world cases have shown them being used for money laundering, fundraising fraud, and illegal cross-border fund transfers.
US dollar stablecoins already dominate the crypto asset trading market, and the US has greater control or influence over the major global blockchain operating systems, crypto asset trading platforms, and the exchange rate between crypto assets and the US dollar. This is evidenced by its ability to trace, identify, freeze, and confiscate the crypto asset accounts of some institutions and individuals, and to punish some crypto asset trading platforms and even arrest their leaders. Given this, developing an RMB stablecoin along the path of US dollar stablecoins would not only fail to challenge the international status of US dollar stablecoins but could even turn the RMB stablecoin into their vassal. This could affect national tax collection, foreign exchange management, and cross-border capital flows, posing a serious threat to the sovereignty and security of the RMB and to the stability of the monetary and financial system. Faced with a more acute and complex international situation, China should prioritize national security and exercise high vigilance and strict control over the trading of and speculation in crypto assets, including stablecoins, rather than simply pursuing higher efficiency and lower costs. It is necessary to accelerate the improvement of relevant regulatory policies and legal frameworks, focus on key links such as information flow and capital flow, strengthen information sharing among relevant departments, further enhance monitoring and tracking capabilities, and severely crack down on illegal and criminal activities involving crypto assets.
Of course, while resolutely halting stablecoins and cracking down on virtual currency trading and speculation, China must also accelerate the innovative development and widespread application of the digital yuan at home and abroad, establish the digital yuan's leading international position, forge a Chinese path for the development of digital currency, and actively explore the establishment of a fair, reasonable, and secure new international monetary and financial system. Taking these factors into account, it is not difficult to understand why China has chosen to resolutely curb virtual currencies, including stablecoins, while firmly promoting and accelerating the development of the digital yuan.
PANews 2025/12/06 15:08