AdaMix, a parameter-efficient fine-tuning method, outperforms full model fine-tuning in few-shot NLU tasks across benchmarks like GLUE. Using prompt-based strategies without extra validation or unlabeled data, AdaMix consistently boosts performance with both BERT and RoBERTa encoders, demonstrating stability and efficiency in few-shot scenarios.AdaMix, a parameter-efficient fine-tuning method, outperforms full model fine-tuning in few-shot NLU tasks across benchmarks like GLUE. Using prompt-based strategies without extra validation or unlabeled data, AdaMix consistently boosts performance with both BERT and RoBERTa encoders, demonstrating stability and efficiency in few-shot scenarios.

Smarter AI Training with Few-Shot Natural Language Tasks

2025/10/02 17:00
3 min read
For feedback or concerns regarding this content, please contact us at [email protected]

Abstract and 1. Introduction

  1. Background

    2.1 Mixture-of-Experts

    2.2 Adapters

  2. Mixture-of-Adaptations

    3.1 Routing Policy

    3.2 Consistency regularization

    3.3 Adaptation module merging and 3.4 Adaptation module sharing

    3.5 Connection to Bayesian Neural Networks and Model Ensembling

  3. Experiments

    4.1 Experimental Setup

    4.2 Key Results

    4.3 Ablation Study

  4. Related Work

  5. Conclusions

  6. Limitations

  7. Acknowledgment and References

Appendix

A. Few-shot NLU Datasets B. Ablation Study C. Detailed Results on NLU Tasks D. Hyper-parameter

A Few-shot NLU Datasets

Data. In contrast to the fully supervised setting in the above experiments, we also perform fewshot experiments following the prior study (Wang et al., 2021) on six tasks including MNLI (Williams et al., 2018), RTE (Dagan et al., 2005; Bar Haim et al., 2006; Giampiccolo et al., 2007; Bentivogli et al., 2009), QQP[1] and SST-2 (Socher et al.). The results are reported on their development set following (Zhang et al., 2021). MPQA (Wiebe et al., 2005) and Subj (Pang and Lee, 2004) are used for polarity and subjectivity detection, where we follow (Gao et al., 2021) to keep 2, 000 examples for testing. The few-shot model only has access to |K| labeled samples for any task. Following true few-shot learning setting (Perez et al., 2021; Wang et al., 2021), we do not use any additional validation set for any hyper-parameter tuning or early stopping. The performance of each model is reported after fixed number of training epochs. For a fair comparison, we use the same set of few-shot labeled instances for training as in (Wang et al., 2021). We train each model with 5 different seeds and report average performance with standard deviation across the runs. In the few-shot experiments, we follow (Wang et al., 2021) to train AdaMix via the prompt-based fine-tuning strategy. In contrast to (Wang et al., 2021), we do not use any unlabeled data.

\

B Ablation Study

\ Table 11: Ablation study demonstrating the impact of parameter sharing in AdaMix adapter framework.

\

C Detailed Results on NLU Tasks

The results on NLU tasks are included in Table 1 and Table 13. The performance AdaMix with RoBERTa-large encoder achieves the best performance in terms of different task metrics in the GLUE benchmark. AdaMix with adapters is the

\ \ Table 12: Varying the bottleneck dimension of adapters in AdaMix with BERT-base and RoBERTa-large encoder. * denotes the bottleneck dimension used in AdaMix with adapters.

\ \ only PEFT method which outperforms full model fine-tuning on all the tasks and on average score. Additionally, the improvement brought by AdaMix is more significant with BERT-base as the encoder, demonstrating 2.2% and 1.2% improvement over the performance of full model fine-tuning and the best performing baseline UNIPELT with BERTbase. The improvement is observed to be consistent as that with RoBERTa-large on every task. The NLG results are included in Table 4 and 5.

D Hyper-parameter

Detailed hyper-parameter configuration for different tasks presented in Table 15 and Table 16.

\

:::info Authors:

(1) Yaqing Wang, Purdue University ([email protected]);

(2) Sahaj Agarwal, Microsoft ([email protected]);

(3) Subhabrata Mukherjee, Microsoft Research ([email protected]);

(4) Xiaodong Liu, Microsoft Research ([email protected]);

(5) Jing Gao, Purdue University ([email protected]);

(6) Ahmed Hassan Awadallah, Microsoft Research ([email protected]);

(7) Jianfeng Gao, Microsoft Research ([email protected]).

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

[1] https://www.quora.com/q/quoradata/

Market Opportunity
null Logo
null Price(null)
--
----
USD
null (null) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact [email protected] for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.
Tags:

You May Also Like

Hong Kong Backs Commercial Bank Tokenized Deposits in 2025

Hong Kong Backs Commercial Bank Tokenized Deposits in 2025

The post Hong Kong Backs Commercial Bank Tokenized Deposits in 2025 appeared on BitcoinEthereumNews.com. HKMA to support tokenized deposits and regular issuance of digital bonds. SFC drafting licensing framework for trading, custody, and stablecoin issuers. New rules will cover stablecoin issuers, digital asset trading, and custody services. Hong Kong is stepping up its digital finance ambitions with a policy blueprint that places tokenization at the core of banking innovation.  In the 2025 Policy Address, Chief Executive John Lee outlined measures that will see the Hong Kong Monetary Authority (HKMA) encourage commercial banks to roll out tokenized deposits and expand the city’s live tokenized-asset transactions. Hong Kong’s Project Ensemble to Drive Tokenized Deposits Lee confirmed that the HKMA will “continue to take forward Project Ensemble, including encouraging commercial banks to introduce tokenised deposits, and promoting live transactions of tokenised assets, such as the settlement of tokenised money market funds with tokenised deposits.” The initiative aims to embed tokenized deposits, bank liabilities represented as blockchain-based tokens, into mainstream financial operations. These deposits could facilitate the settlement of money-market funds and other financial instruments more quickly and efficiently. To ensure a controlled rollout, the HKMA will utilize its regulatory sandbox to enable banks to test tokenized products while enhancing risk management. Tokenized Bonds to Become a Regular Feature Beyond deposits, the government intends to make tokenized bond issuance a permanent element of Hong Kong’s financial markets. After successful pilots, including green bonds, the HKMA will help regularize the issuance process to build deep and liquid markets for digital bonds accessible to both local and international investors. Related: Beijing Blocks State-Owned Firms From Stablecoin Businesses in Hong Kong Hong Kong’s Global Financial Role The policy address also set out a comprehensive regulatory framework for digital assets. Hong Kong is implementing a regime for stablecoin issuers and drafting licensing rules for digital asset trading and custody services. The Securities…
Share
BitcoinEthereumNews2025/09/18 07:10
TRX Price Prediction: Testing $0.32-$0.35 Resistance Zone as Technical Momentum Builds

TRX Price Prediction: Testing $0.32-$0.35 Resistance Zone as Technical Momentum Builds

TRON (TRX) consolidates at $0.28 with neutral RSI signals. Technical analysis suggests potential breakout toward $0.32-$0.35 resistance zone amid mixed momentum
Share
BlockChain News2026/03/04 15:57
Pi Network DEX Launch Confirmed for March 12, 2026: A New Chapter for Picoin and Web3 Trading

Pi Network DEX Launch Confirmed for March 12, 2026: A New Chapter for Picoin and Web3 Trading

    Pi Network has officially confirmed the launch date of its decentralized exchange (DEX), scheduled for Marc
Share
Hokanews2026/03/04 15:52