AdaMix, a parameter-efficient fine-tuning method, outperforms full model fine-tuning in few-shot NLU tasks across benchmarks like GLUE. Using prompt-based strategies without extra validation or unlabeled data, AdaMix consistently boosts performance with both BERT and RoBERTa encoders, demonstrating stability and efficiency in few-shot scenarios.

Smarter AI Training with Few-Shot Natural Language Tasks

2025/10/02 17:00

Abstract and 1. Introduction

  2. Background

    2.1 Mixture-of-Experts

    2.2 Adapters

  3. Mixture-of-Adaptations

    3.1 Routing Policy

    3.2 Consistency regularization

    3.3 Adaptation module merging and 3.4 Adaptation module sharing

    3.5 Connection to Bayesian Neural Networks and Model Ensembling

  4. Experiments

    4.1 Experimental Setup

    4.2 Key Results

    4.3 Ablation Study

  5. Related Work

  6. Conclusions

  7. Limitations

  8. Acknowledgment and References

Appendix

  A. Few-shot NLU Datasets

  B. Ablation Study

  C. Detailed Results on NLU Tasks

  D. Hyper-parameter

A Few-shot NLU Datasets

Data. In contrast to the fully supervised setting in the above experiments, we also perform few-shot experiments following the prior study (Wang et al., 2021) on six tasks: MNLI (Williams et al., 2018), RTE (Dagan et al., 2005; Bar Haim et al., 2006; Giampiccolo et al., 2007; Bentivogli et al., 2009), QQP[1], SST-2 (Socher et al.), MPQA (Wiebe et al., 2005), and Subj (Pang and Lee, 2004). For MNLI, RTE, QQP, and SST-2, results are reported on the development sets, following Zhang et al. (2021). MPQA and Subj are used for polarity and subjectivity detection, respectively; following Gao et al. (2021), we keep 2,000 examples of each for testing. The few-shot model has access to only |K| labeled samples for any task. Following the true few-shot learning setting (Perez et al., 2021; Wang et al., 2021), we do not use any additional validation set for hyper-parameter tuning or early stopping; instead, the performance of each model is reported after a fixed number of training epochs. For a fair comparison, we train on the same set of few-shot labeled instances as Wang et al. (2021). We train each model with 5 different seeds and report the average performance and standard deviation across the runs. In the few-shot experiments, we follow Wang et al. (2021) and train AdaMix with the prompt-based fine-tuning strategy; unlike Wang et al. (2021), however, we do not use any unlabeled data.
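The evaluation protocol described above (a fixed set of seeds, a fixed training budget with no validation-based early stopping, and mean and standard deviation reported over runs) can be sketched as follows. This is a minimal illustration, not the authors' code: `train_and_evaluate` is a hypothetical callable standing in for prompt-based fine-tuning on the fixed |K| labeled instances, the seed values are placeholders, and the use of the sample standard deviation is an assumption, since the text does not state which estimator is used.

```python
import statistics
from typing import Callable, Sequence, Tuple


def true_few_shot_report(
    train_and_evaluate: Callable[[int], float],
    seeds: Sequence[int] = (1, 2, 3, 4, 5),
) -> Tuple[float, float]:
    """Return (mean, std) of final test scores over one run per seed.

    `train_and_evaluate` is hypothetical: given a seed, it should train on the
    fixed few-shot labeled set for a fixed number of epochs (no validation set,
    no early stopping) and return the score measured after the last epoch.
    """
    scores = [train_and_evaluate(seed) for seed in seeds]
    mean = statistics.mean(scores)
    # Sample standard deviation across runs (assumed estimator).
    std = statistics.stdev(scores)
    return mean, std


if __name__ == "__main__":
    # Placeholder scores keyed by seed; these are NOT results from the paper.
    placeholder = {1: 80.0, 2: 82.0, 3: 81.0, 4: 79.0, 5: 83.0}
    mean, std = true_few_shot_report(lambda seed: placeholder[seed])
    print(f"{mean:.1f} ± {std:.1f}")  # prints "81.0 ± 1.6"
```

Because no score is ever inspected before the last epoch, there is no opportunity for hidden model selection, which is the point of the true few-shot setting.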


B Ablation Study

Table 11: Ablation study demonstrating the impact of parameter sharing in the AdaMix adapter framework.


C Detailed Results on NLU Tasks

The results on NLU tasks are included in Table 1 and Table 13. AdaMix with the RoBERTa-large encoder achieves the best performance in terms of the different task metrics in the GLUE benchmark. AdaMix with adapters is the only PEFT method that outperforms full model fine-tuning on all the tasks and on the average score. Additionally, the improvement brought by AdaMix is more pronounced with BERT-base as the encoder, with gains of 2.2% and 1.2% over full model fine-tuning and the best-performing baseline, UNIPELT, respectively. As with RoBERTa-large, the improvement is consistent across every task. The NLG results are included in Tables 4 and 5.

Table 12: Varying the bottleneck dimension of adapters in AdaMix with BERT-base and RoBERTa-large encoders. * denotes the bottleneck dimension used in AdaMix with adapters.
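Table 12 varies the adapter bottleneck dimension. As a rough illustration of what that dimension controls, the sketch below implements a single, standard bottleneck adapter (a down-projection from hidden size d to r, a nonlinearity, an up-projection back to d, and a residual connection) in NumPy and counts its trainable parameters, which grow linearly in r as 2dr + r + d. This is not AdaMix itself, which combines multiple adaptation modules; the ReLU activation, the initialization, and the r values printed are assumptions chosen for illustration. The hidden sizes 768 and 1024 are those of BERT-base and RoBERTa-large.

```python
import numpy as np


class BottleneckAdapter:
    """Single bottleneck adapter: h + W_up @ act(W_down @ h). Illustrative sketch only."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Down-projection d -> r with small random weights (assumed init).
        self.w_down = rng.normal(0.0, 0.02, size=(bottleneck_dim, hidden_dim))
        self.b_down = np.zeros(bottleneck_dim)
        # Zero-initialized up-projection r -> d, so the adapter starts as the
        # identity map on h (an assumed, common choice).
        self.w_up = np.zeros((hidden_dim, bottleneck_dim))
        self.b_up = np.zeros(hidden_dim)

    def __call__(self, h: np.ndarray) -> np.ndarray:
        # h has shape (batch, hidden_dim); ReLU is the assumed nonlinearity.
        z = np.maximum(0.0, h @ self.w_down.T + self.b_down)
        return h + z @ self.w_up.T + self.b_up

    def num_params(self) -> int:
        # 2 * d * r weights + r + d biases.
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


if __name__ == "__main__":
    for name, d in [("BERT-base", 768), ("RoBERTa-large", 1024)]:
        for r in (8, 16, 32, 64):  # illustrative bottleneck sizes
            count = BottleneckAdapter(d, r).num_params()
            print(f"{name:14s} r={r:3d} params per adapter = {count:,}")
```

Doubling r roughly doubles the per-adapter parameter budget, which is the trade-off an ablation over bottleneck dimensions explores.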

D Hyper-parameter

The detailed hyper-parameter configurations for the different tasks are presented in Tables 15 and 16.


:::info Authors:

(1) Yaqing Wang, Purdue University;

(2) Sahaj Agarwal, Microsoft;

(3) Subhabrata Mukherjee, Microsoft Research;

(4) Xiaodong Liu, Microsoft Research;

(5) Jing Gao, Purdue University;

(6) Ahmed Hassan Awadallah, Microsoft Research;

(7) Jianfeng Gao, Microsoft Research.

:::


:::info This paper is available on arXiv under CC BY 4.0 DEED license.

:::

[1] https://www.quora.com/q/quoradata/
