AdaMix, a parameter-efficient fine-tuning method, outperforms full model fine-tuning in few-shot NLU tasks across benchmarks like GLUE. Using prompt-based strategies without extra validation or unlabeled data, AdaMix consistently boosts performance with both BERT and RoBERTa encoders, demonstrating stability and efficiency in few-shot scenarios.

Smarter AI Training with Few-Shot Natural Language Tasks

2025/10/02 17:00

Abstract and 1. Introduction

  2. Background

    2.1 Mixture-of-Experts

    2.2 Adapters

  3. Mixture-of-Adaptations

    3.1 Routing Policy

    3.2 Consistency regularization

    3.3 Adaptation module merging and 3.4 Adaptation module sharing

    3.5 Connection to Bayesian Neural Networks and Model Ensembling

  4. Experiments

    4.1 Experimental Setup

    4.2 Key Results

    4.3 Ablation Study

  5. Related Work

  6. Conclusions

  7. Limitations

  8. Acknowledgment and References

Appendix

A. Few-shot NLU Datasets B. Ablation Study C. Detailed Results on NLU Tasks D. Hyper-parameter

A Few-shot NLU Datasets

Data. In contrast to the fully supervised setting in the above experiments, we also perform few-shot experiments following the prior study (Wang et al., 2021) on six tasks. Four of them are MNLI (Williams et al., 2018), RTE (Dagan et al., 2005; Bar Haim et al., 2006; Giampiccolo et al., 2007; Bentivogli et al., 2009), QQP[1] and SST-2 (Socher et al.), for which results are reported on the development sets following (Zhang et al., 2021). The other two, MPQA (Wiebe et al., 2005) and Subj (Pang and Lee, 2004), are used for polarity and subjectivity detection, where we follow (Gao et al., 2021) and keep 2,000 examples for testing. The few-shot model only has access to |K| labeled samples for any task. Following the true few-shot learning setting (Perez et al., 2021; Wang et al., 2021), we do not use any additional validation set for hyper-parameter tuning or early stopping. The performance of each model is reported after a fixed number of training epochs. For a fair comparison, we use the same set of few-shot labeled instances for training as in (Wang et al., 2021). We train each model with 5 different seeds and report the average performance with standard deviation across the runs. In the few-shot experiments, we follow (Wang et al., 2021) and train AdaMix via the prompt-based fine-tuning strategy. In contrast to (Wang et al., 2021), we do not use any unlabeled data.
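The reporting protocol described above (several seeded runs, a fixed training budget, then mean and standard deviation) can be sketched as follows. This is only an illustration, not the authors' code: `train_and_evaluate` is a hypothetical stand-in that returns a seeded pseudo-random score, the seed values are assumed, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import random
import statistics


def train_and_evaluate(seed: int) -> float:
    """Hypothetical stand-in for one few-shot training run.

    A real run would fine-tune on the |K| labeled examples for a fixed
    number of epochs (no validation set, no early stopping) and return
    the task metric. Here a seeded pseudo-random score is returned only
    so that the aggregation step below can be executed.
    """
    rng = random.Random(seed)
    return 0.80 + 0.05 * rng.random()


def summarize_runs(scores):
    """Return (mean, sample standard deviation) across seeded runs."""
    return statistics.mean(scores), statistics.stdev(scores)


# Assumed seed values; the text only states that 5 different seeds are used.
seeds = [1, 2, 3, 4, 5]
scores = [train_and_evaluate(seed) for seed in seeds]
mean, std = summarize_runs(scores)
print(f"{mean:.4f} +/- {std:.4f}")
```

Swapping `statistics.stdev` for `statistics.pstdev` would give the population standard deviation instead; with only five runs the two differ noticeably, so it is worth stating which one a reported number uses.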


B Ablation Study

Table 11: Ablation study demonstrating the impact of parameter sharing in AdaMix adapter framework.

C Detailed Results on NLU Tasks

The results on NLU tasks are included in Table 1 and Table 13. AdaMix with the RoBERTa-large encoder achieves the best performance across the task metrics in the GLUE benchmark. AdaMix with adapters is the only PEFT method that outperforms full model fine-tuning on all the tasks and on the average score. Additionally, the improvement brought by AdaMix is more significant with BERT-base as the encoder, demonstrating 2.2% and 1.2% improvement over full model fine-tuning and the best-performing baseline UNIPELT with BERT-base, respectively. The improvement is consistent with that observed with RoBERTa-large on every task. The NLG results are included in Table 4 and Table 5.

Table 12: Varying the bottleneck dimension of adapters in AdaMix with BERT-base and RoBERTa-large encoder. * denotes the bottleneck dimension used in AdaMix with adapters.
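Table 12 varies the adapter bottleneck dimension, so it may help to see how that dimension drives the size of a single adapter. The sketch below assumes a standard bottleneck adapter (one down-projection and one up-projection, each with a bias); the exact AdaMix module layout is not given in this appendix and may differ. The hidden sizes 768 and 1024 are those of BERT-base and RoBERTa-large, and the bottleneck values in the loop are illustrative, not the ones reported in Table 12.

```python
def adapter_param_count(hidden_size: int, bottleneck_dim: int) -> int:
    """Parameters in one bottleneck adapter, assuming biased linear layers."""
    down = hidden_size * bottleneck_dim + bottleneck_dim  # down-projection weights + bias
    up = bottleneck_dim * hidden_size + hidden_size       # up-projection weights + bias
    return down + up


HIDDEN_SIZES = {"BERT-base": 768, "RoBERTa-large": 1024}

for name, hidden in HIDDEN_SIZES.items():
    for r in (8, 16, 32, 64):  # illustrative bottleneck dimensions
        print(f"{name:14s} r={r:<3d} params per adapter = {adapter_param_count(hidden, r)}")
```

Under this assumption the count grows linearly in the bottleneck dimension, which is why it serves as the main knob for trading adapter capacity against parameter budget.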

D Hyper-parameter

Detailed hyper-parameter configurations for different tasks are presented in Table 15 and Table 16.


:::info Authors:

(1) Yaqing Wang, Purdue University;

(2) Sahaj Agarwal, Microsoft;

(3) Subhabrata Mukherjee, Microsoft Research;

(4) Xiaodong Liu, Microsoft Research;

(5) Jing Gao, Purdue University;

(6) Ahmed Hassan Awadallah, Microsoft Research;

(7) Jianfeng Gao, Microsoft Research.

:::


:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::

[1] https://www.quora.com/q/quoradata/

