This article introduces a novel Knowledge Consolidation strategy for IIL that utilizes Exponential Moving Average to transfer learned knowledge from the student to the teacher model.

Model Promotion: Using EMA to Balance Learning and Forgetting in IIL

2025/11/06 00:30

Abstract and 1. Introduction

  2. Related works

  3. Problem setting

  4. Methodology

    4.1. Decision boundary-aware distillation

    4.2. Knowledge consolidation

  5. Experimental results and 5.1. Experiment Setup

    5.2. Comparison with SOTA methods

    5.3. Ablation study

  6. Conclusion and future work and References

Supplementary Material

  1. Details of the theoretical analysis on KCEMA mechanism in IIL
  2. Algorithm overview
  3. Dataset details
  4. Implementation details
  5. Visualization of dusted input images
  6. More experimental results

4.2. Knowledge consolidation

Unlike existing IIL methods, which focus only on the student model, we propose consolidating knowledge from the student to the teacher to better balance learning and forgetting. The consolidation is implemented not through learning but through a model exponential moving average (EMA). Model EMA was initially introduced by Tarvainen et al. [28] to enhance the generalizability of models. In vanilla model EMA, the model is trained from scratch, and EMA is applied after every iteration. The underlying mechanism of model EMA has not been thoroughly explained before. In this work, we leverage model EMA for knowledge consolidation (KC) in the context of the IIL task and explain the mechanism theoretically. Based on our theoretical analysis, we propose a new KC-EMA for knowledge consolidation. Mathematically, the model EMA can be formulated as
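In its commonly used form, model EMA overwrites each teacher parameter with `alpha * teacher + (1 - alpha) * student`, where `alpha < 1` is the decay factor. The Python sketch below illustrates only that generic update. The `ema_update` helper, the symbol names, and the dictionary-of-lists parameter layout are assumptions made for illustration, not the paper's notation or code, and the sketch does not reproduce the KC-EMA schedule.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, alpha: float) -> Params:
    """Return new teacher parameters: alpha * teacher + (1 - alpha) * student.

    The inputs are left unmodified; a fresh dictionary is returned.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        name: [
            alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher[name], student[name])
        ]
        for name in teacher
    }


# Illustrative values only: a large alpha keeps the teacher close to its old weights.
teacher = {"w": [1.0, 2.0], "b": [0.0]}
student = {"w": [3.0, 0.0], "b": [1.0]}
updated = ema_update(teacher, student, alpha=0.9)
print(updated)
```

With `alpha` close to 1 the teacher moves only slightly toward the student on each update, which is what lets it retain old knowledge while slowly absorbing what the student has learned.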

Hence, the teacher model can achieve a minimal training loss on both the old task and the new task, which indicates improved generalization on both the old data and the new observations. This has been verified by our experiments in Sec. 5. However, since α < 1, it is noteworthy that the gradient of the teacher model, whether on the old task or the new task, is larger than the initial gradient on the old task or the final gradient of the student model on the new task. That is, the obtained teacher model sacrifices some unilateral performance on either the old data or the new data in order to achieve better generalization on both. From this perspective, the mechanism of vanilla EMA can also be partially explained. In vanilla EMA, where the model starts from scratch and only the new task is considered, we only need to focus on the second term in Equation 13. Since the teacher model has a larger gradient on the training data than the student model, it is less likely to overfit to the training data. As a result, the teacher model generalizes better, as Tarvainen et al. [28] observed.
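The trade-off described above can be illustrated with a toy one-dimensional example. This is only a sketch under strong assumptions: both task losses are taken to be simple quadratics, the minimizers `theta_old` and `theta_new` and the value of `alpha` are made up, and nothing here reproduces the paper's analysis or Equation 13. It shows that a weighted average of the two minimizers has a non-zero gradient on each task, yet a lower combined loss than either endpoint.

```python
def loss(theta: float, target: float) -> float:
    """Toy quadratic task loss with its minimum at `target`."""
    return (theta - target) ** 2


def grad(theta: float, target: float) -> float:
    """Derivative of the toy quadratic loss with respect to theta."""
    return 2.0 * (theta - target)


# Assumed values for illustration only.
theta_old = 0.0  # old-task minimizer (stands in for the initial teacher)
theta_new = 1.0  # new-task minimizer (stands in for the trained student)
alpha = 0.75     # EMA decay, alpha < 1

# Weighted average of the two solutions, as in a single EMA step.
theta_teacher = alpha * theta_old + (1.0 - alpha) * theta_new

# Gradient magnitudes at the averaged point: non-zero on both tasks,
# whereas each endpoint has zero gradient on its own task.
grad_old = abs(grad(theta_teacher, theta_old))
grad_new = abs(grad(theta_teacher, theta_new))


def combined(theta: float) -> float:
    """Sum of the old-task and new-task toy losses."""
    return loss(theta, theta_old) + loss(theta, theta_new)


print("gradient magnitude on old task:", grad_old)
print("gradient magnitude on new task:", grad_new)
print("combined loss at teacher:", combined(theta_teacher))
print("combined loss at old / new minimizer:", combined(theta_old), combined(theta_new))
```

In this toy setting the averaged point gives up the exact optimum on each individual task in exchange for a lower total loss across both, which mirrors the "sacrifices some unilateral performance" reading of the paragraph above.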


:::info Authors:

(1) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(2) Weifu Fu, Tencent Youtu Lab;

(3) Yuhuan Lin, Tencent Youtu Lab;

(4) Jialin Li, Tencent Youtu Lab;

(5) Yifeng Zhou, Tencent Youtu Lab;

(6) Yong Liu, Tencent Youtu Lab;

(7) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(8) Chengjie Wang, Tencent Youtu Lab.

:::


:::info This paper is available on arXiv under the CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

:::
