The proposed method mainly consists of three components: DBD with the fused label (FL), DBD with dusting the input space (DIS), and knowledge consolidation (KC). The ablation study on these three components is shown in Fig. 5. It can be seen that all components contribute to the continual accumulation of knowledge from new data.

Ablation: The Role of Fused Labels and Teacher EMA in Instance-Incremental Learning

Abstract and 1 Introduction

  2. Related works

  3. Problem setting

  4. Methodology

    4.1. Decision boundary-aware distillation

    4.2. Knowledge consolidation

  5. Experimental results and 5.1. Experiment Setup

    5.2. Comparison with SOTA methods

    5.3. Ablation study

  6. Conclusion and future work and References


Supplementary Material

  1. Details of the theoretical analysis on KCEMA mechanism in IIL
  2. Algorithm overview
  3. Dataset details
  4. Implementation details
  5. Visualization of dusted input images
  6. More experimental results

5.3. Ablation study

All ablation studies are conducted on the Cifar-100 dataset.

Figure 5. Effect of the three components of the proposed method: fused label (FL), dusted input space (DIS), and knowledge consolidation (KC). All components contribute to the continual accumulation of knowledge from new data.

Figure 6. t-SNE visualization of features after the first IIL phase, where the numbers denote classes.

Effect of each component. The proposed method mainly consists of three components: DBD with the fused label (FL), DBD with dusting the input space (DIS), and knowledge consolidation (KC). The ablation study on these three components is shown in Fig. 5. DBD with all components yields the largest performance gain in all phases. Although DBD with only DIS fails to enhance the model (understandably, since distilling on dusted inputs only transfers the teacher's existing knowledge), it still shows great potential in resisting catastrophic forgetting (CF) compared to fine-tuning with early stopping. The poor performance of fine-tuning also verifies our analysis that learning with one-hot labels causes the decision boundary to shift rather than broaden to cover the new data. Unlike previous distillation based on one-hot labels, the fused label balances the need to retain old knowledge against the need to learn from new observations. By combining boundary-aware distillation with knowledge consolidation, the model can better handle the problem of learning and retaining knowledge with only new data. Consolidating knowledge into the teacher model during learning not only frees the student model to learn new knowledge but is also an effective way to avoid overfitting to the new data.

Impact of fused labels. In Sec. 4.1, the different learning demands of new outer samples and new inner samples in DBD are analyzed. We propose using the fused label to unify knowledge retention and learning on new data. Features learned in the first incremental phase with the fused label and with the one-hot label are contrasted in Fig. 6. Learning with the fused label not only retains most of the feature distribution but also achieves better separability (more dispersed features) compared to the base model M0, while learning with the one-hot label distorts the feature distribution into an elongated shape.

Table 2. Results of incremental sub-population learning on the Entity-30 benchmark. Unseen, All, and F denote the average test accuracy on unseen subclasses in the incremental data, the average test accuracy on all seen (base data) and unseen subclasses, and the average forgetting rate over all test data, respectively. More details of the metrics can be found in ISL [13].

Table 3. Performance of the student network in IIL tasks when assigning different labels to new samples as the learning target.


Tab. 3 shows the performance of the student model when different learning targets are applied to the new inner samples and new outer samples, which reveals the influence of different labels in boundary distillation. As can be seen, using one-hot labels for learning new samples degrades the model with a large forgetting rate. Training the inner samples with the teacher score and the outer samples with one-hot labels is similar to existing rehearsal-based methods, which distill knowledge using the teacher's prediction on exemplars and learn from new data with annotated labels. This manner reduces the forgetting rate but contributes little to learning the new data. When fused labels are applied to the outer samples, the student model becomes well aware of the existing decision boundary (DB) and performs much better. Notably, using the fused label for all new samples achieves the best performance: FL benefits both retaining old knowledge and enlarging the DB.
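To make the fused-label idea concrete, the sketch below blends the teacher's softened prediction with the one-hot annotation through a mixing weight. This is a minimal PyTorch illustration under assumed details: the weight `alpha`, the cross-entropy-style loss, and the function names are hypothetical and not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def fused_label(teacher_logits: torch.Tensor,
                targets: torch.Tensor,
                alpha: float = 0.5) -> torch.Tensor:
    """Blend the teacher's soft prediction with the one-hot label.

    `alpha` is a hypothetical mixing weight; the paper's exact
    fusion rule may differ.
    """
    num_classes = teacher_logits.size(1)
    one_hot = F.one_hot(targets, num_classes).float()
    teacher_soft = F.softmax(teacher_logits, dim=1)
    return alpha * teacher_soft + (1.0 - alpha) * one_hot

def fl_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            targets: torch.Tensor,
            alpha: float = 0.5) -> torch.Tensor:
    """Student learns from the fused target instead of the one-hot label,
    so it retains the teacher's boundary while fitting new samples."""
    target = fused_label(teacher_logits, targets, alpha)
    log_probs = F.log_softmax(student_logits, dim=1)
    return -(target * log_probs).sum(dim=1).mean()
```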

Impact of DIS. To locate and distill the learned decision boundary, we dust the input space with strong Gaussian noise as a perturbation. This Gaussian noise differs from that used in data augmentation because of its high deviation. In data augmentation, an image should not change its label even after strong augmentation; there is no such constraint in DIS. We hope to relocate inputs to the peripheral area between classes rather than keep them in the category they belong to.

Table 4. Analysis of sensitivity to the noise intensity and the DIS loss on Cifar-100. The student's performance is reported, as it is directly affected by DIS.

In our experiments, the Gaussian noise in pixel intensity obeys N(0, δ²) with δ = 10. Sensitivity to the noise intensity and the related DIS loss is shown in Tab. 4. When the noise intensity is as small as that used in data augmentation, the improvement is marginal, considering that the base model already achieves 64.34%. The best result is attained when the deviation of the noise is δ = 10. When the noise intensity is too large, it may turn all input images into outer samples and thus fail to help locate the decision boundary. Moreover, it significantly alters the parameters of the batch normalization layers in the network, which deteriorates the model. Visualizations of dusted input images can be found in our supplementary material. In contrast to the noise intensity, the model is less sensitive to the DIS loss factor λ.
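A minimal sketch of the dusting operation, assuming images are given in raw pixel intensity and the perturbation is i.i.d. Gaussian noise with δ = 10 as reported above; the clamping to [0, 255] and the function name are illustrative assumptions rather than details from the paper.

```python
import torch

def dust_inputs(images: torch.Tensor, delta: float = 10.0) -> torch.Tensor:
    """Dust the input space: add strong Gaussian noise N(0, delta^2)
    in pixel intensity. Unlike augmentation noise, a dusted image need
    not keep its original label; the goal is to push inputs toward the
    peripheral area between classes, where the decision boundary lies."""
    noise = torch.randn_like(images) * delta
    # Clamping to the valid pixel range is an assumption, not from the paper.
    return (images + noise).clamp(0.0, 255.0)
```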

Impact of knowledge consolidation. It has been proved theoretically that consolidating knowledge into the teacher model yields a model with better generalization on both the old task and the new task. Fig. 7 (left) shows the model performance with and without KC. Whether applied to the vanilla distillation method or to the proposed DBD, knowledge consolidation demonstrates great potential in accumulating knowledge from the student model and promoting the model's performance. However, not every model EMA strategy works. As shown in Fig. 7 (right), the traditional EMA performed after every iteration fails to accumulate knowledge: the teacher always performs worse than the student. Too frequent EMA quickly collapses the teacher model to the student model, which causes forgetting and limits subsequent learning.

Figure 7. Knowledge consolidation (KC). Left: solid lines show results with KC and dashed lines show results without KC; KC significantly promotes performance. Right: influence of different model EMA strategies on the teacher (t, solid lines) and student (s, dashed lines) models.

Lowering the EMA frequency to every epoch (EMA epoch1) or every 5 epochs (EMA epoch5) performs better, which agrees with our theoretical analysis that the total number of updating steps n should be kept properly small. Our KC-EMA, which empirically performs EMA every 5 epochs with an adaptive momentum, attains the best result.
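The low-frequency EMA update can be sketched as follows. The fixed momentum of 0.9, the toy models, and the training-loop placeholder are assumptions for illustration; the paper's adaptive momentum schedule is not reproduced here.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def kc_ema_update(teacher: nn.Module, student: nn.Module, momentum: float):
    """Consolidate student knowledge into the teacher via EMA:
    theta_teacher <- m * theta_teacher + (1 - m) * theta_student."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

# Usage sketch: update every 5 epochs rather than every iteration,
# so the teacher does not collapse to the student.
student, teacher = nn.Linear(8, 4), nn.Linear(8, 4)
teacher.load_state_dict(student.state_dict())
EMA_PERIOD = 5
for epoch in range(20):
    # ... one epoch of student training on the new data goes here ...
    if (epoch + 1) % EMA_PERIOD == 0:
        kc_ema_update(teacher, student, momentum=0.9)  # fixed m is a placeholder
```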


:::info Authors:

(1) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(2) Weifu Fu, Tencent Youtu Lab;

(3) Yuhuan Lin, Tencent Youtu Lab;

(4) Jialin Li, Tencent Youtu Lab;

(5) Yifeng Zhou, Tencent Youtu Lab;

(6) Yong Liu, Tencent Youtu Lab;

(7) Chengjie Wang, Tencent Youtu Lab.

:::


:::info This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

:::
