Sparse Spectral Training (SST) introduces a mathematically grounded framework for optimizing neural networks with low-rank spectral decompositions. By focusing on gradient direction rather than scale, SST reduces computational overhead while maintaining learning stability. The paper proves zero distortion with SVD initialization and shows the advantage of its enhanced gradient over the default gradient, comparing against baselines such as LoRA and HyboNet. Extensive experiments on machine translation, natural language generation, and hyperbolic graph neural networks demonstrate SST's efficiency and accuracy, showing its promise as a scalable alternative to full-rank training.

Here’s Why AI Researchers Are Talking About Sparse Spectral Training

Abstract and 1. Introduction

  2. Related Work

  3. Low Rank Adaptation

    3.1 LoRA and 3.2 Limitation of LoRA

    3.3 ReLoRA*

  4. Sparse Spectral Training

    4.1 Preliminaries and 4.2 Gradient Update of U, VT with Σ

    4.3 Why SVD Initialization is Important

    4.4 SST Balances Exploitation and Exploration

    4.5 Memory-Efficient Implementation for SST and 4.6 Sparsity of SST

  5. Experiments

    5.1 Machine Translation

    5.2 Natural Language Generation

    5.3 Hyperbolic Graph Neural Networks

  6. Conclusion and Discussion

  7. Broader Impacts and References

Supplementary Information

A. Algorithm of Sparse Spectral Training

B. Proof of Gradient of Sparse Spectral Layer

C. Proof of Decomposition of Gradient of Weight

D. Proof of Advantage of Enhanced Gradient over Default Gradient

E. Proof of Zero Distortion with SVD Initialization

F. Experiment Details

G. Singular Value Pruning

H. Evaluating SST and GaLore: Complementary Approaches to Memory Efficiency

I. Ablation Study

A Algorithm of Sparse Spectral Training
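The algorithm appears in the paper as pseudocode; the listing itself did not survive extraction here. A minimal Python sketch of one SST round, under our own simplifying assumptions (uniform sampling of the active directions as a stand-in for the paper's sampling scheme, a hypothetical `grad_W` callback returning dL/dW, and plain SGD updates), might look like:

```python
import numpy as np

def sst_round(U, S, Vt, grad_W, lr=0.02, r=2, T3=5):
    """One round of Sparse Spectral Training (illustrative sketch).

    The weight is parameterized as W = U @ diag(S) @ Vt. Each iteration
    activates r of the d singular directions and updates only that subset
    for T3 steps; a round runs T2 = d // r iterations, so every direction
    is visited once per round. Uniform sampling here is a placeholder for
    the paper's sampling scheme; grad_W(W) is a hypothetical callback
    returning dL/dW.
    """
    d = S.shape[0]
    T2 = d // r                           # iterations per round (T2 = d / r)
    order = np.random.default_rng(0).permutation(d)
    for it in range(T2):
        idx = order[it * r:(it + 1) * r]  # active rank-r subset
        for _ in range(T3):
            W = U @ np.diag(S) @ Vt
            G = grad_W(W)                 # dL/dW
            # chain-rule gradients of the spectral factors
            gU = G @ Vt.T @ np.diag(S)
            gVt = np.diag(S) @ U.T @ G
            gS = np.diag(U.T @ G @ Vt.T)
            # update only the sampled directions
            U[:, idx] -= lr * gU[:, idx]
            Vt[idx, :] -= lr * gVt[idx, :]
            S[idx] -= lr * gS[idx]
    return U, S, Vt
```

With a simple quadratic loss L = ½‖W − W*‖²_F (so dL/dW = W − W*), a round of this sketch steadily reduces the loss while updating only r directions at a time.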

B Proof of Gradient of Sparse Spectral Layer

We can express the differential of $W = U\Sigma V^{\top}$ as the sum of differentials:

$$\mathrm{d}W = \mathrm{d}U\,\Sigma V^{\top} + U\,\mathrm{d}\Sigma\,V^{\top} + U\Sigma\,\mathrm{d}V^{\top}$$

Applying the chain rule, the gradients of the loss $\mathcal{L}$ with respect to each factor are:

$$\frac{\partial \mathcal{L}}{\partial U} = \frac{\partial \mathcal{L}}{\partial W}\,V\Sigma,\qquad \frac{\partial \mathcal{L}}{\partial V^{\top}} = \Sigma U^{\top}\,\frac{\partial \mathcal{L}}{\partial W},\qquad \frac{\partial \mathcal{L}}{\partial \Sigma} = \operatorname{diag}\!\left(U^{\top}\,\frac{\partial \mathcal{L}}{\partial W}\,V\right)$$
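The chain-rule gradients of the spectral layer can be sanity-checked numerically. A small sketch (our own, not from the paper) compares the analytic gradients against finite differences for a linear loss L(W) = ⟨A, W⟩, whose gradient dL/dW = A is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, eps = 5, 4, 1e-6
U = rng.normal(size=(m, n))
S = np.abs(rng.normal(size=n)) + 0.1
Vt = rng.normal(size=(n, n))
A = rng.normal(size=(m, n))        # fixed matrix defining L(W) = <A, W>

def loss(U_, S_, Vt_):
    W = U_ @ np.diag(S_) @ Vt_
    return float(np.sum(A * W))    # hence dL/dW = A

# analytic gradients from the chain rule
gU = A @ Vt.T @ np.diag(S)         # dL/dU  = (dL/dW) V S
gVt = np.diag(S) @ U.T @ A         # dL/dVt = S U^T (dL/dW)
gS = np.diag(U.T @ A @ Vt.T)       # dL/dS  = diag(U^T (dL/dW) V)

# finite-difference checks on single entries
base = loss(U, S, Vt)
Up = U.copy(); Up[2, 1] += eps
Sp = S.copy(); Sp[3] += eps
print(abs((loss(Up, S, Vt) - base) / eps - gU[2, 1]) < 1e-4)  # True
print(abs((loss(U, Sp, Vt) - base) / eps - gS[3]) < 1e-4)     # True
```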

C Proof of Decomposition of Gradient of Weight


D Proof of Advantage of Enhanced Gradient over Default Gradient


As only the direction of the update matters, its scale can be adjusted by changing the learning rate. We measure similarity using the Frobenius norm of the difference between the SST update and three times the full-rank update.
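This similarity measure can be sketched in a few lines; the normalization by the scaled full-rank update is our own addition, so the score is scale-free:

```python
import numpy as np

def update_similarity(delta_sst, delta_full, scale=3.0):
    """Frobenius-norm distance between the SST update and `scale` times the
    full-rank update, normalized by the norm of the scaled full-rank update.
    Smaller values mean the two updates point in more similar directions."""
    diff = np.linalg.norm(delta_sst - scale * delta_full)
    return diff / np.linalg.norm(scale * delta_full)
```

An SST update exactly proportional to the full-rank update (with factor three) scores 0.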


E Proof of Zero Distortion with SVD Initialization

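The zero-distortion property is directly checkable: initializing the spectral factors from the SVD of a pretrained weight reproduces that weight exactly, up to floating point. A minimal demonstration (the weight here is a random stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))                  # stand-in pretrained weight
U, S, Vt = np.linalg.svd(W, full_matrices=False)
# reconstruct W from its spectral factors; any mismatch is "distortion"
distortion = np.linalg.norm(W - U @ np.diag(S) @ Vt)
print(distortion < 1e-10)  # True: SVD initialization incurs zero distortion
```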

F Experiment Details

F.1 Implementation Details for SST


F.2 Hyperparameters of Machine Translation

IWSLT’14. The hyperparameters can be found in Table 6. We employ the same codebase and hyperparameters as HyboNet [12], which is derived from OpenNMT-py [54]. The final model checkpoint is used for evaluation, with beam search (beam size 2). Experiments were conducted on one A100 GPU.

For SST, the number of steps per iteration (T3) is set to 200. Each iteration begins with a warmup phase of 20 steps. The number of iterations per round (T2) is given by T2 = d/r, where d is the embedding dimension and r is the rank used in SST.
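Under these hyperparameters, the per-round bookkeeping is simple to compute; the helper below is our own sketch (function name and return format are not from the paper):

```python
def sst_schedule(d, r, T3=200, warmup=20):
    """Steps per SST round given embedding dimension d and rank r.
    T2 = d // r iterations per round; each iteration runs T3 steps,
    the first `warmup` of which are warmup steps."""
    T2 = d // r
    return {"iterations_per_round": T2,
            "steps_per_round": T2 * T3,
            "warmup_steps_per_iteration": warmup}
```

For example, a hypothetical d = 512 with r = 8 gives 64 iterations and 12,800 steps per round.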

Table 6: Hyperparameters on IWSLT’14 for Euclidean and hyperbolic Transformer.


For SST, the number of steps per iteration (T3) is set to 200 for Multi30K and 400 for IWSLT’17. Each iteration begins with a warmup phase of 20 steps. The number of iterations per round (T2) is given by T2 = d/r, where d is the embedding dimension and r is the rank used in SST.

F.3 Hyperparameters of Natural Language Generation

The hyperparameters for our experiments are detailed in Table 8. We employ a linear warmup of 2000 steps followed by a stable learning rate, without decay. A larger learning rate (0.001) is used only for the low-rank parameters (U, VT, and Σ for SST; B and A for LoRA and ReLoRA*). The total number of training tokens for each experiment is 19.7B, roughly two epochs of OpenWebText. Distributed training is facilitated using the Accelerate [55] library across four A100 GPUs on a Linux server.

For SST, the number of steps per iteration (T3) is set to 200. Each iteration begins with a warmup phase of 20 steps. The number of iterations per round (T2) is given by T2 = d/r, where d is the embedding dimension and r is the rank used in SST.

Table 7: Hyperparameters on Multi30K and IWSLT’17 for vanilla Transformer.

Table 8: Hyperparameters for OPT Models.


F.4 Hyperparameters of Hyperbolic Graph Neural Networks

We use HyboNet [12] as the full-rank model, with the same hyperparameters as in HyboNet. Experiments were conducted on one A100 GPU.

For SST, the number of steps per iteration (T3) is set to 100. Each iteration begins with a warmup phase of 100 steps. The number of iterations per round (T2) is given by T2 = d/r, where d is the embedding dimension and r is the rank used in SST.

We set the dropout rate to 0.5 for the LoRA and SST methods during the node classification task on the Cora dataset. This is the only deviation from the HyboNet configuration.


:::info Authors:

(1) Jialin Zhao, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(2) Yingtao Zhang, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(3) Xinghang Li, Department of Computer Science;

(4) Huaping Liu, Department of Computer Science;

(5) Carlo Vittorio Cannistraci, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI), Department of Computer Science, and Department of Biomedical Engineering, Tsinghua University, Beijing, China.

:::


:::info This paper is available on arXiv under a CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
