This study examines the reasoning limitations of Transformer architectures through the lens of syllogism composition and global reasoning tasks. By formalizing the cycle task, a synthetic benchmark that requires long-chain logical inference, the authors show that learning becomes exponentially harder for Transformers as task complexity grows. To explain this, they introduce distribution locality, a measure of how many tokens, beyond basic statistics, are needed to correlate meaningfully with the target output.

Why Transformers Struggle with Global Reasoning

By: Hackernoon
2025/11/03 19:10

Abstract and 1. Introduction

1.1 Syllogisms composition

1.2 Hardness of long compositions

1.3 Hardness of global reasoning

1.4 Our contributions

  2. Results on the local reasoning barrier

    2.1 Defining locality and auto-regressive locality

    2.2 Transformers require low locality: formal results

    2.3 Agnostic scratchpads cannot break the locality

  3. Scratchpads to break the locality

    3.1 Educated scratchpad

    3.2 Inductive Scratchpads

  4. Conclusion, Acknowledgments, and References

A. Further related literature

B. Additional experiments

C. Experiment and implementation details

D. Proof of Theorem 1

E. Comment on Lemma 1

F. Discussion on circuit complexity connections

G. More experiments with ChatGPT


1.3 Hardness of global reasoning

As discussed previously, the cycle task appears to be challenging for Transformers because it requires some global reasoning. Other tasks, such as subset parities, exhibit the same challenge. However, subset parities can be proved not to be efficiently learnable by various regular neural networks and noisy gradient descent, since one can explicitly obtain a class of functions (through orbit arguments [12, 13]) that has large statistical dimension [14] or low cross-predictability [12, 15] (see Appendix A.2). For the cycle task, we have a single distribution, and it is unclear how to use the invariances of Transformers to obtain arguments as in [12, 13], since the input distribution is not invariant under the symmetries of the model. We would thus like to develop a more general complexity measure that unifies why such tasks are hard for Transformer-like models and that formalizes the notion of ‘local reasoning barrier’ when models are trained from scratch.

Figure 1: Illustration of the cycle task for n = 4 (left) and the complexity to learn it (right).

We would also like to understand how the scratchpad methodologies that have proved helpful in various settings (see Section 3) can help here. This raises the questions:

(1) How can we formalize the ‘local reasoning barrier’ in general terms?

(2) Can we break the ‘local reasoning barrier’ with scratchpad methodologies?
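To make the intuition behind the ‘local reasoning barrier’ concrete, the following minimal sketch uses the subset parity task mentioned above. It is a toy illustration and not the paper's formal definition of locality, which this section does not reproduce. The input length `n`, the hidden subset `support`, and the helper names `parity` and `best_advantage` are all assumptions chosen for the example. The sketch enumerates every input and measures how much better than chance the best possible predictor can do when it sees only `k` coordinates.

```python
from itertools import combinations, product


def parity(bits, support):
    """XOR of the bits at the positions listed in `support`."""
    return sum(bits[i] for i in support) % 2


def best_advantage(n, support, observed):
    """Accuracy above chance (0.5) of the best predictor that sees only the
    coordinates in `observed`, for inputs uniform on {0, 1}^n."""
    counts = {}  # observed pattern -> [count of label-0 inputs, count of label-1 inputs]
    for bits in product((0, 1), repeat=n):
        key = tuple(bits[i] for i in observed)
        counts.setdefault(key, [0, 0])[parity(bits, support)] += 1
    # The best predictor outputs the majority label for each observed pattern.
    correct = sum(max(pair) for pair in counts.values())
    return correct / 2**n - 0.5


n = 6                   # hypothetical input length
support = (0, 2, 3, 5)  # hypothetical hidden subset defining the parity

# For each k, the largest advantage achievable from any k observed coordinates.
advantage_by_size = {
    k: max(best_advantage(n, support, obs) for obs in combinations(range(n), k))
    for k in range(n + 1)
}
print(advantage_by_size)
```

In this toy setting the advantage stays at zero until the observed coordinates cover the whole hidden subset, which is the sense in which no small, local view of the input correlates with the target.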

1.4 Our contributions

We provide the following contributions:

– A general conjecture (Conjecture 1), backed by experimental results, that claims efficient weak learning is achievable by a regular Transformer if and only if the distribution locality is constant.

– A theorem (Theorem 1) that proves the negative side of the above conjecture, the locality barrier, for a variant of the cycle task under certain technical assumptions. (The cycle task is also put forward in the paper as a simple benchmark to test the global reasoning capabilities of models.)

• We then switch to the use of ‘scratchpads’ to help with the locality barrier:

– Agnostic scratchpad: we extend Theorem 1 to cases where a polynomial-size scratchpad is used by the Transformer, without any supervision of the scratchpad. That is, the scratchpad gives the Transformer additional memory space to compute intermediate steps. This shows that efficient weak learning is still not possible with such an agnostic scratchpad if the locality is non-constant. An educated guess about what to learn in the scratchpad, based on some knowledge of the target, is thus required.

– Educated scratchpad: we generalize the measure of locality to the ‘autoregressive locality’ to quantify when an educated scratchpad is able to break the locality of a task by using subtasks of lower locality. We give experimental results showing that educated scratchpads with constant autoregressive locality allow Transformers to efficiently learn tasks that may originally have high locality. This gives a way to measure how useful a scratchpad can be in breaking a target into easier sub-targets.

– Inductive scratchpad: we introduce the notion of inductive scratchpad, a type of educated scratchpad that exploits ‘induction’, in contrast to a fully educated scratchpad. We show that when the target admits an inductive decomposition, such as for the cycle, arithmetic, or parity tasks, the inductive scratchpad both breaks the locality and improves the OOD generalization compared with fully educated scratchpads. This gives significant length generalization on additions (from 10 to 20 or from 4 to 26 digits, depending on the method) and parities (from 30 to 50-55 bits). For comparison, using different methods, [17] can length generalize from 10 to 13 digits for additions, and [11] can get roughly 10 extra bits for parities with moderate accuracy.
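The idea of an inductive decomposition can be sketched for the parity task as follows. This is a hedged illustration only: the scratchpad format, the `#` and `=` separators, and the function names `parity_scratchpad` and `format_example` are assumptions made for this example, and the format used in the paper may differ. The point is that each scratchpad state depends only on the previous state and one new input bit, so every step is a low-locality subtask even though the final parity depends on all bits.

```python
def parity_scratchpad(bits):
    """Return the running parities: states[i] = bits[0] XOR ... XOR bits[i]."""
    states = []
    state = 0
    for b in bits:
        state ^= b  # each step uses only the previous state and one input bit
        states.append(state)
    return states


def format_example(bits):
    """Render a hypothetical training string: question, scratchpad, answer."""
    states = parity_scratchpad(bits)
    question = "".join(str(b) for b in bits)
    scratchpad = " ".join(f"{b}:{s}" for b, s in zip(bits, states))
    answer = states[-1] if states else 0  # parity of the empty input is 0
    return f"{question} # {scratchpad} = {answer}"


print(format_example([1, 0, 1, 1]))
# prints "1011 # 1:1 0:1 1:0 1:1 = 1"
```

Because the same update rule is applied at every step, a model that learns the single step can in principle apply it to longer inputs, which is the intuition behind the length generalization results quoted above.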


:::info Authors:

(1) Emmanuel Abbe, Apple and EPFL;

(2) Samy Bengio, Apple;

(3) Aryo Lotfi, EPFL;

(4) Colin Sandon, EPFL;

(5) Omid Saremi, Apple.

:::

:::info This paper is available on arXiv under the CC BY 4.0 license.

:::

[1] Answering ‘yes/1’ if the syllogism can be obtained by composing input ones or ‘cannot tell/0’ otherwise.

[2] At the time of the experiments, ChatGPT was in particular not successful at these two tasks.
