This article evaluates RECKONING's generalizability on the real-world multi-hop logical reasoning task, FOLIO.

RECKONING: Reasoning through Dynamic Knowledge Encoding: Generalization to Real-World knowledge


Abstract and 1. Introduction

  2. Background

  3. Method

  4. Experiments

    4.1 Multi-hop Reasoning Performance

    4.2 Reasoning with Distractors

    4.3 Generalization to Real-World Knowledge

    4.4 Run-time Analysis

    4.5 Memorizing Knowledge

  5. Related Work

  6. Conclusion, Acknowledgements, and References

A. Dataset

B. In-context Reasoning with Distractors

C. Implementation Details

D. Adaptive Learning Rate

E. Experiments with Large Language Models

4.3 Generalization to Real-World Knowledge

To investigate how well our method generalizes to real-world knowledge beyond the synthetic setting, we evaluate RECKONING on FOLIO [29], a multi-hop logical reasoning task built from real-world knowledge, and report the results in Table 2. The dataset has a rich vocabulary, diverse logic patterns, and abundant linguistic variation, and it has been shown to challenge LLMs in both the supervised fine-tuning and in-context learning settings. As a baseline, we fine-tune GPT-2 for in-context reasoning; as before, we train both the GPT-2 baseline and RECKONING with the multi-task objective. We also compare against two more advanced baselines: GPT-3.5 (text-davinci-003 [55]) and ChatGPT (gpt-3.5-turbo[2]), two popular large language models with around 175B parameters. We evaluate both large models in the zero-shot and few-shot settings; in the few-shot setting, we prompt the model with 8 single-task examples randomly sampled from the training set to perform in-context learning, as sketched below. We find that RECKONING (initialized here from GPT-2) outperforms the GPT-2 in-context reasoning baseline, and that it surpasses the two large language models by a significant margin (12% over the 0-shot and 7% over the 8-shot setting). We conclude that RECKONING is effective and significantly benefits reasoning tasks that use real-world knowledge.
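As a concrete illustration of the few-shot protocol, the sketch below assembles an 8-shot prompt from randomly sampled training examples. This is a minimal sketch under assumptions: the FOLIO field names ("premises", "conclusion", "label") and the prompt template are illustrative, not the authors' exact format.

```python
import random

def format_example(ex, with_label=True):
    # One FOLIO-style demonstration: premises, conclusion, and a
    # True/False/Unknown label (field names are assumptions).
    text = (
        "Premises: " + " ".join(ex["premises"]) + "\n"
        "Conclusion: " + ex["conclusion"] + "\n"
        "Is the conclusion true, false, or unknown?\n"
    )
    if with_label:
        text += "Answer: " + ex["label"] + "\n\n"
    return text

def build_8_shot_prompt(train_set, query, seed=0):
    # Randomly sample 8 single-task demonstrations, then append the
    # unlabeled query for the model to complete.
    demos = random.Random(seed).sample(train_set, 8)
    return "".join(format_example(d) for d in demos) + format_example(query, with_label=False)
```

The resulting string is sent to the model as a single completion prompt, and the predicted label is read off the continuation.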

Table 2: Evaluation results on FOLIO. We compare RECKONING against the FT-ICR baseline with GPT-2 and two popular large language models.
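For context on what is being trained in this comparison, the following is a minimal sketch of a RECKONING-style bi-level, multi-task training step: an inner loop encodes the premises into the model weights by gradient descent on a language-modeling loss, and the outer loss asks the updated model both to answer the question and to reproduce the knowledge. The functional `lm_loss(params, text)` interface, step count, and learning rate are illustrative assumptions, not the authors' implementation.

```python
import torch

def reckoning_step(lm_loss, params, knowledge, question_answer,
                   inner_steps=4, inner_lr=3e-5):
    # Inner loop: memorize the premises by taking gradient steps on a
    # language-modeling loss over the knowledge text.
    fast = dict(params)
    for _ in range(inner_steps):
        grads = torch.autograd.grad(
            lm_loss(fast, knowledge), list(fast.values()), create_graph=True
        )
        fast = {name: w - inner_lr * g
                for (name, w), g in zip(fast.items(), grads)}
    # Outer (multi-task) objective: answer the question from the updated
    # weights and also reproduce the knowledge; backpropagating through
    # the inner updates trains the original parameters.
    return lm_loss(fast, question_answer) + lm_loss(fast, knowledge)
```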


:::info Authors:

(1) Zeming Chen, EPFL ([email protected]);

(2) Gail Weiss, EPFL ([email protected]);

(3) Eric Mitchell, Stanford University ([email protected]);

(4) Asli Celikyilmaz, Meta AI Research ([email protected]);

(5) Antoine Bosselut, EPFL ([email protected]).

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

[2] https://openai.com/blog/chatgpt

