This article explores how GPT-4 was tested on BYTESIZED32 games to generate and explain object, action, and score rules. Researchers used deterministic prompts in JSON mode and compared AI-generated rules with human-written annotations to ensure accuracy. The study reveals both the potential and limits of LLMs in understanding game dynamics, with humans still needed to validate rules and gameplay integrity.This article explores how GPT-4 was tested on BYTESIZED32 games to generate and explain object, action, and score rules. Researchers used deterministic prompts in JSON mode and compared AI-generated rules with human-written annotations to ensure accuracy. The study reveals both the potential and limits of LLMs in understanding game dynamics, with humans still needed to validate rules and gameplay integrity.

Testing GPT-4 on Game State Predictions

Abstract and 1. Introduction and Related Work

  1. Methodology

    2.1 LLM-Sim Task

    2.2 Data

    2.3 Evaluation

  2. Experiments

  3. Results

  4. Conclusion

  5. Limitations and Ethical Concerns, Acknowledgements, and References

A. Model details

B. Game transition examples

C. Game rules generation

D. Prompts

E. GPT-3.5 results

F. Histograms

A Model details

For the GPT-3.5 model, we use the gpt-3.5-turbo-0125 model. For the GPT-4 model, we use the gpt-4-0125-preview model. For both models, the temperature is set to 0 to get deterministic results. We also turn on the JSON mode of both models, which ensures that the model gives a valid JSON response. Our experiments cost approximately $5,000 for OpenAI API usage.

B Game transition examples

We manually pick the wash-clothes game in BYTESIZED32 as the example game as it contains both state transitions driven by actions and game’s underlying dynamics. In tasks where the model predicts action transition, environment-driven transitions, or the game progress alone, we provide one corresponding in-context example. In the task that requires the model to predict everything, we offer two in-context examples in the prompt. The two examples are manually picked such that in one example the game state is changed directly by the action taken while in the other example the game state is changed by the game’s underlying dynamics.

C Game rules generation

C.1 LLM generated rules

For LLM generated rules, we manually check all of them to avoid misinformation and offensive content.

\ We prompt GPT-4 (gpt-4-0125-preview) with the code of each object class to acquire the rules of each object. We also provide one in-context example. We ask GPT-4 to describe the meaning of each critical property (i.e. properties that do not inherit from parent) of the object and the tick function of the object (i.e. a function that defines how object properties may change at each time step regardless of the action taken). Below is an example of our prompt of object rule generation:

\ \

\ \ For action rules generation, we prompt GPT-4 (gpt-4-0125-preview) with the code of the whole game, but unlike object rules, we do not offer any in-context example. We ask GPT-4 to describe each of the actions in the game. Below is an example of our prompt for action rule generation:

\ \

\ \ Similar to action rules, we generate score rules by prompting GPT-4 (gpt-4-0125-preview) with the code of the game and ask GPT-4 to describe how the game can be won or lose and how rewards can be earned. Below is an example of our prompt for score rule generation:

\ \

\

C.2 Human-Written Action Rules

The action rules describe how each action can change the game states. The expert annotator reads the game description and source code for each game. They went through the list of available actions in the game and their corresponding functions in the game. Each action rule has three main parts: Action, Description, and Rules. The Action specifies the name of the action (e.g., action). The Description explains the general purpose of the action (e.g., connect two objects with input terminals). The Rules is an unordered list of rule descriptions that describe the constraints of the action when interacting with different objects (e.g., At least one of the objects should be a wire or a multimeter) or how the rule might function under different conditions (e.g., Disconnect terminal if the terminal is already connected to other objects). To ensure accuracy, the annotator plays through the game and checks if the written object rules were correctly reflected in the gameplay.

C.3 Human-Written Object Rules

The object rules describe the meaning of each object property (e.g., temperature, size, weight, etc.) and how they will be changed at each time step. The expert annotators read the game description and source code for each game. They went through the object classes in the code script and wrote the object rules. Each object rule has three main parts: Object, Description, and Properties. The Object specifies the name of the object. The Description explains the general purpose of the object (e.g., GarbageCan is a container that can hold garbage). In the Description, the inheritance of the object class has been noted. The Properties is an unordered list of property descriptions that describe each property of that object (e.g., A Mold has its shape.) and their default value (e.g., By default, a GameObject is not combustible.) if the object is an abstract class. For objects with tick function, there is another property describing how an object may change under each tick. To ensure accuracy, the annotator plays through the game and checks if the written object rules were correctly reflected in the gameplay.

C.4 Human-Written Score Rules

Score rules describe the conditions to win or lose the game and how rewards can be earned. An expert annotator (one of the BYTESIZED32 game authors) creates the rules by reading the game description and the code of the score function.

\

:::info Authors:

(1) Ruoyao Wang, University of Arizona ([email protected]);

(2) Graham Todd, New York University ([email protected]);

(3) Ziang Xiao, Johns Hopkins University ([email protected]);

(4) Xingdi Yuan, Microsoft Research Montréal ([email protected]);

(5) Marc-Alexandre Côté, Microsoft Research Montréal ([email protected]);

(6) Peter Clark, Allen Institute for AI ([email protected]).;

(7) Peter Jansen, University of Arizona and Allen Institute for AI ([email protected]).

:::


:::info This paper is available on arxiv under CC BY 4.0 license.

:::

\

Market Opportunity
Mode Network Logo
Mode Network Price(MODE)
$0.0003945
$0.0003945$0.0003945
+10.75%
USD
Mode Network (MODE) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact [email protected] for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

VIRTUAL Weekly Analysis Jan 21

VIRTUAL Weekly Analysis Jan 21

The post VIRTUAL Weekly Analysis Jan 21 appeared on BitcoinEthereumNews.com. VIRTUAL closed the week up 3.57% at $0.84, but the long-term downtrend maintains its
Share
BitcoinEthereumNews2026/01/22 06:54
Dogecoin, Shiba Inu & XYZVerse: Three Meme Coin Paths — Stability, Gradual Growth & Explosive Upside?

Dogecoin, Shiba Inu & XYZVerse: Three Meme Coin Paths — Stability, Gradual Growth & Explosive Upside?

Three meme tokens are taking unique routes in the market. One is holding firm, another is making slow gains, and a third is causing excitement with its big jumps. What sets these coins apart and makes each path interesting? The coming analysis looks at how these strategies could shape their future and what it might mean for traders. From Meme to Mainstream: Is Dogecoin Ready for Another Lift-Off? Dogecoin burst onto the scene in 2013 with a grinning Shiba Inu and a shrug. Its creators, Billy Marcus and Jackson Palmer, wanted a light-hearted twist on serious crypto. They set no hard limit on coins; in fact 10,000 fresh DOGE roll out every minute. What began as a joke became a juggernaut. Social media rallies, led by Elon Musk, pushed its worth above $50 billion in 2021, planting it in the top ten. The surge proved one thing: an online crowd can turn a meme into a market force. Under the hood DOGE runs on the same proof-of-work idea as Bitcoin, yet blocks clear faster and fees stay tiny. That makes tipping gamers, streamers, and friends quick and cheap. The endless supply fuels spending but also keeps a lid on scarcity. In today’s cycle Bitcoin’s rebound has traders hunting for lagging plays. New meme coins flash brighter, yet many fade fast. Dogecoin still owns the biggest fan club and sits on every major exchange, giving it staying power. If utility grows—or another Musk tweet lands—momentum could return in a hurry. Shiba Inu: The Meme Dog That Sniffed Out a Spot on Ethereum Shiba Inu burst onto the scene in 2020, barking at Dogecoin’s heels. Built on Ethereum, it plugs into a huge network of apps and wallets. Its maker, known only as Ryoshi, unleashed one quadrillion tokens. Half went to Vitalik Buterin, who later gave much away and burned the rest. That bold move grabbed headlines and trust. At the same time, it showed the coin was more than a joke. Today, SHIB powers ShibaSwap, a place to trade tokens without a middleman. Soon, holders may vote on new changes and even mint art pieces called NFTs. This wider plan gives SHIB tools that Dogecoin still lacks. The market cycle now rewards coins with clear stories and active teams. Meme coins often ride big waves, and Ethereum-based ones get extra attention because they fit with popular chains like Uniswap and OpenSea. SHIB also has a huge, vocal fan base that can drive fast moves. Prices are still far below last year’s peak, so some see room for a fresh run if the next bull phase appears. Demand for $XYZ Surges As Its Capitalization Hits the $15M Milestone XYZVerse ($XYZ), recently recognized as Best NEW Meme Project, is drawing significant attention thanks to its standout concept. It is the first ever meme coin that merges the thrill of sports and the innovation of web3. Unlike typical meme coins, XYZVerse offers real utility and a clear roadmap for long-term development. It plans to launch gamified products and form partnerships with big sports teams and platforms. Notably, XYZVerse recently delivered on one of its goals ahead of schedule by partnering with bookmaker.XYZ, the first fully on-chain decentralized sportsbook and casino. As a bonus, $XYZ token holders receive exclusive perks on their first bet. Price Dynamics and Listing Plans During its presale phase, the $XYZ token has shown steady growth. Since its launch, the price has increased from $0.0001 to $0.0055, with the next stage set to push it further to $0.0056. With an anticipated listing price of $0.10, the token is set to launch on leading CEXs and DEXs. The projected listing price of $0.10 could generate up to 1,000x returns for early investors, provided the project secures the necessary market capitalization. So far, more than $15 million has been raised, and the presale is approaching another significant milestone of $20 million. This fast progress is signaling strong demand from both retail and institutional investors. Champions Get Rewarded In XYZVerse, the community calls the plays. Active contributors are rewarded with airdropped XYZ tokens for their dedication. It’s a game where the most passionate players win big. The Road to Victory With solid tokenomics, strategic CEX and DEX listings, and consistent token burns, $XYZ is built for a championship run. Every play is designed to push it further, to strengthen its price, and to rally a community of believers who believe this is the start of something legendary. Airdrops, Rewards, and More - Join XYZVerse to Unlock All the Benefits Conclusion DOGE offers steadiness, SHIB moves upward in steps, yet XYZVerse (XYZ) blends sports and memes, presale live, community-led, aiming to beat past 17,000% stars in the 2025 bull run. You can find more information about XYZVerse (XYZ) here: https://xyzverse.io/, https://t.me/xyzverse, https://x.com/xyz_verse   Disclaimer: This article is provided for informational purposes only. It is not offered or intended to be used as legal, tax, investment, financial, or other advice.
Share
Coinstats2025/09/20 16:32
YZi Labs invests in Ethena Labs to support the expansion of the USDe ecosystem

YZi Labs invests in Ethena Labs to support the expansion of the USDe ecosystem

PANews reported on September 19th that YZi Labs announced it has deepened its holdings in Ethena Labs and will continue its strategic support for the development of the USDe ecosystem. USDe is the fastest-growing and third-largest dollar-denominated crypto asset in history, with a current circulating supply exceeding $ 13 billion. YZi Labs' support will promote the expansion of USDe's application across centralized and decentralized platforms, and will contribute to the development of new products : USDtb (a fiat-backed stablecoin) and Converge (an institutional settlement layer).
Share
PANews2025/09/19 21:07