The system has numerous issues, such as failure to record successful actions, complex plans that confuse the system in simple scenarios, lack of full automation, the need for interactive user interaction, and others.The system has numerous issues, such as failure to record successful actions, complex plans that confuse the system in simple scenarios, lack of full automation, the need for interactive user interaction, and others.

Analysis of the ROGUE Agent-Based Automated Web Testing System

2025/09/19 12:57
6분 읽기
이 콘텐츠에 대한 의견이나 우려 사항이 있으시면 [email protected]으로 연락주시기 바랍니다

In this series of articles about agents in cybersecurity, specifically pentesting, I will examine and describe various online projects, explain their operating principles, test their functionality, verify the quality of the results, and summarize. In parallel, we will develop our own pentesting agent from start to finish, testing various language models, both for local use and those accessible only via the API. We will also train our own large language model for pentesting.

In this article, I want to analyze a simple project. https://github.com/faizann24/rogue

\n

1. Technology set

There is no release number on repository that why I mark that we use version from latest commit from Jun 18 2025.

So, judging by the project description, it is positioned as an agent for AUTOMATIC WEB scanning, most likely targeting web applications, not the web as a general concept.

It is written in Python. OpenAI is used as a large language model; other models cannot be used because OpenAI usage is hardcoded in the llm.py file. The gpt-4o and o4-mini models are used. As can be seen from the code in this file, there was an attempt to use Anthropic models, but it never went beyond the class declaration. The entire web application system is built using the Playwright testing framework.

Now, regarding the agent features used by this project, these include tools, a system prompt describing the agent's role, and a knowledge base. Let's describe each of these in more detail. The tools are located in the tools.py file. Part of the tools is a call to Playwright functionality, a method for running Python code generated by the language model, as well as unfinished functions and a method that attempts to obtain output from the tools used.

The system prompts vary depending on the steps. The llm system prompt describes it as a security check agent, adds some knowledge about SQL injection and other hacking hints, describes Playwright functions, and describes their output.

Checks for subdomains or passwords are done in the most stupidly simple way possible, simply checking data from a custom collection in the list directory. \n

The description claims that the system operates automatically, but this is not true. The system has a tool that requests information from the user, and not only is it unknown when the system will stop and wait for a response, but the function itself is simply a stub that doesn't use the entered data.

\n

\n

2.Description of the system operation

\n A general description of the application's workflow is shown in the diagram. Briefly, when run.py is launched, an Agent object is created. When it is initialized, a simple knowledge base is created based on data from third-party websites. In the diagram, these are pentestmonkey websites, but there are others, as can be seen in the knowledge_fetcher.py file. This data is passed to the LLM object, the scheduler is initialized with its system prompt (I've highlighted it in red), and finally, the Reporter is initialized.

Based on the parsed page, self.scanner.scan(url) generates a page for analysis through Summarizer with its system prompt. This involves two stages: simple parsing and final parsing through LLM, where it attempts to find the required elements.

Based on the findings, a scan plan is generated and executed until it has completed all the plan's steps. The system analyzes, uses tools and their outputs, and creates an intermediate summary using its system prompt.

At the end, a final report is generated with its own system prompt.

\n

3.Practical testing

3.1 Testing Environment

For this study, I used the well-known Metasploitable 2 project, which includes mutillidae, a web application for testing web vulnerabilities. I chose the simplest XSS vulnerability. For this, I booted a machine on my closed network and selected the following page for testing. http://192.168.127.166/mutillidae/index.php?page=add-to-your-blog.php

the simplest vulnerability is at work here <script></script>

\n

3.2. Launch technique

Run with command

python run.py -u http://192.168.127.166/mutillidae/index.php?page=add-to-your-blog.php

by the default LLM model is o4-mini \n

3.3. Observation and analysis of the process

We see that the system has selected the o4-mini model, and the data from the RAG is disabled.

The system saw that JS was triggered and realized it needed to run an XSS check, but it immediately started doing complex checks and going down the wrong path. There were attempts to do Stored XSS Injection via Client-Side Filter Evasion but absolutely illiterate with simple vulnerability, which led to failure fill("textarea[name='content']", "")

System tried Stored XSS Injection via Alternative Payload Encoding

During the scan, for some reason, it jumped to another page.

It took 10-15 minutes to scan one page, but if you need to scan 20-30 pages, that's hours per site. In real life, you're given a client network with several web services and a bunch of other services that need to be scanned within a reasonable time, and no one will wait a week.

The report was completely disappointing, even though I saw that the attack was successful when the code was executed.

\n but it was not saved and the report turned out to be empty on vulnerabilities

\n The second time the system was launched with a simple RAG

And I saw the coveted XSS \n

But the result didn't change much, and even got worse. The vulnerability that was found wasn't even tested this time at the same stage, only at the very end.

And the system asked for user actions.

This didn't help either, requiring user input. And the final report ended up empty of useful data.

\n

As tested, the system has numerous issues, such as failure to record successful actions, complex plans that confuse the system in simple scenarios, lack of full automation, the need for interactive user interaction, and others. All of these issues have been resolved in our SxipherAI system. \n

Summary

• The system is not fully automated.

• The system uses only OpenAI.

• It cannot fully test web applications; it attempts XSS, CSF, and SQL injection, but all to no avail.

• Even with the most basic testing labs, which even a novice pen tester could handle, the system failed.

• The project was abandoned without ever reaching some logical conclusion.

• There were attempts to use tools, Playwright, and some Summari, but it was useless.

\n

In upcoming lessons, we'll demonstrate how the SxipherAI system handles such websites, specifically how SxipherAI handles XSS vulnerabilities, and develop our own simple agent for checking for XSS vulnerabilities.

\n

And next up will be PentestGPT.

면책 조항: 본 사이트에 재게시된 글들은 공개 플랫폼에서 가져온 것으로 정보 제공 목적으로만 제공됩니다. 이는 반드시 MEXC의 견해를 반영하는 것은 아닙니다. 모든 권리는 원저자에게 있습니다. 제3자의 권리를 침해하는 콘텐츠가 있다고 판단될 경우, [email protected]으로 연락하여 삭제 요청을 해주시기 바랍니다. MEXC는 콘텐츠의 정확성, 완전성 또는 시의적절성에 대해 어떠한 보증도 하지 않으며, 제공된 정보에 기반하여 취해진 어떠한 조치에 대해서도 책임을 지지 않습니다. 본 콘텐츠는 금융, 법률 또는 기타 전문적인 조언을 구성하지 않으며, MEXC의 추천이나 보증으로 간주되어서는 안 됩니다.

추천 콘텐츠

XRP Moves Above $1.40 as Traders Watch Bullish Signals

XRP Moves Above $1.40 as Traders Watch Bullish Signals

The post XRP Moves Above $1.40 as Traders Watch Bullish Signals appeared on BitcoinEthereumNews.com. XRP climbed above $1.40 with $3.5B volume as traders highlight
공유하기
BitcoinEthereumNews2026/03/14 18:54
Paramount-WBD 2027 movie slate could dominate. Can it sustain?

Paramount-WBD 2027 movie slate could dominate. Can it sustain?

The post Paramount-WBD 2027 movie slate could dominate. Can it sustain? appeared on BitcoinEthereumNews.com. Paramount Skydance CEO David Ellison speaks during
공유하기
BitcoinEthereumNews2026/03/14 19:06
How is the xStocks tokenized stock market developing?

How is the xStocks tokenized stock market developing?

Author: Heechang Compiled by: TechFlow xStocks offers a tokenized stock service, allowing investors to trade tokenized versions of popular US stocks like Tesla in real time. While still in its early stages, it’s already showing some interesting signs of growth. Observation 1: Trading is concentrated in Tesla (TSLA) As in many emerging markets, trading activity has quickly concentrated on a handful of stocks. Data shows a high concentration of trading volume in the most well-known and volatile stocks, with Tesla being the most prominent example. This concentration is not surprising: liquidity tends to accumulate in assets that retail investors already favor, and early adopters often use familiar high-beta stocks to test new infrastructure. Observation 2: Liquidity decreases on weekends Data shows that on-chain equity trading volume drops to 30% or less of weekday levels over the weekend. Unlike crypto-native assets, which trade seamlessly around the clock, tokenized stocks still inherit the behavioral inertia of traditional market trading hours. Traders appear less willing to trade when reference markets (such as Nasdaq and the New York Stock Exchange) are closed, likely due to concerns about arbitrage, price gaps, and the inability to hedge positions off-chain. Observation 3: Prices move in line with the Nasdaq Another key signal comes from pricing behavior during the initial launch period. Initially, xStocks tokens traded at a significant premium to their Nasdaq counterparts, reflecting market enthusiasm and potential friction in bridging fiat liquidity. However, these premiums gradually diminished over time. Current trading patterns show that the token price is at the upper limit of Tesla's intraday price range and is highly consistent with the Nasdaq reference price. Arbitrageurs appear to be maintaining this price discipline, but there are still small deviations from the intraday highs, indicating some market inefficiencies that may present opportunities and risks for active traders. New opportunities for Korean stock investors? South Korean investors currently hold over $100 billion in US stocks, with trading volume increasing 17-fold since January 2020. Existing infrastructure for South Korean investors to trade US stocks is limited by high fees, long settlement times, and slow cash-out processes, creating opportunities for tokenized or on-chain mirror stocks. As the infrastructure and platforms supporting on-chain US stock markets continue to improve, a new group of South Korean traders will enter the crypto market, which is undoubtedly a huge opportunity.
공유하기
PANews2025/09/18 08:00