ClearAgent: Agentic Binary Analysis for Effective Vulnerability Detection

Venue

LMPL 2025

Abstract

Statically detecting bugs at the binary level has been crucial for the security of Commercial-Off-The-Shelf (COTS) software when source code is not available. However, traditional methods suffer from the inherent limitations of binary translation and static analysis, which hinder their scalability to complex real-world binaries. Recent efforts that leverage large language models (LLMs) for bug detection are still limited by possible hallucination, inaccurate code property retrieval, and insufficient guidance.

In this paper, we propose ClearAgent, a new agentic binary analysis framework. It features a novel binary language server that provides both LLM-friendly and analyzer-friendly interfaces to facilitate understanding of binary code semantics and thereby enable effective vulnerability detection. ClearAgent works by automatically interacting with the server and iteratively exploring for buggy locations. For candidate bug reports, ClearAgent further tries to verify the existence of the vulnerability by constructing concrete inputs that can trigger the buggy locations.
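The explore-by-querying loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `ToyBinaryServer`, its `callees` query, and the fixed breadth-first strategy are hypothetical stand-ins, not ClearAgent's actual server interface or exploration policy (in an agentic setting an LLM would choose which query to issue next).

```python
from collections import deque


class ToyBinaryServer:
    """Hypothetical stand-in for a binary language server.

    It answers one kind of query (direct callees) from a fixed call graph.
    """

    def __init__(self, call_graph):
        self._call_graph = call_graph

    def callees(self, function):
        # Functions with no entry are treated as leaves.
        return self._call_graph.get(function, [])


def explore(server, entry, risky_calls):
    """Walk the call graph breadth-first from `entry` and collect
    (caller, callee) pairs where the callee is considered risky."""
    candidates = []
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for callee in server.callees(current):
            if callee in risky_calls:
                candidates.append((current, callee))
            elif callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return candidates


# Hypothetical call graph for a small binary.
graph = {"main": ["parse", "log"], "parse": ["strcpy", "log"], "log": ["printf"]}
reports = explore(ToyBinaryServer(graph), "main", {"strcpy"})
```

Each pair in `reports` is only a candidate location. The verification step mentioned in the abstract, constructing a concrete triggering input, is outside the scope of this sketch.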

EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges

Venue

2025 Forty-second International Conference on Machine Learning

Abstract

Although language model (LM) agents are demonstrating growing potential in many domains, their success in cybersecurity has been limited due to simplistic design and the lack of fundamental features for this domain. We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges. EnIGMA introduces new Agent-Computer Interfaces (ACIs) to improve the success rate on CTF challenges. We establish the novel Interactive Agent Tool concept, which enables LM agents to run interactive command-line utilities essential for these challenges. Empirical analysis of EnIGMA on over 350 CTF challenges from three different benchmarks indicates that providing a robust set of new tools with demonstration of their usage helps the LM solve complex problems and achieves state-of-the-art results on the NYU CTF and Intercode-CTF benchmarks. Finally, we discuss insights on ACI design and agent behavior on cybersecurity tasks that highlight the need to adapt real-world tools for LM agents.
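To make the idea of running an interactive command-line utility across several agent turns concrete, here is a minimal sketch. It is not EnIGMA's implementation: the `InteractiveSession` class, the one-reply-line-per-input protocol, and the toy running-total program are all assumptions chosen to keep the example self-contained. Real utilities such as debuggers need prompt detection and timeouts, which are omitted here.

```python
import subprocess
import sys


class InteractiveSession:
    """Keeps one interactive program alive so its state persists between commands."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,  # line-buffered pipes on our side
        )

    def send(self, line):
        """Send one input line and return the single line the program replies with.

        Assumes the program answers every input with exactly one output line.
        """
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        # Closing stdin signals end of input so the program can exit.
        self.proc.stdin.close()
        self.proc.wait(timeout=5)
        self.proc.stdout.close()


# Toy stand-in for a real interactive utility: it keeps a running total.
TOY_PROGRAM = (
    "import sys\n"
    "total = 0\n"
    "for line in sys.stdin:\n"
    "    total += int(line)\n"
    "    print(total, flush=True)\n"
)


def demo():
    session = InteractiveSession([sys.executable, "-u", "-c", TOY_PROGRAM])
    try:
        return [session.send("1"), session.send("2"), session.send("3")]
    finally:
        session.close()
```

Because the process stays alive between calls, `demo()` returns the accumulated totals rather than three independent results, which is the property a one-shot command runner lacks.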

Code & Dataset

URL: https://github.com/princeton-nlp/SWE-agent and https://github.com/NYU-LLM-CTF/LLM_CTF_Dataset_Dev

CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution

Venue

Preprint, under review.

Abstract

Large Language Model (LLM) agents can automate cybersecurity tasks and can adapt to the evolving cybersecurity landscape without re-engineering. While LLM agents have demonstrated cybersecurity capabilities on Capture-The-Flag (CTF) competitions, they have two key limitations: accessing the latest cybersecurity expertise beyond their training data, and integrating new knowledge into complex task planning. Knowledge-based approaches that incorporate technical understanding into task-solving automation can tackle these limitations. We present CRAKEN, a knowledge-based LLM agent framework that improves cybersecurity capability through three core mechanisms: contextual decomposition of task-critical information, iterative self-reflected knowledge retrieval, and knowledge-hint injection that transforms insights into adaptive attack strategies. Comprehensive evaluations with different configurations show CRAKEN’s effectiveness in multi-stage vulnerability detection and exploitation compared to previous approaches. Our extensible architecture establishes new methodologies for embedding new security knowledge into LLM-driven cybersecurity agentic systems. With a knowledge database of CTF writeups, CRAKEN obtained an accuracy of 22% on NYU CTF Bench, outperforming prior work by 3% and achieving state-of-the-art results. On an evaluation of MITRE ATT&CK techniques, CRAKEN solves 25–30% more techniques than prior work, demonstrating improved cybersecurity capabilities via knowledge-based execution. Our framework is open source and publicly available at https://github.com/NYU-LLM-CTF/nyuctf_agents_craken.
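The retrieve, check, and inject cycle named in the abstract can be sketched in a few lines. This is a toy under stated assumptions, not CRAKEN's code: the writeup snippets are invented placeholders, retrieval is plain word overlap rather than a real retriever, and the "reflection" step is a fixed overlap threshold where an actual agent would ask the LLM to grade the hit and rewrite the query.

```python
# Hypothetical toy knowledge base standing in for a database of CTF writeups.
WRITEUPS = [
    "format string bug: use %n to overwrite a GOT entry",
    "stack buffer overflow: overwrite the return address with a ROP chain",
    "RSA with small exponent: recover the plaintext with a cube root",
]


def tokens(text):
    """Lowercased word set; colons are treated as separators."""
    return set(text.lower().replace(":", " ").split())


def retrieve(query, docs):
    """Return the document sharing the most words with the query (first one on ties)."""
    return max(docs, key=lambda d: len(tokens(query) & tokens(d)))


def is_relevant(query, doc, min_overlap=2):
    """Stand-in for self-reflection: accept a hit only if enough words overlap."""
    return len(tokens(query) & tokens(doc)) >= min_overlap


def retrieve_with_reflection(task, rewrites, docs):
    """Try the task text first, then each rewritten query, until a hit passes the check."""
    for query in [task, *rewrites]:
        doc = retrieve(query, docs)
        if is_relevant(query, doc):
            return doc
    return None


def inject_hint(task, hint):
    """Append retrieved knowledge to the task prompt as a hint, if any was found."""
    if hint is None:
        return task
    return f"{task}\nHint from knowledge base: {hint}"


task = "binary crashes on long input"
# Hypothetical rewritten query; a real agent would generate this itself.
hint = retrieve_with_reflection(task, ["stack buffer overflow return address"], WRITEUPS)
prompt = inject_hint(task, hint)
```

Here the original task wording matches nothing in the knowledge base, so the first retrieval is rejected and only the rewritten query yields a hint that is injected into the prompt.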

Code & Dataset

URL: https://github.com/NYU-LLM-CTF/nyuctf_agents_craken.
