
Showcase Collection

Written by Admin Admin
Updated over a month ago

Large Language Models (LLMs) have evolved beyond basic code generation and now tackle some of the most complex tasks in cybersecurity, work traditionally reserved for expert human analysts. One of the most striking examples is CyberGym, where autonomous AI agents discovered 15 zero-day vulnerabilities in major open-source software projects. The study, published on arXiv (2506.02548), evaluated agents on 1,507 real-world vulnerability tasks across 188 projects, illustrating the depth and precision that AI-driven security analysis can reach.

Equally transformative is BountyBench, which evaluates LLM agents on real-world bug bounty tasks spanning 25 complex systems and 40 bounties. Agents such as OpenAI's Codex CLI succeeded on up to 90% of the patching tasks. This signals a pivotal shift: LLMs are no longer just assistants; they are becoming primary actors in cyber defense.

In response to this acceleration, the Frontier AI Cybersecurity Observatory was established to monitor the rapidly advancing capabilities of AI in offensive and defensive security operations.

Paired with powerful analysis tools, LLMs become even more capable and can take on deeper, more specialized tasks. That is the idea behind Dr. Binary: an LLM-centric binary analysis assistant built for tough challenges such as binary diffing, firmware vulnerability detection, exploit tracing, and backdoor identification. With this combination, complex cybersecurity jobs that once took days or weeks can, in many cases, be completed in minutes.

To see how these capabilities apply to real-world tasks, explore the Dr. Binary Showcase Collection below.

Ransomware analysis

Summary:

Ransomware is a type of malicious software that encrypts a victim's files and demands payment to restore access. This demo shows how Dr. Binary can analyze a suspicious binary and identify it as potential ransomware.
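The snippet below is a minimal, hypothetical illustration of one signal an analyst (human or LLM) might weigh during this kind of triage: printable strings hinting at bulk encryption, file enumeration, shadow-copy deletion, and ransom notes. The indicator list, the category names, and the threshold of three categories are assumptions chosen for the example. This is not Dr. Binary's detection logic, and string matching alone is easily defeated by packing or obfuscation.

```python
import re

# Illustrative indicators only (an assumption); real triage relies on far richer rules.
INDICATORS = {
    "crypto_api": [b"CryptEncrypt", b"CryptGenKey", b"BCryptEncrypt"],
    "file_enumeration": [b"FindFirstFileW", b"FindNextFileW"],
    "shadow_copy_deletion": [b"vssadmin delete shadows"],
    "ransom_note": [b"bitcoin", b"decrypt your files"],
}


def extract_strings(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of printable ASCII at least min_len bytes long, similar to `strings`.

    UTF-16 ("wide") strings, common in Windows ransom notes, are not extracted here.
    """
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def ransomware_indicators(data: bytes) -> dict[str, list[str]]:
    """Map each indicator category to the needles found (case-insensitive) in the binary."""
    text = b"\n".join(extract_strings(data)).lower()
    hits: dict[str, list[str]] = {}
    for category, needles in INDICATORS.items():
        found = [n.decode() for n in needles if n.lower() in text]
        if found:
            hits[category] = found
    return hits


def looks_like_ransomware(data: bytes, threshold: int = 3) -> bool:
    """Flag the sample when at least `threshold` indicator categories are present."""
    return len(ransomware_indicators(data)) >= threshold
```

Requiring several independent categories, rather than any single string, is a deliberate choice: many benign programs call encryption APIs or enumerate files, but few combine that with shadow-copy deletion and payment instructions.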

Chat Links:

ECU analysis and diffing

Summary:

ECU binaries are the compiled firmware or software that runs on Electronic Control Units (ECUs), the specialized embedded systems that control various functions in vehicles. This demo shows how to use Dr. Binary to find the differences between two ECU binaries.
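As a rough sketch of the lowest-level form of this comparison, the function below reports the byte ranges where two firmware images differ. It assumes both images are raw dumps sharing the same memory layout; once code is inserted or relocated, every later offset shifts, which is why function-level diffing (matching functions across versions) is generally needed for meaningful results. The name `diff_ranges` is hypothetical.

```python
def diff_ranges(old: bytes, new: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) offset ranges where two images differ.

    Bytes beyond the end of the shorter image count as differences.
    """
    ranges: list[tuple[int, int]] = []
    start = None  # offset where the current run of differing bytes began, if any
    length = max(len(old), len(new))
    for i in range(length):
        a = old[i] if i < len(old) else None
        b = new[i] if i < len(new) else None
        if a != b:
            if start is None:
                start = i  # a new differing run begins
        elif start is not None:
            ranges.append((start, i))  # the run ended just before offset i
            start = None
    if start is not None:
        ranges.append((start, length))  # a run that extends to the end of the longer image
    return ranges
```

For example, looping over `diff_ranges(old_image, new_image)` and printing each pair as `f"0x{start:08x}-0x{end:08x}"` lists every changed region as a hex offset range, which can then be mapped back to functions in a disassembler.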

Chat Links:

Identify known vulnerabilities in firmware images

Summary:

Dr. Binary detects known vulnerabilities in firmware images by performing binary diffing against patched versions. In this demo, it successfully identifies CVE-2023-21273 and CVE-2023-21241 in the provided binaries. A detailed technical explanation of the underlying techniques can be found here.
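One common way to frame diffing-based detection is a patch-presence test: find the functions that the security fix changed, then check whether the target binary's copies match the vulnerable or the patched reference. The sketch below assumes function bodies have already been extracted (for example, by a disassembler) into hypothetical `name -> bytes` maps and compares them by exact hash. Practical tools generally need normalized or fuzzy matching to tolerate compiler and address differences, so treat this as an illustration of the idea rather than Dr. Binary's implementation.

```python
import hashlib

FunctionMap = dict[str, bytes]  # hypothetical: function name -> extracted machine code


def fingerprint(code: bytes) -> str:
    """Exact-match fingerprint; real tools normalize addresses or use fuzzy similarity."""
    return hashlib.sha256(code).hexdigest()


def changed_by_patch(vulnerable: FunctionMap, patched: FunctionMap) -> list[str]:
    """Functions present in both references whose bodies differ."""
    return sorted(
        name
        for name in vulnerable.keys() & patched.keys()
        if fingerprint(vulnerable[name]) != fingerprint(patched[name])
    )


def patch_status(vulnerable: FunctionMap, patched: FunctionMap, target: FunctionMap) -> str:
    """Classify the target as 'vulnerable', 'patched', or 'inconclusive'."""
    votes = {"vulnerable": 0, "patched": 0}
    for name in changed_by_patch(vulnerable, patched):
        if name not in target:
            continue  # function not found in the target (renamed, inlined, or stripped)
        digest = fingerprint(target[name])
        if digest == fingerprint(vulnerable[name]):
            votes["vulnerable"] += 1
        elif digest == fingerprint(patched[name]):
            votes["patched"] += 1
    if votes["vulnerable"] and not votes["patched"]:
        return "vulnerable"
    if votes["patched"] and not votes["vulnerable"]:
        return "patched"
    return "inconclusive"  # no evidence, or mixed evidence across functions
```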

Chat Links:

Identify and patch unknown vulnerabilities in CGC binaries

Summary:

CGC binaries are challenge programs from DARPA’s Cyber Grand Challenge (CGC), a competition built around synthetic software containing known and unknown vulnerabilities and designed to test automated vulnerability discovery and patching systems.

This demo showcases how Dr. Binary analyzes a CGC binary to:

  • Identify previously unknown vulnerabilities (e.g., memory corruption)

  • Understand root causes through disassembly and reasoning

  • Propose patches to mitigate the issues
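A simple way to sanity-check such findings is to run a candidate crashing input against both the original and the patched binary: the original should die from a signal such as SIGSEGV, and the patched one should not. The sketch below assumes a Linux build of the challenge binary (the original CGC binaries target DECREE, a purpose-built OS) and POSIX semantics, where `subprocess` reports termination by signal N as a return code of -N. The helper names are hypothetical, and a passing check only shows that this one input no longer crashes, not that the patch is complete or preserves functionality.

```python
import subprocess


def crashes(cmd: list[str], payload: bytes, timeout: float = 5.0) -> bool:
    """Return True if `cmd` is killed by a signal (e.g. SIGSEGV) when fed `payload` on stdin.

    POSIX only: subprocess reports termination by signal N as returncode -N.
    """
    try:
        proc = subprocess.run(cmd, input=payload, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; a hang is not counted as a crash here.
        return False
    return proc.returncode < 0


def patch_blocks_input(original: list[str], patched: list[str], payload: bytes) -> bool:
    """True if the payload crashes the original binary but not the patched one."""
    return crashes(original, payload) and not crashes(patched, payload)
```

A hypothetical call would look like `patch_blocks_input(["./cb_original"], ["./cb_patched"], crash_input)`, where the paths and `crash_input` come from your own analysis.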

Chat Links:

Detect backdoor attacks

Summary:

This demo shows how Dr. Binary analyzes backdoors by diffing two versions of a binary to identify suspicious changes. By comparing control flow, function logic, and inserted code, Dr. Binary helps uncover malicious modifications introduced between versions.
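A simplified view of the control-flow side of this comparison is a call-graph diff: list functions that appear only in the new version and call edges that did not exist before, then highlight edges into sensitive routines. The sketch below assumes the call graphs have already been recovered (for example, from a disassembler) as hypothetical `caller -> set of callees` maps, and the `SENSITIVE` set is an illustrative assumption. A backdoor that only alters logic inside an existing function, such as an extra comparison against a magic value, would not show up here, which is why comparing function logic matters as well.

```python
# Illustrative set of callees worth flagging when a new call to them appears (an assumption).
SENSITIVE = {"system", "execve", "popen", "setuid", "bind", "strcmp"}

CallGraph = dict[str, set[str]]  # hypothetical: caller -> set of callee names


def diff_call_graphs(old: CallGraph, new: CallGraph) -> dict[str, list]:
    """Report functions and call edges introduced between two versions."""
    added_functions = sorted(new.keys() - old.keys())
    # Edges present in the new version but absent from the old one.
    new_calls = sorted(
        (caller, callee)
        for caller, callees in new.items()
        for callee in callees - old.get(caller, set())
    )
    suspicious_calls = [(caller, callee) for caller, callee in new_calls if callee in SENSITIVE]
    return {
        "added_functions": added_functions,
        "new_calls": new_calls,
        "suspicious_calls": suspicious_calls,
    }
```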

Chat Links:

Solve CTF binaries

Summary:

This demo shows how you can solve CTF challenge binaries simply by chatting with Dr. Binary. Upload the binary, and Dr. Binary orchestrates advanced analysis tools, such as disassemblers and decompilers, to understand its logic. It then guides you step by step, explains the key conditions, and can generate helper scripts (e.g., plain Python or Python using the angr framework) to assist in solving the challenge.
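Many beginner reversing challenges reduce to an invertible check once decompiled. Suppose, hypothetically, that the decompiler shows the program accepts an input only if `input[i] ^ KEY[i % len(KEY)] == EXPECTED[i]` holds for every index. Because XOR is its own inverse, the accepted input can be recovered directly, as in the sketch below; `KEY`, `EXPECTED`, and the flag are made up for illustration. Checks that cannot easily be inverted by hand are where a symbolic-execution tool such as angr is typically used instead.

```python
# Hypothetical constants, standing in for values read out of a decompiled check routine.
KEY = b"key"
EXPECTED = bytes([0x0D, 0x09, 0x18, 0x0C, 0x1E, 0x01, 0x5B, 0x17, 0x04])


def check(candidate: bytes) -> bool:
    """Re-implementation of the (hypothetical) decompiled check."""
    return len(candidate) == len(EXPECTED) and all(
        c ^ KEY[i % len(KEY)] == e for i, (c, e) in enumerate(zip(candidate, EXPECTED))
    )


def solve() -> bytes:
    """Invert the check: since (x ^ k) ^ k == x, XOR each expected byte with the key."""
    return bytes(e ^ KEY[i % len(KEY)] for i, e in enumerate(EXPECTED))


print(solve())  # b'flag{x0r}'
```

Re-implementing the check alongside the solver is a cheap way to confirm the decompiled logic was transcribed correctly before submitting the answer.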

Chat Links:
