Question 1

What is "CVE-Bench: testing LLM agents on real-world vulnerability patches"?

Accepted Answer

"CVE-Bench: testing LLM agents on real-world vulnerability patches" is a news topic that Link News aggregates from its fact database. The current summary reads: CVE-Bench is introduced to test LLM agents (e.g., Anthropic’s Mythos, Poolside’s Laguna, OpenAI models) on fixing 20 real-world CVEs across 18 Python projects, under 3 prompt conditions (advisory, diagnose, locate) in sandboxed containers. It measures solve rates, token usage, tool calls, and regressions, revealing performance gaps between models, failure modes (e.g., wrong-search drift, partial fixes), and challenges in benchmarking. The benchmark aims to help get security vulnerabilities fixed before they are exploited, and its data and tools are open to the community.
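
The metrics named in the summary (solve rate, token usage, tool calls, regression) could be aggregated per prompt condition roughly as follows. This is only an illustrative sketch: the `RunResult` record, its field names, and the `summarize` helper are hypothetical and are not taken from CVE-Bench's actual tooling or data format.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three prompt conditions named in the summary.
CONDITIONS = ("advisory", "diagnose", "locate")


@dataclass
class RunResult:
    """One agent run on one CVE (hypothetical record layout)."""
    cve_id: str       # placeholder identifier for the CVE under test
    condition: str    # one of CONDITIONS
    solved: bool      # the patch fixed the vulnerability
    regressed: bool   # the patch broke existing behavior
    tokens: int       # tokens consumed during the run
    tool_calls: int   # number of tool invocations


def summarize(results):
    """Group runs by prompt condition and compute simple averages."""
    by_condition = defaultdict(list)
    for run in results:
        by_condition[run.condition].append(run)

    summary = {}
    for condition in CONDITIONS:
        runs = by_condition.get(condition, [])
        if not runs:
            continue  # skip conditions that have no recorded runs
        n = len(runs)
        summary[condition] = {
            "runs": n,
            "solve_rate": sum(r.solved for r in runs) / n,
            "regression_rate": sum(r.regressed for r in runs) / n,
            "mean_tokens": sum(r.tokens for r in runs) / n,
            "mean_tool_calls": sum(r.tool_calls for r in runs) / n,
        }
    return summary
```

With 20 CVEs and 3 prompt conditions, a full pass for one model would yield 60 such records, assuming one run per CVE and condition.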

Question 2

Which sources does this topic cover?

Accepted Answer

This topic currently covers 1 source platform and continuously aggregates related news, search, and social discussion signals.

Question 3

What traceable evidence does this topic have?

Accepted Answer

This page currently shows 1 piece of source evidence and 1 timeline node, and keeps the original source links for verification.

CVE-Bench: testing LLM agents on real-world vulnerability patches