Question 1

什么是“Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks”？

Accepted Answer

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks 是 Link News 基于事实数据库聚合的新闻话题，当前摘要为：A reliability layer for self-hosted LLM tool-calling. Forge lifts an 8B local model to the top of its class on multi-step agentic workflows through guardrails (rescue parsing, retry nudges, step enforcement) and context management (VRAM-aware budgets, tiered compaction). The current top self-hosted config (Ministral-3 8B Instruct Q8 on llama-server) scores 86.5% across forge's 26-scenario eval suite — and 76% on the hardest tier.

Three ways to use it:

WorkflowRunner — Define tools, pick a backend, run structured agent loops. Forge manages the full lifecycle: system prompts, tool execution, context compaction, and guardrails. SlotWorker adds priority-queued access to a shared inference slot with auto-preemption — for multi-agent architectures where specialist workflows share a GPU slot. Best when you're building on forge directly.

Guardrails middleware — Use forge's reliability stack ( composable middleware ) inside your own orchestration loop. You control the loop; forge validates responses, rescues malformed tool calls, and enforces required steps.

Proxy server — Drop-in OpenAI-compatible proxy ( python -m forge.proxy ) that sits between any client (opencode, Continue, aider, etc.) and a local model server. Applies guardrails transparently — the client thinks it's talking to a smarter model.

Supports Ollama, llama-server (llama.cpp), Llamafile, and Anthropic as backends.

llama-server (recommended — top 10 eval configs all run on llama-server):

Ollama (alternative — easier setup, slightly weaker on harder workloads):

Anthropic (API, no local GPU needed):

See Backend Setup for full instructions and Model Guide for which model fits your hardware.

For multi-step workflows, multi-turn conversations, and backend auto-management, see the User Guide . If you're building a long-running session (CLI, chat server, voice assistant), see the long-running session advisory for important guidance on filtering transient messages.

Drop-in replacement for a local model server. Point any OpenAI-compatible client at the proxy and get forge's guardrails for free.

Then configure your client to use http://localhost:8081/v1 as the API base URL.

Note: The proxy automatically injects a synthetic respond tool when tools are present in the request. The model calls respond(message="...") instead of producing bare text, keeping it in tool-calling mode where forge's full guardrail stack applies. The respond call is stripped from the outbound response — the client sees a normal text response ( finish_reason: "stop" ) and never knows the tool exists. This is essential for small local models (~8B), which cannot be trusted to choose correctly between text and tool calls — guiding them to a tool is a must. See ADR-013 for the full analysis.

See Backend Setup for installation and Model Guide for which model to pick.

26 scenarios measuring how reliably a model + backend combo navigates multi-step tool-calling workflows — split into an OG-18 baseline tier and an 8-scenario advanced_reasoning tier for top-end separation. See Eval Guide for full CLI reference.

The forge guardrail framework and ablation study are published as:

Zambelli, A. Forge: A Reliability Layer for Self-Hosted LLM Tool-Calling. https://doi.org/10.1145/3786335.3813193

A pre-publication preprint is also available at docs/forge_ieee_preprint.pdf — kept as a historical artifact. Cite the published version above; the DOI link may not resolve immediately depending on the publisher's release timing.

Question 2

这个话题覆盖了哪些来源？

Accepted Answer

这个话题当前覆盖 1 个来源平台，并持续汇总相关新闻、搜索与社交讨论信号。

Question 3

这个话题有哪些可追溯证据？

Accepted Answer

当前页面展示 1 条来源证据、1 个时间线节点，并保留原始出处链接便于核验。

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Why this topic matters

Keywords

Source evidence

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Timeline

Related topics