Heat score
1Topic analysis
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
A reliability layer for self-hosted LLM tool-calling. Forge lifts an 8B local model to the top of its class on multi-step agentic workflows through guardrails (rescue parsing, retry nudges, step enforcement) and context management (VRAM-aware budgets, tiered compaction). The current top self-hosted config (Ministral-3 8B Instruct Q8 on llama-server) scores 86.5% across forge's 26-scenario eval suite — and 76% on the hardest tier. Three ways to use it: WorkflowRunner — Define tools, pick a backend, run structured agent loops. Forge manages the full lifecycle: system prompts, tool execution, context compaction, and guardrails. SlotWorker adds priority-queued access to a shared inference slot with auto-preemption — for multi-agent architectures where specialist workflows share a GPU slot. Best when you're building on forge directly. Guardrails middleware — Use forge's reliability stack ( composable middleware ) inside your own orchestration loop. You control the loop; forge validates responses, rescues malformed tool calls, and enforces required steps. Proxy server — Drop-in OpenAI-compatible proxy ( python -m forge.proxy ) that sits between any client (opencode, Continue, aider, etc.) and a local model server. Applies guardrails transparently — the client thinks it's talking to a smarter model. Supports Ollama, llama-server (llama.cpp), Llamafile, and Anthropic as backends. llama-server (recommended — top 10 eval configs all run on llama-server): Ollama (alternative — easier setup, slightly weaker on harder workloads): Anthropic (API, no local GPU needed): See Backend Setup for full instructions and Model Guide for which model fits your hardware. For multi-step workflows, multi-turn conversations, and backend auto-management, see the User Guide . If you're building a long-running session (CLI, chat server, voice assistant), see the long-running session advisory for important guidance on filtering transient messages. Drop-in replacement for a local model server. Point any OpenAI-compatible client at the proxy and get forge's guardrails for free. Then configure your client to use http://localhost:8081/v1 as the API base URL. Note: The proxy automatically injects a synthetic respond tool when tools are present in the request. The model calls respond(message="...") instead of producing bare text, keeping it in tool-calling mode where forge's full guardrail stack applies. The respond call is stripped from the outbound response — the client sees a normal text response ( finish_reason: "stop" ) and never knows the tool exists. This is essential for small local models (~8B), which cannot be trusted to choose correctly between text and tool calls — guiding them to a tool is a must. See ADR-013 for the full analysis. See Backend Setup for installation and Model Guide for which model to pick. 26 scenarios measuring how reliably a model + backend combo navigates multi-step tool-calling workflows — split into an OG-18 baseline tier and an 8-scenario advanced_reasoning tier for top-end separation. See Eval Guide for full CLI reference. The forge guardrail framework and ablation study are published as: Zambelli, A. Forge: A Reliability Layer for Self-Hosted LLM Tool-Calling. https://doi.org/10.1145/3786335.3813193 A pre-publication preprint is also available at docs/forge_ieee_preprint.pdf — kept as a historical artifact. Cite the published version above; the DOI link may not resolve immediately depending on the publisher's release timing. MIT — Copyright (c) 2025-2026 Antoine Zambelli
Sources
1Platforms
1Relations
0- First seen
- May 19, 2026, 8:23 PM
- Last updated
- May 20, 2026, 4:00 AM
Why this topic matters
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks is currently shaped by signals from 1 source platforms. This page organizes AI analysis summaries, 1 timeline events, and 0 relationship edges so search engines and AI systems can understand the topic's factual basis and propagation arc.
Keywords
10 tagsSource evidence
1 evidence itemsShow HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
News · 1Timeline
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
May 19, 2026, 8:23 PM
Related topics
No related topics have been aggregated yet, but this page still preserves the AI summary, source links, and timeline.