Speculative Decoding: 20-50% Faster LLM Inference

Faster LLM inference without quality loss: a practical guide

A 70B model generates one token per forward pass, and each pass reloads weights from VRAM, computes attention across the context, and synchronizes memory. Between tokens, the GPU sits idle while it waits for sequential dependencies to resolve.

Multi-Agent Orchestration Patterns: A Practical Guide

40% of multi-agent pilots fail. Here's how to pick the right orchestration pattern and avoid the ones that break.

Single-agent AI systems peaked in 2025 — you gave one LLM a prompt, some tools, and a goal, and it did reasonably well on bounded tasks.

Transactional Outbox Pattern in Go with PostgreSQL

Write the event with the data. Never split them.

Two writes that should succeed together will eventually fail separately. Your order service saves the order to the database, then publishes an order.created event to a message broker.

Decision Records for AI-Driven Software Development

Keep intent close to the code.

Decision records are the missing memory layer in AI-assisted software development. They capture not just what was built, but why — and that distinction becomes critical when AI tools are writing your code.

Testing Concurrent Go Code with synctest

Stop sleeping in concurrent Go tests.

Testing concurrent Go code has always required a bit of discipline. Goroutines are cheap, channels are simple, and context cancellation is idiomatic — background workers and timers are everywhere in real Go services.
