llama.cpp Quickstart with CLI and Server

How to Install, Configure, and Use the OpenCode

I keep coming back to llama.cpp for local inference: it gives you control that Ollama and others abstract away, and it just works. It is easy to run GGUF models interactively with llama-cli or to expose an OpenAI-compatible HTTP API with llama-server.

Airtable for Developers & DevOps - Plans, API, Webhooks, and Go/Python Examples

Airtable - Free plan limits, API, webhooks, Go & Python.

Airtable is best thought of as a low‑code application platform built around a collaborative, database-like spreadsheet UI. It is excellent for rapidly creating operational tooling (internal trackers, lightweight CRMs, content pipelines, AI evaluation queues) where non-developers need a friendly interface and developers need an API surface for automation and integration.

Garage vs MinIO vs AWS S3: Object Storage Comparison and Feature Matrix

AWS S3, Garage, or MinIO - overview and comparison.

AWS S3 remains the “default” baseline for object storage: it is fully managed, strongly consistent, and designed for extremely high durability and availability.
Garage and MinIO are self-hosted, S3-compatible alternatives: Garage is designed for lightweight, geo-distributed small-to-medium clusters, while MinIO emphasises broad S3 API feature coverage and high performance in larger deployments.

Observability for LLM Systems: Metrics, Traces, Logs, and Testing in Production

End-to-end observability strategy for LLM inference and LLM applications

LLM systems fail in ways that traditional API monitoring cannot surface — queues fill silently, GPU memory saturates long before CPU looks busy, and latency blows up at the batching layer rather than the application layer. This guide covers an end-to-end observability strategy for LLM inference and LLM applications: what to measure, how to instrument it with Prometheus, OpenTelemetry, and Grafana, and how to deploy the telemetry pipeline at scale.
