LLM Performance

How Ollama Handles Parallel Requests

Configuring Ollama for parallel request execution.

When the Ollama server receives two requests at the same time, its behavior depends on its configuration and available system resources.

Gemma2 vs Qwen2 vs Mistral Nemo vs...

Testing logical fallacy detection

Several new LLMs have been released recently. Exciting times. Let’s test how well they detect logical fallacies.

Large Language Models Speed Test

Let's test the LLMs' speed on GPU vs CPU

Comparing the prediction speed of several LLMs on CPU and GPU: llama3 (Meta/Facebook), phi3 (Microsoft), gemma (Google), and mistral (open source).