KZ kevinz.ai
01 hardware
local compute node
01
Mac Mini M4
primary host
Primary compute for the self-hosted fleet: Ollama local inference, PM2 services, Redis, SQLite memory stores, Docker workloads, and private routing for sovereignty-sensitive work.
alfred on Tailscale · local model cache · 36-agent control plane
self-host
02 models / llm providers
models routed in production
14
Claude
primary reasoning
Primary reasoning path for architecture, code review, planning, and high-stakes operator work. Runs through Claude Code and ClaudeSwap where rate windows matter.
Gemma
local / air-gapped
Local open-weight route for private prompts, cheap drafts, and offline tests on the Mac Mini. Useful when data should stay on the machine.
Gemma local via Ollama
free
Mistral
codestral / code spec
Codestral and Mistral Small are cost-control routes for code completion, EU-friendly inference, and lightweight tasks that do not need a frontier model.
Codestral · long-context coding route
Hugging Face
121-model fallback
Router for open models and provider experiments. It is where niche models enter the fleet before they earn a permanent route.
DeepSeek, Llama, Qwen, and specialty backends
free
Nous Research
open-weights research
Open-weight research models for experiments, evals, and checking whether a cheaper route can handle work before a paid model gets involved.
free
Perplexity
research / grounded
Grounded research lane when current source lookup matters. It feeds briefs, competitive checks, and sanity passes before decisions get handed to agents.
Ollama
local model runtime
Runtime for private local models and embeddings. It keeps test prompts and sensitive client work inside the Mac Mini boundary.
Gemma · Llama · local embeddings
self-host
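A minimal sketch of what a local call to this runtime can look like, assuming Ollama is listening on its default port (11434) and using its `/api/generate` endpoint. The model tag `gemma3` and the helper names are illustrative assumptions, not the fleet's actual configuration.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing in this request leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "gemma3") -> urllib.request.Request:
    """Build a non-streaming generate request. The model tag is an assumed example."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma3") -> str:
    """Send the prompt to the local runtime and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Splitting request construction from the network call keeps the payload easy to inspect without a running Ollama instance.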
Groq
bulk agents / llama
Fast low-cost inference for bulk agent lanes, workers, and routing tests. It handles volume when latency matters more than deep reasoning.
Llama route for high-throughput tasks
free
Cerebras
qwen / gpt-oss
Large-model inference lane for one-shot research and free-tier pressure relief. Best when scale matters and the task is bounded.
free
Cloudflare AI
edge inference
Edge inference and fallback routing when API providers rate-limit. It pairs with Workers and Tunnel in the same Cloudflare account.
Workers AI · edge route · fallback lane
free
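The fallback idea (try the preferred provider, move to the next lane when it rate-limits) can be sketched roughly as follows. The `RateLimited` exception and the provider callables are hypothetical stand-ins; real provider SDKs signal rate limits in their own ways.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical signal that a provider refused the call due to rate limits."""


# A provider is a (name, callable) pair; the callable takes a prompt and returns text.
Provider = tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order and return (provider_name, completion) from the first success."""
    skipped = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited:
            # Only rate limits trigger fallback; other errors propagate to the caller.
            skipped.append(name)
    raise RuntimeError(f"all providers rate-limited: {', '.join(skipped)}")
```

Returning the provider name alongside the completion makes it possible to log which lane actually served the request.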
OpenRouter
fallback router
Single-key access to many model routes. Useful for fallback tests, cost comparisons, and trying new models before wiring native providers.
free
MiniMax
long-context / multimodal
Long-context and multimodal route for experiments where frontier labs are not the right cost fit or where alternate behavior is useful.
Portkey
llm gateway
Gateway layer for model routing, request logs, and cost controls. It gives the fleet one place to reason about provider behavior.
Plausible
privacy analytics
Privacy-first analytics for site and funnel checks. It stays in the toolchain because routing decisions need traffic context.
03 cloud agent platforms
client-ready managed agent surfaces
05
OpenAI Agent Builder
gpt-native agents
Managed agent surface for clients who want GPT-native tool calls, retrieval, voice, and vision without owning a custom orchestration layer.
Vertex AI Agent Builder
google cloud agents
Enterprise agent path for GCP shops, especially when grounding against BigQuery, Google Cloud data, and Workspace workflows.
Copilot Studio
microsoft agents
Custom copilots over Microsoft 365, Dataverse, and Power Platform. Strong fit when governance and existing enterprise identity matter.
HubSpot Breeze
crm agents
Marketing and CRM agents for teams already operating inside HubSpot. Good for fast deployment when the data model already lives there.
Salesforce Agentforce
enterprise agents
Sales and service agents inside Salesforce. Best fit when Data Cloud, CRM workflows, and audit trails are already part of the business.
04 agent frameworks / orchestration
build and dispatch
13
OpenClaw
self-hosted gateway
Backbone for the multi-channel agent platform. It routes messages, loads skills per agent, tracks budgets, and keeps the 36-agent fleet coordinated.
Gateway :18789 · 36 agents · 6 channels
free
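Per-agent budget tracking could look something like the sketch below. This is not OpenClaw's actual implementation or API; the class, the agent name, and the choice of USD as the unit are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetTracker:
    """Hypothetical per-agent spend caps; not taken from OpenClaw itself."""

    limits: dict[str, float]  # agent name -> spend cap (assumed to be USD)
    spent: dict[str, float] = field(default_factory=dict)

    def charge(self, agent: str, cost: float) -> bool:
        """Record the cost if it fits under the agent's cap; return False to refuse it."""
        if agent not in self.limits:
            raise KeyError(f"no budget configured for agent {agent!r}")
        new_total = self.spent.get(agent, 0.0) + cost
        if new_total > self.limits[agent]:
            return False  # over budget: leave the recorded spend unchanged
        self.spent[agent] = new_total
        return True
```

A gateway would call `charge` before dispatching a request and skip or downgrade the route when it returns `False`.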
Claude Code
primary ide
Primary coding surface. Custom agents, hooks, skills, and repository context all start here before work moves through review gates.
CC Commander · 60 plugin skills
Codex CLI
oauth code agents
OAuth-only coding agents for implementation, review, and local automation. No API key path is used for this lane.
free
Cursor
rapid prototyping
Traditional editor lane for quick visual diffs, prototypes, and work that benefits from a file tree plus agent chat in one window.
Commander
60 plugin skills
Open-source command layer for managing Claude Code, Codex, and agent-driven workflows. It keeps repeatable operations out of memory and inside skills.
free
Paperclip
issue-first dispatch
Task dispatch and governance layer. It creates issue-first work, routes implementation, and pushes changes through spec and quality gates.
localhost :3100 · branch naming by ticket
self-host
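Branch naming by ticket can be as simple as deriving the branch from the ticket ID plus a slug of the title. The `<ticket-id>-<slug>` pattern, the 60-character cap, and the sample ticket below are assumptions; the page does not state the exact convention Paperclip uses.

```python
import re


def branch_name(ticket_id: str, title: str, max_len: int = 60) -> str:
    """Build a git-friendly branch name from a ticket ID and title (assumed convention)."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    name = f"{ticket_id.lower()}-{slug}" if slug else ticket_id.lower()
    # Truncate long titles, then drop any hyphen left dangling at the cut.
    return name[:max_len].rstrip("-")


print(branch_name("PAP-42", "Fix Redis reconnect loop!"))
# prints "pap-42-fix-redis-reconnect-loop"
```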
n8n
self-hosted workflows
Webhook and integration layer for non-code automation. It runs self-hosted on Docker for workflows that should stay close to private data.
Docker service · webhook triggers
self-host
Langfuse
llm tracing / cost
Trace inspection, latency analysis, and per-request cost visibility for the model lanes that need deeper observability.
Promptfoo
eval regression
Evaluation framework for prompt changes, routing regressions, and provider comparison before a route is trusted in production.
free
Datadog
observability
Infrastructure and app observability for hosted services where Langfuse is the wrong shape or logs need a broader operational view.
Open WebUI
local model ui
Local model interface for testing Ollama routes, comparing prompts, and giving non-terminal access to private models.
self-host
Inngest
durable workflows
Durable background workflow option for jobs that need retries, visibility, and clearer state than a raw queue.
LogSnag
events / alerts
Lightweight event tracking for agent actions, deployment markers, and operator alerts that should show up fast.
05 infra / self-hosted
where it runs
21
AWS
iam / primary cloud
Primary cloud account for infra pieces that need IAM, mature primitives, and long-running reliability.
Vercel
frontend hosting
Hosts kevinz.ai and static frontend properties. Fast deploys, edge delivery, and clean rollback paths for public web surfaces.
Railway
backend services
Quick backend deployment lane for services where a small always-on app beats serverless complexity.
Cloudflare
tunnel / workers / cdn
DNS, CDN, Workers, and Tunnel for the home lab. Keeps local services reachable without opening inbound ports.
Tailscale
sovereign mesh
Private network between machines, services, and admin surfaces. The Mac Mini is reachable as alfred without public exposure.
self-host
Coolify
self-hosted deploys
Self-hosted deployment control plane for apps that belong on owned infrastructure instead of a managed platform.
self-host
Docker
containers
Container baseline for n8n, local services, sandboxes, and repeatable dev environments on macOS and servers.
free
BigQuery
analytics / grounding
Warehouse layer for analytics and Google Cloud grounding when client data already sits inside the GCP world.
Supabase
postgres / auth
Postgres, auth, and storage when a product needs real app primitives without standing up a custom backend stack.
Fly.io
region failover
Always-on backend services and regional failover when serverless is the wrong operational model.
OrbStack
docker on macOS
Fast Docker runtime on Apple Silicon. Lower friction than Docker Desktop for local services and agent-adjacent containers.
PM2
persistent services
Process manager for long-running local services. Restarts failed agents and keeps the home-lab process list sane after reboots.
20+ persistent services
free
Caddy
reverse proxy
Reverse proxy for local services, paired with Cloudflare Tunnel for clean routing and auto-TLS where needed.
free
Redis
cache / queue
Cache and queue layer for agent coordination, session state, and fast handoffs between services.
free
GitHub
source / ci
Source of truth for code, issues, reviews, releases, and public open-source work.
free
Sentry
error tracking
Error tracking for production web surfaces and APIs where failures need context, grouping, and alerts.
Stripe
billing
Billing and SaaS revenue infrastructure. It remains the cleanest default for payments, subscriptions, and invoices.
Resend
transactional email
Transactional email and newsletter delivery for site forms, notifications, and publication workflows.
PostHog
product analytics
Product analytics, events, and funnels when a project needs more behavioral detail than page analytics.
Twilio
sms / voice
SMS and voice rails for customer-facing automations, verification, and agent-assisted communications.
1Password
op:// secrets
Secret source of truth. Agents and services read credentials through op:// references, never hardcoded keys.
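A minimal sketch of how a service might resolve an `op://` reference at runtime, assuming the 1Password CLI (`op`) is installed and signed in. The reference shape handled here is `op://vault/item/field` with an optional section segment; query parameters are not handled, and the vault and item names in the example are hypothetical.

```python
import subprocess
from typing import NamedTuple, Optional


class SecretRef(NamedTuple):
    vault: str
    item: str
    section: Optional[str]
    field: str


def parse_secret_ref(ref: str) -> SecretRef:
    """Split op://vault/item[/section]/field into its parts, rejecting other shapes."""
    prefix = "op://"
    if not ref.startswith(prefix):
        raise ValueError(f"not an op:// reference: {ref!r}")
    parts = ref[len(prefix):].split("/")
    if len(parts) not in (3, 4) or not all(parts):
        raise ValueError(f"expected op://vault/item[/section]/field, got {ref!r}")
    if len(parts) == 3:
        vault, item, field_name = parts
        return SecretRef(vault, item, None, field_name)
    vault, item, section, field_name = parts
    return SecretRef(vault, item, section, field_name)


def read_secret(ref: str) -> str:
    """Resolve the reference through `op read`; the secret never appears in source."""
    parse_secret_ref(ref)  # validate the shape before shelling out
    result = subprocess.run(
        ["op", "read", ref], capture_output=True, text=True, check=True
    )
    return result.stdout.rstrip("\n")


# Hypothetical reference; real vault and item names will differ.
# api_key = read_secret("op://dev/stripe/api_key")
```

Validating the reference before invoking the CLI means a malformed string fails fast instead of surfacing as an opaque subprocess error.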
06 knowledge / workflow
think and write
12
Obsidian
second brain
Personal knowledge base, notes, vault sync, and long-lived thinking. It feeds writing and agent memory workflows.
free
Notion
specs / client docs
Client-facing docs, specs, research summaries, and structured pages where collaborators need readable context.
Slack
team comms / alerts
Team communication and agent alert surface. It is where operational updates land when they need human attention.
Discord
community / dev forums
Community, developer forums, and live discussion channels where projects and tools surface early signal.
free
Todoist
tasks / inbox
Personal task inbox for work that should not become a repo issue, plus quick capture before agent routing.
Home Assistant
local automations
Local automation and device state. It gives home workflows a private control plane instead of another cloud dependency.
self-host
Linear
personal dev tasks
Issue tracker for implementation discipline. Agent work starts with a ticket, not a vague request floating in chat.
Attio
crm / lead scoring
CRM and relationship database for cleaner lead tracking, customer context, and agent-accessible contact records.
Clay
data enrichment
Data enrichment and go-to-market operations. Useful when a workflow needs structured company or contact context.
Granola
meeting ingestion
Meeting notes and transcripts routed into memory and follow-up workflows. Calls become structured inputs for agents.
BlueBubbles
imessage bridge
iMessage bridge for personal automation experiments and channel testing where mobile messages need an agent lane.
self-host
Raycast
launcher / snippets
Launcher, snippets, quick commands, and local workflow shortcuts. It is the operator console for small actions.
Some outbound links include affiliate refs. I list tools I use in production, evaluate for client architecture, or keep in the active fleet. Commissions help fund the 36-agent stack.