/ index

I build agent systems - and the governance and guardrails that make them safe to trust.

Once an agent can spend, sign, or act on its own, the hard part isn't the code - it's keeping it inside its limits, knowing what it did, and proving it afterward. That's what I build for teams putting agents into production.

CTO at Compass ↗ - the security and audit layer for money-moving agents.

Co-Founder of AI Workshop Berlin - AI enablement for SMBs.

open to: contracts, advisory, conversations

/ selected work

proprietor · Circle AI Agents Hackathon Berlin (1st)

An AI agent that owns and operates a profitable micro-SaaS. Its own CEO, CFO, and only employee.

The product is a Company Enrichment API, fulfilled by a multi-agent research pipeline: parallel domain researchers under a central orchestrator, entity disambiguation before any expensive call, iterative knowledge-gap loops. The business runs itself around it. Customers pay retail through an HTTP 402 paywall; a Claude-powered CFO agent checks the treasury, approves wholesale supplier payments, and defends margin with dynamic repricing - declining unprofitable orders before any money moves. Revenue and costs settle on-chain in USDC through a Circle Agent Wallet acting as the corporate treasury.

Claude · Multi-agent · Agentic commerce · USDC · x402 · Python
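A minimal sketch of the margin check described above, not the production CFO agent. The names `Quote`, `review_order`, and `floor_price`, and the 20% minimum margin, are hypothetical and chosen only for illustration. The point is that the decision is made before any payment is sent.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Quote:
    retail_price: Decimal    # what the customer pays at the paywall, in USDC
    wholesale_cost: Decimal  # what suppliers would be paid, in USDC


def review_order(quote: Quote, treasury_balance: Decimal,
                 min_margin: Decimal = Decimal("0.20")) -> str:
    """Return 'approve' or 'decline' before any money moves (illustrative rule)."""
    if quote.retail_price <= 0:
        return "decline"
    # Never commit to supplier payments the treasury cannot cover.
    if quote.wholesale_cost > treasury_balance:
        return "decline"
    margin = (quote.retail_price - quote.wholesale_cost) / quote.retail_price
    return "approve" if margin >= min_margin else "decline"


def floor_price(wholesale_cost: Decimal, min_margin: Decimal) -> Decimal:
    """Lowest retail price that still earns min_margin on this cost."""
    return wholesale_cost / (Decimal(1) - min_margin)
```

Under these assumptions, repricing means quoting at least `floor_price(cost, min_margin)` whenever supplier costs rise, and declining any order that would settle below it.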

agentplane · AI Workshop Berlin

Self-hostable, framework-agnostic governance layer that sits inline between AI agents and company systems - enforcement, not after-the-fact observation.

Every tool call is checked against a policy engine before it runs. When a call crosses a line, the gateway holds the connection open and waits for a human. A context server exposes the company's own rules and precedents to every connected agent, so decisions are made against institutional knowledge rather than a generic model's guess. Every decision is written to an audit trail you can replay.

Go · Python · MCP · agentgateway · OpenTelemetry · Postgres/pgvector
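The inline check can be sketched roughly as follows. This is a simplified illustration, not agentplane's actual policy engine: the `Gateway` class, the `blocked_tools` and `approval_limit` fields, and the `amount` argument are all hypothetical. It shows the three outcomes (allow, hold for a human, deny) and that every decision lands in the audit trail.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Decision(Enum):
    ALLOW = "allow"
    HOLD = "hold"   # connection stays open until a human decides
    DENY = "deny"


@dataclass
class ToolCall:
    agent: str
    tool: str
    args: dict[str, Any]


@dataclass
class Gateway:
    blocked_tools: set[str]
    approval_limit: float  # amounts above this need a human (hypothetical rule)
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def check(self, call: ToolCall) -> Decision:
        """Decide before the tool runs, and record the decision either way."""
        if call.tool in self.blocked_tools:
            decision = Decision.DENY
        elif float(call.args.get("amount", 0)) > self.approval_limit:
            decision = Decision.HOLD
        else:
            decision = Decision.ALLOW
        self.audit_log.append(
            {"agent": call.agent, "tool": call.tool,
             "args": call.args, "decision": decision.value}
        )
        return decision
```

Because the log entry is written on every path, including denials, the trail can be replayed in order to see what each agent attempted and what the gateway decided.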

agent_workspace · QESTIT

Internal agent IDE for authoring and executing custom agent skills.

Skill system with runtime discovery, SSE streaming for long-running agent operations with mid-session question/permission handling, context integration (Jira, Confluence). Three-layer evaluation framework: execution trace parser with subagent timing and parallelism metrics; deterministic quality calculator across coverage, traceability, syntax validity, reusability (30-case test suite); 5 parallel LLM judges with externalized rubric prompts and contrastive integration tests.

Tauri · React · TypeScript · SSE · LLM-as-judge · OpenCode

agent_analytics · QESTIT

Per-call token, cost, and latency tracking across every agent LLM call - the observability layer for what an agent spends.

Request analytics service streaming daily cost trends, per-model breakdowns, and usage comparison across all AI API calls (35 tests). Token-counting service built from scratch: tiktoken with the OpenAI Vision token formula (image scaling, tile computation, detail modes), pre-request estimates validated against actual API usage (89 tests).

tiktoken · Python · FastAPI · SSE
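For reference, the tile-based image formula can be sketched like this. It uses the constants OpenAI has documented for GPT-4-class vision models (85 base tokens, 170 per 512px tile); other model families use different numbers, and the `vision_tokens` name is hypothetical. Two assumptions are made here: images are only ever scaled down, never up, and the `auto` detail mode is left out.

```python
import math


def vision_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate image tokens using the tile-based formula (GPT-4-class constants)."""
    if detail == "low":
        return 85  # flat cost, independent of image size
    if detail != "high":
        raise ValueError("detail must be 'low' or 'high' in this sketch")

    # Step 1: fit the image inside a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale

    # Step 2: scale down so the shortest side is at most 768px
    # (assumption: smaller images are not scaled up).
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale

    # Step 3: count the 512px tiles needed to cover the result.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

A 1024 x 1024 image in high detail scales to 768 x 768, needs four tiles, and costs 765 tokens. Estimates like this are only pre-request approximations, which is why they are validated against the usage the API actually reports.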

browser_automation_agent

Browser agent that navigates and completes forms on sites it has never seen - LLM vision instead of per-site selectors.

8-state navigation FSM, two-stage field resolution combining fuzzy semantic matching with LLM content generation, multimodal screenshot analysis via Llama-4-Maverick. Tracks per-step success/failure to prevent navigation loops. Runs across arbitrary company career sites with no platform-specific code - the failure modes are the point: unseen layouts, cookie walls, dead ends.

Playwright · Ollama · Gemini · Llama 4 · Python

task_automation_system

Hybrid RAG end to end - dense + sparse retrieval, query expansion, time-decay reranking, and a fine-tuned T5 extraction model in production.

Weaviate hybrid search (BM25 + dense) with configurable alpha, query expansion via synonym generation, post-retrieval reranking with time-decay × priority-boost. T5 fine-tuned on synthetic task scenarios, 4-bit quantized, deployed on Cloud Run with GPU.

Weaviate · HF Transformers · T5 · Cloud Run · Python
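The post-retrieval reranking step (time-decay × priority-boost) can be sketched as below. The exponential half-life decay, the linear boost, and the names `Hit`, `rerank`, `half_life_days`, and `boost_per_level` are assumptions for illustration; the original system's exact functions and parameters are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float     # hybrid retrieval score from the search engine
    age_days: float  # time since the document was created or updated
    priority: int    # 0 = normal, higher = more urgent (hypothetical scale)


def rerank(hits: list[Hit], half_life_days: float = 30.0,
           boost_per_level: float = 0.25) -> list[Hit]:
    """Order hits by retrieval score x time decay x priority boost."""
    def final_score(hit: Hit) -> float:
        # The score halves every half_life_days.
        decay = 0.5 ** (hit.age_days / half_life_days)
        boost = 1.0 + boost_per_level * hit.priority
        return hit.score * decay * boost

    return sorted(hits, key=final_score, reverse=True)
```

With these defaults, a fresh document with a moderate retrieval score outranks a 60-day-old document with a high one, and a priority flag can partly offset age.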

/ background

The hardest part of agent work isn't the code - it's deciding what to build and knowing when it's good enough to trust. I care about both: problem selection and system reliability. That means state machines instead of vibes, eval frameworks instead of "looks good to me."

Right now that focus is agent security and governance for teams putting autonomous agents in front of real money - the earliest adopters, and the first to find out what happens when an agent gets it wrong. I'm CTO of Compass, where I own the enforcement core and the provenance layer, building through Founder School 2026 and Emprelatam. It's the intersection of the two things I've spent my career on: agent reliability and onchain systems.

Previously, as a Senior AI Consultant at QESTIT, I built the orchestration and evaluation tooling behind a team of in-house agents and coached engineers on adopting LLM workflows.

Before that, I was a blockchain engineer. At Civic Technologies, I shipped the first production app built on Internet Identity's then-undocumented blockchain identity stack - a secure, private credential-sharing system for identity verification (Rust, TypeScript, and Node). As a Polkadot DevRel at Parity Technologies, I was selected for the Polkadot Academy with Gavin Wood (co-founder of Ethereum). At Sunrise Stake, I wrote high-performance smart contracts for cross-chain transactions.

1st overall prize at ETHGlobal Paris 23 (1000 hackers); first prizes at ETHGlobal New York 23, ETHPrague 23, ETHGlobal Istanbul 23 (×2), and NFTBerlin 22 (×3). Graduated with a BSc in Computer Science in October 2023.

book a call