I build agent systems, and the observability and guardrails that make them safe to trust.
Once an agent can spend money or move funds, the hard part isn't the code. It's knowing what the agent did, keeping it inside its limits, and being able to prove it afterward. I build the observability, policy, and audit layers that make autonomous agents safe to run in production. I am also co-founding AI Workshop Berlin.
agent_workspace · QESTIT
Internal agent IDE for authoring and executing custom agent skills.
Skill system with runtime discovery, SSE streaming for long-running agent operations with mid-session question/permission handling, context integration (Jira, Confluence). Three-layer evaluation framework: execution trace parser with subagent timing and parallelism metrics; deterministic quality calculator across coverage, traceability, syntax validity, reusability (30-case test suite); 5 parallel LLM judges with externalized rubric prompts and contrastive integration tests.
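To illustrate the kind of parallelism metric an execution trace parser can report, here is a minimal sketch. The `Span` shape, the field names, and the metric names are assumptions made for this example, not the internal trace format.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One subagent run taken from a trace (hypothetical shape)."""
    agent: str
    start: float  # seconds since the trace started
    end: float


def parallelism_metrics(spans: list[Span]) -> dict[str, float]:
    """Summarize how much subagent work overlapped in time."""
    if not spans:
        raise ValueError("trace contains no subagent spans")
    # Total time subagents spent working, summed across all of them.
    busy = sum(s.end - s.start for s in spans)
    # Wall-clock time from the first start to the last end.
    wall = max(s.end for s in spans) - min(s.start for s in spans)
    # Sweep line: +1 at each start, -1 at each end. Ends sort before starts
    # at the same timestamp, so back-to-back spans do not count as overlap.
    events = sorted([(s.start, 1) for s in spans] + [(s.end, -1) for s in spans])
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return {
        "busy_seconds": busy,
        "wall_seconds": wall,
        "parallelism": busy / wall if wall > 0 else 0.0,
        "peak_concurrency": peak,
    }


trace = [
    Span("planner", 0.0, 2.0),
    Span("researcher", 2.0, 6.0),
    Span("writer", 2.0, 5.0),
]
metrics = parallelism_metrics(trace)
```

A `parallelism` value near 1.0 suggests the subagents ran mostly one after another. Higher values mean more of the work overlapped.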
agent_analytics · QESTIT
Per-call token, cost, and latency tracking across every agent LLM call — the observability layer for what an agent spends.
Request analytics service streaming daily cost trends, per-model breakdowns, and usage comparison across all AI API calls (35 tests). Token-counting service built from scratch: tiktoken with the OpenAI Vision token formula (image scaling, tile computation, detail modes), pre-request estimates validated against actual API usage (89 tests).
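The image-token part of that estimate can be sketched as follows. This is an illustrative version of the tile-based formula OpenAI documents for its GPT-4-class vision models (85 base tokens plus 170 per 512 px tile), not the service's actual code. The function name is hypothetical, and the sketch assumes small images are never upscaled and that the caller has already resolved `auto` to `low` or `high`.

```python
import math


def vision_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate the token cost of one image (hypothetical helper)."""
    if detail == "low":
        # Low detail is a flat cost regardless of image size.
        return 85
    if detail != "high":
        raise ValueError("detail must be 'low' or 'high'")

    # Step 1: fit the image inside a 2048 x 2048 square, keeping aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)

    # Step 2: shrink so the shortest side is at most 768 px.
    # Assumption: images already smaller than that are left as they are.
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)

    # Step 3: count the 512 px tiles needed to cover the scaled image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


square = vision_tokens(1024, 1024)          # 768 x 768 after scaling, 4 tiles
tall = vision_tokens(2048, 4096)            # 768 x 1536 after scaling, 6 tiles
thumbnail = vision_tokens(4096, 8192, "low")
```

Comparing an estimate like this against the usage figures the API returns is what makes a pre-request budget trustworthy.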
browser_automation_agent
Multi-agent system that fills LinkedIn + external job applications using LLM vision for navigation and semantic field matching.
8-state navigation FSM, two-stage field resolution combining fuzzy semantic matching with LLM content generation, multimodal screenshot analysis via Llama-4-Maverick. Tracks per-step success/failure to prevent navigation loops.
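A loop guard of this kind can be sketched in a few lines. The class name, the state and action labels, and the failure threshold below are hypothetical and chosen only for illustration.

```python
from collections import Counter


class LoopGuard:
    """Track consecutive failures per (state, action) pair (hypothetical helper)."""

    def __init__(self, max_failures: int = 2) -> None:
        self.max_failures = max_failures
        self.failures: Counter = Counter()

    def record(self, state: str, action: str, ok: bool) -> None:
        key = (state, action)
        if ok:
            # A success clears the failure streak for this step.
            self.failures.pop(key, None)
        else:
            self.failures[key] += 1

    def allowed(self, state: str, action: str) -> bool:
        # Refuse a step that has already failed too often in this state.
        return self.failures[(state, action)] < self.max_failures


guard = LoopGuard(max_failures=2)
guard.record("FORM_PAGE", "click_next", ok=False)
guard.record("FORM_PAGE", "click_next", ok=False)
retry_same = guard.allowed("FORM_PAGE", "click_next")
try_other = guard.allowed("FORM_PAGE", "scroll_down")
```

When `allowed` returns `False`, the caller can pick a different action or hand off to a fallback state instead of repeating the failing step.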
job_agent
Dual-LLM (Claude + Gemini) CV customization pipeline with a 93% reduction in generation time (30 min → <2 min).
Status-driven SQLite state machine coordinating filtering + customization agents. Five-stage prompt chain. Parallel Claude + Gemini execution via asyncio.gather. Two-phase constraint satisfaction (MODERATE → TIGHT → MINIMAL) with PyPDF2 page validation. Full LaTeX compilation pipeline.
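The parallel step can be sketched with `asyncio.gather`. The `customize_with` coroutine below is a hypothetical stand-in for the real Claude and Gemini API calls, and the returned dictionary shape is an assumption.

```python
import asyncio


async def customize_with(model: str, job_description: str) -> dict:
    """Hypothetical stand-in for one model's customization call."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {"model": model, "draft": f"[{model}] CV tailored to: {job_description}"}


async def customize_cv(job_description: str) -> list[dict]:
    # Both calls run concurrently, so total latency is roughly the slower
    # of the two rather than their sum. Results keep the argument order.
    results = await asyncio.gather(
        customize_with("claude", job_description),
        customize_with("gemini", job_description),
        return_exceptions=True,
    )
    # If one provider fails, keep the other draft instead of failing the job.
    return [r for r in results if not isinstance(r, Exception)]


drafts = asyncio.run(customize_cv("Senior ML Engineer"))
```

Passing `return_exceptions=True` is a design choice in this sketch: a single provider outage degrades the result to one draft rather than aborting the run.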
startup_research_agent
Parallel domain researchers coordinated by a central orchestrator for automated due diligence.
Multi-model strategy (Perplexity Sonar + Gemini) routed via Ollama-compatible gateway. Entity disambiguation before expensive research, iterative knowledge gap identification, depth-stratified modes (basic/standard/comprehensive) with parallel domain agents.
beeai_job_match
Framework-bridging adapters and lifecycle plumbing for BeeAI Framework alpha (Feb 2025) — the layer that makes a pre-1.0 agent framework usable.
Custom OllamaChatModel adapter bridging the Ollama SDK to BeeAI's ChatModel interface (sync + async generation). Full MCP client lifecycle via AsyncExitStack. run_async() bridge for calling async agents from Streamlit's sync context. The job-search use case sits on top of all this — but the adapter layer is the actual contribution.
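A sync-to-async bridge along these lines might look like the sketch below. It only borrows the `run_async` name. The body is an assumption about one reasonable approach (a worker thread with its own event loop when a loop is already running), not the project's implementation.

```python
import asyncio
import threading
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_async(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop in this thread, so asyncio.run is safe to use.
        return asyncio.run(coro)

    # A loop is already running in this thread. Run the coroutine on a
    # separate thread with its own loop and block until it finishes.
    outcome: dict = {}

    def worker() -> None:
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the calling thread
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def fake_agent(prompt: str) -> str:
    """Hypothetical async agent call."""
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"


reply = run_async(fake_agent("find matching jobs"))
```

The thread fallback blocks the caller until the coroutine finishes, which suits a synchronous script context but would stall an event loop if called from inside one for long-running work.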
linkedin_mcp_server
Published MCP server exposing LinkedIn job search with dual stdio + SSE transport.
search_jobs + get_job_details with comprehensive filtering. Custom serializer producing safe LLM-friendly structured output. Built on FastMCP + python-jobspy, Docker-containerized, supports both CLI and HTTP network access.
task_automation_system
Hybrid RAG over personal data → prioritized task lists, fine-tuned T5 for extraction.
Weaviate hybrid search (BM25 + dense) with configurable alpha, query expansion via synonym generation, post-retrieval reranking with time-decay × priority-boost. T5 fine-tuned on synthetic task scenarios, 4-bit quantized, deployed on Cloud Run with GPU.
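The reranking step can be sketched as a multiplicative score adjustment. The `Hit` fields, the half-life, and the boost size below are hypothetical values for illustration, and the exponential decay is one common choice rather than the confirmed formula.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One retrieved item (hypothetical shape)."""
    text: str
    score: float     # relevance score returned by the hybrid search
    age_days: float  # how old the underlying item is
    priority: int    # 0 (none) to 3 (urgent), an assumed scale


def rerank(hits: list[Hit], half_life_days: float = 7.0,
           boost_per_level: float = 0.25) -> list[Hit]:
    """Order hits by score x time decay x priority boost."""

    def adjusted(hit: Hit) -> float:
        # The decay factor halves every `half_life_days`.
        decay = 0.5 ** (hit.age_days / half_life_days)
        # Each priority level adds a fixed fraction to the score.
        boost = 1.0 + boost_per_level * hit.priority
        return hit.score * decay * boost

    return sorted(hits, key=adjusted, reverse=True)


hits = [
    Hit("Renew passport (note from last month)", score=0.9, age_days=28, priority=0),
    Hit("Reply to landlord today", score=0.6, age_days=1, priority=3),
]
ranked = rerank(hits)
```

In this example the fresher, higher-priority item ranks first even though its raw retrieval score is lower.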
The hardest part of agent work isn't the code — it's deciding what to build and knowing when it's good enough to trust. I care about both: problem selection and system reliability. That means state machines instead of vibes, eval frameworks instead of "looks good to me."
Right now that focus is agent observability and governance for crypto-native teams — the earliest adopters putting autonomous agents in front of real money. It's the intersection of the two things I've spent my career on: agent reliability and onchain systems.
Previously, as a Senior AI Consultant at QESTIT, I built the orchestration and evaluation tooling behind a team of in-house agents and coached engineers on adopting LLM workflows.
Before that I was a blockchain engineer. At Civic Technologies, I shipped the first production app built on Internet Identity's then-undocumented blockchain identity stack: a secure, private credential-sharing system for identity verification (Rust, TypeScript, and Node). As a Polkadot DevRel at Parity Technologies, I was selected for the Polkadot Academy with Gavin Wood (co-founder of Ethereum). At Sunrise Stake, I wrote high-performance smart contracts for cross-chain transactions.
First overall prize at ETHGlobal Paris 2023 (1,000 hackers); first prize at ETHGlobal New York 2023, ETHPrague 2023, ETHGlobal Istanbul 2023 (×2), and NFTBerlin 2022 (×3). I completed my BSc in Computer Science in October 2023.