lilly guo
AI engineer & founder
/ index

I scope, design, and ship multi-agent systems that solve real problems.

Most agent projects fail before the code: wrong problem, wrong scope, wrong trust boundaries. I start with the workflow, figure out what's actually worth automating, then build and ship the system end-to-end. A few shipped systems are listed below. Currently co-founding AI Workshop Berlin.

open to: contracts, co-founders, conversations
/ live agent demo

A working agent I built to scope my own inbound engagements. Paste a project description or describe what you'd want an agent for — it maps your need against my shipped work, surfaces honest overlap and gaps, and drafts a 2-week PoC plan if it's a real fit. Or tells you it isn't.

/ selected work

agent_workspace · QESTIT

Internal agent IDE for authoring and executing custom agent skills.

Skill system with runtime discovery, SSE streaming for long-running agent operations with mid-session question/permission handling, context integration (Jira, Confluence). Three-layer evaluation framework: execution trace parser with subagent timing and parallelism metrics; deterministic quality calculator across coverage, traceability, syntax validity, reusability (30-case test suite); 5 parallel LLM judges with externalized rubric prompts and contrastive integration tests.

Tauri · React · TypeScript · SSE · LLM-as-judge · OpenCode

browser_automation_agent

Multi-agent system that fills LinkedIn + external job applications using LLM vision for navigation and semantic field matching.

8-state navigation FSM, two-stage field resolution combining fuzzy semantic matching with LLM content generation, multimodal screenshot analysis via Llama-4-Maverick. Tracks per-step success/failure to prevent navigation loops.

Playwright · Ollama · Gemini · Llama 4 · Python
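
The per-step success/failure tracking mentioned above could look roughly like the sketch below. This is an illustration only, not the project's actual code: the `LoopGuard` class, the `(state, action)` key, and the `max_failures` threshold are all assumptions.

```python
from collections import Counter


class LoopGuard:
    """Hypothetical helper: counts consecutive failures per (state, action) step."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: Counter = Counter()

    def record(self, state: str, action: str, ok: bool) -> None:
        key = (state, action)
        if ok:
            # A success clears the failure streak for this step.
            self.failures.pop(key, None)
        else:
            self.failures[key] += 1

    def is_stuck(self, state: str, action: str) -> bool:
        # True once the same step has failed max_failures times in a row,
        # which is the signal to try a different transition or bail out.
        return self.failures[(state, action)] >= self.max_failures
```

An FSM driver would call `record` after every step and consult `is_stuck` before retrying the same transition.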

job_agent

Dual-LLM (Claude + Gemini) CV customization pipeline with 93% reduction in generation time (30 min → <2 min).

Status-driven SQLite state machine coordinating filtering + customization agents. Five-stage prompt chain. Parallel Claude + Gemini execution via asyncio.gather. Two-phase constraint satisfaction (MODERATE → TIGHT → MINIMAL) with PyPDF2 page validation. Full LaTeX compilation pipeline.

Claude · Gemini · Python · SQLite · LangSmith
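
A minimal sketch of the parallel two-model call via `asyncio.gather`. The `call_claude` and `call_gemini` coroutines are hypothetical stand-ins that only simulate latency; the real pipeline would call the vendor SDKs, which are not shown here.

```python
import asyncio
from typing import Dict, Optional


async def call_claude(prompt: str) -> str:
    # Hypothetical stand-in for a real Claude API call.
    await asyncio.sleep(0.01)
    return f"claude:{prompt}"


async def call_gemini(prompt: str) -> str:
    # Hypothetical stand-in for a real Gemini API call.
    await asyncio.sleep(0.01)
    return f"gemini:{prompt}"


async def generate_both(prompt: str) -> Dict[str, Optional[str]]:
    # Run both requests concurrently; return_exceptions=True keeps one
    # model's failure from cancelling the other model's result.
    results = await asyncio.gather(
        call_claude(prompt),
        call_gemini(prompt),
        return_exceptions=True,
    )
    names = ("claude", "gemini")
    return {
        name: (None if isinstance(res, BaseException) else res)
        for name, res in zip(names, results)
    }
```

With both calls in flight at once, total latency is roughly that of the slower model rather than the sum of the two.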

startup_research_agent

Parallel domain researchers coordinated by central orchestrator for automated due diligence.

Multi-model strategy (Perplexity Sonar + Gemini) routed via Ollama-compatible gateway. Entity disambiguation before expensive research, iterative knowledge gap identification, depth-stratified modes (basic/standard/comprehensive) with parallel domain agents.

Perplexity · Gemini · FastAPI · Streamlit · Python

beeai_job_match

Framework-bridging adapters and lifecycle plumbing for BeeAI Framework alpha (Feb 2025) — the layer that makes a pre-1.0 agent framework usable.

Custom OllamaChatModel adapter bridging the Ollama SDK to BeeAI's ChatModel interface (sync + async generation). Full MCP client lifecycle via AsyncExitStack. run_async() bridge for calling async agents from Streamlit's sync context. The job-search use case sits on top of all this — but the adapter layer is the actual contribution.

BeeAI · MCP · Ollama · Streamlit · Python
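
One common way to build a sync-to-async bridge like the `run_async()` mentioned above is sketched here. Only the name comes from the description; the body is an assumption, not the project's implementation. It uses `asyncio.run` directly when no event loop is running and falls back to a worker thread when one is.

```python
import asyncio
import threading
from typing import Any, Coroutine, Dict, TypeVar

T = TypeVar("T")


def run_async(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code (illustrative sketch)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)

    # A loop is already running here; asyncio.run would raise, so run the
    # coroutine on a fresh loop in a worker thread and wait for it.
    box: Dict[str, Any] = {}

    def worker() -> None:
        try:
            box["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the calling thread below
            box["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in box:
        raise box["error"]
    return box["value"]
```

Note that the fallback path blocks the calling thread until the coroutine finishes, which is the intended behaviour for a synchronous caller such as a Streamlit script.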

linkedin_mcp_server

Published MCP server exposing LinkedIn job search with dual stdio + SSE transport.

search_jobs + get_job_details with comprehensive filtering. Custom serializer producing safe LLM-friendly structured output. Built on FastMCP + python-jobspy, Docker-containerized, supports both CLI and HTTP network access.

MCP · FastMCP · Docker · Python

task_automation_system

Hybrid RAG over personal data → prioritized task lists, fine-tuned T5 for extraction.

Weaviate hybrid search (BM25 + dense) with configurable alpha, query expansion via synonym generation, post-retrieval reranking with time-decay × priority-boost. T5 fine-tuned on synthetic task scenarios, 4-bit quantized, deployed on Cloud Run with GPU.

Weaviate · HF Transformers · T5 · Cloud Run · Python
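
The time-decay × priority-boost reranking step could be sketched as follows. The `Hit` fields, the exponential half-life decay, and the linear boost are assumptions chosen for illustration; the actual formula and parameters are not given in the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    text: str
    score: float      # hybrid (BM25 + dense) retrieval score
    age_days: float   # how old the underlying item is
    priority: int     # hypothetical scale: 0 = normal, higher = more urgent


def rerank(
    hits: List[Hit],
    half_life_days: float = 7.0,   # assumed decay half-life
    boost_per_level: float = 0.25, # assumed boost per priority level
) -> List[Hit]:
    def final_score(hit: Hit) -> float:
        # Score halves every half_life_days.
        decay = 0.5 ** (hit.age_days / half_life_days)
        # Each priority level adds a fixed fractional boost.
        boost = 1.0 + boost_per_level * hit.priority
        return hit.score * decay * boost

    return sorted(hits, key=final_score, reverse=True)
```

Under these assumptions, a fresh item with a retrieval score of 0.6 outranks a two-week-old item with a score of 0.9, since the older score decays to 0.225.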
/ background

The hardest part of agent work isn't the code — it's deciding what to build and knowing when it's good enough to trust. I care about both: problem selection and system reliability. That means state machines instead of vibes, eval frameworks instead of "looks good to me."

Previously a Senior AI Consultant at QESTIT, where I built agent orchestration tooling for software testing and coached engineers on adopting LLM workflows.

Before that I was a blockchain engineer. At Civic Technologies, I shipped the first production app built on Internet Identity's then-undocumented blockchain identity stack — a secure, private credential-sharing system for identity verification (Rust, TypeScript and Node). As a Polkadot DevRel at Parity Technologies I was selected for the Polkadot Academy with Gavin Wood (co-founder of Ethereum). At Sunrise Stake I wrote high-performance smart contracts for cross-chain transactions.

First overall prize at ETHGlobal Paris 23 (1000 hackers); first prizes at ETHGlobal New York 23, ETHPrague 23, ETHGlobal Istanbul 23 (×2), and NFTBerlin 22 (×3). Graduated with a BSc in Computer Science in October 2023.