BLOG · AEGIS
Thinking about agents in production.
Engineering deep dives, research findings, and product thinking from the team building agent reliability infrastructure.
2 Articles · 4 Authors · Weekly Cadence
All · Engineering · Research · Infrastructure · Product · Security
Featured
Engineering · 2025-03-18 · 8 min read
Why agent evals that pass in development fail in production
Most eval failures are not about the model. They are about the environment: network mocking, tool state, and context bleed between runs. Here is how to fix it.
Priya Sundaram
Co-founder & CTO