What does 10x engineering productivity look like in practice, using agentic engineering?

For months, my gut feel has been that agentic engineering makes our team operate at 10x productivity compared to manual coding. I gathered data from the last 30 days, and that estimate turned out to be about right, with higher-than-average code quality as well.

Disclaimer: I know that most teams do not experience this level of improvement. Everyone has a different context, etc. And yes, engineering productivity isn't really possible to measure in an exact way. But I want to share my data, and hopefully provide some inspiration, to show what is actually possible.

Quick context: The codebase is the Abundly agent platform, which has been live for 2+ years now, hosting hundreds of AI agents doing real work for our customers. Agentic engineering in our case means using AI agents as dev team members. We use Cursor agents for raw coding and Abundly agents for release management, backlog management, end-to-end product design/development/orchestration, and other things upstream and downstream of the actual coding. Underlying LLM: Claude Opus in most cases. We have put a lot of effort into optimizing our codebase and architecture to work well for agents.

Key data for these 30 days, for a 5-person engineering team:

Codebase size was 260k lines. We've modified about 90k lines and added about 90k new lines (Git reports ~95k removals and ~183k additions, since a modified line is a removal + addition). So it's a living codebase, not just adding new stuff on top all the time.
Shipped 60 new customer-facing product features, 52 improvements, 49 bug fixes (see the changelog).
306 PRs merged. 83 of them made by Grace (our end-to-end engineering agent running on the Abundly platform itself). Most of Grace's PRs were triggered by non-engineers. The rest were made by human devs using Cursor. Here’s a demo video showing Grace and the other agents in action if you are curious.
Less than 1% of the code was written by hand; over 99% was AI-generated.
The 99% AI-written code breaks down as follows (as a share of all code): 30% was carefully reviewed, 45% was reviewed at a high level (quickly skimming the diff), and 24% was not reviewed at all. Each engineer decides what is worth reviewing in a PR, and this is where we ended up (rough numbers, self-reported estimates).
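
For anyone who wants to reproduce the line counts on their own repo, here is a minimal sketch. It assumes the totals come from something like `git log --since="30 days ago" --numstat --pretty=format:` and that every removed line is one half of a modified line, which is a rough upper bound rather than an exact measure. The function names are made up for illustration.

```python
def parse_numstat(text):
    """Sum added and removed lines from `git log --numstat` output."""
    added = removed = 0
    for line in text.splitlines():
        parts = line.split("\t")
        # Each numstat row is "<added>\t<removed>\t<path>".
        # Blank lines and binary files (shown as "-") are skipped.
        if len(parts) != 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        added += int(parts[0])
        removed += int(parts[1])
    return added, removed


def split_churn(added, removed):
    """Rough split, assuming each removed line pairs with one added line."""
    modified = min(added, removed)
    new = added - modified
    return modified, new


# Rounded totals quoted in this article, not raw Git output.
modified, new = split_churn(added=183_000, removed=95_000)
print(f"modified ~{modified:,}, new ~{new:,}")
```

With the rounded numbers above this gives roughly 95k modified and 88k new lines, which is where the "about 90k" of each comes from.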

Fun fact: with Grace available on Slack, almost everyone at the company is directly involved in engineering, bantering with her, asking her to fix stuff, discussing what is possible/hard/easy, having conversations about design tradeoffs. Once a solution is agreed upon, implementation of most features takes a few minutes and is shipped within a day. However, human engineers do all the PR merging (at least for now).

I was curious about quality, so I ran some analytics and compared the results to industry benchmarks. I didn't know what it would show, and was positively surprised!

Code duplication: 5.3%. Benchmark <5%. Result: Average.
Complexity: 3.2. Benchmark <10. Result: Great!
Functions >10 in complexity: 5.7%. Benchmark <10%. Result: Good.
Maintainability: 30. Benchmark: >=20 good, 10-19 moderate, <10 low. Result: Good.

So overall pretty far above average quality.
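
The comparison above can be written down as a small check. This is only a sketch: the values and thresholds are the ones quoted in this article, the dictionary keys are made-up names, and the pass/fail rule is a simplification of the "Average/Good/Great" labels.

```python
# (measured value, benchmark limit) for metrics where lower is better.
LOWER_IS_BETTER = {
    "duplication_pct": (5.3, 5.0),
    "avg_complexity": (3.2, 10.0),
    "pct_functions_over_10": (5.7, 10.0),
}
# Maintainability is higher-is-better: >=20 good, 10-19 moderate, <10 low.
MAINTAINABILITY = 30


def maintainability_band(score):
    """Map a maintainability score to the bands quoted in the article."""
    if score >= 20:
        return "good"
    if score >= 10:
        return "moderate"
    return "low"


# True means the measured value is below the benchmark limit.
report = {name: value < limit for name, (value, limit) in LOWER_IS_BETTER.items()}
report["maintainability"] = maintainability_band(MAINTAINABILITY)
print(report)
```

Duplication is the one metric that lands slightly above its benchmark, which is why it is labeled "Average" rather than "Good".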

For comparison, I asked Claude to estimate how much time and how large a team would typically be required to ship this amount of stuff. It estimated about 9 months for 6 engineers, or 14 months for 4 engineers. So about 54-56 person-months. I would estimate about the same, from personal experience.

So 5 people x 1 month is an 11x improvement!
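
The arithmetic behind that number, as a quick sketch. The inputs are the estimates quoted above; the function name is made up for illustration.

```python
def multiplier(engineers, months, actual_person_months):
    """Estimated conventional effort divided by the effort actually spent."""
    return (engineers * months) / actual_person_months


ACTUAL = 5 * 1  # 5 people x 1 month

# (engineers, months) for the two conventional-team estimates.
for engineers, months in [(6, 9), (4, 14)]:
    effort = engineers * months
    ratio = multiplier(engineers, months, ACTUAL)
    print(f"{engineers} engineers x {months} months = {effort} person-months, {ratio:.1f}x")
```

That gives 10.8x and 11.2x for the two estimates, so "11x" is the rounded midpoint. It is only as solid as the underlying effort estimate, of course.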

What about the human side? How does it feel to work this way, as an engineer?

I don't have concrete survey data there, just anecdotal evidence from hanging out with my dev team on a day-to-day basis. This is probably the happiest bunch I've ever worked with. The general feeling is that we have all gained superpowers. We've entirely stopped writing code by hand, yes, but instead we spend most of our time on architecture, product design, UX, user research, making decisions, orchestrating the agents, tuning the Cursor rules, getting involved with customers, learning new stuff, and exploring what is possible to do.

Turns out that in software engineering, the act of manually typing lines of code wasn't really the fun part in the first place. And the fear that we devs would become project managers and lose a sense of purpose hasn't materialized. We feel very much like engineers still, and we feel a strong sense of ownership of our codebase and architecture.

OK so after painting this rosy picture - what are the pain points & challenges?

Handling the rate of change. As we add more and more agents to handle different parts of our process, the constraint/bottleneck moves. We need to keep up with that.
Not working overtime. I struggle a bit with this. Development is more fun than ever, which means it is harder to shut down.
Decision making is the main bottleneck now. So we need to optimize how and when we make decisions.
Testing and PR review are starting to become the next bottleneck. We already use AI help with that, and we only review where necessary. But we will need to improve things like agent-driven browser testing and regression testing in general.
When "everyone" is an engineer, talking to Grace, we need to provide guidelines and constraints so we don't go overboard with new stuff in the platform.

So what's the key takeaway?

I think most dev teams greatly underestimate what is possible with agentic engineering. I hope this article provides some inspiration. And if you are worried that this will make engineering more boring or reduce quality, here's at least one datapoint showing the opposite.

Fun fact: Grace did most of the analytics and generated the image above.
