Essay / Note
The next software management job is deciding what gets automated
As coding agents move from assistance toward automation, the practical management question is no longer just whether developers use AI. It is which classes of work should be automated, supervised, staged, or kept human-led.
A useful data point slipped past a lot of the usual model chatter this week.
Anthropic’s latest Economic Index report on software development found that 79% of Claude Code conversations were classified as automation rather than augmentation. In plain English: a lot of people are no longer just asking AI to help them think. They are asking it to do chunks of the work.
That matters more than another benchmark screenshot.
Once coding agents are used mainly for automation, the management question changes.
It is no longer just:
Should our developers use AI?
It becomes:
Which kinds of software work should be automated, which should be supervised closely, and which should stay firmly human-led?
I think a lot of teams are still underreacting to that shift.
What changed
Two things now seem clearer.
First, coding agents are moving from assistive surfaces toward delegated execution.
Anthropic’s data is one signal. The broader product market is another. The tools being packaged around agents now assume longer-running tasks, tool use, repository context, environment access, and multi-step execution rather than just code suggestions in a text box.
Second, the adoption pattern still looks uneven.
The same report suggests startups are adopting Claude Code earlier and more aggressively than enterprises. That tracks with what you would expect. Smaller teams can tolerate messier workflows, looser controls, and faster experimentation. Larger organizations usually cannot.
That gap is important.
It does not mean enterprises are behind out of incompetence. More often it means the bottleneck is no longer raw model access. It is operational design:
- what the agent is allowed to touch
- which tasks it can complete end to end
- where review belongs
- what counts as safe enough to automate
- who owns the failure when the automation is wrong
That is a management design problem, not just a tooling problem.
What people are overreacting to
I think people are still overreacting to the simple headline:
AI can now handle more of the software lifecycle.
Yes, it can. But that statement is too crude to be useful.
The mistake is jumping from “the agent is impressive” to “we should automate as much development work as possible.”
Software work is not one thing.
Some tasks are:
- bounded
- reversible
- easy to test
- easy to diff
- low-cost to redo
Those are strong candidates for automation.
Other tasks are:
- architecture-shaping
- dependency-sensitive
- cross-functional and politically sensitive
- security-relevant
- hard to evaluate until much later
Those are not the same kind of work, and they are much weaker candidates for unsupervised automation.
If a team treats both categories as equally automatable, it will confuse model capability with managerial judgment.
What people are underreacting to
The underreaction is that teams now need task classes for automation.
Not just AI policies. Not just approved tools. Not just “engineers may use copilots.”
Task classes.
In other words, a team should be able to say something like:
Class 1 — Safe to automate
Good candidates:
- test generation for well-understood modules
- routine refactors with strong diff visibility
- documentation drafts
- boilerplate UI work
- simple migration scripts with clear rollback
Class 2 — Automate, but with staged review
Good candidates:
- multi-file feature implementation
- bug fixing in familiar code paths
- data transformation jobs
- support tooling
- infra changes in non-production environments
Class 3 — Human-led, AI-assisted
Typical cases:
- architecture decisions
- permission model changes
- sensitive customer workflows
- production incident response
- changes with large blast radius or unclear evaluation
That structure matters because it shifts the conversation away from vague tool enthusiasm and toward governed delegation.
And that is where the real leverage is.
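The three classes above can be made concrete as a small lookup that a team keeps next to its code. This is a minimal sketch, not an established taxonomy: the `TaskClass` enum, the task-type labels, and the choice to treat unclassified work as human-led are all assumptions made for illustration.

```python
from enum import Enum


class TaskClass(Enum):
    SAFE_TO_AUTOMATE = 1  # Class 1
    STAGED_REVIEW = 2     # Class 2
    HUMAN_LED = 3         # Class 3


# Hypothetical task-type labels, loosely based on the examples listed above.
TASK_CLASSES = {
    "test_generation": TaskClass.SAFE_TO_AUTOMATE,
    "routine_refactor": TaskClass.SAFE_TO_AUTOMATE,
    "documentation_draft": TaskClass.SAFE_TO_AUTOMATE,
    "multi_file_feature": TaskClass.STAGED_REVIEW,
    "bug_fix_familiar_path": TaskClass.STAGED_REVIEW,
    "non_prod_infra_change": TaskClass.STAGED_REVIEW,
    "architecture_decision": TaskClass.HUMAN_LED,
    "permission_model_change": TaskClass.HUMAN_LED,
    "production_incident_response": TaskClass.HUMAN_LED,
}


def classify(task_type: str) -> TaskClass:
    """Return the class for a task type.

    Assumption: anything the team has not classified yet stays human-led.
    """
    return TASK_CLASSES.get(task_type, TaskClass.HUMAN_LED)


if __name__ == "__main__":
    for name in ("routine_refactor", "multi_file_feature", "unknown_task"):
        print(name, "->", classify(name).name)
```

Defaulting unknown work to the most restrictive class is one possible design choice. It means new task types have to be argued into automation rather than drifting into it.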
Why this matters for managers
If you manage software teams, I think one of the least useful questions you can ask now is:
Are we using coding agents enough?
That question invites status theater.
People can answer it with screenshots, anecdotes, and noisy productivity claims. But it does not tell you whether the team is automating the right things.
The more useful questions are:
- What work are we comfortable automating end to end?
- What work requires plan approval before execution?
- What work should stay human-led because evaluation happens too late or failure cost is too high?
- Where do we expect the agent to prepare work, not finish it?
- Which mistakes are cheap, and which ones create hidden cleanup?
Those questions are much more operationally honest.
They also reveal something important.
The next management job is not just tool adoption. It is automation boundary design.
That includes:
- task classification
- review placement
- repo and environment permissions
- rollback expectations
- escalation rules
- evaluation standards by work type
That is how teams escape the false choice between:
- fully manual software work
- blind faith in coding agents
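The boundary-design items listed above can be written down as one policy record per class of work. The sketch below is a rough illustration under stated assumptions: the `BoundaryPolicy` fields, the class names, the path prefixes, and the environment names are hypothetical and do not reflect any particular tool's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryPolicy:
    review: str                   # review placement
    writable_paths: tuple         # repo permissions, as path prefixes
    environments: tuple           # environment permissions
    requires_rollback_plan: bool  # rollback expectations
    escalate_to: str              # escalation rule
    evaluation: str               # evaluation standard for this work type


# Hypothetical example values; a real team would choose its own.
POLICIES = {
    "safe_to_automate": BoundaryPolicy(
        review="spot check after merge",
        writable_paths=("tests/", "docs/"),
        environments=("dev",),
        requires_rollback_plan=False,
        escalate_to="task owner",
        evaluation="automated tests pass",
    ),
    "staged_review": BoundaryPolicy(
        review="human review before merge",
        writable_paths=("src/", "tests/", "docs/"),
        environments=("dev", "staging"),
        requires_rollback_plan=True,
        escalate_to="team lead",
        evaluation="tests pass and reviewer signs off",
    ),
    "human_led": BoundaryPolicy(
        review="human authors the change",
        writable_paths=(),
        environments=(),
        requires_rollback_plan=True,
        escalate_to="engineering manager",
        evaluation="design review",
    ),
}


def may_write(policy: BoundaryPolicy, path: str, environment: str) -> bool:
    """True only if both the environment and the path prefix are permitted."""
    path_ok = any(path.startswith(prefix) for prefix in policy.writable_paths)
    return environment in policy.environments and path_ok


if __name__ == "__main__":
    policy = POLICIES["safe_to_automate"]
    print(may_write(policy, "tests/test_api.py", "dev"))
    print(may_write(policy, "src/app.py", "dev"))
```

Keeping review, permissions, rollback, escalation, and evaluation in one record makes it harder to loosen one of them without noticing the others.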
Why this matters for builders
If you are building coding-agent products, there is a real lesson here too.
A lot of the market still sells the dream of a general software worker. But real trust usually grows through narrower, better-defined operating envelopes.
The strongest products will not only say:
- here is a powerful coding agent
They will also help teams answer:
- what sort of work is this agent fit for?
- what approval shape should this task use?
- what context should be visible before execution?
- what is reversible?
- what is too risky to batch blindly?
In other words, the better product will often be the one that helps a team classify work, not just generate more of it.
A practical operating model
If I were managing a software team using coding agents right now, I would not start by saying:
Everyone should use the agent more.
I would start with a lightweight automation map.
For each recurring task type, define:
1. Task shape
Is it predictable, repetitive, and easy to inspect?
2. Blast radius
If the agent gets this wrong, what actually breaks?
3. Evaluation speed
Can we tell quickly whether the work is good, or will the mistake surface much later?
4. Required context
Does success require local code understanding only, or broader system and business judgment?
5. Approval mode
Should the agent:
- execute directly
- prepare and wait
- propose options only
- stay out of this class entirely
That gives you a much better operating model than generalized hype about AI coding productivity.
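The five questions above can be captured as one record per recurring task type, with the approval mode suggested from the other four answers. This is a minimal sketch under stated assumptions: the `TaskProfile` fields, the three blast-radius levels, and the decision rules in `suggest_mode` are illustrative choices, not a recommended standard.

```python
from dataclasses import dataclass
from enum import Enum


class ApprovalMode(Enum):
    EXECUTE_DIRECTLY = "execute directly"
    PREPARE_AND_WAIT = "prepare and wait"
    PROPOSE_OPTIONS = "propose options only"
    STAY_OUT = "stay out of this class entirely"


@dataclass(frozen=True)
class TaskProfile:
    name: str
    predictable: bool         # 1. task shape
    blast_radius: str         # 2. "low", "medium", or "high"
    fast_evaluation: bool     # 3. evaluation speed
    local_context_only: bool  # 4. required context

    def __post_init__(self):
        if self.blast_radius not in ("low", "medium", "high"):
            raise ValueError(f"unknown blast radius: {self.blast_radius!r}")


def suggest_mode(task: TaskProfile) -> ApprovalMode:
    """Suggest an approval mode (question 5). The rules are assumptions."""
    high = task.blast_radius == "high"
    if high and not task.fast_evaluation and not task.local_context_only:
        return ApprovalMode.STAY_OUT
    if high or not task.fast_evaluation:
        return ApprovalMode.PROPOSE_OPTIONS
    if task.predictable and task.blast_radius == "low" and task.local_context_only:
        return ApprovalMode.EXECUTE_DIRECTLY
    return ApprovalMode.PREPARE_AND_WAIT


if __name__ == "__main__":
    # Hypothetical entries in a team's automation map.
    automation_map = [
        TaskProfile("test generation", True, "low", True, True),
        TaskProfile("multi-file feature", False, "medium", True, True),
        TaskProfile("production schema migration", True, "high", True, True),
    ]
    for task in automation_map:
        print(f"{task.name}: {suggest_mode(task).value}")
```

The output of a rule like this is a starting point for discussion. A team would still override it where local knowledge says the suggested mode is too loose or too strict.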
The practical mistake to avoid
Adopting coding agents is not the mistake. The mistake is adopting them without a view of which work deserves which level of automation.
That is how teams end up with one of two bad outcomes.
Failure mode 1: underuse
The organization gets spooked and restricts AI to trivial autocomplete-style assistance, leaving a lot of useful automation on the table.
Failure mode 2: overdelegation
The organization lets agents roam across large, messy, weakly evaluated tasks, then discovers too late that the cleanup burden erased the speed gains.
The right answer is usually in the middle.
Not generic caution. Not blind acceleration.
Structured delegation.
Working thesis
My current view is this:
The next software management job is not deciding whether AI can code. It is deciding what kinds of software work should be automated, staged, supervised, or kept human-led.
Anthropic’s data matters because it suggests the transition is already underway.
Coding agents are no longer just another way to brainstorm. They are becoming a delegation layer.
And once that happens, the practical winners will not be the teams with the loudest AI enthusiasm. They will be the teams that get more precise about automation boundaries, review design, and task classes.
That is a much less glamorous story than “AI writes code now.”
It is also the more useful one.