The organisations that get automation right are typically not the ones with the most sophisticated requirements. They're the ones that start small, measure everything, and expand based on evidence.
A 30-day pilot is the standard we recommend for every first deployment. Here's what that looks like.
Why 30 days
Long enough to generate real signal. Short enough that if something isn't working, you haven't committed two quarters to finding out.
In a 30-day pilot, a digital worker will typically handle several hundred to several thousand tasks — enough volume to surface edge cases, identify gaps in the logic, and give you a confident read on the automation rate before you move to production.
Days 1–7: Definition
The first week is entirely planning. No code. No integrations.
We map the task in exhaustive detail:
- What triggers the task?
- What data sources does it need?
- What are all the decision branches?
- What does a successful output look like?
- What should the digital worker do when it encounters something it can't handle?
The last question is usually where the interesting conversation happens. Most teams haven't explicitly defined their escalation logic — they've left it to individual judgement. Making it explicit is valuable regardless of whether you automate.
By the end of week one, you have a complete decision map. This document becomes the spec for the digital worker, and later, the standard against which you measure it.
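To make the idea of a decision map concrete, here is a minimal sketch of how one might be written down as data. Everything in it is an illustrative assumption: the `Branch` and `DecisionMap` names, the refund example, and the 50-unit threshold are hypothetical, not part of any real spec. The point it shows is that the fallback (what to do when no branch applies) is stated explicitly rather than left to individual judgement.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action name for anything the worker can't handle.
ESCALATE = "escalate_to_human"


@dataclass
class Branch:
    name: str
    applies: Callable[[dict], bool]  # condition evaluated against the task
    action: str                      # what the worker does when it applies


@dataclass
class DecisionMap:
    trigger: str                     # what starts the task
    data_sources: list[str]          # systems the task reads from
    branches: list[Branch] = field(default_factory=list)

    def decide(self, task: dict) -> str:
        # The first matching branch wins; anything unmatched is escalated.
        for branch in self.branches:
            if branch.applies(task):
                return branch.action
        return ESCALATE


# Illustrative example only: a made-up refund workflow.
refund_map = DecisionMap(
    trigger="refund request received",
    data_sources=["help desk", "CRM"],
    branches=[
        Branch("duplicate", lambda t: t.get("duplicate", False), "close_as_duplicate"),
        Branch("small refund", lambda t: t.get("amount", float("inf")) <= 50, "approve_refund"),
    ],
)
```

Calling `refund_map.decide({"amount": 20})` returns `"approve_refund"`, while a task with no matching branch, such as `{"amount": 500}`, falls through to `ESCALATE`.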
Days 8–14: Build and integration
The digital worker is configured against the spec. Integrations to existing systems — CRM, help desk, ERP, whatever the task requires — are built and tested with synthetic data.
This is also when we configure the handoff logic. How does the digital worker pass context when it escalates? What information does the receiving human need to see immediately? What tone should the customer communication take?
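One way to picture the handoff is as a small, fixed record that travels with every escalated task. The sketch below is an assumption for illustration: the field names, the `build_handoff` helper, and the customer message wording are hypothetical, and a real deployment would shape them around its own help desk or CRM.

```python
from dataclasses import dataclass, asdict


@dataclass
class Handoff:
    task_id: str
    escalation_reason: str   # why the worker stopped
    summary: str             # what the receiving human should see first
    steps_completed: list    # work already done, so it isn't repeated
    customer_message: str    # what the customer is told in the meantime


def build_handoff(task: dict, reason: str, steps_completed: list) -> dict:
    """Package the context for an escalated task (hypothetical helper)."""
    handoff = Handoff(
        task_id=task["id"],
        escalation_reason=reason,
        summary=f"{task['type']} from {task['customer']}",
        steps_completed=list(steps_completed),
        customer_message=(
            "Thanks for your patience. "
            "A member of our team is reviewing your request."
        ),
    )
    return asdict(handoff)
```

Keeping the record flat and explicit makes it easy to check, during the build week, that every escalation path fills in every field.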
End of week two: the digital worker handles every case in the spec correctly on synthetic data.
Days 15–21: Shadow mode
The digital worker operates in parallel with the existing process. It processes every real task, but its outputs are reviewed by a human before any action is taken.
Shadow mode serves two purposes. First, it surfaces any issues with real-world data that synthetic testing didn't catch. Second, it builds team confidence — the people whose workflow this affects can see what the digital worker is doing before it does anything autonomously.
This is the phase most organisations want to skip. Don't. The feedback from shadow mode is invaluable, and the trust it builds with your team makes the production rollout substantially smoother.
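The feedback from shadow mode can be summarised very simply. The sketch below assumes each shadowed task is logged as a tuple of task ID, the digital worker's proposed output, and what the human actually did; that record shape and the `shadow_report` name are illustrative assumptions, not a description of any particular tooling.

```python
def shadow_report(records):
    """Compare proposed outputs with human decisions.

    records: iterable of (task_id, worker_output, human_output) tuples.
    Returns (agreement_rate, list of task IDs where the two differed).
    """
    records = list(records)
    mismatches = [task_id for task_id, worker, human in records if worker != human]
    total = len(records)
    agreement = (total - len(mismatches)) / total if total else 0.0
    return agreement, mismatches
```

The list of mismatched task IDs is usually more useful than the headline rate: each one is either a gap in the decision map or a case where the existing process was inconsistent.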
Days 22–28: Supervised production
The digital worker operates autonomously on tasks within a defined scope, with daily review of a sample of its outputs. The escalation rate, automation rate, and quality metrics are tracked.
Most pilots reach an automation rate of 70–82% in this phase. The remaining tasks go to humans with full context from the digital worker, which typically means they're resolved faster than if the digital worker hadn't been involved at all.
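The daily review described above needs a repeatable way to pick which outputs to look at. Here is a minimal sketch using simple random sampling; the 10% default fraction, the `daily_review_sample` name, and the optional seed are assumptions chosen for illustration, and the right sample size depends on your volume and risk tolerance.

```python
import random


def daily_review_sample(task_ids, fraction=0.1, seed=None):
    """Pick a random subset of the day's autonomous tasks for human review."""
    task_ids = list(task_ids)
    if not task_ids:
        return []
    # Always review at least one task, even on a quiet day.
    k = max(1, round(len(task_ids) * fraction))
    k = min(k, len(task_ids))
    # A fixed seed makes the selection reproducible for audit purposes.
    return random.Random(seed).sample(task_ids, k)
```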
Days 29–30: Decision point
At the end of day 28, you have a complete dataset:
- Volume handled autonomously vs. escalated
- Error rate on autonomous actions
- Time-to-resolution compared to pre-pilot baseline
- Team feedback
This data drives the production decision. In most pilots we've run, the numbers make the case clearly. Occasionally, they surface a need to refine the scope before expanding. Either outcome is valuable.
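As a rough sketch, the quantitative part of that dataset reduces to three ratios. The `PilotResults` class and its field names are hypothetical, and the figures in the usage note are made-up inputs to show the arithmetic, not results from a real pilot.

```python
from dataclasses import dataclass


@dataclass
class PilotResults:
    autonomous: int           # tasks completed without a human
    escalated: int            # tasks handed to a human
    autonomous_errors: int    # autonomous tasks later found to be wrong
    baseline_minutes: float   # average time-to-resolution before the pilot
    pilot_minutes: float      # average time-to-resolution during the pilot

    @property
    def automation_rate(self) -> float:
        total = self.autonomous + self.escalated
        return self.autonomous / total if total else 0.0

    @property
    def error_rate(self) -> float:
        return self.autonomous_errors / self.autonomous if self.autonomous else 0.0

    @property
    def resolution_change(self) -> float:
        # Negative means tasks were resolved faster than the baseline.
        if not self.baseline_minutes:
            return 0.0
        return (self.pilot_minutes - self.baseline_minutes) / self.baseline_minutes
```

For example, `PilotResults(760, 240, 19, 40.0, 30.0)` gives an automation rate of 76%, an error rate of 2.5% on autonomous actions, and a 25% reduction in time-to-resolution. Team feedback is qualitative and sits alongside these numbers rather than inside them.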
What a 30-day pilot costs
Less than you expect. The upfront investment is the definition phase — the time your team spends mapping the process in week one. That investment pays back regardless of what you decide at day 30, because a well-mapped process is easier to manage even without automation.
The technical build and integration are on us. The pilot ends with either a production deployment or a clear understanding of what needs to change before one.
One pilot becomes a programme
Every organisation that completes a successful pilot deploys a second digital worker within 90 days. Not because we push them to, but because the first one demonstrates what's possible clearly enough that the next candidates are obvious.
Start with one task. Prove the model. The rest follows.
