User Impact
Users were unable to release print jobs on the device. Jobs were visible, but releasing them timed out. Scanning was also affected: scanned documents did not reach the selected destination.
Scope of Impact
Europe (EU) Deployment
Incident Start Date and Time
August 12, 2024, 7:25 UTC
Incident End Date and Time
August 12, 2024, 8:50 UTC
Root Cause
uniFLOW Online worker roles reached an operational limit, which caused requests to queue while waiting for resources. The limit was the result of an architectural migration completed a week earlier.
Following Microsoft end-of-life (EOL) directives, we had scheduled maintenance to move to a new architecture as part of the standard Azure component lifecycle. The migration was completed and tested without issue, and it ran in production for the following week (starting August 5). However, the migration did not carry across a specific scaling limit that had previously been set, and our testing and validation did not identify the omission. The resulting value was below the operational limit that uniFLOW Online requires.
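A setting that fails to carry across a migration can, in principle, be caught by comparing the scaling configuration of the old and new environments before cutover. The following is a minimal sketch of such a comparison. The setting names and values are hypothetical and do not reflect the actual uniFLOW Online or Azure configuration.

```python
def find_config_drift(old, new):
    """Return settings whose values differ between two environments.

    Maps each differing key to an (old_value, new_value) tuple. A key that
    is absent on one side is reported with None for that side.
    """
    drift = {}
    for key in sorted(set(old) | set(new)):
        old_value = old.get(key)
        new_value = new.get(key)
        if old_value != new_value:
            drift[key] = (old_value, new_value)
    return drift


# Hypothetical values for illustration only.
old_settings = {"max_worker_instances": 40, "min_worker_instances": 4}
new_settings = {"max_worker_instances": 10, "min_worker_instances": 4}

for name, (before, after) in find_config_drift(old_settings, new_settings).items():
    print(f"{name}: {before} -> {after}")  # max_worker_instances: 40 -> 10
```

A check like this only reports differences. Deciding whether a changed value is still within the required operational range remains a manual review step.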
On August 12, load was very high as many people in Europe returned to work from summer holidays. The issue was not seen during the week of August 5 because load was much lower during the holiday period.
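The reason a reduced limit can go unnoticed at low load and then cause queuing at high load can be illustrated with a simple backlog calculation. This is a sketch with made-up numbers. The request rates and the capacity figure are assumptions chosen for illustration, not measurements from the incident.

```python
def simulate_backlog(arrivals_per_min, capacity_per_min):
    """Return the queue length at the end of each minute.

    Each minute, new requests join the queue and up to capacity_per_min
    requests are processed. Anything left over waits for the next minute.
    """
    backlog = 0
    history = []
    for arrivals in arrivals_per_min:
        backlog = max(0, backlog + arrivals - capacity_per_min)
        history.append(backlog)
    return history


# Hypothetical numbers: capacity capped at 100 requests per minute.
quiet_week = simulate_backlog([60, 70, 80, 70], capacity_per_min=100)
busy_week = simulate_backlog([120, 150, 160, 140], capacity_per_min=100)
print(quiet_week)  # [0, 0, 0, 0]
print(busy_week)   # [20, 70, 130, 170]
```

While arrivals stay below the cap, the backlog stays at zero and the cap is invisible. Once arrivals exceed it, the backlog grows every minute until capacity is raised.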
How did we respond
8:02 UTC: Support and Operations team members collaborated on validation and remediation actions.
8:23 UTC: The Operations team raised a public status page notice. Resources supporting the worker roles were over-provisioned to quickly work through the queued requests and return the system to normal operation.
8:50 UTC: On full service recovery, the operational limits were re-established.
Next Steps
We apologize for the impact on affected customers. We are continuously taking steps to improve the uniFLOW Online Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):
We have reviewed this incident across the global operations team and key management members.
Monitoring and alerting improvements have been identified and scheduled for implementation.
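As an illustration of the kind of alerting that could surface a growing request queue early, the sketch below raises an alert only when queue depth stays above a threshold for several consecutive samples, which avoids firing on brief spikes. The function name, threshold, and sample values are hypothetical and are not the monitoring rules that have been scheduled.

```python
def should_alert(queue_depths, threshold, sustained_samples):
    """Return True if the most recent sustained_samples readings all exceed threshold."""
    if sustained_samples <= 0 or len(queue_depths) < sustained_samples:
        return False
    recent = queue_depths[-sustained_samples:]
    return all(depth > threshold for depth in recent)


# Hypothetical readings, one per minute.
print(should_alert([5, 20, 70, 130, 170], threshold=50, sustained_samples=3))  # True
print(should_alert([5, 80, 10, 90], threshold=50, sustained_samples=3))        # False
```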