On the 8th of February the uniFLOW Online system was seriously impacted by a critical system failure within the Southeast Asia Microsoft Azure Data Centre.
During the affected period, users of uniFLOW Online will have experienced delays and timeouts in core functionality. Print and scanning services were impacted, resulting in print and scan functions failing or needing to be performed multiple times before the action would complete.
This incident was limited to our Singapore deployment.
uniFLOW Online incident end: Feb 8th, 2023 – 07:30 UTC
Microsoft incident end: Feb 9th, 2023 – 04:30 UTC
(This information is derived from the preliminary report for the Microsoft incident “Datacentre Cooling Event – Southeast Asia”, Tracking ID: VN11-JD8.)
NOTE: At the time of writing this PM (Post-Mortem), Microsoft has yet to release its own full incident report, and NT-ware recommends you follow the official Microsoft status updates for a full account of the incident.
While multiple Azure resources were impacted across the affected infrastructure, it was the degradation of the Azure Service Bus component, which NT-ware utilises, that affected uniFLOW Online. The Azure Service Bus is vital for the queueing and management of events within uniFLOW Online, such as a user requesting the release of a print job or submitting a mobile print job.
In the degraded state, the service was unable to handle the number of requests, which resulted in delays and timeouts. It is important to note that the service was NOT offline: many requests were still being processed, and if a user experienced a timeout, it was very likely that subsequent attempts would work. For example, if a print job was not released at the device, a second or third attempt may have succeeded. This was evident in our testing and could be seen in the Azure metrics.
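The retry behaviour described above can be illustrated with a minimal sketch. It is not taken from uniFLOW Online or from any Azure SDK: the names TransientTimeout, retry_with_backoff and make_flaky_release are hypothetical, and the degraded service is simulated by a function that times out a fixed number of times before succeeding.

```python
import random
import time


class TransientTimeout(Exception):
    """Hypothetical error raised when a request to the degraded service times out."""


def retry_with_backoff(action, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call action() until it succeeds or max_attempts is reached.

    Returns (result, attempts_used). Re-raises the last timeout if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action(), attempt
        except TransientTimeout:
            if attempt == max_attempts:
                raise
            # Exponential backoff plus jitter, so that clients do not all
            # retry at the same moment against an already overloaded service.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))


def make_flaky_release(failures_before_success):
    """Simulate a release request that times out a set number of times, then succeeds."""
    state = {"calls": 0}

    def release():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientTimeout("release request timed out")
        return "released"

    return release
```

For example, retry_with_backoff(make_flaky_release(2)) returns the result together with the number of attempts used, mirroring a print job that is released on the third try.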
uniFLOW Online Emergency Mode:
The uniFLOW Online Emergency Mode was designed to allow local printing when the desktop SmartClient is unable to contact the cloud service for job submission. As the SmartClient was connecting to unaffected Azure services, the fail-over condition was not met, and Emergency Mode was therefore not enabled automatically by the system.
Manually enforcing Emergency Mode was considered by the NT-ware Operations teams but not actioned. Based on the available metrics, it was determined that there were as many successful queued tasks getting through as there were errors. Enforcing Emergency Mode may well have further impacted tenants that were still working, depending on their configuration. The manual option was designed as a failsafe for cases where the automated detection mentioned above is not triggered but print submission by SmartClient is nevertheless unavailable due to a cloud service failure. In addition, to meet the required criteria, ALL printing would need to be affected, and during this incident this was not the case.
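The decision criteria above can be summarised in a small sketch. This is an illustration of the reasoning only, not NT-ware's actual detection logic: the ServiceSnapshot fields, the function names and the example numbers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ServiceSnapshot:
    """Hypothetical summary of the signals an operator might review."""
    smartclient_can_reach_cloud: bool   # can the SmartClient contact the cloud service?
    smartclient_submission_works: bool  # is print submission via SmartClient available?
    successful_tasks: int               # queued tasks that completed
    failed_tasks: int                   # queued tasks that errored or timed out


def automatic_failover(snapshot):
    """Automatic Emergency Mode: the SmartClient cannot contact the cloud service."""
    return not snapshot.smartclient_can_reach_cloud


def manual_enforcement_justified(snapshot):
    """Manual failsafe: automation did not trigger, submission is down,
    and ALL printing is affected (no tasks are getting through)."""
    all_printing_affected = (
        snapshot.successful_tasks == 0 and snapshot.failed_tasks > 0
    )
    return (
        not automatic_failover(snapshot)
        and not snapshot.smartclient_submission_works
        and all_printing_affected
    )


# Illustrative numbers only: roughly equal successes and errors, with the
# SmartClient still reaching unaffected services, as described for this incident.
incident = ServiceSnapshot(True, True, 500, 500)
print(automatic_failover(incident), manual_enforcement_justified(incident))
```

With a snapshot like the one described for this incident, neither condition holds, which matches the decision not to enforce Emergency Mode.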
Feb 7th, 2023 – 20:30 UTC
Feb 7th, 2023 – 21:03 UTC
Feb 8th, 2023 – 01:21 UTC
Feb 8th, 2023 – 02:00 UTC (Approx)
Feb 8th, 2023 – 05:30 UTC
Feb 8th, 2023 – 07:00 UTC
Feb 8th, 2023 – 12:00 UTC
We apologize for the impact to affected customers. We are continuously taking steps to improve the uniFLOW Online Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):