Notification of new incident. Intermittent Login Issue: EU
Incident Report for uniFLOW Online
Postmortem

User Impact  

Users could not log into their uniFLOW Online tenant. This affected both admin and user logins, blocking tenant administration and end user functionality such as account administration and print job upload.

 

Scope of Impact: 

EU Deployment  

Not all tenants were affected, and only a percentage of the users logging into the service were impacted.

 

Incident Start Date and Time  

September 9th, 5:22 AM UTC

Incident End Date and Time

September 9th, 10:50 AM UTC

 

Root Cause: 

The login session was being revoked prematurely. The login session cookie is derived from several information sources (tokens). One of these sources could change during a scaling event, when the underlying Azure hardware changed.

As a result, the cookie was no longer considered valid, which forced the logged-in session to be revoked and returned the user to the login screen.
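
The failure mode described above can be illustrated with a minimal sketch. It is not the uniFLOW Online implementation: the key derivation scheme, the function names, and the token values (including "host-A" and "host-B", which stand in for a source tied to the underlying hardware) are all assumptions made for illustration.

```python
import hashlib
import hmac


def derive_key(tokens):
    """Combine several token sources into one signing key (hypothetical scheme)."""
    return hashlib.sha256("|".join(tokens).encode()).digest()


def sign_cookie(session_id, key):
    """Return a cookie of the form '<session_id>.<hex MAC>'."""
    mac = hmac.new(key, session_id.encode(), hashlib.sha256).hexdigest()
    return f"{session_id}.{mac}"


def is_valid(cookie, key):
    """Recompute the MAC with the current key and compare in constant time."""
    session_id, _, mac = cookie.rpartition(".")
    expected = hmac.new(key, session_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)


# Assumed token sources; the third one depends on the host the service runs on.
key_before = derive_key(["tenant-secret", "app-secret", "host-A"])
cookie = sign_cookie("session-123", key_before)

# After a scaling event the host-dependent source changes, so the key changes too.
key_after = derive_key(["tenant-secret", "app-secret", "host-B"])

valid_before = is_valid(cookie, key_before)  # cookie accepted with the original key
valid_after = is_valid(cookie, key_after)    # same cookie rejected after the change
```

In this sketch the cookie itself never changes. Only the key used to check it does, which is why an already logged-in session would suddenly be treated as invalid.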

  

How did we respond: 

On detection of the event, the Operations team cycled the web services. This provided a consistent session token for cookie creation and allowed logins to succeed.

 

Next Steps:

We apologize for the impact on affected customers. We are continuously taking steps to improve the uniFLOW Online Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):  

  • Monitoring for this event has been put in place, providing early detection of this specific failing condition.

  • We will implement a cookie handling architecture that is independent of the Azure offering. This work has high priority and will be deployed once our review and Quality Assurance processes are complete.
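
As an illustration of the kind of early detection mentioned in the first item, the sketch below raises an alert when the share of rejected authentication cookies in a sliding time window exceeds a threshold. The class name, window length, threshold, and minimum sample count are hypothetical and do not describe the monitoring that was actually deployed.

```python
from collections import deque


class CookieFailureMonitor:
    """Track cookie validation outcomes in a sliding time window (illustrative only)."""

    def __init__(self, window_seconds=300.0, threshold=0.2, min_samples=20):
        self.window_seconds = window_seconds  # assumed window length
        self.threshold = threshold            # assumed alerting ratio
        self.min_samples = min_samples        # avoid alerting on too little data
        self.events = deque()                 # (timestamp, rejected) pairs

    def record(self, timestamp, rejected):
        """Store one validation result and drop results older than the window."""
        self.events.append((timestamp, rejected))
        while self.events and self.events[0][0] < timestamp - self.window_seconds:
            self.events.popleft()

    def should_alert(self):
        """Return True when the rejected share reaches the threshold."""
        if len(self.events) < self.min_samples:
            return False
        rejected = sum(1 for _, was_rejected in self.events if was_rejected)
        return rejected / len(self.events) >= self.threshold


# Example: one in three validations is rejected over 30 seconds.
monitor = CookieFailureMonitor()
for second in range(30):
    monitor.record(float(second), rejected=(second % 3 == 0))
alert = monitor.should_alert()
```

Because the issue was intermittent and affected only a percentage of logins, a ratio over a window is used here instead of a single-failure trigger.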

 

Was this incident related to previous incidents? 

Yes, this issue occurred on two other occasions, on the 25th and the 27th of September.

 

Customer Recommendations: 

  • There are no recommendations for this incident type.
Posted Oct 17, 2024 - 21:59 UTC

Resolved
Hello Everyone,

Update: Incident Resolved.

Date/Time: 10:50am UTC 9th September 2024

We have maintained a monitoring status for several hours to ensure we see no more indicators of this issue. Internal testing has seen a full recovery of the login service; we are also not seeing any negative field reports.

Preliminary Finding:
There was an issue validating the authentication cookies. The initial mitigation was to cycle the web roles, releasing the failed authentication sessions. The root cause of the failed cookie validation is still under investigation and will be detailed in the Post-Mortem once identified.

We are closing this incident without an identified Root Cause at this time. As a precautionary action, the operations team has testing and monitoring procedures in place for early identification. If the issue is identified again in the coming days, we will raise a new status page.

A postmortem will be published once we have concluded our investigation, and no later than 20 business days from now.

We are sorry for the inconvenience this has caused.

Kind Regards
Online Operations Team
Posted Sep 09, 2024 - 11:34 UTC
Monitoring
Hello Everyone,

The issue has been isolated to a set of Web Roles and cookie session handling.

Operations have refreshed these roles and are monitoring the situation closely.

We have positive reports from the field and from internal testing. While mitigation actions have been successful, the root cause is still under investigation.

Regards,
uniFLOW Online Operations
Posted Sep 09, 2024 - 08:20 UTC
Update
Hello Everyone,

The NT-ware Operations team is working on mitigation actions to address this issue. Current actions are producing a positive result, which we are monitoring closely as the service moves to full recovery.

The issue is intermittent. If you find you cannot log in, please wait 5 minutes and try again.

Next update will be in 30 minutes.

Regards
uniFLOW Online Operations
Posted Sep 09, 2024 - 07:13 UTC
Investigating
Incident details:

Investigating: Multiple reports confirm that users are having login issues.

Start Time:
Approximately 5:30am UTC

Incident Scope: 
EU

Description:
Users cannot log into their tenant.
Users who do log in are immediately logged out.

Next Update:
The next update will be in 30 minutes.
Posted Sep 09, 2024 - 06:43 UTC
This incident affected: EU Deployment (Identification).