In this impromptu episode, we look into the recent CrowdStrike update that led to a massive global IT outage, affecting millions of Windows devices. We also share our experiences and observations, including how the IT community came together to tackle the crisis, along with insights on resilience and the importance of robust business continuity procedures.
Episode Highlights:
In a world where cybersecurity threats are ever-evolving, staying ahead of the curve is crucial. However, even the most reliable security providers can face unexpected challenges. This was the case with CrowdStrike, one of the leading cybersecurity companies, whose faulty software update recently caused a significant outage that impacted millions of devices globally. This blog post explores what happened during the CrowdStrike outage, its implications, and the lessons learned for businesses and Managed Service Providers (MSPs).
On what seemed like a typical Friday afternoon, 19 July 2024, CrowdStrike customers in various parts of the world saw their Windows devices begin to crash. The root cause was identified as a faulty content update that CrowdStrike released for its Falcon sensor. Microsoft estimated that approximately 8.5 million Windows devices were affected, making it one of the largest IT outages in history.
The timing was particularly challenging for Australian organisations, which were hit just as they were winding down for the weekend. The faulty update caused affected Windows machines to crash with a blue screen error, with many stuck in a crash loop on restart, and users quickly realised that the problem stemmed from CrowdStrike’s security software.
The affected devices ranged from self-service checkouts in supermarkets to high-end servers, causing widespread disruption. The fix was manual: each device had to be booted individually into Safe Mode or the Windows Recovery Environment so that the faulty channel file, matching C-00000291*.sys in the C:\Windows\System32\drivers\CrowdStrike directory, could be deleted or renamed. This labour-intensive process meant that recovery was slow and cumbersome, particularly for organisations with a large number of affected devices.
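For readers who want the logic of that workaround spelled out, the following Python sketch finds and removes files matching the faulty channel-file pattern. It is illustrative only: the directory and the C-00000291*.sys pattern come from CrowdStrike's published guidance, while the function names and the dry-run switch are our own hypothetical additions. In practice, administrators typically ran the equivalent delete command by hand from the Safe Mode or recovery command prompt, where Python is usually not available.

```python
import fnmatch
import os
import sys

# Location and file pattern from CrowdStrike's published workaround.
CROWDSTRIKE_DIR = r"C:\Windows\System32\drivers\CrowdStrike"
FAULTY_PATTERN = "C-00000291*.sys"


def find_faulty_channel_files(directory):
    """Return sorted full paths of regular files in `directory` that match FAULTY_PATTERN."""
    if not os.path.isdir(directory):
        return []  # nothing to do if the CrowdStrike directory is absent
    return [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if fnmatch.fnmatch(name, FAULTY_PATTERN)
        and os.path.isfile(os.path.join(directory, name))
    ]


def remove_faulty_channel_files(directory, dry_run=True):
    """Delete matching files; with dry_run=True, only report what would be deleted."""
    targets = find_faulty_channel_files(directory)
    for path in targets:
        if dry_run:
            print(f"would delete: {path}")
        else:
            os.remove(path)
            print(f"deleted: {path}")
    return targets


if __name__ == "__main__":
    # Hypothetical CLI: pass --apply to delete; the default is a dry run.
    remove_faulty_channel_files(CROWDSTRIKE_DIR, dry_run="--apply" not in sys.argv)
```

Defaulting to a dry run is a deliberate choice: on a production system, it is safer to confirm exactly which files match before deleting anything.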
MSPs like First Focus played a critical role in managing the fallout from the CrowdStrike outage. While CrowdStrike worked on pulling back the problematic update, MSPs were on the front lines, helping their clients get their systems back online. For First Focus, the impact was somewhat mitigated because CrowdStrike was not their default recommended solution. However, they still had to assist the 10% of their clients who had opted for CrowdStrike’s services.
The situation highlighted the importance of having a responsive and proactive MSP. As the CEO of First Focus noted, the MSP community came together, offering mutual support and empathy during the crisis. This collaborative approach was essential in managing the situation and minimising the impact on clients.
Interestingly, the media initially linked the outage to Microsoft, largely because only Windows devices were affected, which caused some confusion. A separate, unrelated Azure outage around the same time further muddied the waters. However, it was clear to industry insiders that the problem lay solely with CrowdStrike. Despite this, Microsoft’s name was frequently mentioned in media reports, leading to some unfair criticism of the company.
This incident raises important questions about the relationship between security vendors and operating systems. It called into question how far Microsoft can rely on its partners’ testing and deployment processes, particularly because the faulty content was delivered as a data file read by CrowdStrike’s already-signed kernel driver rather than going through Microsoft’s driver certification. Going forward, Microsoft may need to rethink its approach to code-signing and software updates to prevent such issues from recurring.
The CrowdStrike outage serves as a stark reminder of the potential risks associated with security software updates. It underscores the need for robust testing and verification processes to prevent such incidents. For businesses and MSPs, there are several key takeaways:
For cybersecurity vendors like CrowdStrike, this incident highlights the critical importance of rigorous testing and quality control. The ability to push updates to millions of devices in real time is a powerful tool, but it also comes with significant responsibility. Vendors must ensure that their updates are thoroughly tested and that they have robust rollback mechanisms in place in case of issues.
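One common way to put the rollback principle into practice is a staged (ring-based) rollout, in which an update reaches a small group of devices first and only widens once those devices stay healthy. The Python sketch below is a simplified, hypothetical illustration, not CrowdStrike’s actual deployment pipeline: the ring names, the crash-rate threshold and the deploy, crash_rate and rollback callbacks are all assumptions. Its main value is limiting how many devices receive a bad update before it is caught; a device stuck in a crash loop may not be able to receive a remote rollback at all.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RolloutResult:
    deployed: List[str]              # rings that received the update, in order
    halted_at: Optional[str] = None  # ring whose health check failed, if any
    rolled_back: List[str] = field(default_factory=list)  # rings reverted, newest first


def staged_rollout(
    rings: List[str],
    deploy: Callable[[str], None],
    crash_rate: Callable[[str], float],
    rollback: Callable[[str], None],
    max_crash_rate: float = 0.001,  # hypothetical threshold
) -> RolloutResult:
    """Deploy ring by ring; stop and revert every deployed ring if a ring looks unhealthy."""
    deployed: List[str] = []
    for ring in rings:
        deploy(ring)
        deployed.append(ring)
        # Gate: only widen the rollout if this ring's observed crash rate is acceptable.
        if crash_rate(ring) > max_crash_rate:
            reverted = list(reversed(deployed))
            for r in reverted:
                rollback(r)
            return RolloutResult(deployed=deployed, halted_at=ring, rolled_back=reverted)
    return RolloutResult(deployed=deployed)


if __name__ == "__main__":
    # Hypothetical health data: the update misbehaves once it reaches the "early" ring.
    observed = {"canary": 0.0, "early": 0.25, "broad": 0.0}
    result = staged_rollout(
        ["canary", "early", "broad"],
        deploy=lambda r: print(f"deploy -> {r}"),
        crash_rate=lambda r: observed[r],
        rollback=lambda r: print(f"rollback -> {r}"),
    )
    print(result)
```

In this example the "broad" ring never receives the update. A real health gate would also wait for enough telemetry to accumulate before deciding whether to proceed.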
CrowdStrike’s response to the outage, including its efforts to assist affected clients, will be scrutinised in the coming months. The company will need to demonstrate that it has learned from this incident and has put measures in place to prevent a recurrence.
One of the more concerning aspects of the CrowdStrike outage was how quickly cyber attackers tried to exploit the situation. Within hours of the issue becoming public, malicious actors began sending out phishing emails purporting to offer fixes for the problem. These emails targeted end-users and businesses alike, seeking to capitalise on the confusion and urgency of the moment.
This highlights the importance of cybersecurity awareness and training for employees. Businesses must ensure that their staff are trained to recognise phishing attempts and understand the correct protocols for dealing with such incidents.
As we look to the future, incidents like the CrowdStrike outage are likely to happen again. The key for businesses and MSPs is to build resilience and ensure they are prepared to respond effectively. This involves:
The CrowdStrike outage of 2024 was a significant event that disrupted businesses around the world. However, it also provided valuable lessons in disaster recovery, vendor management, and the importance of collaboration. By taking these lessons to heart, businesses and MSPs can improve their resilience and be better prepared for future challenges.
As we continue to explore the complex landscape of cybersecurity, it is essential to remain vigilant, proactive, and prepared. The CrowdStrike incident may have been one of the largest outages in history, but with the right strategies and mindset, we can minimise the impact of future incidents and be better equipped to handle whatever comes next.