24 July 2024

The Great CrowdStrike Outage of 2024: What Happened & What’s Next? with Ross Sardi – Business Focus Ep 13

Business Focus Ep 13:

The Great CrowdStrike Outage of 2024: What Happened & What’s Next?

In this impromptu episode, we look into the recent CrowdStrike update that led to a massive global IT outage, affecting millions of Windows devices. We’ll also share our experiences and observations, including how the IT community came together to tackle this crisis and valuable insights on resilience and the importance of robust business continuity procedures.

Episode Highlights: 

  • CrowdStrike’s response and the manual remediation process
  • Insights into disaster recovery planning and business continuity
  • The immediate and long-term impacts on businesses and essential services

 

Lessons Learned from the 2024 CrowdStrike Outage

Cybersecurity threats keep evolving, and staying ahead of them is crucial. However, even the most reliable security providers can stumble. This was the case with CrowdStrike, one of the leading cybersecurity companies, whose faulty update recently caused an outage that impacted millions of devices globally. This blog post explores what happened during the CrowdStrike outage, its implications, and the lessons learned for businesses and Managed Service Providers (MSPs).

The Incident: What Happened?

On Friday, 19 July 2024, a seemingly typical afternoon, CrowdStrike users in various parts of the world began experiencing issues with their devices. The root cause was identified as a faulty content update released by CrowdStrike, which affected approximately 8.5 million Windows devices, making it one of the largest IT outages in history.

The timing was particularly challenging, as the issue began affecting devices late on Friday in Australia, catching many off guard just as they were winding down for the weekend. The faulty update caused Windows devices to crash and fail to restart normally, and users quickly realised that the problem stemmed from CrowdStrike’s security software.

The Immediate Impact

The affected devices ranged from self-service checkouts in supermarkets to high-end servers, causing widespread disruption. The fix was manual: each device had to be booted individually into Safe Mode or the Windows Recovery Environment so that a specific CrowdStrike file or directory could be deleted or renamed. Because this could not be done remotely on a machine that would not start, recovery was slow and labour-intensive, particularly for organisations with a large number of affected devices.
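To make the manual fix more concrete, here is a minimal Python sketch of the file-renaming step. It is an illustration only, not CrowdStrike's tooling. The file pattern and folder are taken from the workaround guidance published publicly at the time and should be checked against the vendor's advisory; the function names and the ".bak" suffix are our own assumptions. In practice the step was performed by hand from Safe Mode or the recovery environment, where Python is unlikely to be available.

```python
from pathlib import Path

# Assumption: pattern and folder as given in the public workaround guidance
# for this incident; confirm against the vendor's advisory before relying on them.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"
DEFAULT_DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")


def find_faulty_channel_files(driver_dir: Path = DEFAULT_DRIVER_DIR) -> list:
    """Return files in driver_dir whose names match the faulty pattern."""
    if not driver_dir.is_dir():
        return []
    return sorted(driver_dir.glob(CHANNEL_FILE_PATTERN))


def quarantine_files(files, dry_run: bool = True) -> list:
    """Rename each file by appending '.bak' (hypothetical suffix).

    With dry_run=True nothing is changed; the planned targets are returned.
    """
    targets = []
    for path in files:
        target = path.with_name(path.name + ".bak")
        if not dry_run:
            path.rename(target)
        targets.append(target)
    return targets
```

Calling quarantine_files(find_faulty_channel_files()) with the default dry_run=True only reports what would be renamed, which is a sensible default for anything that touches driver files.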

The Role of MSPs

MSPs like First Focus played a critical role in managing the fallout from the CrowdStrike outage. While CrowdStrike worked on pulling back the problematic update, MSPs were on the front lines, helping their clients get their systems back online. For First Focus, the impact was somewhat mitigated because CrowdStrike was not their default recommended solution. However, they still had to assist the 10% of their clients who had opted for CrowdStrike’s services.

The situation highlighted the importance of having a responsive and proactive MSP. As the CEO of First Focus noted, the MSP community came together, offering mutual support and empathy during the crisis. This collaborative approach was essential in managing the situation and minimising the impact on clients.

Microsoft’s Involvement and Media Perception

Interestingly, the media initially linked the outage to Microsoft, causing some confusion. There was a separate, unrelated Azure issue on the same day, which further muddied the waters. However, it was clear to industry insiders that the problem was solely with CrowdStrike. Despite this, Microsoft’s name was frequently mentioned in media reports, so it received some unfair blame.

This incident raises important questions about the relationship between security vendors and operating systems. Microsoft’s trust in the testing and deployment processes of its partners was brought into question. Going forward, there may be a need for Microsoft to rethink its approach to code-signing and software updates to ensure such issues do not recur.

Lessons Learned and Future Implications

The CrowdStrike outage serves as a stark reminder of the potential risks associated with security software updates. It underscores the need for robust testing and verification processes to prevent such incidents. For businesses and MSPs, there are several key takeaways:

  • Disaster Recovery Planning: The importance of having a comprehensive disaster recovery plan cannot be overstated. This includes ensuring that backup systems are in place and that recovery plans are regularly tested and updated.
  • Vendor Agnosticism: While having a single vendor for security solutions can simplify management, it also introduces a single point of failure. Businesses, particularly larger organisations, might consider diversifying their security vendors to mitigate risk.
  • Insurance and Liability: Businesses should review their insurance policies to ensure they cover business continuity and consequential losses. In the event of a significant outage, insurance can help cover the costs associated with downtime and recovery.
  • Collaboration and Community: The response from the MSP community during the CrowdStrike outage was a testament to the power of collaboration. Sharing knowledge and resources can significantly improve response times and outcomes during crises.
  • Proactive Communication: Transparent and proactive communication with clients is crucial during such incidents. Keeping clients informed about the issue, the steps being taken to resolve it, and the expected timelines can help manage expectations and maintain trust.

The Role of Cybersecurity Vendors

For cybersecurity vendors like CrowdStrike, this incident highlights the critical importance of rigorous testing and quality control. The ability to push updates to millions of devices in real time is a powerful tool, but it also comes with significant responsibility. Vendors must ensure that their updates are thoroughly tested and that they have robust rollback mechanisms in place in case of issues.
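One common way to limit the blast radius of an update is to release it in stages and halt if early recipients become unhealthy. The sketch below shows the general idea under stated assumptions; it is not a description of how CrowdStrike or any other vendor actually deploys updates. The ring names, the callback functions, and the failure threshold are all hypothetical, and it assumes devices remain reachable for rollback, which was not the case for machines that could no longer start.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RolloutResult:
    completed_rings: List[str] = field(default_factory=list)
    rolled_back: bool = False
    failed_ring: Optional[str] = None


def staged_rollout(
    rings: Dict[str, List[str]],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    max_failure_rate: float = 0.01,
) -> RolloutResult:
    """Update one ring at a time; stop and roll back if a ring is unhealthy."""
    result = RolloutResult()
    updated: List[str] = []
    for ring_name, devices in rings.items():
        failures = 0
        for device in devices:
            apply_update(device)
            updated.append(device)
            if not is_healthy(device):
                failures += 1
        if devices and failures / len(devices) > max_failure_rate:
            # Undo every device touched so far, most recent first.
            for device in reversed(updated):
                rollback(device)
            result.rolled_back = True
            result.failed_ring = ring_name
            return result
        result.completed_rings.append(ring_name)
    return result
```

The point of the design is that a small first ring absorbs a bad update before it reaches everyone else; the threshold and ring sizes would need to be tuned to the risk of each update type.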

CrowdStrike’s response to the outage, including its efforts to assist affected clients, will be scrutinised in the coming months. The company will need to demonstrate that it has learned from this incident and has put measures in place to prevent a recurrence.

The Opportunistic Nature of Cyber Attackers

One of the more concerning aspects of the CrowdStrike outage was how quickly cyber attackers tried to exploit the situation. Within hours of the issue becoming public, malicious actors began sending out phishing emails purporting to offer fixes for the problem. These emails targeted end-users and businesses alike, seeking to capitalise on the confusion and urgency of the moment.

This highlights the importance of cybersecurity awareness and training for employees. Businesses must ensure that their staff are trained to recognise phishing attempts and understand the correct protocols for dealing with such incidents.
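Training can be backed up by simple technical checks. As a rough illustration, the Python sketch below flags sender addresses whose domain mentions a brand name but is not on a trusted list. The function name, the trusted list, and the example addresses are assumptions made for this example; a real mail filter would rely on far more signals than the sender's domain.

```python
def is_suspicious_sender(
    address: str,
    brand: str = "crowdstrike",
    trusted_domains: frozenset = frozenset({"crowdstrike.com"}),
) -> bool:
    """Return True if the sender's domain imitates the brand but is not trusted."""
    domain = address.rsplit("@", 1)[-1].strip().lower()
    # Exact trusted domains and their subdomains are accepted.
    if domain in trusted_domains:
        return False
    if any(domain.endswith("." + trusted) for trusted in trusted_domains):
        return False
    # Ignore hyphens so that "crowd-strike" style lookalikes are also caught.
    return brand in domain.replace("-", "")
```

For example, is_suspicious_sender("fix@crowdstrike-hotfix.example") is flagged, while a message from the vendor's own domain is not. A check like this catches only the crudest lookalikes, so it complements staff training rather than replacing it.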

Moving Forward: Strengthening Cyber Resilience

As we look to the future, it is clear that incidents like the CrowdStrike outage will continue to occur. The key for businesses and MSPs is to build resilience and ensure they are prepared to respond effectively. This involves:

  • Regularly Updating and Testing Disaster Recovery Plans: Ensuring that disaster recovery plans are not just theoretical but are practical and regularly tested.
  • Implementing Robust Backup Solutions: Having reliable backup solutions that can be quickly accessed and deployed in the event of an outage.
  • Fostering a Culture of Cybersecurity: Building security awareness across the organisation, so that employees recognise potential threats and know how to respond.
  • Collaborating with Industry Peers: Leveraging the collective knowledge and experience of the industry to improve response times and outcomes during crises.

Conclusion

The CrowdStrike outage of 2024 was a significant event that disrupted businesses around the world. However, it also provided valuable lessons in disaster recovery, vendor management, and the importance of collaboration. By taking these lessons to heart, businesses and MSPs can improve their resilience and be better prepared for future challenges.

As we continue to explore the complex landscape of cybersecurity, it is essential to remain vigilant, proactive, and prepared. The CrowdStrike incident may have been one of the largest outages in history, but with the right strategies and mindset, we can limit the impact of similar incidents and be better equipped to handle whatever comes next.
