Uncle Marv discusses the recent CrowdStrike incident that caused widespread computer outages due to a faulty sensor configuration update. He emphasizes the importance of disaster recovery and business continuity planning, rapid response, and communication. Marv also shares his personal experience dealing with a problematic HP system and a sudden staff resignation. Additionally, he provides updates on his upcoming attendance at ASCII Edge and encourages listeners to support the podcast through Amazon affiliate links.
Uncle Marv kicks off the episode by acknowledging the sponsor, SuperOps, and dives into the major news of the CrowdStrike incident. A faulty sensor configuration update from CrowdStrike led to widespread computer outages affecting about 8.5 million Windows machines globally. Marv explains how he managed client concerns and highlights the critical need for robust disaster recovery strategies and business continuity planning.
He underscores the importance of having a comprehensive crisis response plan and testing it regularly to ensure effectiveness. Marv also praises the IT community for their rapid response and effective communication during the incident, noting the collaborative efforts across various platforms and peer groups.
Marv shares his personal weekend ordeal with a new HP system that repeatedly blue-screened at a client's site but worked fine in his office. After troubleshooting, he discovered a faulty NVIDIA T1000 add-in card was the culprit. He also recounts the sudden resignation of a staff member, CJ, and the ensuing challenges due to the lack of communication from the client.
The episode wraps up with Marv discussing his upcoming trip to ASCII Edge and encouraging listeners to use his Amazon affiliate link to support the podcast.
Key Takeaways
Have a tested disaster recovery and business continuity plan, and reassess your preparedness against any major outage, not just cyber-attacks.
Rapid response and clear communication, from CrowdStrike and Microsoft as well as from the IT community, kept the incident manageable.
Consider a phased deployment approach for updates, backed by good monitoring and incident response capabilities so problems can be identified, isolated, and resolved quickly.
Links
ASCII Edge 2024: https://events.ascii.com/
CrowdStrike Lessons: https://tinyurl.com/9esmm7ys
CrowdStrike Community Support: https://tinyurl.com/mhypna7t
=== Show Information
Website: https://www.itbusinesspodcast.com/
Host: Marvin Bee
Uncle Marv’s Amazon Store: https://amzn.to/3EiyKoZ
Become a monthly supporter: https://www.patreon.com/join/itbusinesspodcast?
One-Time Donation: https://www.buymeacoffee.com/unclemarv
=== Music:
Song: Upbeat & Fun Sports Rock Logo
Author: AlexanderRufire
License Code: 7X9F52DNML - Date: January 1st, 2024
Hello friends, Uncle Marv here with another episode of the IT Business Podcast, the show for IT professionals and managed service providers, where we try to bring you product stories and tips to help you run your business better, smarter, and faster. And I am coming to you on a Monday evening, Marv on Monday.
And as we get started, let me give a shout out to our sponsor of this episode, SuperOps, the cutting-edge platform designed to streamline and enhance your IT operations, making it easier to manage your clients and deliver top-notch service. Supercharge your MSP with SuperOps.
So let me go ahead and get started. And of course, the big news, the CrowdStrike incident that had everyone on edge.
So of course, we did have some calls, and one client in particular called very early Friday morning in a panic thinking they were under cyber-attack. Luckily, everybody found out pretty quickly that this was not the case. For those that do not know, and I don't know how you work in this industry and don't know, what we experienced was not a cyber-attack, but simply a major technical glitch.
CrowdStrike, one of the leading, if not the leading, cybersecurity companies, pushed out a faulty sensor configuration update that caused widespread computer outages. The outage affected about 8.5 million computers globally, all Windows. A pretty big number, but really still less than 1% of all Windows machines.
And the impact was felt pretty much across the nation, probably the world. Financial institutions, airlines, health care facilities, and even some government offices were affected. So what I wanted to do was kind of give a couple of big takeaways that we as IT service providers should remember.
The first thing for me was calming my client down and reminding them of the importance of disaster recovery and business continuity planning. Because of course their first question was, are we affected? Once we identified what was going on, I could assure them that no, they weren't affected, and really the only reason is that my clients are not running CrowdStrike Falcon software.
However, the question did come up: if we were to be affected by this, what's in place for us? And we talked about the fact that yes, we do have them backed up. We have some redundant systems, but they are not 100% redundant. So that sparked a conversation, and we will be having a meeting later to go over it.
But for all of us out there, of course, it highlights the critical need for a robust disaster recovery strategy. If you're out there and you haven't done so, reassess your preparedness against major outages. And that could be any outages, folks.
Here in Florida, we talk about hurricanes a lot, but listen, everybody talks about ransomware attacks and breaches. As we can see now, even a simple update can cause a pretty big outage. So plan for any major outage that can affect the network.
So develop yourself a comprehensive crisis response plan. And more importantly, test the plan to make sure that it works. That's pretty much the big thing.
What's the sense in having a plan that isn't tested until you need it? So make sure it works ahead of time. The second big thing that I want to point out is we saw the need for rapid response and communication. And, of course, Microsoft and CrowdStrike were pretty quick to get out what the issue was, let everybody know that it wasn't a cyber event and that fixes were underway.
Of course, those fixes took some time, but they did show good communication in letting everybody know what had happened. The news outlets, for once, didn't get ahead of themselves and allowed for all of this to play out. The other thing that I think was really significant in this communication was watching all of the groups out there, the Facebook groups, ASCII, peer groups.
Everybody, I think, was very helpful and very supportive, not only collaborating with each other as to what was going on, but offering their help to other MSPs to expedite recovery efforts. Some fixes, of course, required boots on the ground, so people in different areas were offering their assistance. That was very good to see in our community, and it speaks highly to what I think is really going to keep us going no matter what.
So thank you to everyone out there who kept a level head and offered assistance. That was great. Now, of course, the third thing that I want to point to, and of course what all the news channels always talk about, is how can we prevent this from ever happening again? The only thing I think we can say is that we're obviously finding out that no one is immune from any type of event.
In Florida, we have the hurricanes, we have tropical storm events, power outages. Other parts of the country, there are earthquakes, there are tornadoes. Our electrical grid in this country is faltering in some places.
We have seen it in years past: companies have pushed out faulty updates, people have flipped the wrong switch.
All of these things have happened. But really, the only thing we can do is make sure that we have in place, the best that we can, a comprehensive testing and deployment plan for updates, patches, all of those things. I know there are several of you out there who won't install Microsoft patches on Patch Tuesday because you want to see how they behave in the wild in real time, and will install them maybe a week or two later.
Obviously, that didn't happen here with CrowdStrike, but having some sort of phased deployment approach, where updates get rolled out to a small group before full-scale deployment, is worth thinking about. It's easy to say that now, but it's probably something we all should consider in our clients' environments: if we have a test network, run the updates first on our own systems, air-gapped preferably so that you don't take down your own office, and then roll them out to your clients once you see that they are working properly. Of course, have good monitoring and, as I mentioned earlier, incident response capabilities so that when something does happen, it can be identified, isolated, and resolved quickly. That's pretty much it.
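For anyone reading along in the show notes, here is a minimal sketch of what a ring-based rollout could look like. It assumes a generic setup: the ring layout, host names, and the deploy_update and is_healthy helpers are hypothetical placeholders for whatever your RMM or patching tool actually provides, not any particular product's API.

import time

# Hypothetical rollout rings, ordered from lowest to highest risk.
RINGS = {
    "ring0_test_lab": ["lab-pc-01", "lab-pc-02"],        # air-gapped test bench
    "ring1_internal": ["office-pc-01", "office-pc-02"],  # your own office
    "ring2_pilot": ["client-a-pc-01"],                   # one low-risk client machine
    "ring3_broad": ["client-a-pc-02", "client-b-pc-01"], # everyone else
}

SOAK_MINUTES = 60  # how long to watch each ring before moving on


def deploy_update(host: str) -> None:
    # Hypothetical: push the patch to one endpoint via your tooling.
    print(f"deploying update to {host}")


def is_healthy(host: str) -> bool:
    # Hypothetical: ask your monitoring for boot loops, blue screens, offline agents.
    return True


def rollout() -> None:
    for ring_name, hosts in RINGS.items():
        for host in hosts:
            deploy_update(host)
        print(f"{ring_name}: deployed, soaking for {SOAK_MINUTES} minutes")
        time.sleep(SOAK_MINUTES * 60)  # soak period before checking health
        if not all(is_healthy(h) for h in hosts):
            print(f"{ring_name}: health check failed, halting rollout")
            return  # stop before a bad update reaches the next ring
    print("rollout complete")


if __name__ == "__main__":
    rollout()

The point is simply the shape of it: deploy to a small group, let it soak while monitoring watches for trouble, and only then move on to the next, larger group.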
I don't want to get too much into the weeds. I know that I didn't speak out all weekend on this. We had a lot of people already posting videos and updates and describing the way to fix it, so I just wanted to throw my two cents in and say that I think that the community did a great job, and kudos to everybody.
I probably should have started here with some housekeeping. The reason that I am here on a Monday is that there will not be a live show this Wednesday. If everything is back on schedule with the airlines, I will probably still be in the air heading to Boston for ASCII Edge, because I had to have my flight pushed back already. For those that do not know, ASCII Edge is one of the premier events put on by ASCII for IT solutions providers.
They have vendor presentations, networking opportunities, and a nice fun event that they do. So I will be there this week. That's Wednesday and Thursday, July 24th and 25th.
I'll have a link in the show notes to the ASCII events page so that you can see when the rest of those events are coming up this year. And I will definitely be at the last one, which is always called the ASCII Cup. It's the last event of the year, and it is in St. Petersburg, October 23rd through the 24th.
So if you are in that area, plan to come out and hang out with us at ASCII. Now, another reason that I did not do a whole lot on the CrowdStrike incident was, one, my clients weren't affected, but also I had one thing over the weekend that I was going bonkers with, and that was a brand-new HP system. It was one of the high-end graphics workstations, and, for some reason, it was working fine in my office.
But when I took it to the client on Friday, it started blue screening. And, of course, we thought, crap, could it be related to what was happening? I'm like, it can't be. I don't have CrowdStrike Falcon.
What could it be? I went through everything, memory, hard drive, power supply, trying to figure out why this machine was blue screening on-site. Brought it back to the office Saturday. Everything worked perfectly.
Took it back to the client late Saturday, and it worked fine for, like, 10 minutes. So I thought, okay, we're good, and came back, and just happened to remote in on Sunday just to check, and, oh, what do you know, blue screen. So went back to the client on Sunday, something I normally would not do.
Turns out it was a faulty NVIDIA T1000 add-in card. And for the life of me, I don't know why it worked here at my office, but not at the client. I used the same cables to go to the monitor, mini DisplayPort to HDMI, and mini DisplayPort to VGA.
But for the life of me, I don't know why this card was having that issue, though I will be checking it in another system. Luckily the workstation came with two built-in DisplayPorts on the motherboard, so right now I've got the monitors hooked up to those, and they are running just fine. But that kept me busy this weekend.
And then this morning, a little bit of office drama for you folks. The first client email that I received today, many of you will understand, but CJ has quit. And the funny thing about it was, first, I did not get two weeks' notice.
Here it is Monday, and I get the notification that today will be CJ's last day. But here's the kicker: some of you are probably shaking your heads, thinking, oh CJ, oh CJ, but no, it wasn't CJ that was the problem. CJ gave his two weeks' notice.
The issue was with the client. They did not pass that information on to me. So that was a big part of my morning, figuring out what things need to be done as CJ leaves, and the joys of a co-working environment when the client doesn't keep you in the loop of things.
So that was my joy today. Pretty short show, that's it. I do want to give a big shout out to all of you that have been using my Amazon link.
I want to tell you that we are not even two-thirds of the way through, and this is the biggest month ever. So I want to thank you for all that you have done. Your support obviously means a lot to me, helps keep the podcast going.
And if you are listening and thinking, Marv, what are you talking about? It's simply this: if you like the show and want to help us keep doing it, but don't want to sign up for a subscription or become a sponsor, there is a really easy, basically effortless, way to support the show. The next time you're shopping on Amazon, use my affiliate link. It won't cost you anything extra, but a small percentage of your purchase will go towards supporting the show.
Here's how it works. You go to the website, or you look in the show notes, and you click on the Amazon affiliate link. You shop as you normally would.
That's it. Amazon takes care of the rest. The only pointer I can give beyond that is maybe save that link as your home Amazon page when you go to shop.
A shortcut on your desktop, a bookmark shortcut in your browser, just something so that when you go to shop on Amazon, you click on that first. Every little bit helps, and I appreciate your support. So that's going to do it, folks.
I'll be back probably later this week with some live interviews from ASCII. I will be doing some more vendor profile updates for the IT Nation PitchIT. And we'll be back soon.
So that's it. We'll see you next time. And until then, Holla.