Incident Management

Breaking Down the CrowdStrike Outage Part 1: Preventing Critical Errors from Reaching Production

On July 19th, 2024, the world witnessed a large-scale computer outage caused by a faulty update from cybersecurity giant CrowdStrike. This incident, affecting millions of Windows devices globally, serves as a stark reminder of the domino effect that software errors can have. Since then, CrowdStrike has shared its preliminary incident report, and other industry experts have weighed in. The report outlines the incident and the steps the company will take to prevent similar issues in the future.

Breaking Down the CrowdStrike Outage Part 2: Observability Strategies to Prevent Application Catastrophes

On July 19th, 2024, the world witnessed a large-scale computer outage caused by a faulty update from cybersecurity giant CrowdStrike. This incident, affecting millions of Windows devices globally, serves as a stark reminder of the domino effect that software errors can have. In part one of this series, we discussed the role QA methodologies can play in preventing future outages.

6 Ways to Improve Your Incident Management Reporting Process

Every organization faces incidents. These could be minor, such as customers being temporarily locked out of online accounts. They could be major incidents that damage your reputation, like a bad customer interaction that makes the news. Or they could cause deeper harm, like a security breach or a safety incident in manufacturing. Unplanned events are inevitable. How you handle them will make or break your organization.

Why Developers Should Care About Resilience

Recently, a friend reminded me of a joke we used to have when we were both developers at a huge software corporation (we won’t mention names, but back when printers were a thing, you probably owned one of theirs). We didn’t develop printers. We developed performance testing and monitoring tools. We were the dev team, which was completely separate from the QA team and from the Ops team (yes, I’m that old – we didn’t even call it DevOps back then).

PagerDuty integration with N|Solid

In N|Solid v4.4.2, the latest version, NodeSource introduced a new PagerDuty integration that lets users configure notifications that are triggered automatically when a Node.js application experiences critical performance, lifecycle, or security events in production. This ensures that DevOps professionals looking after applications running in production are notified promptly about new performance and security issues.