
Blame For Failure Is Irrelevant

Never allow any critical system to have a single point of failure.

By Mark Riffey

After a long week of travel, I returned home today to learn that a fair number of people didn’t get to see their favorite (or “new favorite”) team play for a Final Four spot on Sunday. Apparently there were some communication issues that prevented some local stations from receiving the feed of the game. A sports reporter for one of these stations blamed their internet provider for the downtime. They indicated that their control system connects through a remote link (whose doesn’t?) and the internet connection between them and the system was down. Thus, it was the internet company’s fault, right?

Symptoms of failure

“Without that connection, we don’t get anything to air,” the reporter noted.

That seems like it might be a problem. What could we do so that this type of failure never happens again?

As described, a single connection to a remote location is the dependency for a major network television affiliate’s network content. The ability to deliver content created by the network – something they likely depend on for about 20 of their 24 hours of airtime each day – depends on a single internet connection.

Most of the time, the ability to access the parent network’s television content is not critical path functionality to a community – even though it’s likely considered critical functionality by the station. If the connection goes down, maybe they have recordings or cached programming they can play. However, there are times when live content could be critical in a life or death situation. This station is located in a part of the Midwest that is subject to tornadoes. Perhaps they have their own technology (radar, etc), content, and experts for those critical situations.

Bottom line, the station has a critical system with a single point of failure.

This isn’t the problem. It’s a symptom of the real problem: management.

A management problem

While TV coverage of a ballgame isn’t a problem on the scale of the situation with the 737 MAX, the station’s viewers are likely unhappy about the outage. If your sports bar pays that TV station (and their network) a big check so that you can show their content, you might also be upset. In the latter case, you also have a management problem if you only have one source of sports content in an environment that depends on sports. You get the idea.

I’m guessing most TV stations have generators to keep things running during power outages. How many have redundant internet connections from different suppliers? Even if you can’t afford a full-time redundant connection, you can schedule access via a second provider during periods when losing your feed could create massive problems. Not foreseeing possible connection issues at the worst possible time is more a failure of imagination than anything.

The management problem is allowing any critical system to have a single point of failure.

You’re the cure

A single point of failure may not be as obvious as losing electricity or internet connectivity. It could be people-related, as noted during last week’s brain drain discussion. To summarize, if you don’t have checklists, documented systems, and well-defined processes, you could have a single point of failure when you lose an employee. If their work is critical path, your business risks temporarily losing the ability to take care of its commitments.

This is particularly critical for a working owner. If they do work that no one else at the company can do, they’re a possible single point of failure. At risk: The company and the personal economies of every employee family. What if the owner has a stroke or gets hit by someone who ran a red light? The company’s future is probably altered forever.

The work of avoiding such failures must be done in advance. It takes vision and imagination.

The situation is most dire when this owner happens to run the town’s biggest employer. The economy of the entire town could be crippled overnight. A lot of employee families could be placed in a terribly challenging situation – the kind of thing that cascades across a town’s economy.

Despite this risk, I don’t believe a town (or a state) should “guarantee” a company’s success. It’s the duty, obligation, & responsibility of the town & its people to avoid getting into a single point of failure situation. Once a town’s economy is destroyed, neither blame nor a legislature can fix it or the financial situations of its residents.

Want to learn more about Mark or ask him to write about a strategic, operations or marketing problem? See Mark’s site, or contact him on LinkedIn or Twitter.