This Is a Drill

You don’t have a working disaster recovery plan until you originate programming from your backup location and transmit to your community

In my previous three articles about disaster recovery, I’ve shown a simple way to construct a response-based disaster recovery/business continuity plan, discussed using IT tools like Recovery Time Objective and Recovery Point Objective to help figure out how you should prioritize your preparations, and examined how you could use your existing content management systems in the most effective way to prepare for incidents that threaten your business.

Let’s look at the last step.

You’ve made your preparations, found a good spot to use as a backup site, installed equipment and set up backups for your content management systems. You’re feeling pretty good about all the work you’ve done, and you think you’re ready to go. Bring on the disasters!

Not so fast. Preparations are great, but you don’t have a working disaster recovery plan until you originate programming from your backup location and transmit to your community. If you have a generator at your studio and/or at your transmitter, you need to start it up and transfer the facility to it — not run into a dummy load. If you are working with a content management/playback system, you need to use it while operating from your backup site. You won’t have confidence in your planning and preparations unless you operate the backup locations and systems.

Some people are reluctant to actually use their backup systems, seeing them as only a last resort to use in situations where the alternative is being off the air. In reality, the best way you can ensure program (and business) continuity during an incident is to practice your procedures and operations from your backups.

REAL-LIFE EXAMPLE

National Public Radio, headquartered in Washington, D.C., operates the Public Radio Satellite System, a multiple-channel live audio and file delivery system. The PRSS delivers thousands of hours of programs for NPR and other public radio producers — many of those hours are used live on the air at stations. Any incident at the PRSS Network Operations Center could affect the on-air sound at hundreds of radio stations.

The staff at NPR’s distribution department take seriously their commitment to providing programs, even if the NOC is offline. They built and maintain a backup/disaster recovery facility at Minnesota Public Radio’s headquarters in St. Paul, Minn.

Toby Pirro, senior manager, Broadcast Operations for NPR Distribution, describes it:

The Back-Up Network Operations Center is intended to provide disaster recovery service for PRSS ContentDepot customers and services. The BUNOC is designed to ingest live audio and files from program producers, store them and then transmit them over the PRSS satellite transponders. When things are operating properly, public media station staffers don’t have to do anything to properly receive their scheduled programs on time, even if there’s an incident that affects the PRSS Network Operations Center.

To make sure that the facility is working properly, PRSS staff, in coordination with staff at MPR, switches network programming and control from the NOC to the BUNOC at least once every calendar quarter.

Here’s Pirro again:

“We have been performing BUNOC drills since August of 2009. We now plan on doing them quarterly. Our last one was on June 14.

“We have been doing them lately on Sunday afternoons/evenings, when the program load is relatively light. We want to give the staffs in Washington and St. Paul the opportunity to get used to the procedures for making the changeover. We have all learned that the time to read the manual is not when the equipment has failed. We have been able to streamline our procedures from what initially took over an hour to make a complete transition of all services to under a half hour, but we can get live stream programming switched almost immediately as the situation may require.

“We are still working to make the switchover a ‘one-button’ procedure. We look forward to moving drills to other days in the week and different hours, not only for our staffs to become familiar, but to allow stations to experience a ‘seamless’ transition of services.”

The PRSS BUNOC is an elaborate example — they’re backing up more than a dozen live audio channels, audio file delivery and a messaging system — but it illustrates why you need to test and practice incident response procedures.

PRACTICE MAKES PERFECT

The staff at the NOC, MPR and at other stations practice their switchovers so everyone knows what to do and where to find information about programming coming from the backup site. Compare that to your station: Does your staff know — really know — what to do in case of an incident? Do they even know where the emergency procedures binder is? You do have an emergency procedures binder, right?

The PRSS switchover time from the primary to the backup site is now about half what it was when NOC staff started drills. A lot of that is because the staff at the two sites can perform the steps faster, but it’s also because they’ve streamlined their procedures over the years. As they go through drills, they discover ways to make switchovers faster — and that can’t happen without actually doing switchovers.

You can probably write procedures for your operations staff to switch to your backup site or write procedures for starting and switching to your generator at your studio, but wouldn’t they be better if they were actually tried multiple times? Wouldn’t your staff have more confidence during an incident if they had already practiced those emergency procedures several times in regular drills?

And there’s another very good reason to practice your emergency procedures.

Here’s Pirro again:

“One time when we switched to the BUNOC, there was an equipment problem there, which forced us to switch back about 15 minutes later. This caused us to review our pre-test inspections to check for some errors we hadn’t previously anticipated. Another learning experience that will hopefully save us in a real emergency.”

Radio stations are complex systems, which is one of the reasons they’re so interesting. Even the smallest stations have to coordinate dozens of individual functions to operate, from the content management and playback systems to the transmitter and antenna, to the traffic and business systems. As I’ve noted in previous installments, you need to account for all those systems in your disaster recovery plans — even if you decide (based on RTO and RPO!) not to provide backups for some of them.
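
If you haven’t built that inventory yet, even a simple list helps. Here is a minimal sketch of how RTO and RPO figures might be used to rank systems for backup priority; the system names and hour values are hypothetical examples, not recommendations for any particular station.

```python
# Minimal sketch: ranking station systems for backup priority by RTO/RPO.
# All system names, RTO and RPO values below are hypothetical examples.

from dataclasses import dataclass

@dataclass
class StationSystem:
    name: str
    rto_hours: float   # Recovery Time Objective: how long you can afford to be without it
    rpo_hours: float   # Recovery Point Objective: how much data/content loss is tolerable
    has_backup: bool   # whether you have decided to provide a backup for it

inventory = [
    StationSystem("Playback/automation", rto_hours=0.25, rpo_hours=1, has_backup=True),
    StationSystem("Transmitter/antenna", rto_hours=0.5, rpo_hours=0, has_backup=True),
    StationSystem("Traffic and billing", rto_hours=48, rpo_hours=24, has_backup=False),
    StationSystem("Office file server", rto_hours=72, rpo_hours=24, has_backup=False),
]

# Sort so the systems you can least afford to lose come first.
for system in sorted(inventory, key=lambda s: (s.rto_hours, s.rpo_hours)):
    status = "backed up" if system.has_backup else "no backup (accepted risk)"
    print(f"{system.name}: RTO {system.rto_hours} h, RPO {system.rpo_hours} h -> {status}")
```

Even a list this rough makes the accepted risks explicit, which is exactly the decision RTO and RPO are meant to support.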

Your drills provide a great way for you to discover problems transferring to backup systems and operating from them before you actually need the backup systems. In a drill, you should be able to “back out” of the transfer — or end the drill sooner than planned — if you discover a problem. If you wait until an incident forces you to go to your backups to see how your procedures work, any problems will at minimum cause delays — and at worst could keep you off the air.
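
Drills also give you hard numbers to streamline against. The sketch below shows one hypothetical way to log a drill: time each switchover step, and back out to primary systems as soon as a step fails. The step names and the interactive check are illustrative assumptions, not PRSS procedure or any specific station’s checklist.

```python
# Minimal sketch of a drill log: time each switchover step and back out of the
# drill if a step fails. Step names and the back-out behavior are hypothetical.

import time
from datetime import datetime

DRILL_STEPS = [
    "Verify backup site is receiving live program feeds",
    "Start generator and transfer studio load to it",
    "Switch playout to backup content management system",
    "Confirm transmitter is carrying backup-originated audio",
]

def run_drill(steps):
    log = []
    drill_start = time.monotonic()
    for step in steps:
        step_start = time.monotonic()
        ok = input(f"{step} -- completed successfully? (y/n): ").strip().lower() == "y"
        log.append((step, ok, time.monotonic() - step_start))
        if not ok:
            # A failed step ends the drill early and sends staff back to primary systems.
            print("Problem found: backing out of the drill and returning to primary systems.")
            break
    total = time.monotonic() - drill_start
    print(f"\nDrill on {datetime.now():%Y-%m-%d %H:%M}, total time {total/60:.1f} minutes")
    for step, ok, elapsed in log:
        print(f"  [{'OK' if ok else 'FAIL'}] {step} ({elapsed/60:.1f} min)")

if __name__ == "__main__":
    run_drill(DRILL_STEPS)
```

Keeping even a simple timed log like this lets you compare drills over time and see whether your procedure changes are actually making switchovers faster.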

MAINTENANCE

There’s a hidden opportunity in doing regular drills, too. Since the point of a drill is to move off your primary systems (and possibly out of your primary location), this is a good time to do that maintenance work on your primary systems. Of course, don’t do maintenance during the first drill; keep that one for sorting out issues with your procedures. After that, when you successfully move to your backup systems, take that time to do those upgrades or solve those problems that have been bothering you.

Organizing and executing regular drills of your disaster recovery procedures and facilities takes time, effort and money to do well. Drills are an investment that will pay off when you most need reliable operations during an unplanned incident, and could easily pay for themselves by helping keep your station on the air when your community needs you.

Bridgewater works with radio stations, program producers and other media companies to help them solve sticky problems, including analyzing and enhancing their disaster recovery preparations.
