About Me

Scott Arnett is an Information Technology & Security Professional Executive with over 30 years of experience in IT. Scott has worked in various industries such as health care, insurance, manufacturing, broadcast, printing, and consulting, and in enterprises ranging in size from $50M to $20B in revenue. Scott’s experience encompasses the following areas of specialization: Leadership, Strategy, Architecture, Business Partnership & Acumen, Process Management, Infrastructure and Security. With his broad understanding of technology and his ability to communicate successfully with both Executives and Technical Specialists, Scott has been consistently recognized as someone who not only can "Connect the Dots", but who can also create a workable solution. Scott is equally comfortable in technical, project management/leadership and organizational leadership roles, through experience gained throughout his career. Scott has previously acted in the role of CIO, CTO, and VP of IT, successfully built 9 data centers across the country, and is an expert in ITIL, PCI Compliance, SOX, HIPAA, FERPA, FRCP and COBIT.

Friday, May 18, 2012

DR Test - What Test?

I got your emails, and some new jokes from the readers of this blog.  Thank you.  The emails asked if I would give some high-level recommendations on the DR test.  How would you go about setting up a real test of your environment and yet keep the business running?  So, at a high level, let me see if I can answer your questions.

First of all, you are not going to be able to do your test during the week.  This is a Saturday activity for most of you.  So schedule your DR test far enough out to give your team advance notice and yourself time to plan.  Next, get the big conference room scheduled (war room) for the day, and have plenty of coffee, soda and some food.  It can be a long day.  Other things to have on hand are a working phone in the data center, up and on speaker phone with the war room, and some application testers lined up.  Have a few of them work from home to test remote access, web applications, etc.

If this is your first test, I always recommend you start small - fail one application.  Don't make your first test so big that it becomes unmanageable and confusing.  You will learn a great deal about your process, environment and plan with just 1 application to start with.  If this is not your first time and you are ready to call a Disaster on your data center, then here are some suggestions:

  • Communicate, communicate and communicate.  Send out emails to all your staff, both IT and non-IT.  Put up posters, put a notice on the intranet page, and have some staff working the service desk.  No matter how much you communicate, someone will still call the service desk to say their email is down or they can't get to their presentation to work on it.  Make sure everyone is aware that you are doing a DR test on Saturday and systems will be unavailable.  I would also ask each department manager to communicate this same message in their staff meetings.
  • Ensure on Friday night that all your backups are complete and verified.  You may need to start your backups early to ensure they complete on time.  When you start moving things around, things happen.
  • Make sure your Saturday team has an updated DR plan prior to Saturday.  I like to send the plan out a week in advance, telling them all to read it and prepare.
  • Have a plan for the test and document it: how will this be done, who is doing what and when, and how will they document their portion of the test?  What worked, what did not work, and what lessons were learned?  What can we do differently next time?
  • Fail the primary data center.  A few words of caution first, because some organizations have simply turned everything off.  What you are testing is that users can get to the systems, data, and applications in your backup environment.  Some legacy hardware that has been running for years may not come back up once you power it down.  Be careful.  One question was whether you can incorporate a generator load test during this.  Sure, if your DR test scenario is a loss of utility power.  To stop user access to data center A, take away the network.
  • Now that you have failed over to your backup site or systems, make sure from a hardware perspective you can see everything.
  • Bring up the applications and have the test users ready to start using the applications.
  • Don't forget to test printing, EDI, and those other important transactions.
  • Document your test as you go along.  Detail what needs improvement, clarification, or a re-write.
  • If your test failed - don't take it personally.  If you gained insight and lessons learned, you hit a home run.  Take all those notes and start fixing the issues one at a time.  Better to find out that it does not work now than in a real crisis.
Have a post-test gathering at the end of the day.  Keep it positive: what we learned, what the next steps are, and thank everyone.  I usually have thank you cards made up ahead of time with a little gift card in there from Applebee's or something as a way of thanks.  Your next job is to write up an executive report to management on the results of the test and the next steps to improve.  They will want to know when the next test will be, so be prepared to address that.
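If it helps to picture the "document as you go" advice, here is a minimal sketch in Python of a test log that records who did what and whether it worked, then rolls the failures up into a short summary you could start the executive report from.  The class names, fields and summary format are my own illustration, not a standard tool; a shared spreadsheet does the same job.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepResult:
    """One documented step of the DR test (hypothetical structure)."""
    step: str        # what was done, e.g. "Bring up payroll application"
    owner: str       # who did it
    worked: bool     # did it work as the plan said it would?
    notes: str = ""  # what needs improvement, clarification, or a re-write


@dataclass
class DRTestLog:
    """Running log for the day, kept by the war room."""
    test_name: str
    results: List[StepResult] = field(default_factory=list)

    def record(self, step: str, owner: str, worked: bool, notes: str = "") -> None:
        self.results.append(StepResult(step, owner, worked, notes))

    def failures(self) -> List[StepResult]:
        return [r for r in self.results if not r.worked]

    def summary(self) -> str:
        """One headline line, then one line per item that needs fixing."""
        failed = self.failures()
        total = len(self.results)
        lines = [f"{self.test_name}: {total - len(failed)} of {total} steps worked"]
        for r in failed:
            lines.append(f"- FIX: {r.step} ({r.owner}): {r.notes}")
        return "\n".join(lines)
```

During the test, someone in the war room calls `record(...)` for each step as it finishes, and at the end of the day `summary()` gives you the list of issues to start fixing one at a time.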

I hope that helps.  One of your biggest challenges will be data if you have to recover from tape or drives.  If you don't have an archive policy in place, restoring some of these large databases within your agreed-upon RTO (recovery time objective) will be a challenge.  Your test will help the organization understand that.
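To see why data size matters against the RTO, a rough back-of-the-envelope calculation is enough.  The sketch below is only an illustration: the function names are mine, the sizes and throughput are made-up numbers, and real restores rarely sustain their rated speed, so treat the result as a best case.

```python
def restore_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Best-case hours to restore data_gb at a sustained throughput.

    Uses 1 GB = 1000 MB and ignores tape mounts, verification and
    database recovery time, all of which add to the real figure.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


def meets_rto(data_gb: float, throughput_mb_per_s: float, rto_hours: float) -> bool:
    """True if the best-case restore time fits inside the agreed RTO."""
    return restore_hours(data_gb, throughput_mb_per_s) <= rto_hours


# Hypothetical example: a 4,000 GB database restored at 100 MB/s
hours = restore_hours(4000, 100)
print(f"Best-case restore: {hours:.1f} hours")      # about 11.1 hours
print("Fits an 8 hour RTO:", meets_rto(4000, 100, 8))
```

With those assumed numbers the restore alone takes longer than an 8 hour RTO, before anyone has brought up an application.  Archiving old data so the active database is smaller is one way to bring that figure down.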

Keep it positive!

Scott Arnett
scott.arnett@charter.net
