Saturday, April 21, 2012

NotesOn: Risk Management - Disaster Recovery & Business ...

Introduction (V1.0):

The subject of differences between Disaster Recovery and Business Continuity came up again. During the discussion another idea came to me on how to graphically represent the two concepts, one that looks at the matter from another facet of the same jewel, as originally presented in NotesOn: Risk Management - Disaster Recovery versus Business Continuity. I sincerely hope this helps "the cause".

Background:

As I've mentioned elsewhere, Disaster Recovery (DR) and Business Continuity (BC) are essential concepts to the current and future survival of a company; if you own or manage a company of any size, then that means your company. On the surface, these two subjects seem as if they should be pretty straightforward; wholly comprehensible. But somehow, levels of complexity have been added, and continue to be added, to the point where some sources of information make them all but incomprehensible. As proof, questions continue to arise around both topics. So I am building on the first four posts in this series (listed in reverse order of release below) in the hopes this approach will provide the "ahhh-hah!" moment for those who haven't quite "gotten it" yet.

NotesOn: Risk Management - Cumulative Recovery Time Objective

NotesOn: Risk Management - Disaster Recovery & Business Continuity Definitions

NotesOn: Risk Management - Disaster Recovery & Business Continuity Essentials

NotesOn: Risk Management - Disaster Recovery versus Business Continuity

A Basic Business Workflow Diagram:

To set the stage for this discussion I'm going to bring forward a graphic that I included in a post written in May of 2010 entitled: NotesOn: IT Fundamentals - Simple Defined. It was designed to demonstrate the basic processes, the essential flows of a Distribution company:

Distributor Business High Level IT Workflow Diagram

It has a different purpose in this post, which is to provide a visual reference point as we discuss how you break down a company's business into processes and the systems that support them.

Side-Bar: I am also including this graphic to demonstrate the concept learned way back in my architecture and engineering days: "If you can't draw it in two dimensions you can't build it in three".

Understanding this rule is crucial to someone doing the BIAs (Business Impact Analyses) that lead to Business Continuity Plans (BCPs), and/or the DIAs (Disaster Impact Analyses) that lead to Disaster Recovery Plans (DRPs). Simply put: if you can't draw it, you "don't got it". It is that simple. To make sure I "got it", I make a habit of diagramming the business process(es) when doing BIAs and, while architecting the DR system, how it fails over from the Production system, plus anything else that doesn't make sense until I put it down on paper visually.

There is a corollary to the above, that also applies to BC and DR team members, which goes something like this: "If you can't draw in two dimensions what is already built in three, then there is something wrong, something missing, or at least one thing about it that is not understood."

The BC-DR BIA-DIA Interaction Diagram:

The next graphic, and this is the key one for this post, is my new way to look at the entire BC-DR / BIA-DIA set of procedures and their various inter-relationships. It is an interaction model and with it I am attempting to convey all of the key, critical steps involved in properly, and fully, increasing the resilience of a business.

Now, I will warn you ahead of time that this is one of the more complex diagrams I have used in my posts. There is a lot going on. So, first, please take your time studying it (don't forget to click on the image to enlarge it if you are viewing this on the web). When you're done we will take it apart, section by section. Okay?

Here is the "BC-DR BIA-DIA Interaction Model":

BC-DR--BIA-DIA Interaction Model-V1-0

Note: while graphic images are wonderful and many folks comprehend concepts more easily using them, a similar graphical effort, for the entirety of a business, could easily end up being wall-sized, so more usually a spreadsheet in some form is used.

A typical BC-DR status spreadsheet has all organizational units (often no lower than department level) across one axis, and all identified-as-important functions/business processes across the other, with relevant BC and DR status information at the nexus of each row and column (or sub-row and/or sub-column). But.

Even using a non-graphic method, for larger companies the display will grow rather large rather quickly; perhaps to a 72", or larger, HD Plasma TV screen size, with small fonts? The benefit of chewing up wall space is that, if done cleverly, and cleanly, management will obtain a clear picture of what is most important to their survival and, more importantly, what business processes, or business unit(s), are at greatest risk of failing after a DR event.

(Trust me, every company which has not gone through this process has one or more weak links that can and will bring the company down. Not doing BC and DR is akin to not obtaining proper medical care for a known, potentially fatal, ailment.)
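
For readers who think better in code than in spreadsheets, here is a minimal sketch of the status grid described above. Everything in it is an assumption made for illustration: the process and unit names, the four status labels and their ordering, and the `weakest_cells` helper are all hypothetical, not taken from a real BC-DR tool.

```python
# Hypothetical status grid: rows are business processes, columns are
# organizational units, and each cell holds a BC/DR status label.
# The labels and their ordering are illustrative assumptions.
STATUS_ORDER = ["none", "bia_done", "bcp_done", "dr_tested"]

status_grid = {
    "Invoicing": {"Finance": "dr_tested", "Sales": "bia_done"},
    "Inventory": {"Warehouse": "none", "Sales": "bcp_done"},
}


def weakest_cells(grid, threshold="bia_done"):
    """Return (process, unit) pairs whose status ranks below the threshold."""
    limit = STATUS_ORDER.index(threshold)
    return sorted(
        (process, unit)
        for process, units in grid.items()
        for unit, status in units.items()
        if STATUS_ORDER.index(status) < limit
    )


at_risk = weakest_cells(status_grid)
print(at_risk)
```

Raising the threshold (for example to "bcp_done") widens the list, which is roughly what management is doing when they scan the wall-sized version for weak links.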

Let's now re-focus back on the above interaction diagram and how it can benefit your BC and DR efforts; after all we're really here to clarify the differences between the two activities and learn something about how they must yet interact to make the entire BC-DR effort successful. Ready?

Building Consensus One Graphic Section at a Time:

A Business Continuity Primary Function

Critical Business Process Information Diagram

One of the most important tasks, one of the earliest steps of a BC team, is to accurately identify the key business functions within the group which they are surveying, i.e. the focus point of their Business Impact Analysis (BIA). These critical functions make up the heart of their Business Continuity Plan (BCP) for that area; all other data gathered revolves around and relates back to those functions. If you ever see non-business-process oriented BIAs being done, your time (and money) is being wasted. Let's move on.

This first diagram section is an effort to demonstrate the gathering of critical business process information.

You will have noted that in my neatly arranged diagram I have three critical processes in each business unit.? Of course the reality could be quite different, ranging from none (that is most certainly possible) to dozens.

Now this is not meant to be a detailed treatise (a learned thesis or exhaustive dissertation) on doing a BIA; that will have to wait for another time. So the first important thing to note, and I know I am repeatedly beating the same drum, is that the focus of the BIA is on the business process(es), not the design of, and never the implementation of, the IT disaster recovery systems that support it/them.

You may also have noted that the "BC / BIA Product" bracket stops at the bottom of the IT Systems Frequency List. There is a reason for this.

More than one beginner BC team has gotten mired down in such low level detail that they never get their own job done. Leave IT DR Systems to IT and their DR team; the BC team must stay focused on the business processes and the resources and manual processes necessary to keep each business unit (BU) going during and after a DR event.

However, what the BC team members must collect for each critical business process, for each BU, is the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO) - for each process. Not for the IT systems that support each process, but the RTO and RPO for each process. There is a big difference. The two may correlate, or they may not, but if you assume they are the same going in, or never get the values for the process, you may end up drawing conclusions that are not accurate, or relevant.

My good friend Chris Branch mentioned a while ago that he knows this set of values as M.A.D. or Maximum Allowable Downtime. Whatever you know it as, whatever you call it, what you need to find out is (a) how long the BU can survive without their automated process being, well, automated, and (b) how much of that process's data, what the process requires, the business unit can afford to lose.

In the early stages of the BIA to BCP development process in particular, you don't care what system or systems keep the process in question running smoothly (it's possible one or more may not even be IT systems) and you certainly don't care what system or systems store the data. You just need to know what the pain point, the breaking point, for each process is, after which the BU will be in trouble.

Please. Don't carry it into any more detail than that.

If the BU folks tell you "we need our invoicing data back in ____" or "our inventory system has to be up in ____", or whatever, and whether they can afford to lose even "one invoice" or "one bill of materials list", etc., then you know the service level requirements for that process. Another BU's requirements may be different, you'll see examples of this in the diagram, but you know that BU's RTO and RPO requirements and that is vitally important.

If one BU says their processes must be back up in "twenty-four hours max" and another says "we can survive for a week" and they use a common IT system, someone has to level set that, but later. Again, initially, you don't ask or care what IT systems are used to support each process.
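
When that later level setting does happen, one plausible rule is that a shared system inherits the tightest RTO of any process that depends on it. The sketch below assumes exactly that rule, which is an illustration rather than something this post prescribes; the BU, process and system names, the hour values, and the `level_set_rto` helper are all hypothetical.

```python
# Each row: (business unit, process, shared IT system, process RTO in hours).
# Names and values are illustrative placeholders, not real BIA results.
bia_rows = [
    ("BU-A", "Invoicing", "ERP", 24),      # "twenty-four hours max"
    ("BU-B", "Reporting", "ERP", 168),     # "we can survive for a week"
    ("BU-B", "Scheduling", "Planner", 72),
]


def level_set_rto(rows):
    """Map each system to the smallest process RTO that depends on it.

    Assumption: the most demanding process sets the bar for a shared system.
    """
    tightest = {}
    for _bu, _process, system, rto_hours in rows:
        tightest[system] = min(rto_hours, tightest.get(system, rto_hours))
    return tightest


system_rto = level_set_rto(bia_rows)
print(system_rto)
```

In practice the outcome is negotiated, since the tighter number usually costs more, but having the process-level values side by side is what makes that negotiation possible.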

Only after you understand the process from front to back and can draw it in two dimensions do you ask about IT, or other, systems, at a high level, by name only. Do not go digging down into the weeds and become wrapped up in performance specifications or operating systems or failover types or disaster recovery steps or ... leave that to IT's DR team and their DIAs.

Where BC and DR Cross Over:

BC and DR Cross-Over Point Diagram

With the above said (and it can be said many more times) there is a point where the BC and DR teams meet, where a slight cross-over occurs. I've touched on it above but it bears repeating.

Once the BC team has a good feel for how the various business processes work in that business unit and what they accomplish, once the BC team has them prioritized and fairly well mapped out, one of the questions, not the most important question for them but one of the questions they should ask is:

"What IT systems support that business process?"

Don't be surprised if you don't get a complete list of all systems that help to keep the automated version of the business process running, but you should be able to compile, through various interviews, a fairly comprehensive list.

The systems I listed above aren't, of course, very helpful; their names are oversimplified and we have no idea what processes they support. But the key point of the above portion of the diagram is that you want to keep track of the frequency of use of these systems both across that BU and across all BUs.

This is important, make that vital, data and it forms the central meeting ground between the BC and DR teams.
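
Keeping that frequency list is a simple tally. Here is a minimal sketch, assuming the interview results have been written down as BU, then process, then the system names mentioned for it. The BU, process and system names are placeholders in the spirit of the diagram, and `system_frequency` is a hypothetical helper.

```python
from collections import Counter

# BU -> {process -> [systems named during interviews]}; placeholder data.
interviews = {
    "BU-A": {
        "Critical BP #1": ["System 1", "System 2", "System 3"],
        "Critical BP #2": ["System 2"],
    },
    "BU-B": {
        "Critical BP #1": ["System 2", "System 4"],
    },
}


def system_frequency(data):
    """Count how often each system is named, per BU and across all BUs."""
    per_bu = {
        bu: Counter(name for systems in processes.values() for name in systems)
        for bu, processes in data.items()
    }
    overall = sum(per_bu.values(), Counter())
    return per_bu, overall


per_bu, overall = system_frequency(interviews)
print(overall.most_common(1))
```

A system that shows up under many processes and many BUs is an obvious candidate for the conversation between the two teams.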

Now I have stated elsewhere that the DR team can, and often does, start "DR'ing" IT systems without any input from the BC team. They can, and do, build a Tier 1 (highest priority) systems list based solely on prior business experience (starting with the most hurtful pain points) and on institutional knowledge. It may not be the ideal way to create a first cut of a Tier 1 list but, still and all, it is likely to be fairly accurate and it gets "DR stuff" started.

Then, as the BC team completes enough of its BIAs, the Tier 1 systems lists are evaluated and synced up.
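
The sync-up itself can be pictured as a comparison of two lists. This is only a sketch under assumptions: the system names are invented, and `sync_tier1` is a hypothetical helper that shows where the lists agree and where each team needs to ask the other a question.

```python
def sync_tier1(dr_first_cut, bia_derived):
    """Compare the DR team's first-cut Tier 1 list with the BIA-derived one."""
    dr, bia = set(dr_first_cut), set(bia_derived)
    return {
        "confirmed": sorted(dr & bia),  # both teams agree
        "dr_only": sorted(dr - bia),    # DR listed it, no BIA backs it yet
        "bia_only": sorted(bia - dr),   # BIA surfaced it, DR had missed it
    }


# Illustrative system names only.
result = sync_tier1(
    dr_first_cut=["ERP", "Email", "Warehouse Mgmt"],
    bia_derived=["ERP", "Warehouse Mgmt", "EDI Gateway"],
)
print(result)
```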

Where BC and DR Should Not Cross Over:

BC and DR non-Cross-Over Point Diagram

If you go any further than described above you are (a) unnecessarily overloading what should already be a full workload and (b) digging down into the weeds where there is no reason to go, because the DR team is going to have to go there anyway.

Let's take a different slice of the diagram and compare the two elements:

You will notice, again, the RTO/RPO designations at the top. To be intentionally redundant, these are process oriented values.

Then we have the three IT systems that support Critical BP #1 lined out below it. There may be other ancillary systems (such as email or a word processor, etc.) but it has been determined that these three systems are vital to the recovery of that business process.

At the bottom of the slice we have the three most critical DR team acquired values:

RTO (Recovery Time Objective) records the time by which the IT system should be back up in order to keep the business process viable. Typically specified in hours or days.

RPO (Recovery Point Objective) records the amount of data the business unit can afford to lose, i.e. that they can re-create if they have to. Typically specified in hours or days, or rarely a point in time such as "At least make sure we have Month End backed up."

RSL% (Recovery Service Level) records the performance characteristics of IT's DR system, as compared to the Production system. Keeping in mind that during a DR event there are likely fewer people on the system, and fewer demands on the system, the DR version can be "slower". How much slower is the question. 50%? 70%? Or must it match Production in all ways, so 100%?

All three questions are answered by the BU folks during a DIA (Disaster Impact Analysis) as all three are system oriented questions; again, asked by IT's DR team so the correct answer is obtained relative to that system.
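
To make the split between process oriented and system oriented values concrete, here is a minimal sketch. The class names, field names, sample numbers and the `looser_than_process` check are assumptions made for illustration; in particular, the idea that a system's RTO and RPO should be no looser than those of the process it supports is a reasonable reading of the definitions above, not a rule quoted from them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessTargets:
    """What the BIA collects for a business process (hours)."""
    rto_hours: float
    rpo_hours: float


@dataclass(frozen=True)
class SystemTargets:
    """What the DIA collects for one supporting IT system."""
    rto_hours: float
    rpo_hours: float
    rsl_percent: float = 100.0  # DR performance relative to Production

    def __post_init__(self):
        if not 0 < self.rsl_percent <= 100:
            raise ValueError("rsl_percent must be in (0, 100]")


def looser_than_process(process, system):
    """Name the targets where the system allows more downtime or data loss."""
    looser = []
    if system.rto_hours > process.rto_hours:
        looser.append("RTO")
    if system.rpo_hours > process.rpo_hours:
        looser.append("RPO")
    return looser


critical_bp_1 = ProcessTargets(rto_hours=24, rpo_hours=4)
system_1 = SystemTargets(rto_hours=48, rpo_hours=4, rsl_percent=70)
print(looser_than_process(critical_bp_1, system_1))
```

A non-empty result is a prompt for a conversation between the BC and DR teams, not an automatic failure.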

[Note: the DR team must be sure to ask for these answers as a business analyst would, not as an IT techie would, but with an IT techie's level of understanding.]

Keep in mind (as a very good rule to remember) that the more stringent the RTO, RPO and RSL% specifications are, the more expensive the DR system will likely be; possibly causing Production side upgrades as well. This is why IT has to be involved at this detail level, and it is why I created the DIA.

IT must agree, or not, to provide the requested degree of resiliency. The BU may pay for the final DR solution, but IT has to both commit to building it and be able to build it. And since IT clearly has to be involved there is no sense in both the BC teams and the DR teams doing the same jobs (no offense to anyone, but BC trained folks are rarely thoroughly IT trained as well, and vice versa).

The Next DR Team Steps:

DR Team Next Steps Diagram

Once the DIA is complete (or, in an ideal world, the BIA for the BU and all relevant DIAs) the next step for the DR team is to sit down and design the DR system and its connections to its Production system.

This is done, very methodically, with the network folks, the operating system folks, the database administration folks, the file storage folks, the security folks, the IT audit folks, the vendor folks if they are involved - until a coherent, affordable, workable design is achieved and accepted by the various layers of managers and executives and business users.

Then comes the really fun part of the project, building it out - preferably in a remote datacenter far, far away from where Production is located - followed by testing it (we'll address the details of the types of testing another time).

Finally, as the DR design is being implemented, as the entire system is being built out and tested, the team is also working on the Disaster Recovery Plan (DRP). If the entire team is smart, and they usually are, they will document the build out and document the test steps and in so doing they will have the meat of the DRP at least half written for that system - before they officially start the DRP.

What does a DRP look like, you might ask? I'll do another post on that (and, yes, on DIAs too) in the not too distant future because that subject is beyond the scope of this post. But, in very brief form, a DRP consists of at least the following elements:

  • Who owns it - the entire hierarchy, not just the one person at the bottom of the organization chart
  • The scope of the system - what BUs and sub-BUs does the system have an impact on
  • An overview of the System, the DR Strategy and at least the key Service Level Agreement (SLA) requirements - RTO, RPO and RSL%
  • Contact Information for anyone who would be/could be connected with the execution of the DRP
  • Skill Levels required to support the DRP
  • The DR System's Up- and Down-stream dependencies
  • A method to assess the post-DR-event state of both the Production and DR system components
  • The recovery steps
  • The post recovery steps
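
The element list above can double as a completeness check on a draft DRP. The sketch below assumes a draft is kept as a simple dictionary; the section keys are shorthand invented here for the nine bullets, and `missing_sections` is a hypothetical helper.

```python
# Shorthand keys for the nine elements listed above (names are assumptions).
REQUIRED_SECTIONS = (
    "ownership",
    "scope",
    "overview_and_sla",
    "contacts",
    "skill_levels",
    "dependencies",
    "state_assessment",
    "recovery_steps",
    "post_recovery_steps",
)


def missing_sections(drp):
    """Return the required sections that are absent or empty in a draft DRP."""
    return [name for name in REQUIRED_SECTIONS if not drp.get(name)]


draft = {
    "ownership": ["CIO", "IT Director", "System Owner"],
    "scope": ["BU-A", "BU-B"],
    "overview_and_sla": {"rto_hours": 24, "rpo_hours": 4, "rsl_percent": 70},
    "contacts": [],  # present but still empty, so it counts as missing
    "recovery_steps": ["Declare the event", "Fail over to the DR site"],
}
print(missing_sections(draft))
```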

Summary:

It takes a fair amount of work to put together competent, usable (in the midst of chaos) BCPs and DRPs so, from a very practical standpoint, it is necessary to ensure the investment in time and resources and cold hard cash is worth it to the Business.? One very good way to ensure that is to have a joint BCP-DRP / BIA-DIA methodology that will prove out, or not, that the effort being considered is worth everything you and your company will put into it.

Hope this helps,

DP Harshman
