Monday, August 22, 2011

RCA Algorithm / Process

For the last few weeks I have been working on a large client engagement with several projects in development. The projects are on a tight schedule to beat the competition to new features in their apps, so the teams ship major and minor releases almost every month. This has become a problem for the support teams: multiple issues crop up across many applications around the same time, and the teams come running to me for help with the Root Cause Analysis (RCA). (I am the Technology Leader and Principal Architect for an SBU of 8 to 10 large client accounts.) Being a mere mortal, I have only one processor (my brain) and can focus on at most one issue at a time. By focusing I mean looking at the code and understanding the design, the deployment and so on in depth.

I grew frustrated with people chasing me simultaneously to borrow an extra pair of eyes, i.e. an outsider's perspective on their problem. I also found that the teams would often see an error or exception in the code and come straight to me for help without first doing a thorough analysis of their own. So I designed a simple RCA process and offered it to them:

1) List all the possible causes in descending order of probability.

(My words, "Do you have the list of possible causes? No? Then go back, get me the list, and explain why you think these could be the causes and how probable each one is.")

2) Eliminate the causes one after another, starting with the most probable one.

(My words, "Have you eliminated all the causes? Please explain how for each.")

3) While trying to eliminate the possible causes, you will often find the root cause, provided the possible causes were listed sensibly.

4) If not, add more possible causes to your list and go back to step 1.

(My words, "Add more possible causes to your list and continue.")
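The four steps form a simple loop, which can be sketched in Python. This is a minimal sketch for illustration only: `Cause`, `find_root_cause`, `investigate`, `brainstorm_more` and the `max_rounds` cap are hypothetical names, and the probabilities are simply whatever estimates the team writes down. `investigate` stands in for the real work of checking one cause and returns `True` only when that cause is confirmed as the root cause.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Cause:
    description: str
    probability: float  # the team's own estimate, 0..1 (hypothetical scale)
    rationale: str      # why the team thinks this could be the cause


def find_root_cause(
    initial_causes: List[Cause],
    investigate: Callable[[Cause], bool],
    brainstorm_more: Callable[[List[Cause]], List[Cause]],
    max_rounds: int = 5,
) -> Optional[Cause]:
    """Hypothetical sketch of the four-step RCA loop.

    investigate(cause) -> True if the cause is confirmed as the root cause,
    False if it has been eliminated.
    brainstorm_more(eliminated) -> new candidate causes (step 4).
    """
    candidates = list(initial_causes)
    eliminated: List[Cause] = []
    for _ in range(max_rounds):
        # Step 1: order the candidates by descending probability.
        candidates.sort(key=lambda c: c.probability, reverse=True)
        # Step 2: work down the list, most probable cause first.
        for cause in candidates:
            if investigate(cause):
                # Step 3: the root cause surfaced while trying to eliminate it.
                return cause
            eliminated.append(cause)
        # Step 4: nothing confirmed, so add more causes and go back to step 1.
        candidates = brainstorm_more(eliminated)
        if not candidates:
            break
    return None
```

The `max_rounds` cap is an added assumption that keeps the sketch from looping forever when no new cause is ever confirmed.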

After that I noticed that I no longer needed to look at every issue myself, and the teams were doing their jobs well. Of course, I did not tell them that they were following the algorithm above; I just kept asking them for one thing after another. I could even have programmed some smart robots to do this, if such robots were possible with contemporary AI technology. The teams still come back to me after each step and expect me to intervene, but I just review what they did in that step and ask them for the data for the next one. They then go back to work, and meanwhile I can focus in peace on the one issue I have chosen to work on. Generally I choose the issue with the highest priority, based on the value of the application asset, the risk and other implications.
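The prioritization in the last sentence can be read as a weighted score over the three factors. This is a minimal sketch assuming each factor is rated on a 0 to 1 scale; the post does not say how the factors are combined, so the `WEIGHTS` values, the `Issue` fields and the `priority`, `pick_next` and `rank` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights: the post names the factors but not how they combine.
WEIGHTS = {"asset_value": 0.5, "risk": 0.3, "other": 0.2}


@dataclass
class Issue:
    name: str
    asset_value: float  # value of the affected application asset, 0..1
    risk: float         # risk of leaving the issue open, 0..1
    other: float        # other implications, 0..1


def priority(issue: Issue) -> float:
    """Weighted sum of the three factors; higher means more urgent."""
    return (WEIGHTS["asset_value"] * issue.asset_value
            + WEIGHTS["risk"] * issue.risk
            + WEIGHTS["other"] * issue.other)


def pick_next(open_issues: List[Issue]) -> Issue:
    """The single issue to focus on: the one with the highest score."""
    return max(open_issues, key=priority)


def rank(open_issues: List[Issue]) -> List[Issue]:
    """All open issues, highest priority first."""
    return sorted(open_issues, key=priority, reverse=True)
```

A weighted sum keeps the choice explainable to the teams waiting in line; the weights themselves are a judgment call.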
