Human Error Multipliers
By: George Spafford
Studies show that up to 80% of network availability incidents can be tied to human error. In addition, the fourth annual CompTIA study on security breaches shows that 60% can be attributed to human error. With statistics repeatedly showing that human error should be a concern, it is a wonder that more attention is not paid to managing it. In fact, a number of behaviors can dramatically increase the odds of human error, yet organizations fail to manage them.
All services contain some element of human interaction and thus some level of inherent variation. It may be introduced at any point during the life cycle from a wide range of vectors, including development, operations, vendors, users, etc. Beyond this inherent baseline, which cannot be eliminated, challenged organizations can introduce additional human error-related variation. The following can all increase the level of human error in organizations and thus put the attainment of goals and objectives at risk:
Increased Complexity – As the volume, variety, integration and coupling of systems increase, so too does the inherent complexity of the environment. Detailed knowledge of the services being delivered becomes scattered across many people, and the impacts of proposed changes are largely unknown. As a result, the likelihood of a change negatively impacting confidentiality, integrity or availability increases.
Operating Under Tight Deadlines – As pressure to complete work increases, a point is reached where the emphasis may shift to “just get it done” and appropriate controls are bypassed in favor of finishing the work. As a result, mistakes are made and not caught, standards are not followed, and variation increases. Fatigue and stress levels rise as well.
Human Fatigue – Studies have clearly tied fatigue to increases in human error. As people begin to perform without sufficient rest, the likelihood of errors increases. Expecting staff to perform without error despite working long hours is unrealistic.
Task Switching – A person split among multiple tasks is likely to make mistakes due to shifts in concentration and delays between actions. It is a mistake to think that three tasks, each requiring a third of a full-time equivalent, can be handled by one person, because switching between them carries its own overhead. As the number of tasks increases, the likelihood of error increases.
Insufficient Planning – Projects that invest the time and resources in planning before work begins are far more likely to deliver on time and within budget. Failure to plan adequately can create budget and schedule pressures, causing personnel to rush, work long hours, and bypass standard policies and procedures.
Insufficient Testing – When project schedules and/or budgets are at risk, one of the first areas to suffer is testing. As a result, the risk that human errors will not be caught prior to production increases.
Lack of Change Management – Human error is introduced via changes to production systems. When changes are not properly managed, risks to production and the business increase.
Development on Production Systems – Changes can and do fail. If developers are allowed to change a production system directly, the odds of human error negatively impacting the organization increase.
Functional Silos – When functional areas are allowed to design services without taking the enterprise’s interests into account, the level of variation and complexity in the environment will increase.
Inability to Criticize – In organizations where review and constructive criticism are stifled, the level of unplanned, reactive work will only increase. Review should be designed into formal change management processes.
Lack of Communication – When modifications to systems are planned in isolation, the chances of overlooked dependencies causing incidents increase.
Lack of Documentation – When complex systems are not documented, it becomes increasingly difficult to train new people, understand the potential impacts of changes, etc.
Lack of Standards – As variation increases, so does the amount people must try to learn and memorize. For example, it is easier to gain deep knowledge of three platforms than of 30. Similarly, it is easier to master a few standard processes than to learn how every employee does the same work differently.
Lack of Shared Objectives – If the objective for doing something isn’t clearly articulated and understood, the chances of individuals drifting from the intended objective increase.
Lack of Training – If people are not adequately trained on a new service or specific system, how can they possibly operate or support it without introducing errors?
Lack of Understanding Causality – When groups do not understand historical outcomes and formally track cause and effect, how can the culture evolve and risky behaviors be avoided?
Lack of Control and Process Knowledge – IT has long focused on technology to solve problems. Now, to attain functional area objectives and organizational goals in a sustainable manner, proper control and process design must be coupled with the right people and technology. Without proper controls and processes, risks from human error and other vectors will only increase.
The above is a partial list intended to provoke discussion. What we have witnessed during consulting engagements is that some organizations exhibit several of these behaviors at once, and in combination they increase risk levels further. Organizations must take a careful look at their culture and processes to understand and subsequently manage the level of human error being introduced. If we want to help safeguard the organization and its goals, it is essential to understand what causes human error levels to increase and, correspondingly, what can be done to reduce them.