
ITIL Problem Management

Revision as of 14:46, 29 January 2021 by User (talk | contribs)

Problem Management is an IT service management process tasked with managing the life cycle of underlying "Problems." It succeeds by detecting Problems quickly and providing solutions or workarounds for them, in order to minimize their impact on the organization and prevent recurrence. Problem Management also attempts to find the error in the IT infrastructure that is causing the Problem and contributing to the Incidents that users experience. The IT Infrastructure Library (ITIL) provides the following definitions for use within this process:

  • Problem: “The cause of one or more Incidents. The cause is not usually known at the time a Problem record is created”
  • Error: “A design flaw or malfunction that causes a failure of one or more IT services or other configuration items”
  • Known Error: “A Problem that has a documented root cause and workaround”
  • Root Cause: “The underlying or original cause of an incident or problem”.[1]
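
The relationship between these terms can be sketched as a small record type. This is only an illustration of the definitions above, not an ITIL-prescribed schema: the class name, field names, and the is_known_error helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Problem:
    """A Problem record: the cause of one or more Incidents (hypothetical schema)."""
    problem_id: str
    incident_ids: List[str]            # Incidents attributed to this Problem
    root_cause: Optional[str] = None   # usually not known when the record is created
    workaround: Optional[str] = None   # documented once a workaround is found

    def is_known_error(self) -> bool:
        # Known Error: a Problem with a documented root cause AND a workaround.
        return bool(self.root_cause) and bool(self.workaround)
```

Under these assumptions, a newly logged record such as Problem("PRB-1", ["INC-7", "INC-9"]) is not yet a Known Error; it becomes one only after both root_cause and workaround have been filled in.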


Phases of Problem Management[2]

Problem management involves three distinct phases:




  • Problem Identification: Problem identification activities identify and log problems by:
    • Performing trend analysis of incident records.
    • Detecting duplicate and recurring issues.
    • During major incident management, identifying a risk that an incident could recur.
    • Analyzing information received from suppliers and partners.
    • Analyzing information received from internal software developers, test teams, and project teams.
  • Problem Control: Problem control activities include problem analysis and the documentation of workarounds and known errors.
    • Like incidents, problems are prioritized based on the risk they pose to services, in terms of probability and impact. Focus should be given to the problems that pose the highest risk to services and service management.
    • When analyzing incidents, it is important to remember that they may have interrelated causes with complex relationships. Problem analysis should therefore take a holistic approach and consider all contributory causes: those that caused the incident to happen, those that made it worse, and those that prolonged it.
    • When a problem cannot be resolved quickly, it is often useful to find and document a workaround for future incidents, based on an understanding of the problem. A workaround is a solution that reduces or eliminates the impact or probability of an incident or problem for which a full resolution is not yet available. Examples include restarting services in an application or failing over to secondary equipment.
    • Workarounds are documented in problem records. This can be done at any stage, without waiting for analysis to be complete. However, a workaround documented early in problem control should be reviewed and improved once problem analysis has been completed.
    • An effective incident workaround can become a permanent way of dealing with a problem when resolving the problem is not viable or cost-effective. In that case, the problem remains in the known error status, and the documented workaround is applied whenever related incidents occur.
    • Every documented workaround should include a clear definition of the symptoms and context to which it applies. Workarounds may be automated for greater efficiency and faster application.
  • Error Control: Error control activities manage known errors and may enable the identification of potential permanent solutions. Where a permanent solution requires change control, it has to be analyzed in terms of cost, risk, and benefits. Error control also regularly reassesses the status of known errors that have not been resolved, taking into account the overall impact on customers and/or service availability, the cost of permanent resolutions, and the effectiveness of workarounds. The effectiveness of a workaround should be evaluated each time it is used, as the assessment may show how the workaround can be improved.
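
Two of the activities above, detecting recurring issues in incident records and prioritizing problems by risk, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method given by ITIL: the function names, the (incident_id, service) record shape, the recurrence threshold of 3, the 1 to 5 rating scale, and the use of probability multiplied by impact as the risk score are all hypothetical choices.

```python
from collections import Counter
from typing import Dict, List, Tuple


def find_recurring(incidents: List[Tuple[str, str]], threshold: int = 3) -> Dict[str, int]:
    """Return services whose incident count reaches the (assumed) threshold.

    Each incident is an (incident_id, affected_service) pair.
    """
    counts = Counter(service for _, service in incidents)
    return {service: n for service, n in counts.items() if n >= threshold}


def risk_score(probability: int, impact: int) -> int:
    """Score risk as probability x impact, each rated on an assumed 1-5 scale."""
    if not (1 <= probability <= 5 and 1 <= impact <= 5):
        raise ValueError("probability and impact must be between 1 and 5")
    return probability * impact


def prioritize(problems: List[dict]) -> List[dict]:
    """Order problem records so the highest-risk problems come first."""
    return sorted(
        problems,
        key=lambda p: risk_score(p["probability"], p["impact"]),
        reverse=True,
    )
```

A service flagged by find_recurring would be a candidate for a new problem record, and prioritize then decides which open problems receive attention first. In practice the grouping key could be a configuration item, error code, or category instead of a service name, and the scoring scheme would follow the organization's own risk matrix.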

References

  1. "Definition - What Does Problem Management Mean in ITIL?", Cherwell.
  2. "The 3 Phases of Problem Management", BMC.