Module 5: Proactive Processes

Problem Management Overview

The goal of problem management is to minimise both the number and severity of incidents and problems in your school. It should aim to reduce the adverse impact of incidents and problems that are caused by errors within the ICT infrastructure, and to prevent recurrence of incidents related to these errors.

It is important to note that incident management works best with a service desk or single point of contact. While some of the methods employed within incident management will be beneficial even if you do not have a service desk, the full benefits will only be seen by the use of a single point of contact and central management.

What is Problem Management?

Problem management is the process of repairing hardware and software faults following the implementation of work-around solutions. In this respect it is closely related to FITS Incident Management. It is also concerned with the prevention of incidents through network monitoring and preventative maintenance (see FITS availability and capacity management).


You should address problems in priority order, starting with those problems that can cause serious disruption. The degree of management and planning required is greater than that needed for incident control, where the objective is restoration of normal service as quickly as possible. The function of problem management is to ensure that incident information is documented in such a way that it is readily available to all technical support staff.
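Working through problems in priority order can be sketched as a simple sort. This is a minimal illustration only: the three-level impact scale, the `Problem` fields, the reference numbers and the ratings given to each example are assumptions, not part of FITS.

```python
from dataclasses import dataclass

# Assumed impact scale: a higher rank means more disruption to the school.
IMPACT_RANK = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Problem:
    reference: str       # hypothetical reference number
    summary: str
    impact: str          # "low", "medium" or "high" (assumed scale)
    users_affected: int


def in_priority_order(problems):
    """Return problems sorted with the most disruptive first."""
    return sorted(
        problems,
        key=lambda p: (IMPACT_RANK[p.impact], p.users_affected),
        reverse=True,
    )


problems = [
    Problem("PRB-001", "Printer driver reinstalled at each logon", "low", 30),
    Problem("PRB-002", "Network card creating unnecessary traffic", "high", 200),
    Problem("PRB-003", "Erratic disk space usage", "medium", 0),
]
ordered = [p.reference for p in in_priority_order(problems)]
print(ordered)
```

Sorting on impact first and number of users second is one possible choice; a school may prefer to weight other factors, such as how long a service has been unavailable.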

Problem management has reactive and proactive aspects:

  • Reactive: solving problems when one or more incidents occur.
  • Proactive: identifying and solving problems and known errors before incidents occur in the first place.

Problem management therefore includes:

  • problem control, which includes advice on the best work-around available for that problem
  • error control.

Problem management is distinct from incident management in that it can be carried out over time to ensure that a fully considered solution is implemented. Incident management focuses on fast, often temporary, solutions to satisfy end-users' immediate needs. Problem management, by definition, means a change in some way – whether it’s a hardware component replacement, a software patch or full software upgrade. A problem, therefore, will always need to have an associated request for change approved before it can be resolved (see FITS change management for further information).

Example

An end-user reports that their computer has 'frozen' and they are unable to continue working. Investigation quickly establishes that the amount of memory in the machine is insufficient to support the number of applications currently open. This incident is resolved by closing all applications and restarting the machine, to release the maximum amount of memory available.

The user is advised to keep the number of applications open to a minimum in the short term and normal service is resumed. To prevent recurrence in the long-term, however, the underlying problem of insufficient memory must be addressed. This problem is recorded separately and a memory upgrade is ordered and installed using a request for change form, which is approved by the appropriate authority.
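The example above can be sketched as two linked records: the problem is recorded separately from the incident, and it cannot be resolved until its request for change has been approved. This is a hedged illustration; the class names, fields and reference numbers are hypothetical and do not come from the FITS templates.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RequestForChange:
    description: str
    approved: bool = False   # set by the appropriate authority


@dataclass
class ProblemRecord:
    reference: str                                      # hypothetical numbering
    description: str
    incident_refs: list = field(default_factory=list)   # linked incidents
    change: Optional[RequestForChange] = None
    status: str = "open"

    def resolve(self):
        # A problem needs an approved request for change before it is resolved.
        if self.change is None or not self.change.approved:
            raise ValueError("an approved request for change is required")
        self.status = "resolved"


problem = ProblemRecord(
    "PRB-010", "Insufficient memory in end-user's computer",
    incident_refs=["INC-104"],
)
problem.change = RequestForChange("Order and install memory upgrade")

try:
    problem.resolve()            # refused: the change is not yet approved
except ValueError as exc:
    blocked = str(exc)

problem.change.approved = True   # approval given
problem.resolve()
print(blocked, "->", problem.status)
```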

All schools should have a process to deal with major incidents – for example, a server crash, a virus attack or an unexplained loss of speed in the network. If you would like to manage your approach to major incidents, you should also consider introducing Problem Management at your school.

Most organisations, including schools, need to keep records of how well their ICT systems are functioning, what is failing and how long systems are unavailable. The information you will gain from problem management should enable you to report to the school on the technical issues that create incidents and problems. To provide your school with an effective approach to its technical support, you should always implement problem management alongside incident management.

The difference between incidents and problems

An incident is where an error occurs and something doesn't work the way it is expected to. It may be referred to as:

  • a fault
  • an error
  • a problem.

The term used in FITS, however, is 'incident'.

A problem can be:

  • the occurrence of the same incident many times
  • an incident that affects many users
  • the result of network diagnostics revealing that some systems are not operating in the expected way.

Therefore a problem can exist without having immediate impact on the users, whereas incidents are usually more visible and the impact on the user is more immediate.
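The first two criteria can be sketched as a check over an incident log. This is a minimal illustration, not part of FITS: the thresholds, the `(fault, user)` log format and the function name `problem_candidates` are all assumptions.

```python
from collections import Counter, defaultdict

# Assumed thresholds; each school would choose its own.
REPEAT_THRESHOLD = 3   # same incident reported this many times
USERS_THRESHOLD = 5    # incident affecting this many different users


def problem_candidates(incidents):
    """Map each fault that looks like a problem to the reasons why."""
    counts = Counter(fault for fault, _user in incidents)
    users = defaultdict(set)
    for fault, user in incidents:
        users[fault].add(user)

    candidates = {}
    for fault, count in counts.items():
        reasons = []
        if count >= REPEAT_THRESHOLD:
            reasons.append("repeated incident")
        if len(users[fault]) >= USERS_THRESHOLD:
            reasons.append("affects many users")
        if reasons:
            candidates[fault] = reasons
    return candidates


# Hypothetical incident log of (fault, user) pairs.
incidents = (
    [("application crash", "user1")] * 3
    + [("slow network", f"user{n}") for n in range(1, 6)]
    + [("frozen computer", "user2")]
)
candidates = problem_candidates(incidents)
print(candidates)
```

The single 'frozen computer' report stays an incident, while the other two faults are flagged for problem management. The third criterion, findings from network diagnostics, would need monitoring data and is not covered by this sketch.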

Error control

Error control covers the processes involved in the successful correction of known errors. The objective is to remove equipment with known errors that affects the ICT infrastructure in order to prevent the recurrence of incidents. Error control activities can be both reactive and proactive.

Reactive activities include:

  • identifying known errors through incident management
  • implementing a work-around.

Proactive activities include:

  • finding a solution to a recurring problem
  • creating a solution
  • including the solution in the database of known errors.

Examples of problems

Technical problems can exist without impact to the user. However, if they are not spotted and dealt with before an incident occurs they can have a major impact on the availability of the computer service.

User-experienced problems

  • The printer will not form-feed paper, so users have to advance the paper by using the form-feed button.
  • Each time a new user logs onto a computer, they have to reinstall the printer driver.
  • Windows applications crash intermittently without an error message. The computer will restart and work properly afterwards.

Technical problems

  • Disk space usage is erratic. Sometimes there appears to be plenty of disk space but at other times not much is available. There is no obvious reason and no impact on the users – yet!
  • A network card is creating a high level of unnecessary traffic on the network. This could eventually reduce the bandwidth available, which would lead to a slow response to network requests.

Differences between incident management and problem management

  • The aim of incident management is to restore the service to the customer as quickly as possible, often through a work-around, rather than through trying to find a permanent solution.
  • The main goal of problem management is the detection of the underlying causes of an incident and the best resolution and prevention.

In many situations the goals of problem management can be in direct conflict with the goals of incident management. Deciding which approach to take requires careful consideration, and can call for self-discipline, as the need to implement a permanent fix for an incident is always likely to prevail. A sensible approach is to restore the service as quickly as possible (incident management) while ensuring that all details are recorded, so that problem management can continue once a work-around has been implemented.

Why use problem management?

Problem management is a proactive process. The more proactive technical support is, the fewer incidents should occur. Some of the benefits of problem management are that:

  • incidents do not recur because underlying problems are resolved
  • incidents can be prevented through pre-emptive work
  • overall availability of ICT services can be improved, through the elimination of unreliable equipment identified through trend analysis
  • technical support can plan its work more effectively and stick to planned schedules, because the number of unpredictable incidents has been reduced
  • end-users can rely on the system and equipment being available
  • technical support can be more proactive.

What happens if you don't use problem management?

Without problem management, you may observe that your school:

  • faces up to problems only after the service to users has already been disrupted
  • loses faith in the quality of its technical support, with high costs and low motivation for both users and technicians, since similar incidents have to be resolved repeatedly without anyone able to provide permanent solutions.

Recording problem management information

One function of problem management is to ensure the documenting of incident information in such a way that it is readily available to service desk staff and technicians. The information should be recorded so that it is easily referenced by simple and detectable triggers from new incidents.
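One way to picture 'simple and detectable triggers' is a set of keywords stored with each known error and compared with the description of a new incident. The sketch below assumes that approach; the record layout, reference numbers and trigger words are hypothetical, and the work-arounds are taken from the user-experienced problems listed earlier.

```python
import re

# Hypothetical known-error records with trigger keywords.
KNOWN_ERRORS = [
    {
        "reference": "KE-001",
        "triggers": {"printer", "form-feed"},
        "workaround": "Advance the paper using the form-feed button.",
    },
    {
        "reference": "KE-002",
        "triggers": {"printer", "driver"},
        "workaround": "Reinstall the printer driver after logging on.",
    },
]


def match_known_errors(description, known_errors=KNOWN_ERRORS):
    """Return known errors whose trigger words all appear in the description."""
    words = set(re.findall(r"[a-z0-9-]+", description.lower()))
    return [error for error in known_errors if error["triggers"] <= words]


matches = match_known_errors("Printer asks for a driver each time I log on")
for error in matches:
    print(error["reference"], "-", error["workaround"])
```

Keyword matching of this kind is deliberately crude. It shows why consistent wording in incident records matters: a trigger can only be detected if the service desk describes similar incidents in similar terms.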

If you are unable to determine accurately the impact on the school of incidents and problems, you will not be in a position to give critical incidents and problems the correct priority.

It is important that you review your process for recording incidents and problems, so you can implement problem management. Unless you have a good incident control process, you will not have the detailed historical data on incidents which will help you identify that they have become problems.

If you do not set aside time to build and update the call log or incident sheets, you will find it harder to see the bigger picture on the network and to spot trends that may point towards an underlying problem. All incident reports must come through the service desk and not go directly to the technician. Difficulties will arise if the service desk is dealing with multiple reports of incidents and the technician is not fully aware of the extent of the problem.

Problem management also needs to be closely co-ordinated with incident management. Failure to link incident records with problem/error records will mean you are unlikely to gain many of the potential benefits. This is a key feature in moving from reactive support to a more planned and proactive support approach.

    How does problem management work?

    Problem management works by taking inputs from the incident management process and the availability and capacity management process when problems have been identified that require further investigation, physical repair or change. A standard form is used to record problems, so that consistent information is gathered each time. A log is kept centrally so that problems can be tracked through their progress.
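The idea of a standard form feeding a central log can be sketched as follows. This is an illustration under stated assumptions: the required fields, status names and reference numbers are hypothetical and are not taken from the FITS problem form.

```python
# Assumed minimum fields on the standard problem form.
REQUIRED_FIELDS = ("reference", "description", "source")


class ProblemLog:
    """A central log that accepts only completed forms and tracks progress."""

    def __init__(self):
        self._records = {}

    def record(self, form):
        # Reject incomplete forms so that every record is consistent.
        missing = [name for name in REQUIRED_FIELDS if not form.get(name)]
        if missing:
            raise ValueError("form incomplete, missing: " + ", ".join(missing))
        self._records[form["reference"]] = {**form, "status": "open"}

    def update_status(self, reference, status):
        self._records[reference]["status"] = status

    def with_status(self, status):
        return sorted(
            ref for ref, rec in self._records.items() if rec["status"] == status
        )


log = ProblemLog()
log.record({
    "reference": "PRB-001",
    "description": "Erratic disk space usage",
    "source": "availability and capacity management",
})
log.record({
    "reference": "PRB-002",
    "description": "Intermittent application crashes",
    "source": "incident management",
})
log.update_status("PRB-002", "under investigation")

try:
    log.record({"reference": "PRB-003", "description": ""})
except ValueError as exc:
    rejected = str(exc)

print(log.with_status("open"), log.with_status("under investigation"))
```

Keeping the status on the central record, rather than in a technician's own notes, is what allows the service desk to track each problem through its progress.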

    The steps for recording and resolving a problem and reporting problem trends are very similar to the incident management steps, and the service desk can be used to co-ordinate them.

    The problem management implementation guide takes you through each step, as illustrated in the flowchart below.

    Summary of the Problem Management process

    Inputs to Problem Management
    Inputs to the Problem Management process are:

    • incident details from the Incident Management process
    • configuration details from the configuration-management database
    • details about changes made to the part of the network with the problem
    • any defined work-arounds (from incident management).

    Output from Problem Management
    Outputs from the Problem Management process are:

    • known errors
    • requests for change (through change management)
    • an updated problem record (including a solution and/or any available work-arounds)
    • for a resolved problem, a closed problem record
    • knowledge base content to use in incident management
    • management information through reports.

    Activities of Problem Management
    The major activities of Problem Management are:

    • problem control
    • error control
    • the proactive prevention of problems
    • identifying trends
    • obtaining management information from problem management data
    • the completion of major incident or problem reviews.

    Roles and responsibilities in Problem Management

    • Service desk to note on the incident sheet that the problem has been passed to problem management
    • Service desk to log, monitor and track the progress of the problem
    • Service desk or technician to spot trends
    • Technician support to action problems raised from incident management
    • Technician support to progress unresolved incidents through the problem management process
    • Technician assisting with the handling of major incidents and identifying the root causes
    • Technician preventing the replication of problems across multiple systems
    • Any additional first-line support groups, such as configuration management or change management specialists to be consulted
    • Second-line and third-line support groups, including specialist support groups and external suppliers
    • User to keep the service desk informed of any further changes to the state of the affected equipment (sometimes computers start working again when different incidents are resolved)

    Additional functions that form part of Problem Management

    • Developing and maintaining the problem control process
    • Reviewing the efficiency and effectiveness of the problem control process
    • Producing management information
    • Allocating resources for the support effort
    • Monitoring the effectiveness of error control and making recommendations for improving it
    • Developing and maintaining problem and error control systems
    • Reviewing the efficiency and effectiveness of proactive Problem Management activities  
    • The scale depends on the time required for the Problem Management process.

    Problem Management life cycle

    What does problem management cost?

    As with incident management, the cost of problem management overlaps with the costs of FITS Service Desk. The problem management costs already covered there are:

    • the cost of a problem management tool
    • the cost of staff to maintain the problem log.

    In addition to these is the cost of technical staff required to identify, log, diagnose and resolve problems and the time they spend doing so. How much time this takes will depend on the volume of problems in your school.

    As with FITS Service Desk and FITS Incident Management, we recommend that you begin your problem management process using the tools we have provided. The implementation guide refers to the templates as you need them, and we have also grouped them together in the Toolkit.

    When you are ready to automate your problem and incident logging (see FITS Incident Management), you can choose from a variety of tools ranging from the cost-free to the very expensive. At the expensive end of the scale, expect products to be suitable for global commercial organisations and probably more than you need.

    The time needed for the implementation of problem management can be estimated from the table of activities below.

    | Activity | Example | Further Information |
    |---|---|---|
    | Preparing for implementation | Discussions, planning | Problem management implementation guide |
    | Implementation | Training, pilot, actual implementation | Problem management implementation guide |
    | Review of implementation | Difficulties with process or roles | Problem management implementation guide and process review |
    | Identifying and logging problems | Diagnosing the cause of incidents, identifying problems detected during network monitoring, preventative maintenance and trend analysis, creating problem records | Problem management implementation guide |
    | Maintaining problem records | Updating the problem log | Problem management implementation guide |
    | Resolving problems | Investigating, diagnosing and implementing solutions to problems, preparing request for change forms (see FITS Change Management) | Problem management operations guide |
    | Continuous improvement | Monitoring effectiveness of process, improving efficiency | Problem management continuous improvement |

    Some content on this website is provided under the provisions of the Open Government License. 
    All other content including, but not restricted to, website design, images logos, etc. 

    Copyright © 2020 - FITSEd. All Rights Reserved
