Lessons learnt from recent emergencies and blackout incidents

October 15th, 2015, Published in Articles: Energize


Large disturbances which interrupt supply to customers over wide areas occur from time to time despite best practices and efforts in planning and operating the power grid. These disturbances are usually unpredictable, and they often result in interruptions to customers and damage to power system infrastructure, followed by a lengthy restoration process, with serious social impact and loss of business productivity.

In recent years, Large Disturbance Workshops were held during the Cigré Paris Sessions to share information on causes and impacts of large disturbances, and a transmission system operator’s (TSO’s) experience in managing the disturbances. Owing to the limited time allocated to each event, the information is often presented at a very high level.

To investigate the main causes of large disturbances in more depth and to develop lessons learnt in greater detail, a separate, focused exercise is needed to analyse these events fully, with the objective of identifying trends and developing possible remedies. Working Group (WG) C2.21 has completed a series of event analyses to pursue this objective.

This article presents the key findings of the working group, focusing on the main causes and possible remedies to prevent the occurrence or mitigate and minimise the impact of large disturbances, and an assessment of control centres’ performance in managing these disturbances.

Selection of events for analysis

On a global basis, system disturbances of various scales occur quite frequently, so it is neither practical nor beneficial to analyse all events. A set of criteria that selects the events which have a wide-area impact and which would require prompt action by the TSOs focuses the work on the activities most relevant to the objectives set out in the group’s terms of reference. The criteria used for filtering are:

Emergency state events

For the purpose of this study a system is deemed to be in an emergency state if any of the following circumstances (events) arise:

  • Unplanned and/or unforeseeable widespread automatic disconnection of load due to a capacity shortage or transmission facility failure.
  • Widespread disconnection of load executed by, or under the instruction of, the system operator or under-frequency load shedding program in order to maintain system stability and security.
  • Extreme weather events that have caused, or are likely to cause, the system to enter a non-secure state.
  • An actual or anticipated emergency state deemed necessary or declared by system operators not involving load disconnection.

Near-miss events

A near miss is classified as an event or period of time where the system was operated under any of the following conditions but did not enter an emergency state as defined above:

  • The system operated outside of the defined security standard, including voltage maximums and minimums. For example, if the system is designed to operate securely under N-2, then the loss of three components (i.e., beyond planning and operating criteria) which does not result in the system entering an emergency state is classified as a near miss.
  • An event which results in the cascade tripping of plant caused by the incorrect operation of protection devices.
  • An event which results in sustained oscillation or the cascade tripping of multiple generators but does not result in the system entering an emergency state.
  • An over-frequency event which results in the tripping of one or more generators.
  • An under-frequency event which results in the additional tripping of one or more generators.
  • A minimum load event where the system operator had to order a generator to carry out a rapid plant shut-down in order to maintain overall system security.
  • An unplanned or unexpected system split which results in two stable operating systems (not caused by the deliberate action of a special protection scheme designed to split the system).
  • An event which results in the cascade tripping of transmission lines (by overload or by loss of angular stability) caused by the incorrect or unintended operation of a special protection scheme without unexpected disconnection of load.
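
The two-step selection described above (emergency state first, near miss only if no emergency state was entered) can be sketched as a small classifier. This is only an illustrative sketch: the field names and the `classify` function are hypothetical shorthand for the listed criteria, not part of the working group's event analysis template.

```python
from dataclasses import dataclass


@dataclass
class EventFlags:
    """Hypothetical yes/no observations about one disturbance."""
    # Emergency-state circumstances
    automatic_widespread_load_loss: bool = False
    operator_or_ufls_load_shedding: bool = False
    extreme_weather_threatens_security: bool = False
    emergency_declared_without_load_loss: bool = False
    # Near-miss conditions
    outside_security_standard: bool = False
    cascade_trip_from_protection_misoperation: bool = False
    sustained_oscillation_or_generator_cascade: bool = False
    frequency_event_tripped_generators: bool = False
    rapid_shutdown_at_minimum_load: bool = False
    unplanned_system_split: bool = False
    line_cascade_from_sps_misoperation: bool = False


EMERGENCY_FIELDS = (
    "automatic_widespread_load_loss",
    "operator_or_ufls_load_shedding",
    "extreme_weather_threatens_security",
    "emergency_declared_without_load_loss",
)
NEAR_MISS_FIELDS = (
    "outside_security_standard",
    "cascade_trip_from_protection_misoperation",
    "sustained_oscillation_or_generator_cascade",
    "frequency_event_tripped_generators",
    "rapid_shutdown_at_minimum_load",
    "unplanned_system_split",
    "line_cascade_from_sps_misoperation",
)


def classify(event: EventFlags) -> str:
    """Emergency state takes precedence; a near miss requires that none applied."""
    if any(getattr(event, name) for name in EMERGENCY_FIELDS):
        return "emergency state"
    if any(getattr(event, name) for name in NEAR_MISS_FIELDS):
        return "near miss"
    return "not selected"


print(classify(EventFlags(unplanned_system_split=True)))
```

In practice each flag would need a judgement by the analyst (for example, what counts as "widespread"), which the sketch does not attempt to model.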

Grouping of main causes in categories

At the outset, there appeared to be quite a large number of possible causes of events. To help identify the trends and main causes, and to develop key measures to prevent, mitigate and minimise the impacts of events, it was necessary to group possible causes into a manageable set of categories. This grouping exercise resulted in the following eleven categories:

  • Primary equipment failure
  • Design and application error
  • Secondary equipment failure
  • Communication and/or control system failures
  • Natural phenomena – beyond those designed for
  • Operator errors
  • Errors in maintenance
  • Security related
  • Inadequate investment and complexity of systems (lack of mitigating measures)
  • Excessive risk taking and/or inappropriate risk management
  • Others (not covered by the above, to be specified)

The group recognised that it was necessary to analyse a rather large number of events in order to identify common patterns and to propose key measures to mitigate them. Hence, it was necessary that the analysis tasks be shared among working group members. To achieve some degree of consistency among members in reporting the impacts and assessing the main causes of the events, the group developed an event analysis template for use by members assigned to the event analysis tasks.

Disturbances and near-misses analysed, and their main causes

Applying the established criteria for event analysis, the working group reviewed and identified a number of recent events for inclusion in the analysis task. The list of candidates included a number of events presented at the Cigré Large Disturbance Workshop in 2008 (Group A) and 2010 (Group B), and several others which were either of great significance or those that occurred during the course of the group’s analysis task.

Table 1: Summary of events analysed.
Event | Area affected | Number of customers affected | Main causes | Restoration time
UCTE system disturbances, 4 November 2006 | 3,4-million km² | 15-million | Design/application error; communication failure; operator error | 90 minutes
Large disturbance in Tepco, Japan, 16 July 2007 | N/A | None | Natural phenomena | N/A
Inadequate reserve margin, South Africa, 26 February 2008 | Entire country | All | Inadequate investment | N/A
Large disturbance in Florida, USA, 26 February 2008 | Large part of Florida | Not available | Errors in maintenance | Not available
Frequency excursion, UK, 27 May 2008 | Not available | 600 000 | Lack of co-ordination between transmission and distribution grid codes on generator ride-through requirements | 40 minutes
Cyclone Klaus struck France and Spain, 24 January 2009 | 380 000 km² | 2-million | Natural phenomena | Up to 5 days
Brazilian system blackout, 10 November 2009 | 3,1-million km² | 40-million | Natural phenomena | Up to 237 minutes
Storm in Portugal, 23 December 2009 | 2807 km² | 350 000 | Natural phenomena | Up to 132 hours
Snow/ice storm in Northern Ireland, 30/31 March 2010 | 13 843 km² | 138 000 | Natural phenomena | Up to 18 hours
Brazilian system disturbance, 21 January 2002 | 3,1-million km² | 40-million | Design/application error | Up to 270 minutes
Northeastern USA/Canada blackout, 14 August 2003 | Multiple states in US and province of Ontario in Canada | 50-million | Design/application error; primary equipment failure; secondary equipment failure; communication failure; operator error; inadequate diagnostic support | 30 hours for the majority of the affected load
Downtown Toronto, Canada, blackout, 15 March 2010 | Not available | 240 000 | Primary equipment failure | Up to 215 minutes
Australian bush fire disturbance, 28 January to 5 February 2009 (multiple executions of load shedding) | 500 000 km² | Up to 1-million | Natural phenomena; operator error; lack of load forecast for extreme weather; organising work in control room during extreme events | From less than one hour to up to four hours
Brazilian North-east blackout, 4 February 2011 | 1,2-million km² | 40-million | Design/application error; secondary equipment failure | Average 194 minutes
System oscillations in Central Europe, 19 to 24 February 2011 | N/A | N/A | Not identified | System was intact; oscillation lasted 15 minutes
Southwest USA, 8 September 2011 | Not available | 2,7-million | Communication failure; operator error; inadequate investment | Up to 900 minutes
Blackout in Argentina’s western area, 4 April 2012 | 250 000 km² | 6,5-million | Natural phenomena | Average 1,5 hours, up to 14 days
Large disturbance in East Japan, 11 March 2011 | Not available | 4-million | Natural phenomena | UFLS operations plus rolling blackouts for ten out of next 17 days

Control centres’ performance

One of the objectives of this task is to assess the performance of the TSO control centres in managing the disturbances. The event analyses suggest that for the majority of the events studied, the control centres’ performance appeared to meet general expectations, and their tools, training and response procedures appeared to be adequate. The ability of the control centres to respond to and manage system disturbances was assessed to be appropriate.

In some cases, early indication had been provided of potential threats, such as a cyclone or bushfire, but the extent of the event was not predicted, and control centres could have been better prepared for the forthcoming interruptions if sufficient information had been provided. A lack of adequate real-time situational awareness was prevalent in a number of incidents, though not all. Utilities which were part of large interconnections were generally more aware of the total system status, although there were some incidents where there was a lack of awareness of occurrences on neighbouring networks.

A better level of coordination between TSOs and sometimes DSOs could have improved response and restoration time. In some cases, a lack of knowledge by the operators of “behind the scenes” work, such as protection settings philosophies and grid code requirements for embedded generators, contributed marginally to a misunderstanding of what could occur. What is adequate for normal operating conditions may not always be appropriate for extreme conditions and plans should consider these abnormal circumstances.

In one example, the response expected from under-frequency load shedding (UFLS) and from load reduction through voltage reduction was much higher than the response actually obtained, indicating that more work is needed in the planning stage to give control centres better predictability of these responses. Generally, tools were adequate for the function required, though there were some instances where failure of these tools contributed to the development of the event. Enhanced alarm processing enabled better understanding of some events, while in another case alarm processing was identified as a weakness which could be addressed.

Table 2: Main causes of events analysed. (* Secondary causes are those which by themselves would not have caused the disturbance to occur. However, they added to the severity of the disturbance, or complemented the primary cause sufficiently to result in the occurrence of the disturbance.)
Main cause | Primary | Secondary* | Total contributions
Natural phenomena | 8 | 0 | 8
Communication/control system failure | 4 | 1 | 5
Design and application errors | 4 | 0 | 4
Operator errors | 4 | 0 | 4
Primary equipment failure | 3 | 0 | 3
Insufficient training | 2 | 1 | 3
Secondary equipment failure | 1 | 1 | 2
Inadequate investment and complexity of system | 2 | 0 | 2
Errors in maintenance | 1 | 0 | 1
Excessive risks or inappropriate risk management | 1 | 0 | 1
Security related (sabotage, etc.) | 0 | 0 | 0
Totals | 30 | 3 | 33
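
The arithmetic behind Table 2 can be checked with a short script. The sketch below copies the primary and secondary counts from the table, recomputes the totals and ranks the causes by total contributions. Breaking ties in favour of the cause with more primary contributions is an assumption made here for illustration; the working group's own ranking rule is not stated.

```python
# (primary, secondary) contributions per main cause, copied from Table 2.
CONTRIBUTIONS = {
    "Natural phenomena": (8, 0),
    "Communication/control system failure": (4, 1),
    "Design and application errors": (4, 0),
    "Operator errors": (4, 0),
    "Primary equipment failure": (3, 0),
    "Insufficient training": (2, 1),
    "Secondary equipment failure": (1, 1),
    "Inadequate investment and complexity of system": (2, 0),
    "Errors in maintenance": (1, 0),
    "Excessive risks or inappropriate risk management": (1, 0),
    "Security related (sabotage, etc.)": (0, 0),
}


def totals(table):
    """Return (primary, secondary, overall) sums across all causes."""
    primary = sum(p for p, _ in table.values())
    secondary = sum(s for _, s in table.values())
    return primary, secondary, primary + secondary


def ranked(table):
    """Sort causes by total contributions, then by primary count (assumed tie-break)."""
    return sorted(table, key=lambda c: (-(table[c][0] + table[c][1]), -table[c][0]))


print(totals(CONTRIBUTIONS))      # (30, 3, 33)
print(ranked(CONTRIBUTIONS)[:5])  # the five most common causes
```

With this tie-break, the first five entries are natural phenomena, communication/control system failure, design and application errors, operator errors and primary equipment failure.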

Possible remedial measures

Operations horizon

For the top five most common causes, the group explored possible measures in the operations horizon. Table 3 depicts a list of possible operating and operations planning measures that can be applied to help to prevent or reduce the likelihood of system disturbances.

Planning horizon

The group did not discuss or explore in detail any planning measures because of the work already conducted by WG C1.17. Nevertheless, the group concurs with C1.17’s findings that the need for strengthening the transmission system should be assessed periodically, and that any planning solutions will likely require significant investments in grid infrastructure, control and operational tools as well as learning and education. An approach that may help investors to determine whether the transmission system should be strengthened is to group potential risks into several categories according to their potential impact and probability, such as:

  • Risks with extremely low probability, regardless of level of consequence, for which costly measures may not be justified.
  • Risks with lower consequence and moderate probability for which preventative measures would be taken only if these could be strictly justified.
  • Risks with moderate consequences and moderate probability which require preventative measures to be taken, even if costly.
  • Risks with extreme consequences where probability is not extremely low and could result in major disturbances. Such risks require that preventative measures be taken, even if costly.
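
The grouping above can be read as a simple lookup from probability and consequence to a planning response. The sketch below is one possible reading under stated assumptions: the enum levels and the function name `planning_response` are hypothetical, probabilities above "moderate" are not modelled because the text does not address them, and real assessments would use locally agreed scales rather than these labels.

```python
from enum import Enum


class Probability(Enum):
    EXTREMELY_LOW = "extremely low"
    MODERATE = "moderate"  # assumed to stand for "not extremely low"


class Consequence(Enum):
    LOWER = "lower"
    MODERATE = "moderate"
    EXTREME = "extreme"


def planning_response(probability: Probability, consequence: Consequence) -> str:
    """Map a risk to the treatment suggested by the four risk groups."""
    if probability is Probability.EXTREMELY_LOW:
        # Applies regardless of the level of consequence.
        return "costly measures may not be justified"
    if consequence is Consequence.LOWER:
        return "preventative measures only if strictly justified"
    # Moderate or extreme consequence with a probability that is not extremely low.
    return "preventative measures required, even if costly"


print(planning_response(Probability.MODERATE, Consequence.EXTREME))
```
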
Table 3: Possible mitigating measures in operations horizon.
Main cause | Measures to prevent/reduce the likelihood of occurrence | Measures to minimise or reduce impacts of the disturbance

Natural phenomena
  • None specifically identified
  • Improve forecast (demand, weather, hydrology, bush fire, resource availability, etc.) capability, i.e. use of advanced models
  • Increase resilience of transmission equipment
  • Operate the power system in a more secure mode than normal
  • Pro-active mobilisation of staff
  • Define and enforce minimum requirements for connecting new generators to the grid
  • Better forest/bush fire containment methods

Communication or control system failure
  • Provide full redundancy of EMS, SCADA, voice communication, etc.
  • Provide full capability back-up control centre
  • Perform more rigorous update of EMS database and capability
  • Provide wide-area view and analysis capability to operators, with frequent exchange of system data with other TSOs
  • Deploy resource to enhance observability and controllability
  • Mobilise/dispatch staff to critical stations
  • Setting up emergency control centre and/or staffing backup control centre

Design and application errors
  • Implement a special maintenance plan for vital substations
  • Regular tuning or revision of controllers and protection systems at key substations and generating plants
  • Periodic review of system protection philosophy
  • None specifically identified

Operator errors
  • Provide sufficient and frequent operator training, with aid of simulators
  • Provide state-of-art information and control system such as SCADA/EMS and emerging WAMS/WACS
  • Develop on-line equipment diagnostic systems and environment monitoring information systems
  • Effective use of normal security controls (SCADA/EMS) and security analysis, including overload relief measures
  • Effective use of data from on-line equipment diagnostic systems
  • Effective use of data from on-line environment monitoring information systems (storms, lightning, geomagnetic storms, etc.)
  • Effective use of System Integrity Protection Scheme (SIPS) or Special Protection System (SPS)
  • Help from other control centre team members, from the shift leader in the first place
  • Enhance inter-TSO coordination to help minimise operator errors

Primary equipment failure
  • New investments
  • Strengthen maintenance programme
  • Strengthen connection requirements to meet certain level of reliability performance
  • Develop measurement systems for electric and energy signals and values which would otherwise not have been available, to enable better monitoring of processes and of equipment
  • Installing Special Protection System as a “safety net” to contain potential cascading
  • Carry adequate replacement equipment inventory to minimise down time

Investment costs and extent of consequence may vary from one country or region to another. Therefore, there is no universally applicable “rule of thumb” that may suit all cases. Approaches should be refined to suit the local situation, with due consideration to the social and economic aspects of the individual transmission system involved.


The analysis indicates that among the disturbance events analysed, the five most common causes for system disturbances were:

  • Natural phenomena (8)
  • Communication and/or control system failure (including lack of or inadequate situation awareness) (5)
  • Design and application error (4)
  • Operator error (4)
  • Primary equipment failure (3)

For each of these causes, the group developed a list of possible remedies which may help to prevent or reduce the likelihood of an occurrence, and to minimise or reduce the adverse impact on the power system due to these disturbances.

Another key observation made during the group’s analysis task is that in the disturbance reports which are publicly available, there is little, if any, detailed description of the restoration process. To minimise the impacts of large disturbances, it is imperative that the infrastructure and supply to customers be restored as expeditiously as possible. The restoration process thus plays an important role in meeting this objective. To achieve expeditious system and load restoration, improvements can be made through the sharing of restoration experiences, processes and procedures among TSOs.

In view of recent developments in certain areas of power system planning and operations, and of electricity market evolution and changes, the working group assesses that the increase in renewable energy resources, distributed generation and other trends in a modern smart grid can pose additional challenges to the safe and reliable operation of a network. To enhance the capabilities of the grid to cope with some of these challenges, significant investment in the modernisation and upgrade of traditional infrastructure is needed to reduce and minimise the potential adverse impacts of these new challenges, whose mismanagement may result in large disturbances or near misses.

It would be beneficial for Study Committee C2 (through one of its working groups) to continue to review large disturbances and near misses, to keep adding to the database of lessons learnt that may help to prevent the occurrence or to reduce or minimise the impact of such events.


This article was published in Electra, April 2015 and is republished here with permission.

Contact Rob Stephen, Cigré, Tel 031 563-0063, rob.stephen@eskom.co.za
