A Clustering Algorithm for Power Outage Detection

To read this blog in French, visit blog.nline.io/clustering-1-fr

By Margaret Odero, Data Analyst at nLine and Mohini Bariya, Energy Research Scientist at nLine

📢

Questions: nLine installs power sensors at outlets in homes, small businesses, and social infrastructure. How do we estimate the extent of a grid outage from individual sensor reports? And in the real world, where sensors can be unplugged or prepaid credit can run out, how do we separate real grid outages from false outage reports? Answer: Cluster outage reports using DBSCAN

Introduction

Measuring and improving the reliability of electricity grids across the world is a key part of nLine’s mission, and detecting power outages plays a critical role in both. Key grid metrics such as the System Average Interruption Duration Index (SAIDI) and the System Average Interruption Frequency Index (SAIFI), defined respectively as the total duration and the number of outages experienced on average by each customer in a region over a reporting period, require outage information. Grid improvement often starts by identifying the location and extent of grid outages, which can indicate which piece of infrastructure, at which level of the grid, may be responsible.
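As a rough illustration of how these two metrics follow from outage information, the sketch below computes SAIDI and SAIFI from a list of outages. It assumes each outage is summarized by its duration and the number of customers affected; the `Outage` record, its field names, and the numbers are hypothetical and are not nLine's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    duration_hours: float    # how long the outage lasted
    customers_affected: int  # number of customers who lost power


def saidi(outages, total_customers):
    """Average outage duration per customer served (hours per customer)."""
    customer_hours = sum(o.duration_hours * o.customers_affected for o in outages)
    return customer_hours / total_customers


def saifi(outages, total_customers):
    """Average number of interruptions per customer served."""
    return sum(o.customers_affected for o in outages) / total_customers


# Hypothetical reporting period: two outages in a region of 1,000 customers.
outages = [Outage(2.0, 50), Outage(0.5, 200)]
total_customers = 1000

saidi_value = saidi(outages, total_customers)  # (2.0*50 + 0.5*200) / 1000
saifi_value = saifi(outages, total_customers)  # (50 + 200) / 1000
```

Both metrics divide by all customers served, not only those affected, which is why an accurate list of real outages matters more than any single sensor report.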

To detect outages, nLine’s PowerWatch sensor is plugged into a power outlet at a home, business, or social infrastructure. The sensor records an outage report each time it loses power. However, these losses of power are not always caused by grid outages. A customer may have run out of prepaid credit, or the sensor may have been unplugged. To compute SAIDI and SAIFI from this dataset of outage reports, it is critical that we identify true grid outages and remove the “false” outage reports caused by customer unplugs or meter run-outs. We also group the true individual sensor outage reports into coherent, localized outages. For any sensor that detects the power state of an individual home, business, or piece of electrical infrastructure — from smart meters to PowerWatch sensors — aggregating individual power outage reports into outage events is a key challenge. nLine’s clustering algorithm transforms noisy, individual outage reports into coherent outage events to serve both of these needs.

In this post, we discuss how our clustering algorithm identifies grid outages and provide examples of outages we have encountered in real data. We also describe our plans for future enhancements to the method.

Using Clustering to Detect Power Outages

What is Clustering?

Clustering is the process of grouping data points into clusters based on their similarity. It is an unsupervised machine learning (ML) problem because data points are grouped without external category labels. In our case, the data points we wish to cluster are outage reports from individual sensors, and their similarity is based on two factors: the time at which the sensors reported loss of power and the spatial location of the sensors. There are numerous clustering algorithms that vary in their objectives and constraints and are suited to different contexts and data types. We perform clustering using an algorithm based on the DBSCAN method, which we describe in the following section.

Density Based Spatial Clustering of Applications with Noise (DBSCAN)

DBSCAN is our algorithm of choice for grouping individual outage reports into clusters — termed “real outages” — and filtering out false outage reports caused by sensors being unplugged and meter run-outs.

DBSCAN clusters data points based on their density in space and time, allowing data points to remain unclustered if they are not near enough other points, as determined by a distance threshold (the algorithm’s main parameter, used alongside a minimum number of neighboring points). DBSCAN assumes that clusters correspond to regions with a high density of data points separated by regions of lower density. Figure 1 demonstrates how the DBSCAN algorithm clusters a set of random data points.

**Figure 1: DBSCAN in use.** Left: a map of n randomly generated data points before clustering. Right: the data points after clustering based on their x and y positions with DBSCAN using a distance threshold of 1. Data points placed in the same cluster are given the same color. The data points that remain black after clustering are the outliers that do not fit in any of the clusters (i.e., are farther than the distance threshold from every other point). (Figure obtained from GitHub.)
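To make the behavior in Figure 1 concrete, here is a minimal textbook-style sketch of DBSCAN on 2D points in plain Python. It is not nLine's implementation. The function names are our own, and `min_samples` (the minimum number of points, including the point itself, needed within the distance threshold `eps` to seed a cluster) is the second parameter of the standard algorithm.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Made-up points: two dense groups and one isolated point.
points = [(0, 0), (0.5, 0), (1.2, 0), (5, 5), (5.5, 5), (9, 0)]
labels = dbscan(points, eps=1, min_samples=2)
```

Note that the third point is more than 1 unit from the first, yet both land in the same cluster because they are chained together through the second point. This chaining is what lets DBSCAN find clusters of arbitrary shape.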

For our application of identifying true outages, one major advantage of DBSCAN over other popular clustering approaches such as k-means is that we do not need to specify the number of clusters as a parameter to run clustering. This is critical because we do not know how many outages are present in the dataset beforehand; rather, this is what we wish to discover through clustering. To run DBSCAN, we feed in the outage reports — which are our data points to be clustered — along with two parameters: a space threshold and a time threshold. These parameters specify how far apart in space and time the data points can be while still belonging to a single cluster, i.e. how far apart in space and time these outage reports can be while still plausibly being caused by a single power outage. Clustering is performed based on these time and space thresholds.

Another key advantage of DBSCAN is that not all data points have to be assigned to a cluster. If a data point lies outside the threshold boundaries of every cluster, it is labeled a “noise” point. Outage reports that are given this “noise” label are conservatively considered false outages, since sensor unplugs or meter run-outs are unlikely to happen at nearly the same time for two different sensors nearby in space. Thus the process of clustering identifies false outages which can be excluded from SAIDI and SAIFI computations.

Time and Space Thresholds

We mentioned that the DBSCAN algorithm clusters outage reports based on their timestamps and locations. How exactly does this happen?

As described, the data points to be clustered are sensor outage reports. Each data point has two attributes: the time at which the report occurred, and the location of the reporting sensor. We can therefore visualize these data points as scattered on a graph with time on the x-axis and location on the y-axis (Figure 2). In clustering these points, we provide a spatial-temporal threshold pair which specifies the maximum permissible difference in time and distance in space for an outage report to be assigned to a given outage cluster.
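The threshold pair can be thought of as a yes/no test applied to any two reports. The sketch below shows one way to write that test, assuming each report is a `(timestamp, latitude, longitude)` tuple and using the haversine formula for distance. The tuple layout, the coordinates, and the 500 m / 5 minute thresholds are illustrative assumptions, not nLine's actual values.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def are_neighbors(r1, r2, space_threshold_m, time_threshold):
    """True only if two reports are within BOTH the space and time thresholds."""
    (t1, lat1, lon1), (t2, lat2, lon2) = r1, r2
    close_in_time = abs(t1 - t2) <= time_threshold
    close_in_space = haversine_m(lat1, lon1, lat2, lon2) <= space_threshold_m
    return close_in_time and close_in_space


# Hypothetical reports: (timestamp, latitude, longitude).
r1 = (datetime(2021, 6, 2, 21, 0), 5.6000, -0.1900)
r2 = (datetime(2021, 6, 2, 21, 1), 5.6010, -0.1900)  # about 111 m away, 1 min later
r3 = (datetime(2021, 6, 2, 21, 1), 5.6500, -0.1900)  # several km away
r4 = (datetime(2021, 6, 2, 23, 1), 5.6010, -0.1900)  # nearby, but 2 hours later

near = are_neighbors(r1, r2, 500, timedelta(minutes=5))
too_far = are_neighbors(r1, r3, 500, timedelta(minutes=5))
too_late = are_neighbors(r1, r4, 500, timedelta(minutes=5))
```

Requiring both conditions at once, instead of folding time and distance into a single combined metric, keeps each threshold interpretable in its own physical unit.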

Concretely, for outage report R to be grouped in cluster C, there must be at least one outage report within C that occurred within the time and space thresholds of R. This algorithmically captures the intuition that for a set of outage reports to have been caused by a single outage, they should occur within a short time window, and over a contiguous region. DBSCAN operates by iterating over the individual data points, finding their neighbors in space and time, and labeling these with the same cluster label until all points have been visited. In this way, clusters are grown, observing the constraints set by the spatial-temporal threshold. Along the way, some points are found to have no neighbors, as no other points are within the threshold distance from them. These points are assigned a “noise” label.

Figure 2 visualizes a toy example to demonstrate how the clustering algorithm uses the space and time thresholds to group individual sensor outage reports into outage clusters while identifying “false” outage reports that lie beyond the thresholds. In this example, the sensors at houses C, E, and D are spatially close to one another. These sensors go out at around the same time, along with sensor A. However, when clustering these outage reports, only C, E, and D are assigned to a common cluster. A is excluded from this cluster despite the closeness in time of its outage report because its location is beyond the spatial threshold, meaning it is deemed too far away to belong to the same outage as C, E, and D. Later on, sensor B also reports an outage, but it is too distant in time and space from all other reports to be assigned to an outage cluster. It is therefore marked as noise, i.e., a false outage.

**Figure 2: The clustering algorithm uses time and space thresholds to group individual sensor outage reports into clusters considered real outages.** Outage reports from sensors in houses C, E, and D are grouped together as they go out at around the same time and in the same location, forming a real outage. On the other hand, the outage report from A is filtered out as false because it is far from the rest of the sensors. Similarly, the outage report from the sensor in house B is filtered out as it goes out on its own.
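The grouping rule and the toy example of Figure 2 can be sketched in a few lines of Python. This is a simplified illustration and not nLine's code: locations are reduced to a single position along a line, all numbers are made up, and any report with at least one neighbor is allowed to grow a cluster, which behaves like DBSCAN with a minimum cluster size of two.

```python
from collections import deque

NOISE = None


def cluster_reports(reports, time_threshold, space_threshold):
    """Label each report with a cluster id, or NOISE if it has no neighbor.

    reports maps a sensor name to (time, position); units are arbitrary but
    must match the units of the two thresholds.
    """
    names = list(reports)

    def neighbors(a):
        ta, xa = reports[a]
        return [b for b in names if b != a
                and abs(reports[b][0] - ta) <= time_threshold
                and abs(reports[b][1] - xa) <= space_threshold]

    labels = {}
    next_cluster = 0
    for name in names:
        if name in labels:
            continue
        if not neighbors(name):
            labels[name] = NOISE  # no report within both thresholds
            continue
        # Grow a new cluster outward from this report (breadth-first).
        labels[name] = next_cluster
        queue = deque([name])
        while queue:
            current = queue.popleft()
            for other in neighbors(current):
                if other not in labels:
                    labels[other] = next_cluster
                    queue.append(other)
        next_cluster += 1
    return labels


# Hypothetical values: (minutes since start, position in meters).
reports = {
    "A": (0, 5000),   # same time as C/D/E, but far away
    "B": (300, 9000), # far away in both time and space
    "C": (1, 100),
    "D": (2, 250),
    "E": (0, 180),
}
labels = cluster_reports(reports, time_threshold=5, space_threshold=500)
```

With these values, C, D, and E share cluster `0`, while A and B are labeled `None` (noise), mirroring the outcome described for Figure 2.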

Assumptions and Constraints of Clustering

The previous example elucidates the fundamental assumptions and constraints of the clustering approach to outage identification. A grid outage will only be identified if it is observed by at least two sensors within a short time and limited distance of each other. This fundamental constraint informs sensor deployment strategies. Sensors must be deployed with enough density such that a grid outage will impact more than one of them. For example, since we know many outages originate at the distribution transformer, we aim to deploy two or more sensors under individual transformers to ensure that an outage at the transformer will be observed by all sensors under it and correctly identified as a “real” outage. We strive to co-develop deployment strategies with our partners that consider their needs within the assumptions of clustering to ensure we successfully identify significant grid outages.
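A deployment plan can be checked against this constraint with a simple count. The sketch below assumes a mapping from sensor IDs to transformer IDs is available; the mapping, the IDs, and the function name are hypothetical.

```python
from collections import Counter


def under_instrumented(sensor_to_transformer, min_sensors=2):
    """Transformers with fewer than min_sensors sensors deployed under them."""
    counts = Counter(sensor_to_transformer.values())
    return sorted(t for t, n in counts.items() if n < min_sensors)


# Hypothetical deployment: T2 has only one sensor under it.
deployment = {"s1": "T1", "s2": "T1", "s3": "T2", "s4": "T3", "s5": "T3", "s6": "T3"}
flagged = under_instrumented(deployment)
```

An outage at a flagged transformer would produce a single report, which clustering would label as noise. Transformers with no sensors at all do not appear in the mapping, so they would need to be checked against a separate asset list.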

Informed by our desire to capture outages all the way down to the distribution transformer level, we choose a clustering space threshold that we estimate to be the maximum distance between two households/shops connected under the same distribution transformer. One challenge with this choice is that it can lead to the splitting up of large outages affecting an extensive geographical area. We are exploring optimization methods to more dynamically select space thresholds for different contexts.
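One way to turn this choice into a number, assuming sensor positions and their transformer assignments are known, is to take the largest pairwise distance between sensors that share a transformer. The sketch below uses planar coordinates in meters and made-up positions; it illustrates the idea and is not nLine's actual estimation procedure.

```python
import math
from itertools import combinations


def estimate_space_threshold(positions_by_transformer):
    """Largest distance between any two sensors under the same transformer."""
    return max(
        math.dist(p, q)
        for positions in positions_by_transformer.values()
        for p, q in combinations(positions, 2)
    )


# Hypothetical sensor positions (x, y) in meters, grouped by transformer.
positions_by_transformer = {
    "T1": [(0, 0), (300, 400)],
    "T2": [(1000, 0), (1000, 120), (1100, 0)],
}
threshold_m = estimate_space_threshold(positions_by_transformer)
```

Here the widest same-transformer pair is under T1, so the threshold comes out to 500 m. The function raises `ValueError` if no transformer has at least two sensors, since there is then no pair to measure.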

Clustering Examples: Localizing and Identifying Real Outages

The aim of clustering is to identify real grid outages by aggregating similar outage reports into outage clusters while filtering out false outage reports. However, the results of clustering often reveal much more than just differentiating true outages from false reports. In this section, we present examples where outages identified by clustering reveal patterns in grid behavior that could inform interventions.

Determining true outages and filtering out false outage reports

Figure 3 shows maps of outage reports in a section of Accra, Ghana, over a 24-hour period before (a) and after (b) clustering. Clustering identified two outage clusters (colored red and green) in (b) as well as several false outage reports (which remain black after clustering).

**Figure 3: Determining true outages and filtering out false outage reports.** Outage reports before clustering (a) and after clustering (b). The red and green clusters of data points in (b) represent real outages, and the data points that remain black after clustering are unclustered (outliers). These may correspond to “unplugs” or households running out of prepaid credit. These outlier sensor reports are consequently not included in computations of SAIDI and SAIFI.

Repeated outages involving the same set of sensors could signal a problem with a specific distribution transformer.

Outages identified through clustering can help reveal a specific piece of grid equipment that may be causing repeated outages. Figure 4 maps outage reports over another 24-hour period before (a) and after (b) clustering. In Figure 4(b), we see a set of highly overlapping outages, all occurring at different times within the 24-hour period (denoted by the three polygons encompassing the same set of sensors). This could indicate a failing piece of infrastructure, such as a distribution transformer, through which all these sensors are supplied. These clustering results could therefore inform maintenance interventions to fix or replace the problematic equipment.

**Figure 4: Repeated outages involving the same set of sensors could signal a problem with a specific distribution transformer.** Map of clustered outage reports between "2021-06-02 21:00:00" and "2021-06-03 21:00:00" in Accra. The purple, green, and red clusters around the same area represent three outages that occurred at different times but on approximately the same set of sensors. These sensors are likely connected to the same feeder or transformer, which keeps failing, causing outages to be reported repeatedly on the same set of sensors.
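A pattern like the one in Figure 4 could be flagged automatically by comparing the sets of sensors involved in each outage. The sketch below uses Jaccard similarity (shared sensors divided by all sensors involved) with a hypothetical 0.8 cutoff. The outage names, sensor IDs, and cutoff are illustrative assumptions and do not describe nLine's method.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of sensors common to both outages, out of all sensors involved."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def repeated_outages(outages, min_overlap=0.8):
    """Pairs of outage ids whose sensor sets overlap by at least min_overlap."""
    return [
        (i, j)
        for (i, sensors_i), (j, sensors_j) in combinations(outages.items(), 2)
        if jaccard(sensors_i, sensors_j) >= min_overlap
    ]


# Hypothetical clustered outages within one day: outage id -> sensors involved.
outages = {
    "purple": {"s1", "s2", "s3", "s4", "s5"},
    "green": {"s1", "s2", "s3", "s4", "s5"},
    "red": {"s1", "s2", "s3", "s4"},
    "other": {"s8", "s9"},
}
pairs = repeated_outages(outages)
```

Every pair among the purple, green, and red outages passes the cutoff, while the unrelated outage does not, pointing to the equipment shared by sensors s1 through s5 as a candidate for inspection.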

Identifying outage types: LV vs MV/HV outages

The outage reports clustered into an outage indicate its extent, which can provide insight into the nature and root cause of the outage. Highly localized outages suggest a root cause at or under a single distribution transformer. On the other hand, outages that affect a large geographical area encompassing multiple distribution transformers are likely caused by an issue further upstream in the network, such as a fault on MV or HV lines. Therefore, at a minimum, outage extent can help distinguish root causes within distribution infrastructure from those lying in transmission, information that is key for responding to and reporting the outage. Figure 5 shows a set of outages recorded within a 24-hour period in Accra. Some of the outages affect a very localized area, while the outage denoted in green covers a large geographical extent.

**Figure 5: Localized vs large scale outages.** Map of clustered outage reports between "2021-10-01" and "2021-10-02" in Accra. The larger cluster (mapped in green) likely represents an MV/HV outage that affected a large geographical area, while the smaller ones (red, purple, and blue) represent LV outages. Differentiating MV/HV outages from LV outages can help us tell where these outages occur in the grid infrastructure. We may be able to tell whether an outage is due to a transformer failure or to a failure further upstream in the MV/HV lines. We could also tell whether an outage is likely due to a generation/transmission failure or to issues at the distribution level.
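The distinction drawn here can be expressed as a simple rule of thumb: an outage confined to sensors under one distribution transformer is likely an LV issue, while one spanning several transformers is likely upstream. The sketch below assumes a sensor-to-transformer mapping exists; the mapping, IDs, and labels are hypothetical, and a real classification would need more evidence than this single check.

```python
def classify_outage(cluster_sensors, sensor_to_transformer):
    """Rough outage type based on how many transformers the cluster spans."""
    transformers = {sensor_to_transformer[s] for s in cluster_sensors}
    return "likely LV" if len(transformers) == 1 else "likely MV/HV"


# Hypothetical mapping of sensors to the transformers that supply them.
sensor_to_transformer = {"s1": "T1", "s2": "T1", "s3": "T2", "s4": "T3"}

local = classify_outage({"s1", "s2"}, sensor_to_transformer)
widespread = classify_outage({"s1", "s3", "s4"}, sensor_to_transformer)
```

Because the space threshold can split one large outage into several clusters, a check like this would work best after merging clusters that overlap in time.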

Conclusion

For any sensor that detects the power state of an individual home, business, or piece of electrical infrastructure (from smart meters to PowerWatch sensors), aggregating individual power outage reports into coherent outages is a key challenge. When using PowerWatch sensors, we additionally want to filter out false outages caused by customer unplugs or prepaid meters running out, as well as false restores caused by generators. Clustering individual sensor reports based on distance in space and time is a powerful way to accomplish both aggregation and filtering. The premise of clustering is that while "false" outages affect only an individual (measured) customer, grid outages tend to simultaneously affect several (measured) customers, often over a contiguous region. Therefore, clustering aggregates coincident and nearby outage reports to distinguish grid outages from "false" ones, and then to robustly estimate the duration and extent of each grid outage.

The outages identified by clustering are used to estimate two key metrics, SAIDI and SAIFI, which quantify the performance of an electrical grid and inform interventions and investments aimed at improving the reliability of power provided to customers. For example, electricity distribution utilities can use SAIDI and SAIFI data to identify areas with poor reliability and motivate further investigation and resolution of the underlying causes. Investors can also use these numbers to target investments in grid infrastructure. As we have started to explore in this blog, the individual outages and the patterns in outages identified by clustering also hold promising value for detecting and diagnosing grid issues.