*Mohini Bariya, Energy Research Scientist at nLine*

**Knowing the structure of the grid (how lines interconnect and which phases loads are on) is vital for efficient grid maintenance and operations, informing applications ranging from fault localization to phase balancing.** Yet grid structures, especially in distribution, can change over time and are often poorly known. This blog starts to explore how *n*Line’s voltage data could be used to infer grid structure, with a vision toward eventually providing such insight to utilities.

### Introduction

Connecting diverse energy sources to users over long distances, electric grids are complex networks with intricate structures. Especially in the low-voltage distribution system, the grid becomes highly branched and convoluted to deliver electricity to individual homes and businesses.

This connectivity structure of interlinked lines and equipment is termed “**topology**”, and it is a fundamental property of all electric grids. Grid topology informs critical decisions around control, maintenance, optimization, and event response. Yet knowledge of topology is often erroneous, incomplete, or totally absent. Especially in the low-voltage distribution network, a perfect storm of limited sensing and more complex and dynamic structure leads to generally poor topology awareness in grids across the world.

Topology consists of two key components: connectivity and phase (although the two cannot be totally disentangled). To capture the connectivity of the grid, we often reduce the complexity of an electrical network to a *single-line diagram* (Figure 1), where all the wires and equipment are condensed into lines connecting nodes. In reality, the individual lines in the single-line diagram generally consist of three wires in parallel, one per *phase*. The three-phase structure of grids has its origins in the time of Tesla, and provides several performance benefits. Voltage, current, and (consequently) power on the phases are intended to be highly symmetric (or *balanced*), with a 120 degree phase shift between the waveforms. In practice, the phases can be far from balanced, and equipment failures, faults, and other events may affect a subset of the phases. At the very edges of the grid, the phases may be split, so that certain lines have only one or two of the three phase wires. **Overall, it is important to know both the connectivity and phase structure of a grid topology.**

This blog begins to explore how *n*Line’s voltage data, collected by our PowerWatch edge sensors, can be used to elucidate the topology of the grid. It captures the beginnings of a longer-term vision to provide such information to utilities to support and enhance grid operations and maintenance.

#### Topology from Measurements

Sensors measuring grid parameters can give us visibility into the underlying topology of the grid. A large literature looks at using voltage measurements to recover topology. Voltage is an effective quantity for revealing topology since it is *global*: it is affected by current (power) flows throughout the network. This is in contrast to current, which is a local quantity. The impact of a current on a voltage is mediated through the network impedance, which captures connectivity and has an attenuating effect over distance. Therefore, a voltage will be most impacted by nearby current flows. Consequently, voltages at proximal nodes will track each other more closely than those far apart. A small toy example illustrates this below.

**Toy Example**

*This example illustrates how voltage measurements can indicate proximity in a distribution network. It derives some simple voltage equations using Ohm’s Law. If you are already convinced about the efficacy of voltages for this task, feel free to skip this example.*

Consider the small distribution network represented by the one-line diagram in Fig. 2. We have three customers 1, 2, and 3 drawing currents $i_1$, $i_2$, and $i_3$. They are connected by four lines, with impedances $z_1$ through $z_4$. The “root” of the network has a voltage $v_0$; as is common practice, let us assume this voltage always remains constant. Labeling the lines so that $z_1$ connects the root to customer 1, $z_2$ is the line shared by customers 2 and 3, and $z_3$ and $z_4$ are the individual lines to customers 2 and 3, Ohm’s Law gives the voltages observed at our three customers:

$$v_1 = v_0 - z_1 i_1$$

$$v_2 = v_0 - z_2 (i_2 + i_3) - z_3 i_2$$

$$v_3 = v_0 - z_2 (i_2 + i_3) - z_4 i_3$$

Notice that $v_1$, which is furthest away from the other customers, is not impacted at all by the current (and power) drawn by customers 2 and 3: $i_2$ and $i_3$ do not show up at all in the equation for $v_1$. On the other hand, the voltages at customers 2 and 3 are impacted by the current drawn by *both* these customers. We see this mathematically in the equations for the two voltages, which have a common term, $z_2 (i_2 + i_3)$, that is the voltage drop along the *common path* between the customers. This common term will contribute a similarity in the time series of $v_2$ and $v_3$, which will show up in metrics such as the correlation between the time series. Customers that are closer together have longer common paths, and the contribution of those common path terms in the voltage equations leads to greater similarity in their voltages. This is why voltages of more proximal customers match more closely, and why voltage can be used to infer network structure.
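The toy example can also be checked numerically. The sketch below is only an illustration under stated assumptions: customer 1 sits on its own line from the root, customers 2 and 3 share one line before splitting, and the impedance values, root voltage, and random currents are all hypothetical numbers chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps = 5000

# Hypothetical values: root voltage (volts) and line impedances (ohms),
# treated as purely resistive for simplicity.
v0 = 230.0
z1, z2, z3, z4 = 0.05, 0.05, 0.02, 0.02

# Independent, randomly varying customer currents (amps).
i1, i2, i3 = rng.uniform(0.0, 20.0, size=(3, n_steps))

# Ohm's Law along each path from the root.
v1 = v0 - z1 * i1              # customer 1 has its own line
common_drop = z2 * (i2 + i3)   # drop on the path shared by customers 2 and 3
v2 = v0 - common_drop - z3 * i2
v3 = v0 - common_drop - z4 * i3

# Pairwise correlation between the three voltage time series.
corr = np.corrcoef([v1, v2, v3])
print(np.round(corr, 2))
```

With independent currents, the voltages at customers 2 and 3 should come out strongly correlated because of the shared voltage drop, while customer 1 is essentially uncorrelated with both.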

The rest of this post demonstrates all of this theory in practice. Using *n*Line GridWatch data from Accra, Ghana, we see how we can uncover phase and connectivity information from voltage data. We augment the voltage data with other *n*Line data types to understand how they complement each other for phase identification.

### The sites

Our basic deployment unit is a “site”, where several sensors are placed in an area believed to be served by a single transformer. This post explores phase and topology in four deployment regions within Accra. Two of these are densely deployed sites, named `59` and `78`, where many sensors have been placed over an area believed to be served through a single transformer. The third is a set of sites spread over distance along Aburi Road. The fourth consists of two sites encompassing a few sensors around Darkuman Road, which allows us to explore how measurements can validate our deployment strategy.

### The calculation

The calculation we use to go from voltage time series to an indication of topology is the variance of voltage differences, which we term *vardist* for short. This is a *pairwise* quantity which gives us a distance metric between two respondents $i$ and $j$ with voltage magnitude time series vectors $v_i$ and $v_j$:

$$\text{vardist}(i, j) = \mathrm{Var}(v_i - v_j)$$

Since our sensors are sparsely scattered across a large network, separated by significant voltage drop along the lines, and possibly under different transformers, we standardize the voltages at each sensor before computing vardist. Intuitively, this rescaling focuses the metric on the alignment and relative significance of changes between the two series, and less on their absolute value, reflecting our interest in the connectivity structure, and not the values of line impedances. This rescaling also leads to a more interpretable final value: vardist is bounded between 0 and 4 (for more detail, see [1]).

Since we are trying to discover sensors that are close together, a *proximity* metric can feel more intuitive. We convert vardist to *varprox*, which is bounded between 0 and 1, with a value close to 1 indicating sensors that are very proximal in the network.

The heatmap in Fig. 3 visualizes varprox between a set of $n$ respondents, assembled into an $n \times n$ matrix. The brighter entries indicate respondents that are closer together according to the varprox metric.
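As a rough illustration of the calculation, here is a minimal sketch of vardist and varprox in Python with NumPy. The function names are hypothetical, and the mapping from vardist to varprox (`1 - vardist / 4`) is an assumption chosen only because it sends the range 0 to 4 onto 1 to 0; the exact conversion is not spelled out in this post.

```python
import numpy as np

def standardize(v):
    """Rescale a voltage series to zero mean and unit variance."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()  # assumes the series is not constant

def vardist(v_i, v_j):
    """Variance of the difference between two standardized voltage series."""
    return np.var(standardize(v_i) - standardize(v_j))

def varprox(v_i, v_j):
    """Assumed mapping of vardist from [0, 4] onto a proximity in [0, 1]."""
    return 1.0 - vardist(v_i, v_j) / 4.0

def varprox_matrix(voltages):
    """Pairwise varprox for an (n_sensors, n_samples) array of voltages."""
    n = len(voltages)
    prox = np.ones((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            prox[a, b] = prox[b, a] = varprox(voltages[a], voltages[b])
    return prox

# Synthetic example: sensors 0 and 1 share local fluctuations, sensor 2 does not.
rng = np.random.default_rng(1)
shared = rng.normal(size=500)
voltages = np.array([
    230 + shared + 0.1 * rng.normal(size=500),
    228 + shared + 0.1 * rng.normal(size=500),
    231 + rng.normal(size=500),
])
prox = varprox_matrix(voltages)
print(np.round(prox, 2))
```

Because each series is standardized first, the different mean voltages of the three synthetic sensors do not affect the result; only the alignment of their fluctuations does.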

Notice that the varprox metric is very closely related to the widely known correlation coefficient. Indeed, we would likely see very similar results with either metric.
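To make that link concrete, here is a short derivation, assuming $\tilde{v}_i$ and $\tilde{v}_j$ denote the standardized (zero mean, unit variance) voltage series and $\rho_{ij}$ their correlation coefficient (notation introduced here for illustration):

$$
\mathrm{Var}(\tilde{v}_i - \tilde{v}_j) = \mathrm{Var}(\tilde{v}_i) + \mathrm{Var}(\tilde{v}_j) - 2\,\mathrm{Cov}(\tilde{v}_i, \tilde{v}_j) = 2 - 2\rho_{ij}
$$

Since $\rho_{ij}$ lies between $-1$ and $1$, the variance of the standardized difference lies between 0 and 4, with 0 reached when the two series move in perfect lockstep.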

#### Variability over time

The varprox proximity is not constant over time. Its connection to connectivity relies on assumptions about the statistics of currents and demand over the network, whose validity changes over time [1]. We cannot expect a single snapshot of varprox to accurately capture the connectivity of sensors. The histograms in Fig. 4 show the range of varprox values we obtain between pairs of respondents over time. The variability is significant, often with a long tail.

To handle this variability, we compute many samples of varprox and then take their median to obtain a more robust proximity metric. Each sample is computed using a short segment of voltage data. The length of this segment, denoted $T$, must be long enough to contain an adequate number of points for computing the metric and to capture enough voltage variation. However, if it is too long, then we will have few samples to take the median over for the robust varprox. Further, over a large $T$, our voltages are likely to be very non-stationary, with changing means. These changes, aggravated by the normalization, will dominate the final metric. Yet these longer-duration changes are driven by aggregate load and are therefore highly correlated across all sensors, making them less locally distinctive, and overall less revealing of proximity.

We choose $T = 2$ hours, because it gives us a significant number of varprox samples per day (12) with a reasonable number of voltage points per sample (with 2-minute sensor reporting: 60 points per sample). Within a 2-hour window we see enough voltage variation to produce a proximity signal, but the period is short enough to exclude significant bulk demand variation that would correlate highly across all sensors.

However, the results are not very sensitive to this choice, and many values of $T$ around this length are effective.
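The windowed median can be sketched as follows. This is an illustrative version, not the production calculation: the function names are hypothetical, windows are taken as non-overlapping blocks of 60 points (2 hours at 2-minute reporting), and each window's varprox uses the correlation form `(1 + rho) / 2`, which is an assumed mapping onto the 0 to 1 range.

```python
import numpy as np

def window_varprox(v_i, v_j):
    """varprox for one window, using the assumed form (1 + rho) / 2."""
    rho = np.corrcoef(v_i, v_j)[0, 1]
    return (1.0 + rho) / 2.0

def robust_varprox(v_i, v_j, points_per_window=60):
    """Median of per-window varprox values over non-overlapping windows."""
    n_windows = min(len(v_i), len(v_j)) // points_per_window
    samples = []
    for w in range(n_windows):
        seg = slice(w * points_per_window, (w + 1) * points_per_window)
        a, b = v_i[seg], v_j[seg]
        # Skip windows with missing data or no voltage variation.
        if np.isnan(a).any() or np.isnan(b).any() or a.std() == 0 or b.std() == 0:
            continue
        samples.append(window_varprox(a, b))
    return float(np.median(samples)) if samples else float("nan")

# Synthetic example: one week of 2-minute data for three sensors.
rng = np.random.default_rng(2)
n = 7 * 720
local = rng.normal(size=n)
v_a = 230 + local + 0.2 * rng.normal(size=n)
v_b = 229 + local + 0.2 * rng.normal(size=n)
v_c = 231 + rng.normal(size=n)

near = robust_varprox(v_a, v_b)  # sensors sharing local fluctuations
far = robust_varprox(v_a, v_c)   # unrelated sensors
print(round(near, 2), round(far, 2))
```

Taking the median rather than the mean keeps a handful of unusual windows (the long tail noted above) from dominating the final proximity value.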

#### Ordering & Projections

One challenge of working with varprox is how to present it to the user in an insightful, digestible format. With $n$ sensors, we end up with $n^2$ values to comprehend. The heatmap visualization of Fig. 3 effectively conveys proximal sensor groups, but only if the respondents are ordered; a random ordering is incomprehensible. It would also be nice to somehow visualize the varprox information alongside geographic sensor locations.

A *graph projection* can help us. We can consider the varprox matrix as representing a graph structure with nodes for the sensors, connected by edges whose weights are the pairwise varprox values. Two sensors which are closer together in the physical network will thus be connected by a heavier edge in this graph. With this graph representation in hand, we can draw from the rich field of graph theory, which has studied deeply how to analyze the structure of large graphs.

The Fiedler vector is a succinct representation of a graph’s structure, derived from the graph Laplacian (see [2] for more details). It is a vector of values, one per node, where nodes that are highly connected in the underlying graph are given a closer value in the Fiedler vector.

By extracting the Fiedler vector from our varprox matrix, we get a set of values which order the sensors by proximity. We can arrange the sensors in the varprox matrix by their Fiedler values to obtain a more intelligible visualization which reveals proximal sensor groups. The effect is visualized in Fig. 5.

The result isn’t perfect, and in deeper post-analysis we might refine the ordering, but the Fiedler sort gets us impressively far in ordering the nodes given the dimensionality reduction from $n^2$ values to just $n$.

We can also visualize the Fiedler values as colors on a map of the sensors to capture how the network proximity relates to geographic location.
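A minimal sketch of the Fiedler sort is shown below. It assumes the unnormalized graph Laplacian (degree matrix minus weight matrix) built directly from the varprox matrix; the function names and the small block-structured matrix are hypothetical and exist only to show the reordering.

```python
import numpy as np

def fiedler_vector(weights):
    """Eigenvector for the second-smallest eigenvalue of the graph Laplacian."""
    w = np.array(weights, dtype=float)   # copy so the input is not modified
    np.fill_diagonal(w, 0.0)             # ignore self-loops
    laplacian = np.diag(w.sum(axis=1)) - w
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1]

def fiedler_order(weights):
    """Sensor ordering obtained by sorting the Fiedler vector entries."""
    return np.argsort(fiedler_vector(weights))

# Two groups of three sensors, interleaved so the raw matrix looks disordered.
groups = np.array([0, 1, 0, 1, 0, 1])
prox = np.where(groups[:, None] == groups[None, :], 0.9, 0.2)

order = fiedler_order(prox)
sorted_prox = prox[np.ix_(order, order)]
print(groups[order])
print(sorted_prox)
```

After reordering, sensors from the same group sit next to each other, so the matrix shows two bright diagonal blocks. The same Fiedler values could be mapped to colors when plotting sensors on a map.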

### Other Data Sources

GridWatch provides both voltage magnitude and outage data. *n*Line is also beginning to collect very high resolution waveform snapshots, termed *point-on-wave (POW)*. These data types correspond to very different time scales of grid phenomena (Fig. 6), but all can help estimate grid topology. Sensors which are close together in the physical network are likely to experience the same outages, and therefore have higher “common outage” counts. The waveform data can capture the signature of the voltage across sensors at the outage moment itself, potentially revealing sensor clusters which experienced more similar signatures and are likely to be more proximal within the network.

## Along Aburi Road

Sensor deployments along Aburi Road consist of several sites stretching over several kilometers. We compute varprox between sensors at a subset of these sites, and visualize the results via the Fiedler sort, projected on a map of sensors (Fig. 7, left) and as the reordered matrix (Fig. 7, right). To reiterate, the ordering of sensors in the matrix is the ordering of their Fiedler vector values. This same ordering is used to generate their colors in the map.

There is a lot of fine-grained structure in the varprox matrix. However, when looking across the entire set of respondents, covering a large distance and several transformers, varprox predominantly reveals the broader connectivity structure rather than phase information. To elucidate phase information, varprox will need to be focused on localized (and denser) deployments, as we have at sites `78` and `59`.

## Site 78

Fig. 8 visualizes the varprox over six months at site 78. The respondents have been ordered by hand, based on a refinement of the Fiedler ordering. The proximities remain remarkably consistent over the six months, an encouraging sign that the varprox metric is indeed capturing structural properties (which change little over time) rather than short-term load correlations. The metric matrices do appear to reveal some structural changes in the network. In August, respondent `10029` distinctively changes from being close to the pair `210B5744` & `95158650` to being close to `428D66F0` & `D2498E71`. At least some of the highly proximal groupings likely correspond to respondents on the same phase. It is not uncommon for customers to be switched to another phase following an outage, and this may be what happened to respondent `10029` in August. However, without more data or utility verification, these remain hypotheses.

In the following sections, we start to investigate how additional GridWatch data can be used to bolster or augment the claims of the varprox metric.

## Darkuman Avenue Sites

We look at a few nearby sites along Darkuman Avenue. These sites have relatively few respondents each compared to the dense deployment at site 78.

We compute the varprox between respondents in these sites, and compare that metric to the *outage overlap* between the respondents. The outage overlap is a proximity metric that quantifies the number of common outages experienced by a pair of respondents, normalized by the total number of outages they experience. Like varprox, it is bounded between 0 and 1. The results, alongside the map of sensors, are visualized in Fig. 9.

Varprox and outage overlap corroborate each other in demarcating sensor groupings that are further apart. However, there is also some disagreement. Notice how the voltages in the purple region are highly correlated with those in the green, while they have very little outage overlap. Overall, the smaller the region we study, the more difficult it is to reconcile varprox and outage overlap information. This suggests that outage overlap is more revealing of connectivity structure over longer distances than of local phase connectivity (note that this is highly influenced by our deployment strategy and how we capture outages).
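One way to compute an outage overlap of this kind is sketched below. It is an assumption-laden illustration: outages are represented as sets of hypothetical event identifiers (matching the same outage across sensors, for example by start time within a tolerance, is assumed to have happened upstream), and the normalization divides by the number of distinct outages seen by either respondent, which is one plausible reading of the definition above.

```python
def outage_overlap(outages_a, outages_b):
    """Shared outages divided by all distinct outages seen by either respondent."""
    a, b = set(outages_a), set(outages_b)
    union = a | b
    if not union:
        return 0.0  # no outages at either sensor: no evidence of proximity
    return len(a & b) / len(union)

# Hypothetical outage event IDs recorded by three respondents.
sensor_x = {"o1", "o2", "o3", "o4"}
sensor_y = {"o2", "o3", "o4", "o5"}
sensor_z = {"o6", "o7"}

print(outage_overlap(sensor_x, sensor_y))  # three shared outages out of five distinct
print(outage_overlap(sensor_x, sensor_z))  # no shared outages
```

Like varprox, this value is 1 when two respondents see exactly the same outages and 0 when they share none, so the two metrics can be compared side by side in matrix form.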

The groupings demarcated by the colored boxes above reflect the site labels of the respondents, with one exception. Respondent `67C0C6CE` is classified as being in the same site as respondents `26BF1498` & `B1C18DFB`, but the varprox in March compellingly places it closer to respondent `BCA00F43` at an adjacent site. In this way, varprox can help to correct connectivity assumptions for a network.

## Site 59

Interesting things are happening over eight months at site 59! The varprox matrices in Fig. 10 show a degree of consistency across the months, as we saw at site 78, with greatest similarity between adjacent months. However, we also see seemingly structural changes, especially from May to August, where the proximity of some sensors changes distinctively.

One of the most distinctive changes is at respondent `10122`. In December, this sensor is distinctively within a group of highly proximal sensors which also includes `3711BB6B`. In April, `10122` goes offline, failing to report voltage data, which explains its absence in the May varprox matrix. By June, it has returned, now very close to respondent `AE1E42BD`, which has thus far been a lonely sensor far from all others. This change is so distinctive that we can see it in the raw voltage time series (Fig. 11a & 11b)!

#### Point-on-wave data & phase

At site 59, we are lucky to have waveform or point-on-wave data from some of the sensors. The 4 kHz waveform data consists of snapshots collected at the moment of an outage. Rapid outages are clearly visible in the snapshot as drastic drops in voltage to 0. If such an outage occurs across many respondents, the outage moment allows us to accurately time-align the various snapshots to sub-cycle precision. In other words, the grid itself is providing us a high resolution synchronization signal! Below, we see a set of waveforms from site 59 aligned in this way; the yellow line indicates the outage moment that has been used to align them.

What is clearly revealed, post-realignment, are the three phase groupings of sensors, with each group offset 120 degrees from the other two.

Overlaying the phase groupings in the waveform snapshot with the varprox matrix, we see that the phases, as revealed in the waveforms, generally align closely with highly proximal varprox groups. What is interesting is the two distinct varprox groups that appear to belong to the same phase — this may be due to a separation in the connectivity structure.
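The alignment idea can be sketched on synthetic waveforms. Everything here beyond the 4 kHz sampling rate is an assumption for illustration: a 50 Hz nominal grid frequency, unit-amplitude sinusoids, a simple threshold rule for locating the outage sample, and hypothetical function names. Each snapshot's phase is measured over whole cycles just before its own outage sample, so the outage acts as the common time reference.

```python
import numpy as np

FS = 4000.0     # snapshot sampling rate (Hz), as stated in the text
F_GRID = 50.0   # assumed nominal grid frequency (Hz)

def outage_index(v, threshold):
    """Index of the first sample after which the voltage stays near zero."""
    live = np.flatnonzero(np.abs(v) >= threshold)
    return int(live[-1]) + 1

def phase_at_outage(v, idx, n_cycles=2):
    """Phase angle (degrees) of the fundamental, referenced to the outage moment."""
    m = int(round(n_cycles * FS / F_GRID))   # whole cycles before the outage
    t = np.arange(-m, 0) / FS                # time relative to the outage
    phasor = np.sum(v[idx - m:idx] * np.exp(-2j * np.pi * F_GRID * t))
    return np.degrees(np.angle(phasor))

def phase_groups(snapshots, threshold):
    """Label each snapshot 0, 1, or 2 by its phase offset from the first one."""
    angles = np.array([phase_at_outage(v, outage_index(v, threshold))
                       for v in snapshots])
    rel = (angles - angles[0]) % 360.0
    return np.round(rel / 120.0).astype(int) % 3

# Synthetic snapshots: the outage lands at a different sample in each capture.
rng = np.random.default_rng(3)
true_phase = [0, 1, 2, 0, 1, 2]
snapshots = []
for p in true_phase:
    k = int(rng.integers(200, 300))
    t = (np.arange(400) - k) / FS
    v = np.sin(2 * np.pi * F_GRID * t + np.radians(120.0 * p))
    v[k:] = 0.0                              # voltage collapses at the outage
    snapshots.append(v + 0.002 * rng.normal(size=400))

groups = phase_groups(snapshots, threshold=0.05)
print(groups)
```

In real data the outage edge is unlikely to be this clean, so the threshold rule would probably need to be replaced by something more robust, but the principle of using the shared outage moment as a synchronization signal stays the same.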

## Conclusion

We often think of grid measurements, particularly in distribution, as telling us about the behavior, experience, and state of a single point in the network: the particular home, business, or piece of equipment at which the measurement is made. But electric grids are *highly* interconnected systems, and grid voltages are driven by current flows throughout the network. Therefore, embedded in our single point measurements is a great deal of broader, system information! This information, about the state and structure of the wider network, is fundamental to understanding and intervening in the grid and has enormous actionable value to those handling system operation and maintenance.

This blog begins to scratch the surface of extracting structure information from *n*Line’s voltage and outage data. The examples show the richness and promise of the data for structure discovery. We are excited to develop these directions further at *n*Line, in ways that will best meet utility and grid operator needs.

## References

[1] Bariya, Mohini, Deepjyoti Deka, and Alexandra von Meier. “Guaranteed Phase & Topology Identification in Three Phase Distribution Grids.” *IEEE Transactions on Smart Grid* 12.4 (2021): 3605-3612.

[2] Wyss-Gallifent, Justin. “Graph Theory.” *Section 12.4 includes discussion of the Fiedler vector and graph partitions.*