Advancements in Grid Data Searching Algorithms
OpenCIP Concept White Paper
by Travis Rouillard
Last year, GridBright deployed the BetterGrids.org Grid Data Repository for the DOE ARPA-E. BetterGrids.org is an online, searchable, open library of grid network models for power systems research, development, and testing. It gives the grid research community an easy way to contribute and find grid models. The models are curated from the public domain, submitted by the community of users, or contributed by labs and universities as an output of their research efforts. Since going live in 2017, we have signed up almost 200 users and collected over 300 network topology models.
Smart grid researchers depend on realistic models of the power system network, generators, and loads to simulate the grid in laboratory environments. This allows for the development and testing of new algorithms, processes, software, and hardware long before any type of field trial.
A constant challenge is finding models that accurately capture the physics of different grid challenges and scenarios: transmission versus distribution networks, long feeder runs versus heavily meshed networks, overhead versus underground lines, large commercial loads versus many residential solar installations, and so on. While there are hundreds of models in the public domain, the trick is finding one that matches the specific characteristics of a given use case.
The rationale for developing BetterGrids.org was to make finding models easier for researchers. Simply having a public model exchange for the community addresses part of that goal. The Repository can also be divided into Collections that are categorized and indexed by data type – Transmission models, Distribution models, Buses, Generators, Loads, Inverters, Capacitors, etc.
But it is impractical to categorize every topology permutation and dimension. We needed something more powerful so users could search on ad-hoc physical characteristics. For example, users gave us the following searches they wanted to run in support of their latest smart grid solar research:
- Find models that have over 100 PV loads
- Find models that have over 10 PV loads on a single 4 kV circuit
- Find models with 10 or more feeders fed from the same substation that have more than one voltage regulator per circuit
- Find models with more than 100 wind or solar plants connected at 34 kV or lower
- Find models in which PV plants larger than 100 kW have a combined distributed generation capacity of 100 MW or more
To answer this challenge, GridBright developed a feature we call ‘Semantic Search’. Semantic Search is the ability of the Repository to understand what is in each model, and then let users search with intuitive ‘natural language’ queries (as they do with Google). Users can literally type in the sentence “Find models that have over 10 PV loads on a single 4 kV circuit” and the Repository returns all matching models.
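To make the idea concrete, here is a minimal sketch of how one of the example sentences could be reduced to a structured filter. It is not GridBright's parser: the regular expression, the `ParsedQuery` type, and the `parse_query` function are all hypothetical, and the pattern covers only the single phrasing shown. A real natural language front end would need to handle far more variation.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedQuery:
    """Hypothetical structured form of a search sentence."""
    component: str               # e.g. "pv load"
    min_count: int               # exclusive lower bound ("over N")
    voltage_kv: Optional[float]  # None when no voltage is mentioned
    per_circuit: bool            # True when the count applies to one circuit


# Assumed phrasing: "Find models that have over N <things> [on a single [X kV] circuit]"
_PATTERN = re.compile(
    r"find models that have over (?P<count>\d+) (?P<component>[a-z ]+?)s?"
    r"(?P<circuit> on a single (?:(?P<kv>\d+(?:\.\d+)?) ?kv )?circuit)?$",
    re.IGNORECASE,
)


def parse_query(text: str) -> Optional[ParsedQuery]:
    """Return a structured filter, or None if the sentence is not recognised."""
    match = _PATTERN.match(text.strip().rstrip("."))
    if match is None:
        return None
    kv = match.group("kv")
    return ParsedQuery(
        component=match.group("component").lower(),
        min_count=int(match.group("count")),
        voltage_kv=float(kv) if kv else None,
        per_circuit=match.group("circuit") is not None,
    )


query = parse_query("Find models that have over 10 PV loads on a single 4KV circuit")
print(query)
```

Separating the parsed filter from the sentence means the same structured form could, in principle, be produced from many different phrasings and passed unchanged to the database layer.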
The simplicity of the user experience is enabled by an underlying ‘big data’ technical architecture consisting of ‘cataloguing’ and ‘searching’ components. The Catalog Engine imports network topology files from a variety of different popular modeling formats, translates them into a common topological graph format, extracts key meta-data, and creates an efficient search index. The Catalog Engine only needs to run once, when new model files are submitted. The Search Engine parses the natural language queries into an efficient database query using the indices, translating human terms into technical terms the database engine understands. The Search Engine runs every time a user submits a new query.
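The split between a one-time cataloguing step and a per-query search step can be sketched as follows. This is an illustration under simplifying assumptions, not the Repository's implementation: it skips file-format translation and the topological graph entirely, treats a model as a flat list of components, and uses an in-memory dictionary in place of a database index. The names `Component`, `catalog`, and `search` are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical, already-normalised element of a grid model."""
    kind: str          # e.g. "pv_load", "regulator"
    circuit: str       # identifier of the circuit the component sits on
    voltage_kv: float  # nominal voltage of that circuit


def catalog(models):
    """Cataloguing step, run once per submitted model.

    Reduces each model to per-circuit component counts keyed by
    (kind, voltage_kv), so queries never have to re-read the model.
    """
    index = {}
    for name, components in models.items():
        counts = defaultdict(Counter)
        for c in components:
            counts[(c.kind, c.voltage_kv)][c.circuit] += 1
        index[name] = {key: dict(per_circuit) for key, per_circuit in counts.items()}
    return index


def search(index, kind, voltage_kv, min_count):
    """Search step, run per query.

    Returns the models with more than `min_count` components of `kind`
    on any single circuit at `voltage_kv`.
    """
    hits = []
    for name, entry in index.items():
        per_circuit = entry.get((kind, voltage_kv), {})
        if any(n > min_count for n in per_circuit.values()):
            hits.append(name)
    return sorted(hits)


# Two toy models: 12 PV loads on one circuit versus 6 + 6 across two circuits.
models = {
    "feeder_a": [Component("pv_load", "c1", 4.0)] * 12,
    "feeder_b": [Component("pv_load", "c1", 4.0)] * 6
                + [Component("pv_load", "c2", 4.0)] * 6,
}
index = catalog(models)
print(search(index, "pv_load", 4.0, 10))  # ['feeder_a']
```

Both toy models contain 12 PV loads in total, but only `feeder_a` has more than 10 on a single circuit. That distinction is why the index keeps counts per circuit and not per model.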
Our long-term goal is to develop similar ‘catalog’ and ‘search’ algorithms to support a wider variety of grid data and software. Each data type has fundamentally different data formats, attributes, meta-data, and user inquiry patterns. We believe the work we have already done gives us a significant head start in solving the same kind of problem for other grid data types.
Travis Rouillard, GridBright CTO