For at least the past seven years, Northumbrian Water Group (NWG) has been realizing the value of open data alongside sector and cross-sector collaboration efforts, and it’s been nothing short of transformational for our organization. One of our key enablers to this end has been “data hacks,” orchestrated events where we invite people and teams from outside of our organization to analyze our data in a secured environment. 

Our successes with data hacks have naturally informed Stream, an NWG-led open data initiative that aims to unlock the value of water data in the UK. Here, I talk about the legacy of data hacks, Stream, and the benefits and challenges of open data in the water industry.

How it started: the success of “data hacks” at Northumbrian Water Group

You’d be surprised how eager people are to play with real datasets, and the fantastic insights you can gain from opening your data for people outside your organization to analyze. During our first data hack seven years ago, we compiled about 10 years’ worth of pollution incidents data into a cloud environment. We invited people outside our organization to hack our data over a weekend. Around 100 people showed up, and we realized the value of these open data hacks immediately. Attendees posed hypotheses, asked questions, and brought up different ways of thinking we hadn’t thought of before. 

We brought these insights back to our organization, and it led to real solutions. For example, NWG has seen tremendous improvement in our performance in key areas like pollution. We’ve gone from almost the worst in the industry for pollution incidents to being among the best — an outcome that originated in data hacks.

Along with witnessing other industries, like transportation, banking, and energy, run successful open data projects, data hacks have inspired and informed our approach to the Stream project. If small, controlled open data experiments like NWG’s data hacks have had such a profound impact on our company, imagine the impact other water companies, and the industry as a whole, could gain from this kind of openness.

Stream: enabling the water sector towards open data

Stream aims to unlock the value of water data by establishing the processes our industry needs to remove barriers to opening up and sharing water data. By creating a repeatable, scalable process for sharing data between water companies and with the public, we can ensure innovators have what they need to experiment and find solutions to tough sector and cross-sector challenges that simply can’t be solved by a single person or organization. 

Led by NWG and supported through innovation funding from the Ofwat Water Breakthrough Challenge (WBC), Stream brings together 13 of 17 English, Welsh, and Scottish water companies, with plans for all to join in the future. We have five experienced delivery partners, Open Data Institute (ODI), Icebreaker One, Sia Partners, Aiimi, and Costain, and two regulators, Ofwat and the Environment Agency (EA), working together on Stream to open up water data in the UK, along with a wide range of potential open and shared data consumers. 

Stream includes a common framework and strategy for long-term collaboration, standards development, operational and legal capabilities, and a widely accessible open data platform with associated governance and data standards. Stream is integral to, and runs parallel to, NWG’s open data strategy. 

Why we need open data initiatives like Stream in the water industry

To keep pace with the impacts of climate change

The water sector faces change at a faster pace than ever because of climate change. Higher temperatures lead to more intense storms more frequently. Extreme climate events significantly impact water services, customers, and the environment because they damage infrastructure, cause system failures and service interruptions, overwhelm stormwater and sewage systems, and cause pollution incidents. Meanwhile, climate change in some areas is leading to reduced water supply, changing the amount of available freshwater for a population and therefore how water sources and utilities are managed. 

There’s an urgency for water companies to start collaborating and innovating more quickly to meet and adapt to these climate change challenges — open data can help with that. By creating an ecosystem of open data and collaboration in the water industry, sector and cross-sector teams and individuals can collaborate more efficiently and find solutions more effectively with robust datasets in their hands. This can help the industry adapt to and recover from extreme climate events faster, prevent pollution incidents, support better predictive forecasting, and drive performance improvements for reliable and resilient water services in the face of climate change. 

To unlock stronger correlations and data signals, enhancing decision making

The water industry holds vast datasets that can become stronger when pooled together. There are a myriad of possibilities and potential benefits to combining the water sector’s different company datasets:

  • Patterns related to climate change impacts, customer behavior, health and safety incidents, and infrastructure aging may become clearer when observed across a larger dataset, spanning multiple utilities and regions. 
  • Machine learning models can be better trained to predict events like equipment failures, contamination events, or demand surges because larger datasets provide a broader range of scenarios. 
  • Utilities can analyze pooled data on energy consumption in water treatment and distribution to identify inefficiencies; correlating this with data on water flow rates, pump operations, and equipment age could help guide maintenance and upgrades.

Likewise, the Stream initiative believes that national-scale data can enhance decision making by:

  • highlighting potential regional differences with clear evidence,
  • increasing transparency and trust in the sector overall,
  • and creating a solid empirical basis to underpin sectoral policy discussions.

Larger datasets from diverse sources can unlock correlations and data signals that might be undetectable in smaller datasets, spurring richer insights. They can help the water sector discover new, innovative solutions that best support their customers, their businesses, and the environment. 

To catalyze cross-sector and cross-disciplinary insights and solutions to water challenges

Water challenges transcend the water sector’s boundaries. There is no single owner of the problem of river quality or energy use, for example. Industries are inextricably connected through water. Take, for example, energy consumption in UK homes: 18% of UK households’ energy consumption is spent on heating water. If both the energy and water sectors are aiming for net zero, this is a clear cross-sectoral problem that could benefit from collaboration and layering our datasets to find connections. Open water datasets can enhance the entire open data ecosystem and connect the dots between various industries’ knowledge. 

Because regulators increasingly require open data and collaboration initiatives

The UK government has an agenda to advance open data initiatives in the water sector. Ofwat, the economic regulator of the water sector in England and Wales, has signaled its strong drive towards open data. It’s becoming clearer that water companies in the UK will have no choice but to get on board with open data eventually. The good news is that this means there are funding and support opportunities that compel water companies across the UK to prepare themselves to contribute to the open data ecosystem if they haven’t already.

Open data challenges to overcome in the water sector

While open water data has the potential to drive innovation and improve decision-making in the water industry and beyond, open data projects like Stream come with a range of challenges, which can make buy-in slow among water companies. These reflections are specific to the UK, but water companies around the world likely face the same issues. 

Data security and privacy

Data security and privacy concerns are at the top of the list of challenges when it comes to open data initiatives, not just in the water sector but across all industries. To protect privacy, data must be sufficiently anonymized and aggregated to prevent the dissemination of personally identifiable information (PII). Likewise, processes need to be in place to ensure Critical National Infrastructure (CNI) data is safeguarded and, if shared between companies, robust cybersecurity measures are in place to prevent unauthorized access, tampering, or breaches. 
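To make the idea of anonymizing and aggregating concrete, here is a minimal sketch in Python. It is not Stream's method. The records, field names, and the minimum group size of 3 are all assumptions made up for illustration. It drops the customer identifier, coarsens each postcode to its district, and withholds any group too small to publish safely.

```python
from collections import Counter

# Hypothetical records: customer IDs, postcodes, and incident types are
# invented for this illustration and do not come from any real dataset.
records = [
    {"customer_id": "C001", "postcode": "NE1 4ST", "incident": "low_pressure"},
    {"customer_id": "C002", "postcode": "NE1 7RU", "incident": "low_pressure"},
    {"customer_id": "C003", "postcode": "NE1 5DW", "incident": "low_pressure"},
    {"customer_id": "C004", "postcode": "DH1 3LE", "incident": "low_pressure"},
    {"customer_id": "C005", "postcode": "NE1 4ST", "incident": "discoloured_water"},
]

# Assumed threshold: groups with fewer records than this are withheld.
MIN_GROUP_SIZE = 3


def postcode_district(postcode):
    """Keep only the outward part of a postcode, e.g. 'NE1 4ST' -> 'NE1'."""
    return postcode.split()[0]


def aggregate(rows, min_group_size=MIN_GROUP_SIZE):
    """Count incidents per (district, incident type) and suppress small groups.

    The customer_id field is never read, so it cannot reach the output.
    """
    counts = Counter(
        (postcode_district(row["postcode"]), row["incident"]) for row in rows
    )
    return {key: n for key, n in counts.items() if n >= min_group_size}


published = aggregate(records)
for (district, incident), n in sorted(published.items()):
    print(district, incident, n)
```

Real anonymization usually needs more than this, since combinations of fields can still re-identify people, but the sketch shows the basic trade-off between detail and privacy.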

Figure: Open Data Institute’s data spectrum. Source: The Open Data Institute

The Stream initiative considers the full spectrum of water data outlined by the Open Data Institute (ODI), from data that stays confidential due to business, safety, or privacy concerns to data that is openly accessible for anyone to retrieve, utilize, and distribute without restrictions. 

Our approach involves developing a value framework to identify high-value datasets and a sector-wide open data triage process for water companies to assess and mitigate the risks of sharing their data. We’re drawing from the energy sector’s current process to systematically identify when a dataset may possess characteristics that limit its potential openness, while also identifying techniques to mitigate those limits. The goal is open data, so where data can’t be shared or can only be shared between certain groups, it needs justification as to why it cannot or should not be open. 

On the technology side of things, Stream’s open data platform will aim to facilitate a range of license types under fully open, public, and shared data categories, making it easy to publish data in both a centralized way (data stored in a central repository) and decentralized way (data hosted by individual companies but still searchable in the data catalog).
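As a rough illustration of how one catalog could cover both hosting models, here is a small Python sketch. It does not describe Stream's actual platform. The `CatalogEntry` fields, the category labels, and the placeholder locations are hypothetical. Each entry records where the data lives, so centrally stored and company-hosted datasets can be searched the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One dataset listing. All field names here are assumptions."""
    title: str
    publisher: str
    access: str    # assumed labels: "open", "public", or "shared"
    hosting: str   # "central" (central repository) or "company" (self-hosted)
    location: str  # placeholder pointer to the data, not a real address


# Hypothetical entries, for illustration only.
CATALOG = [
    CatalogEntry("Storm overflow events", "Company A", "open",
                 "central", "central-store/overflow-events"),
    CatalogEntry("Pump energy use", "Company B", "shared",
                 "company", "company-b-host/pump-energy"),
]


def search(catalog, keyword, access=None):
    """Return entries whose title contains the keyword, whatever the hosting.

    If access is given, only entries in that access category are returned.
    """
    keyword = keyword.lower()
    return [
        entry for entry in catalog
        if keyword in entry.title.lower()
        and (access is None or entry.access == access)
    ]


for entry in search(CATALOG, "pump"):
    print(entry.title, "|", entry.access, "|", entry.hosting, "|", entry.location)
```

Keeping the hosting model as a field on the entry, rather than running two separate catalogs, means a data consumer never needs to know in advance where a dataset is stored.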

Data standardization & interoperability

Standardizing datasets across the water companies participating in Stream is critical for maximizing the data’s quality and usefulness while enabling collaboration and comparisons across the sector. But data standardization is one of the more daunting tasks of any open data project. 

Data that comes from different sources often has its own structure, format, and quality. Inaccuracies, missing data, and inconsistencies between datasets can pose significant problems during standardization. Semantic inconsistencies are inevitable, even between water companies, because different companies use different terminologies or definitions for similar concepts. Different levels of data granularity are also common between water companies. For example, data collected at various times or places might not be directly comparable. Seasonal variations or differences in data collection sites can lead to false conclusions if not appropriately accounted for. All of this adds layers of complexity to standardization processes. 
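The two problems described above, differing terminology and differing granularity, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Stream's standard. The term mapping, the shared label `storm_overflow_discharge`, and the sample readings are invented. One hypothetical company reports daily counts under one name, and another reports monthly counts under a different name.

```python
from collections import defaultdict

# Hypothetical mapping from company-specific terms to one shared label.
TERM_MAP = {
    "cso spill": "storm_overflow_discharge",
    "overflow event": "storm_overflow_discharge",
}


def standardize_term(term):
    """Translate a company term to the shared label.

    Unknown terms raise an error instead of being guessed, so gaps in the
    mapping are surfaced for people to agree on.
    """
    key = term.strip().lower()
    if key not in TERM_MAP:
        raise KeyError(f"no agreed definition for term: {term!r}")
    return TERM_MAP[key]


def to_monthly(readings):
    """Roll (ISO date, term, count) readings up to monthly totals.

    Dates may be daily ('2023-01-03') or already monthly ('2023-01');
    the first seven characters give the month in both cases.
    """
    totals = defaultdict(int)
    for date, term, count in readings:
        totals[(date[:7], standardize_term(term))] += count
    return dict(totals)


# Invented sample data at two different granularities.
company_a = [("2023-01-03", "CSO spill", 2), ("2023-01-19", "CSO spill", 1)]
company_b = [("2023-01", "Overflow event", 4)]

print(to_monthly(company_a))
print(to_monthly(company_b))
```

Once both datasets share a label and a monthly grain they can be compared, although differences in monitoring coverage between companies would still need to be documented alongside the numbers.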

Needless to say, the data standardization process is time- and resource-intensive: ensuring the datasets are comparable is a complex task, and engaging a range of stakeholders to agree on definitions and meanings is just as challenging. 

But without standardization, the full potential of open and shared data in the water sector cannot be realized and its impact could be significantly limited. Stream’s standardization process is a foundational element to the open data initiative. We’re working hard on defining common data standards and a shared trust framework, so data from different water companies can be combined and used together.

Data skills and interpreting support

Without proper background knowledge and robust metadata, there’s a risk of water data being misunderstood or misinterpreted, which can lead to inaccurate conclusions or decisions. People can over-generalize based on limited data, for example, or data that is too aggregated might hide significant local variations. Likewise, if certain areas or issues are more frequently monitored by some water companies than others, it can result in a skewed picture of water issues if the data user is not aware of that context. Building data literacy within and outside the water sector will be critical to the success of open data in the water sector because it will ensure people think critically about the data and that the context of each dataset is well understood by anyone who uses it.

Stream provides the path for open data in UK water

From data security and privacy to interoperability and data literacy, Stream aims to comprehensively and openly provide a path to overcome open data challenges in the UK water sector. Our open data roadmap, which will be published after UK water signs off on it at the end of 2023, includes agreeing on a data publishing cadence that all Stream water companies will follow. But first, we’ll have a minimum viable product release at the beginning of 2024.

We plan to be brave and publish one of our toughest datasets first — sewage overflow data. It will likely cause some noise (news stories, public scrutiny, etc.), which will be uncomfortable in the beginning. But it’s one of the most valuable, sought-after datasets, and a key part of our open data strategy is to release those kinds of datasets first. We remain optimistic and look forward to seeing the UK’s open water data ecosystem take shape for the benefit of water customers, society, and the environment.

“Open Water” whitepaper

This article was taken from Qatium’s whitepaper “Open Water: why we need open data, open software, & open collaboration in the water sector.” You’ll find the full whitepaper here.
