Snow, water supply, and new possibilities with machine learning models

Laura Read
Nov 21, 2025

In the continued quest for more perfect streamflow and water supply forecasts, we often return to the search for better data inputs. In basins where snowmelt dominates the annual water budget—the western U.S. (53%), the Colorado River basin (60-80%), Japan (~30%), and the Swiss Alps headwaters (35%) to name a few—improving snow data and model snow states is a logical place to focus.

Ultimately, the broad water community hypothesizes that better snow data → better model states → improved runoff predictions. This is a reasonable chain of assumptions because hydrology has rules and principles (morals—not so sure!), and even if we cannot fully prescribe all the physical parameters that influence hydrology, snow is a clear enough signal that we know it’s important to correctly represent.

An interesting newer development in this conversation is that the modeling landscape is changing as well. So, while better data is always one piece of the quest, we have a whole new set of models with which to play and experiment. Yes, I’m referring to machine learning hydrology models, which, as we’ll explore here, are uniquely capable of learning how to extract skill in a fundamentally different way than existing physical models, and that is leading to advances in many facets of hydrologic prediction.

The reality of snow data costs and funding

Thus, in the U.S. and globally, we have invested millions into snow data, from standing up operational gridded models like the Snow Data Assimilation System (SNODAS) and the Snow Water Artificial Neural Network Modeling System (SWANN), to collecting high-resolution airborne measurements like those from the Airborne Snow Observatory (ASO). While many of the government-related programs started off with large funding investments, they are struggling to keep the resources needed to maintain them.

  • SNODAS recently announced a downgrade in service to “basic”–aka no more development due to funding challenges. 
  • Field snow surveys and station funding are also suffering from government cuts and staffing challenges, which threatens the reliability and continuity of critical snow data (e.g. SNOTEL stations in North America that are critical for SNODAS and SWANN), despite a study showing a range of positive economic benefits.
  • Meanwhile, ASO as a private entity has successfully worked to institutionalize funding for its program into federal and state budgets, positioning its value as critical for flood safety and storage optimization.

All these types of snow data have their own unique value.

Snow data’s impact on forecast performance

But the real test comes in determining that value: Are our current hydrologic models able to use the data to make more perfect runoff forecasts? We’re investigating these questions deeply as we continue to innovate with HydroForecast, our ML hydrologic model.

Machine learning research proves this approach warrants the industry’s attention

ML research and applications in hydrology have grown substantially in recent years. A few seminal works in hydrology have shown consistent skill improvements with ML models across large-area studies around the world, e.g., Kratzert et al. (2019) in the U.S., Mai et al. (2022) in Canada, and Clark et al. (2024) in Australia, and similarly for extreme events, Frame et al. (2022).

We were a part of the initial Kratzert publications, and are continuing to ask big research questions with our ML hydrologic models, like: 

  • How do ML hydrologic models best utilize snow data given that they learn hydrology in a fundamentally different way? 
  • Through these analyses, can we demystify the so-called “black box”?

Experiments: Snow data in action with HydroForecast

With HydroForecast, we train a foundational hydrologic model (see Kratzert et al. 2024 for why this is critical) to learn hydrology across a diverse set of geographies, hydrologic regimes, topographies, etc. Up until now, snow data has served only as an input to HydroForecast, added after experiments across our foundational model sites showed higher skill with these inputs. We’ll dive into more below, but here’s a teaser: we’re currently testing HydroForecast’s ability to predict snow.

Experiment 1: Improved runoff predictions with remote sensing observations across 100 basins

As part of our ongoing R&D experiments to train and improve HydroForecast, we tested the model with and without remote sensing observations as inputs. Our source is the globally available NASA MODIS Normalized Difference Snow Index (NDSI) and Normalized Difference Vegetation Index (NDVI); we ran a few scenarios to understand the impact of these on skill (Figure 1).
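For readers who have not worked with these indices, both are simple normalized band ratios. The sketch below uses the standard textbook definitions (NDSI from green and shortwave-infrared reflectance, NDVI from near-infrared and red). The pixel values and function names are made up for illustration, and the operational MODIS products apply additional screening and quality flags that are not shown here.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = band_a + band_b
    if total == 0:
        return None
    return (band_a - band_b) / total


def ndsi(green, swir):
    # Snow is bright in the visible green band and dark in shortwave infrared.
    return normalized_difference(green, swir)


def ndvi(nir, red):
    # Healthy vegetation reflects near-infrared strongly and absorbs red light.
    return normalized_difference(nir, red)


# Hypothetical surface reflectances (unitless, 0 to 1) for two pixels.
snowy_pixel = {"green": 0.80, "red": 0.78, "nir": 0.70, "swir": 0.10}
forest_pixel = {"green": 0.08, "red": 0.05, "nir": 0.45, "swir": 0.20}

for name, px in (("snowy", snowy_pixel), ("forest", forest_pixel)):
    print(name, ndsi(px["green"], px["swir"]), ndvi(px["nir"], px["red"]))
```

The snowy pixel gets a high NDSI and a slightly negative NDVI, while the forest pixel shows the reverse. Note that neither index says anything about how deep the snowpack is.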

Figure 1: Cumulative distribution of model prediction performance across 100 basins over an out-of-sample validation period, for the three model configurations tested in this work. Lines closer to the lower right corner of the plot indicate a more accurate model.

We observe that skill in terms of Nash-Sutcliffe Efficiency (NSE) increases when NDSI and NDVI are included, particularly at basins where the baseline model performs less accurately (NSE < 0.4). Arguably, the interesting part here is that we know NDSI isn’t a great measure of snow: it captures only snow cover, not depth. Despite NDSI’s limitations, the model learned how to utilize this dataset to extract information that improved its target output: better runoff prediction.
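As a reference for the metric and the plot type used above, here is a minimal sketch of NSE and of an empirical cumulative distribution of per-basin scores. The flow values and basin scores are invented, and the helper names are not from HydroForecast.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_term


def empirical_cdf(scores):
    """Return (score, fraction of basins with a score at or below it) pairs."""
    ordered = sorted(scores)
    n = len(ordered)
    return [(score, (rank + 1) / n) for rank, score in enumerate(ordered)]


# Hypothetical daily flows for one basin.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [1.1, 1.9, 3.2, 3.8, 5.1]
print(nse(observed, simulated))  # about 0.989

# Hypothetical NSE scores for four basins under one model configuration.
print(empirical_cdf([0.8, 0.2, 0.5, 0.6]))
```

Plotting the pairs from `empirical_cdf` for each configuration gives a curve like those in Figure 1. A configuration whose curve sits further to the right has fewer low-scoring basins.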

Experiment 2: High-resolution SWE drives forecast improvements in the Swiss Alps

Next, we integrated a really good (proprietary) high-resolution snow dataset to understand its added value on improving runoff forecasts. To set the stage, we’re in the Swiss Alps where we’re forecasting the inflow to Lake Geneva. This is a basin with upstream regulation, glaciers, and about 25 active streamflow gauges to utilize and evaluate. 

To evaluate the snow data’s impact, we ran the model using re-forecasts of weather data and the proprietary high-resolution snow dataset as a daily input over a two-year validation period. We then calculated Kling-Gupta Efficiency (KGE) and normalized bias to assess performance, averaging the results over the 1- to 10-day forecast horizons to ensure our metrics were robust.
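To make the evaluation concrete, the sketch below computes KGE in its original Gupta et al. (2009) form and averages a metric across lead times. The exact definition of normalized bias is not given here, so the version below (total error divided by total observed flow) is an assumption, as are the function names and the layout of the forecast data.

```python
import math


def mean(values):
    return sum(values) / len(values)


def kge(observed, simulated):
    """Kling-Gupta Efficiency (2009 form); 1 is a perfect score."""
    mu_o, mu_s = mean(observed), mean(simulated)
    sd_o = math.sqrt(mean([(o - mu_o) ** 2 for o in observed]))
    sd_s = math.sqrt(mean([(s - mu_s) ** 2 for s in simulated]))
    cov = mean([(o - mu_o) * (s - mu_s) for o, s in zip(observed, simulated)])
    r = cov / (sd_o * sd_s)  # linear correlation
    alpha = sd_s / sd_o      # variability ratio
    beta = mu_s / mu_o       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def normalized_bias(observed, simulated):
    """Assumed definition: total forecast error as a fraction of total observed flow."""
    return (sum(simulated) - sum(observed)) / sum(observed)


def average_over_horizons(observed_by_lead, forecast_by_lead, metric):
    """Average a metric over lead times; both dicts map lead day -> list of flows."""
    scores = [
        metric(observed_by_lead[lead], forecast_by_lead[lead])
        for lead in sorted(forecast_by_lead)
    ]
    return mean(scores)


# Hypothetical flows: a perfect 1-day forecast and a 2-day forecast that runs 10% high.
obs = [1.0, 2.0, 3.0, 4.0]
observed_by_lead = {1: obs, 2: obs}
forecast_by_lead = {1: obs, 2: [1.1 * o for o in obs]}
print(average_over_horizons(observed_by_lead, forecast_by_lead, kge))
print(average_over_horizons(observed_by_lead, forecast_by_lead, normalized_bias))
```

Averaging across lead times yields one number per site and configuration, which makes comparisons easier but hides how skill decays from day 1 to day 10.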

Key results and takeaways: The model has learned snowmelt pretty darn well

  • The Full Inputs model has the highest skill scores, but exhibits a decent bias in the spring melt. 
  • Providing the model with the high-res Snow Water Equivalent (SWE) bumped up KGE across the board.
  • The model benefits from seeing the combination of NDSI and NDVI (which confirms what we learned in the 100 basin study above).

What does this look like for a single site, for a forecast issued in the spring melt period? 

Below is an example forecast with the Full Inputs model from a snow-driven site within the Lake Geneva basin. Note the strong diurnal pulsing, a signature of snowmelt, in both the model forecast (blue lines and shading) and the observations (black line).

How did the model learn to track pulsing? What inputs is it paying attention to the most? We can look inside the model’s “black box” and learn which features it is valuing highest. Apologies for the many acronyms of weather data below!

The inputs the model is paying the most attention to are what we expect: snow-related data come first. The third is temperature, another critical input during snowmelt, and so on…
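The post does not say which attribution method produces these rankings. As one common way to ask "which inputs matter most?", the sketch below uses permutation importance: shuffle one input across samples and measure how much the error grows. The toy model, feature names, and data are all invented and stand in for a trained model; this is not HydroForecast's method.

```python
import random


def toy_model(row):
    # Stand-in for a trained model: runoff driven mostly by SWE, then temperature.
    return 3.0 * row["swe"] + 0.5 * row["temperature"] + 0.0 * row["wind"]


def mean_squared_error(rows, targets, model):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, model, feature, seed=0):
    """Increase in error after shuffling one feature's values across samples."""
    rng = random.Random(seed)
    baseline = mean_squared_error(rows, targets, model)
    shuffled_values = [r[feature] for r in rows]
    rng.shuffle(shuffled_values)
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled_values)]
    return mean_squared_error(permuted, targets, model) - baseline


# Synthetic inputs in [0, 1); targets come from the toy model itself.
data_rng = random.Random(42)
features = ("swe", "temperature", "wind")
rows = [{name: data_rng.random() for name in features} for _ in range(200)]
targets = [toy_model(r) for r in rows]

importances = {
    name: permutation_importance(rows, targets, toy_model, name) for name in features
}
ranking = sorted(importances, key=importances.get, reverse=True)
print(ranking)
```

In this toy setup the ranking mirrors the coefficients: SWE first, temperature second, and wind, which the model ignores, with zero importance. Real attribution for a deep learning model over time series is considerably more involved.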

Zooming out from a single-issued forecast, we can look at the history of predictions at this site and isolate the high-res SWE input to understand when the model started paying attention to it. In this case, the first signal of ‘importance’ came five months before the peak melt occurred.

Intuitively, it makes sense that the model starts paying attention to SWE as it is accumulating, and that signal becomes more important as the time to melt gets closer. The interesting part here is that when the model has snow cover and SWE data, it uses this data as we expect. When it does not have it, the predictions are still quite reasonable. What does this mean for scaling ML models across areas that have a heterogeneous amount and quality of snow data? 

Our goals for continued innovation in machine learning, snow data, and water supply forecasts

Obviously we know this isn’t the end of the snow+data opportunities; there’s lots more to dig into here. Luckily this means our work is never done and we look forward to continuing our experiments and R&D with snow data. Here’s a sneak peek into what we’re working on… 

In addition to investigating how the model uses snow data, we’re working on two big pieces of R&D for HydroForecast.

I’ll keep everyone updated once we’ve interrogated the models completely and are confident in the findings! Feel free to reach out to our team with more questions about HydroForecast and snow data.
