Marshall Moutenot

Not All AI Has the Same IQ

Key change

Recently, the HydroForecast team competed in a year-long streamflow forecasting competition hosted by The Centre for Energy Advancement through Technological Innovation’s Hydropower Operations and Planning Interest Group. One goal of this competition was to determine whether AI models could beat existing approaches to forecasting streamflow.

The competition has wrapped up, and the conclusion is: yes, AI is a massively powerful tool when it comes to forecasting the amount of water that will flow through a river or stream. HydroForecast won 23 of 25 categories across all of the forecasting regions. It was hugely validating to have such a decisive result calculated live by RTI International, whose team evaluated the metrics.

But after some reflection on the competition, I realized that there is a more interesting takeaway: other participants also used machine learning and, if you’ll excuse the double negative, they didn’t just not win but in most cases performed worse than the traditional approaches. Whoa! So what does this mean for where and how we choose to apply AI to problems?

  1. We could all use more clarity about when to reach for AI to solve a problem, so I’ll walk through how my team assesses if machine learning is the right tool for the job.
  2. Different AI solutions are, at face value, treated equally and shouldn’t be. The streamflow forecasting competition is a great example of how AI can be misapplied to a problem.
  3. AI solutions are sometimes framed as opposing their conventional counterparts. In contrast, HydroForecast applies AI in a way that builds upon and integrates the wisdom of physical modeling.

Disclaimer: I will use AI and machine learning somewhat interchangeably. If we are being precise: Artificial Intelligence is a superset of Machine Learning, which is in turn a superset of Neural Networks.

AI or Nay-I

Powerful, “new” technologies are sometimes billed as silver bullets and panaceas; “they’ll solve your hardest problems and increase the efficiency of your strongest teams by 51%!”

A robot hand reaching towards the camera
"This way to the future," AI says.

When first assessing a problem, I adhere to an adage I learned while getting my engineering degree: “keep it simple, stupid,” a design principle originally noted by the U.S. Navy in the 1960s that, in my interpretation, asks “is there a creative, simple solution to this problem, even if it sacrifices some performance or accuracy?”

In the case of nascent tools like artificial intelligence or blockchain, more often than not there is a simpler, more transparent solution to a problem. Solutions devised from this viewpoint can often be more easily explained and maintained than their more complex counterparts.

In the case of streamflow forecasting, organizations can get, and have gotten, mileage out of simple regression or long-term averages. If the tolerance for error is high enough for the problem the forecasts are solving, these simpler approaches are sufficient, and they are easy to interpret and maintain.
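To make those two simple approaches concrete, here is a minimal sketch of a long-term-average forecast and a one-variable regression. The function names, the choice of precipitation as the single predictor, and all of the numbers are made up for illustration; this is not how any particular organization builds its forecasts.

```python
from statistics import mean


def climatology_forecast(history_by_day):
    """Long-term average: forecast each calendar day as the mean of past flows on that day."""
    return {day: mean(flows) for day, flows in history_by_day.items()}


def fit_linear_regression(x, y):
    """Ordinary least squares for the line y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical observed flows for two calendar days across three past years.
history_by_day = {"06-01": [120.0, 100.0, 140.0], "06-02": [110.0, 90.0, 130.0]}
climatology = climatology_forecast(history_by_day)

# Hypothetical paired observations of precipitation and streamflow.
precip_mm = [0.0, 5.0, 10.0, 20.0]
flow_cms = [50.0, 60.0, 70.0, 90.0]
slope, intercept = fit_linear_regression(precip_mm, flow_cms)

# Forecast the flow for a day with 15 mm of precipitation.
predicted_flow = slope * 15.0 + intercept
```

Both models fit in a few lines and every coefficient can be read and questioned directly, which is exactly the interpretability advantage of starting simple.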

However, if reducing error means safer, more efficient operations and more prescient planning, as it often does in the case of streamflow forecasting, sometimes simple solutions aren’t enough. It’s only in this case, once we’ve ruled out simple, creative solutions, that we’ll reach into our toolbox for machine learning.


There are existing, complex solutions to forecasting streamflow. Conceptual physical modeling, meaning the creation of calibrated equations that attempt to capture the relationships between variables (e.g. precipitation) and streamflow, has been applied to forecast flows at a variety of horizons for decades.

Conceptual models range in performance from woefully inadequate to pretty good. Their complexity is inherent to the problem itself. Streamflow depends on many different factors: the past winter’s snow, recent precipitation and temperature, upcoming precipitation and temperature, elevation and soil types across the basin, groundwater interaction, distribution of land classification, and many (many!) others.

This problem is a great fit for a tool like machine learning. There are a lot of possible inputs, some still waiting to be discovered and applied, and a set of complex, interconnected relationships between these inputs and the predicted output. And best of all, there’s a wealth of established science and lessons from past solutions to draw from and build upon.

However, it is important to keep in mind that not all solutions that employ machine learning are created equal. I’ll show that for a complex problem such as this, it’s possible to create a machine learning model that can be passed off as performing well but, when put to the test, will perform worse than conceptual models and, in some cases, fail catastrophically.

When AI Falls Short

When the prior art, established science, and interdisciplinary nature of a problem are not sufficiently integrated into how machine learning is applied, the solution will either immediately underperform or - eventually - fail catastrophically. What this means in practice is that a model is provided with a limited or partial set of inputs that influence the forecast, is trained on a historical period that causes “overfitting,” or is otherwise structured in a way that violates the nature of the problem (for streamflow forecasting, the laws of physics).
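One common guard against the overfitting described above is to score a model on years it never saw during training. The sketch below assumes Nash-Sutcliffe efficiency as the skill score (chosen here for illustration, not necessarily the metric used in the competition), and the `simulated` values are made-up stand-ins for a model's output.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - (sum of squared errors) / (sum of squared deviations of observations from their mean).

    A value of 1 is a perfect match; 0 is no better than always predicting the observed mean.
    """
    obs_mean = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - squared_error / spread


def split_by_year(records, first_test_year):
    """Temporal split: years before first_test_year are for training, the rest are held out."""
    train = [r for r in records if r["year"] < first_test_year]
    test = [r for r in records if r["year"] >= first_test_year]
    return train, test


# Hypothetical records: the model tracks the training years closely but not the later ones.
records = [
    {"year": 2015, "observed": 100.0, "simulated": 101.0},
    {"year": 2016, "observed": 150.0, "simulated": 149.0},
    {"year": 2017, "observed": 80.0, "simulated": 81.0},
    {"year": 2018, "observed": 200.0, "simulated": 150.0},
    {"year": 2019, "observed": 60.0, "simulated": 100.0},
    {"year": 2020, "observed": 130.0, "simulated": 125.0},
]

train, test = split_by_year(records, first_test_year=2018)
train_nse = nash_sutcliffe([r["observed"] for r in train], [r["simulated"] for r in train])
test_nse = nash_sutcliffe([r["observed"] for r in test], [r["simulated"] for r in test])
```

A large gap between `train_nse` and `test_nse` is the warning sign: the model has memorized its historical period rather than learned the river.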

In the case of forecasting streamflow, a hypothetical solution might take precipitation and temperature as inputs, some in situ flow measurements as observations, and an off-the-shelf machine learning model to tie it all together. Training and evaluation look pretty good! The solution-builder presents the result and moves forward to place it into a decision-making workflow.

But hold up, we’re savvy about hydrology now and we know that FAR more than precipitation and temperature drives streamflow. Can you think of when and why this approach will fail, catastrophically?

Robot prototypes failing are simultaneously hilarious and tragic. But in the case of AI decision-support, the ramifications of failure can be severe.

The first failure would be acute: the first big storm (more water) or intense drought (less water) that falls outside the range of the data used to train the model would produce nonsensical or incorrect predictions. Depending on how that model is used, that could have critical consequences, like forcing an operational team to scramble due to a missed forecast.
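A cheap way to see this acute failure coming is to check whether each new input falls inside the range the model was trained on. The sketch below is only an illustration: the input names and values are hypothetical, and a simple min/max envelope is an assumption of mine, not a description of how HydroForecast handles extremes.

```python
def training_envelope(rows):
    """Record the minimum and maximum of each input seen during training."""
    names = rows[0].keys()
    return {n: (min(r[n] for r in rows), max(r[n] for r in rows)) for n in names}


def out_of_range_inputs(envelope, new_row):
    """Return the names of inputs in new_row that fall outside the training envelope."""
    return [n for n, (lo, hi) in envelope.items() if not lo <= new_row[n] <= hi]


# Hypothetical training data for a model driven by precipitation and temperature.
training_rows = [
    {"precip_mm": 0.0, "temp_c": -5.0},
    {"precip_mm": 12.0, "temp_c": 8.0},
    {"precip_mm": 40.0, "temp_c": 25.0},
]
envelope = training_envelope(training_rows)

typical_day = {"precip_mm": 10.0, "temp_c": 15.0}
record_storm = {"precip_mm": 95.0, "temp_c": 18.0}

flags_typical = out_of_range_inputs(envelope, typical_day)  # nothing unusual
flags_storm = out_of_range_inputs(envelope, record_storm)   # precipitation never seen before
```

A flag does not fix the forecast, but it tells the operations team that the model is extrapolating and its output deserves extra scrutiny.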

The second failure would be gradual, but no less severe. Unless the model is constantly retrained, long-term nonstationarity (i.e. the relationships between inputs and outputs gradually drifting over time) - in our case climate and landscape change - would cause the model to slowly deviate. This discrepancy might be almost imperceptible at first, but would grow over time and become no less consequential than the acute failure, especially as extreme weather events are projected to increase.
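This gradual failure can be illustrated with a toy simulation. Everything here is synthetic: the frozen model's coefficients, the rate at which the "true" rainfall-runoff response drifts, and the error threshold are all assumptions picked to make the effect visible.

```python
def frozen_model(precip_mm):
    """A model fitted once and never retrained (coefficients are made up)."""
    return 2.0 * precip_mm + 50.0


def true_flow(precip_mm, years_since_training):
    """Synthetic 'reality' whose response to precipitation strengthens a little each year."""
    return (2.0 + 0.1 * years_since_training) * precip_mm + 50.0


def mean_absolute_error_by_year(precip_samples, n_years):
    """Mean absolute error of the frozen model for each year after training."""
    errors = []
    for year in range(n_years):
        abs_errors = [abs(true_flow(p, year) - frozen_model(p)) for p in precip_samples]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors


precip_samples = [5.0, 10.0, 20.0, 40.0]
yearly_mae = mean_absolute_error_by_year(precip_samples, n_years=10)

# Years in which the error exceeds a hypothetical tolerance of 10 flow units.
needs_retraining = [year for year, mae in enumerate(yearly_mae) if mae > 10.0]
```

The error starts at zero and creeps upward every year, so no single year looks alarming. Tracking a rolling error like this is one simple way to notice drift before it becomes consequential.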

A Better AI: Theory-guided Machine Learning

Thankfully, there is a better way to employ machine learning to predict natural systems: an approach that respects prior solutions and structurally integrates their wisdom. Our friend Curt Jawdy of the Tennessee Valley Authority raised an excellent point in his reflections on the competition: “How will AI and conceptual models hybridize to provide a best-of-both approach?”

At Upstream Tech we call this a theory-guided machine learning approach, and it’s the beating heart of HydroForecast. We:

  1. Leverage expertise in meteorology and hydrology from our team and partners to inform how we select inputs
  2. Meet with our customers to understand and incorporate their wisdom of the river(s) they work on
  3. Build upon physical modeling approaches to inform how we train our models, evaluate our results, and make iterative improvements

How do we know this approach to applying machine learning makes HydroForecast stronger? 

The Forecast Rodeo participants included veteran forecasting teams at utilities like Tennessee Valley Authority and Hydro-Québec, governmental forecasts from agencies like NOAA’s National Weather Service River Forecast Centers, private vendors including Upstream Tech and Sapere, and - at some locations - public participants. We were not the only ones who submitted AI forecasts!

In each of the competition’s geographic regions, the best performance came from HydroForecast’s theory-guided machine learning approach. On average, machine learning models that were not theory-guided (categorized as “Statistical” in the competition results) underperformed conceptual models.

These other AI models were complex, but were not designed to incorporate the theory of hydrology. And we can speculate that the models’ performance would worsen over time as more extreme events occur and climate patterns and landscapes shift. In contrast, HydroForecast’s theory-guided machine learning design makes it the best forecasting model currently available, and it will continue to improve and perform in the years and decades to come.

It bears repeating: solutions that employ machine learning are not created equal. Often, there are simpler tools to use when devising solutions. And when machine learning is a strong tool for the job, it is best applied with respect for the science and in collaboration with the experts already immersed in the problem.

To learn more about HydroForecast, reach out at

Do you have a problem and are thinking about machine learning as a tool for the job? I want to hear about it! Email me at