If you receive any water-oriented professional magazines or go through the endless marketing material that fills your inbox every day, you’ll see that artificial intelligence (AI) and its subset, machine learning (ML), are going to solve all of our problems. Just import a bunch of data, hit GO and your computer does the thinking for you while you sit back with your feet up on the desk.
I wanted to increase my understanding of this machine learning thing. While I’ve been following the topic from as far back as the 1970s, it has only been in recent years that ML has been making a big splash in water. I needed more than my superficial knowledge of ML, so I signed up for an 11-week online class in machine learning. It was pretty rigorous. (If you didn’t have a background in calculus, statistics, numerical analysis and linear algebra, it would be hard to keep up.) I’m not claiming I’m an expert. I’m pretty close to the bottom of a typical Dunning-Kruger curve. But I have a much better understanding of ML and can drop terms like decision boundaries, overfitting and logistic regression in casual conversations.
The relationship between machine learning and hydraulic modeling
ML works by using clever algorithms and a huge amount of data to train a model for the problem at hand, and in many cases it can come up with good answers. What this means for hydraulics is that if you want to determine the flow in some pipe (we’ll call it P-10), you measure the flow in P-10 many times while also recording as many parameters as you can, such as time of day, day of week, season, water level in a nearby tank or wet well, and which pumps are running. The goal is to find the right coefficients and equations in your ML model so that, given the values of the input parameters, the ML solution will accurately estimate the flow in P-10. This is called “training” your ML model.
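To make "finding the right coefficients" concrete, here is a minimal sketch of training on recorded data. Everything in it is an assumption for illustration: the history is synthetic (generated from a made-up `true_flow` formula plus noise), only two inputs are used (tank level and number of pumps running), and ordinary least squares stands in for whatever algorithm a real ML product might use.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_least_squares(rows, targets):
    """Find weights w minimizing squared error, via (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(weights, row):
    return sum(w * v for w, v in zip(weights, row))

def true_flow(level_m, pumps_on):
    """Hypothetical relationship used only to generate synthetic measurements."""
    return 5.0 + 12.0 * pumps_on - 1.5 * level_m

# Synthetic stand-in for a recorded history of P-10 flow (L/s).
random.seed(0)
rows, flows = [], []
for _ in range(500):
    level_m = random.uniform(2.0, 6.0)      # tank level, m
    pumps_on = random.choice([0, 1, 2])     # number of pumps running
    rows.append([1.0, level_m, float(pumps_on)])  # leading 1.0 is the intercept
    flows.append(true_flow(level_m, pumps_on) + random.gauss(0.0, 0.3))

# "Training": find the coefficients that best reproduce the measurements.
weights = fit_least_squares(rows, flows)

# Use the trained model for a condition inside the training range.
estimate = predict(weights, [1.0, 4.0, 1.0])
print(f"Estimated P-10 flow: {estimate:.1f} L/s "
      f"(synthetic truth: {true_flow(4.0, 1):.1f} L/s)")
```

A real data set would have many more inputs and a messier relationship, but the idea is the same: the model knows nothing about pipes or pumps, only the coefficients that best fit the recorded numbers.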
If the machine learning was set up correctly, and the inputs were within the range of the training data, flow in pipe P-10 can be determined by ML. If that’s the case, why bother with a physics-based hydraulic model like OpenFlow WaterGEMS or SewerGEMS? For one thing, what if you want to know the flow in P-9 or P-11? The ML model wasn’t trained with data from those pipes, and it’s unlikely that you would have enough data (even with “big data”) to know the flow in those pipes or in more than a handful of pipes in your system. Meanwhile, the hydraulic model of your system could calculate the flow in every pipe.
But that is not the only issue. Suppose you added a new pump that delivers flow through P-10. The ML model wasn’t trained with that pump in the training data set, so it doesn’t know how to deal with the pump, which is not one of its inputs. It’s back to the training data: you need to collect several days or months of data with the new pump and retrain the ML model so it can learn how the new pump affects flow in P-10. During that time, your ML model won’t be very accurate. And by the time the ML model is trained for the new pump, something else may have changed in the system (e.g., a new pressure-reducing valve setting), which requires more training.
Now suppose there is a fire downstream of P-10. If the training data did not contain a day with a fire, the ML model could not give a reliable answer to the flow through P-10 during the fire. With a hydraulic model, it would be easy to add the fire flow and see how P-10 responds. With ML, you would need to wait for the second fire to be able to determine the flow.
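The trouble with asking a data-driven model about conditions it never saw can be shown with a toy example. The sketch below assumes a made-up pipe whose head loss follows h = K·Q², with a hypothetical coefficient `K`, and uses a straight-line fit as a stand-in for the trained model. The line is fitted only to typical-day flows and is then asked about a fire flow far outside that range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

K = 0.01  # hypothetical resistance coefficient, m per (L/s)^2

def head_loss(flow_lps):
    """Physics stand-in: head loss grows with the square of flow."""
    return K * flow_lps ** 2

# Training data covers only typical days: flows of 10 to 20 L/s.
normal_flows = [float(q) for q in range(10, 21)]
slope, intercept = fit_line(normal_flows, [head_loss(q) for q in normal_flows])

def fitted_head_loss(flow_lps):
    """Data-driven stand-in: the straight line learned from typical days."""
    return slope * flow_lps + intercept

for label, q in [("typical day", 15.0), ("fire flow", 60.0)]:
    print(f"{label:12s} Q = {q:4.0f} L/s  "
          f"physics: {head_loss(q):5.2f} m  fitted: {fitted_head_loss(q):5.2f} m")
```

Inside the training range the fitted line and the physics agree closely. At the fire flow the fitted line falls well short of the physics, and nothing in the fitted model warns you that it has left familiar territory.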
Responding to anomalies
But the place where hydraulic models like WaterGEMS, SewerGEMS or WaterSight really shine in operations, engineering, and design is forecasting. Engineers and operators who understand modeling can simulate events that haven’t yet occurred or facilities that have not yet been constructed to evaluate their performance. Built on the backs of our hydraulic giants like Bernoulli, Manning, Darcy, Weisbach, Colebrook and White, among others, this has long been the strong point of hydraulic models and has not been superseded.
With modern “big data”, it is not terribly difficult to come up with the millions of data points needed to train an ML model. However, most days are normal and uninteresting, so the vast majority of the data are essentially duplicates of other typical days. What you really want to know when you turn to a model is, “What is happening at those anomalous times?” There are very few training data points available for these times because, by definition, they are anomalous. Meanwhile, hydraulic models don’t care if the scenario being calculated is typical or anomalous. While the ML model is struggling to respond to changes, the hydraulic model already knows what to do.
ML models are best for situations where a rational, physics-based model isn’t available. For example, with a data set containing information on a pipe’s material, age, soil corrosiveness, pressure, likelihood of transients, and break history of similar pipes, an ML model can make reasonable forecasts of future pipe breaks. The forecasts won’t be perfect, but they can inform a pipe replacement program.
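As a rough illustration of the pipe break case, the sketch below fits a logistic regression by gradient descent. It is a toy under stated assumptions: the pipe records are synthetic, only two of the inputs named above are used (age and a corrosive-soil flag), and the `TRUE_WEIGHTS` used to generate break histories are invented for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def features(age_years, corrosive_soil):
    # Leading 1.0 is the intercept; age is scaled to roughly 0..1.
    return [1.0, age_years / 100.0, float(corrosive_soil)]

def break_probability(weights, age_years, corrosive_soil):
    x = features(age_years, corrosive_soil)
    return sigmoid(sum(w * v for w, v in zip(weights, x)))

# Synthetic pipe inventory generated from invented "true" weights.
random.seed(1)
TRUE_WEIGHTS = [-3.0, 4.0, 1.0]  # hypothetical, for data generation only
pipes = []
for _ in range(400):
    age = random.uniform(0.0, 100.0)
    corrosive = random.choice([0, 1])
    broke = 1 if random.random() < break_probability(TRUE_WEIGHTS, age, corrosive) else 0
    pipes.append((features(age, corrosive), broke))

# Batch gradient descent on the logistic (cross-entropy) loss.
weights = [0.0, 0.0, 0.0]
learning_rate = 1.0
for _ in range(500):
    gradient = [0.0, 0.0, 0.0]
    for x, broke in pipes:
        error = sigmoid(sum(w * v for w, v in zip(weights, x))) - broke
        for i in range(3):
            gradient[i] += error * x[i] / len(pipes)
    weights = [w - learning_rate * g for w, g in zip(weights, gradient)]

p_new = break_probability(weights, age_years=10, corrosive_soil=0)
p_old = break_probability(weights, age_years=90, corrosive_soil=1)
print(f"Estimated break probability, 10-year-old pipe, benign soil:    {p_new:.2f}")
print(f"Estimated break probability, 90-year-old pipe, corrosive soil: {p_old:.2f}")
```

The output is a ranking aid rather than a prediction of which pipe fails next, which is about what a replacement program needs.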
Complementary technologies
Are there places where hydraulic models and machine learning can play well together? One example could be using the hydraulic model as the training data for the ML model. Offline, the well-calibrated hydraulic model can be run thousands or even millions of times to generate the training data for the ML solution. This data set can contain fires, pipe breaks, seasonality, special events and, in general, a more consistent set of training data without the inaccuracies, broken sensors/transmitters and other problems that plague SCADA (Supervisory Control and Data Acquisition) and IoT (Internet of Things) data.
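The workflow of training on model runs could look something like the sketch below. The `hydraulic_model` function is a hypothetical stand-in formula, not a call into WaterGEMS or any real solver, and the k-nearest-neighbors surrogate is just one simple choice of ML method. The point is the structure: run the physics over a grid of scenarios, fire flows included, then let a data-driven model answer from the stored results.

```python
def hydraulic_model(level_m, pumps_on, fire_lps):
    """Stand-in for one run of a calibrated hydraulic model (hypothetical formula).

    Returns flow through P-10 in L/s.
    """
    pump_flow = 12.0 * pumps_on * (1.0 - 0.05 * level_m)
    return pump_flow + fire_lps

def scale(level_m, pumps_on, fire_lps):
    """Map each input to roughly 0..1 so distances are comparable."""
    return ((level_m - 2.0) / 4.0, pumps_on / 2.0, fire_lps / 60.0)

# Offline: run the model over a grid of scenarios, including fire flows.
training = []
for i in range(17):                 # tank level 2.0 to 6.0 m in 0.25 m steps
    for pumps_on in range(3):       # 0, 1 or 2 pumps running
        for j in range(31):         # fire demand 0 to 60 L/s in 2 L/s steps
            level_m, fire_lps = 2.0 + 0.25 * i, 2.0 * j
            result = hydraulic_model(level_m, pumps_on, fire_lps)
            training.append((scale(level_m, pumps_on, fire_lps), result))

def surrogate_flow(level_m, pumps_on, fire_lps, k=4):
    """k-nearest-neighbors surrogate: average the k closest stored runs."""
    query = scale(level_m, pumps_on, fire_lps)
    nearest = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return sum(flow for _, flow in nearest) / k

# Online: ask the surrogate about a fire scenario that is not on the grid.
approx = surrogate_flow(4.1, 2, 41.0)
exact = hydraulic_model(4.1, 2, 41.0)
print(f"{len(training)} model runs stored")
print(f"Surrogate: {approx:.2f} L/s, hydraulic model: {exact:.2f} L/s")
```

Because the scenarios are generated rather than observed, the training set can hold as many fires and pipe breaks as you care to simulate. The surrogate is still only as good as the hydraulic model behind it and the range of scenarios it was given.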
While an ML model can generally run faster than a physics-based model, computers these days are powerful enough that run times are rarely a problem, so the extra work in developing an ML hydraulic model generally isn’t justifiable. The exception might be a case where someone wants to run an optimization model requiring many thousands of model runs. In that case, the work in setting up an ML model may be justified.
So, while ML is powerful and much of the hype around it is valid, ML is not a cure-all. In many cases, you just can’t beat solutions that go back to the first principles of physics and the laws of nature. Models based on first principles from physics such as Q = AV or F = ma can be more effective in describing what’s happening.