
The R Words – Reliability, Resilience, Redundancy, R…


Tom Walski, Ph.D., P.E., Senior Product Manager, Water



We all know the meanings of the R words listed in the title of this blog post (at least we think we do). In everyday speech, however, we use them somewhat carelessly and sometimes interchangeably. I’m guilty of this as well.

When we work with water systems, though, we need to use precise language, because careless talk can lead to less-than-ideal solutions to our problems. It bothers me to hear these terms used imprecisely.

One of the papers that did the most to pin down these terms for water resources systems was by Hashimoto, Stedinger, and Loucks (1982): “These measures describe how likely a system is to fail (reliability), how quickly it recovers from failure (resiliency), and how severe the consequences of failure may be (vulnerability).”

While some consider that paper the definitive work in this area, other terms should be added to the list, and its three terms (reliability, resilience, and vulnerability) could use clarification.

Here are the Walski definitions of these terms, at least as they apply to water. In industry papers, you’ll find many definitions and many metrics used to calculate them.

  • Reliability is the probability that a component or system is operational. For an individual component, the status at any given moment is usually either 0 (not working) or 1 (working). In the strict sense used in reliability theory, a component that fails is replaced. However, water systems are made up of repairable components. A somewhat more useful term is availability, which generally means the fraction of time that a system is running. Availability is especially important for systems with repairable components.
  • Resilience is a measure of the ability of a system to keep operating during the failure of a component or to recover from a failure. To the extent that there is a backup system, or a component can be repaired quickly, the system is resilient. Some researchers have proposed a “resilience index” to measure resilience; however, if you look at the math behind it, the index is really a measure of “excess capacity.” If a pipe needs to supply 2,000 gpm, but you size it to provide 3,000 gpm, the resilience index will look good. However, if that pipe fails, it delivers neither 2,000 nor 3,000 gpm, but 0 gpm. The system would have poor resilience, even though it had plenty of excess capacity before the pipe failed. Excess capacity is a good metric for resilience if the only failure you care about is actual flow exceeding design flow, but there are many other types of failure.
  • Redundancy and robustness are two related but different terms. Redundancy refers to backup components or flow paths that can be operated if a parallel component fails. Robustness refers to the ability of a component to resist failure and remain in service for very long periods. For example, if you need to move water from point A to point B, a redundant system would supply that water through two or more parallel routes so that if one fails, another can provide the flow. A robust system would instead use a single, very thick-walled pipe with a sophisticated monitoring system. Think of robustness in terms of fail-safe, and redundancy in terms of safe-fail.
  • Criticality and vulnerability look at the impact of failures on users of the system, such as the number of customers who lose water or wastewater service. The nature of the users can also be accounted for. For example, a hospital can be considered a more vulnerable/critical water user than a golf course irrigation system (although some golfers may disagree). WaterGEMS’ criticality analysis is great for correctly identifying critical segments in distribution systems so that the system can be made more reliable/available.
  • Reachability describes whether a point or a segment in a network is connected to a source (water distribution) or a discharge point (storm/sanitary sewers). Reachability can change due to the failure of other pipes/segments. It does not explicitly consider whether there is sufficient capacity, only whether elements are connected. There are more rigorous definitions in communications networks, where reachability analysis is an established science.
  • Risk measures the expected consequence of the failure of a system or component. It is usually calculated as the likelihood of failure times the consequence of failure. The likelihood of failure is difficult to quantify, and the consequence of failure is almost impossible to measure completely, because many consequences of a failure are not quantifiable.
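For repairable components, availability is commonly estimated in reliability engineering as MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. The sketch below uses that textbook formula, plus the usual assumption that parallel routes fail independently, to compare the redundant (safe-fail) and robust (fail-safe) ways of moving water from point A to point B. The function names and every MTBF/MTTR value are hypothetical, chosen only to show the arithmetic.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a repairable component: uptime / (uptime + downtime)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(*route_availabilities: float) -> float:
    """Availability of redundant parallel routes: service is lost only if every route is down.

    Assumes route failures are statistically independent (no common-cause failures).
    """
    p_all_down = 1.0
    for a in route_availabilities:
        p_all_down *= 1.0 - a
    return 1.0 - p_all_down


HOURS_PER_YEAR = 8760

# Hypothetical numbers for moving water from point A to point B.
ordinary_pipe = availability(mtbf_hours=5 * HOURS_PER_YEAR, mttr_hours=24)   # breaks ~every 5 years, 1 day to fix
robust_pipe = availability(mtbf_hours=50 * HOURS_PER_YEAR, mttr_hours=72)    # breaks ~every 50 years, 3 days to fix
redundant_pair = parallel_availability(ordinary_pipe, ordinary_pipe)          # two parallel routes (safe-fail)

print(f"one ordinary pipe:  {ordinary_pipe:.7f}")
print(f"two parallel pipes: {redundant_pair:.7f}")
print(f"one robust pipe:    {robust_pipe:.7f}")
```

With these made-up numbers the redundant pair comes out ahead, but that result leans heavily on the independence assumption. Two parallel mains in the same trench, or fed by the same pump station, can fail together (a common-cause failure), which erodes much of the benefit of redundancy.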
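The gap between excess capacity and resilience can be shown with a few lines of arithmetic. In this sketch, `excess_capacity_ratio` is a hypothetical, simplified spare-capacity measure used for illustration; it is not the formula of any particular published resilience index. Both options below are given the same 50% excess capacity, and then one pipe is failed in each.

```python
from typing import NamedTuple


class Pipe(NamedTuple):
    """A hypothetical supply route with a design capacity."""
    name: str
    capacity_gpm: float


def excess_capacity_ratio(pipes: list[Pipe], demand_gpm: float) -> float:
    """Spare capacity relative to demand, with every pipe in service."""
    total_gpm = sum(p.capacity_gpm for p in pipes)
    return (total_gpm - demand_gpm) / demand_gpm


def delivered_flow(pipes: list[Pipe], demand_gpm: float, failed: set[str]) -> float:
    """Flow that can still reach customers when the named pipes are out of service."""
    remaining_gpm = sum(p.capacity_gpm for p in pipes if p.name not in failed)
    return min(remaining_gpm, demand_gpm)


demand = 2000.0
single = [Pipe("big main", 3000.0)]                        # one pipe sized for 3,000 gpm
pair = [Pipe("main A", 1500.0), Pipe("main B", 1500.0)]    # two parallel 1,500 gpm pipes

for label, pipes in [("one 3,000 gpm pipe", single), ("two 1,500 gpm pipes", pair)]:
    spare = excess_capacity_ratio(pipes, demand)
    after = delivered_flow(pipes, demand, failed={pipes[0].name})  # fail the first pipe
    print(f"{label}: excess capacity {spare:.0%}, flow after one pipe fails {after:,.0f} gpm")
```

Both layouts score identically on excess capacity, yet after a single failure the oversized main delivers 0 gpm while the parallel pair still delivers 1,500 gpm. A capacity-only metric cannot see that difference.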
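Reachability and a simple form of criticality can both be sketched with a breadth-first search. The example below fails one pipe at a time on a small made-up network, finds which nodes are still connected to the source, and totals the customers cut off, weighting a hospital far above a golf course. The network, customer counts, and weights are all hypothetical. Like reachability itself, the check ignores capacity. It is also not a reproduction of WaterGEMS’ criticality analysis: for simplicity each pipe is treated as its own segment, whereas real segments are bounded by isolation valves.

```python
from collections import deque

# Hypothetical network: pipe id -> (node, node). "S" is the source.
PIPES = {
    "P1": ("S", "A"),
    "P2": ("A", "B"),
    "P3": ("S", "B"),   # P1-P2-P3 form a loop
    "P4": ("B", "H"),   # single feed to a hospital
    "P5": ("A", "G"),   # single feed to a golf course irrigation system
    "P6": ("S", "D"),   # dead-end residential area
}
# Hypothetical customers per node and a criticality weight per customer.
CUSTOMERS = {"S": 0, "A": 150, "B": 100, "D": 300, "H": 1, "G": 1}
WEIGHT = {"S": 1.0, "A": 1.0, "B": 1.0, "D": 1.0, "H": 500.0, "G": 0.5}


def reachable(source: str, pipes: dict[str, tuple[str, str]], failed: set[str]) -> set[str]:
    """Nodes still connected to the source when the pipes in `failed` are out of service."""
    adjacency: dict[str, list[str]] = {}
    for pipe_id, (u, v) in pipes.items():
        if pipe_id in failed:
            continue
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def criticality(pipe_id: str) -> float:
    """Weighted customers who lose service if `pipe_id` fails (connectivity only)."""
    connected = reachable("S", PIPES, {pipe_id})
    return sum(CUSTOMERS[n] * WEIGHT[n] for n in CUSTOMERS if n not in connected)


for pipe_id in sorted(PIPES, key=criticality, reverse=True):
    print(f"{pipe_id}: {criticality(pipe_id):g}")
```

Pipes inside the loop score zero because every node stays reachable through the other path, while the single feeds to the hospital and the dead-end area rank highest.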
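Once likelihood and consequence have been estimated, the risk calculation itself is simple; the difficulty lies entirely in those two inputs. This sketch ranks a few assets by expected annual loss. The asset names, failure rates, and dollar consequences are hypothetical, and a single dollar figure necessarily leaves out the consequences that cannot be quantified.

```python
from typing import NamedTuple


class Asset(NamedTuple):
    name: str
    failures_per_year: float   # likelihood of failure (hypothetical)
    consequence_usd: float     # consequence per failure (hypothetical, quantifiable part only)


def risk(asset: Asset) -> float:
    """Expected annual loss = likelihood of failure x consequence of failure."""
    return asset.failures_per_year * asset.consequence_usd


assets = [
    Asset("old cast-iron main", failures_per_year=0.30, consequence_usd=40_000),
    Asset("transmission main", failures_per_year=0.02, consequence_usd=2_000_000),
    Asset("service line", failures_per_year=0.10, consequence_usd=3_000),
]

for a in sorted(assets, key=risk, reverse=True):
    print(f"{a.name:<20} {risk(a):>10,.0f} USD/yr")
```

Here the rarely failing transmission main ranks first because its consequence is so large, which is exactly the trade-off a likelihood-times-consequence measure is meant to expose.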

Quantifying these reliability-related terms depends heavily on the property being measured. For example, does a point in the distribution system fail when the pressure drops to zero, when the pressure drops below 20 psi (or some comparable standard), or when the available fire flow drops below the needed fire flow? Metrics are important.
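The same model results can produce very different failure counts depending on which of those definitions is chosen. The sketch below applies all three to a handful of nodes; the node results and the 1,500 gpm needed fire flow are made up for illustration.

```python
# Hypothetical model results for a few nodes under some failure scenario.
nodes = {
    "J1": {"pressure_psi": 45.0, "fire_flow_gpm": 1800.0},
    "J2": {"pressure_psi": 15.0, "fire_flow_gpm": 900.0},
    "J3": {"pressure_psi": 0.0, "fire_flow_gpm": 0.0},
    "J4": {"pressure_psi": 30.0, "fire_flow_gpm": 1200.0},
}
NEEDED_FIRE_FLOW_GPM = 1500.0  # hypothetical needed fire flow

# Each criterion returns True when a node counts as "failed".
criteria = {
    "pressure <= 0 psi": lambda n: n["pressure_psi"] <= 0.0,
    "pressure < 20 psi": lambda n: n["pressure_psi"] < 20.0,
    "fire flow < needed": lambda n: n["fire_flow_gpm"] < NEEDED_FIRE_FLOW_GPM,
}

failed_by_criterion = {
    label: sorted(name for name, data in nodes.items() if is_failed(data))
    for label, is_failed in criteria.items()
}
for label, failed in failed_by_criterion.items():
    print(f"{label:<20} failed nodes: {failed}")
```

One node fails under the zero-pressure definition, two under the 20 psi definition, and three under the fire flow definition, so any reliability number is only meaningful alongside the failure definition behind it.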

There is a great deal of literature on reliability in general, especially in mechanical engineering, electrical engineering, and computer science. There have been several papers on the topic in water, but there doesn’t seem to be a consensus on terminology. In the 1980s, an ASCE task committee on Risk and Reliability Analysis in Water Distribution Systems (ASCE, 1989) developed a book on reliability but didn’t focus on terminology.

Papers on reliability in water journals tend to focus on system hydraulics. However, many other aspects affect reliability. I wrote about these in a paper on the practical aspects of reliability (Walski, 1993). Some of the considerations include standby power, emergency interconnections, spare parts inventory, good communications with field crews, and adequate training for operation and maintenance personnel. These aren’t exciting topics for researchers. (Excerpts from that paper can also be found in the chapter on reliability in Mays (2000).)

Imprecise terminology is not usually a fatal problem, but discussions would be more productive if we agreed on things like whether excess capacity is the same thing as resilience.

There may be a widely used publication on this terminology that I missed, but I couldn’t find one. So, until we can come up with something better, I’d like to see people use the terminology I’ve described above. While I like to think my writing is perfect, I know it isn’t. If you want to comment on anything I’ve written above, send me an email at tom.walski@bentley.com.


ASCE, 1989, Reliability Analysis of Water Distribution Systems, Ed. Larry Mays, ASCE.

Hashimoto, T., Stedinger, J. R., and Loucks, D. P., 1982, “Reliability, resiliency, and vulnerability criteria for water resource system performance evaluation,” Water Resources Research, Vol. 18, No. 1, pp. 14-20.

Mays, L. (Ed.), 2000, “Chapter 18. Reliability Analysis for Design,” Water Distribution Systems Handbook, McGraw Hill, New York.

Walski, T., 1993, “Practical Aspects of Providing Reliability in Water Distribution Systems,” Reliability Engineering and System Safety, Vol. 42, No. 1, p. 13.

