Written by Pankaj Sahu, Director - Technology Partner
on 07 Aug 2023

Water conservation is imperative today, given that freshwater sources are finite and shrinking. About a billion people lack access to safe drinking water, and two million people die annually as a result of poor water quality, poor sanitation, and unhygienic conditions, according to a WHO report. Although 71% of the Earth's surface is covered by water, roughly 97% of that water is salt water.

Rapid industrialization and economic development over the decades have led to excessive use of chemicals, fertilizers, and pesticides. These, along with harmful industrial waste, untreated household sewage, solid waste, electronic waste, and other pollutants, continuously mix into freshwater bodies such as rivers, lakes, and reservoirs. This onslaught rapidly degrades water quality and makes the water unfit for drinking and for other essential uses such as agriculture and industry.

It is crucial to test, analyze, and control water quality. While numerous techniques for assessing water quality have been in use for decades, most rely on a predominantly non-statistical, laboratory-based, single-dimension approach to testing. Water quality testing is largely based on three categories of parameters:

• Physical parameters include total dissolved solids (TDS), turbidity, temperature, color, electrical conductivity, salinity, taste, odor, etc.
• Chemical parameters include dissolved oxygen, pH, hardness, chlorine, acidity, alkalinity, etc.
• Biological parameters include bacteria load, algae, nutrients, viruses, etc.

Water quality assessment vs. prediction

Of all the methods available, the Water Quality Index (WQI) is the most widely used for measuring water quality. It condenses the parameters described above into a single value by taking their weighted average.
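The weighted-average idea can be sketched in a few lines of Python. This is a minimal illustration, assuming each parameter has already been converted to a sub-index on a 0-100 scale and that a higher score means better water. The parameter names, scores, and weights are hypothetical and do not come from any particular WQI standard.

```python
def water_quality_index(sub_indices, weights):
    """Weighted average of per-parameter sub-indices (each assumed to be 0-100)."""
    if sub_indices.keys() != weights.keys():
        raise ValueError("sub_indices and weights must cover the same parameters")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(sub_indices[name] * weights[name] for name in sub_indices)
    return weighted_sum / total_weight


# Hypothetical sub-index scores and relative weights, for illustration only.
sub_indices = {"dissolved_oxygen": 80.0, "ph": 90.0, "turbidity": 60.0, "tds": 70.0}
weights = {"dissolved_oxygen": 4.0, "ph": 3.0, "turbidity": 2.0, "tds": 1.0}

wqi = water_quality_index(sub_indices, weights)
print(f"WQI = {wqi:.1f}")
```

Real WQI schemes differ in how sub-indices are derived from raw measurements and in which weights they assign, so the numbers here only show the mechanics.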

WQI prediction, by contrast, uses artificial intelligence (AI) and machine learning (ML): models are trained on a variety of data such as historical raw river data, remote sensing data, sensor data, water quality parameters, seasonal river water data, and meteorological data. The trained models are evaluated for accuracy against test data in different scenarios before being deployed to predict future water quality from current conditions.

Why prediction of river water quality is critical

As mentioned earlier, population explosion has meant wide use of chemicals such as fertilizers and pesticides, which, along with vast amounts of industrial and domestic waste, get dumped into rivers. This has serious detrimental effects on river water quality. Once river water is degraded and unfit for use, it directly affects human health and every living thing that depends on it. Further, cleaning degraded freshwater bodies is a huge and time-consuming task that causes great inconvenience and deprivation to those who depend on the water.

Predicting and forecasting water quality is thus critical to monitoring the rate of contamination and the factor(s) responsible for it. Prediction buys time and provides valuable insight for implementing the required preventive measures. This can avert further contamination and make the cleaning process more effective, efficient, eco-friendly, and less time-consuming.

The traditional approach to measuring and predicting water quality is slow and often inaccurate, as it depends on an individual water expert's knowledge and expertise to analyze the huge volume of historical data collected. Given the limits of human capacity, it is difficult to predict water quality both accurately and quickly this way.

Today, technologies such as IoT, big data, and cloud make it possible to collect, store, and rapidly process large volumes of data, while artificial intelligence and machine learning give water experts and data scientists efficient methods for analyzing and predicting water quality far faster than manual analysis allows.

Methods of water quality prediction

Rivers are among the largest sources of freshwater supply. However, they have become a significant repository for sewage discharged by domestic and industrial activities and are thus highly polluted or prone to pollution. Water treatment, water quality monitoring, and water quality control are therefore necessary to ensure clean water at an affordable cost, and systematic data analysis combined with water quality prediction is the need of the hour.

Multivariate statistical techniques are used to determine the correlation between different water quality parameters, whereas machine learning models such as regression and classification algorithms, and deep learning models such as artificial neural networks (ANNs), are used to predict water quality with higher accuracy.

Some of the popular ML and deep learning models used for prediction of water quality include:

• Machine Learning Models
 • Linear regression
 • Logistic regression
 • Decision tree
 • Random forest algorithm
 • SVM algorithm
 • Naive Bayes algorithm
 • KNN algorithm
 • K-means
 • Dimensionality reduction algorithms
 • Gradient boosting algorithm
• Deep Learning Models 
 • Convolutional Neural Networks (CNNs)
 • Recurrent Neural Networks (RNNs)
 • Long Short-Term Memory Networks (LSTMs)
 • Generative Adversarial Networks (GANs)
 • Radial Basis Function Networks (RBFNs)

Depending on the type, volume, and quality of data, water quality experts and data scientists decide which model or combinations of models will best suit the purpose.

Overview of an ML Model for Water Quality Prediction

Figure 1 depicts the high-level architecture for applying a machine learning model to water quality prediction.

Figure 1: Machine learning framework for water quality prediction

The architecture consists of various elements such as data collection, data exploration, data processing, training and testing the model, model deployment, and model monitoring. 

Data collection 
All water parameters that determine the water quality index are collected from sample collection points. In addition, sensors fitted at strategic locations help gather a large volume of data. A minimum of three years of sample data is ideal for training the model. Seasonal and weather factors also need to be considered when collecting samples. Data can be collected and represented as a time series, which can then be used for prediction and forecasting.

Exploratory data analysis (EDA)
EDA is a method for analyzing the raw water dataset with statistical and graphical tools to discover trends, patterns, and anomalies in the data. It helps identify the key features (water parameters) that correlate strongly with the target output.
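One simple way to look for such key features is to compute the Pearson correlation between each parameter and the target. The sketch below is illustrative only: the sample readings, the helper name `rank_features`, and the 0.5 cut-off are assumptions, and a real EDA would also use plots and domain knowledge.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(features, target, min_abs_corr=0.5):
    """Return (name, r) pairs whose |r| meets the threshold, strongest first."""
    scored = [(name, pearson(values, target)) for name, values in features.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= min_abs_corr]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical readings from five samples, for illustration only.
features = {
    "dissolved_oxygen": [8.0, 7.0, 6.0, 5.0, 4.0],
    "turbidity": [2.0, 3.0, 5.0, 6.0, 9.0],
    "temperature": [20.0, 22.0, 20.0, 22.0, 20.0],
}
wqi = [90.0, 80.0, 70.0, 60.0, 50.0]

selected = rank_features(features, wqi)
for name, r in selected:
    print(f"{name}: r = {r:+.2f}")
```

In this made-up sample, dissolved oxygen and turbidity pass the cut-off while temperature does not. Pearson correlation only captures linear relationships, so a parameter with a weak score may still matter.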

Data preprocessing (DP)
In data preprocessing, the raw water dataset is cleansed, decoded (transformed), and normalized for use in machine learning algorithms for model training and testing purposes. DP helps in feeding quality data into the ML models and improves the efficiency of the model training process overall.  
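As a rough illustration of cleansing and normalization, the sketch below drops samples with missing readings and rescales each parameter to the 0-1 range (min-max normalization). The column names and values are hypothetical, and dropping incomplete rows is only one of several possible cleansing strategies. Imputing missing values is another.

```python
def drop_incomplete(rows):
    """Keep only rows in which every parameter has a recorded value."""
    return [row for row in rows if all(value is not None for value in row.values())]


def min_max_normalize(rows):
    """Rescale each column to the [0, 1] range; constant columns become 0.0."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    lo = {c: min(row[c] for row in rows) for c in columns}
    hi = {c: max(row[c] for row in rows) for c in columns}
    return [
        {c: (row[c] - lo[c]) / (hi[c] - lo[c]) if hi[c] > lo[c] else 0.0 for c in columns}
        for row in rows
    ]


# Hypothetical raw samples; None marks a missing sensor reading.
raw = [
    {"ph": 7.0, "turbidity": 2.0},
    {"ph": 8.0, "turbidity": None},
    {"ph": 6.0, "turbidity": 6.0},
    {"ph": 7.5, "turbidity": 4.0},
]

clean = drop_incomplete(raw)
normalized = min_max_normalize(clean)
for row in normalized:
    print(row)
```

Normalization puts parameters measured in different units on a comparable scale, which many ML algorithms need in order to train well.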

Model training
The model for predicting the water quality index is trained on the key features of the water dataset, which are finalized, cleansed, and transformed in the EDA and DP steps. These selected key features are known as "input variables," and the feature "water quality index," which actually determines the quality of the sample water, is called the "output variable" or "target variable." Supervised learning uses the following two types of models:

• Regression models: These are used for predicting the water quality index value for future dates, which is a "continuous value." The quality of the water is then determined from the predicted value according to scientific guidelines.

• Classification models: These are used for predicting water quality as a discrete class, for example, "the water will be potable, palatable, contaminated, infected, etc."
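The difference between the two outputs can be shown with a deliberately simple sketch. It assumes a straight-line trend fitted by ordinary least squares as a stand-in for a regression model, and a single hypothetical cut-off of 70 to turn the continuous prediction into a class label. The monthly WQI history is made up, and a real model would use many input variables rather than time alone.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify_wqi(wqi, potable_threshold=70.0):
    """Map a continuous WQI to a discrete label (the cut-off is hypothetical)."""
    return "potable" if wqi >= potable_threshold else "contaminated"


# Hypothetical monthly WQI readings, for illustration only.
months = [1, 2, 3, 4, 5, 6]
wqi_history = [84.0, 82.0, 80.0, 78.0, 76.0, 74.0]

slope, intercept = fit_line(months, wqi_history)
predicted_wqi = slope * 7 + intercept   # regression output: a continuous value
label = classify_wqi(predicted_wqi)     # classification output: a discrete class
print(f"month 7: WQI = {predicted_wqi:.1f}, class = {label}")
```

A dedicated classification model would learn the class boundaries from labeled data instead of applying a fixed threshold to a regression output.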

Model evaluation (testing) 
The performance of each model in predicting the water quality is evaluated to find the best model to be deployed. Regression and classification models have different evaluation methods, as the output of the regression model is a continuous value, while the classification model yields a discrete value. 

a. Regression model metrics
Three typical error metrics are used for assessing the performance of the regression ML model:
    • Mean Square Error (MSE)
    • Root Mean Square Error (RMSE)
    • Mean Absolute Error (MAE)
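
The three error metrics can be computed directly from paired actual and predicted values. The sketch below uses hypothetical WQI values purely to show the arithmetic.

```python
from math import sqrt


def regression_errors(actual, predicted):
    """Return MSE, RMSE, and MAE for paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": sqrt(mse), "mae": mae}


# Hypothetical observed and predicted WQI values, for illustration only.
actual = [72.0, 68.0, 75.0, 80.0]
predicted = [70.0, 70.0, 78.0, 80.0]

scores = regression_errors(actual, predicted)
print(scores)
```

MSE and RMSE penalize large errors more heavily than MAE does, and RMSE and MAE are expressed in the same units as the WQI itself.
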
b. Classification model metrics
Classification model performance is commonly assessed with a confusion matrix, which tallies true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Four key metrics are derived from these counts:

• Precision is defined as the ratio of true positives to the total number of predicted positives, i.e., Precision = TP/(TP+FP)

• Recall is defined as the ratio of true positives to the total number of actual positive cases, i.e., Recall = TP/(TP+FN)

• Accuracy is defined as the ratio of correct predictions to all predictions, i.e., Accuracy = (TP+TN)/(TP+TN+FP+FN)

• F1 is the harmonic mean of precision and recall, i.e., F1 = 2 * Precision * Recall / (Precision + Recall)
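These formulas can be checked on a small example. The sketch below treats "contaminated" as the positive class and uses made-up labels. The function names are illustrative, not part of any library.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN, and TN for one class treated as 'positive'."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical true and predicted labels for eight water samples.
actual = ["contaminated", "contaminated", "potable", "potable",
          "contaminated", "potable", "potable", "contaminated"]
predicted = ["contaminated", "potable", "potable", "contaminated",
             "contaminated", "potable", "potable", "contaminated"]

counts = confusion_counts(actual, predicted, positive="contaminated")
metrics = classification_metrics(*counts)
print(counts, metrics)
```

For water quality, a false negative (contaminated water predicted as potable) is usually the costlier mistake, so recall on the contaminated class deserves particular attention.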

Model deployment 
The best-performing water quality model goes through many rounds of testing and is optimized and tuned thoroughly before being ready for deployment. Models need a deployment environment with all the necessary resources and data to function optimally. Below are the most widely used deployment methods.

Web service deployment: Exposes the model output as a web service that can be integrated with web, mobile, and desktop applications.

Batch deployment: Used where real-time prediction is not a priority. It suits complex calculations and predictions and can handle a high volume of data.

Many cloud hyperscalers, such as AWS, Azure, and Google Cloud, provide readily available services to deploy the model and integrate it with various applications. This is often the most convenient way to deploy the model if you are already using a cloud service.

Model monitoring 
Constant monitoring of the model is required to evaluate its performance and accuracy over time. As new data arrives, its patterns may drift away from those the model was trained on, which can degrade model performance.
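One possible monitoring check, sketched below, compares the model's recent mean absolute error against the error recorded at deployment time and flags the model when the gap grows too large. The baseline value, the 1.5x tolerance, and the sample readings are all hypothetical. Real monitoring would typically track several metrics over rolling windows.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def needs_retraining(baseline_mae, actual, predicted, tolerance=1.5):
    """Flag the model when recent MAE exceeds `tolerance` times the baseline MAE."""
    recent_mae = mean_absolute_error(actual, predicted)
    return recent_mae > tolerance * baseline_mae, recent_mae


# Hypothetical values: error measured at deployment, then a recent batch.
baseline_mae = 2.0
recent_actual = [70.0, 65.0, 60.0, 58.0]
recent_predicted = [72.0, 70.0, 66.0, 63.0]

flag, recent_mae = needs_retraining(baseline_mae, recent_actual, recent_predicted)
print(f"recent MAE = {recent_mae:.2f}, retrain = {flag}")
```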

Toward sustainable solutions

Water challenges increasingly affect every region of the world, driven by rapid urbanization, climate change, aging infrastructure, and strained resources. At the same time, there has been a significant increase in innovative technologies that help overcome water-related challenges. Governments, water utilities, and IT and engineering service integrators should work together to build suitable solutions for problems related to water quality monitoring, distribution, leakage, water scarcity, etc.

At Cyient, we are at the forefront of applying technology imaginatively to solve problems that matter. We have a decade of engineering and IT expertise in dealing with water-related challenges faced by our customers.

We assist customers around the world by developing innovative solutions where better water management strategies, planning, and design are needed and by establishing guidelines and a roadmap to transition toward more sustainable solutions.


About the author

Pankaj Sahu spearheads the Enterprise Asset Management (EAM) practice for Electric, Gas, and Water Utility technology solutions at Cyient. With a wealth of experience spanning over 20 years, he has an established track record in implementing and consulting EAM and APM solutions for clients in the Utility, Transportation, and Energy sectors worldwide.
