What is Chemometrics?
Published on: Friday 29-09-2023
If correctly used, chemometrics works towards saving time, money, manpower and machine time, says Dr Shobha Nagappayya.
Chemometrics is the science of extracting information from chemical systems by data-driven means. Chemometrics is fundamentally interdisciplinary. It uses methods frequently employed in core data-analytic disciplines such as multivariate statistics, applied mathematics, and computer science, in order to address problems in chemistry, biochemistry, medicine, biology and chemical engineering.
In other words, chemometrics helps make sense of large amounts of uneven data, whether its variation is random or systematic, and interprets it to arrive at rational conclusions.
Chemometrics is applied to solve both descriptive and predictive problems in the experimental natural sciences, especially chemistry. In descriptive applications, properties of chemical systems such as concentrations are modelled with the intent of learning the underlying relationships and structure of the system (i.e., model understanding and identification). In predictive applications, properties of chemical systems are modelled with the intent of predicting new properties or behaviour of interest. In both cases, the datasets can be small but are often large and complex, involving a huge number of variables, cases or observations.
Why is chemometrics important?
In the chemical, pharmaceutical and biopharmaceutical industries, the main objective of scientists is to understand the process they are working on. In the active pharmaceutical ingredient (API) industry, the drive is to know the pathway, mechanism and kinetics of the reaction. This information is essential for optimising and scaling up the reaction conditions accurately and with a clear understanding of the chemical process. To achieve this, chromatographic and spectroscopic techniques are normally used in industry. These techniques produce enormous analytical data sets that need to be systematically studied and interpreted. Chemometric studies performed on these data sets help in controlling the process, avoiding batch failures and preventing adverse outcomes. When all the processes are well understood within a good time frame, companies can launch the anticipated product into the market faster.
What are the challenges?
Sampling: Sampling is a mandatory requirement during any reaction or chemical process. Samples are withdrawn from the reaction to monitor it, to identify its start and end points, to follow the consumption of reactants and the formation of products, to determine how long the reaction takes and to control it. The collected samples are then analysed with appropriate analytical tools to monitor and understand the process. Sampling always poses certain challenges. For example, sampling air- and moisture-sensitive reaction mixtures is often difficult, and toxic or fuming chemicals are extremely difficult to sample.
Reaction Monitoring: Analysis of the samples gives information on various parameters, particularly the concentrations of the components involved. This helps in further understanding the process and in controlling the parameters to achieve the desired results. In all of this, in-situ or online measurements have an immediate impact on process optimisation: there is no time lag in obtaining results, so the right decisions on process parameters can be made and implemented faster.
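To make the idea of finding a reaction end point from online measurements more concrete, here is a minimal Python sketch. It assumes the online analyser already delivers a time series of concentrations for one reactant, and it treats the reaction as finished once the concentration stops changing by more than a chosen tolerance over a few consecutive readings. The function name `find_end_point`, the tolerance, the window length and the readings are all hypothetical choices for illustration, not settings from any particular instrument or software.

```python
from typing import Optional, Sequence


def find_end_point(
    times: Sequence[float],
    concentrations: Sequence[float],
    tolerance: float = 0.5,
    window: int = 3,
) -> Optional[float]:
    """Return the time at which the concentration reaches a plateau, or None.

    Hypothetical rule: the reaction is treated as complete once `window`
    consecutive changes between readings are all smaller than `tolerance`
    (in the same units as the concentrations).
    """
    if len(times) != len(concentrations):
        raise ValueError("times and concentrations must have the same length")
    stable_steps = 0
    for i in range(1, len(concentrations)):
        if abs(concentrations[i] - concentrations[i - 1]) < tolerance:
            stable_steps += 1
            if stable_steps == window:
                # The plateau starts at the reading before the first small change.
                return times[i - window]
        else:
            stable_steps = 0
    return None  # the data never settle, so no end point is reported


# Hypothetical online readings: time in minutes and reactant concentration (g/l).
minutes = [0, 10, 20, 30, 40, 50, 60, 70]
reactant = [100.0, 60.0, 30.0, 10.0, 5.0, 4.8, 4.7, 4.7]
print(find_end_point(minutes, reactant))  # prints 40
```

Requiring several consecutive small changes, rather than just one, makes the rule less sensitive to a single noisy reading; in a real process the tolerance would be tied to the measurement's known precision.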
How does chemometrics work?
Multivariate analysis of spectroscopic data plays an important role in gaining insights into chemical and biochemical processes. Spectroscopic data analysis involves the following steps:
1. Collection of spectra
2. Modeling (creating a chemometric model), and
3. Prediction (applying a chemometric model).
Chemometric tasks can be achieved with machine learning methods: grouping samples via clustering, classifying samples using a classification method or deriving the concentration of a substance with a regression model. It is a sequence of procedures including experimental design, data processing, data learning and data interpretation. Conducting all these steps is not an easy task. It requires a considerable amount of data from natural sciences such as chemistry, biology and physics, and interpreting that data requires intensive support from the disciplines of machine learning, statistics and data management.
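As an illustration of the classification task mentioned above, the sketch below assigns a spectrum to the class whose mean (centroid) spectrum is closest to it in Euclidean distance. This nearest-centroid rule is one of the simplest classifiers and is chosen here only for clarity; the class labels and the short five-point "spectra" are invented for the example and are not real measurements.

```python
import math
from typing import Dict, List, Sequence


def class_centroids(training: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the training spectra of each class, wavelength by wavelength."""
    return {
        label: [sum(values) / len(spectra) for values in zip(*spectra)]
        for label, spectra in training.items()
    }


def classify(spectrum: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(spectrum, centroids[label]))


# Invented five-point "spectra" for two hypothetical sample classes.
training = {
    "starting material": [[0.9, 0.8, 0.1, 0.1, 0.0], [1.0, 0.7, 0.2, 0.1, 0.1]],
    "product": [[0.1, 0.2, 0.9, 1.0, 0.3], [0.0, 0.1, 0.8, 0.9, 0.4]],
}
centroids = class_centroids(training)
print(classify([0.2, 0.2, 0.7, 0.9, 0.3], centroids))  # prints product
```

Real spectra have hundreds or thousands of points and are usually preprocessed (and often compressed with principal components) before a classifier such as this is applied.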
Multivariate calibration techniques such as partial least-squares (PLS) regression or principal component regression (PCR), among countless other methods, are used to construct a mathematical model that relates the multivariate response (the spectrum) to the concentration of the analyte of interest, and such a model can then be used to efficiently predict the concentrations of new samples. In practice, the reaction is carried out and samples of the reaction mixture are collected at different time intervals. These samples are analysed using suitable analytical tools to obtain quantitative data, which serve as reference data. Calibration plots are then built from the reference data on an appropriate modelling platform using mathematical tools. Online analytical tools are used to monitor the reaction and track the concentrations of the reaction components, and the calibration plots prepared from the reference data are used for prediction and for controlling the process. The details are described in the figure below.
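Principal component regression, one of the calibration methods named above, can be sketched in a few lines of NumPy: the mean-centred spectra are compressed into a small number of principal components with a singular value decomposition, and the reference concentrations are then regressed on those component scores by least squares. This is a minimal sketch assuming NumPy is available; the function names, the number of components and the synthetic two-band spectra are illustrative assumptions, and a real calibration would also need preprocessing and validation.

```python
import numpy as np


def fit_pcr(X: np.ndarray, y: np.ndarray, n_components: int):
    """Fit a principal component regression (PCR) model.

    X holds one spectrum per row (columns are wavelengths); y holds the
    reference concentration of each sample.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                                   # mean-centre the spectra
    # The rows of Vt are the principal component loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T                    # (n_wavelengths, n_components)
    scores = Xc @ loadings                            # (n_samples, n_components)
    # Regress the centred concentrations on the component scores.
    q, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    coef = loadings @ q                               # regression vector over wavelengths
    return x_mean, y_mean, coef


def predict_pcr(model, X_new: np.ndarray) -> np.ndarray:
    """Predict concentrations for new spectra with a fitted PCR model."""
    x_mean, y_mean, coef = model
    return (X_new - x_mean) @ coef + y_mean


# Synthetic data: each spectrum mixes an analyte band with an overlapping
# interferent band (both are invented Gaussian peaks, not real spectra).
wavelengths = np.linspace(0.0, 1.0, 50)
pure = np.vstack([
    np.exp(-((wavelengths - 0.40) / 0.08) ** 2),      # analyte
    np.exp(-((wavelengths - 0.55) / 0.10) ** 2),      # interferent
])
C_train = np.array([[10, 5], [20, 2], [30, 8], [40, 4]], dtype=float)
X_train = C_train @ pure                              # 4 samples x 50 wavelengths
model = fit_pcr(X_train, C_train[:, 0], n_components=2)

C_new = np.array([[25, 6], [15, 3]], dtype=float)
print(predict_pcr(model, C_new @ pure))               # close to 25 and 15
```

Because the interferent band overlaps the analyte band, no single wavelength gives the analyte concentration directly; using the whole spectrum lets the model separate the two contributions. With real, noisy spectra the number of components is normally chosen by cross-validation rather than fixed in advance.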
How are spectroscopic methods used to predict unknown concentrations?
i. Samples are collected at different times (A, B, C, D, E, F) from the same batch, reaction or process (data collection).
ii. Offline measurements of A, B, C and D (but not E and F) are made using appropriate analytical methods such as HPLC, GC, UV or gravimetric analysis.
For example, the measurements might be as follows:
A = 10 units, B = 20 units, C = 30 units, D = 40 units
(Units may be %, g/ml, g, ml, l, mol/l, etc.)
iii. Raman, IR or NIR spectra are recorded for all the samples, A to F, choosing the technique according to the samples and their response to each spectroscopic method. The figure below shows Raman spectra of the samples.
iv. The correlation between the changes in the spectral data and the corresponding offline measurements of A, B, C and D is used to develop a prediction model. This step is called calibration.
v. The model is then applied to the spectra of E and F.
vi. The model predicts the values for E and F, as well as for any further samples. This is how unknown concentrations are predicted.
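The steps above can be sketched in Python using the reference values A = 10, B = 20, C = 30 and D = 40. To keep the arithmetic visible, the sketch reduces each spectrum to a single peak height and fits a least-squares straight line, which is the simplest (univariate) special case of calibration; the multivariate methods described earlier use the whole spectrum instead. The peak heights for A to F and the function names are invented for illustration.

```python
from typing import List, Sequence, Tuple


def fit_calibration(signal: Sequence[float], reference: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line: reference concentration = slope * signal + intercept."""
    n = len(signal)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired calibration samples")
    mean_x = sum(signal) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in signal)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(signal, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Tuple[float, float], signal: Sequence[float]) -> List[float]:
    """Apply the calibration line to new signals."""
    slope, intercept = model
    return [slope * x + intercept for x in signal]


# Steps ii to iv: offline reference values for A-D and hypothetical peak
# heights read from their spectra (the peak heights are invented).
reference_abcd = [10.0, 20.0, 30.0, 40.0]
peak_height_abcd = [0.21, 0.40, 0.62, 0.79]
model = fit_calibration(peak_height_abcd, reference_abcd)

# Steps v and vi: apply the model to the (hypothetical) spectra of E and F.
peak_height_ef = [0.50, 0.70]
predicted_ef = predict(model, peak_height_ef)
print([round(value, 1) for value in predicted_ef])  # prints [24.7, 34.9]
```

The predicted values for E and F are only as good as the reference measurements on A to D and the range they cover; predicting far outside the calibrated range is extrapolation.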
Proper design of experiments (DOE) is essential for successful measurements with any analytical tool, because well-designed studies and measurements are prerequisites for gaining clarity on all the objectives under investigation in a process. The measurement protocol is a clear statement of the aim of an investigation, the procedures for sample preparation, the instruments to be used, the sampling strategy and other aspects. This matters because the nature and quality of the data decide how well the model will eventually be able to predict the course of reactions.
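A common starting point for experimental design is a two-level full factorial design, in which every combination of a low and a high setting of each factor is run once. The sketch below generates such a design with Python's standard library; the factor names and levels are hypothetical examples, not recommendations for any particular process.

```python
from itertools import product
from typing import Dict, List, Tuple


def full_factorial(factors: Dict[str, Tuple[float, float]]) -> List[Dict[str, float]]:
    """Return every combination of the low/high levels of each factor."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[name] for name in names))]


# Hypothetical factors for a reaction study, each given as (low, high).
design = full_factorial({
    "temperature_C": (40.0, 60.0),
    "catalyst_mol_percent": (1.0, 2.0),
    "stir_rate_rpm": (200.0, 400.0),
})
for run_number, run in enumerate(design, start=1):
    print(run_number, run)
```

Three factors at two levels give 2 × 2 × 2 = 8 runs. In practice the run order is usually randomised so that drift in the process or instrument is not confounded with the factor effects.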
One common issue is poor data. Chemometrics can only reveal trends that are actually present within the data. If the experiment is not well designed, measurements are not replicated or the instrument is not tuned properly, chemometrics is bound to arrive at a wrong conclusion. Another problem is misinterpretation of the output. Several user-friendly software packages for pattern recognition and data analysis are available with commercial instrumentation. This does not, however, mean that any particular package is an automatic choice for solving any particular problem.
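One simple safeguard against poor data is to measure the same sample several times and check the spread of the replicates before using them as reference values. The sketch below computes the relative standard deviation (RSD) of replicate measurements with Python's `statistics` module; the 2 % acceptance limit and the replicate values are hypothetical and would in practice come from the method's own validation criteria.

```python
import statistics
from typing import Sequence


def relative_std_dev(replicates: Sequence[float]) -> float:
    """Sample standard deviation expressed as a percentage of the mean."""
    if len(replicates) < 2:
        raise ValueError("at least two replicates are needed")
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


def replicates_acceptable(replicates: Sequence[float], max_rsd_percent: float = 2.0) -> bool:
    """Hypothetical acceptance rule: the RSD must not exceed max_rsd_percent."""
    return relative_std_dev(replicates) <= max_rsd_percent


# Hypothetical offline results (in %) for three measurements of the same sample.
print(replicates_acceptable([20.1, 19.8, 20.3]))  # prints True
print(replicates_acceptable([20.1, 17.5, 22.9]))  # prints False
```

A reference value that fails a check like this should be re-measured rather than fed into the calibration, since the model cannot be more reliable than the data it is built on.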
Chemometrics aims to increase the efficiency of the analytical process. If correctly used, it works towards saving time, money, manpower and machine time during any chemical, physical or biological process.
Dr Shobha Nagappayya is DGM – Application Specialist (PAS) Marketing – Optical Analysis products at Endress+Hauser. A graduate in Chemistry with a postgraduate degree in Organic Chemistry from Delhi University, she earned her Ph.D. (Sci.) in Chemistry from the Institute of Chemical Technology (ICT), Mumbai.
Dr Shobha was a Post Doctoral Fellow at Dow Chemical International Pvt Ltd, Pune, during 2010-11 and, from May 2012 to December 2018, a Senior Research Scientist at Reliance Industries, before joining Mettler Toledo, India, as Application Scientist for IR and Raman Spectroscopy in January 2019. Since August 2022, Dr Shobha has been an Application Scientist for Raman Spectroscopy at Endress+Hauser, India.
Publications:
1. Extraction of Aleuritic Acid from Seedlac and Purification by Reactive Adsorption on Functionalized Polymers; Ind. Eng. Chem. Res., 2010, 49 (14), pp 6547–6553.
2. Designing of Ligands for Extraction of Cs+ using Molecular Modelling; Desalination and Water Treatment, 38, 1-3, 2012
Patents:
1. An integrated process for carboxylation and oxidation of alkyl substituted aromatic hydrocarbons
Pradip Munshi, Shobha Nagappayya, Raksh Vir Jasra, WO2015004683 A3
2. Hydrophilisation of polymers for reinforced materials
Pradip Munshi, Shobha Nagappayya, Shashikant Kamble, Raksh Vir Jasra, 1265/MUM/20