Successful implementation of data analytics requires following the right, use-case-specific methods at every stage: data ingestion, preparation, pre-processing, model construction, validation, and model tuning & audit.
Data Preparation:
- Extract & examine data structure: characterize the process data, analyze the operating regions, and identify changes in operating conditions, non-Gaussianity, linear/non-linear relationships, time-series correlations, etc.
- Sample and variable selection: driven by the kind of model/task at hand (monitoring vs. quality)
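As a rough illustration of the examination step above, the sketch below computes three simple diagnostics using only the Python standard library: skewness and excess kurtosis as crude indicators of non-Gaussianity, and a lag autocorrelation as an indicator of time-series correlation. The function names and the synthetic `signal` are assumptions made for this example, not part of any particular toolkit.

```python
import math


def central_moment(x, order):
    """Population central moment of the given order."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def skewness(x):
    """Close to 0 for symmetric data such as a Gaussian sample."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def excess_kurtosis(x):
    """Close to 0 for Gaussian data; negative for flat, positive for heavy-tailed data."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0


def autocorrelation(x, lag=1):
    """Sample autocorrelation at the given lag (values near 1 mean strong time correlation)."""
    mean = sum(x) / len(x)
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(len(x) - lag))
    return num / denom


# Hypothetical process variable: a slow oscillation, strongly correlated in time.
signal = [math.sin(0.1 * t) for t in range(200)]

print("skewness:", skewness(signal))
print("excess kurtosis:", excess_kurtosis(signal))
print("lag-1 autocorrelation:", autocorrelation(signal, lag=1))
```

Values far from zero for skewness or excess kurtosis suggest that Gaussian assumptions may not hold, and a high lag-1 autocorrelation suggests that samples should not be treated as independent.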
Data Pre-processing: improve data quality, data transformation, data scaling & normalization
- Data quality issues: inconsistencies in the data, outliers & gross errors
- Missing data: deletion of samples, missing value estimation, Bayesian inference
- Feature scale differences among variables: normalization/standardization, which matters for gradient-descent-based algorithms (linear regression, logistic regression, neural networks) and distance-based algorithms
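The pre-processing steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions: missing values are represented as `None` and filled with the mean (one simple form of missing value estimation), outliers are flagged with the modified z-score based on the median absolute deviation (one common rule among many), and scaling is done by z-score standardization. The helper names and sample readings are hypothetical.

```python
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def flag_outliers(values, threshold=3.5):
    """Flag points whose modified z-score (median/MAD based) exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)  # no spread, so nothing can be flagged
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def standardize(values):
    """Scale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical sensor readings with one missing value and one gross error.
readings = [10.0, 11.0, None, 9.0, 10.0, 12.0, 100.0]

filled = impute_mean([v for v in readings if v != 100.0])   # impute without the gross error
flags = flag_outliers([v for v in readings if v is not None])
clean = [v for v in readings if v is not None and v != 100.0]
scaled = standardize(clean)

print("imputed:", filled)
print("outlier flags:", flags)
print("standardized:", scaled)
```

Standardization puts variables with different units on a comparable scale, so that no single variable dominates a distance computation or slows the convergence of a gradient-descent fit.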
Model Selection, Training & Performance Evaluation: performed once the training data set is ready
- Construct the data model according to the data characteristics (complexity): single-model or multiple-model structure
- Apply an ML algorithm (linear regression, logistic regression, decision trees, SVM, ANN, etc.)
- Evaluate model performance with model validation methods (cross-validation, model stability analysis, model robustness analysis, parameter sensitivity analysis)
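To make the training and validation steps concrete, here is a minimal sketch of k-fold cross-validation around a single-variable least-squares linear regression, written with the standard library only. The function names, the interleaved fold assignment, and the synthetic data are assumptions chosen for illustration; a real project would more likely rely on an established ML library.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_rmse(xs, ys, k=5):
    """Return the held-out RMSE of the fitted line for each of k folds."""
    n = len(xs)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved fold assignment
        train = [(xs[i], ys[i]) for i in range(n) if i not in test_idx]
        test = [(xs[i], ys[i]) for i in sorted(test_idx)]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
        scores.append(math.sqrt(mse))
    return scores


# Synthetic data: a linear trend plus a small deterministic disturbance.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + 0.5 * math.sin(x) for x in xs]

scores = k_fold_rmse(xs, ys, k=5)
print("RMSE per fold:", scores)
print("mean RMSE:", sum(scores) / len(scores))
```

A large spread in the per-fold RMSE is one simple signal of an unstable model. For strongly time-correlated process data, interleaved folds can leak information between training and test sets, so a blocked or forward-chaining split may be more appropriate.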