Getting the Models Right is Key to Successful Analytics
ISE Magazine April 2021 Volume: 53 Number: 4
By William E. Hammer Jr.
Getting the models right starts with a comprehensive understanding of the process. A successful application of analytics requires a study of the existing process model and related systems. In most cases, the existing process model needs to be updated to consider additional variables and relationships recognized in big data sources. Whether a potential new variable, and its data, has a causal relationship to the process is best validated by having the process owners and operators guide how it fits into the process model.
Creating or updating the model to accurately reflect the process is the critical step in applying analytics. An accurate data model can then be developed and used to describe the results of the process, predict future outcomes and prescribe actions to produce desired outcomes. Generally, the process model and the data model derived from it necessarily become more comprehensive and complex when they are applied to predict an outcome, and even more so when they are used to determine an action that will cause a desired outcome.
This can be illustrated by considering the sales data for a marketing process. For the data model to describe (report or present) sales results over periods of time simply requires a model or template that aggregates sales in total and by meaningful distribution categories. To predict future sales, both the process and data models need to be expanded to integrate additional information such as demand history and industry forecasts, competitor strategy, and customer behaviour and desired features. To prescribe actions that increase sales, the process model must include potentially beneficial changes to the process along with even more relationships and information, such as current product development, projected competitor performance and scenario processing. Then the data model can incorporate the related data to prescribe an action.
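As a minimal sketch of the descriptive case only, the Python below aggregates a few hypothetical sales records in total per period and by period and region. The record layout and the names `SALES` and `describe_sales` are illustrative assumptions, not part of any particular reporting system.

```python
from collections import defaultdict

# Hypothetical sales records: (period, region, amount). Layout is illustrative only.
SALES = [
    ("2021-Q1", "East", 120.0),
    ("2021-Q1", "West", 80.0),
    ("2021-Q2", "East", 150.0),
    ("2021-Q2", "West", 95.0),
    ("2021-Q2", "East", 30.0),
]


def describe_sales(records):
    """Descriptive model: total sales per period and per (period, region)."""
    totals = defaultdict(float)
    by_region = defaultdict(float)
    for period, region, amount in records:
        totals[period] += amount
        by_region[(period, region)] += amount
    return dict(totals), dict(by_region)


if __name__ == "__main__":
    totals, by_region = describe_sales(SALES)
    for period in sorted(totals):
        print(period, totals[period])
```

Predicting or prescribing would require adding the further variables described above; this template only reports what already happened.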
Descriptive data models are generally well established by business use, such as standard financial reporting formats and quality charts. Predictive data models are derived from the analysis of data on potentially related variables that have been identified, validated and included in the process model. The relationship of some potentially predictive variables to the process may not be obvious or may not have been previously recognized by the process experts. These candidate variables may be external to the process flow and need to be evaluated for logical inclusion in the model. Predictive models provide the greatest benefit from the application of analytics and can generate a competitive advantage. Currently, few companies are committed to the development of prescriptive models because of the challenge of making them reliable.
The focus of this article is getting the predictive models right to answer business questions, make better decisions and improve results.
Getting the models right
Appropriate advice is provided in a blog, “7 Fundamental Steps to Complete a Data Analytics Project,” by Alivia Smith. She wrote: “Understanding the business or activity that your data project is part of is key to ensuring its success and the first phase of any sound data analytics project … before you even think about the data, go out and talk to the people in your organization whose processes or whose business you aim to improve with data.” (Dataiku, blog.dataiku.com, July 2019).
Getting the data model right for predicting outcomes is the goal of most organizations using analytics. This requires establishing the purpose of the data model and understanding the business process on which it is based. Creating the appropriate process model and data model is both art and science. Identifying the operative relationships among variables in the process, and the relevant data, can be accomplished through observation and with guidance from the process owners and operators. Development of the process model is iterative and involves testing and refinement. The same is true for the data model.
The process model documents the purpose of the business process and all that is known about it: mapping of work, event and data flow, operational variables, relationships among variables, strategies, people involved, and supporting systems and database(s). New and potentially useful variables and related data from big data sources should be evaluated for a logical and causal relationship, and included in the process model and database if they fit into the process mechanism. Scanning big data (transactions, social media, surveys, etc.) for new variables, and for new aspects of variables already included in the process model, is a critical step in getting the process model right.
Once the process model and supporting database are established, the work shifts to the construction of an inclusive data model. The data model is the basis for applying appropriate statistical and visual tools and algorithms to generate an accurate and reliable predictive model. There is an abundance of excellent tools and methodologies available for this task. The book Advanced Analytics Methodologies: Driving Business Value with Analytics, by Michele Chambers and Thomas Dinsmore, provides a clear and useful explanation of these techniques.
Training and testing with carefully selected data sets are critical to validating the data model and generating accurate and reliable predictions. The testing and refinement cycle produces confidence in predictive models, as recently demonstrated by the COVID-19 models.
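A minimal sketch of holdout testing follows, assuming a simple straight-line relationship between one hypothetical driver (for example, promotional spend) and sales. The line is fitted on the earlier periods, and its mean absolute error is then measured on the later periods it never saw. The data, the 25% holdout fraction and the function names are assumptions chosen for illustration. A real project would use the tools and validation schemes suited to its own data.

```python
# Hypothetical data in time order: one driver (x) and sales (y). Illustrative only.
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 16.1]


def holdout_split(xs, ys, test_fraction=0.25):
    """Keep time order: train on earlier periods, test on the later ones."""
    cut = int(len(xs) * (1 - test_fraction))
    return (xs[:cut], ys[:cut]), (xs[cut:], ys[cut:])


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_abs_error(model, xs, ys):
    """Average absolute prediction error of the fitted line on (xs, ys)."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)


(train_x, train_y), (test_x, test_y) = holdout_split(X, Y)
model = fit_line(train_x, train_y)
train_error = mean_abs_error(model, train_x, train_y)
test_error = mean_abs_error(model, test_x, test_y)
print(f"slope={model[0]:.3f} train MAE={train_error:.3f} test MAE={test_error:.3f}")
```

The split preserves time order rather than shuffling, so the test mimics predicting periods that have not happened yet. If the test error were much larger than the training error, that would be a signal to return to the process model before trusting the predictions.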
While many excellent statistical and visual tools are available to facilitate the creation of analytics models, there is a tendency to rely on them too heavily and to neglect understanding the business process and carefully identifying the inherent predictive variables. The guidance and insight of process owners and operators are key to the development of an appropriate process model, data model and predictive model.
It is important to note that an unrelated variable can appear to be predictive when its correlation with the outcome is coincidental. A famous example of this, known as the Mierscheid Law, is the correlation between the Social Democratic Party of Germany’s share of the popular vote and the size of crude steel production in West Germany.
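The synthetic Python example below shows how such coincidences arise. It does not use the actual vote-share or steel figures. Two unrelated series that both simply grow over time have a near-perfect correlation in their levels, while their period-over-period changes are uncorrelated. Comparing changes rather than levels is one common quick check. It does not establish causality, which still has to be confirmed against the process model.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def diffs(series):
    """Period-over-period changes, which remove a shared upward trend."""
    return [b - a for a, b in zip(series, series[1:])]


# Synthetic, unrelated series (NOT real vote-share or steel data): both just grow over time.
series_a = [10, 12, 13, 15, 16, 18, 19, 21, 22]
series_b = [5, 7, 9, 10, 11, 13, 15, 16, 17]

level_r = pearson(series_a, series_b)                 # high, driven only by the shared trend
change_r = pearson(diffs(series_a), diffs(series_b))  # near zero once the trend is removed
print(f"correlation of levels: {level_r:.3f}, of changes: {change_r:.3f}")
```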
The business payoff for getting the models right clearly justifies the significant effort needed. Predictions from suboptimal models can produce counterproductive or less beneficial results. The right models produce optimal predictions and better business decisions.
Getting the models right requires:
- Clearly defining the purpose of the model, e.g., the question(s) to be answered.
- Understanding the process and identifying the intrinsic predictive variables.
- Creating or updating the process model first.
- Carefully evaluating data for potential predictive variables to confirm a causal relationship.
- Discarding potential predictive variables that show high correlation but for which a causal relationship cannot be established.
- Guidance from the process experts (process owners and operators) during all stages of model development.
- Avoiding overreliance on modelling tools.
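The screening rule in the checklist above can be sketched in Python as follows. The `Candidate` record, the 0.5 correlation threshold and the variable names are hypothetical. The `causal_link_confirmed` flag stands in for the judgment of the process owners and operators, which no code can replace.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    correlation: float           # measured correlation with the outcome (-1 to 1)
    causal_link_confirmed: bool  # hypothetical flag recording the process experts' judgment


def screen(candidates, threshold=0.5):
    """Split candidates into kept, discarded (strong but unexplained) and weak."""
    keep, discard, weak = [], [], []
    for c in candidates:
        if abs(c.correlation) < threshold:
            weak.append(c.name)      # too little association to be useful
        elif c.causal_link_confirmed:
            keep.append(c.name)      # strong and explained by the process
        else:
            discard.append(c.name)   # strong but possibly coincidental
    return keep, discard, weak


if __name__ == "__main__":
    candidates = [
        Candidate("promotion_spend", 0.72, True),
        Candidate("competitor_price", -0.61, True),
        Candidate("steel_output", 0.88, False),
        Candidate("website_visits", 0.20, True),
    ]
    print(screen(candidates))
```

The variable with the highest correlation is the one discarded, because nothing in the process explains it.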
Keeping the models right
Most beneficial predictive models need enhancement as the business environment changes, as process changes occur and as additional, previously inaccessible data become available. The disruptive effect of the COVID-19 pandemic on retail business in particular, and the vast amount of additional customer data made available by the massive increase in online transactions, warrant review and update of process, data and predictive models as a regular practice.
Big data is even bigger now and provides more potential data, in terms of variables, relationships and volume.
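One way to make that review a regular practice is to track a model's recent prediction error against the error it showed at validation time, and to flag the model when the gap grows. The sketch below assumes mean absolute error as the metric and a hypothetical 25% tolerance. Both are illustrative choices, not a standard.

```python
def mean_abs_error(actuals, predictions):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


def needs_review(baseline_mae, recent_actuals, recent_predictions, tolerance=1.25):
    """Flag the model when recent error exceeds the baseline by more than the tolerance.

    baseline_mae: error measured when the model was validated (assumed known).
    tolerance: hypothetical multiplier; 1.25 means "more than 25% worse than baseline".
    """
    recent_mae = mean_abs_error(recent_actuals, recent_predictions)
    return recent_mae > tolerance * baseline_mae, recent_mae


if __name__ == "__main__":
    # Hypothetical weekly sales (actual vs. predicted) after a market disruption.
    flagged, recent = needs_review(4.0, [100, 120, 90, 130], [95, 110, 100, 115])
    print(f"recent MAE={recent:.1f}, review needed: {flagged}")
```

A flag does not indicate what changed. It only signals that the process, data and predictive models should be revisited with the process owners and operators.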