How can a Random Forest help my innovation?
Random Forests are one of many statistical regression tools Mission Field leverages in the analytical process of projecting your innovation’s success potential. It is one of four key machine learning models applied within our advanced metrics package on transactional tests of innovations, and we believe it is a key anchor for turning these test-and-learn experiences into future launch successes. To understand why we use Random Forests, it helps to know what one is and how it works. We want to give credit to IBM for having one of the most comprehensive descriptions of Random Forests (https://www.ibm.com/cloud/learn/random-forest). We have adapted and lightly edited their online article for brevity, business-management applicability, and clarity on how it applies to Mission Field’s transactional testing & our Burst model:
What is a Random Forest?
A Random Forest is a machine learning algorithm that combines the output of multiple decision trees to reach a single result. Its ease of use and flexibility have fueled its adoption, as it handles both classification and regression problems.
Decision Trees start with a basic question, such as, “Should I surf?” From there, you can ask a series of questions to determine an answer, such as, “Is it a long-period swell?”, “Is the wind blowing offshore?”, or “Can I prioritize it over other activities I’d also like to do?” These questions, and many more like them, make up the decision nodes in the tree, acting as a means to split the data. You can think of it as something out of the “Choose Your Own Adventure” books: each chain of nodes leads, question by question, to an eventual business decision outcome. When many decision trees (often hundreds or thousands, each built slightly differently) form an ensemble in the random forest algorithm, they are able to predict very accurate results.
How it works
Random forest algorithms have three main hyperparameters: node size, the number of trees, and the number of features sampled at each split. From there, the random forest classifier can be used to solve regression or classification problems.
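As a sketch of what two of those hyperparameters control, the snippet below prepares the inputs for each tree: a bootstrap sample of the rows and a random subset of the features. The hyperparameter values and feature names are illustrative assumptions, not Mission Field’s actual configuration:

```python
import random

def make_tree_inputs(data, features, n_features, rng):
    """Prepare the training inputs for one tree in the forest.

    Each tree sees a bootstrap sample (rows drawn WITH replacement) and,
    when splitting, considers only a random subset of the features.
    """
    bootstrap = [rng.choice(data) for _ in data]
    split_features = rng.sample(features, n_features)
    return bootstrap, split_features

# Illustrative hyperparameter values
N_TREES = 5        # number of trees in the forest
MIN_NODE_SIZE = 2  # node size: stop splitting once a node holds this few rows
N_FEATURES = 2     # features considered at each split ("feature bagging")

# Hypothetical feature names for an innovation's competitive comparables
features = ["dollar_sales", "unit_sales", "price_value", "category_rank"]
rows = list(range(100))  # stand-in for 100 rows of in-market data

rng = random.Random(0)
forest_inputs = [make_tree_inputs(rows, features, N_FEATURES, rng)
                 for _ in range(N_TREES)]
```

Because every tree trains on a different random sample of rows and features, the trees end up decorrelated — which is exactly what makes averaging them effective. (Node size governs how deep each tree grows and is not exercised in this data-prep sketch.)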
IBM’s article more clearly defines the nuances of how this all comes together, but know that in the end, the individual Decision Trees’ predictions can be averaged and cross-validated to prove out a model of statistical likelihoods and possibilities. In business terms, this means that all of the factors comparing your innovation to its competitive set ($ sales, unit sales, price/value, growth curve, rank in category, etc.), drawn from real in-market data, help to create a model of the probabilities of how well it will perform at launch. This is where the power of Random Forests comes to life!
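In the simplest terms, a forest’s regression output is just the average of its trees’ individual predictions. The per-tree forecasts below are made-up numbers purely for illustration:

```python
from statistics import mean

# Hypothetical per-tree forecasts of first-year unit sales (in thousands).
# Each tree saw a different bootstrap sample, so the forecasts differ.
tree_predictions = [118, 132, 125, 141, 120, 128]

# For regression, the random forest's prediction is the trees' average,
# which smooths out the quirks of any single tree.
forest_prediction = mean(tree_predictions)
print(round(forest_prediction, 1))  # 127.3
```

No single tree is trusted on its own; the averaged estimate carries far less variance than any individual forecast, which is what makes the probability model dependable.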
Benefits of random forest analysis
There are a number of key advantages that the random forest algorithm presents when used for classification or regression problems. Some of them include:
Key Benefits
Reduced risk of overfitting and forcing a false positive: standard Decision Tree analysis runs the risk of overfitting, as it tends to fit all the samples within the training data too tightly (in other words, a human-devised tree model is only as good as the parameters set up to analyze it). However, when there is a robust number of decision trees in a random forest, the classifier won’t overfit the model, since averaging uncorrelated trees lowers the overall variance and prediction error.
Provides flexibility: since the random forest algorithm can handle both regression and classification tasks with a high degree of accuracy, it is a popular method among data scientists. Feature bagging also makes the random forest classifier an effective tool for estimating missing values, as it maintains accuracy when a portion of the data is missing.
Easy to determine feature importance: Random forest makes it easy to evaluate each variable’s importance, or contribution, to the model. There are a few ways to evaluate feature importance. Gini importance and mean decrease in impurity (MDI) are usually used to measure how much the model’s accuracy decreases when a given variable is excluded. However, permutation importance, also known as mean decrease in accuracy (MDA), is another important measure. MDA identifies the average decrease in accuracy from randomly permuting a feature’s values in the out-of-bag (OOB) samples.
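A minimal sketch of the permutation-importance (MDA) idea: shuffle one feature’s column and measure how far accuracy drops. The toy model and data are invented for illustration, and for simplicity this sketch scores on the same toy rows rather than true out-of-bag samples:

```python
import random

def permutation_importance(model, rows, labels, feature_idx, rng, n_repeats=10):
    """MDA sketch: average accuracy drop after shuffling one feature column."""
    def acc(rs):
        return sum(model(r) == y for r, y in zip(rs, labels)) / len(labels)

    baseline = acc(rows)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature_idx] for r in rows]
        rng.shuffle(column)  # break the feature's link to the labels
        shuffled = [list(r) for r in rows]
        for r, v in zip(shuffled, column):
            r[feature_idx] = v
        drops.append(baseline - acc(shuffled))
    return sum(drops) / n_repeats

# Toy "model": predicts 1 when feature 0 exceeds 5, and ignores feature 1
model = lambda row: int(row[0] > 5)
rows = [[1, 9], [2, 8], [7, 1], [9, 0]]
labels = [0, 0, 1, 1]

rng = random.Random(42)
imp_feature_0 = permutation_importance(model, rows, labels, 0, rng)
imp_feature_1 = permutation_importance(model, rows, labels, 1, rng)
# Feature 0 drives the predictions, so shuffling it hurts accuracy;
# feature 1 is ignored by the model, so its importance is 0.0.
print(imp_feature_0, imp_feature_1)
```

In business terms, this is how the model reveals which comparables ($ sales, price/value, category rank, and so on) actually drive the launch forecast, rather than merely correlating with it.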
Random Forest applications & results
The Random Forest algorithm has been applied across a number of industries - including finance, health care, and e-commerce - helping them make better business decisions. We believe in its power to help predict the future success of a new innovation’s launch once it goes through our transactional testing model, primarily because we can feed our Machine Learning tools millions of data points based on current & comparative sales behavior, historical sales behavior, and the past experience of over 78 prior innovations tested in-market. We believe that this advanced analytical tool, applied within our testing model, should eventually help raise the success rates of new innovations and help our clients make better go/no-go business decisions for more profitable launches. Feel free to contact us to learn more about how we can help you achieve better launch success using this analytical tool in our Burst testing model.