February 19, 2017

Getting over the fear of using big data and predictive modeling

Over the last year, I have noticed some fear of using data models. George E. P. Box observed that “Essentially, all models are wrong, but some are useful.” In that spirit, we should take a directional approach when we start modeling data: aim for a useful sense of direction rather than perfect accuracy. Without knowing it, most of us already use predictive modeling every day when we glance at our car’s fuel range predictor.

Taking a step back, most of us already understand data models at an intuitive level. When we first fill up the car and it shows a range of 415 miles, we accept that the number is not precise. Yet when we look at the dashboard mid-tank and see 271 miles left, we understand that this reading is more accurate because it reflects our current driving behavior. As the remaining range gets closer to zero, the prediction gets more precise. Unless traffic comes to a complete standstill, the fuel range prediction is usually close to the mark.

The same applies to data modeling. Initially, we get a sense of the direction the data is taking. At the midpoint, we can refine variables and work toward stronger statistical significance. There is nothing to fear in data modeling. You have to start somewhere directional and accept that it’s an iterative process.
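
For readers who like to see the idea in code, here is a minimal Python sketch of the fuel range analogy. It is not how any real car computes its range; the function name, the tank size, the starting miles-per-gallon figure, and the weighting scheme are all assumptions made up for illustration. It starts from a rough prior guess and leans more on observed driving as more fuel is burned, which is the iterative refinement described above.

```python
def estimate_range(fuel_left_gal, prior_mpg, miles_driven, fuel_used_gal,
                   prior_weight_gal=2.0):
    """Estimate remaining range in miles (illustrative sketch only).

    The prior guess is treated as if it were `prior_weight_gal` gallons
    of past driving at `prior_mpg`. Observed driving is added on top, so
    the blended estimate moves toward the observed mpg as fuel is used.
    """
    prior_miles = prior_mpg * prior_weight_gal
    blended_mpg = (prior_miles + miles_driven) / (prior_weight_gal + fuel_used_gal)
    return fuel_left_gal * blended_mpg


# Hypothetical trip: a 16.6 gallon tank and a starting guess of 25 mpg.
TANK_GAL = 16.6
PRIOR_MPG = 25.0

# (miles driven so far, gallons used so far) at three points in the tank.
checkpoints = [(0.0, 0.0), (160.0, 6.0), (480.0, 15.6)]

for miles, used in checkpoints:
    remaining = estimate_range(TANK_GAL - used, PRIOR_MPG, miles, used)
    print(f"used {used:4.1f} gal -> about {remaining:5.1f} miles left")
```

At the start, the estimate rests entirely on the prior guess. By the later checkpoints, the observed driving dominates, so the estimate reflects actual behavior rather than the initial assumption.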