I was talking shop the other day with a colleague who also runs a big data analytics firm. One of the things he mentioned briefly was econometric modeling vs. machine learning. I don’t know if it’s applicable or substantive enough for our potential audience, but it may be.
Essentially he said econometrics is great but not of much interest in his world because its focus is on WHY things happen; it is “explanatory” in nature. His attention is focused more on machine learning because it is “predictive” in nature. He and his customers aren’t too concerned about the “why”; they are more interested in knowing where things are going next and, given enough time, in figuring out how to address that before it happens.
Frankly, this is not a new debate. The difference between computational statistics and statistical computing is just one more analogy to the debate above. Prior to the current Big Data explosion, statistics and computer science operated in well-defined silos at both universities and organizations. Now the two fields are converging, because both are needed to explain why customers act in a particular way and to forecast what they will want next.
Enter the twin paradigms of econometric modeling and machine learning. At first they seem to have similarities as well as differences. Some techniques, like regression modeling, are taught in both courses. Yet they are different by definition. Econometric models are statistical models used in econometrics. An econometric model specifies the statistical relationship that is believed to hold between the various economic quantities pertaining to a particular economic phenomenon under study.
Machine learning, on the other hand, is a scientific discipline that explores the construction and study of algorithms that can learn from data. So that makes a clear distinction, right? If it learns on its own from data, it is machine learning. If it is used for economic phenomena, it is an econometric model. However, the confusion arises in the way these two paradigms are championed. The computer science major will tend to say machine learning and the statistics major will tend to emphasize modeling. Since computer science majors now rule at Facebook, Google and almost every technology company, you would think that machine learning is dominating the field and beating poor old econometric modeling.
But what if you can make econometric models learn from data?
Let’s dig more into these algorithms. The way machine learning works is to optimize some particular quantity, say cost. A loss function, or cost function, maps the values of one or more variables onto a single number that intuitively represents some “cost” associated with an event. An optimization problem seeks to minimize a loss function. Machine learning frequently uses optimization to pick the best of many alternatives.
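To make the idea of minimizing a loss function concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the function names mse_loss and fit_by_gradient_descent, the learning rate, the step count and the toy data are all made up for this example, and gradient descent is just one of many optimization methods a machine learning library might use.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the data (the 'cost')."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def fit_by_gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize mse_loss by repeatedly stepping against its gradient.

    lr and steps are illustrative values chosen for the toy data below.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_by_gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}, loss = {mse_loss(w, b, xs, ys):.2e}")
```

The algorithm never sees the rule that generated the data; it only keeps nudging w and b in whichever direction lowers the cost, which is the sense in which it “learns from data.”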
Now, cost or loss goes by a different name in econometric modeling. There we are trying to minimize the error, commonly measured as root mean squared error (RMSE). RMSE is the square root of the mean of the squared errors. An error is defined as the difference between the actual value and the value the model predicts for previous data.
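Here is a small sketch of that error measure, again in plain Python. The helper names rmse and ols_fit and the five data points are hypothetical, invented for illustration; ols_fit uses the textbook closed-form formulas for a one-variable least-squares regression, which is the line that minimizes the squared error (and therefore the RMSE) on the data it was fitted to.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: sqrt of the mean of squared (actual - predicted)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def ols_fit(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up historical data for this sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = ols_fit(xs, ys)
fitted = [intercept + slope * x for x in xs]
fitted_rmse = rmse(ys, fitted)

# Any other line, such as this slightly steeper one, has a larger RMSE.
other_rmse = rmse(ys, [intercept + (slope + 0.1) * x for x in xs])
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"RMSE of fitted line = {fitted_rmse:.4f}, of steeper line = {other_rmse:.4f}")
```

Notice that this is the same squared-error quantity a machine learning practitioner would call a loss function; only the vocabulary and the way the minimum is found differ.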
The difference in the jargon comes mainly from the way statisticians and computer scientists are trained. Computer scientists try to account for both the actual error and the computational cost, that is, the time taken to run a particular algorithm. Statisticians, on the other hand, are trained primarily to think in terms of confidence levels, or of error as the gap between predicted and actual values, with less concern for the time the model takes to run.
That is why data science is often defined as an intersection between hacking skills (from computer science) and statistical knowledge (and math). Something like K-means clustering can be taught in two different ways, just as regression can, based on these two approaches. I wrote back to my colleague in Marketing: we have data scientists. They are trained in both econometric modeling and machine learning. I looked back and had a beer. If university professors don’t shed their departmental attitudes towards data science, we will very shortly have a very confused set of students arguing without knowing how close they actually are.
Hope you enjoyed reading this blog.