• vlad @ 312analytics.com

What is Machine Learning, Big Data Modeling

Big Data is the hottest topic in tech news. Everyone talks about what Big Data is, but rarely does anyone show how data mining algorithms actually work. This post shows how to build a simple decision tree model on stock data, and how to read it.

The Data
Data was downloaded from a free online source for the John Deere stock ticker DE, from August 3, 2012 to May 3, 2012. The data includes DE Open, DE High, DE Low, DE Volume, and other market indicators such as Dow Jones, NYSE, Shanghai, Nasdaq, Greek, Euro VKG, and Price of Corn, plus the target attribute Buy DE (Yes/No). Buy DE is Yes when DE Close Price > DE Open Price, and No otherwise.
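As a minimal sketch of how the Buy DE label is derived, here it is in Python. The function name and the sample prices are hypothetical and are not real DE quotes. A day where the close equals the open is labeled No, since the rule requires Close > Open.

```python
def buy_label(open_price: float, close_price: float) -> str:
    """Return "Yes" when the stock closed above its open, otherwise "No"."""
    return "Yes" if close_price > open_price else "No"


# Hypothetical (open, close) pairs for illustration only.
sample_days = [(75.10, 76.25), (76.30, 75.90), (74.00, 74.00)]

labels = [buy_label(open_price, close_price) for open_price, close_price in sample_days]
print(labels)  # ['Yes', 'No', 'No']
```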

DE Data

Machine Learning J48 Algorithm

The machine learning tool evaluates every attribute against the Buy DE Yes/No goal, then outputs rules showing which attributes lead to a Buy Yes or a Buy No. The algorithm used here is J48, Weka's implementation of the C4.5 algorithm, which produces a decision tree: a set of rules describing how the data relates to the Buy Yes or Buy No decision. The model was 72% accurate, and was a valid model at the time.
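To give a feel for how a C4.5-style algorithm decides which attribute to test first, here is a rough Python sketch of the gain ratio criterion it uses to rank candidate splits. This is not the J48 code itself and not the model from this post. The rows, attribute names, and function names are all made up for illustration, and real C4.5 also handles numeric attributes, missing values, and pruning.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target="buy"):
    """Information gain of splitting on `attribute`, divided by split information."""
    labels = [row[target] for row in rows]
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy rows: here "vkg" perfectly predicts "buy", "corn" does not.
rows = [
    {"vkg": "green", "corn": "green", "buy": "yes"},
    {"vkg": "green", "corn": "red", "buy": "yes"},
    {"vkg": "red", "corn": "green", "buy": "no"},
    {"vkg": "red", "corn": "red", "buy": "no"},
]

best = max(["vkg", "corn"], key=lambda name: gain_ratio(rows, name))
print(best)  # vkg
```

The attribute with the highest gain ratio becomes the root of the tree, and the same procedure repeats on each branch.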

decision tree

How To Read The Output
Going Left: if the Euro VKG ticker is Green, the next ticker to look at is the DJIA. If the Dow is Green, it is a Buy DE decision. If the Dow is Red, the next ticker to look at is GREK: if it is Green, do not buy DE; if it is Red, buy DE.

Going Right: if Euro VKG is Red, look at the NASDAQ. If the NASDAQ is Green, look at the price of Corn: if Corn is Green, it is a don’t-buy decision; if Corn is Red, buy DE. If the NASDAQ is Red, it is a no-buy decision.
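The two branches above can be written out as nested if statements, which is all a decision tree really is. The sketch below assumes that "Green" means the ticker is up and "Red" means it is down. The function and parameter names are my own, and each flag is True when that ticker is Green.

```python
def buy_de(vkg_green: bool, djia_green: bool, grek_green: bool,
           nasdaq_green: bool, corn_green: bool) -> bool:
    """Follow the tree described above; True means Buy DE, False means don't buy."""
    if vkg_green:
        # Going Left: Euro VKG is Green, so check the Dow next.
        if djia_green:
            return True
        # Dow is Red: buy only if GREK is Red.
        return not grek_green
    # Going Right: Euro VKG is Red, so check the NASDAQ next.
    if nasdaq_green:
        # NASDAQ is Green: buy only if Corn is Red.
        return not corn_green
    # NASDAQ is Red: no buy.
    return False


# Euro VKG Green and Dow Green: buy, whatever the other tickers do.
print(buy_de(True, True, False, False, False))   # True
# Euro VKG Red, NASDAQ Green, Corn Green: don't buy.
print(buy_de(False, False, False, True, True))   # False
```

Note that each decision only ever looks at two or three tickers, depending on which path through the tree it takes.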

The decision tree rules are straightforward and easy to follow. However, this model should not be used today: the data is outdated, and there are not enough attributes for me to feel secure in actually purchasing the stock. Far more factors affect the price of DE stock than those mined by this algorithm. At that time, the European markets had the highest impact on DE stock. If we were to run this algorithm today, we might find completely different factors that impact the stock price.

Models are created based on the type of data you feed into the software. It is up to the data scientist to find the right algorithm or algorithms to mine the data. In the real world, models that are more than two-thirds accurate cannot be ignored. Finding the right model can take some time, and it becomes even more complex with big data.
