Big Data is here. Before looking at the differences between Big Data and Small Data, it is important to understand the data mining process. This will show you the steps necessary to arrive at valid data insights.
Data Mining Process
Data mining is a time-consuming task. You not only have to clean the data and make it ready for analysis, you also have to create a valid model from it. The model then has to be verified by running a different model and achieving the same results, to show that you did not arrive at your insights by chance. Only then can you present the insights and outcomes of your data.
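The steps above (clean, model, verify with a different model) can be sketched in a few lines of Python. This is only a minimal illustration, not a full data mining workflow: the records, the `hours_studied` and `passed` fields, and the two toy models (a one-split threshold rule and a nearest-class-mean rule) are all made up for the example.

```python
from statistics import mean

# Hypothetical raw records: (hours_studied, passed). None marks a missing value.
raw = [
    (1.0, 0), (2.0, 0), (None, 1), (3.0, 0), (6.0, 1),
    (7.0, 1), (8.0, 1), (2.5, 0), (7.5, None), (6.5, 1),
]

def clean(records):
    """Step 1: drop records that have a missing field."""
    return [(x, y) for x, y in records if x is not None and y is not None]

def fit_threshold(data):
    """Step 2: model A, a one-split rule that predicts 1 when x > threshold."""
    xs = sorted({x for x, _ in data})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def accuracy(t):
        return mean(1 if (x > t) == bool(y) else 0 for x, y in data)

    best = max(candidates, key=accuracy)
    return lambda x: int(x > best)

def fit_nearest_mean(data):
    """Step 3: model B, a different method that picks the class with the closest mean."""
    m0 = mean(x for x, y in data if y == 0)
    m1 = mean(x for x, y in data if y == 1)
    return lambda x: int(abs(x - m1) < abs(x - m0))

data = clean(raw)
model_a = fit_threshold(data)
model_b = fit_nearest_mean(data)

# Verification: the share of cleaned records on which both models agree.
agreement = mean(1 if model_a(x) == model_b(x) else 0 for x, _ in data)
print(f"cleaned rows: {len(data)}, agreement: {agreement:.2f}")
```

If the two models disagree on a large share of the records, that is a hint that the insight may depend on the modeling method rather than on the data.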
Small Data is collected with an intended purpose for analysis. It is a sample, of a size determined by the data scientist, that is collected to answer the problem at hand. With Small Data, the data scientist has control of the data. It is ready and conditioned for analysis once it is collected.
Big Data does not have an intended purpose other than data mining. For this reason, the data takes a long time to clean and to be processed by the machine learning algorithms. The data scientist lets the machine do all the work of finding relationships in the data, then uses different algorithms to verify the findings.
Time, data complexity, and cleaning processes are the main differences between Big Data and Small Data. Here is an example of a decision tree machine learning model built with small data. I will be writing about ways to apply machine learning to Big Data on this blog in the near future.
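For readers who want to try a decision tree on small data themselves, here is a rough sketch in plain Python. It is not the model from this post: the eight rows, the `age` and `income` features, and the `build` and `predict` helpers are all hypothetical, and the tree simply picks the split with the lowest weighted Gini impurity.

```python
from collections import Counter

# Hypothetical small dataset: (age, income_in_thousands) -> bought ("yes"/"no").
rows = [
    ((22, 25), "no"), ((25, 32), "no"), ((47, 60), "yes"), ((52, 85), "yes"),
    ((46, 30), "no"), ((56, 90), "yes"), ((30, 70), "yes"), ((28, 40), "no"),
]

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(data):
    """Return (weighted_gini, feature_index, threshold), or None if no split exists."""
    best = None
    for f in range(len(data[0][0])):
        values = sorted({x[f] for x, _ in data})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between two neighboring values
            left = [y for x, y in data if x[f] <= t]
            right = [y for x, y in data if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build(data, depth=0, max_depth=2):
    """Grow the tree recursively; a leaf is the majority label of its rows."""
    labels = [y for _, y in data]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(data)
    if split is None:
        return majority
    _, f, t = split
    left = [(x, y) for x, y in data if x[f] <= t]
    right = [(x, y) for x, y in data if x[f] > t]
    return (f, t, build(left, depth + 1, max_depth), build(right, depth + 1, max_depth))

def predict(node, x):
    """Walk from the root down to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

tree = build(rows)
print(tree)
print(predict(tree, (40, 75)))
```

Because the data was collected with the question in mind, there is no cleaning step here, and a single split on income is enough to separate the two classes in this toy set.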