Very informative piece about the future of statistics in an age of data analytics. I had never considered that statistics and probability were separate subjects until just a couple of hundred years ago. As the author mentions, this can be seen easily in the etymology: “statistics” comes from “state,” i.e., the state’s measurements.
Will big data cause a revolution in how these fields are put to use in quantitative work?
The author dismisses the decline of sample-based quantitative methods too quickly. Statistical inference from a sample of a population becomes less impressive as data grows more plentiful: when n runs into the millions, standard errors shrink toward zero and nearly any effect registers as statistically significant, so the test itself tells us little. With terabytes of data points, few researchers will ever have too little data for a robust sample.
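To make that concrete, here is a minimal sketch (my own illustration, not the author's; it assumes NumPy and SciPy and uses synthetic normal data) of how a practically meaningless difference becomes “significant” once n is large enough:

```python
# Why classical significance testing loses its bite at scale: with millions
# of observations, even a negligible effect tests as "significant".
# (Illustrative sketch on synthetic data, not the author's example.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

for n in (100, 10_000, 10_000_000):
    # Two groups whose true means differ by a practically meaningless 0.01.
    a = rng.normal(loc=0.00, scale=1.0, size=n)
    b = rng.normal(loc=0.01, scale=1.0, size=n)
    t, p = stats.ttest_ind(a, b)
    print(f"n={n:>12,}  p-value={p:.3g}")

# Typical output: p is large at n=100 but effectively 0 at n=10,000,000,
# even though the effect size never changed.
```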
With the whole population’s data readily available, quantitative analysis relies less on formal inference testing and more on teasing out relationships. The power of big data is its ability to point us toward the ways certain variables correlate. Some data tools are built on classical statistical assumptions, but many are not, because the small-sample constraints that motivated classical statistics no longer apply.
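As one illustration of that correlation-first workflow, the sketch below (again my own, assuming pandas and a synthetic table) scans all pairwise correlations in a wide data set and surfaces the strongest candidates:

```python
# A sketch of the "teasing out relationships" workflow (my illustration,
# not the author's): scan every pairwise correlation in a wide table and
# surface the strongest candidates. Assumes pandas and synthetic data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n, k = 100_000, 20
df = pd.DataFrame(rng.normal(size=(n, k)),
                  columns=[f"x{i}" for i in range(k)])
df["x1"] = 0.7 * df["x0"] + 0.3 * rng.normal(size=n)  # plant one real link

corr = df.corr()
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)  # each pair once
pairs = corr.where(upper).stack().dropna()             # long format
print(pairs.sort_values(key=np.abs, ascending=False).head(5))
```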
While statistics is still important to understanding data, the mindset for big data is data mining. This incorporates fields such as visualization, database management, and the social sciences to draw patterns and relationships out of large data sets.
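For a taste of that pattern-first mindset, here is a hedged sketch (assuming scikit-learn; the data and group structure are invented for illustration) in which the algorithm proposes structure rather than testing a pre-specified hypothesis:

```python
# A minimal data-mining sketch (my illustration, assuming scikit-learn):
# instead of testing a pre-specified hypothesis, let the algorithm propose
# structure: here, k-means finds groupings in unlabeled data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Synthetic "population" with three latent groups we pretend not to know.
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(50_000, 2))
                  for c in ((0, 0), (4, 0), (2, 3))])

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
print(model.cluster_centers_)  # recovered group centers, no hypothesis needed
```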
This is not the first revolution in data science, but it will change how probability and statistics are used.