Apr 10, 2016

Big data analytics overview

The paper "Beyond the hype: Big data concepts, methods, and analytics" (2015, International Journal of Information Management, Vol. 35, No. 2, pp. 137–144) reviews definitions of big data and the analytics techniques applied to it, and discusses some future developments. The article begins with a chart showing an explosion of publications in the ProQuest database, which is quite similar to the chart in our JASIST publication "Big data, bigger dilemmas". Both charts show that 2013 was the year when the term "big data" gained popularity:
[Chart: publications per year, from "Beyond the hype ..."]
[Chart: publications per year, from "Big data, bigger dilemmas..."]
To describe the origin of the term "big data", the article cites Diebold's "A personal perspective on the origin(s) and development of “big data”: The phenomenon, the term, and the discipline":
"... the term “big data” … probably originated in lunch-table conversations at Silicon Graphics Inc. (SGI) in the mid-1990s, in which John Mashey figured prominently".
After summarizing aspects of big data that have been discussed many times elsewhere (volume, velocity, variety, veracity, etc.), the article provides a useful summary of the types of analytics that are common in big data research:
  1. Text analytics
    • Information extraction
      • Entity recognition
      • Relation extraction
  2. Text summarization
    • Extractive (location and frequency of text units)
    • Abstractive (semantic information)
  3. Question answering
  4. Audio (speech) analytics
    • Transcript-based approach (large-vocabulary continuous speech recognition, LVCSR)
    • Phonetic-based approach
  5. Video analytics
  6. Social media analytics
    • Content-based analytics
    • Structure-based analytics
      • Community detection
      • Social influence analysis
      • Link prediction
  7. Predictive analytics
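To make one item from this list concrete, here is a minimal sketch of extractive summarization based on the frequency of text units. It is only an illustration, not the method described in the paper: it ignores the location of sentences, and the stopword list and the function names (`split_sentences`, `tokenize`, `summarize`) are made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "that", "are", "on", "for"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase words in a sentence, with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, n=1):
    """Return the n highest-scoring sentences, in their original order.

    A sentence's score is the average document-wide frequency of its words.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]


if __name__ == "__main__":
    sample = "Big data is big. Cats sleep. Big data analytics uses big data."
    print(summarize(sample, n=1))
```

An abstractive summarizer, by contrast, would generate new sentences from a semantic representation of the text rather than select existing ones.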
In conclusion, the paper argues for new techniques that address issues such as the irrelevance of statistical significance, heterogeneity, and computational efficiency in big data.
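One common reading of the point about statistical significance is that, with very large samples, even a negligible effect produces a tiny p-value. The sketch below illustrates that reading under stated assumptions: a two-sample z-test with a known standard deviation and equal group sizes. The function name and the numbers are hypothetical and do not come from the paper.

```python
import math


def two_sided_p_value(effect, sd, n):
    """Two-sided p-value of a two-sample z-test with n observations per group.

    effect: difference between the two group means
    sd:     standard deviation, assumed known and equal in both groups
    """
    standard_error = sd * math.sqrt(2.0 / n)
    z = effect / standard_error
    # For a standard normal Z, P(|Z| > z) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # The same practically negligible effect (1% of a standard deviation)
    # goes from clearly non-significant to overwhelmingly "significant"
    # as the sample grows.
    for n in (100, 10_000, 1_000_000):
        print(n, two_sided_p_value(effect=0.01, sd=1.0, n=n))
```

The effect size is identical in all three runs. Only the sample size changes, which is why a significance test alone says little about practical importance at this scale.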