Jonathan Sedar

Personal website and new home of The Sampler blog

Data science has become a well-established discipline. What is it?

Posted at — 11 Feb 2015

Editor’s note: this post is from The Sampler archives in 2015 - indeed it was the first post there. During the last 4 years a lot has changed, not least that now most companies in most sectors have contracted / employed data scientists, and built / bought / run systems and projects. Most professionals will at least have an opinion now on data science, occasionally earned.

Let’s see how well this post has aged - feel free to comment below - and I might write a followup at some point.

The term ‘data science’ has been around for several years with many explanations, discussions and breathless over-excitement in the technology and business press. What is it, where did it come from, and who’s using it today?

Let’s consider in turn: The Discipline, The Practitioner, The Application, and The Lead into Insurance.

The Discipline

The term ‘data science’ is a useful shortcut to describe the recent confluence and evolution of several previously distinct disciplines [1], made possible by an increasing availability of data and sophistication of high quality open source software, decreasing costs of hardware and data processing, intense academic research and massive commercial and industrial interests.

Data science as its own discipline is wide ranging and rapidly evolving, with a general theme of letting humans understand more about a situation, predict real-world actions and identify patterns in data.

It involves descriptive analysis, statistical modelling, iterative experimentation, agile systems development and high quality communication.

I would suggest the 4 main aspects are:

“The ability to take data — to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it — that’s going to be a hugely important skill in the next decades.” - Hal Varian, Chief Economist, Google.

The Practitioner

The data scientist is best brought into the core of the business, to work closely with technical, operational and leadership teams to help improve decision making and critical business functions.

As an individual, or more likely a small team, they will be skilled in rapid and powerful software engineering, know advanced statistics, communicate effectively and have deep subject-matter expertise.

It’s potentially a very varied, highly skilled role, and the remit of a data scientist may cover, for example:

The famous ‘data science venn diagram’ by Drew Conway of DataKind, IA Ventures, Alluvium and more, is a lighthearted but surprisingly accurate summary of skills required and regularly employed by a data scientist during the course of their work.

In discussion several years later, Drew reflected that the diagram is still relevant and highlighted the additional importance of strong communication [2].

“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” - Josh Wills, Director of Data Science, Cloudera.

The Application

The first industry to really make use of (and thus help define) data science has been the internet-oriented technology sector. They made key progress in:

+ search and advertising (Google, Facebook)
+ communications (Twitter, Skype)
+ entertainment (Netflix, YouTube)
+ consumer retail (Amazon, eBay)

These companies and others have improved the state of the art in recommendation engines, natural language processing, data compression, game-theoretic auctions, massive-scale psychological experiments, human-computer interaction, campaign analysis, user profiling and more.

Naturally, statistical modelling and data analysis are critical, core capabilities for the typically more conservative pharmaceutical, telecoms and financial sectors too. Companies in these sectors have conducted drug discovery, network optimization and predictive modelling for a long, long time, and to assume they aren’t familiar with statistical data analysis would be foolish.

However, the sheer abundance of new technologies, tools and techniques available today should not be underestimated. It makes possible all sorts of high-value analysis and modelling that simply weren’t practical in years past.

“Today’s advanced analytics in insurance pushes far beyond the boundaries of traditional actuarial science… While the impetus to invest in analytics has never been greater for insurance companies, the challenges of capturing business value should not be underestimated.” - McKinsey & Co., Unleashing the Value of Advanced Analytics in Insurance.

The Lead into Insurance

The insurance sector in particular is all about risk modelling and data analysis, and is a natural fit for a data science approach.

Now is the time for insurance companies to take advantage of the past five years of rapid development in the data science discipline. It’s time to make wide-ranging improvements throughout their businesses, to:

We’ll write specifically about the opportunities for the insurance industry to make best use of data science in future posts.

  1. Widespread discussion of data science as a discipline seems to have begun in 2009 with prognostications and explanations from people such as Hal Varian (Chief Economist at Google).
  2. The field of data science has grown to such an extent that dedicated books are starting to appear with first generation data scientists passing down their knowledge and experiences, for example the Data Science Handbook.
