Jonathan Sedar

Personal website and new home of The Sampler blog

Delivering Value Throughout the Analytical Process

Posted at — 19 Oct 2015

Editor's note: this post is from The Sampler archives in 2015. During the last 5 years a lot has changed, not least that now most companies in most sectors have contracted / employed data scientists, and built / bought / run systems and projects. Most professionals will at least have an opinion now on data science, occasionally earned.

I was prompted to dig this post up following an offline chat about a DS Project Checklist by fast.ai: it seems we all eventually stumble on thinking about the meta issues.

Let's see how well this post has aged - feel free to comment below - and I might write a followup at some point.

The heart of data science is to use statistical and programmatic techniques to realise value from data. This applies to non-profit organisations, conventional businesses and new startups alike. That realisation can take different forms:

… all three viewpoints are valid and demonstrate that we all want to follow a methodical process from initial speculation, through justified strategies and hypotheses, through data analysis, to improved models, systems and processes.

Do we need a new process?

To define such a process is nothing new, and data analysis methodologies such as the Cross Industry Standard Process for Data Mining (CRISP-DM) - developed as far back as the mid-nineties - adequately capture the workflows, tasks, responsibilities, and explain the benefits to various parts of the business.

However, many aspects of data science are still maturing, the CRISP-DM process appears to have been abandoned, and a handful of new processes are under discussion, the most notable being OSEMN (Obtain, Scrub, Explore, Model, iNterpret).

Applied AI are often called upon to deliver analytical insights and systems where before there were none. Naturally our projects are sold upon these final deliverables, but there's tremendous business value to be found throughout the analytical process and I think it's worth trying to define.

A new generalised data science process: SPEACS

Every new process needs a convoluted acronym, so I'll invent SPEACS [1]. This is currently just a rough sketch, but you can see the general flow from ideas to implementation and the iterative nature.

Let's work through each stage in more detail and discuss the value to be found.

SPEACS Data Science Process

Strategy

Everything starts here: defining the business case, the potential benefits and risks. There are many questions which we ought to try to answer before going any further, including:

Business Case

Wider Impacts

On the technical side we might expect to support these questions by variously creating synthetic data, efficient data explorations & visualisation, and small simulations to deliver “what-if” analyses, make approximate estimations and help to engage with wider parts of the business.

At this stage though, we are unlikely to conduct any heavy analysis, and certainly wouldn't implement any systems. We aim to gain valuable insights into the current & future states of the business, and build a justified case for performing those analyses or building those systems.
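To make the idea of a small "what-if" simulation concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration rather than something from a real project: the function names, the uniform distribution for the uplift, and all of the figures (customer count, value per conversion, project cost) are hypothetical placeholders that you would replace with your own rough estimates.

```python
import random
import statistics


def simulate_net_benefit(n_customers, uplift_low, uplift_high,
                         value_per_conversion, project_cost,
                         n_trials=10_000, seed=42):
    """Monte Carlo sketch: net benefit of a project under an uncertain uplift.

    Assumption (illustrative only): the conversion uplift is equally likely
    to fall anywhere between uplift_low and uplift_high.
    """
    rng = random.Random(seed)  # seeded so the what-if is repeatable
    outcomes = []
    for _ in range(n_trials):
        uplift = rng.uniform(uplift_low, uplift_high)
        extra_conversions = n_customers * uplift
        outcomes.append(extra_conversions * value_per_conversion - project_cost)
    return outcomes


def summarise(outcomes):
    """Reduce the simulated outcomes to a few numbers for discussion."""
    ordered = sorted(outcomes)
    n = len(ordered)
    return {
        "mean": statistics.mean(ordered),
        "p05": ordered[int(0.05 * n)],   # pessimistic case
        "p95": ordered[int(0.95 * n)],   # optimistic case
        "prob_loss": sum(1 for x in ordered if x < 0) / n,
    }


if __name__ == "__main__":
    # Hypothetical inputs: 50,000 customers, uplift somewhere between 0% and 2%.
    results = simulate_net_benefit(
        n_customers=50_000,
        uplift_low=0.0,
        uplift_high=0.02,
        value_per_conversion=40.0,
        project_cost=15_000.0,
    )
    for name, value in summarise(results).items():
        print(f"{name}: {value:,.2f}")
```

The point of a sketch like this is not precision: it gives the wider business a range of plausible outcomes and a rough probability of making a loss, which is usually enough to decide whether heavier analysis is justified.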

Possibly the project will stop right here, or the goals will be sought through other means, and that's okay. Change is always hard, and every minute spent on these questions saves hours down the road.

Planning

This stage dives into the detail of what the project will do and how it will run. I've placed this at the start of an iterative sub-cycle of Core Technical Activities, since planning is always tightly influenced by what's actually possible and the various outcomes of the full process. Some considerations include:

Data Sourcing and Ownership

Addressing privacy concerns & abiding by security protocols

Technology and process constraints

At this stage we are still unlikely to have accessed any data, created descriptive analyses or created predictive models. What we seek here is to set up the right environment (both business and technical) for the project to succeed - whilst accounting for all sorts of considerations and compromises.

The valuable outputs / artifacts / deliverables are likely to include project schedules, technical architectures, legal frameworks, privacy statements, ethical statements and priced business plans.

“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.” - John Tukey [2]

Data Engineering

Most analysts will tell you that data preparation consumes a disproportionate amount of project time, and they're totally right. What you don't hear so often is the huge amount of value created during this stage:

Data Acquisition, Storage and Quality Control

Sampling, Feature Engineering and Tidy Data

Pipeline Engineering

Data Analysis & Modelling

Finally, the bit that most people think about when they hear ‘data science project’. You'll hopefully agree though, that by this stage we should have already realised a huge amount of value for the business. Simply getting here is often more than half of the project.

Exploratory Data Analysis and Hypothesis Testing

Predictive Modelling and Evaluation

“Data are not taken for museum purposes; they are taken as a basis for doing something. If nothing is to be done with the data, then there is no use in collecting any. The ultimate purpose of taking data is to provide a basis for action or a recommendation for action. The step intermediate between the collection of data and the action is prediction.” - W. Edwards Deming

Communication & Change

The final stage of this inner cycle of Core Technical Activities is communication & change: putting to use the observations and insights gained by making a change in the business or more widely. Without this stage, all the best analysis and modelling in the world is useless.

Observations and Reporting

Recommendations and Change Management

Systems

Congratulations, you've reached a level where it's worthwhile to embed your insights into systems. This systemisation might be:

Think carefully about how systems management & execution fit into regular business processes and vice versa. Implementing new technical systems is a great opportunity to redesign old business processes and make sure teams are well-configured and delivering business value.

Model Scalability, Repeatability and Responding to Experience

Systems Development

In Summary

My intention in this post was to demonstrate that a ‘data science’ project yields value all the way throughout the process, not just the analysis and modelling stages.

I've explained this in a newly drafted process called SPEACS, which I've detailed with examples and opinions gained from several years working in the arenas of data science, consulting and systems development. I'd love to know your thoughts, and it's likely that I'll elaborate upon this in future.


  1. Pronounced “speaks”.

  2. Tukey is always a good read. See for example his paper ‘Sunset Salvo’, in which he condenses 40+ years of observations into only a few pages of incredibly quotable and still highly-relevant insight.

