Using Agile Techniques To Build Data Products

Feb 23, 2021

Author: Chris Brown

In my day job, I look after a multi-disciplinary team of analysts, data scientists and data engineers. The team carries out ad hoc data exploration and model building. Their typical work looks like this:

  • Think about desired outcomes of what they are building
  • Muse on a model that might work
  • Collect and shape the data
  • Build and apply the model
  • Test results, validate them
  • Present the results
  • Iterate and loop through all these tasks a number of times

This is exploratory work, and when it's free-flowing an analyst can find themselves doing bits of all these activities to complete the work.
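To make that loop concrete, here is a minimal sketch of a single pass through it in Python. The dataset, model choice, and metric are illustrative stand-ins rather than a recommendation:

```python
# One pass through the explore-build-validate loop, using a
# public dataset as a stand-in for whatever is being explored.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Collect and shape the data
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Build and apply a model that might work
model = GradientBoostingRegressor().fit(X_train, y_train)

# Test and validate the results, then present them
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE: {mae:.3f}")
```

Each pass through the loop changes something: the shaping of the data, the model, the validation.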

This free-flowing approach is all very fine if deadlines aren't too much of a thing, if not many other people are involved, or if there are few dependencies between the tasks. But in reality this is rarely the case: there's always a delivery date, dependencies are often complex, and very few things of merit can be done by one team member alone.

I need my team to work together to achieve outcomes that we have promised our stakeholders. I need a clear idea of whether we are going to make the dates. I may have to throw more resources at a problem (people, storage, compute). I need to know how engineering can help and get a clear idea of how we will know that we are finished.

I’ve introduced the team to the concept of agile exploration to build data products.

Thinking about data science problems like this means we describe the final output as a product that has features. These features become user stories. The definition of done puts acceptance criteria on the stories. The activities to deliver the stories become sub-tasks. We can put the features of our data product into sprints of fixed duration. Hey presto! We've just repurposed agile development methods for data analysis and science.
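To make that mapping concrete, here is a rough sketch of how one feature of a data product might be captured. The feature, acceptance criteria, and sub-tasks below are invented purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Story:
    """One feature of the data product, written as a user story."""
    title: str
    acceptance_criteria: list[str]      # the definition of done
    sub_tasks: list[str] = field(default_factory=list)
    sprint: int | None = None           # which fixed-duration sprint it is played into

churn_score = Story(
    title=("As a retention manager, I want a churn risk score per customer "
           "so that I can prioritise outreach"),
    acceptance_criteria=[
        "A score is produced for every active customer",
        "Model performance meets the agreed threshold on a hold-out set",
        "Results are presented back to stakeholders",
    ],
    sub_tasks=[
        "Collect and shape the data",
        "Build and apply the model",
        "Test and validate the results",
    ],
    sprint=2,
)
```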

Our data engineering teams already work in this manner. They are quite comfortable following a scrum-based approach. Immediately an alignment is achieved between analysis and engineering. The team starts to use the same language to describe the work they are undertaking.

How we apply people and machinery to the problems has become a lot clearer. The process of developing models and visualisations can still be exploratory and iterative, but it now has an overarching structure. I can better predict if and when we are going to deliver what we said we would.

Be careful when trying this, though. One problem we have encountered is applying a story-based approach to an abstract concept, like a model. Developing a machine learning model can be more art than science. How do you describe the “brain” that is central to what we are trying to achieve?

My guidance to the team was to think about how you might describe the techniques your model uses to reach its results. This might be a particular statistical assumption, a regression, a piece of feature engineering, a performance optimisation, or a novel trick in category transformation. Over the course of model development these facets of the model will be introduced, so write them as stories and play them into sprints. The sprint timeline will show how the model comes to fruition. My exam question to the team is: “if you had to describe to a friend in bullet points how cool your model is, what would you say?”
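One way to picture those facets is as named steps in a model pipeline, each of which can arrive as its own story across successive sprints. The column names and estimators here are hypothetical, purely to show the shape:

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Each named step is a facet of the model that can be written
# up as its own story and played into a sprint.
preprocess = ColumnTransformer([
    # story: feature engineering on the numeric columns
    ("scale_numeric", StandardScaler(), ["tenure", "monthly_spend"]),
    # story: the novel trick in category transformation
    ("encode_categories", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    # story: the regression at the heart of the model
    ("regression", Ridge(alpha=1.0)),
])
```

Read off the step names and you already have the bullet points for the exam question.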

Chris Brown is a data professional who specialises in building platforms that let companies analyse data for business value. Throughout his career he's been instrumental at many data-driven firms, focusing on platform architecture; data science and insights; and building a digital bank from scratch. Chris joins us to share his insights on the fintech landscape; expect to learn about data engineering and data science, the areas he's passionate about.