Author: Chris Brown
In my day job, I look after a multi-disciplinary team of analysts, data scientists and data engineers. The team carries out ad hoc data exploration and model building. Their typical work looks like this:
This is explorative work and when it’s free-flowing an analyst can find themselves doing bits of all these activities to complete the work.
This is all very fine if deadlines aren’t too much of a thing, if not many other people are involved, and if there are few dependencies between tasks. But in reality this is rarely the case: there’s always a delivery date, dependencies are often complex, and very few things of merit can be done by one team member alone.
I need my team to work together to achieve outcomes that we have promised our stakeholders. I need a clear idea of whether we are going to make the dates. I may have to throw more resources at a problem (people, storage, compute). I need to know how engineering can help and get a clear idea of how we will know that we are finished.
I’ve introduced the team to the concept of agile exploration to build data products.
Thinking about data science problems like this means that we describe the final output as a product that has features. These features become user stories. The definition of done puts acceptance criteria on the stories. The activities to deliver the stories become sub-tasks. We can put the features of our data product into timed sprints of fixed duration. Hey presto! We’ve just repurposed agile development methods for data analysis and science.
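To make that mapping concrete, here’s a minimal sketch in Python of how a data product feature breaks down into a story with acceptance criteria and sub-tasks, which then goes into a sprint. The feature, criteria and task names are hypothetical, purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Story:
    """A user story derived from one feature of the data product."""
    title: str
    acceptance_criteria: list[str]  # the definition of done
    sub_tasks: list[str] = field(default_factory=list)  # activities to deliver it

@dataclass
class Sprint:
    """A fixed-duration sprint holding a set of stories."""
    number: int
    stories: list[Story] = field(default_factory=list)

# A hypothetical feature of a data product, broken down as described above.
story = Story(
    title="As an analyst, I can see a churn risk score per customer",
    acceptance_criteria=["A risk score exists for every active customer"],
    sub_tasks=[
        "Pull customer history",
        "Engineer tenure features",
        "Train baseline model",
    ],
)

sprint = Sprint(number=1, stories=[story])
```

Nothing here is specific to data science, and that’s the point: once the work is expressed this way, it slots straight into the same tooling and ceremonies the engineers already use.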
Our data engineering teams already work in this manner; they are quite comfortable following a scrum-based approach. Immediately, analysis and engineering fall into step, and the team starts to use the same language to describe the work they are undertaking.
How we apply people and machinery to the problems has become a lot clearer. The process of developing models and visualisations can still be exploratory and iterative but now has more general structure. I can better predict if and when we are going to deliver what we said we would.
Be careful when trying this though. One problem we have encountered is applying a story-based approach to an abstract concept, like a model. Developing a machine learning model can be more art than science. How do you describe the “brain” that is central to what we are trying to achieve?
My guidance to the team was to think about how you might describe the techniques that your model uses to reach its results. This might be a particular statistical assumption, a regression technique, a piece of feature engineering, a performance optimisation, or a novel trick in category transformation. Over a period of model development these facets of the model will be introduced, so write them as stories and play them into sprints. The sprint timeline will show how the model comes to fruition. My exam question to the team is: “if you had to describe to a friend in bullet points how cool your model is, what would you say?”
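As a sketch of what answers to that exam question might look like as a backlog, here are a few hypothetical model facets written as stories, each with a done-when condition. The facets and criteria are invented for illustration, not taken from a real model:

```python
# Hypothetical facets of a machine-learning model, written as stories
# that can be played into sprints. All names and criteria are illustrative.
model_stories = [
    {
        "story": "The model handles high-cardinality categories via target encoding",
        "done_when": "Encoded features beat the one-hot baseline on validation AUC",
    },
    {
        "story": "The model assumes spend is log-normal, so spend is log-transformed",
        "done_when": "Residual plots show no remaining skew after the transform",
    },
    {
        "story": "Scoring runs fast enough for the nightly batch window",
        "done_when": "A benchmark on a sample batch meets the agreed latency target",
    },
]

# Print the facets as a bullet list, the way you'd pitch the model to a friend.
for facet in model_stories:
    print(f"- {facet['story']} (done when: {facet['done_when']})")
```

Each bullet is a story an analyst can pick up in a sprint, and the done-when clause is its acceptance criterion, which keeps even the “art” part of modelling visible on the board.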