Sahar Cohen
Agile and data science: how do these two come together?

Agile software development is standard practice in many software organizations. Short cycles, close relations with the customer, continuous delivery and team collaboration lead to better product decisions, higher focus on the business, flexibility to change, higher customer and team satisfaction and lower costs. To agile software teams, working in the traditional waterfall methodology seems anachronistic and strange. Although agile might be challenging to implement, its superiority is undoubted. But what about agile in data science?
Data science work involves writing software code, but data science and software programming are not the same profession. Whereas software development is mainly an engineering process, data science is research oriented. It involves exploration, trial and error and re-thinking, which make its deliverables uncertain and often hard to define, and its cycles typically somewhat longer. Due to this core difference, agile methodology, as it is implemented in traditional software teams, might not be suitable for data science work. However, there are several important agile principles that can considerably contribute to the work of data science teams. In this post we suggest the agile principles that we, at start-up.ai, see as particularly important to data science work.
1. Business Centricity
One of the four core agile values emphasizes the collaboration of developers with the customer (the business stakeholders). This collaboration is at least as important in data science work as it is in traditional software development. The deliverables of data science work must provide organizational value, and if the data scientists and the business stakeholders do not maintain close and open relationships, the probability of arriving at a data science solution that fits the business needs is very low.
The interaction between business stakeholders and data scientists has some unique challenges. Business users interact with traditional software through a user interface. They are often capable of imagining a user interface and their interaction with it, which provides a straightforward means of communication. The impact of data science work on the business, on the other hand, is more abstract and complex. Business stakeholders do not understand algorithms, and they often submit requirements in a business language that does not translate directly into a data science language.
The gap in translating business (non-technical, non-scientific) requirements into data science terms makes “analytics translators” hugely important, a concept that was nicely articulated in a 2019 HBR article (https://hbr.org/2019/02/how-to-train-someone-to-translate-business-problems-into-analytics-questions).
2. Do not aim to achieve it all from the first iteration
In purely functional software there are two important considerations: correctness (the software supplies the right output for any given input) and engineering quality. In a development iteration, the team will typically not commit to a feature if it cannot be delivered correctly, and with sufficient quality, in the available time. For functional features, correctness is usually well defined.
In research-based features, the knowledge required to develop the new feature, and even the exact outcome, are initially absent. In many cases, there may be several levels of correctness.
The levels of correctness of a machine learning model are often measured in terms of accuracy (or recall and precision). Consider, for example, an analytic feature that involves a prediction model. With no model, the prediction is random. With the “optimal” model, the prediction accuracy might reach 95%. When starting to work on the model, a data scientist does not know how good the model can get, and attaining that 95% requires a lot of effort. It is often the case that a skillful data scientist can quickly come up with a model that provides 85%, or even 90%, accuracy. Does such a model provide a correct result?
Many times, agile work means preferring quick solutions over optimal ones. When we already have a model that provides significant business value, we are often better off starting to implement it rather than continuing to optimize it.
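To make this trade-off concrete, here is a minimal sketch of two iterations: a quick baseline model delivered first, and a heavier model left for a later sprint. It assumes scikit-learn and a toy dataset; the model choices and the accuracy figures above are purely illustrative.

```python
# A minimal sketch of "good enough first, optimal later".
# Assumes scikit-learn; the dataset and models are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Iteration 1: a quick baseline a skilled data scientist can deliver fast.
baseline = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))

# Iteration 2 (a later sprint): a heavier model that may squeeze out a few
# more points of accuracy, at a much higher development cost.
tuned = GradientBoostingClassifier().fit(X_train, y_train)
print("tuned accuracy:   ", accuracy_score(y_test, tuned.predict(X_test)))
```

If the baseline already clears the business bar, the second iteration can be scheduled, deprioritized or dropped altogether.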
Prediction tasks are an easy example, in the sense that the solutions have objective measures of correctness (prediction accuracy, in this case). How would you measure or compare algorithms that cluster objects into groups, or algorithms that automatically summarize long documents? Research domains often require baby steps and solutions that improve iteratively. Research is a product that evolves continuously, not a one-time project. Giving up the aim for optimality is sometimes hard for researchers, and it often requires a cultural change.
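Returning to the clustering question above: the best we usually have are proxy metrics. As a rough illustration (assuming scikit-learn; the synthetic data and the choice of k are arbitrary), a silhouette score rates how cohesive and well separated the clusters are, but it says nothing about whether the grouping is useful to the business, which is exactly why such solutions tend to improve in baby steps.

```python
# Proxy evaluation of a clustering: silhouette measures cohesion/separation,
# not business value. Assumes scikit-learn; data and k values are illustrative.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.2f}")
```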
3. Power to the team
The concept of teams is central in agile software organizations. Unfortunately, many organizations see researchers as individual contributors. Researchers might be perceived as highly smart individuals, yet strange and sometimes hard to work with. Individual-contributor researchers get very little feedback from customers, very little feedback from developers and very little feedback from product owners. No wonder that in some cases individual contributors indeed lose their connection with reality and may deliver outcomes that do not consider all the product and infrastructure constraints.
Moreover, in the individual-contributor mode, customers, product owners and developers get very little feedback from the researchers and are often tempted to think that research is some sort of magic. When this is the case, the product side comes up with unrealistic requests and expectations yet fails to communicate these requests to the researchers, and the developers fail to prepare the infrastructure for implementing the new features.
Teams that conduct research should not be composed solely of researchers. Developers who will be involved in the implementation, as well as a product representative, should be part of the team.
Moreover, cross-role knowledge sharing should be encouraged. Researchers are often highly curious, open-minded individuals who can contribute in many domains, so sharing business and development knowledge with them creates a win-win. Research work is often highly interesting, and sharing it with product and development is also a win-win, since it makes the work more interesting and helps align expectations.
4. Team diversity
All data scientists were created equal, but data scientists come in multiple shapes and forms. Every data scientist is a whole and complex human being; however, as data scientists who are geared towards generalization, we often recognize four typecasts of data scientists, as follows.
The engineer
These data scientists are very good programmers. They write high-quality code: modular, flexible and fast to execute. On the downside, these data scientists are sometimes slow in their work (which is not surprising, because they invest a lot of time in their code) and they are not always the most creative.
The scientist
These data scientists are mathematically savvy. They have a deep understanding of algorithms. They tackle each challenge with a scientific approach, and many times they can improve the accuracy of existing algorithms. These data scientists are sometimes also good engineers, but many times they care less about the actual code and focus more on the conceptual algorithm. On the downside, they sometimes spend a lot of effort on tasks that produce very little business value.
The hacker
These data scientists care about the problems and their solutions. They tend to produce quick and effective solutions that address the essence of the problem. They often grasp the essence of the business, although they sometimes miss details that might be important. On the downside, hacker data scientists are sometimes very intuitive in their work, at the price of being disorganized in their scientific approach to problems and in the way they code.
The business-focused
This typecast sits somewhere between data scientist and product manager. They take a lot of interest in the business process and are highly educated in the business domain. On the downside, these data scientists are often less technical.
The different typecasts of data scientists have different operating systems. None of these typecasts is better or worse than the others. Each typecast can contribute its strengths and might need some extra help with its weaknesses. Diversity that puts data scientists with different operating systems into a single team has huge potential.
5. Retrospective
Retrospective is one of the strongest rituals of agile. Dedicating time to discuss the way the work is done, rather than the work itself, and adapting the methodology to the team’s needs and desires ultimately drives teams to become more engaged and self-empowered, and simply makes the work methods better. With heterogeneous teams (see guiding principle no. 4), agile becomes even more complicated and comes with more degrees of freedom. The different terminologies, modes of work, outcomes and even mentalities should be discussed and learned by all sides, with much mutual respect and patience. The more diverse the team, the more important it becomes to hold deep, non-judgmental retrospectives. Going agile in data science takes more time and more improvement iterations, and missing the needed room for retrospectives jeopardizes the team’s satisfaction with the change, the probability of finding a sweet spot in the implementation and, eventually, the overall success of the change.
Summary
Although data science and software programming are not the same profession, there are several agile principles that are highly important in data science work. Data science should be focused on business value. Data science work should progress in baby steps, not aiming to achieve it all at once. Data science is teamwork, not an individual contribution. Data science teams should be as diverse as possible. Retrospectives are helpful in shaping and improving the data science team’s work.