A few years ago, I wrote my first article about data science and project management. Since then, I have worked on several data science projects, mostly as the data scientist myself. As it turns out, a data scientist cannot spend all of their time building models. In fact, you will spend most of your time understanding the business requirements.
IBM’s Watson TV commercials have fueled the expectation that Artificial Intelligence is something that can easily be installed and applied to a huge variety of problems, no matter the industry. In reality, though, not all Watson projects are successful. And this is not only IBM’s problem. AI projects, or even Machine Learning projects as a subset of AI, are a challenge for a lot of companies.
If we look at the PMBOK definition of a project, the notion of creating something new is an essential part of a project. For traditional projects in other domains, clients, project managers and project team members often already have some experience in the industry. If you have built a house before, then it is easier for you to build the second house. The more houses you build, the easier it will get, even if these new houses have new requirements that you haven’t seen before. Also, clients do have a notion of what a house looks like because they have seen other houses. They also understand why some things take longer than others because they can relate to the physical world.
Data Science projects are different because clients usually don’t have any experience in this domain. And there are not many project teams or project managers who have done enough projects in one industry to reproduce results for every possible question. A data science project will most likely not result in an application that automatically understands human speech and thinks like a human, just without the mistakes and a million times faster. But how can we manage our clients’ expectations and at the same time understand what they really need?
There is another difference: Houses are made of well-known materials. Data science projects rely on data, and often enough the data that is needed is not available and has to be collected first, or it is not available in the required quantity or quality. Also, there is no proven way of making a data science project successful. In some cases, the results may not be as expected, whether because there is not enough data or because the algorithms simply cannot produce anything usable. Sometimes, we need to experiment with the different data attributes (“features”) to understand which of them actually enable a machine to find usable patterns. We have been building houses for centuries, but only a few decades have been spent delivering AI or ML projects.
If understanding the business is the prerequisite for success, then it is of utmost importance that the business is committed to investing enough time for onboarding the project team and working with the team to collect the requirements and test early prototypes. If the goal is to increase productivity, then a data science project will mean decreased productivity during the project phase. This has to be made clear very early, and this may encounter resistance.
One of the best practices to ensure the success of a data science project is to define success very early in the project. The main question here is “When would you regard this project as successfully completed?” Often, this question will be answered with KPIs that are not measurable. But that’s the point: If it is not measurable, then you cannot improve it. As a consequence, it is mandatory to go through this phase and come up with measurable KPIs.
KPIs, unfortunately, are not enough in most cases. A model can produce the best results in the world, but business impact matters more than model performance: a machine learning model with excellent metrics may still have no business impact whatsoever.
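To make the gap between model performance and business impact concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the churn scenario, the confusion-matrix counts, and the monetary values per customer are made-up assumptions for illustration, not figures from a real project. It compares two imaginary models on accuracy and on a simple net-value estimate.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary classifier (hypothetical numbers)."""
    tp: int  # churners correctly flagged
    fp: int  # loyal customers wrongly flagged
    fn: int  # churners missed
    tn: int  # loyal customers correctly left alone


def accuracy(c: Confusion) -> float:
    """Share of all predictions that were correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def net_value(c: Confusion, value_per_tp: float, cost_per_fp: float) -> float:
    """Assumed business value: gain per retained churner minus cost per false alarm."""
    return c.tp * value_per_tp - c.fp * cost_per_fp


# Assumed scenario: 1,000 customers, 50 of whom would churn.
model_a = Confusion(tp=5, fp=0, fn=45, tn=950)     # very cautious model
model_b = Confusion(tp=40, fp=100, fn=10, tn=850)  # flags far more customers

# Assumed economics: retaining a churner is worth 200, a wasted offer costs 20.
for name, model in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(model), net_value(model, value_per_tp=200, cost_per_fp=20))
```

Under these assumptions, model A has the higher accuracy, yet model B creates more value because it catches most of the churners. With different cost assumptions the ranking could flip, which is exactly why the value and cost figures need to come from the business rather than from the data scientist.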
Data science, as a consequence, is much more than just creating machine learning models. It is about creating business value by identifying the KPIs that really move the needle.