Develop your end-to-end AI product in no time: why and how?

Johan Jublanc
9 min read · Dec 22, 2023

https://pixabay.com/fr/photos/montagnes-des-oiseaux-silhouette-100367/

Speed and quality are of vital importance to your business. Why? Because if you can’t develop quality solutions quickly, your competitors will, or you’ll miss out on major opportunities.

Building a data science solution means transforming data in complex ways and extracting useful information or actions from it. It’s not “just” about training or using a model to make a prediction or extract information. It has to be seen as a product from day one.

Going end-to-end: just a starting point to iterate

What does “end-to-end” mean? It means starting to deliver value, i.e. having an MVP. Here’s the classic AI product lifecycle.

Created with https://excalidraw.com/

Start your project following a few simple but effective steps

First of all, start with a case study

You first need to make sure you understand the problem you’re trying to solve, your customers’ difficulties and the value you’re going to bring. Otherwise, you’ll probably be working for nothing. To do so, you can use a simple checklist like this:

Example: energy production. A company wants to optimize the way it manages its energy production, deciding when to sell it on the market, when to store it in batteries, and even when to reduce its production capacity. To do this, a data scientist will probably want to forecast market prices, but that alone won’t bring value to the company. It will be necessary to devise a strategy and evaluate it both theoretically and in practice. Thus, the success of the project goes beyond the quality of the forecasting model.

Then, follow with a PoC (Proof Of Concept)

It must answer the question “Can this idea bring value?” The sooner you answer this question, the less you’ll spend, and the sooner you’ll be able to move on to another idea if it doesn’t pan out.

  • What happens if it does not work in 2 weeks? And in 6 months?
  • What happens if we do not have this product?
  • What would give us confidence that a solution is feasible?

This will give us an idea of what to do during the PoC, and how long it will take.

Example: energy production. Here, one has to make sure not only that the team can design a forecasting model that captures some signal, but also that this signal can be used to improve the current strategy.
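To make this check concrete, here is a minimal backtest sketch. It is only an illustration under strong assumptions: one unit of energy produced per period, a hypothetical one-unit battery with no losses, and made-up prices. The function names are mine, not part of any real project.

```python
def revenue_sell_immediately(prices):
    """Current strategy (assumed): sell one unit of energy at every period."""
    return sum(prices)


def revenue_with_forecast(prices, forecasts):
    """Store the unit for one period when the forecast says the next price is higher.

    forecasts[t] is the price predicted for period t + 1, known at period t,
    so it needs len(prices) - 1 values.
    """
    revenue = 0.0
    stored = 0  # units held in the hypothetical one-unit battery
    for t, price in enumerate(prices):
        available = 1 + stored  # this period's production plus any stored unit
        stored = 0
        is_last = t == len(prices) - 1
        if not is_last and forecasts[t] > price:
            stored = 1       # keep one unit for the next period
            available -= 1
        revenue += available * price
    return revenue


prices = [50, 60, 40, 70]        # made-up market prices
good_forecasts = [60, 40, 70]    # a forecast that captures the signal
bad_forecasts = [40, 70, 30]     # a forecast that gets the direction wrong

baseline = revenue_sell_immediately(prices)
with_good = revenue_with_forecast(prices, good_forecasts)
with_bad = revenue_with_forecast(prices, bad_forecasts)
```

With these numbers the good forecast beats the baseline and the bad one does worse than doing nothing, which is exactly what the PoC has to find out: the question is the revenue difference, not the forecast error.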

Finally, get some quick but quality wins with an MVP

With an MVP you already bring value to your client, even if many things are still to be perfected.

Example: energy production. The customer wanted to establish a coherent workflow from data collection to the production of a useful result (in this case, information and recommendations for action). In this phase, we simply make sure that the workflow works, that the process is fast enough (a forecast that arrives after the fact doesn’t help anyone) and that the information is usable to save money or time.

Why is it important to move quickly to an MVP? Because only then can you evaluate your solution. A good strategy is to start from a very simple base. For example, it’s a very good idea to start with a Random Forest, a business rule or a ready-to-use open-source tool.
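As an illustration of how simple such a base can be, here is a sketch of a business-rule baseline for the energy example: predict that the next price equals the last observed one, and measure the error. The prices are made up and the function names are hypothetical.

```python
def persistence_forecast(series):
    """Business rule: the next value is predicted to equal the current one."""
    return series[:-1]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


prices = [50.0, 52.0, 51.0, 55.0, 54.0]    # made-up prices
predicted = persistence_forecast(prices)    # forecasts for periods 1 to 4
actual = prices[1:]                         # what actually happened
baseline_mae = mean_absolute_error(actual, predicted)
```

Any more sophisticated model then has to beat `baseline_mae`, and more importantly has to improve the end value by more than this rule does.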

Only once you can assess the final value can you identify which part of the project most needs improving. The questions to answer first are: can the project deliver value? Is it technically feasible? Is the expected value of the idea of real interest to end users? If you skip them, you run the risk of focusing on things that don’t matter for the end value, such as raising the accuracy of the model from 0.80 to 0.82.

Going end-to-end fast: use method and tooling

Prepare seriously and thoroughly

As mentioned earlier, you need to prepare your use case thoroughly. What value do you want to bring to the user? How will you be able to evaluate it?

You also always need to gather enough information to make a good decision about the time to devote to the project, the kind of evaluation system you need to design and implement, etc. To do this, you need to take into account expected revenues, risks, uncertainty and complexity.

Example: e-commerce platform. I once worked for a client who wanted to predict whether a visitor would convert when browsing their e-commerce platform. But the end goal wasn’t clear. What did they want to achieve? They imagined they could target these visitors with specific marketing actions. So I tried to make the use case very clear. What kind of action did they want to take? And when? What was the expected result? How would we measure it?

Finally, we designed a new approach:

  • Calculate probability instead of predicting conversion. We don’t want to spend money on visitors who will convert anyway, nor on those who will never buy anything.
  • Take marketing action between visits, not during. There was no room for actions during navigation (due to technical problems), so there was no need to try a continuous approach.
  • Evaluate results with appropriate A/B testing. We decided to refine the solution using statistically relevant and robust information.
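The approach above can be sketched in a few lines. This is a simplified illustration, not the code used for that client: the probability thresholds, visitor ids and conversion counts are invented, and the test shown is a standard two-proportion z-test.

```python
import math


def select_targets(probabilities, low=0.2, high=0.7):
    """Return visitor ids whose conversion probability lies in [low, high].

    Visitors below `low` are unlikely to buy and those above `high` will
    probably convert anyway. Both thresholds are illustrative assumptions.
    """
    return [vid for vid, p in probabilities.items() if low <= p <= high]


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates (pooled)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


# Hypothetical model output: probability of conversion per visitor
probabilities = {"v1": 0.05, "v2": 0.45, "v3": 0.95}
targets = select_targets(probabilities)

# Hypothetical A/B test: control group vs. group that received the action
z = two_proportion_z(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
significant = abs(z) > 1.96  # two-sided test at the 5% level
```

Working with probabilities rather than a yes/no prediction is what makes the middle band selectable at all, and the A/B test is what tells you whether acting on that band actually changes behavior.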

The sooner you know whether you need to develop the product, the more value you’ll bring or save

As soon as you get feedback from users or the market, you’ll be able to estimate the return on investment properly and assess risk more accurately. At the end of the test phase, if you decide not to go any further, you haven’t lost much. Conversely, if you decide to go ahead, you’ll have useful information for prioritizing your backlog.

Decompose the problem

To move fast and be sure of working efficiently, try to find the key feature that will bring value, without all the nice things you thought of at the start.

For instance, when working on a recruitment solution, instead of developing a complete web application, we chose to focus on our core functionality, which was the search and rating of candidate profiles. We decided to concentrate on a very simple path and to time-box the development of the algorithms, which saved us a lot of time by not working on the login page, the payment system and a complex customer path.

Sequence, don’t parallelize

Whenever possible, you should always favor a sequential approach for two reasons:
Firstly, when you parallelize the production of two features or products, you will lose value because you could have delivered value at the end of project A rather than at the end of all your parallel projects.

Results when you parallelize

The other reason is that context switching consumes time and energy for the team. So it’s better to have an Agile team tailored to a project, rather than one big team that takes care of everything.

Results when you sequentialize
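The first reason can be checked with simple arithmetic. The sketch below assumes two projects that each need 3 months of the full team and each bring 10 units of value per month once delivered. Those figures are invented, and the cost of context switching is ignored, which favors the parallel case.

```python
def cumulative_value(delivery_months, value_per_month, horizon):
    """Total value earned by `horizon` when each project pays from its delivery on."""
    return sum(value_per_month * max(0, horizon - m) for m in delivery_months)


# Sequential: A is delivered at month 3, B at month 6.
sequential = cumulative_value([3, 6], value_per_month=10, horizon=12)

# Parallel: the team is split, so both projects are delivered at month 6.
parallel = cumulative_value([6, 6], value_per_month=10, horizon=12)

lost_by_parallelizing = sequential - parallel
```

Over 12 months the sequential plan earns 150 against 120 for the parallel one: project A simply starts paying three months earlier.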

Focus on the final value for the user

That’s the only important thing. I know it’s tempting to try very sophisticated machine learning techniques, but it’s not worth anything until you’re sure you’re solving a problem for someone. The team works for the customer, not for the data scientists.

Use tooling to scale as soon as possible

Tooling is crucial, as it speeds up the PoC process and facilitates industrialization. In particular, you need:

  • An easy set up
  • Monitoring and tracking from day one
  • Best practices implemented right from the design stage
  • The possibility to scale your experiment
  • Self-supporting code

To achieve this, we at LittleBigCode have developed a project template that helps anyone start a new project using our MLOps platform.

This makes it possible to carry out a complex project such as planning the maintenance of over 5,000 sites. For each site, there is a model that predicts the productivity gain if a maintenance operation is carried out. This information is then used as input for an optimization module that takes into account several constraints, such as the distance between sites. The result tells the end user which sites to maintain over the next 10 days. Using a tool to scale up training allowed us to experiment more quickly. What’s more, because we could easily train the 5,000 models, we were able to build the optimization module using real-life input data. So, by the end of the PoC, we were pretty sure we could bring value to our customer and already had an end-to-end solution.
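To show how per-site predictions can feed a planning step, here is a deliberately naive sketch. It is not the optimization module described above: it is a greedy heuristic with a single travel-distance budget, and the site names, coordinates and gains are invented.

```python
import math


def plan_maintenance(predicted_gain, coordinates, start, max_distance):
    """Greedy heuristic: repeatedly visit the reachable site with the best gain per km.

    predicted_gain: site -> gain predicted by that site's model (assumed given)
    coordinates:    site -> (x, y) position in km
    max_distance:   total travel budget in km (a stand-in for real constraints)
    """
    position = start
    remaining = max_distance
    todo = set(predicted_gain)
    plan = []
    while todo:
        best, best_ratio, best_dist = None, 0.0, 0.0
        for site in sorted(todo):  # sorted for a deterministic result
            dist = math.dist(position, coordinates[site])
            if dist > remaining:
                continue  # site is out of reach with the remaining budget
            ratio = predicted_gain[site] / max(dist, 1e-9)
            if ratio > best_ratio:
                best, best_ratio, best_dist = site, ratio, dist
        if best is None:
            break  # nothing reachable, or nothing with a positive gain
        plan.append(best)
        todo.remove(best)
        remaining -= best_dist
        position = coordinates[best]
    return plan


gains = {"A": 5.0, "B": 8.0, "C": 1.0}               # hypothetical model outputs
coords = {"A": (0, 3), "B": (4, 0), "C": (0, 10)}    # hypothetical positions
plan = plan_maintenance(gains, coords, start=(0, 0), max_distance=10)
```

Even a crude planner like this is enough to test the whole chain on real predictions, which is the point of having the 5,000 models available early.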

Going end-to-end with quality: do not confuse quality and performance

Don’t compromise on quality

Quality is essential if you want to sustain a high pace. That’s why you should always reserve slots for refactoring and implement good practices at a very early stage, checking in particular the following items:

  • code quality: linting, review, test
  • code structure and project workflow
  • tests
  • product evaluation: first and foremost evaluate the final value

Do not chase performance with your baseline

It’s important to point out that model performance is not the same thing as code quality and product value. You can’t be careless about quality and value, but a high-performing model is not important at this stage. What’s important is capturing enough signal to build a service that solves a problem.

Include scalability and security in your PoC

When carrying out a PoC, you want to answer questions about feasibility and potential value. It’s a good idea not to forget to check scalability and security issues.

This is particularly the case with generative AI projects. You can check the value of a solution very quickly using APIs like OpenAI’s, but what’s really important is to make sure you end up with a solution that avoids data leakage and limits the attack surface. Indeed, a solution that runs an LLM on corporate data must be able to handle huge quantities of data and a potentially growing number of threats.

Of course, since we’re talking about scalability, you also need to factor in the cost of training (if you have something like that in your project) and inference.
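A back-of-the-envelope estimate is often enough at the PoC stage. The sketch below assumes a token-priced API. The traffic figures and the price per 1,000 tokens are placeholders, not real prices, so substitute the numbers from your provider.

```python
def monthly_inference_cost(requests_per_day, tokens_per_request,
                           price_per_1k_tokens, days=30):
    """Rough monthly cost of a token-priced API; every input is an estimate."""
    tokens_per_month = requests_per_day * tokens_per_request * days
    return tokens_per_month / 1000 * price_per_1k_tokens


PLACEHOLDER_PRICE = 0.01  # hypothetical price per 1,000 tokens

# PoC traffic vs. the traffic expected once the product is rolled out
poc_cost = monthly_inference_cost(100, 2_000, PLACEHOLDER_PRICE)
production_cost = monthly_inference_cost(50_000, 2_000, PLACEHOLDER_PRICE)
scale_factor = production_cost / poc_cost
```

With these placeholder figures the bill is multiplied by 500 between the PoC and production, which is the kind of order of magnitude worth knowing before committing to an architecture.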

Managing stakeholder motivation

Repeat and explain your convictions

The above advice seems clear enough to me, but in real life it can be very difficult to implement. Many people will misread algorithm metrics, and some stakeholders won’t understand why we spend time on refactoring.

And as a team leader, team member, PO or anyone else with an interest in the project’s success, you’ll need to repeat your beliefs often and explain where they come from. At the same time, to manage the high expectations of stakeholders, you’ll also need to be clear about what you can and can’t do.

Have something to show on a regular basis

The exploration phase can be a little frustrating for stakeholders and the sales team, who have high expectations of the new project they’ve launched. To avoid the team being strongly challenged on what they’re doing, make sure you have something to show on a regular basis, for example: data analysis, raw results, tracking, model outputs, etc.

Having ready-to-use tools gathered in one platform helps us at LittleBigCode to have something to show right from the start of development.

I hope you enjoyed this article. Please feel free to leave comments and messages to share your thoughts.
