“The use of unstructured data in AI will become massive in the coming years”

Dataiku’s CEO details his 2022 strategy and his R&D roadmap in an interview given on April 20, 2022, during the company’s global event in Paris.

Founded in Paris in 2013, Dataiku recorded revenue of more than $150 million in 2021, up more than 60% year-on-year. Present in eight countries outside France (Germany, United Kingdom, United States, Singapore, Australia, Dubai, Canada, Netherlands), the company has more than 1,000 employees, around 250 of whom are based in France. It claims more than 500 client companies, 150 of which are listed in the Forbes Global 2000.

JDN. What are the key machine learning market trends in 2022?

Florian Douetteau is CEO of Dataiku. © Dataiku

Florian Douetteau. First, we find that the French machine learning market is very comparable to others. French companies investing in this area are as mature as American companies. Another trend is that AI platforms are migrating massively to the cloud. Companies understand that this path gives them a foundation that brings together 90 to 95% of their relevant analytical data, with smooth access and agility. On top of this foundation, they can then build all their smart applications. This is not a phenomenon unique to AI. It applies to all business developments.

At the same time, we are seeing a democratization of machine learning. Many business profiles have an appetite for analytics and forecasting, whether in strategic marketing, financial engineering, supply chain, mechanical engineering or industrial processes. All these professions increasingly need data. A chemist, for example, who has access to the data from his tank senses that there is more to be done than choosing a temperature with a rule of three based on the outside temperature. He knows his business data perfectly. He knows how to normalize and manipulate it. Applying a machine learning algorithm is well within his reach. When it comes to deciding how to go about it, his approach will be far more relevant than that of a data scientist.

How do you support your customers in their move to the cloud?

Our platform was designed to run on the three main cloud providers (Amazon Web Services, Microsoft Azure and Google Cloud, editor’s note), as well as on the main cloud data warehouses, Snowflake in particular. The first wave of cloud migration involved moving on-premises servers and databases to the cloud. The second concerns the applications that still run on the workstation and the business processes that had not yet been digitized. The vast majority of our customers have reached this second phase and are deploying Dataiku in the cloud.

How do you promote the democratization of machine learning within organizations?

The Dataiku platform has been designed for no-code AI from the outset. From the start, in 2013, we imagined it as a graphical machine learning studio with a no-code / low-code orientation. Obviously, it is also an environment in which the data scientist is free to code. But our real challenge was to enable people who don’t have time to learn to program to work on their data and models. This is what makes our platform so successful.

“Dataiku’s visual layer allows business profiles to be autonomous from data scientists”

It is estimated that one billion people around the world use a computer for work. Within this population, at least 300 to 500 million professionals manipulate data in spreadsheet-type applications. As for data scientists, their number is estimated at 1 or 2 million. One data scientist for 1,000 users is a very low ratio at a time when analytics and AI are among the key building blocks of the transformation of work. The visual layer of Dataiku, which we are constantly improving, allows business profiles to be autonomous from data scientists.

What are your main R&D areas for the coming months?

In keeping with this logic of democratization, our first challenge is to provide more and more packaged applications on top of our platform to make machine learning easier and faster to adopt. We live in the age of Netflix. Enterprise software must also deliver satisfaction in 48 minutes. Our attention span is now modeled on the length of a series episode. With that in mind, the aim is to offer prebuilt applications that already include a visual way of using a model within an operational process. For finance, for example, these could be models geared towards risk management or cash forecasting.

“We already have customers who are working on multimodal AI”

Our second R&D challenge relates to governance, model explainability, and trusted AI. This involves traceability and the issue of data quality, but also responsible artificial intelligence.

Finally, we want to strengthen our platform’s capabilities for ingesting unstructured data: text, images and even sound. Today, 80 to 90% of our customers’ use cases rely on structured data. But it is clear that the use of unstructured data in AI will become massive in the coming years. Hence this strategy.

For many AI scientists, the next stage of machine learning will come with the emergence of multimodal models capable of ingesting text, images, video… Do you anticipate this new stage?

We do anticipate it. We already have customers who are working on multimodal AI. This is the case in predictive maintenance, quality control on production lines and even the optimization of industrial processes. In these cases, the models combine textual data, for example reports written in the field, with images of the manufacturing flows.

Do you integrate large language models (LLMs) such as BERT, GPT, etc. into your platform?

We support them. Dataiku notably integrates with the LLMs available on Hugging Face. The use of these models is also part of the future. Models with several hundred billion parameters were designed for general use cases, as technology demonstrations. For them, the question is how relevant they are in business settings, which are often highly specific, for example a corpus of pharmacological data. This question remains open. The same goes for generative models (designed to automate the creation of images, for example, editor’s note), for which we are still trying to identify business use cases.

Florian Douetteau is CEO and co-founder of Dataiku. A graduate of the Ecole Normale Supérieure, he began his career at Exalead, which he joined in 2000 to write a thesis on the development of the Exascript programming language. He remained there until 2011, successively holding several management and vice-president positions in research, development and product management. After a stint at Is Cool Entertainment as technical director, he worked with Criteo for a while as a freelance data scientist, before embarking on the Dataiku adventure in 2013.
