DeepMind has used AlphaZero, its board-game-playing artificial intelligence (AI), to discover a faster way to solve a fundamental mathematical problem in computer science, beating a record that had stood for more than 50 years.
The problem in question, matrix multiplication, is a crucial type of computation at the heart of many different applications, ranging from displaying images on a screen to simulating complex physical phenomena. It is also fundamental to machine learning itself. Finding a way to speed up this calculation could have a huge impact on thousands of daily computing tasks by reducing costs and saving energy.
“It’s a really amazing result,” said François Le Gall, a mathematician at Nagoya University in Japan, who was not involved in the work. “Matrix multiplication is used everywhere in engineering,” he explains. “For anything you want to solve numerically, you usually use matrices.”
Despite the ubiquity of this calculation, it is not always well understood. Put simply, a matrix is a grid of numbers that can represent whatever you want. Multiplying two matrices together typically involves multiplying the rows of one by the columns of the other. The basic technique for solving the problem is taught in high school. “It’s like the ABCs of computer science,” says Pushmeet Kohli, head of DeepMind’s AI for Science team.
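The “high school” method can be sketched in a few lines of Python: each entry of the result is the dot product of a row of the first matrix with a column of the second.

```python
def matmul(A, B):
    # Schoolbook matrix multiplication: C[i][j] is the dot product
    # of row i of A with column j of B.
    n, m, p = len(A), len(B), len(B[0])
    assert len(A[0]) == m, "inner dimensions must match"
    C = [[0] * p for _ in range(n)]
    for i in range(n):          # row of A
        for j in range(p):      # column of B
            for k in range(m):  # one scalar multiplication per step
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

For two n-by-n matrices, the inner loop runs n³ times, which is where the step counts discussed below come from.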
However, things get complicated when trying to find a method to go faster. “Nobody knows the best algorithm for this,” continues François Le Gall. “This is one of the biggest open problems in computer science.”
Indeed, there are more ways to multiply two matrices together than there are atoms in the universe (10³³ for some of the cases the researchers examined). “The number of possible actions is almost infinite,” explains Thomas Hubert, an engineer at DeepMind.
For the DeepMind team, the trick was to turn the problem into a kind of three-dimensional board game called TensorGame. The board represents the multiplication problem to be solved and each move symbolizes the next step in solving that problem. The series of movements performed in a game therefore represents an algorithm.
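The game can be made concrete. Multiplying two 2-by-2 matrices corresponds to a fixed three-dimensional tensor, and each “move” subtracts a rank-1 term from it; the game ends when the tensor reaches zero, and the number of moves is the number of multiplications the algorithm uses. The sketch below (an illustrative reconstruction, not DeepMind’s code) checks that Volker Strassen’s seven moves clear the board.

```python
import itertools

# Build the 4x4x4 tensor for 2x2 matrix multiplication:
# T[a][b][c] = 1 iff entry c of C receives the product A_a * B_b,
# with entries flattened row by row (A11, A12, A21, A22, etc.).
T = [[[0] * 4 for _ in range(4)] for _ in range(4)]
for i, j, k in itertools.product(range(2), repeat=3):
    T[2 * i + k][2 * k + j][2 * i + j] = 1

# Strassen's seven "moves": each is a rank-1 tensor u ⊗ v ⊗ w,
# encoding which entries of A and B are combined (u, v) and which
# entries of C each product contributes to (w).
U = [[1,0,0,1], [0,0,1,1], [1,0,0,0], [0,0,0,1], [1,1,0,0], [-1,0,1,0], [0,1,0,-1]]
V = [[1,0,0,1], [1,0,0,0], [0,1,0,-1], [-1,0,1,0], [0,0,0,1], [1,1,0,0], [0,0,1,1]]
W = [[1,0,0,1], [0,0,1,-1], [0,1,0,1], [1,0,1,0], [-1,1,0,0], [0,0,0,1], [1,0,0,0]]

# Play the game: subtract each rank-1 term from the board.
for u, v, w in zip(U, V, W):
    for a, b, c in itertools.product(range(4), repeat=3):
        T[a][b][c] -= u[a] * v[b] * w[c]

# The board is cleared in 7 moves: the residual tensor is zero.
print(all(T[a][b][c] == 0 for a, b, c in itertools.product(range(4), repeat=3)))  # True
```

AlphaTensor’s task is to find such a sequence of moves, as short as possible, for larger matrix sizes.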
The researchers trained a new version of AlphaZero, called AlphaTensor, to play this game. Instead of learning the best series of moves to make in Go or chess, AlphaTensor learned the best series of steps to perform when multiplying matrices. It was rewarded for winning the game in the fewest moves.
“We turned it into a game, our favorite setting,” says Thomas Hubert who was one of the main researchers working on AlphaZero.
The researchers describe their work in a paper published this Wednesday, October 5, 2022, in the journal Nature. The takeaway is that AlphaTensor discovered a way to multiply two 4-by-4 matrices faster than a method devised in 1969 by the German mathematician Volker Strassen, a feat no one had managed until now. The basic method taught in high school takes 64 steps. Volker Strassen’s takes only 49. AlphaTensor found a way to perform the operation in 47 steps.
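These step counts can be checked with a little arithmetic: the schoolbook method needs n³ scalar multiplications for n-by-n matrices, while Strassen’s scheme replaces the 8 block products of a divide-and-conquer split with 7, at every level of recursion.

```python
def schoolbook(n):
    # One multiplication per (row, column, inner index) triple.
    return n ** 3

def strassen(n):
    # Strassen: 7 recursive block products instead of 8
    # (n assumed to be a power of 2).
    return 1 if n == 1 else 7 * strassen(n // 2)

for n in (2, 4):
    print(n, schoolbook(n), strassen(n))
# 2 8 7
# 4 64 49
```

For 4-by-4 matrices this gives the 64 and 49 steps cited above; AlphaTensor’s 47-step scheme beats both.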
In the end, AlphaTensor beat the best existing algorithms for more than 70 different matrix sizes. It reduced the number of steps required to multiply two 9-by-9 matrices from 511 to 498, and the number required to multiply two 11-by-11 matrices from 919 to 896. In many other cases, AlphaTensor rediscovered the best existing algorithm.
The researchers were surprised by how many different correct algorithms AlphaTensor found for each matrix size. “It’s amazing to see that there are at least 14,000 ways to multiply 4-by-4 matrices,” enthuses Hussein Fawzi, a researcher at DeepMind.
Having searched for the algorithms that are fastest in theory, the DeepMind team then wanted to know which would be fastest in practice. An algorithm’s performance can vary with the hardware it runs on, because computer chips are often designed for specific types of computation. The team used AlphaTensor to search for algorithms suited to Nvidia V100 GPUs and Google TPUs, two of the chips most commonly used for training neural networks. The algorithms it found were 10 to 20% faster at matrix multiplication than those typically used with these chips.
Virginia Williams, a computer scientist at MIT’s Computer Science and Artificial Intelligence Laboratory, is excited about the results. She points out that computational approaches have long been used to find new matrix multiplication algorithms, and many of the fastest existing algorithms were designed this way. However, none had managed to improve on the results Volker Strassen obtained long ago.
“This new method does something completely different from what the others were doing,” explains Virginia Williams. “It would be nice to find out whether this new method really subsumes the previous ones, or whether they can be combined to get something even better.”
DeepMind now plans to use AlphaTensor to research other types of algorithms. “It’s a new way of doing computing,” concludes Pushmeet Kohli.
Article by Will Douglas Heaven, translated from English by Kozi Pastakia.