Tiny Machine Learning Design Alleviates Bottleneck in Memory Usage on Internet of Things Devices | MIT News

Machine learning provides powerful tools for researchers to identify and predict patterns and behaviors, as well as to learn, optimize, and perform tasks. Applications range from vision systems on autonomous vehicles and social robots to smart thermostats and wearable and mobile devices, such as smartwatches and apps that can monitor changes in health. While these algorithms and their architectures are becoming more powerful and efficient, they generally require huge amounts of memory, computation, and data to train and make inferences.

At the same time, researchers are working to reduce the size and complexity of the devices on which these algorithms can run, down to the microcontroller unit (MCU) found in billions of internet of things (IoT) devices. An MCU is a mini-computer with limited memory, housed in a compact integrated circuit, that lacks an operating system and runs simple commands. These relatively inexpensive edge devices require low power, computation, and bandwidth, and offer many opportunities to inject AI technology to expand their usefulness, increase privacy, and democratize their use – a field called TinyML.

Now, a team from MIT working on TinyML in the MIT-IBM Watson AI Lab and the research group of Song Han, assistant professor in the Department of Electrical Engineering and Computer Science (EECS), has devised a technique to shrink the amount of memory required even further, while improving its performance on image recognition in live videos.

“Our new technique can do a lot more and paves the way for tiny machine learning on edge devices,” says Han, who designs TinyML software and hardware.

To increase the efficiency of TinyML, Han and his colleagues from EECS and the MIT-IBM Watson AI Lab analyzed how memory is used on microcontrollers running various convolutional neural networks (CNNs). CNNs are biologically inspired models patterned after neurons in the brain, and are often applied to assess and identify visual features within imagery, like a person walking through a video frame. In their study, they discovered an imbalance in memory usage, causing the computer chip to front-load its work and create a bottleneck. By developing a new inference technique and a novel neural architecture, the team solved the problem and reduced peak memory usage by a factor of four to eight. Further, the team deployed it on their own tinyML vision system, equipped with a camera and capable of detecting humans and objects, creating its next generation, dubbed MCUNetV2. When compared to other machine learning methods running on microcontrollers, MCUNetV2 outperformed them with high accuracy on detection, opening the door to additional vision applications that were not previously possible.

The results will be presented in a paper at the Conference on Neural Information Processing Systems (NeurIPS) this week. The team includes Han, the senior author; graduate student Ji Lin; postdoc Wei-Ming Chen; graduate student Han Cai; and MIT-IBM Watson AI Lab research scientist Chuang Gan.

Designed for efficiency and memory redistribution

TinyML offers many advantages over deep machine learning that happens on larger devices, like remote servers and smartphones. These, Han notes, include privacy, since the data are not transmitted to the cloud for computing but processed on the local device; robustness, as the computation is quick and the latency is low; and low cost, because IoT devices cost roughly $1 to $2. Further, some larger, more traditional AI models can emit as much carbon as five cars in their lifetimes, require many GPUs, and cost billions of dollars to train. “So, we believe these TinyML techniques can enable us to go off-grid to save the carbon emissions and make the AI greener, smarter, faster, and also more accessible to everyone – to democratize AI,” says Han.

However, small MCU memory and digital storage limit AI applications, so efficiency is a central challenge. MCUs contain only 256 kilobytes of memory and 1 megabyte of storage. In comparison, mobile AI on smartphones and cloud computing have 256 gigabytes and terabytes of storage, respectively, along with 16,000 and 100,000 times more memory. As a precious resource, the team wanted to optimize its use, so they profiled the MCU memory usage of CNN designs – a task that had been overlooked until now, Lin and Chen say.
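The kind of per-block profiling the team describes can be mimicked with a short script. This is an illustrative sketch only; the feature-map shapes below are hypothetical stand-ins for a MobileNet-style backbone, not the networks the authors measured:

```python
# Illustrative sketch of per-block activation-memory profiling on an MCU
# (the shapes are hypothetical stand-ins, not the authors' networks).

KB = 1024
SRAM_LIMIT = 256 * KB  # typical MCU memory budget

# (height, width, channels) of the int8 activation entering each block.
shapes = [
    (224, 224, 3), (112, 112, 16), (112, 112, 24), (56, 56, 24),
    (56, 56, 40), (28, 28, 40), (28, 28, 80), (14, 14, 96),
    (14, 14, 96), (7, 7, 160), (7, 7, 320),
]

for i in range(len(shapes) - 1):
    hin, win, cin = shapes[i]
    hout, wout, cout = shapes[i + 1]
    # A block's input and output buffers must coexist while it runs.
    mem = hin * win * cin + hout * wout * cout
    status = "OVER BUDGET" if mem > SRAM_LIMIT else "fits"
    print(f"block {i:2d}: {mem / KB:7.1f} KB  {status}")
```

Running this shows the early, high-resolution blocks blowing far past the 256 KB budget while the late, low-resolution blocks fit with room to spare – the front-loaded imbalance the researchers observed.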

Their results revealed that memory usage peaked in the first five convolutional blocks, out of about 17. Each block contains many connected convolutional layers, which help to filter for the presence of specific features within an input image or video, creating a feature map as the output. During the initial memory-intensive stage, most of the blocks operated beyond the 256KB memory constraint, offering plenty of room for improvement. To reduce peak memory, the researchers developed a patch-based inference schedule, which operates on only a small fraction, roughly 25 percent, of a layer's feature map at one time, before moving on to the next quarter, until the whole layer is done. This method saved four to eight times the memory of the previous layer-by-layer computational method, without any latency overhead.
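A minimal sketch of the schedule's core idea follows, assuming a 1×1 convolution so the four spatial patches are fully independent (real 3×3 convolutions also need small overlapping borders between patches). All function and variable names here are illustrative, not from the authors' code:

```python
import numpy as np

# Minimal sketch of patch-based inference: instead of materializing a whole
# layer at once, each spatial quarter of the feature map is processed on its
# own, so the working buffer for this stage is about a quarter the size.

def conv1x1(x, wt):
    """x: (H, W, Cin) feature map, wt: (Cin, Cout) weights -> (H, W, Cout)."""
    return x @ wt

def layer_by_layer(x, wt):
    return conv1x1(x, wt)  # computes the full map in one pass

def patch_based(x, wt, splits=2):
    H, W = x.shape[:2]
    out = np.empty((H, W, wt.shape[1]), dtype=x.dtype)
    ph, pw = H // splits, W // splits
    for i in range(splits):
        for j in range(splits):
            patch = x[i*ph:(i+1)*ph, j*pw:(j+1)*pw]  # one quarter of the map
            out[i*ph:(i+1)*ph, j*pw:(j+1)*pw] = conv1x1(patch, wt)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4)).astype(np.float32)
wt = rng.standard_normal((4, 6)).astype(np.float32)
print(np.allclose(layer_by_layer(x, wt), patch_based(x, wt)))  # same result
```

The saving comes from streaming each patch through the memory-hungry early layers before touching the next patch, so the large intermediate buffers never exist all at once.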

“To illustrate, say we have a pizza. We can divide it into four pieces and only eat one piece at a time, so you save about three-quarters. This is the patch-based inference method,” Han explains. “However, this was not a free lunch.” Like photoreceptors in the human eye, they can only take in and examine part of an image at a time; this receptive field is a patch of the total image or field of view. As the size of these receptive fields (or pizza slices, in this analogy) grows, there is increasing overlap, which amounts to redundant computation that the researchers found to be about 10 percent. The researchers proposed to also redistribute the neural network across the blocks, in parallel with the patch-based inference method, without losing any of the accuracy of the vision system. However, the question remained of which blocks needed the patch-based inference method and which could use the original layer-by-layer one, together with the redistribution decisions; tuning all of these knobs by hand was labor-intensive, and better left to AI.
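The overlap cost can be estimated with back-of-the-envelope arithmetic. The numbers below are assumed for illustration, not the paper's measurements: each 3×3 convolution layer inside the patch-based stage widens the input "halo" that every patch must recompute by one pixel per side:

```python
# Rough estimate of the redundant computation introduced by overlapping
# receptive fields in patch-based inference (assumed numbers, illustrative).

def redundancy(fmap_size, splits, patch_layers):
    """Fraction of extra pixels computed across all patches due to halos."""
    patch = fmap_size // splits
    halo = patch_layers                   # +1 pixel per side per 3x3 layer
    per_patch = (patch + 2 * halo) ** 2   # pixels each patch must touch
    total = per_patch * splits * splits
    return total / fmap_size ** 2 - 1.0

# A 112x112 map split into 2x2 patches: the deeper the patch-based stage,
# the more overlap, which is why redistributing layers out of it helps.
for layers in (1, 2, 3):
    print(f"{layers} patch-based layer(s): {redundancy(112, 2, layers):.1%} redundant")
```

With one patch-based layer the overhead is in the single digits, but it grows quickly with depth, motivating the redistribution step that shifts work out of the patch-based stage.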

“We want to automate this process by performing a joint, automated optimization search, including both the neural network architecture, like the number of layers and number of channels, and the kernel size, as well as the inference schedule, including the number of patches, the number of layers for patch-based inference, and other optimization knobs,” says Lin, “so that non-machine-learning experts can have a push-button solution to improve the computational efficiency, but also to improve the engineering productivity, to be able to deploy this neural network on microcontrollers.”
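The flavor of such a joint search can be sketched with a toy random search. Everything in this block – the cost model, the accuracy proxy, and the candidate knobs – is made up for illustration and is not MCUNetV2's actual search:

```python
import random

# Toy joint search over architecture + inference-schedule knobs: sample
# random candidates, discard any that exceed the MCU memory budget, and
# keep the one with the best (made-up) accuracy proxy.

random.seed(0)
LIMIT_KB = 256

def peak_memory_kb(channels, kernel, n_patches):
    # Hypothetical cost model: splitting into patches shrinks the working
    # buffer, but larger kernels add per-patch halo overhead.
    base = 4 * channels                       # KB for the full feature map
    halo = 1 + 0.05 * kernel * n_patches
    return base / n_patches * halo

def accuracy_proxy(channels, kernel, patch_layers):
    # Hypothetical score: capacity helps; extra patch-based layers are
    # penalized slightly as a stand-in for their redundant compute.
    return 50 + 0.2 * channels + 2 * kernel - 0.5 * patch_layers

best = None
for _ in range(2000):
    cand = {
        "channels": random.choice([16, 24, 32, 48, 64]),
        "kernel": random.choice([3, 5, 7]),
        "n_patches": random.choice([1, 4, 9]),
        "patch_layers": random.randint(0, 5),
    }
    if peak_memory_kb(cand["channels"], cand["kernel"], cand["n_patches"]) > LIMIT_KB:
        continue  # does not fit in SRAM; reject
    score = accuracy_proxy(cand["channels"], cand["kernel"], cand["patch_layers"])
    if best is None or score > best[0]:
        best = (score, cand)

print(best)
```

Notably, the widest network only becomes feasible once the schedule splits the feature map into patches, which is exactly the coupling between architecture and schedule that motivates searching both jointly rather than one at a time.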

A new horizon for tiny vision systems

Co-designing the network architecture with the neural architecture search and inference scheduling optimization provided significant gains and was adopted into MCUNetV2; it outperformed other vision systems in peak memory usage and in image and object detection and classification. The MCUNetV2 device includes a small screen and a camera, and is roughly the size of a headphone case. Compared to the first version, the new version needed four times less memory for the same accuracy, says Chen. When placed head-to-head against other tinyML solutions, MCUNetV2 was able to detect the presence of objects in image frames, like human faces, with an improvement of nearly 17 percent. Further, it set an accuracy record of nearly 72 percent for thousand-class image classification on the ImageNet dataset, using 465KB of memory. The researchers also tested what is known as visual wake words – how well their MCU vision model could identify the presence of a person in an image – and even with the limited memory of only 30KB, it achieved greater than 90 percent accuracy, beating the previous state-of-the-art method. This means the method is accurate enough that it could be deployed to help in, say, smart-home applications.

With its high accuracy and low energy use and cost, MCUNetV2's performance unlocks new IoT applications. Due to their limited memory, Han says, vision systems on IoT devices were previously thought to be only good for basic image classification tasks, but their work has helped to expand the opportunities for TinyML uses. Further, the research team envisions it in numerous fields, from monitoring sleep and joint movement in the health-care industry to sports coaching and movements like a golf swing, to plant identification in agriculture, as well as in smarter manufacturing, from identifying nuts and bolts to detecting malfunctioning machines.

“We really push forward for these larger-scale, real-world applications,” says Han. “Without GPUs or any specialized hardware, our technique is so tiny it can run on these small, cheap IoT devices and perform real-world applications, such as these visual wake words, face mask detection, and person detection. This opens the door for a brand-new way of doing tiny AI and mobile vision.”

This research was sponsored by the MIT-IBM Watson AI Lab, Samsung and Woodside Energy, and the National Science Foundation.
