    Why AI and embedded design share the same DNA

    PHILIP LING, Senior Technology Writer | Avnet

    Change is always around the corner. Right now, it is in the shape of machine learning (ML). It is no exaggeration to say that artificial intelligence (AI) is influencing every aspect of the modern world. The extent of its influence will vary, as does the type of AI. Machine learning is a subset of AI, with recognized limitations. But these limitations mean ML requires fewer resources. This makes ML useful in edge applications. Detecting a wake word is a good example.

    AI involves complex algorithms. Training ML models normally takes place in the cloud, where it is handled by powerful hardware such as graphics processors (GPUs) with access to a lot of fast memory. Running many trained models in the cloud makes sense only if the cloud resources can expand to meet demand. The cloud resources needed to run millions of instances of those trained ML models would far exceed the resources needed to train the original model.

    Running those ML models at the edge is attractive to cloud providers. We can point to smart speakers as an example. The wake word can be handled at the edge by ML, while the AI providing the voice recognition is hosted in the cloud.

    Executing trained ML models in edge devices reduces cloud demand. Local ML also avoids network latency and expensive cloud processing. The models run on small, connected devices sitting at the edge of wide-area networks. In some cases, the device may not need a high-bandwidth network connection, as all the heavy ML lifting happens on the device.

    In simple terms, running an ML model on an embedded system comes with the same challenges that doing clever things on constrained platforms has always had. The details (in this case, the model) vary, but the basics are the same. Engineers need to select the right processing architecture, fit the application into the type and amount of memory available, and keep everything within a tight power budget.

    The key difference here is the kind of processing needed. ML is math-intensive and relies, in particular, on multidimensional math. ML models are trained neural networks, which are basically stored as multidimensional arrays, or tensors. Manipulating the data stored in tensors is fundamental to ML. Efficient tensor manipulation within the constraints of an embedded system is the challenge.
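
    As a minimal sketch of what "multidimensional array" means in practice, the plain-Python example below stores a tensor as nested lists. The helper names `shape` and `scale` are illustrative only and do not come from any ML library.

```python
def shape(tensor):
    """Return the dimensions of a regular, non-empty nested-list tensor."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def scale(tensor, factor):
    """Multiply every element of a nested-list tensor by factor."""
    if isinstance(tensor, list):
        return [scale(item, factor) for item in tensor]
    return tensor * factor


# A rank-3 tensor: 2 blocks, each holding 2 rows of 3 values.
example = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

print(shape(example))            # (2, 2, 3)
print(scale(example, 2)[1][0])   # [14, 16, 18]
```

    Even this trivial operation touches every element, which is why memory layout and arithmetic throughput matter so much on a constrained device.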

    From dataset to trained model

    Tensors are the main building blocks of AI. Training datasets are often provided as a tensor and used to train models. A dataset for a motion sensor might encode x, y and z coordinates, as well as acceleration. Each instance is labelled to indicate what the data represents. For example, a fall generates data with a recognizable pattern, although the exact values vary from one instance to the next. The labelled dataset is used to train an ML model.
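
    A labelled motion dataset of this kind might be laid out as follows. This is only a sketch: the sensor values, the two-reading window length and the label names "walk" and "fall" are invented for illustration.

```python
# Each instance is a short window of readings: (x, y, z, acceleration).
# All values below are invented for illustration.
dataset = [
    {"label": "walk", "window": [(0.0, 0.1, 1.0, 1.0), (0.1, 0.1, 1.0, 1.1)]},
    {"label": "fall", "window": [(0.2, 0.9, 0.4, 3.2), (0.3, 1.4, 0.1, 4.0)]},
    {"label": "walk", "window": [(0.0, 0.2, 1.0, 0.9), (0.1, 0.2, 0.9, 1.0)]},
]

# Split into the two pieces a training routine usually expects:
# a rank-3 feature tensor (instances x time steps x channels) and labels.
features = [[list(reading) for reading in item["window"]] for item in dataset]
labels = [1 if item["label"] == "fall" else 0 for item in dataset]

print(len(features), len(features[0]), len(features[0][0]))  # 3 2 4
print(labels)  # [0, 1, 0]
```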

    A neural network comprises layers. Each layer provides another step toward a decision. The layers in a neural network may also take the form of a tensor. In an untrained network, all connections between layers are random. Adjusting the connections between layers in a neural network creates the trained model.

    Training involves changing the weight of connections between the nodes in the neural network’s layers. The weights are changed based on the patterns the training process finds in the dataset. For example, the model may learn to recognize what a fall looks like by comparing common features it detects in a dataset.
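
    The weight adjustment can be sketched with the smallest possible network: a single node trained by gradient descent. This is an assumption-laden illustration, not the method of any particular framework. The two features, the learning rate and the epoch count are all made up.

```python
import math
import random


def predict(weights, bias, sample):
    """Weighted sum of the inputs, squashed into the range 0..1."""
    total = bias + sum(w * x for w, x in zip(weights, sample))
    return 1.0 / (1.0 + math.exp(-total))


def train(samples, labels, epochs=500, rate=0.5, seed=0):
    """Nudge each weight to reduce the error on every labelled sample."""
    rng = random.Random(seed)
    # Start from small random weights, as in an untrained network.
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0]]
    bias = 0.0
    for _ in range(epochs):
        for sample, label in zip(samples, labels):
            error = predict(weights, bias, sample) - label
            weights = [w - rate * error * x for w, x in zip(weights, sample)]
            bias -= rate * error
    return weights, bias


# Invented features: (peak acceleration, change in height); 1 = fall.
samples = [(0.2, 0.1), (0.3, 0.0), (0.9, 0.8), (1.0, 0.9)]
labels = [0, 0, 1, 1]

weights, bias = train(samples, labels)
for sample, label in zip(samples, labels):
    print(label, round(predict(weights, bias, sample), 2))
```

    After training, the printed score should sit above 0.5 for the instances labelled as a fall and below 0.5 for the others.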

    The tensor of a training dataset might encode multiple instances of motion sensor data. Some of the instances will be labelled as a fall. Finding the connections between the instances labelled as a fall creates the intelligence.

    What does a trained ML model look like?

    The many forms of AI

    Artificial intelligence is even more diverse than organic intelligence. Technology provides the framework but the foundations exist in advanced mathematics.

    An untrained neural network with a specified number of layers will start with randomly assigned weights for the connections between those layers. As the model learns from the dataset, it will adjust the weight of those connections. As raw sensor input data passes through the layers of a trained model, the weights associated with the connections transform that data. At the output layer, the transformed data indicates the event that generated it, such as a fall.
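
    The passage of data through the layers can be sketched as a chain of weighted sums. The weights below are chosen by hand rather than trained, and the ReLU activation between layers is a common choice assumed here for illustration; the function names are hypothetical.

```python
def relu(values):
    """Pass positive values through and clamp negative ones to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: each output node is a weighted sum of all inputs."""
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + bias
        for node_weights, bias in zip(weights, biases)
    ]


# Hand-picked weights standing in for a trained model.
hidden_w = [[0.5, -0.25], [-0.5, 0.5]]
hidden_b = [0.0, 0.0]
output_w = [[0.5, -0.5]]
output_b = [0.0]


def infer(reading):
    """Run one reading through the hidden layer and the output layer."""
    hidden = relu(dense(reading, hidden_w, hidden_b))
    return dense(hidden, output_w, output_b)[0]


print(infer([2.0, 0.0]))  # 0.5
print(infer([0.0, 2.0]))  # -0.5
```

    The two readings have the same magnitude, yet the weights steer them to opposite outputs. That steering is what the trained connections contribute.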

    A weight value will typically be between -0.5 and +0.5. During training, weights are adjusted up or down. The adjustment reflects the strength of the connection in a path to a specific action. A positive weight is called an excitatory connection, while a negative weight is an inhibitory connection. Weights that are close to zero have less importance than weights closer to the upper or lower limit.

    Each layer in the trained model is essentially a tensor (multidimensional array). The layers can be represented in a high-level programming language, such as Python, C or C++. From there, the high-level language is compiled down to machine code to run on a specific instruction set architecture.

    Once trained, the model applies its learnt intelligence to unknown data to infer the source of that data. Inferencing requires fewer resources than training, which is why it can be applied at the edge using more modest hardware.

    The performance of the model depends on the embedded system. If the processor can execute multidimensional math efficiently, it will deliver good performance. But the size of the model, the number of layers and the width of those layers also have a big impact. Fast memory access is another key parameter. This is why developing an ML application to run on an endpoint is fundamentally an extension of good embedded system design.
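
    The effect of layer count and width on memory can be estimated with simple arithmetic. The sketch below assumes fully connected layers with one bias per node; the layer widths are hypothetical.

```python
def parameter_count(layer_widths):
    """Weights plus biases for fully connected layers of the given widths."""
    return sum(
        inputs * outputs + outputs
        for inputs, outputs in zip(layer_widths, layer_widths[1:])
    )


def footprint_bytes(layer_widths, bytes_per_value):
    """Memory needed to store every parameter at the given precision."""
    return parameter_count(layer_widths) * bytes_per_value


widths = [64, 32, 16, 2]  # hypothetical network: input, two hidden, output

print(parameter_count(widths))      # 2642
print(footprint_bytes(widths, 4))   # 10568 bytes as 32-bit floats
print(footprint_bytes(widths, 1))   # 2642 bytes as 8-bit integers
```

    On an MCU with tens of kilobytes of RAM, the difference between those last two figures can decide whether a model fits at all.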

    Making ML models smaller

    Even with a well-trained model, edge ML performance is very dependent on the processing resources available. The overriding objective in embedded system design has always been to use as few resources as possible. To reconcile these competing demands, researchers have looked at ways of making the trained models smaller.

    Two common approaches are quantization and pruning. Quantization involves simplifying floating-point numbers or converting them to integers. A quantized value takes up less memory. For accuracy, floating-point numbers are used during training to store the weights of each node in a layer, as they give maximum precision. The aim is to reduce the precision of floating-point numbers, or convert the floating-point numbers to integers after training, without impacting overall accuracy. In many nodes, the precision lost is inconsequential to the result, but the reduction in memory resources can be significant.
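
    One simple form of post-training quantization maps every weight to a signed 8-bit integer using a single shared scale factor. The sketch below assumes that symmetric scheme; production toolchains offer other schemes, and the function names are illustrative.

```python
def quantize(weights, bits=8):
    """Map floats to signed integers using one shared scale factor."""
    limit = 2 ** (bits - 1) - 1            # 127 for 8 bits
    largest = max(abs(w) for w in weights)
    scale = largest / limit if largest else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(values, scale):
    """Recover approximate floating-point weights from the integers."""
    return [v * scale for v in values]


weights = [0.5, -0.2, 0.1, 0.0, -0.5]
quantized, scale = quantize(weights)
restored = dequantize(quantized, scale)

print(quantized)  # [127, -51, 25, 0, -127]
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(worst <= scale / 2 + 1e-12)  # True
```

    Each weight now occupies one byte instead of four, and the rounding error on any weight is bounded by half the scale factor.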

    Pruning involves removing nodes with weights that are too low to have any significant impact on the result. Developers may choose to prune based on the weight’s magnitude, only removing weights with values close to zero. With both quantization and pruning, the model needs to be tested iteratively to ensure it retains enough accuracy to be useful.
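
    Magnitude-based pruning can be sketched as below. Note the simplification: this version zeroes individual weights rather than removing whole nodes, and the threshold is an arbitrary value picked for illustration. Any memory saving depends on storing the result in a sparse format.

```python
def prune(weights, threshold):
    """Zero every weight whose magnitude falls below threshold."""
    return [[0.0 if abs(w) < threshold else w for w in row] for row in weights]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    values = [w for row in weights for w in row]
    return sum(1 for w in values if w == 0.0) / len(values)


layer = [
    [0.42, -0.01, 0.03],
    [-0.38, 0.02, 0.25],
]

pruned = prune(layer, threshold=0.05)
print(pruned)            # [[0.42, 0.0, 0.0], [-0.38, 0.0, 0.25]]
print(sparsity(pruned))  # 0.5
```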

    Accelerating tensor manipulation in hardware

    Broadly speaking, semiconductor manufacturers are taking three approaches to ML model acceleration:

    • Building conventional but massively parallel architectures
    • Developing new, tensor-optimized processor architectures
    • Adding hardware accelerators alongside legacy architectures

    Each approach has its merits. The approach that works best for ML at the edge will depend on the overall resources (memory, power) needed by that solution. The choice also depends on the definition of edge device. It may be an embedded solution with limited resources, such as a sensor, but it could equally be a compute module.

    A massively parallel architecture adds multiple instances of the functions needed for a task. Multiply-accumulate (MAC), widely used in signal processing, is one such function. Graphics processing units (GPUs) are normally massively parallel and have successfully secured their place in the market thanks to the high performance they deliver. Equally, field programmable gate arrays (FPGAs) are a popular choice because their logic fabric supports parallelism. Although built for math, digital signal processors, or DSPs, have yet to be recognized as a good option for AI and ML.
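
    To show why MAC matters for ML, the sketch below builds a dot product and a matrix-vector product from a single multiply-accumulate step. It runs sequentially in software; the function names are illustrative.

```python
def mac(accumulator, a, b):
    """One multiply-accumulate step: accumulator + a * b."""
    return accumulator + a * b


def dot(inputs, weights):
    """A dot product is a chain of MAC steps over paired values."""
    acc = 0
    for x, w in zip(inputs, weights):
        acc = mac(acc, x, w)
    return acc


def matvec(matrix, vector):
    """Each row is an independent chain of MACs."""
    return [dot(vector, row) for row in matrix]


print(dot([1, 2, 3], [4, 5, 6]))                   # 32
print(matvec([[1, 0, 2], [0, 3, 1]], [1, 2, 3]))   # [7, 9]
```

    Because the rows in `matvec` do not depend on each other, hardware with many MAC units can compute them at the same time, which is the parallelism these architectures exploit.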

    Homogeneous multicore processors are another example of how parallelism delivers performance. A processor with 2, 4 or 8 cores can deliver higher performance than a single-core processor when the workload can be split across the cores. RISC-V is becoming favored in multicore designs for AI and ML, as the architecture is also extensible. This extensibility allows custom instructions to be instantiated as hardware acceleration blocks. There are already examples of RISC-V being used in this way to accelerate AI and ML.

    New architectures designed for tensor processing are also appearing on the market, from both large and small semiconductor vendors. The trade-off here might be the ease of programmability for a new instruction set architecture versus the performance gains.

    Hardware acceleration in MCUs for ML applications

    There are many ways semiconductor companies, both established and startup, are tackling AI acceleration. Each will hope to capture a share of the market as demand increases. Looking at ML at the edge as a purely embedded systems design challenge, many of those solutions may see limited adoption.

    The reason is simple. Embedded systems are still constrained. Every embedded engineer knows that the goal is never more performance; it is always just the right amount of performance. For the deeply embedded ML application, the preference is likely to be a familiar MCU with hardware acceleration.

    The nature of ML execution means the hardware acceleration will need to be deeply integrated into the MCU’s architecture. Leading MCU manufacturers are actively developing new solutions that integrate ML acceleration. Some of the details of those developments have been released but samples are still some months away.

    In the meantime, those same manufacturers continue to offer software support for training models and optimizing the size of those models to run on their existing MCU devices.

    Responding to demand for ML at the edge

    Machine learning at the edge can be interpreted in many ways. Some applications will be able to use high-performance 64-bit multicore processors. Others will have a more modest budget.

    The massive IoT will see billions of smart devices coming online over the next several years. Many of those devices will have ML inside. We can expect semiconductor manufacturers to anticipate this shift. We already see them gearing up to respond to increased demand.

    ELE Times Report
    https://www.eletimes.com/