    The Flexible Approach to Adding Functional Safety to a CPU

    Functional safety has become both increasingly important and prevalent across various markets, none more so than the automotive and industrial sectors.

    Previously, within the automotive industry, functional safety was reserved for a small number of critical functions implemented in a few Electronic Control Units (ECUs) in a vehicle. This has changed as the number of automotive applications with safety requirements has grown, driven by the continued deployment of driver assistance innovations and autonomy.

    The industrial sector’s evolving transformation towards smart manufacturing has seen the rise of automation and the growing adoption of collaborative robots (co-bots) working alongside humans on the production line. As humans and machines continue to work in such close proximity, control systems must constantly monitor the integrity of the robot and react appropriately under error conditions. This in turn has driven the necessity for higher levels of functional safety.

    Figure: Human and Machine Smart Manufacturing

    Methods for Incorporating Functional Safety into SoC Designs

    Implementing safety requirements in a structured and consistent way has been made possible by internationally agreed standards, such as ISO 26262 and IEC 61508. These standards establish a common language across the industry and provide a reference for best practices. Satisfying an application’s safety requirements calls for both safety mechanisms and rigorous development processes.

    One approach to mitigating random hardware faults is redundant hardware. For a CPU, this could mean a dual-core lock-step processor, in which two cores execute the same instructions and their outputs are compared. Such hardware duplication enables rapid fault detection and high fault coverage, but it increases area and power, which may not be merited for applications with lower safety integrity levels such as ASIL B or SIL 2.

    A combination of safety mechanisms, such as Error Correcting Codes (ECC), Built-In Self Tests (BIST), and others, can be used to help achieve these lower safety targets. For BISTs, there are various forms including memory BIST, logic BIST, and software BIST (Software Test Libraries).

    Logic BIST (LBIST) is used in production testing to provide a path to otherwise unreachable parts of the design, but it can also be deployed in the field. Field use can be complex to implement and is normally limited to power-on testing, because the processors must be taken offline to run the tests and then restarted as if a reset had occurred. This makes LBIST challenging to deploy at runtime where there are tight requirements for fault detection. The performance available to the application is also reduced by the time needed to test the processor and then reset and restart the application.

    Like LBIST, memory BIST is normally used as part of the production flow, but it may also be feasible to deploy in the field. Some Arm processors enable memory BIST to be executed at run-time. As this BIST approach tests only the memory, it would often be combined with other checking mechanisms that address the logic of the processor.

    A Software Test Library (STL) is a natural choice for covering the processor logic when it is used alongside memory protection. It also provides a more flexible approach to testing at runtime than many other methods.

    Software Test Libraries

    An STL is a group of software functions that run on the processor under test in order to detect the presence of a fault. The functions test the hardware for permanent faults within the processor’s functional logic, such as stuck-at-one and stuck-at-zero faults.
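
    The principle can be illustrated with a toy model. The sketch below is not taken from any real STL: it assumes a hypothetical 8-bit adder whose output bits can be forced to 0 or 1, and a small set of hand-picked test patterns with precomputed expected results. The names `make_adder`, `TEST_PATTERNS` and `run_adder_test` are invented for this illustration.

```python
from typing import Callable, Optional

MASK = 0xFF  # model an 8-bit datapath


def make_adder(stuck_bit: Optional[int] = None,
               stuck_value: int = 0) -> Callable[[int, int], int]:
    """Return a model of an 8-bit adder.

    If stuck_bit is given, that output bit is permanently forced to
    stuck_value (0 or 1), imitating a stuck-at fault.
    """
    def adder(a: int, b: int) -> int:
        result = (a + b) & MASK
        if stuck_bit is not None:
            if stuck_value:
                result |= 1 << stuck_bit             # stuck-at-one
            else:
                result &= ~(1 << stuck_bit) & MASK   # stuck-at-zero
        return result
    return adder


# (operand a, operand b, expected result); expected values are precomputed.
TEST_PATTERNS = (
    (0x00, 0x00, 0x00),  # every output bit 0: exposes any stuck-at-one bit
    (0xFF, 0x00, 0xFF),  # every output bit 1: exposes any stuck-at-zero bit
    (0x55, 0xAA, 0xFF),  # alternating operand bits
    (0x01, 0xFF, 0x00),  # carry ripples through every bit position
)


def run_adder_test(adder: Callable[[int, int], int]) -> bool:
    """Return True if the adder produces the expected result for every pattern."""
    return all(adder(a, b) == expected for a, b, expected in TEST_PATTERNS)


healthy = make_adder()
faulty = make_adder(stuck_bit=3, stuck_value=1)
print(run_adder_test(healthy), run_adder_test(faulty))
```

    In this model the faults sit only on the adder's output bits, so two patterns are enough to expose all of them. A production STL targets faults inside the processor logic and needs far more carefully constructed tests to reach comparable coverage.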

    STLs can be called by the software stack, which selects the tests to run based on the time available for execution, with no requirement to take the core offline or reset the processor after testing. As an STL does not depend on hardware redundancy, it can be implemented with less die area and lower power than a dual-core lock-step processor. Power usage is equivalent to that of an application running on the CPU, with no additional power impact of the kind seen with LBIST. An STL can also be deployed on devices that are already available in silicon, unlike lock-step or LBIST, which need to be planned at design time.

    STLs can achieve ‘medium’ fault detection rates. This is sufficient for automotive applications up to ASIL B, as measured by the Single Point Fault Metric, and for industrial applications up to SIL 2, as measured by the Safe Failure Fraction. The code is generally optimized for a minimal memory footprint. Tests are generally written in assembly, which improves deterministic execution and fault coverage compared with code compiled from a high-level language like C.

    Enabling Functional Safety on an Existing Design Using STLs

    To address a new market or applications with an existing chip, there may be a need to increase hardware metrics, such as the Single Point Fault Metric (SPFM), as part of the functional safety requirements. When targeting an established chip at applications up to ASIL B or SIL 2, an STL is a natural choice to consider. It can be deployed without the need for additional hardware changes to the device. In addition, the STL allows users to define the individual tests that need to be executed and so limits the test time and memory resources used.
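
    To make the effect on the SPFM concrete, the sketch below uses the ISO 26262-5 definition, SPFM = 1 - Σ(λ_SPF + λ_RF) / Σλ, under a simplifying assumption: each element's dangerous faults that escape its safety mechanism are estimated as λ × (1 - safe fraction) × (1 - diagnostic coverage). The failure rates, safe fractions and coverage figures are invented purely for illustration, and the `Element` and `spfm` names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Iterable


@dataclass(frozen=True)
class Element:
    name: str
    fit: float            # failure rate in FIT (illustrative value)
    safe_fraction: float  # share of faults that cannot violate the safety goal
    coverage: float       # diagnostic coverage of the safety mechanism, 0..1


def spfm(elements: Iterable[Element]) -> float:
    """Single Point Fault Metric under the simplified model described above."""
    elements = list(elements)
    total = sum(e.fit for e in elements)
    # Dangerous faults not caught by any safety mechanism
    # (single-point faults plus residual faults).
    undetected = sum(e.fit * (1 - e.safe_fraction) * (1 - e.coverage)
                     for e in elements)
    return 1.0 - undetected / total


# Assumed existing chip: memory protected by ECC, CPU logic unprotected.
cpu_logic = Element("cpu_logic", fit=100.0, safe_fraction=0.4, coverage=0.0)
memory = Element("memory", fit=200.0, safe_fraction=0.3, coverage=0.99)

before = spfm([cpu_logic, memory])
# Assume an STL adds 90% diagnostic coverage on the CPU logic.
after = spfm([replace(cpu_logic, coverage=0.90), memory])

ASIL_B_TARGET = 0.90  # ISO 26262-5 target value for SPFM at ASIL B
print(f"SPFM without STL: {before:.1%}, with STL: {after:.1%}")
print("Meets ASIL B target:", after >= ASIL_B_TARGET)
```

    With these assumed numbers, the unprotected CPU logic dominates the undetected failure rate, which is why adding coverage there moves the metric the most. A real analysis would derive the figures from the device's failure modes, effects and diagnostic analysis (FMEDA).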

    Each device has a functional safety concept which includes the safety measures and safety mechanisms implemented within the processor. When looking to deploy an STL, it is important to define the areas of the safety concept which will be addressed by the test library.

    STLs can be integrated into the software stack with a standard API and run periodically, with the flexibility to define how testing is performed. The licensee can either execute the complete STL with a single call or divide the testing into multiple discrete blocks. Dividing the testing allows the depth of testing to be selected by targeting specific tests, or the execution of tests to be fitted into available time windows. This minimizes the STL’s effect on the main application’s workload and helps meet the desired Fault Tolerant Time Interval (FTTI). The operation and integration of the STL are described in the Arm STL user guide.
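
    The two integration styles can be sketched as follows. This is a hypothetical scheduler, not the Arm STL API: the `StlTest` and `StlScheduler` names, the per-test cost figures and the stand-in test functions are all assumptions made for illustration. It shows a single-call full run and a windowed mode that runs as many pending tests as fit into each idle time budget.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StlTest:
    name: str
    cost_us: int             # assumed worst-case execution time in microseconds
    run: Callable[[], bool]  # returns True when the test passes


class StlScheduler:
    """Runs a list of tests either in one call or spread across time windows."""

    def __init__(self, tests: Sequence[StlTest]) -> None:
        if not tests:
            raise ValueError("at least one test is required")
        self.tests = list(tests)
        self.next_index = 0        # next test to run in windowed mode
        self.completed_passes = 0  # full passes over the test list

    def run_all(self) -> bool:
        """Execute the complete library with a single call."""
        return all(test.run() for test in self.tests)

    def run_window(self, budget_us: int) -> List[str]:
        """Run pending tests, in order, while they fit in the budget.

        Stops after at most one full pass. Raises RuntimeError on a
        failing test so the caller can enter its safe state.
        """
        executed: List[str] = []
        remaining = budget_us
        while self.tests[self.next_index].cost_us <= remaining:
            test = self.tests[self.next_index]
            if not test.run():
                raise RuntimeError(f"STL test failed: {test.name}")
            remaining -= test.cost_us
            executed.append(test.name)
            self.next_index += 1
            if self.next_index == len(self.tests):
                self.next_index = 0
                self.completed_passes += 1
                break
        return executed


# Stand-in tests that always pass; a real library would exercise CPU logic.
scheduler = StlScheduler([
    StlTest("alu", 40, lambda: True),
    StlTest("registers", 30, lambda: True),
    StlTest("branch", 50, lambda: True),
])
for window in range(3):
    print(window, scheduler.run_window(budget_us=80))
```

    With an 80 µs budget, a full pass here takes two windows, so the window period would need to be chosen so that a complete pass finishes within the test interval derived from the FTTI. Note that in this sketch a test whose cost exceeds the window budget would never run; a real integration would need to detect that case.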

    Authored Article by:

    James Scobie, Arm
    ELE Times Bureau