# Model-Based Design for Next Generation AURIX<sup>™</sup> Automotive Microcontroller

Kajetan Nürnberger, MATLAB Expo 2021





# The next generation of AURIX<sup>™</sup> automotive microcontroller



- In automotive applications, the demand for processing power is constantly increasing
- Providing the required processing capability is challenging under constraints such as:
  - Ambient temperature
  - Power consumption
- AURIX<sup>™</sup> TC4x meets these requirements with a heterogeneous architecture

The PPU (Parallel Processing Unit) is a flexible architecture for applications that require fast execution and/or processing of large amounts of data






# Challenges with heterogeneous architectures: Programming model





# Challenges with heterogeneous architectures: Execution time dilemma

- The execution time on modern, complex architectures cannot easily be predicted
- Execution time can vary widely across heterogeneous compute units
- SoC interconnects can affect the overall response time





# Simulink<sup>®</sup> & Embedded Coder<sup>®</sup> enable a smooth transition between different computing architectures





# Dedicated tool flow enables implementation from model level to target device code





Code Replacement Library (CRL) entries mapping Simulink operators to optimized implementations. A type such as `single[5 5; Inf Inf]` is a single-precision matrix with a dimension range from 5×5 up to unbounded size.

| Name            | Implementation      | NumIn | In1Type              | In2Type              | OutType              |
|-----------------|---------------------|-------|----------------------|----------------------|----------------------|
| RTW_OP_ADD      | ifx_mm_add_f32      | 2     | single[5 5; Inf Inf] | single[5 5; Inf Inf] | single[5 5; Inf Inf] |
| RTW_OP_ELEM_MUL | ifx_ms_scale_f32    | 2     | single[5 5; Inf Inf] | single               | single[5 5; Inf Inf] |
| RTW_OP_ELEM_MUL | ifx_ms_scale_f32    | 2     | single               | single[5 5; Inf Inf] | single[5 5; Inf Inf] |
| RTW_OP_MINUS    | ifx_mm_minus_f32    | 2     | single[5 5; Inf Inf] | single[5 5; Inf Inf] | single[5 5; Inf Inf] |
| RTW_OP_MUL      | ifx_mm_mul_f32      | 2     | single[5 5; Inf Inf] | single[5 5; Inf Inf] | single[5 5; Inf Inf] |
| RTW_OP_RDIV     | ifx_mm_rdiv_f32     | 2     | single[5 5; Inf Inf] | single[5 5; 16 16]   | single[5 5; Inf Inf] |
| RTW_OP_TRANS    | ifx_m_transpose_f32 | 1     | single[5 5; Inf Inf] |                      | single[5 5; Inf Inf] |
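The table lists only the operator-to-function mapping, not the signatures of the `ifx_*` functions. As a hedged sketch of what sits behind three of these entries, the plain-C reference versions below use hypothetical names (`mm_add_f32_ref`, `ms_scale_f32_ref`, `m_transpose_f32_ref`) and an assumed argument order (output first, then inputs, then dimensions). They are not Infineon's API and contain no PPU-specific optimization. Matrices are stored column-major, which is the default layout of Simulink-generated code.

```c
#include <stddef.h>

/* Hypothetical reference versions of three CRL entries.
 * Assumed interface: output pointer first, then inputs, then dimensions.
 * All matrices are column-major: element (r, c) of a rows x cols
 * matrix is stored at index r + c * rows. */

/* RTW_OP_ADD: y = a + b, element by element (rows x cols) */
void mm_add_f32_ref(float *y, const float *a, const float *b,
                    size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows * cols; ++i) {
        y[i] = a[i] + b[i];
    }
}

/* RTW_OP_ELEM_MUL with a scalar operand: y = a * s */
void ms_scale_f32_ref(float *y, const float *a, float s,
                      size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows * cols; ++i) {
        y[i] = a[i] * s;
    }
}

/* RTW_OP_TRANS: y (cols x rows) = transpose of a (rows x cols) */
void m_transpose_f32_ref(float *y, const float *a, size_t rows, size_t cols)
{
    for (size_t c = 0; c < cols; ++c) {
        for (size_t r = 0; r < rows; ++r) {
            /* a(r, c) lives at a[r + c*rows]; y(c, r) at y[c + r*cols] */
            y[c + r * cols] = a[r + c * rows];
        }
    }
}
```

An optimized implementation would keep the same interface but process several elements per instruction; the CRL entry then makes Embedded Coder emit a call to it instead of the generic loop.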





# What CRL implementations look like

- Special keywords to map static and global data to the vector memory (vccm)
- Special vector data types, e.g. `vNfloat_t`
- Intrinsic functions that expose the special vector instructions
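To illustrate the idea behind the vector data types, the sketch below uses the GCC/Clang `vector_size` extension as a portable stand-in for a type like `vNfloat_t`. The 4-lane width, the type name `v4float_t` and the function `vec_add_f32_sketch` are assumptions for illustration only; the real PPU types, intrinsics and the `vccm` placement keyword are toolchain-specific and are not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a PPU vector type (assumption: 4 float lanes).
 * GCC/Clang vector extension, not the PPU toolchain; placement of
 * data in vccm is not modeled. */
typedef float v4float_t __attribute__((vector_size(4 * sizeof(float))));

/* Hypothetical SIMD-style element-wise add: y = a + b for n floats */
void vec_add_f32_sketch(float *y, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4float_t va, vb, vy;
        memcpy(&va, a + i, sizeof va); /* load 4 lanes, no alignment assumed */
        memcpy(&vb, b + i, sizeof vb);
        vy = va + vb;                  /* one vector add covers 4 elements */
        memcpy(y + i, &vy, sizeof vy); /* store 4 lanes */
    }
    for (; i < n; ++i) {               /* scalar tail for leftover elements */
        y[i] = a[i] + b[i];
    }
}
```

The main loop performs one vector addition per four elements, and the scalar tail covers lengths that are not a multiple of the lane count.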





### Navigation example



- Enhanced ADAS functions rely on sensor fusion
- Sensor fusion algorithms require a large number of matrix and vector operations
- Meeting the time budget of a hard real-time system is challenging with large matrix operations
- Algorithm developers prefer to work independently of the hardware
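To make the matrix load of sensor fusion concrete: a common building block is the covariance prediction of a linear Kalman filter, P⁻ = F P Fᵀ + Q, which combines the matrix product, transpose and addition listed in the CRL table. The slides do not say which filter the navigation example uses, so this is a generic plain-C sketch; the state size `KF_N = 4` (e.g. position and velocity in 2D), row-major storage and all function names are assumptions.

```c
#define KF_N 4 /* assumed state size, e.g. 2D position + 2D velocity */
#define AT(m, i, j) ((m)[(i) * KF_N + (j)]) /* row-major element access */

/* c = a * b (matrix product, RTW_OP_MUL): KF_N^3 multiply-adds */
static void mat_mul(float *c, const float *a, const float *b)
{
    for (int i = 0; i < KF_N; ++i) {
        for (int j = 0; j < KF_N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < KF_N; ++k) {
                acc += AT(a, i, k) * AT(b, k, j);
            }
            AT(c, i, j) = acc;
        }
    }
}

/* c = transpose(a) (RTW_OP_TRANS) */
static void mat_trans(float *c, const float *a)
{
    for (int i = 0; i < KF_N; ++i) {
        for (int j = 0; j < KF_N; ++j) {
            AT(c, i, j) = AT(a, j, i);
        }
    }
}

/* Kalman covariance prediction: p_pred = F * P * F' + Q */
void kf_predict_cov(float *p_pred, const float *f, const float *p,
                    const float *q)
{
    float fp[KF_N * KF_N];
    float ft[KF_N * KF_N];

    mat_mul(fp, f, p);       /* F * P        */
    mat_trans(ft, f);        /* F'           */
    mat_mul(p_pred, fp, ft); /* (F * P) * F' */
    for (int i = 0; i < KF_N * KF_N; ++i) {
        p_pred[i] += q[i];   /* + Q (RTW_OP_ADD) */
    }
}
```

Each N×N product costs N³ multiply-accumulates, so this step grows cubically with the state size, which is why such loop nests are attractive targets for vectorized replacements.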



### Designing the algorithm using Simulink










# Comparison of the execution time of the complete algorithm



#### Significant decrease in execution time through use of the SIMD capabilities

#### Detailed look at execution times





#### Splitting the algorithm may be beneficial, as the freed PPU time can be used for other tasks



# Solving the computing resource allocation dilemma





- The enhanced matrix/vector computing capabilities of the next generation AURIX<sup>™</sup> TC4x can be exploited with Embedded Coder
- Model-Based Design enables efficient tailoring of an algorithm to a heterogeneous HW architecture
- The higher-level representation of a model can easily be ported between different HW architectures
- SoC Blockset<sup>™</sup> enables simulation of the integrated SoC HW + SW directly within the MathWorks<sup>®</sup> tool chain


