provided by: EDNAutomotive designers of vehicle subsystems are always scouting for novel ways to extract maximum value from minimum resources. Embedded-system designers for automotive-electronics systems have no other choice than to stuff the maximum number of functions into a minimal code footprint. This article, apart from revealing some tricks of the automotive-embedded-system trade, mainly highlights some clever techniques for replacing a bulky and inefficient code segment with a slim and efficient one.
A typical automotive-control application consists of many real-time control loops based on mathematical models of physical quantities, real-time phenomena, and control strategies. Because shoestring budgets do not allow the use of costly DSPs, automotive designers must resort to a data-driven approach for implementing math functions. Values of the math functions reside in memory as parameters of the input variables. Figure 1 shows a function of two variables?throttle and engine speed, for example?stored as a 2-D look-up table. This approach allows a simple, low-cost, 8-bit microcontroller to emulate a complex math function.
Table 1 shows a sample function as an 8×8-entry look-up table, or ?map,? in automotive parlance. Note that each of the eight points for both axes serves as an index of the look-up table. Most functions are usually more complex, requiring a matrix of approximately 16×16 entries. A typical control loop derives its setpoints from such a math function. Adaptive proportional-integral controllers within the control loop use similar look-up tables for the proportional and integral parameters, so these parameters adapt themselves to system dynamics in real time. Figure 2 shows the complete control loop, including setpoint generation, for controlling a valve opening in real time. A typical automotive application involves many such control loops. More complex math functions require maps with three or more dimensions. These maps depend on series of multiple inputs, such as barometric pressure, coolant temperature, and air temperature.
The embedded code in such an application must access the map data by acquiring real-time inputs, such as throttle-usage percentage, engine speed, and coolant temperature, and then must use these inputs to perform a binary search that determines the row and column indexes of the maps. The code then accesses the maps to instantaneously obtain setpoints or proportional- and integral-control parameters. You repeatedly perform these steps in a control application; doing so optimizes this code and pays huge dividends in reducing the unproductive aspects of your application.
The secret behind this reduction involves analyzing an assembly listing of the embedded C code and sometimes revisiting an algorithm to make it more digestible to a processor. Engineers with strong hardware backgrounds can easily identify the bulky assembly code. An in-depth postmortem analysis of the culprit code fragments in an assembly listing can uncover many useful concepts. For example, 8-bit compilers typically increase the code footprint when they must carry or borrow to account for results that might exceed 8 bits. Alternatively, you can make assembly code more efficient by using shifts and additions to replace routine arithmetic when feasible. Accessing a 2-D map using two indexes involves 8-bit multiplication with many carries, which gobbles up lots of code real estate. When possible, use a bit-set or -reset operation to replace arithmetic operations, such as bisection and averaging.
Finding the culprits
You can use some tricks to detect and replace the culprit code. Consider a map of a 16×16-entry look-up table with a harmless-looking C statement: Data=Map (RowIndex, ColumnIndex);. This statement generates approximately 25 lines of assembly code. The 8-bit compiler treats the collection of the map?s data points as a 256-element array. It computes the array index by performing lengthy 8-bit multiplication, considering a carry at every possible addition.
A smarter approach uses the fact that both row- and column-index would always be 4 bits; hence, a simple nibble concatenation would reduce the code down to six assembly lines. The trick lies in declaring a map as a single array of 256 elements in C code and using two C statements to access the map:
//Shift RowIndex by four and add ColumnIndex to obtain Data
MapIndex=(RowIndex<<4)+ColumnIndex;
Data=Map (MapIndex);.
The clever use of nibble shifting in RowIndex in effect performs multiplication by 16 to obtain MapIndex=16×RowIndex+ColumnIndex.
If you have an oddly dimensioned 10×11-entry map, you cannot use a simple nibble-shift, but you can use multiplication by 10 without needing to perform tedious 8-bit arithmetic:
RowIndexInto2=RowIndex<<1;
RowIndexInto8=RowIndexInto2<<2;
RowIndexInto10=RowIndexInto8+RowIndexInto2;
MapIndex=RowIndexInto10+ColumnIndex;
MapIndex=(Row×(8+2)+Column)
Data=Map (MapIndex);
You achieve this multiplication using a simple shift-and-add operation, causing substantial savings in machine-code footprint.
Every map access requires two indexes that you obtain by performing an index search. Consider the map in Table 1. If the real-time-input values for the engine?s speed and the throttle-usage percentage are 1250 rpm and 45%, respectively, the row index should be five, and the column index should be four. Most popular binary-search algorithms use progressive bisection to zero in on the final index. Listing 1 at www.edn.com/080417ms4278 gives the traditional approach. When applying Listing 1, assume a 250-array element with a low of zero and a high of 250. The algorithm continues to bisect the search interval and evaluates both halves to see which one of them contains a match. The ?winning? half becomes the new and smaller search interval for the next iteration until you find the closest match. You can use this traditional algorithm for reducing your code footprint. Obvious culprits are lines such as index=low+(low+high)/2, which contains two additions and one division. The while statement, while (lowThis process again involves using the compiler to perform 8-bit arithmetic. Listing 2, available at www.edn.com/080417ms4278, produces a smaller version of the code footprint. It uses the same ?divide-and-conquer? theme, except that, this time, the division is processor-friendly. You initialize the search index to zero. Then, assuming an array length of 128 to 255, you set the most significant bit of the index to one to access the 128th element of the array and compare it with the input data. Depending on the outcome of the comparison, you either reset the bit or leave it at one. This approach selects the winning half as the next search interval. You also reset the bit if the index formed after setting the bit is out of range. This scenario does not occur if the array length is an integer power of two. The next iteration sets the next bit to the right, yielding an index of 1100 0000b or 0100 0000b, depending on the outcome of the previous iteration. Each iteration progressively determines the value of the next bit toward the least significant bit. Electronics engineers familiar with the hardware concepts of analog-to-digital conversion will appreciate that this algorithm in effect adopts the successive-approximation philosophy of an ADC.
Figure 3 shows the flow chart for this progression, and Table 2 shows the equivalent assembly-code lines that Hi-Tech Software?s (www.htsoft.com) C compiler for a Microchip (www.microchip.com) PIC16F77 microcontroller produces. All the operation involving direct and indirect arithmetic operations can liberally consume the code space. Thus, a while statement with comparison occupies five lines compared with a bit-based approach that occupies only four. The main culprit for the larger footprint is: index=low+(low+high)/2, which generates 14 lines compared for two lines with bit set/reset.
This example amply proves how a face lift on a worn-out algorithm can gain you quite a few bytes of code space. As Table 2 shows, the new algorithm saves 28% in code space. If the array length is an integer power of two, you can omit the out-of-range check of the index, saving six more assembly lines for a total savings of 43%.
This article highlights basic approach to remove unproductive ?fat? from your embedded application. The key lies in the analysis of the assembly listings, as the binary-search example amply illustrates. Apart from saving machine space, a significant by-product of this strategy happens to be the faster execution resulting from a small-footprint-code execution. You may discover new tricks by reviewing the assembly listing of your C code and attacking the bulky code fragments.
Vishwas Vaidya is the electronics divisional manager at India?s Tata Motors, where he heads a team that designs and develops automotive embedded-control systems. He holds a master?s degree in control engineering from the Indian Institute of Technology (Delhi, India) and a bachelor?s degree in electronics and telecommunications engineering from the University of Pune (Pune, India). You can reach him at vmv74342@tatamotors.com.
author: by Vishwas Vaidya, Tata Motors Ltd
EDN. Copyright © 2008 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.