Instruction Level Energy Consumption Estimation of Embedded Processor

Embedded systems are often portable, battery-powered devices with a limited power resource, so most of them must meet an energy constraint. Performance and energy consumption are the most important metrics in embedded system design, and estimating and validating both is essential. This paper attempts to measure software energy consumption precisely on an ARM Cortex M4 processor using three methods. The results are validated with five benchmark programs. The tedious calculation of inter-instruction cost is minimized by taking it as a certain percentage of total energy. The percentage error between actual and estimated energy is found to be less than 5%.


I. INTRODUCTION
An embedded system is an electronic or computer system designed to perform a specific task or a particular sequence of tasks, completely or partially independent of human intervention, and in the most efficient way. Embedded systems are not always standalone devices. They have very limited resources, particularly memory, and generally have no secondary storage devices. They cannot be programmed to perform anything other than the tasks for which they are designed. Embedded systems are also power constrained: as many of them are battery powered, power consumption has to be very low. Many optimization techniques are available for embedded system design, focusing on cost, power, and area. There is always a tradeoff between design metrics, which the designer must manage carefully to improve the overall performance of the system.
With the increasing complexity of functionality and end-user expectations of longer battery life, power has become a critical design parameter. Complexity is being added at a much higher rate than battery technology is developing. The end user expects more time between successive recharges, and this is possible with power-aware design at various levels. Accurate models of software power, that is, the energy consumed by the processor during software execution, are essential for designing power-optimal software structures. As software energy contributes significantly to system energy, accurate energy estimation is necessary for system energy optimization.
Power consumption models of processor software can be categorized as low-level models and high-level models. Low-level models are also called hardware models: power and energy are calculated from detailed electrical descriptions at the circuit level, gate level, register transfer (RT) level, or system level. High-level models deal only with instructions and functional units from the software point of view, without electrical knowledge of the underlying architecture.
In this paper, a precise approach to software energy estimation for the ARM Cortex M4 processor is presented. The rest of the paper is organized as follows. Section II reviews the issues related to processor power measurement. Section III presents the experimental setup and results. Validation of the results is discussed in Section IV, and Section V concludes the paper.

II. ISSUES IN PROCESSOR POWER MEASUREMENT
The two main approaches to estimating the energy consumption of an embedded system are simulation based and measurement based. The simulation-based approach uses models relating power consumption to program instructions. Its drawback is that models are not available for every modern processor and, where available, are expensive. The second approach is based on physical measurement of power consumption. Measurement is the only way to verify the correctness of the simulation-based approach and is therefore important for validating power consumption models.
Measurement methods can be averaged or cycle accurate, depending on the time interval of energy estimation. The commonly used method is to measure the voltage drop across a shunt resistor inserted in the power supply line. The resistor value should be very small so that its effect on the total current is minimal. Since the resulting voltage drop is very small, a suitable voltage amplifier needs to be used.
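The shunt relationship described above can be sketched in Python. This is a minimal illustration and not the authors' measurement code: the function names and the example values (a 0.1 ohm shunt, an amplifier gain of 25, a 25 mV amplifier output, a 3.3 V supply) are assumptions chosen only to show the arithmetic.

```python
def shunt_current(v_amplified, gain, r_shunt):
    """Load current in amperes recovered from the amplified shunt voltage."""
    if gain <= 0 or r_shunt <= 0:
        raise ValueError("gain and r_shunt must be positive")
    # Undo the amplifier gain to get the raw drop, then apply Ohm's law.
    return v_amplified / (gain * r_shunt)


def voltage_at_load(v_supply, current, r_shunt):
    """Supply voltage left for the load after the drop across the shunt."""
    return v_supply - current * r_shunt


# Hypothetical reading: 25 mV at the amplifier output.
i_load = shunt_current(v_amplified=0.025, gain=25.0, r_shunt=0.1)
v_load = voltage_at_load(v_supply=3.3, current=i_load, r_shunt=0.1)
print(f"load current = {i_load * 1e3:.3f} mA, voltage at load = {v_load:.4f} V")
```

The second function makes the tradeoff visible: a smaller shunt disturbs the supply less but produces a smaller drop, which is why an amplifier is needed.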
A digital multimeter (DMM) or data acquisition tools are used to take the readings. The current shunt method is used at very low current ranges, for example to measure the current of the MSP430 (microamperes) [1] and of wireless communication modules such as Bluetooth (milliamperes) [2]. A DMM is used to measure the current consumption of a 486 microprocessor [3]. Cycle-accurate power consumption measurement is described in [4], where a 0.1 Ohm shunt resistor is used along with a differential amplifier.
The current shunt method has two drawbacks: i) the shunt resistor sits in the direct supply line and affects the total current, and ii) the voltage drop across the shunt resistor decreases the voltage available to the processor core. Both are overcome by using a Wilson current mirror [5], which duplicates the current drawn by the embedded system; the mirrored current is then measured using a shunt resistor. Current probes are also used for microprocessor power estimation [6]. Their advantage is that the supply lines need not be cut to insert a resistor, but because of the high price of probes the current shunt method is preferred. The charge transfer method is used to find cycle-accurate energy consumption [7]. However, the measurement setup requires a DAQ with a high sampling frequency and is therefore more complex to implement. The battery voltage drop after executing software is also a measure of the energy consumed [8]. The methods described above use meters to build a power measurement setup. Another method is to integrate power sensors into the hardware architecture. This approach gives more accurate results because the sensors are integrated into the board.
The issues related to instruction-level power estimation are the calculation of base cost, the calculation of inter-instruction cost, energy-sensitive factors, and the method of processor current measurement [9]. In this paper, base cost is measured by executing the same instruction 1000 times in an infinite loop. This minimizes the effect of the branch instruction in the loop, and the average current is considered. The effect of the number of 1's is very small and can be neglected [10]. All other factors are considered as a certain percentage of total energy. Onboard (integrated) current measurement is used to find the average current.
V. A. Kulkarni and G. R. Udupi. G. R. Udupi is with Computer Science and Engineering, SGBIT, Belgavi, India (e-mail: grudupi@gmail.com).

III. EXPERIMENTAL SETUP AND RESULTS
A. Experimental setup
The on-board current measurement circuit of the ARM Cortex M4 board is used, which increases the accuracy of measurements and overcomes many of the limitations of current measurement mentioned in the literature. It consists of a MAX9634T current monitor chip and a 12-bit ADC sampling at 50 ksps to 200 ksps. The MAX9634 multiplies the sense voltage by 25 to provide a voltage range suitable for the ADC to measure. The ARM Cortex-M4 is a 32-bit core with a 3-stage pipeline and Harvard architecture. A sample rate of 200 ksps (5 µs period) is chosen for all measurements, and the average current over a period of 1 second is used for energy calculation.
Each instruction takes a few microseconds or less to execute, and it is very difficult to measure current over such a short period. As reported in the literature, the method used is to run the given instruction a few thousand times in an unconditional loop so that the average current can be measured. To minimize the effect of the branching instruction (BL), whose cost is referred to as the BL (branch loop) cost, the given instruction needs to be executed several times before branching. To find the base cost, each instruction is executed 1000 times in a loop [11] [12] [13]. This minimizes the effect of the "BL loop" instruction on the base cost.
Calculation of inter-instruction cost involves a large number of measurements. The number of measurements is given by n(n-1)/2, where n is the number of instructions in the Instruction Set Architecture. For a microcontroller with 100 instructions, 4950 combinations have to be measured to find the inter-instruction cost. This volume of measurement is tedious and time consuming. From experiments it is found that all costs other than the base cost, put together, amount to 5%. This 5% is accounted for in the estimated energy, which simplifies the estimation process to a great extent.
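The pair count and the flat allowance described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and applying the 5% as a multiplicative allowance on top of the summed base cost is one possible reading of the text, not a confirmed description of the authors' calculation.

```python
def pairwise_measurements(n_instructions):
    """Number of unordered instruction pairs, n(n-1)/2."""
    if n_instructions < 0:
        raise ValueError("n_instructions must be non-negative")
    return n_instructions * (n_instructions - 1) // 2


def add_overhead(base_energy_j, overhead_fraction=0.05):
    """Base-cost energy plus a flat allowance for all other costs.

    Assumption: the allowance is applied as a fraction of the base-cost sum.
    """
    return base_energy_j * (1.0 + overhead_fraction)


print(pairwise_measurements(100))  # 4950, the count quoted in the text
```

Because the pair count grows quadratically with the size of the instruction set, replacing it with a single percentage removes most of the measurement effort.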
A typical embedded system software program consists of two parts.
1) Initialization part: It configures system modules, initializes program variables, etc. This part of the program is executed only once, at the beginning, so that the system is ready to perform its main operation.
2) Main part: It is usually implemented as an endless loop.
From an energy consumption point of view, one can ignore the initialization part and assume that the system always operates in its main part. This is because, when an embedded system is turned on, it is in the initialization phase for only a few microseconds and then enters the main phase, where it operates for hours. Almost all of the energy consumption of an embedded system is therefore due to the main phase and not the initialization phase.

B. Instruction Energy Cost
Each assembly instruction is executed 1000 times in a loop to overcome the effect of the branch instruction, as shown in Figure 1. A sample rate of 200 ksps (5 µs period) is chosen for all measurements, and the average current over a period of 1 second is used for energy calculation.
The average current for the assembly program shown in Figure 1 is found to be 3.217 mA. The core voltage is 3.3 V and the clock frequency is 12 MHz. Instruction energy is the product of i) the average current drawn during instruction execution, ii) the core voltage, iii) the time required for each cycle, and iv) the number of cycles for instruction execution. The calculation of the instruction cost for the instruction shown in Figure 1 is given in Table I. Measurements are carried out on all possible variants of the different Cortex M4 instructions. The variants of the MOV instruction, along with their average currents, are given in Table II.
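The four-factor product above can be written out as a short Python sketch. The current, voltage, and frequency are the values quoted in the text; the function name is hypothetical, and the cycle count of 1 is an assumption made for illustration, since the cycle count of the Figure 1 instruction is not stated in the text.

```python
def instruction_energy(avg_current_a, core_voltage_v, clock_hz, cycles):
    """Energy in joules: current x voltage x cycle time x number of cycles."""
    cycle_time_s = 1.0 / clock_hz
    return avg_current_a * core_voltage_v * cycle_time_s * cycles


# Values from the text: 3.217 mA, 3.3 V, 12 MHz. cycles=1 is an assumption.
energy_j = instruction_energy(3.217e-3, 3.3, 12e6, cycles=1)
print(f"{energy_j * 1e9:.4f} nJ")  # about 0.8847 nJ per cycle
```

A multi-cycle instruction drawing the same average current costs proportionally more, because only the cycle count changes in the product.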

IV. VALIDATION OF RESULTS
A. Benchmarks used
The benchmarks used to validate the results are FDCT (fast discrete cosine transform), FIR (finite impulse response filter), JFDCTINT (discrete cosine transform on an 8x8 pixel block), and MATMULT (multiplication of two 20x20 matrices) from the WCET benchmarks, and STRING SEARCH from MiBench [14] [15] [16]. The WCET benchmarks were developed at Mälardalen University, Sweden, whose WCET research group maintains a large number of WCET benchmark programs. Each benchmark is provided as a C source file (file.c). MiBench consists of a set of 35 embedded applications for benchmarking purposes. These benchmarks are divided into six suites, each targeting a specific area of the embedded market: Automotive and Industrial Control, Consumer Devices, Office Automation, Networking, Security, and Telecommunications. All the programs are freely available as standard C source code.

B. Grouping of Instructions
Details of all the assembly instructions executed are obtained from the instruction trace. A total of 3680 instructions from 'jfdctint', 5116 instructions from 'fdct', 3033 instructions from 'string search', 37564 instructions from 'matmult', and 233380 instructions from 'fir' are traced. The energy for each instruction is calculated from the average current it draws, as shown in Table I.
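The trace-based estimate described above can be sketched in Python as a count of mnemonics followed by a weighted sum. Everything specific here is an assumption for illustration: the trace format (one instruction per line, mnemonic first), the function names, and the per-instruction currents and cycle counts in `COST_TABLE`, which are placeholders and not the paper's measured values.

```python
from collections import Counter

CLOCK_HZ = 12e6        # clock frequency quoted in the text
CORE_VOLTAGE_V = 3.3   # core voltage quoted in the text

# Hypothetical costs: mnemonic -> (average current in A, cycles).
COST_TABLE = {
    "MOV": (3.20e-3, 1),
    "ADD": (3.10e-3, 1),
    "LDR": (3.00e-3, 2),
}


def count_mnemonics(trace_lines):
    """Count how often each mnemonic appears in the trace."""
    return Counter(line.split()[0].upper() for line in trace_lines if line.strip())


def estimate_energy(counts, cost_table):
    """Sum count x current x voltage x cycles / frequency over all mnemonics."""
    total_j = 0.0
    for mnemonic, n in counts.items():
        current_a, cycles = cost_table[mnemonic]
        total_j += n * current_a * CORE_VOLTAGE_V * cycles / CLOCK_HZ
    return total_j


trace = ["MOV R0, #1", "LDR R1, [R2]", "ADD R0, R0, R1", "MOV R3, R0"]
counts = count_mnemonics(trace)
print(dict(counts), f"{estimate_energy(counts, COST_TABLE) * 1e9:.4f} nJ")
```

The sketch covers base cost only; any allowance for inter-instruction effects would be applied to the returned total.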
The estimated energy consumption, actual energy consumption, and percentage error for all benchmarks are given in Table III. Table IV shows the percentage composition of instructions, grouped by function, for all five benchmarks considered for validation. Once the grouping is done, the benchmark programs are executed. Instead of assigning a cost to each instruction type, a cost is assigned based on the group the instruction belongs to. The estimated and actual energy consumption for all benchmarks using the grouping method is shown in Table V.
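Function-wise grouping and the percentage error can be sketched as below. The group names and the mnemonic-to-group mapping are hypothetical, since the text does not list the groups, and the sign convention of the error (estimated minus actual, relative to actual) is an assumption.

```python
# Hypothetical mapping from mnemonic to functional group.
GROUP_OF = {
    "MOV": "data_transfer",
    "LDR": "load_store",
    "STR": "load_store",
    "ADD": "arithmetic",
    "SUB": "arithmetic",
    "B": "branch",
}


def group_counts(instruction_counts, group_of):
    """Collapse per-mnemonic counts into per-group counts."""
    grouped = {}
    for mnemonic, n in instruction_counts.items():
        group = group_of[mnemonic]
        grouped[group] = grouped.get(group, 0) + n
    return grouped


def percent_error(estimated, actual):
    """Signed error in percent (sign convention assumed, not from the text)."""
    return (estimated - actual) / actual * 100.0


grouped = group_counts({"MOV": 10, "ADD": 5, "SUB": 3, "LDR": 4, "STR": 2}, GROUP_OF)
print(grouped)
print(percent_error(estimated=104.0, actual=100.0))
```

With grouped counts, only one cost per group has to be measured and stored instead of one cost per instruction variant.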

D. Grouping of Instructions - Cycle wise
Another method of grouping the instructions is based on the number of cycles used for execution. Instructions are classified as 1-cycle, 2-cycle, and 3-cycle instructions. The average current taken for energy calculation is 3.111185 mA for single-cycle instructions, 2.782308 mA for two-cycle instructions, and 3.76 mA for three-cycle instructions. The number of cycles for PUSH and POP depends on the number of registers to be pushed or popped. To simplify the calculations, PUSH and POP are considered 2-cycle instructions. The percentage composition of cycle-wise instructions in all five benchmarks is shown in Table VI. Once the grouping is done, the benchmark programs are executed. Instead of assigning a cost to each instruction type, a cost is assigned based on the number of cycles required for instruction execution. The estimated and actual energy consumption for all benchmarks using grouping based on cycles is shown in Table VII.
Experiments are carried out by three methods for all five benchmarks. In the first method, each instruction is considered with its average current and number of cycles. In the second method, instructions are grouped by function. In the third method, instructions are grouped by the number of cycles required for execution. The results obtained for all five benchmarks by the three methods, viz. instruction-wise calculation, calculation based on grouping by function, and calculation based on grouping by number of cycles, are shown in Table VIII. As can be seen from Table VIII, the results of instruction-wise calculation and of calculation based on grouping by function are nearly the same. Software energy calculation can thus be further simplified by using function-wise grouping of instructions, which saves a lot of calculation and time.
It is evident from Table IX that the difference between the instruction-wise estimate and the estimate based on function-wise grouping is small, except in the case of the matmult benchmark, whereas the difference between the instruction-wise estimate and the estimate based on cycle-wise grouping is quite high. Software energy consumption can therefore be estimated with reasonable accuracy by grouping instructions by functionality, which reduces the complexity of the calculation to a great extent.
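The cycle-wise estimate can be sketched using the three group currents quoted above together with the 3.3 V core voltage and 12 MHz clock from Section III. The function name and the example instruction counts are hypothetical, and the sketch covers base cost only.

```python
# Average currents per cycle group, in mA, as quoted in the text.
CYCLE_GROUP_CURRENT_MA = {1: 3.111185, 2: 2.782308, 3: 3.76}


def cycle_grouped_energy(cycle_counts, core_voltage_v=3.3, clock_hz=12e6):
    """Energy in joules from counts of 1-, 2- and 3-cycle instructions."""
    total_j = 0.0
    for cycles, n in cycle_counts.items():
        current_a = CYCLE_GROUP_CURRENT_MA[cycles] * 1e-3
        total_j += n * current_a * core_voltage_v * cycles / clock_hz
    return total_j


# Hypothetical mix: 700 one-cycle, 250 two-cycle, 50 three-cycle instructions.
mix = {1: 700, 2: 250, 3: 50}
print(f"{cycle_grouped_energy(mix) * 1e6:.4f} uJ")
```

Only three currents are needed in this scheme, which explains both its simplicity and its larger deviation from the instruction-wise estimate: instructions with quite different currents share one group average.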

V. CONCLUSION
The two major areas where energy consumption can be minimized are hardware and software. Voltage scaling, frequency scaling, and keeping components in power-saving mode when not in use are the methods used to minimize energy in hardware. This research concentrates on the second area, i.e. minimization of the energy consumed by software. The major issues in software energy measurement are measurement of core current, the inter-instruction effect, and static power. Core current is measured using on-board current measurement. The processor core considered is the ARM Cortex M4. The inter-instruction effect and static power are accounted for by taking them as a certain percentage of the estimated energy. This approach avoids a lot of calculation and saves time. An effort has been made to simplify the software energy estimation process by two methods: first, by grouping instructions based on their functionality, and second, by grouping instructions based on the number of cycles required for instruction execution.
The percentage error between estimated and actual energy is found to be from -4.89% to 4.55% when each instruction is considered separately with its energy cost, from -0.87% to 3.18% (except MATMULT) when grouping is done based on function, and from 1.41% to 17.46% when grouping is done based on number of cycles. Software energy can thus be estimated by considering each instruction individually or by grouping the instructions based on function. The grouping reduces the complexity involved in software energy estimation.

TABLE I: SAMPLE INSTRUCTION ENERGY CALCULATION

TABLE III: % error for all benchmarks, instruction wise

TABLE V: % error for all benchmarks, function wise

TABLE VIII: % error for benchmarks by three methods

Table IX shows the % difference between i) the instruction-wise energy estimate and the estimate by function-wise grouping, and ii) the instruction-wise energy estimate and the estimate by cycle-wise grouping.

TABLE IX: % error with instruction wise estimation