V<sub>TH</sub>-hopping Scheme for 82% Power Saving in Low-voltage Processors

Koichi Nose, Masayuki Hirabayashi, Hiroshi Kawaguchi, Seongsoo Lee *) and Takayasu Sakurai

Institute of Industrial Science, University of Tokyo
7-22-1 Roppongi, Minato-ku, Tokyo, 106-8558, JAPAN

*) Department of Information Electronics, Ewha University
11-1 Daehyun-dong, Seodaemun-gu, Seoul, KOREA

Abstract

A threshold voltage hopping (V<sub>TH</sub>-hopping) scheme is proposed in which V<sub>TH</sub> is dynamically controlled through software according to the workload. V<sub>TH</sub>-hopping is shown to reduce the power to 18% of that of fixed low-V<sub>TH</sub> circuits in the 0.5V supply voltage regime for multimedia applications. A positive back-gate bias scheme within V<sub>TH</sub>-hopping is presented for high-performance, low-voltage processors. Measurement results show that about 90% leakage power reduction is possible by using V<sub>TH</sub>-hopping.

Introduction

High-performance VLSI design with low supply voltage (V<sub>DD</sub>) has become one of the most important issues in CMOS VLSIs, since mainstream V<sub>DD</sub> will be scaled down to below 0.5V in the coming years. The dependence of power and delay on the threshold voltage at 0.5V V<sub>DD</sub> is shown in Fig. 1. As seen from the figure, the threshold voltage (V<sub>TH</sub>) has to be decreased to achieve high performance. Reducing V<sub>TH</sub>, however, can cause a significant increase in the static leakage power component.

There have been several proposals to reduce stand-by leakage current, for example, MTCMOS [1] and VTCMOS [2]. These schemes, however, cannot suppress the active leakage power. Another approach is a dual-threshold voltage (dual-V<sub>TH</sub>) technique [3], which partitions a circuit into critical and non-critical gates and uses low-V<sub>TH</sub> transistors only in the critical gates. The drawback of this scheme is that the leakage current cannot be sufficiently suppressed, since a large leakage current always flows through the low-V<sub>TH</sub> transistors.

This paper presents a dynamic threshold voltage hopping (V<sub>TH</sub>-hopping) scheme that can solve the above-mentioned problems. The scheme dynamically adjusts the frequency and V<sub>TH</sub> through back-gate bias control depending on the workload of a processor. When the workload decreases, less power is consumed by increasing V<sub>TH</sub>. This approach is similar to dynamic V<sub>DD</sub> scaling (DVS) [4], in which V<sub>DD</sub> and the frequency are controlled dynamically based on the workload variation. DVS, however, is effective when the dynamic power is dominant. V<sub>TH</sub>-hopping, on the other hand, is effective in low-V<sub>DD</sub> designs where V<sub>TH</sub> is low and the active leakage component is dominant in the total power consumption.

In order to show the effectiveness of the scheme, performance evaluation is conducted using MPEG-4 video coding and a small scale RISC processor with V<sub>TH</sub>-hopping capability is fabricated.

V<sub>TH</sub>-hopping Scheme

Figure 2 shows the total power dissipation depending on the workload. V<sub>THlow</sub> signifies V<sub>TH</sub> applied when the workload is maximum. If the workload is less than the peak workload, V<sub>TH</sub> is increased to the level where the speed requirement is just satisfied. The broken line represents a fixed V<sub>TH</sub> case with only a frequency control. The dynamic power dissipation decreases in proportion to the workload, since the dynamic power is proportional to the frequency.
The leakage power, however, is not reduced since it does not depend on the frequency. The solid line in the figure shows the power dependency of the variable $V_{TH}$ system on the workload. When the workload is lower than the maximum workload (i.e. workload < 1), a higher threshold voltage can be used while still guaranteeing that the logic blocks work at the lower frequency. As shown in the figure, the total power is decreased effectively by controlling $V_{TH}$ dynamically depending on the workload. This sets the basis for $V_{TH}$-hopping.
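The trend described for Fig. 2 can be reproduced qualitatively with a small model. The following is only a sketch under stated assumptions: the alpha-power-law delay model, the subthreshold swing, the value of `VTH_LOW` and the leakage share `LEAK_FRACTION` are illustrative choices, not values taken from this paper. Power is normalized so that the total is 1 at full workload.

```python
# Illustrative model only: all constants below are assumptions, not measured data.
ALPHA = 1.3           # velocity-saturation index of the alpha-power law (assumed)
SWING = 0.1           # subthreshold swing in V/decade (assumed)
VDD = 0.5             # supply voltage in V (the regime discussed in the text)
VTH_LOW = 0.1         # threshold voltage at peak workload in V (assumed)
LEAK_FRACTION = 0.6   # share of leakage in total power at peak workload (assumed)


def vth_for_workload(workload):
    """Highest VTH that still meets the speed needed for 0 < workload <= 1.

    Assumes f_max is proportional to (VDD - VTH) ** ALPHA at fixed VDD.
    """
    return VDD - (VDD - VTH_LOW) * workload ** (1.0 / ALPHA)


def leakage_power(vth):
    """Leakage falls by one decade for every SWING volts of added VTH."""
    return LEAK_FRACTION * 10 ** (-(vth - VTH_LOW) / SWING)


def power_fixed_vth(workload):
    """Frequency control only: dynamic power scales, leakage stays constant."""
    return (1 - LEAK_FRACTION) * workload + LEAK_FRACTION


def power_variable_vth(workload):
    """Frequency and VTH control: leakage shrinks as VTH is raised."""
    return (1 - LEAK_FRACTION) * workload + leakage_power(vth_for_workload(workload))


if __name__ == "__main__":
    for w in (1.0, 0.75, 0.5, 0.25):
        print(f"workload={w:.2f}  fixed={power_fixed_vth(w):.3f}  "
              f"variable={power_variable_vth(w):.3f}")
```

Both curves coincide at workload 1 and separate as the workload drops, because only the variable-$V_{TH}$ case reduces the leakage term.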

The schematic diagram of the $V_{TH}$-hopping scheme is shown in Fig. 3. Using the control signal, CONT, sent from the processor, the power control block generates select signals of $V_{TH}$'s, $V_{THlow}$_Enable and $V_{THhigh}$_Enable, which in turn control substrate bias for the processor. CONT is controlled by software through a software feedback loop scheme [5], which has been proposed for dynamic $V_{DD}$ scaling (DVS) but is also effective for $V_{TH}$-hopping. The software feedback scheme can guarantee hard real-time for multimedia applications with the DVS and the same algorithm guarantees the real-time operation with $V_{TH}$-hopping, since software-wise, the DVS and $V_{TH}$-hopping are the same.
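To illustrate how a software loop can pick the speed level while still meeting deadlines, the sketch below uses a deliberately simplified per-slot rule. It is not the algorithm of [5]: the slot structure, the function name `choose_levels`, and the neglect of transition time are all assumptions made for illustration. Each slot's deadline is the time the worst case would finish at $f_{CLK}$, and the slow level is chosen only when even the worst case at $f_{CLK}/2$ fits.

```python
def choose_levels(worst_case_cycles, actual_cycles, f_clk_hz):
    """Pick 'VTH_low' (f_CLK) or 'VTH_high' (f_CLK/2) for each time slot.

    Hypothetical simplification: transition time between levels is ignored.
    Returns the chosen levels and the finish time of the last slot.
    """
    now = 0.0        # current time in seconds
    deadline = 0.0   # cumulative worst-case finish time at full speed
    levels = []
    for worst, actual in zip(worst_case_cycles, actual_cycles):
        deadline += worst / f_clk_hz
        half_speed = f_clk_hz / 2
        if now + worst / half_speed <= deadline:
            # Even the worst case fits at half speed: raise VTH.
            levels.append("VTH_high")
            now += actual / half_speed
        else:
            # Not enough slack: stay at full speed with low VTH.
            levels.append("VTH_low")
            now += actual / f_clk_hz
    return levels, now


if __name__ == "__main__":
    levels, finish = choose_levels([100, 100, 100, 100], [40, 100, 30, 50], 100.0)
    print(levels, finish)
```

Because a slot is slowed down only when its worst case still fits, every slot finishes by its deadline regardless of the actual cycle counts, which is the property the text attributes to the software feedback scheme.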

It should be noted that during the power-on sequence $V_{THlow}$_Enable is asserted, and that $V_{THlow}$_Enable and $V_{THhigh}$_Enable are non-overlapping signals, which eliminates direct current between the two different $V_{BS}$ supplies.

CONT also controls the operation frequency of the target processor. When the $V_{TH}$ controller asserts $V_{THlow}$_Enable, the frequency controller generates $f_{CLK}$, and when the $V_{TH}$ controller asserts $V_{THhigh}$_Enable, the frequency controller generates $f_{CLK}/2$.

$V_{THlow}$ is determined so that the maximum performance of the processor achieves the required clock frequency of $f_{CLK}$. On the other hand, $V_{THhigh}$ is determined so that the processor operates at $f_{CLK}/2$.
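The control behavior described above can be summarized in a short behavioral sketch. The names `ControlOutputs`, `decode_cont` and `switch_sequence` are hypothetical, the polarity of CONT is an assumption, and running the slower clock during the non-overlap gap is an illustrative choice rather than something stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlOutputs:
    vth_low_enable: bool
    vth_high_enable: bool
    clock_hz: float


def decode_cont(fast: bool, f_clk_hz: float) -> ControlOutputs:
    """Map CONT (assumed: True means full speed) to enables and clock."""
    if fast:
        return ControlOutputs(True, False, f_clk_hz)
    return ControlOutputs(False, True, f_clk_hz / 2)


def power_on_state(f_clk_hz: float) -> ControlOutputs:
    """At power-on, VTHlow_Enable is asserted, as noted in the text."""
    return decode_cont(True, f_clk_hz)


def switch_sequence(current: ControlOutputs, target: ControlOutputs):
    """Return the states visited when changing level.

    Both enables are dropped first so that they never overlap. The clock
    used during that gap (the slower of the two) is an assumption.
    """
    if current == target:
        return [target]
    gap = ControlOutputs(False, False, min(current.clock_hz, target.clock_hz))
    return [gap, target]


if __name__ == "__main__":
    start = power_on_state(100e6)
    for state in switch_sequence(start, decode_cont(False, 100e6)):
        print(state)
```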

**Simulation Results of MPEG4 Encoding using $V_{TH}$-hopping**

In order to show the effectiveness of the scheme, performance evaluation is conducted using MPEG-4 video coding. Figure 4 shows a simulation result of the power transition over time for MPEG4 encoding using $V_{TH}$-hopping. If more than two clock levels, and hence more than two $V_{TH}$ levels, are provided, more power reduction is possible, but the improvement is minor (only 6%), as shown in the figure. Moreover, additional levels raise test issues, since the speed test must be run at more than two frequencies, and more area overhead is needed for the control block and selectors. This is why the number of $V_{TH}$ levels is limited to two. Since only $f_{CLK}$ and $f_{CLK}/2$ are used, there is essentially no synchronization problem at the interface between the processor and the external systems.

It is seen from Fig. 5 that $f_{CLK}$ is used only 6% of the time, while the processor runs at $f_{CLK}/2$ for 94% of the time. $f_{CLK}$ is still needed because the processor must run at $f_{CLK}$ for 100% of the time when worst-case data arrives. This is very unlikely, however, and for most of the time the workload is about a half on average. This tendency holds for other applications such as MPEG2 decoding and VSELP voice codec.
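The 6%/94% time split is consistent with the statement that the workload is about a half on average. A minimal check, using only the two figures quoted above (the helper name `time_weighted_average` is illustrative):

```python
def time_weighted_average(values, fractions):
    """Average of values weighted by the fraction of time spent at each."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(v * f for v, f in zip(values, fractions))


# Clock rate relative to f_CLK at each level, and the time share from Fig. 5.
relative_clock = [1.0, 0.5]
time_share = [0.06, 0.94]

average_clock = time_weighted_average(relative_clock, time_share)
print(f"average clock = {average_clock:.2f} x f_CLK")  # average clock = 0.53 x f_CLK
```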

Figure 6 shows the simulation result of a power comparison among fixed single \( V_{\text{TH}} \), dual-\( V_{\text{TH}} \) and \( V_{\text{TH}} \)-hopping cases for MPEG4 encoding. \( V_{\text{TH}} \)-hopping can reduce the power to 18% of fixed low-\( V_{\text{TH}} \) circuit and 27% of the dual-\( V_{\text{TH}} \) scheme in 0.5V \( V_{\text{DD}} \) regime.
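Taken together, the two percentages also fix the position of the dual-\( V_{\text{TH}} \) scheme relative to the fixed low-\( V_{\text{TH}} \) circuit. Writing \( P_{\text{hop}} \), \( P_{\text{fixed}} \) and \( P_{\text{dual}} \) for the three powers (symbols introduced here only for this calculation), and assuming both percentages refer to the same \( V_{\text{TH}} \)-hopping power:

\[
P_{\text{hop}} = 0.18\,P_{\text{fixed}} = 0.27\,P_{\text{dual}}
\quad\Rightarrow\quad
\frac{P_{\text{dual}}}{P_{\text{fixed}}} = \frac{0.18}{0.27} \approx 0.67
\]

Under that assumption, the dual-\( V_{\text{TH}} \) scheme alone saves roughly one third of the power, whereas \( V_{\text{TH}} \)-hopping saves \( 1 - 0.18 = 82\% \).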

In order to suppress the leakage power further, combining the \( V_{\text{TH}} \)-hopping scheme and the dual-\( V_{\text{TH}} \) scheme could be useful. Figure 7 shows the schematic of this combined scheme, in which \( V_{\text{TH}} \)-hopping is used only in the critical paths. The \( V_{\text{TH}} \) of the non-critical gates, on the other hand, is set to a considerably higher value (\( V_{\text{THnon\_crit}} \)), which is held fixed at all times.

As shown in Fig. 6, however, the above-mentioned combination scheme hardly improves the power (only 1.5%) compared with the \( V_{\text{TH}} \)-hopping scheme. The reason is that the difference between the leakage power in the critical paths and that in the non-critical paths is small, since the leakage power in the critical paths has already been suppressed by \( V_{\text{TH}} \)-hopping. Therefore, it can be said that the scheme using only \( V_{\text{TH}} \)-hopping is the most effective.

**Measurement of RISC Processor with \( V_{\text{TH}} \)-hopping**

The above-mentioned scheme is a normal \( V_{\text{TH}} \)-hopping scheme, where \( V_{\text{THlow}} \) is achieved by zero back-gate bias and \( V_{\text{THhigh}} \) is obtained by applying negative back-gate bias. It is, however, also possible to obtain \( V_{\text{THhigh}} \) by zero back-gate bias and \( V_{\text{THlow}} \) by positive back-gate bias [6].

A small-scale RISC processor with \( V_{\text{TH}} \)-hopping capability and the positive back-gate bias scheme was fabricated in a 0.6μm CMOS technology. The area overhead of the \( V_{\text{TH}} \)-hopping scheme was 14%. This includes the additional \( V_{\text{BSP}} \) and \( V_{\text{BSN}} \) lines in the standard cell area. A microphotograph of the RISC processor appears in Fig. 8. The size of the RISC core is 2.1mm × 2.0mm and the size of the \( V_{\text{BS}} \) selector is 0.2mm × 0.6mm.
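For scale, the quoted dimensions give the following areas (the symbols \( A_{\text{core}} \) and \( A_{\text{sel}} \) are introduced here only for this calculation):

\[
A_{\text{core}} = 2.1 \times 2.0 = 4.2\ \text{mm}^2, \qquad
A_{\text{sel}} = 0.2 \times 0.6 = 0.12\ \text{mm}^2, \qquad
\frac{A_{\text{sel}}}{A_{\text{core}}} \approx 2.9\%
\]

The text does not state whether the selector is counted within the 14% figure, so this ratio only indicates the selector's size relative to the core.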

In order to design the processor, a conventional place and route (P&R) tool [7] was used without modifying the standard cells. The only modifications are around the substrate/well contacts after the P&R is performed. The detailed P&R process for \( V_{\text{TH}} \)-hopping is shown in Fig. 9. First, P&R is executed using the conventional standard cells. In order to add metal lines for \( V_{\text{BSP}} \) and \( V_{\text{BSN}} \), the standard cells are placed at appropriate intervals, which can be done by using the conventional P&R tool with an appropriate parameter (see Fig. 9(a)). Next, the well contacts located on the \( V_{\text{DD}} \) line and the substrate contacts located on the ground line are removed by using a SKILL script [8] (see Fig. 9(b)). Finally, the n-well pattern, p-well pattern, \( V_{\text{BSP}} \) lines, \( V_{\text{BSN}} \) lines and well/substrate contacts are added to the gaps between the standard cells (see Fig. 9(c)). The advantage of this technique is that the standard cells need not be modified at all. If the standard cells can be modified, the overhead could be reduced to 9%.

Figure 10 shows the measurement results and SPICE simulation results of the RISC processor. $V_{FW}$ is the positive back-gate bias voltage and $\Delta V_{FW}$ is the peak-to-peak $V_{FW}$ variation, which is set to 0.1V ($\pm$10% of $V_{DD}$). Under this condition, the worst delay occurs at the lowest $V_{FW}$ and the worst power consumption is observed at the highest $V_{FW}$ due to the junction leakage current. The delay improves by 29% at 0.9V $V_{DD}$ with 0.6V $V_{FW}$. Leakage caused by the forward-biased junction under positive back-gate bias can be reduced by $V_{TH}$-hopping, and 91% leakage power reduction is possible in such a case.

**Conclusion**

A threshold voltage hopping ($V_{TH}$-hopping) scheme is proposed in which $V_{TH}$ is dynamically controlled through software depending on the workload of a processor. The $V_{TH}$-hopping scheme is shown to reduce the power to 18% of that of fixed low-$V_{TH}$ circuits in the 0.5V supply voltage regime for multimedia applications. $V_{TH}$-hopping is effective in low-$V_{DD}$ designs where $V_{TH}$ is low and the active leakage component is dominant in the total power consumption.

A small-scale RISC processor with $V_{TH}$-hopping and the positive back-gate bias scheme was fabricated, and 91% leakage power reduction was possible compared with the fixed positive back-gate bias scheme.

**Acknowledgment**

The VLSI chip in this study was fabricated in the chip fabrication program of the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Rohm Corporation and Toppan Printing Corporation.

This work was carried out under the Mirai-Kaitaku project.

**References**


