## Chapter 5

# **Design of Footless Dual-Rail Domino Circuit**

### 5.1 Design Motivation

One of the first realizations of static differential CMOS logic, known as Differential Cascode Voltage Switch Logic (DCVSL), was introduced in 1984 [10]. Since then, researchers have shown great interest in differential logic because of its potential to efficiently realize complex logic functions such as XOR/XNOR and multiplexing units, which form the basic building blocks of most datapath units. In addition, owing to its dual-rail nature, differential logic can be used to implement self-timed logic: a completion signal is generated when the two rails differ (i.e. after the switching is complete). Many changes to the basic DCVSL have been proposed to improve its performance. The dynamic implementation of DCVSL, Fig. 5.1, was shown in [19] to be the fastest and most energy-efficient technique, and it has been used in several commercial microprocessors as well as in our proposed microcontroller introduced in the previous chapters.

However, the presence of the foot transistor slows the gates somewhat, as it presents an extra series resistance. Removing this transistor, while functionally permissible, may result in static power dissipation and potentially a performance loss. To avoid these problems, two constraints must be met: (1) the gate enters the evaluation phase before valid inputs arrive; (2) the gate enters the precharge phase only after its inputs return to zero. Recently, design techniques that use delay elements for clocking [20] [21] have been introduced to realize high-performance footless domino circuits. However, delay elements tend to increase design complexity because they are sensitive to process-voltage-temperature (PVT) variations. Other designs use a data-driven technique in which the precharge signal is replaced by input signals [22]. However, because of the extra load added to the input signals, the circuit's speed does not benefit from the footless design.

To benefit from the speed enhancement of removing the foot transistor without using delay elements, we propose a new footless dual-rail domino circuit with a self-timed precharge scheme. In addition, we propose the use of a separator so that the whole circuit can be footless. Section 5.2 reviews the conventional domino circuits. Section 5.3 presents the proposed footless dual-rail domino circuit with the self-timed precharge scheme and the proposed separator, and Sections 5.3.4 and 5.3.5 describe the performance evaluation results. The performance comparison is based on a NAND chain designed with different logic circuits and implemented in both $0.15\mu$m SOI CMOS technology and 90nm bulk CMOS technology.

### 5.2 Conventional Domino Circuits

In this section, several conventional domino circuits with their own clocking schemes are briefly reviewed.

#### 5.2.1 Dynamic DCVSL Circuit

Fig. 5.1 shows a conventional dynamic DCVSL circuit. Its operation is divided into two major phases, precharge and evaluation, and the mode of operation is determined by the precharge signal. When the precharge signal goes low, all gates are precharged simultaneously: the precharge transistors Mp are turned on and the foot transistor Mn is turned off, so the outputs of the n-type dynamic gates are charged to VDD and the outputs of the inverters are set to zero. When the precharge signal goes high, Mp and Mn are turned off and on, respectively, and the circuit enters the evaluation phase. Depending on the incoming data, the pull-down network (PDN) may conduct and discharge the dynamic gate, in which case the output of the inverter makes a low-to-high transition. One disadvantage of this kind of domino circuit is that the foot transistor slows the gates somewhat, as it presents an extra series resistance. Moreover, simultaneous precharge may cause unacceptable IR-drop noise.
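The two-phase behaviour described above can be sketched at the logic level. The following Python model is only an illustration under stated assumptions: it assumes an AND/NAND gate with a series NMOS pair on one rail and a parallel pair on the other, the name `precharge_n` is hypothetical, and all electrical effects (charge sharing, keepers, delays) are ignored. Inputs are assumed to rise monotonically during evaluation.

```python
def dcvsl_and_nand(precharge_n, a, b):
    """Logic-level sketch of a footed dynamic DCVSL AND/NAND gate.

    precharge_n: hypothetical name for the precharge signal; False (low)
                 selects precharge, True (high) selects evaluation.
    a, b:        dual-rail inputs given as (true_rail, false_rail) tuples.
    Returns the two inverter outputs as (and_out, nand_out).
    """
    a_t, a_f = a
    b_t, b_f = b
    if not precharge_n:
        # Precharge: Mp on, foot Mn off. Both dynamic nodes sit at VDD,
        # so both inverter outputs are low.
        return (False, False)
    # Evaluation: Mp off, foot Mn on. A dynamic node is discharged when its
    # pull-down network conducts, and the inverter output then rises.
    and_out = a_t and b_t   # assumed series NMOS pair on the AND rail
    nand_out = a_f or b_f   # assumed parallel NMOS pair on the NAND rail
    return (and_out, nand_out)


def evaluation_complete(outputs):
    """The two rails differ once the gate has finished switching."""
    out, out_bar = outputs
    return out != out_bar


if __name__ == "__main__":
    one, zero, idle = (True, False), (False, True), (False, False)
    for label, args in [
        ("precharge", (False, one, one)),
        ("evaluating, inputs not yet valid", (True, idle, idle)),
        ("evaluating, A=1 B=0", (True, one, zero)),
    ]:
        outputs = dcvsl_and_nand(*args)
        print(label, outputs, "complete:", evaluation_complete(outputs))
```

The `evaluation_complete` helper mirrors the completion signal mentioned in Section 5.1: both outputs are low after precharge, and exactly one of them is high after evaluation.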

Fig. 5.1: Dynamic DCVSL gate [10].

#### 5.2.2 Delayed-Reset Domino Circuit

Fig. 5.2 illustrates the delayed-reset domino circuit (DR-domino) [20]. All domino gates are footless, except those connected to the primary inputs. Footless domino gates bring two benefits: improved pull-down speed and reduced precharge signal load. Eliminating the foot transistor does not affect operation in the evaluation phase. However, simultaneous precharge would cause short-circuit current. To ensure correct operation, the falling edge of a gate's precharge signal must be delayed until all of its inputs have gone low. This is why consecutive logic stages are driven by a series of delayed precharge signals. A side benefit of such a delayed-reset scheme is that the peak precharge current is reduced. However, the use of delay elements, together with the need for both footed and footless cell libraries, tends to increase design complexity.



Fig. 5.2: The delayed-reset domino circuit [20].

#### 5.2.3 Dual-Rail Data-Driven Dynamic Logic (D<sup>4</sup>L)

The D<sup>4</sup>L circuit uses input signals instead of a precharge signal to sequence precharge and evaluation correctly [22]. Consequently, clock-buffering and clock-distribution problems are eliminated. Furthermore, the foot transistor can be removed without causing a short-circuit problem. A two-input D<sup>4</sup>L gate is shown in Fig. 5.3. In this structure, a signal pair $(B, \overline{B})$ is used to precharge the gate instead of a precharge signal. When the precharging wave reaches the inputs of the D<sup>4</sup>L gate, it sets them low and the outputs are precharged high. In the evaluation phase, one of the rails in $(B, \overline{B})$ and $(A, \overline{A})$ is set high, which prevents a short circuit between VDD and ground in this phase. However, because of the extra load added to the input signals, the speed does not benefit from the footless design. Also, to ensure that no short circuit occurs during the precharge phase, the input signal pair that goes low last must be chosen to control the precharge sequence.



Fig. 5.3: Dual-Rail Data-Driven Dynamic Logic (D<sup>4</sup>L) [22].

### 5.3 Footless Dual-Rail Domino Circuit with Self-Timed Precharge Scheme

The presence of the foot transistor in the conventional dynamic DCVSL circuit slows the gate somewhat, as it presents an extra series resistance. To remove the transistor safely, two constraints must be met: (1) the gate enters the evaluation phase before valid inputs arrive; (2) the gate enters the precharge phase only after its inputs return to zero. We propose a footless dual-rail domino circuit with a self-timed precharge scheme that realizes a high-performance footless domino circuit while meeting these constraints. The self-timed precharge scheme is also expected to reduce the peak precharge current. Fig. 5.4 shows the AND/NAND gate of the proposed circuit. The self-timed precharge control logic consists of static CMOS inverters whose NMOS sources are tied to the input signals. These inverters generate the sub-precharge signals (PC1 to PC4) from the precharge signal P when the corresponding input signals are zero. The PMOS precharge tree above the pull-down network (PDN) precharges the gate.
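The two constraints can be written as simple ordering checks on event times. The sketch below is illustrative only; the function name and the four time arguments are hypothetical and stand for the instants, within one cycle, at which a gate enters evaluation, receives valid inputs, sees its inputs return to zero, and enters precharge.

```python
def check_footless_timing(t_eval, t_inputs_valid, t_inputs_low, t_precharge):
    """Return the list of constraints violated by one footless gate.

    All arguments are event times in the same (arbitrary) unit:
      t_eval          gate enters the evaluation phase
      t_inputs_valid  first valid (high) input arrives
      t_inputs_low    last input has returned to zero
      t_precharge     gate enters the precharge phase
    """
    violations = []
    if t_eval > t_inputs_valid:
        # Constraint (1): evaluation must start before valid inputs arrive.
        violations.append("constraint 1: inputs arrived before evaluation started")
    if t_precharge < t_inputs_low:
        # Constraint (2): precharge must wait until the inputs are zero,
        # otherwise VDD and ground are connected through the PDN.
        violations.append("constraint 2: precharge started while inputs were high")
    return violations


if __name__ == "__main__":
    print(check_footless_timing(0.0, 0.5, 3.0, 3.2))  # no violation
    print(check_footless_timing(0.0, 0.5, 3.0, 2.5))  # constraint 2 violated
```

An empty list means both constraints hold for that gate; a timing-based scheme must guarantee this with margins, whereas the self-timed scheme enforces constraint (2) by construction.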



Fig. 5.4: Footless dual-rail domino AND/NAND gate with self-timed precharge scheme.

#### 5.3.1 Conditions for Evaluation

When P goes low, the output nodes of the control-logic inverters are precharged to VDD by the PMOS transistors (p1 to p4), which in turn disable the corresponding PMOS transistors of the precharge tree. The circuit enters the evaluation phase as soon as one of these PMOS transistors is turned off. During that time, all the NMOS transistors (n1 to n4) of the control logic are in the cutoff region, so their gate-to-source capacitances (C<sub>GS</sub>) are almost zero, as shown in Fig. 5.5. In other words, the channel capacitance that the control logic adds to the PDN is extremely small during the evaluation phase. This matters because the load capacitance determines the switching delay of a logic gate. MOSFET capacitances can be divided into two main categories, channel capacitance and junction capacitance, as shown in Fig. 5.6. To reduce the junction capacitance, the circuit is implemented in silicon-on-insulator (SOI) technology, which significantly reduces the junction capacitance that the control logic adds to the PDN, as shown in Fig. 5.7. Consequently, the load per fan-in of the PDN during the evaluation phase is expected to be close to that of a single transistor. For this reason, a high-speed footless dual-rail domino circuit can be realized without the speed loss that the load capacitance of the control logic would otherwise cause.



Fig. 5.5: Illustration of self-timed precharge control logic with NMOS cutoff in evaluation phase.

Fig. 5.6: Bulk MOS structure.

Fig. 5.7: SOI MOS structure.

#### 5.3.2 Conditions for Precharge

For correct precharge sequencing, the proposed circuit is implemented with a self-timed precharge scheme. When P goes high, the output nodes of the control-logic inverters go low only after the input signals go low, which in turn switches on the PMOS transistors of the precharge tree. This ensures that the circuit enters the precharge phase only when all the input signals are low, thus preventing short-circuit current. Another benefit of the self-timed precharge scheme is that the peak precharge current is reduced.
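The evaluation and precharge conditions of Sections 5.3.1 and 5.3.2 can be summarized in a small logic-level sketch. It is an interpretation of the text rather than a transcription of Fig. 5.4: it assumes that each sub-precharge node is pulled to VDD while P is low and follows its input while P is high, and that the precharge-tree PMOS transistors are in series, so that the gate precharges only when all of them are on. The function names are hypothetical.

```python
def sub_precharge(p, x):
    """Logic value of one sub-precharge signal PC_i (True = high).

    p: precharge signal P (True = high)
    x: the input signal tied to the source of the inverter NMOS
    """
    if not p:
        # P low: the inverter PMOS pulls PC_i to VDD (evaluation).
        return True
    # P high: the inverter NMOS ties PC_i to the input, so PC_i can fall
    # only once that input is low (simplified, purely logical view).
    return x


def gate_precharging(p, inputs):
    """True when every precharge-tree PMOS is on (assumed series tree)."""
    return all(not sub_precharge(p, x) for x in inputs)


if __name__ == "__main__":
    all_low = [False, False, False, False]
    one_high = [False, True, False, False]
    print(gate_precharging(False, all_low))   # P low: evaluation phase
    print(gate_precharging(True, one_high))   # P high, an input still high
    print(gate_precharging(True, all_low))    # P high, all inputs low
```

In this view the gate precharges exactly when P is high and every input is low, which is the condition that rules out a conducting path from VDD through the precharge tree and the PDN to ground.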

#### 5.3.3 Separator for Precharge Chain

We have proposed the use of separators to divide the proposed footless dual-rail domino circuit into several precharge chains, as shown in Fig. 5.8. The different precharge chains can then start precharging simultaneously without erroneous operation. This shortens the precharge time, especially for a long logic chain, at the cost of a comparatively slight increase in evaluation time caused by the additional propagation delay of the separators. Fig. 5.9 shows the simulated precharge time and evaluation time of the proposed circuit as a function of the number of separators used in a 20-stage fan-out-of-8 (FO8) NAND chain. The simulation is based on $0.15\mu$m SOI CMOS technology.

In addition, with the separator, all gates, including those connected to the primary inputs, can be footless. The direct benefit is that only one footless cell library needs to be developed. When P goes high, the PMOS transistor and the NMOS transistor of the separator are off and on, respectively, and the output node is pulled down to ground. This initiates the precharging of all precharge chains simultaneously, even when the primary inputs are not zero. When P goes low, the PMOS transistor and the NMOS transistor of the separator are on and off, respectively, and the output node follows the input signal for evaluation.
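At the logic level, the separator behaves like a gate that forces its output low during precharge and passes the input during evaluation. The sketch below is a minimal illustration of that behaviour only; the function names are hypothetical and the transistor-level details of Fig. 5.8 are not modelled.

```python
def separator(p, data_in):
    """Logic-level sketch of one separator rail.

    p high: NMOS on, PMOS off, output pulled to ground.
    p low:  PMOS on, NMOS off, output follows the input.
    """
    return False if p else data_in


def separate_pair(p, pair):
    """Apply the separator to both rails of a dual-rail signal."""
    return tuple(separator(p, rail) for rail in pair)


if __name__ == "__main__":
    primary_input = (True, False)              # a valid dual-rail value
    print(separate_pair(True, primary_input))  # precharge: both rails forced low
    print(separate_pair(False, primary_input)) # evaluation: value passed through
```

Because both rails are forced low while P is high, the gates that follow a separator see all-zero inputs and can precharge even though the signal in front of the separator is still valid.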

#### 5.3.4 Performance Evaluation in SOI CMOS Technology

We have designed a 20-stage FO8 NAND chain (Fig. 5.10) with the proposed circuit, dynamic DCVSL, complementary pass-transistor logic (CPL), and conventional static CMOS for performance evaluation. The delay time is measured from the data input to the output. The separator is employed only in the first stage of the proposed circuit. The different circuits were fabricated in a test chip in $0.15\mu$m SOI CMOS technology (Fig. 5.11).

Measurement results (Table 5.1) show that the proposed circuit achieves 2.57, 1.72, and 1.12 times speed improvement over the circuits implemented with CPL, conventional static CMOS, and dynamic DCVSL, respectively. The result shows that the foot transistor is removed effectively in the proposed circuit, without the self-timed precharge control logic degrading the speed. The area of the proposed circuit, however, is about twice that of the dynamic DCVSL because of the control logic and the PMOS precharge tree.



Fig. 5.8: Separator for precharge chain.



Fig. 5.9: Precharge time & evaluation time as a function of number of separators.



Fig. 5.10: NAND chain as performance evaluation.



Fig. 5.11: Overview of chip fabricated.

Table 5.1: Measurement results in SOI CMOS technology.

| Type (FO8)       | Delay time, 20-stage [ns] (VDD = 1.0 V) | Delay ratio (proposed = 1.0) | Area [µm<sup>2</sup>] | Wn/Wp   |
|------------------|-----------------------------------------|------------------------------|-----------------------|---------|
| Static CMOS      | 5.56                                    | 1.72                         | 5117                  | 1.0/1.8 |
| CPL              | 8.28                                    | 2.57                         | 11597                 | 1.0     |
| Dynamic DCVSL    | 3.61                                    | 1.12                         | 18017                 | 1.0     |
| Proposed Circuit | 3.23                                    | 1.00                         | 35294                 | 1.0     |
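The delay and area ratios quoted for the SOI measurements can be recomputed from the delay and area columns of Table 5.1. The short script below is only a convenience for checking that arithmetic; the variable and function names are hypothetical. Since the tabulated delays are rounded, a recomputed ratio may differ from the tabulated one in the last digit.

```python
# (20-stage delay in ns at VDD = 1.0 V, area in square micrometres), Table 5.1
measurements = {
    "Static CMOS": (5.56, 5117),
    "CPL": (8.28, 11597),
    "Dynamic DCVSL": (3.61, 18017),
    "Proposed Circuit": (3.23, 35294),
}


def relative_to(reference, data):
    """Return {name: (delay ratio, area ratio)} relative to `reference`."""
    ref_delay, ref_area = data[reference]
    return {
        name: (delay / ref_delay, area / ref_area)
        for name, (delay, area) in data.items()
    }


ratios = relative_to("Proposed Circuit", measurements)

if __name__ == "__main__":
    for name, (delay_ratio, area_ratio) in ratios.items():
        print(f"{name:17s} delay x{delay_ratio:.2f}  area x{area_ratio:.2f}")
```

A delay ratio above 1 means the circuit is slower than the proposed one, and an area ratio below 1 means it is smaller; the inverse of the dynamic DCVSL area ratio gives the area overhead of the proposed circuit.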

#### 5.3.5 Performance Evaluation in Bulk CMOS Technology

To evaluate the performance in bulk CMOS technology, we have also implemented the proposed circuit in ASPLA/STARC 90nm bulk CMOS technology. 20-stage FO4 and FO8 NAND chains were designed with the proposed circuit, dynamic DCVSL [10], D<sup>4</sup>L [22], DR-domino [20], and conventional static CMOS. The separator is employed only in the first stage of the proposed circuit. The different circuits were fabricated in a test chip with a chip size of $2.5 \times 2.5$ mm<sup>2</sup> (Fig. 5.12).



Fig. 5.12: (a) Circuit layout (b) overview of chip fabricated.

Fig. 5.13 and Fig. 5.14 show the measured delay time of the 20-stage FO4 and FO8 NAND chains, respectively. $\Delta t$ is defined as the time that elapses between the domino circuit entering the evaluation phase and the arrival of the input signal. The delay time is measured from the data input to the output. Fig. 5.15 plots the delay time of the 20-stage FO8 NAND chain as a function of fan-out. It shows that the proposed circuit not only remains faster than the dynamic DCVSL in bulk CMOS technology, but also achieves a 5% speed improvement over DR-domino in the FO8 NAND chain. This again demonstrates the speed advantage of the new footless domino design, especially in high fan-out circuits. It is also worth noting that DR-domino, because of its delay elements, requires a $\Delta t$ of 4.5ns to cover their latency before it achieves stable, high-speed operation. We expect that, as PVT variations increase significantly with continued technology scaling, the design of such timing-dependent footless domino circuits will become even more complicated. The area of the proposed circuit, however, is 7.55, 1.91, 1.68, and 1.14 times that of the circuits implemented with conventional static CMOS, D<sup>4</sup>L, dynamic DCVSL, and DR-domino, respectively (these ratios correspond to the FO4 chain in Table 5.2). Table 5.2 and Table 5.3 summarize the measurement results.
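The area overhead can be recomputed from the area columns of Table 5.2 and Table 5.3. The sketch below is only an arithmetic check with hypothetical names; it suggests that the ratios quoted above (7.55, 1.91, 1.68, 1.14) match the FO4 chain, while the FO8 chain gives somewhat different values.

```python
# Areas in square micrometres, taken from Table 5.2 (FO4) and Table 5.3 (FO8)
areas = {
    "FO4": {
        "Static CMOS": 208,
        "D4L": 824,
        "Dynamic DCVSL": 936,
        "DR-domino": 1376,
        "Proposed Circuit": 1570,
    },
    "FO8": {
        "Static CMOS": 336,
        "D4L": 1751,
        "Dynamic DCVSL": 1590,
        "DR-domino": 2448,
        "Proposed Circuit": 2983,
    },
}


def area_overhead(table, proposed="Proposed Circuit"):
    """Area of the proposed circuit divided by the area of each other circuit."""
    reference = table[proposed]
    return {
        name: reference / area
        for name, area in table.items()
        if name != proposed
    }


overheads = {fanout: area_overhead(table) for fanout, table in areas.items()}

if __name__ == "__main__":
    for fanout, table in overheads.items():
        for name, ratio in table.items():
            print(f"{fanout}  proposed / {name:14s} = {ratio:.2f}")
```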



Fig. 5.13: Delay time of 20-stage FO4 NAND chain.



Fig. 5.14: Delay time of 20-stage FO8 NAND chain.

Fig. 5.15: Delay time of 20-stage FO8 NAND chain as a function of fan-out.

Table 5.2: Measurement results of FO4 NAND chain.

| Type (FO4)       | Delay time, 20-stage [ns] (VDD = 1.0 V, $\Delta t$ = 10 ns) | Delay ratio (proposed = 1.0) | Precharge time [ns] | Area [µm<sup>2</sup>] | Wn/Wp     |
|------------------|-------------------------------------------------------------|------------------------------|---------------------|-----------------------|-----------|
| Static CMOS      | 1.24                                                        | 1.18                         |                     | 208                   | 0.54/0.82 |
| D<sup>4</sup>L   | 1.19                                                        | 1.13                         | 3.14                | 824                   | 0.7       |
| Dynamic DCVSL    | 1.22                                                        | 1.16                         | 0.16                | 936                   | 0.7       |
| DR-domino        | 1.03                                                        | 0.98                         | 5.86                | 1376                  | 0.7       |
| Proposed Circuit | 1.05                                                        | 1.00                         | 7.08                | 1570                  | 0.7       |

Table 5.3: Measurement results of FO8 NAND chain.

| Type (FO8)       | Delay time, 20-stage [ns] (VDD = 1.0 V, $\Delta t$ = 10 ns) | Delay ratio (proposed = 1.0) | Precharge time [ns] | Area [µm<sup>2</sup>] | Wn/Wp     |
|------------------|-------------------------------------------------------------|------------------------------|---------------------|-----------------------|-----------|
| Static CMOS      | 1.99                                                        | 1.36                         |                     | 336                   | 0.54/0.82 |
| D<sup>4</sup>L   | 1.72                                                        | 1.15                         | 4.06                | 1751                  | 0.7       |
| Dynamic DCVSL    | 1.65                                                        | 1.11                         | 0.40                | 1590                  | 0.7       |
| DR-domino        | 1.56                                                        | 1.05                         | 6.08                | 2448                  | 0.7       |
| Proposed Circuit | 1.49                                                        | 1.00                         | 9.64                | 2983                  | 0.7       |

Fig. 5.16 and Fig. 5.17 plot the delay time of the 20-stage NAND chain as a function of the power supply voltage. As could be expected, the performance depends linearly on the power supply voltage. The sensitivity of the different circuits to the power supply voltage is summarized in Table 5.4 and Table 5.5 for the FO4 and FO8 NAND chains, respectively. When the supply voltage is lowered from 1.2V to 0.5V, the delay time of the proposed circuit increases by 5.3 and 6.0 times its value at 1.2V in FO4 and FO8, respectively. In contrast, the delay time of the dynamic DCVSL increases by 7.8 and 6.9 times its value at 1.2V in FO4 and FO8, respectively. We can hence conclude that the proposed circuit tolerates supply-voltage scaling better than the dynamic DCVSL, which shows its potential for future CMOS technologies with lower power supply voltages.
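The supply-voltage figures can be reproduced from the delays at 1.2V and 0.5V listed in Table 5.4 and Table 5.5. The sketch below, with hypothetical names, computes two quantities: the slope $\Delta$delay/$\Delta$VDD reported in the tables, and the increase in delay relative to the delay at 1.2V. The factors quoted in the text (5.3, 6.0, 7.8, 6.9) appear to correspond to the second quantity.

```python
# (delay at 1.2 V, delay at 0.5 V) in ns, from Table 5.4 (FO4) and Table 5.5 (FO8)
delays = {
    "FO4": {"Dynamic DCVSL": (0.98, 8.62), "Proposed Circuit": (0.87, 5.45)},
    "FO8": {"Dynamic DCVSL": (1.28, 10.07), "Proposed Circuit": (1.19, 8.30)},
}


def sensitivity(delay_high_v, delay_low_v, v_high=1.2, v_low=0.5):
    """Return (slope in ns/V, increase relative to the delay at v_high)."""
    increase = delay_low_v - delay_high_v
    slope = increase / (v_high - v_low)
    relative_increase = increase / delay_high_v
    return slope, relative_increase


results = {
    fanout: {name: sensitivity(*pair) for name, pair in table.items()}
    for fanout, table in delays.items()
}

if __name__ == "__main__":
    for fanout, table in results.items():
        for name, (slope, rel) in table.items():
            print(f"{fanout} {name:17s} slope {slope:5.2f} ns/V, "
                  f"delay increase {rel:.1f}x the 1.2 V delay")
```

For both fan-outs the proposed circuit shows the smaller slope and the smaller relative increase, which is the basis for the conclusion drawn above.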



Fig. 5.16: Delay time of 20-stage FO4 NAND chain as a function of power supply voltage.



Fig. 5.17: Delay time of 20-stage FO8 NAND chain as a function of power supply voltage.

Table 5.4: Performance efficiency of FO4 NAND chain due to power supply voltage.

| Type (FO4)       | Delay time at VDD = 1.2 V [ns] | Delay time at VDD = 0.5 V [ns] | Ratio ($\Delta$ delay time / $\Delta$ VDD) [ns/V] |
|------------------|--------------------------------|--------------------------------|---------------------------------------------------|
| Static CMOS      | 1.01                           | 6.54                           | 7.91                                              |
| D<sup>4</sup>L   | 0.98                           | 6.79                           | 8.30                                              |
| Dynamic DCVSL    | 0.98                           | 8.62                           | 10.91                                             |
| DR-domino        | 0.86                           | 5.74                           | 6.97                                              |
| Proposed Circuit | 0.87                           | 5.45                           | 6.54                                              |

Table 5.5: Performance efficiency of FO8 NAND chain due to power supply voltage.

| Type (FO8)       | Delay time at VDD = 1.2 V [ns] | Delay time at VDD = 0.5 V [ns] | Ratio ($\Delta$ delay time / $\Delta$ VDD) [ns/V] |
|------------------|--------------------------------|--------------------------------|---------------------------------------------------|
| Static CMOS      | 1.63                           | 10.82                          | 13.13                                             |
| D<sup>4</sup>L   | 1.36                           | 9.69                           | 11.90                                             |
| Dynamic DCVSL    | 1.28                           | 10.07                          | 12.56                                             |
| DR-domino        | 1.27                           | 8.30                           | 10.04                                             |
| Proposed Circuit | 1.19                           | 8.30                           | 10.16                                             |

The energy consumption of the different circuits, obtained by transistor-level circuit simulation, is shown in Fig. 5.18. The evaluation of energy consumption is divided into two phases, the evaluation phase and the precharge phase. For each phase, the energy consumed in a period of 10ns is used for the calculation. The energy consumption of the proposed circuit is 1.7 and 1.1 times that of the dynamic DCVSL in the evaluation phase and the precharge phase, respectively.



Fig. 5.18: Energy consumption of 20-stage FO8 NAND chain: (a) evaluation phase, (b) precharge phase.

## Chapter 6

# **Conclusions**

### 6.1 Microcontroller with Completion Detection Capability

In this paper, a method for designing an 8-bit non-pipelined microcontroller with completion detection capability is presented. By using dual-rail domino circuits, completion and errors can be detected, which eliminates the need for a globally distributed clock. We have also proposed an adaptive self-recovery mechanism to enhance the reliability of the microcontroller. The microcontroller is based on the instruction set of the Z80 microcontroller and uses a multiplexer architecture. To enhance its power efficiency, low-power design techniques suitable for dual-rail logic have been proposed, such as selective evaluation and a new multiplexer based on a bulb and junction structure. With these low-power design efforts, the energy consumption and the peak precharge current are reduced by 53% and 68%, respectively, compared to a design without low-power considerations.

To evaluate its performance, the microcontroller was implemented in Rohm $0.35\mu$m CMOS technology with a chip size of $4.9 \times 4.9$ mm<sup>2</sup>. The measurement results show that the fabricated chip functions correctly, with an average evaluation time of 23.3ns and a precharge time of 2.2ns at the nominal power supply voltage of 3.3V. It also demonstrates the two properties that we want to exploit in the microcontroller, namely average-case performance and adaptation of the performance to the physical properties.

The microcontroller senses when a computation has completed, which allows it to exhibit average-case performance regardless of propagation delay variations due to instruction dependency, data dependency, and inter-chip variability. Measurement results reveal that the microcontroller achieves a speed improvement of 17% over its synchronous counterparts in terms of instruction dependency. Also, the average-case performance achieves 1.17, 1.17, 1.02, and 1.04 times performance gain over the worst-case performance for the addition, subtraction, AND, and OR instructions, respectively. In addition, a further performance improvement of 4.4% and 2.9% is achieved in the instruction dependency measurement and the data dependency measurement, respectively, when the influence of inter-chip variability is taken into account.

It is easier to vary the supply voltage in the microcontroller, since there is no need to coordinate a simultaneous variation of the clock frequency. The microcontroller runs as quickly as the current physical properties allow. Also, the performance variation increases significantly as the power supply voltage is scaled down, causing more severe timing degradation in the worst case than in the average case. We can hence conclude that the microcontroller with average-case performance would perform better than its synchronous counterparts when operating at a low power supply voltage.

Given the speed advantage obtained from average-case performance in $0.35\mu$m CMOS technology, together with the automatic adaptation to the physical properties, we expect that, as technology scales down to sub-100nm feature sizes, the microcontroller with completion detection capability will achieve a larger performance gain and higher reliability against PVT variations than its synchronous counterparts.

### 6.2 Footless Dual-Rail Domino Circuit

In addition, this paper presents a new footless dual-rail domino circuit that combines a footless dynamic circuit technique with a robust self-timed precharge scheme for high-performance VLSI circuit design. With the self-timed precharge scheme, the use of delay elements as a timing reference can be avoided. This matters because, as PVT variations increase significantly with continued technology scaling, the design of timing-dependent footless domino circuits will become even more complicated. Moreover, with the proposed separator, the whole circuit can be built from footless dual-rail domino gates.

20-stage NAND chains were implemented in both $0.15\mu$m SOI CMOS technology and 90nm bulk CMOS technology for performance evaluation. Measurement results reveal that the proposed circuit achieves 1.72, 2.57, and 1.12 times speed improvement over the circuits implemented with conventional static CMOS, CPL, and dynamic DCVSL, respectively, in the 20-stage FO8 NAND chain implemented in $0.15\mu$m SOI CMOS technology. Also, the proposed circuit achieves 1.36, 1.15, 1.11, and 1.05 times speed improvement over the circuits implemented with conventional static CMOS, D<sup>4</sup>L, dynamic DCVSL, and DR-domino, respectively, in the 20-stage FO8 NAND chain implemented in 90nm bulk CMOS technology.

Also, the proposed circuit performs much better than the dynamic DCVSL when the power supply voltage is scaled down, showing its potential in the future CMOS technology with lower power supply voltage.

## Acknowledgement

I would like to express my sincere gratitude to Prof. Kunihiro Asada for his keen insight, guidance, encouragement, and faith in me throughout my graduate studies. His enthusiasm for teaching and research offered challenging opportunities to express my creativity without barriers, and his constant support and fruitful discussion on my research, since my undergraduate years, led me to become a full-fledged member of society and brought my research to success.

I am deeply grateful to Prof. Makoto Ikeda for meaningful discussion on my research and for making many opportunities for my chip fabrication. He was willing to spend time providing a comfortable environment to promote my research progress, and his constructive support was indispensable for making my research activities successful.

I am thankful to all the colleagues in Asada-Ikeda laboratory for their helpful advice, heartfelt encouragement, comfortable research environment and pleasant time: in particular, Dr. Ruotong Zheng, for his generous assistance in establishing the chip design environment; Dr. Satoshi Komatsu and Dr. Masahiro Sasaki, for their practical advice on chip test and analysis; Mr. Tetsuya Iizuka, for his contribution as network administrator of the laboratory; Mr. Ken Ishii and Mr. Taku Sogabe, the co-researchers of this project.

I would like to thank all the members of VLSI Design and Education Center (VDEC), the University of Tokyo, for their support in chip fabrication. The VLSI chips in this study have been designed with CAD tools from Synopsys, Inc., Mentor Graphics, Inc., and Cadence Design Systems, Inc., and fabricated through the chip fabrication program of VDEC, in collaboration with Rohm Corp., Oki Electric Industry Corp., Semiconductor Technology Academic Research Center (STARC), and Toppan Printing Corp.

Finally, I would like to express my greatest appreciation to my parents, Dia Swee Tong and Yeap Chong Mooi, for their constant support and encouragement in my life.

## References

- [1] T. Masuhara, "Design and Wireless," ITRS Press Release, 2004.
- [2] S. R. Nassif, "Within-chip variability analysis," in Proc. of IEEE International Electron Devices Meeting (IEDM), pp. 283-286, 1998.
- [3] K. A. Bowman, J. D. Meindl, "Impact of Within-Die Parameter Fluctuations on Future Maximum Clock Frequency Distributions," in Proc. of IEEE Custom Integrated Circuits Conference (CICC), pp. 229-232, 2001.
- [4] A. Davis, and S. M. Nowick, "An Introduction to Asynchronous Circuit Design," Technical Report UUCS-97-013, Computer Science Department, University of Utah, 1997.
- [5] K. R. Heloue, and F. N. Najm, "Effect of Statistical Clock Skew Variations on Chip Timing Yield," in Proc. of IEEE-NEWCAS Conference, pp. 211-214, 2005.
- [6] M. Hashimoto, T. Yamamoto, H. Onodera, "Statistical Analysis of Clock Skew Variation in H-tree Structure," in Proc. of IEEE International Symposium on Quality Electronic Design (ISQED), pp. 402-407, 2005.
- [7] A. Forestier, and M. R. Stan, "Limits to Voltage Scaling from the Low Power Perspective," in Proc. of Symposium on Integrated Circuits and Systems Design, pp. 365-370, 2000.
- [8] V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, and F. Baez, "Reducing Power in High-performance Microprocessors," in Proc. of IEEE Design Automation Conference (DAC), pp. 732-737, 1998.
- [9] S. Hauck, "Asynchronous design methodologies: An overview," Univ. Washington, Tech. Rep. UW-CSE-93-05-07, 1993.

- [10] L. G. Heller, W. R. Griffin, J. W. Davis, and N. G. Thoma, "Cascode voltage switch logic: A differential CMOS logic family," in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), pp. 16-17, 1984.
- [11] D. A. Rennels, and H. Kim, "Concurrent Error Detection in Self-Timed VLSI," in Proc. of International Symposium on Fault-Tolerant Computing, pp. 96-105, 1994.
- [12] A. Ejnioui, and A. Alsharqawi, "Pipeline-Level Control of Self-Resetting Pipelines," in Euromicro Symposium on Digital System Design (DSD), pp. 342-349, 2004.
- [13] S. Komatsu, M. Ikeda, and K. Asada, "Low Power Microprocessors for Comparative Study on Bus Architecture and Multiplexer Architecture," in Proc. of IEEE Asia and South Pacific Design Automatic Conference (ASP-DAC), pp. 323-324, 1998.
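- [14] 額田忠之, "Z80 Family Handbook" (Z80 ファミリ・ハンドブック), CQ Publishing, 1991 (in Japanese).
- [15] N. Li, "Design of a Low-Noise Processor Using Dual-Rail Dynamic Logic" (二線式ダイナミック論理による低雑音プロセッサの設計), Bachelor's thesis, Department of Electronic Engineering, The University of Tokyo, 2003 (in Japanese).
- [16] R. Karri, K. Hogstedt, and A. Orailoglu, "Computer-Aided Design of Fault-Tolerant VLSI Systems," in Proc. of IEEE Design & Test of Computers, pp. 88-96, 1996.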
- [17] P. Larsson and C. Svensson, "Noise in digital dynamic CMOS circuits," IEEE Journal of Solid-State Circuits, Vol. 29, No. 6, June 1994.
- [18] K. L. Shepard and V. Narayanan, "Noise in Deep Submicron Digital Design," in Proc. of IEEE International Conference on Computer-Aided Design (ICCAD), pp. 524-531, 1996.
- [19] P. Ng, P. T. Balsara, and D. Steiss, "Performance of CMOS Differential Circuits," IEEE Journal of Solid-State Circuits, Vol. 31, No. 6, pp. 841-846, June 1996.
- [20] P. Hofstee, et al., "A 1 GHz Single-Issue 64b PowerPC Processor," in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), pp. 92-93, 2000.
- [21] J. Wang, S. Shieh, C. Yeh, and Y. Yeh, "Pseudo-Footless CMOS Domino Logic Circuits for High-Performance VLSI Designs," in Proc. of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 2, pp. 401-404, 2004.
- [22] R. Rafati, A. Z. Charaki, G. R. Chaji, S. M. Fakhraie, and K. C. Smith, "Comparison of a 17b Multiplier in Dual-Rail Domino and in Dual-Rail D3L (D4L) Logic Styles," in Proc. of IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 3, pp. 257-260, 2002.

# **List of Publications**

### **International Conferences**

- K. H. Dia, R. Zheng, M. Ikeda, and K. Asada, "Footless Dual-Rail Domino Circuit with Self-Timed Precharge Scheme," *in Proc. of IEEE Asian Solid-State Circuits Conference (ASSCC)*, pp. 309 – 312, Nov. 2005.

### **Domestic Conferences and Meetings**

- K. H. Dia, R. Zheng, M. Ikeda, and K. Asada, "Footless Dual-Rail Domino Circuit with Self-Timed Precharge Scheme in SOI Technology," *IEICE Technical Report*, Vol. 105, No. 569, pp. 47 – 51, Jan. 2006.
- K. H. Dia, R. Zheng, M. Ikeda, and K. Asada, "Design of Completion Detection Style Microcontroller with Dual-Rail Domino Circuit," *in Proc. of IEICE National Convention 2006*, Mar. 2006. (to appear)