# SELF-TIMED CIRCUITRY RETROSPECTIVE

Victor Zakharov, Yury Stepchenkov, Yury Diachenko, Yury Rogdestvenski

Federal Research Center "Computer Science and Control" of Russian Academy of Sciences

# CONTENTS

- Self-timed circuit: what is it?
- Self-timed circuit features
- Self-timed Micro Core
- Self-timed Coprocessor
- Self-timed circuit optimization
- Conclusions

FRC CSC RAS

EnT-2020

### **CIRCUIT DESIGN METHODOLOGIES**




# SELF-TIMED CIRCUIT FEATURES

#### Advantages:

- Stable operation under any operating conditions
- Wide operating range
- Natural full self-checking with respect to stuck-at (constant) faults
- No hardware or energy overhead associated with a global clock tree

#### Drawbacks:

- Hardware redundancy
- Increased number of signals


# SELF-TIMED CIRCUIT TYPES

#### **Quasi-Speed-Independent (QSI) circuits**

- Insensitive to cell delays
- Indication of the critical path only

#### **Speed-Independent (SI) circuits**

- Insensitive to cell delays
- Full indication
- Fully self-timed within the isochronous zone

#### **Delay-Insensitive (DI) circuits**

- Insensitive to delays in both cells and wires
- Full indication


## **ISOCHRONOUS ZONE**

Within an isochronous zone, the delay of any wire is less than the smallest delay of any library cell
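The zone condition reduces to a single comparison against the fastest cell. A minimal sketch, assuming per-wire and per-cell delays are already known (for example from layout extraction); the function names, wire/cell names, and delay values are hypothetical placeholders, not data from these slides.

```python
# Hypothetical check of the isochronous-zone condition.
# Delays are assumed to be in picoseconds; all names/values are illustrative.

def is_isochronous(wire_delays_ps, cell_delays_ps):
    """True if every wire delay is below the smallest library cell delay."""
    if not cell_delays_ps:
        raise ValueError("cell library must contain at least one cell delay")
    min_cell = min(cell_delays_ps.values())
    return all(d < min_cell for d in wire_delays_ps.values())


def violating_wires(wire_delays_ps, cell_delays_ps):
    """Wires whose delay is not below the fastest cell delay."""
    min_cell = min(cell_delays_ps.values())
    return sorted(w for w, d in wire_delays_ps.items() if d >= min_cell)


if __name__ == "__main__":
    cells = {"INV": 12.0, "NAND2": 15.0, "C2": 30.0}  # hypothetical library
    wires = {"n1": 3.5, "n2": 8.0, "n3": 14.0}        # hypothetical nets
    print(is_isochronous(wires, cells))   # False
    print(violating_wires(wires, cells))  # ['n3']
```

Wires flagged this way would fall outside the zone, so an SI circuit relying on them could no longer be treated as delay-independent.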



#### In a deep-submicron CMOS process




#### SPEED-INDEPENDENT CIRCUIT BASIC PRINCIPLES

- Two-phase operation mode: working phase and spacer
- Dual-rail, bi-phase, and unary information signals
- Acknowledgment of the switching of all circuit cells by an additional indication subcircuit
- Handshake between successive functional blocks along the information processing path
- Unrestricted cell basis
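The two-phase discipline with dual-rail signals can be sketched as follows. This assumes a zero spacer (both rails at 0) and the common rail assignment 1 → (1, 0), 0 → (0, 1); bi-phase and unary signals from the list are not modelled, and the function names are hypothetical.

```python
# Sketch of dual-rail signalling with a zero spacer (assumed encoding).
SPACER = (0, 0)


def encode(bits):
    """Encode 0/1 values as dual-rail pairs (working phase)."""
    return [(1, 0) if b else (0, 1) for b in bits]


def phase(pairs):
    """Classify a dual-rail word as 'spacer', 'working' or 'transient'."""
    if any(p == (1, 1) for p in pairs):
        raise ValueError("(1, 1) is a forbidden dual-rail state")
    if all(p == SPACER for p in pairs):
        return "spacer"
    if all(p != SPACER for p in pairs):
        return "working"
    return "transient"  # some bits have switched, others have not yet


def decode(pairs):
    """Decode a complete working-phase word back to bits."""
    if phase(pairs) != "working":
        raise ValueError("word is not in a complete working phase")
    return [t for t, _f in pairs]


word = encode([1, 0, 1])
print(word)                             # [(1, 0), (0, 1), (1, 0)]
print(phase(word), decode(word))        # working [1, 0, 1]
print(phase([(1, 0), SPACER, (1, 0)]))  # transient
print(phase([SPACER] * 3))              # spacer
```

The "transient" state is exactly what the indication subcircuit must wait out: a new phase may start only after the whole word has reached "working" or "spacer".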


#### DELAY-INSENSITIVE CIRCUIT BASIC PRINCIPLES

NULL Convention Logic (NCL) is a typical representative of DI circuits

- Two-phase operation mode: working phase and spacer
- Dual-rail information signals only
- Each cell indicates all of its own inputs
- Limited multi-threshold cell library (29 cells)
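NCL cells are threshold gates with hysteresis. A behavioural sketch of a THmn gate, following standard NCL semantics: the output is set once at least m of the n inputs are 1, is reset only when all inputs have returned to 0 (NULL), and otherwise holds its state. The class name is hypothetical, and the weighted gates that make up part of the library are not modelled.

```python
# Behavioural model of an NCL THmn threshold gate with hysteresis.


class ThresholdGate:
    def __init__(self, m, n):
        if not 1 <= m <= n:
            raise ValueError("need 1 <= m <= n")
        self.m, self.n = m, n
        self.out = 0  # start in the NULL (spacer) state

    def eval(self, *inputs):
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs, got {len(inputs)}")
        ones = sum(inputs)
        if ones >= self.m:
            self.out = 1  # set: threshold reached
        elif ones == 0:
            self.out = 0  # reset: every input is back to NULL
        # otherwise: hold the previous output (hysteresis)
        return self.out


th23 = ThresholdGate(2, 3)
trace = [th23.eval(*v) for v in [(1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 0)]]
print(trace)  # [0, 1, 1, 0]
```

Because of the hold behaviour, the output cannot return to NULL until every input has done so; TH22 is the two-input Muller C-element.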


### SI AND NCL CIRCUITS COMPARISON

|            | SI circuits                                                                                                                                                                               | NCL circuits                                                                                                                                               |
|------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Advantages | <ul> <li>Lower complexity (up to 2 times for combinational circuits and up to 4 times for sequential ones)</li> <li>Unrestricted cell basis</li> <li>Available analysis software</li> </ul> | <ul> <li>Simple indication (only circuit outputs need to be indicated)</li> <li>Easier design automation (BALSA, UNCLE)</li> </ul> |
| Drawbacks  | <ul> <li>Additional indication subcircuit monitoring all circuit cells</li> <li>Design automation is hard to formalize</li> </ul> | <ul> <li>Higher complexity of both combinational and sequential circuits</li> <li>Lower performance</li> <li>Higher power consumption</li> </ul> |


#### SOFTWARE FOR SI CIRCUIT DESIGN DEVELOPED IN FRC "CSC"

**Self-timing properties analysis:**

- Library cell level analysis (BTRAN, ASIAN)
- Functional block level analysis (ASPECT)
- VLSI level hierarchical analysis (LIMAN)

**SI circuit synthesis:**

- Custom and gate array basis
- Industrial standard cell libraries extended by self-timed combinational and sequential cells




Gate Array


#### **Operation cycle duration for command sets**

|   | Operation set          | Synchronous (typical), ns | Speed-independent (worst), ns | Speed-independent (typical), ns | Speed-independent (best), ns |
|---|------------------------|---------------------------|-------------------------------|---------------------------------|------------------------------|
| 1 | Cyclic MUL             | 250                       | 166                           | 144                             | 118                          |
| 2 | Cyclic ROT             | 250                       | 121                           | 102                             | 86                           |
| 3 | Cyclic NOP             | 250                       | 111                           | 93                              | 75                           |
| 4 | Cyclic JUMP            | 500                       | 90                            | 78                              | 66                           |
| 5 | MUL + JUMP + NOP + ROT | 1248                      | 516                           | 440                             | 364                          |
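The speed advantage per command set is the ratio of the synchronous cycle to the SI typical cycle. A small sketch computing it from the table values; the dictionary and function names are illustrative only.

```python
# Cycle times in ns, copied from the table: (synchronous typical, SI typical)
CYCLE_NS = {
    "Cyclic MUL": (250, 144),
    "Cyclic ROT": (250, 102),
    "Cyclic NOP": (250, 93),
    "Cyclic JUMP": (500, 78),
    "MUL + JUMP + NOP + ROT": (1248, 440),
}


def speedup(sync_ns, si_ns):
    """Ratio of synchronous to SI cycle time (> 1 means the SI unit is faster)."""
    return sync_ns / si_ns


for name, (sync_ns, si_ns) in CYCLE_NS.items():
    print(f"{name:<24} {speedup(sync_ns, si_ns):.2f}x")
```

For these values the ratios are about 1.74 (MUL), 2.45 (ROT), 2.69 (NOP), 6.41 (JUMP), and 2.84 (mixed set).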


#### Hardware complexity on Gate Array basis

|   | Hardware     | Synchronous, gates | Speed-independent, gates |
|---|--------------|--------------------|--------------------------|
| 1 | Multiplier   | 177                | 444                      |
| 2 | Shifter      | 52                 | 214                      |
| 3 | Counter      | 88                 | 159                      |
| 4 | Command RAM  | 230                | 192                      |
| 5 | Control unit | 423                | 380                      |
|   | Total        | 970                | 1389                     |
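The totals and the overall hardware overhead of the SI version follow directly from the per-unit gate counts. A short sketch using the table values; the names are illustrative.

```python
# Gate counts from the table: unit -> (synchronous, speed-independent)
GATES = {
    "Multiplier": (177, 444),
    "Shifter": (52, 214),
    "Counter": (88, 159),
    "Command RAM": (230, 192),
    "Control unit": (423, 380),
}


def totals(table):
    """Sum the synchronous and SI gate counts separately."""
    sync = sum(s for s, _ in table.values())
    si = sum(x for _, x in table.values())
    return sync, si


sync_total, si_total = totals(GATES)
print(sync_total, si_total, f"{si_total / sync_total:.2f}")  # 970 1389 1.43
for unit, (sync, si) in GATES.items():
    print(f"{unit:<12} SI/sync = {si / sync:.2f}")
```

The datapath units (multiplier, shifter, counter) grow in the SI version, while the command RAM and control unit are slightly smaller, giving about 1.43 times more gates overall.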


### SI CIRCUIT EXAMPLES: COPROCESSOR




#### **Manufactured cases comparison**



Chart: performance (MOPS) and die size (mm<sup>2</sup>) of the manufactured cases


#### QSI and SI cases: operation time, ns

|   | U<sub>DD</sub>, V | T, °C | QSI DIV | QSI SQRT | SI DIV  | SI SQRT |
|---|-------------------|-------|---------|----------|---------|---------|
| 1 | 1.98              | -63   | 34.7    | 36.9     | 47.3    | 50.2    |
| 2 | 1.80              | 25    | 46.7    | 49.1     | 63.5    | 67.0    |
| 3 | 1.62              | 125   | 63.9    | 70.3     | 86.9    | 90.1    |
| 4 | 0.32              | 125   | 25 688  | 25 301   | 34 940  | 34 410  |
| 5 | 0.20              | 125   | -       | -        | 340 800 | 336 920 |
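The cost of full indication relative to critical-path-only indication can be read as the SI/QSI delay ratio per operation. A sketch computing it from rows 1 to 4 (row 5 has no QSI values); the names are illustrative.

```python
# Delays in ns from the table: (U_DD [V], T [deg C]) -> (QSI DIV, QSI SQRT, SI DIV, SI SQRT).
# Row 5 (0.20 V) is omitted because the table gives no QSI values for it.
ROWS = {
    (1.98, -63): (34.7, 36.9, 47.3, 50.2),
    (1.80, 25): (46.7, 49.1, 63.5, 67.0),
    (1.62, 125): (63.9, 70.3, 86.9, 90.1),
    (0.32, 125): (25688, 25301, 34940, 34410),
}


def si_to_qsi_ratios(rows):
    """SI delay divided by QSI delay, for DIV and SQRT in every row."""
    ratios = []
    for qsi_div, qsi_sqrt, si_div, si_sqrt in rows.values():
        ratios += [si_div / qsi_div, si_sqrt / qsi_sqrt]
    return ratios


for (udd, temp), (qd, qs, sd, ss) in ROWS.items():
    print(f"{udd:.2f} V, {temp:>4} C: DIV {sd / qd:.2f}, SQRT {ss / qs:.2f}")
```

All eight ratios fall between about 1.28 and 1.37, i.e. in these measurements the SI case is roughly 28 to 36 percent slower than the QSI case across the supply and temperature range.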


### SI CIRCUIT STRUCTURE

Factors limiting performance:

- Two-phase operation discipline
- An indication subcircuit acknowledging the completion of switching into the current phase
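One common way to build such an indication subcircuit is a tree of two-input Muller C-elements that merges per-cell indication signals. A sketch under stated assumptions: every cell supplies one indication signal (1 after reaching the working phase, 0 after reaching the spacer), and a balanced pairwise tree is used. This illustrates the principle and is not the authors' indication structure; the class names are hypothetical.

```python
# Hypothetical indication subcircuit: a balanced tree of 2-input C-elements.


class CElement:
    """Output follows the inputs when they agree; otherwise it holds."""

    def __init__(self):
        self.out = 0

    def eval(self, a, b):
        if a == b:
            self.out = a
        return self.out


class IndicationTree:
    def __init__(self, width):
        if width < 1:
            raise ValueError("width must be >= 1")
        self.width = width
        self.levels = []  # one list of C-elements per tree level
        n = width
        while n > 1:
            self.levels.append([CElement() for _ in range(n // 2)])
            n = n // 2 + n % 2

    def eval(self, signals):
        if len(signals) != self.width:
            raise ValueError("wrong number of indication inputs")
        level = list(signals)
        for cells in self.levels:
            nxt = [c.eval(level[2 * i], level[2 * i + 1]) for i, c in enumerate(cells)]
            if len(level) % 2:
                nxt.append(level[-1])  # odd signal passes to the next level
            level = nxt
        return level[0]


tree = IndicationTree(4)
print(tree.eval([1, 1, 0, 1]))  # 0: working phase not yet complete
print(tree.eval([1, 1, 1, 1]))  # 1: working phase acknowledged
print(tree.eval([0, 1, 0, 0]))  # 1: spacer not yet complete, output holds
print(tree.eval([0, 0, 0, 0]))  # 0: spacer acknowledged
```

Each phase change must propagate through all levels of the tree before the next phase may begin, which is why this subcircuit adds to the cycle time of every stage.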




### **SI CIRCUIT PIPELINE**

#### **Traditional indication**




### INDICATION SUBCIRCUIT STRUCTURE




#### INDICATION SUBCIRCUIT OPTIMIZATION

- Bitwise indication of the Logic within the same stage's Register
- Bitwise control of the previous stage's Register




I<sub>L</sub> is a bitwise indication output of the Logic

I<sub>R</sub> is a bitwise indication output of the Register



I<sub>L</sub> delay is greater than the other input delays

I<sub>L</sub> delay is close to the other input delays


#### **BITWISE INDICATION RULE**

Each output controls only those inputs on which it depends.
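Applying the rule requires knowing, for each output, which inputs it depends on (its input cone), and then inverting that map so every input bit knows which outputs' indication controls it. A sketch for an acyclic combinational netlist; the 2-bit ripple-carry adder below and all names are hypothetical examples, and the connectivity coefficient K is not computed here.

```python
from functools import lru_cache

# Hypothetical netlist: gate output -> fan-in signals.
# Signals not listed as keys are primary inputs (previous-stage register bits).
NETLIST = {
    "s0": ("a0", "b0"),
    "c1": ("a0", "b0"),
    "s1": ("a1", "b1", "c1"),
    "c2": ("a1", "b1", "c1"),
}
OUTPUTS = ("s0", "s1", "c2")


def input_cones(netlist):
    """Return a function mapping a signal to the primary inputs it depends on."""

    @lru_cache(maxsize=None)
    def cone(signal):
        if signal not in netlist:
            return frozenset({signal})
        return frozenset().union(*(cone(s) for s in netlist[signal]))

    return cone


def control_map(netlist, outputs):
    """Input bit -> outputs whose indication controls it (bitwise rule)."""
    cone = input_cones(netlist)
    ctrl = {}
    for out in outputs:
        for inp in cone(out):
            ctrl.setdefault(inp, set()).add(out)
    return {k: sorted(v) for k, v in sorted(ctrl.items())}


cone = input_cones(NETLIST)
print(sorted(cone("s0")))  # ['a0', 'b0']
print(control_map(NETLIST, OUTPUTS))
# {'a0': ['c2', 's0', 's1'], 'a1': ['c2', 's1'], 'b0': ['c2', 's0', 's1'], 'b1': ['c2', 's1']}
```

Here the register bits a1 and b1 wait only for s1 and c2, not for s0, which is what allows bitwise control to shorten the cycle compared with indicating the whole word.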

**Connectivity coefficient "K":** 




#### WALLACE TREE BITWISE INDICATION




For a two-stage 54-bit Wallace tree: $K_1 = 3$, $K_2 = 4$




#### WT'S PIPELINE BITWISE INDICATION




#### Simulation results and hardware estimates (65-nm bulk CMOS process)

| Indication case | Average cycle duration, ps | Complexity, CMOS transistors |
|-----------------|----------------------------|------------------------------|
| Classical       | 970                        | 220 000                      |
| Bitwise & group | 710                        | 225 500                      |
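The trade-off in the table can be quantified as speed-up, cycle-time reduction, and transistor-count growth. A short sketch using the table values; the names are illustrative.

```python
# Figures from the table (65-nm bulk CMOS)
CLASSICAL = {"cycle_ps": 970, "transistors": 220_000}
BITWISE_GROUP = {"cycle_ps": 710, "transistors": 225_500}


def compare(base, new):
    """Return (speed-up, relative cycle-time cut, relative hardware growth)."""
    speedup = base["cycle_ps"] / new["cycle_ps"]
    cycle_cut = 1 - new["cycle_ps"] / base["cycle_ps"]
    hw_growth = new["transistors"] / base["transistors"] - 1
    return speedup, cycle_cut, hw_growth


s, cut, growth = compare(CLASSICAL, BITWISE_GROUP)
print(f"speed-up {s:.2f}x, cycle time -{cut:.1%}, transistors +{growth:.1%}")
# speed-up 1.37x, cycle time -26.8%, transistors +2.5%
```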




### CONCLUSIONS

- Speed-independent (SI) circuitry is justified primarily in areas where high operational reliability is a determining factor
- Experimental results confirmed the advantages of SI circuits in operating range and performance. Under real operating conditions, typical speed-independent computing units processing low-bit-width data are 1.7–2.6 times faster than their synchronous analogs
- Bitwise or group indication and control in multi-bit SI circuits significantly speeds up their operation at the cost of a relatively small increase in hardware complexity (for the 65-nm Wallace tree pipeline, the average cycle dropped from 970 ps to 710 ps while the transistor count grew from 220 000 to 225 500)


# CONTACTS

- Address: Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of Russian Academy of Sciences, Russia, 119333, Moscow, Vavilov str., 44, building 2
- Director: academician Sokolov I.A.
  - Tel.: +7 (495) 137 34 94
  - Fax: +7 (495) 930 45 05
  - **E-mail: ISokolov@ipiran.ru**
- Speaker: Diachenko Y.G., YStepchenkov@ipiran.ru

#### Support

The research was carried out within the framework of state task No. 0063-2019-0010
