

# International Journal of Information Technology & Computer Engineering



Email: ijitce.editor@gmail.com or editor@ijitce.com



Volume 9,Issue 2,June 2021

# Sub-threshold circuit standby and active energy optimization at the same time is possible thanks to this technique

<sup>1</sup>Dr.THOTA SRAVANTI, <sup>2</sup>LAXMAN BAVANDLAPALLY <sup>3</sup>B.MADHUKAR, <sup>4</sup>G.LAKSHMIKALA

### **ABSTRACT**

Leakage current increases dramatically as the feature size and threshold voltage of CMOS circuits are scaled down. Consequently, reducing leakage power in both active and standby modes is a critical design issue as technology scales. This work proposes a combined active and standby energy optimization technique for 22 nm sub-threshold CMOS circuits. To reduce active energy consumption per cycle, a dual threshold voltage design is explored first. A slack-based genetic algorithm assigns reverse body bias to gates on non-critical paths to obtain the lowest active energy per cycle and the highest frequency at the optimal supply voltage. In addition, a lower triangular encoder for IEEE 802.11n wireless LAN is developed with a block length of 648 and a code rate of 1/2. The hardware implementation of the LDPC encoder runs at 301.433 MHz and achieves a throughput of 12.12 Gbps.

### 1. INTRODUCTION

With every technology generation, the leakage power of digital CMOS circuits grows substantially. As a result of these scaling issues, power reduction has become an essential design consideration. In the sub-threshold region the current depends exponentially on voltage, which causes considerable delay, whereas above-threshold operation follows a power law [1].

Error correction based on low-density parity-check (LDPC) codes has grown in popularity as a result of advances in VLSI, the more recent work of MacKay and Neal, and Gallager's original idea from the early 1960s. LDPC codes are used or planned in current and future communication systems such as WLAN (IEEE 802.11n), Mobile WiMAX (IEEE 802.16e), DVB-S2, and 10GBase-T (IEEE 802.3an). However, the high encoding complexity of LDPC codes can offset their advantages in speed and decoding complexity.

M.Tech, Ph.D, Associate Professor, sravanti815@gmail.com; M.Tech, Assistant Professor, lax406@gmail.com; M.Tech, Assistant Professor, enlightment2035@gmail.com; M.Tech, Assistant Professor, lakshmikala18051991@gmail.com. Department: ECE, Pallavi Engineering College, Hyderabad, Telangana 501505



This study uses QC-LDPC codes. Quasi-cyclic (QC) LDPC codes are a structured class of LDPC codes that is simpler to implement while offering comparable performance.

The energy per cycle (EPC, the power-delay product) of circuits operating below the threshold can be very high even though power consumption is low, because of the substantial delay of these circuits. The EPC is therefore the most important metric for judging whether such a design works. The EPC of a circuit with N gates can be estimated with reasonable accuracy using Equation (3), whose first and second terms are the dynamic and leakage energy, respectively.

$$E = \sum_{i=1}^{N} \left[ 0.5\,\alpha(i)\,C(i)\,V_{dd}^{2} + P_{leak}(i)\,T \right] \quad (3)$$

where $\alpha(i)$ is the switching activity of the i<sup>th</sup> node, C(i) is the capacitance of the i<sup>th</sup> gate, Pleak(i) is the leakage power of the i<sup>th</sup> gate, and T is the critical path delay.
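As an illustration of Equation (3), the following minimal Python sketch sums the dynamic and leakage energy over a list of gates. The `Gate` class, the function name, and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the PDC library or from the measured results.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    alpha: float   # switching activity of the gate's output node
    cap: float     # node capacitance C(i) in farads
    p_leak: float  # leakage power Pleak(i) in watts


def energy_per_cycle(gates, vdd, t_crit):
    """Evaluate Equation (3): dynamic energy plus leakage energy over one cycle."""
    dynamic = sum(0.5 * g.alpha * g.cap * vdd ** 2 for g in gates)
    leakage = sum(g.p_leak * t_crit for g in gates)
    return dynamic + leakage


# Hypothetical two-gate circuit (values are illustrative assumptions only).
gates = [Gate(alpha=0.5, cap=2e-15, p_leak=1e-9),
         Gate(alpha=0.25, cap=4e-15, p_leak=2e-9)]
epc = energy_per_cycle(gates, vdd=0.5, t_crit=1e-7)
print(f"EPC = {epc:.3e} J")
```

Because the leakage term is multiplied by the critical path delay T, a slow sub-threshold circuit can have a large EPC even when its leakage power is small.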

Several studies have addressed power efficiency in sub-threshold circuits [5-9]. A research group at Auburn University presented work on dual supply sub-threshold circuit design. The authors of [6] devised a slack-based method to save as much energy per cycle as possible. In [8], mixed integer linear programming (MILP) was used in a way that removes the need for level converters in sub-threshold designs. Dual-Vth design is a common way of reducing leakage power in both above-threshold and sub-threshold operation [4]. A high threshold voltage can be used for some gates off the critical path to reduce power consumption without increasing the critical path delay, which is why dual threshold circuits are useful.

Many algorithms based on a dual threshold approach have been proposed for power optimization in above-threshold circuits [11-14]. Power consumption can be minimized with linear programming (LP) subject to constraints such as circuit speed, gate slack, and delay. A dual threshold technique for 32 nm sub-threshold circuits, based on a slack-based heuristic algorithm, was developed in [4].

# 1. LOW-DENSITY PARITY-CHECK CODES

LDPC codes are linear block codes, commonly characterized by the codeword length n, the message length k, the column weight wc, and the row weight wr (i.e. the number of nonzero elements in a row of the parity-check matrix).

# LDPC codes have the following characteristics:

The codeword and the message consist of n and k bits, respectively; wc is the number of nonzero entries in a column of the parity-check matrix, and wr is the number of nonzero entries in a row of the parity-check matrix.

# Parity-check

LDPC codes are represented by a parity-check matrix H that satisfies Equation (1):

$$Hx^{T} = 0 \quad (1)$$

where x is a codeword.
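The parity-check condition can be verified with a few lines of Python. The sketch below is only illustrative: the matrix `H` is the small (7, 4) Hamming parity-check matrix, used here as a stand-in because it is easy to check by hand. It is not sparse and it is not the IEEE 802.11n matrix.

```python
def syndrome(H, x):
    """Return H * x^T over GF(2) as a list of bits."""
    return [sum(h & b for h, b in zip(row, x)) % 2 for row in H]


def is_codeword(H, x):
    """x is a codeword exactly when every parity check sums to zero."""
    return not any(syndrome(H, x))


# Toy parity-check matrix (assumption: (7, 4) Hamming code, for illustration).
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

x = [1, 0, 1, 1, 0, 1, 0]        # satisfies all three checks
x_err = [0, 0, 1, 1, 0, 1, 0]    # same word with the first bit flipped

print(is_codeword(H, x), syndrome(H, x_err))
```

A nonzero syndrome signals that at least one parity check is violated, which is the information a decoder starts from.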

# **Low-density**

H is a sparse matrix, i.e. it contains far fewer 1s than 0s. The sparsity of H is the key factor that keeps the computational complexity of the code low.

## **Graph representation of LDPC codes**

The graphical representation of an LDPC code is called a Tanner graph. Figure 1 shows the parity-check matrix and the Tanner graph of a (12, 6) LDPC code. A Tanner graph has two sets of nodes: check nodes and variable nodes. The rows and columns of the parity-check matrix H correspond to check nodes and variable nodes, respectively. A check node is connected to a variable node only where the corresponding entry of H is 1. Information is exchanged between check nodes and variable nodes along these connections.

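The mapping from a parity-check matrix to its Tanner graph can be sketched as follows. The 3 x 6 matrix below is a hypothetical toy example, not the matrix of Figure 1; the function name and the dictionary layout are likewise assumptions made for illustration.

```python
def tanner_graph(H):
    """Build adjacency lists: rows of H are check nodes, columns are variable nodes."""
    n_vars = len(H[0])
    check_to_var = {c: [v for v, bit in enumerate(row) if bit]
                    for c, row in enumerate(H)}
    var_to_check = {v: [c for c, row in enumerate(H) if row[v]]
                    for v in range(n_vars)}
    return check_to_var, var_to_check


# Toy parity-check matrix (assumption, for illustration only).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

check_to_var, var_to_check = tanner_graph(H)
n_edges = sum(len(vs) for vs in check_to_var.values())
print(check_to_var, var_to_check, n_edges)
```

Each 1 in H becomes exactly one edge, so the number of edges equals the number of nonzero entries of H.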
In CMOS circuits, the threshold voltage of a MOSFET is commonly adjusted through the source-bulk bias voltage, as given by Equation (4), in which $V_{th0}$ denotes the threshold voltage at zero bias:

$$V_{th} = V_{th0} + \gamma\left(\sqrt{|2\phi_F + V_{SB}|} - \sqrt{|2\phi_F|}\right) \quad (4)$$

### ISSN 2347-3657


Here $V_{SB}$ is the source-bulk bias voltage, $2\phi_F$ is the surface potential parameter, and $\gamma$ is the body effect parameter. To raise the threshold voltage of a MOSFET, the reverse body bias (RBB) approach applies a negative voltage across the source-to-substrate p-n junction, while zero RBB is used in the design of low-Vth logic gates, as shown in Figure 1. Figure 1.a shows a low-Vth NAND gate with zero RBB, obtained by grounding the fourth (body) terminal of the NMOS transistors and connecting the fourth terminal of the PMOS transistors to Vdd. In Figure 1.b, a non-zero RBB raises the threshold voltage of the same gate [4, 16].

Increasing the RBB voltage raises the threshold voltage of a MOSFET, but it also increases the reverse bias across the source/body and drain/body p-n junctions, which increases the tunneling leakage current at these junctions.
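To give a feel for the body effect, the sketch below evaluates the standard long-channel body-effect expression for an NMOS transistor, taking a positive `vsb` to mean reverse body bias. The parameter values (`vth0`, `gamma`, `phi_f`) are assumptions chosen for illustration and are not 22 nm model parameters; a real design would rely on HSPICE device models instead.

```python
import math


def threshold_voltage(vth0, gamma, phi_f, vsb):
    """Long-channel body-effect model: Vth as a function of source-bulk bias."""
    return vth0 + gamma * (math.sqrt(abs(2 * phi_f + vsb))
                           - math.sqrt(abs(2 * phi_f)))


# Hypothetical device parameters (assumptions, not extracted from any model card).
VTH0, GAMMA, PHI_F = 0.30, 0.20, 0.45

vth_zero_rbb = threshold_voltage(VTH0, GAMMA, PHI_F, vsb=0.0)
vth_rbb = threshold_voltage(VTH0, GAMMA, PHI_F, vsb=0.7)
print(f"Vth at zero RBB: {vth_zero_rbb:.3f} V, at 0.7 V RBB: {vth_rbb:.3f} V")
```

With zero bias the expression reduces to the zero-bias threshold voltage, and any reverse bias moves the gate toward the Vth-high design.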

# 2. STATIC TIMING ANALYSIS AND CIRCUIT MODELING

The first step of our design technique is to classify logic gates by gate type and number of inputs (AND2, AND3, AND4, OR2, etc.). Various fan-outs are explored for each of these gates so that every situation in the circuit is covered. Each gate type is simulated accurately with HSPICE, and the results are used to generate a PDC library for all gates, as illustrated in Figure 2. The library contains the node capacitance, leakage power, and gate delay of each gate as functions of Vdd and fan-out. The library entries, indexed by Vdd and RBB, are used to calculate and compare total power and delay values throughout the assignment process.

The supply voltage (Vdd) is varied during the simulations. For each Vdd and fan-out value, the maximum delay of each gate is characterized for RBB = 0 (DL) and for RBB ≠ 0 (DH). These parameters identify the gates that can be changed from the lower threshold voltage (RBB = 0) to the higher threshold voltage (RBB ≠ 0) without increasing the circuit delay. A combinational gate-level circuit is modeled as a directed acyclic graph (DAG), G = (V, E), as illustrated in Figure 3. V is the set of nodes (vertices) representing logic gates, and E is the set of directed edges connecting gate outputs to gate inputs. The primary inputs (PI) are attached to node 1 and the primary outputs (PO) to node 0 of the graph.

Static timing analysis (STA) computes the actual arrival time (AAT) and the required arrival time (RAT) at each gate output. STA is also used to determine the critical path delay (T) and the slack time (slk) of each gate. For each gate, the delay margin is obtained by comparing the critical path delay with the longest path delay through this gate. The timing (maximum path delay) constraint requires that AAT(v) ≤ RAT(v) for the chip to function properly.
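The forward and backward passes of STA can be sketched on a small DAG as follows. This is a simplified illustration, not the STA algorithm of Table 1: it assumes one fixed delay per gate, arrival time zero at the primary inputs, and a required time at the outputs equal to the critical path delay. The gate names and delays are hypothetical.

```python
def sta(delay, fanin):
    """Return (AAT, RAT, slack, T) for a DAG given gate delays and fan-in lists."""
    fanout = {g: [] for g in delay}
    for g, preds in fanin.items():
        for p in preds:
            fanout[p].append(g)

    # Topological order (Kahn's algorithm); the list grows while it is scanned.
    indeg = {g: len(fanin.get(g, [])) for g in delay}
    order = [g for g in delay if indeg[g] == 0]
    for g in order:
        for s in fanout[g]:
            indeg[s] -= 1
            if indeg[s] == 0:
                order.append(s)

    # Forward pass: actual arrival time at each gate output.
    aat = {}
    for g in order:
        aat[g] = delay[g] + max((aat[p] for p in fanin.get(g, [])), default=0.0)
    t_crit = max(aat.values())

    # Backward pass: required arrival time at each gate output.
    rat = {}
    for g in reversed(order):
        rat[g] = min((rat[s] - delay[s] for s in fanout[g]), default=t_crit)

    slack = {g: rat[g] - aat[g] for g in delay}
    return aat, rat, slack, t_crit


# Hypothetical four-gate circuit: A -> C -> D and B -> D.
delay = {"A": 2.0, "B": 1.0, "C": 1.0, "D": 2.0}
fanin = {"C": ["A"], "D": ["B", "C"]}
aat, rat, slack, t_crit = sta(delay, fanin)
print(t_crit, slack)
```

Gates on the critical path (A, C, D) have zero slack, while gate B has a positive slack that could absorb the extra delay of a Vth-high design.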

The switching activity at each node must be estimated before the EPC can be evaluated and compared according to Equation (3). The switching activity at each node is estimated probabilistically, starting from the inputs of each gate and traversing the DAG.
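One common way to carry out such a probabilistic estimate is sketched below. It assumes that gate inputs are statistically independent and that consecutive cycles are uncorrelated, so that a node with signal probability p toggles with probability 2p(1 - p). These modeling assumptions and the helper names are illustrative and are not necessarily the estimator used in this work.

```python
import math


def and_prob(input_probs):
    """Probability that an AND gate output is 1, for independent inputs."""
    return math.prod(input_probs)


def or_prob(input_probs):
    """Probability that an OR gate output is 1, for independent inputs."""
    return 1.0 - math.prod(1.0 - p for p in input_probs)


def switching_activity(p_one):
    """Toggle probability per cycle under temporal independence."""
    return 2.0 * p_one * (1.0 - p_one)


# Primary inputs assumed to be 1 with probability 0.5.
p_and2 = and_prob([0.5, 0.5])
p_or2 = or_prob([0.5, 0.5])
alpha_and2 = switching_activity(p_and2)
alpha_or2 = switching_activity(p_or2)
print(p_and2, p_or2, alpha_and2, alpha_or2)
```

The resulting activity values play the role of the switching activity term in Equation (3).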

# INCORPORATING LOW-DENSITY PARITY-CHECK CODES INTO THE DESIGN

This section describes the QC-LDPC encoder design for the IEEE 802.11n standard, which employs the Richardson-Urbanke encoding technique with a code rate of 1/2, a block length of 648 bits, and a sub-block size of 27. Many QC-LDPC encoder designs rely on cyclic shifters and block memories. This work eliminates both, which results in optimized hardware and high throughput. Multiplication by constant binary matrices is used to reduce hardware. The architecture of the QC-LDPC encoder for IEEE 802.11n (rate = 1/2 and sub-block size z = 27) is depicted in Fig. 5.





Figure 5. QC-LDPC encoder architecture for IEEE 802.11n (rate = 1/2 and sub-block size z = 27)

Because the LDPC code is binary, all additions in the above architecture are carried out over GF(2), i.e. as modulo-2 additions. All of these blocks can therefore be built from XOR operators. The implementation of the individual blocks is not fixed to one particular form. Since fewer gates and no block memories are needed, the hardware is simpler.
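The following sketch shows how multiplication by a constant binary matrix over GF(2) reduces to AND and XOR operations. The 3 x 3 matrix is a hypothetical example; in hardware, the AND with a constant 0 or 1 disappears, and each output bit becomes an XOR of a fixed subset of input bits.

```python
def gf2_matvec(M, v):
    """Multiply binary matrix M by bit vector v over GF(2) using AND and XOR."""
    out = []
    for row in M:
        acc = 0
        for m, b in zip(row, v):
            acc ^= m & b   # modulo-2 accumulation
        out.append(acc)
    return out


# Hypothetical constant matrix and input vector (illustration only).
M = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
v = [1, 0, 1]
result = gf2_matvec(M, v)
print(result)
```

Because the matrix entries are constants, the selection of bits feeding each XOR is fixed at design time, which is why no memory lookup is needed for this operation.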

# Phase 1- Active Mode Energy Optimization

In this phase, a dual threshold voltage design is used to reduce the active energy of 22 nm sub-threshold circuits. The slack-based genetic algorithm (SBGA) that finds the optimal supply voltage and dual threshold voltages is shown in Table 2. To reduce power consumption, some off-critical gates are assigned a high threshold voltage (RBB ≠ 0). The STA algorithm in Table 1 makes it possible to assign dual threshold voltages to sub-threshold circuit gates without affecting the critical path delay at Vddopt (see Figure 1). As Table 2 shows, the method is a hybrid of a genetic algorithm and the STA algorithm (Table 1).

Each gate is designed with both RBB = 0 and RBB ≠ 0, as stated in the preceding section (and as shown in Figure 1 for NAND2). Figure 2 shows the characterization of the PDC library over various fan-out, RBB, and supply voltage settings. SBGA automatically assigns the Vth-low design to the gates on critical paths so that performance is maintained. SBGA then finds an optimal set of off-critical path gates that can be assigned the Vth-high design without compromising performance.

In step 8, a subset of nodes must be kept at Vth-low. The slack time of these nodes is at or below the lower bound (SL). If the delay of such a gate were changed from DL to DH, its slack would become negative, which means that the longest path delay would exceed the initial circuit delay T. Negative slack is therefore not allowed in this approach.

Table 2. Slack Based Genetic Algorithm (SBGA)

- 1. $\forall\, V_{dd} \in \{Vdd\_set\}$ do
- 2. $\forall\, RBB_j \in \{RBB\_set\}$ do
- 3. Set RBB = 0 for all gates
- 4. Using the STA algorithm, calculate the delay DL(v) and slack slk(v) of every gate, and the critical path delay T
- 5. Set RBB = RBBj for all gates
- 6. Using the STA algorithm, calculate the delay DH(v) of every gate, and the critical path delay T<sub>high</sub>
- 7. Compute: $k = T_{high}/T$; $\Delta(v) = DH(v) - DL(v)$; slack upper bound $SU = (k-1)\,T/k$; slack lower bound $SL = \min\{\Delta(v)\}$
- 8. If $slk(v) \leq SL$ then $v \rightarrow \{Low\_Set\}$; the indexing is {idx1} = 0's and the number of nodes in the set = N1
- 9. If $slk(v) \geq SU$ then $v \rightarrow \{High\_Set\}$; the indexing is {idx2} = 1's and the number of nodes in the set = N2
- 10. This yields three sets: {Low\_Set}, {High\_Set} and {Rem\_set}
- 11. Apply the STA algorithm: find the critical path delay Tx and the energy Ex of each instance
- 12. Compute fitness: $E_x \cdot T_x = \left(\sum_{i=1}^{N} \left[0.5\,\alpha(i)\,C(i)\,V_{dd}^{2} + P_{leak}(i)\,T_x\right]\right) T_x$
- 13. Find the chromosome with the lowest fitness: apply selection
- 14. Apply crossover and mutation: new generation pop\_new
- 15. $\forall\, M_x \in pop\_new$: repeat steps 11-14 until the stopping criterion is met
- 16. End RBB loop
- 17. End Vdd loop

Nodes that can be changed to the Vth-high design without degrading the critical path delay are placed in the "High Set" in step 9. These nodes have a slack time at or above the upper bound (SU) established in step 7.
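Steps 7 to 10 of Table 2 can be sketched in Python as follows. The function and variable names are hypothetical, the delays and slacks are made-up numbers, and one detail is an assumption of this sketch: a node that satisfies both conditions is kept in the low set. The genetic search over the remaining set (steps 11 to 15) is not shown.

```python
def classify_by_slack(slk, dl, dh, t, t_high):
    """Split gates into Low, High, and Remaining sets using slack bounds."""
    k = t_high / t
    su = (k - 1) * t / k                    # slack upper bound (step 7)
    sl = min(dh[v] - dl[v] for v in slk)    # slack lower bound (step 7)
    low = {v for v in slk if slk[v] <= sl}                     # step 8
    high = {v for v in slk if slk[v] >= su and v not in low}   # step 9
    rem = set(slk) - low - high             # left for the genetic algorithm
    return low, high, rem, sl, su


# Hypothetical data: delays with RBB = 0 (dl) and RBB = RBBj (dh), and slacks.
dl = {"a": 2.0, "b": 2.0, "c": 2.0, "d": 2.0}
dh = {"a": 2.5, "b": 2.5, "c": 2.5, "d": 2.5}
slk = {"a": 0.0, "b": 0.5, "c": 1.0, "d": 3.0}

low, high, rem, sl, su = classify_by_slack(slk, dl, dh, t=10.0, t_high=12.5)
print(low, high, rem, sl, su)
```

Only the gates in the remaining set need to be encoded in the chromosome, which keeps the search space of the genetic algorithm small.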

# 4. RESULTS AND DISCUSSION

Many wireless communication standards employ quasi-cyclic low-density parity-check (QC-LDPC) codes. Figure 6 depicts the shift matrix for IEEE 802.11n with N = 648 bits and R = 1/2, whose entries specify how each identity sub-matrix is cyclically shifted.
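The role of a shift value can be illustrated with a small sketch. It assumes the convention in which a shift s moves the columns of the z x z identity matrix cyclically to the right by s positions; the helper names are hypothetical, and a small z is used for the multiplication example so the result can be checked by hand.

```python
def shifted_identity(z, s):
    """z x z identity matrix with its columns cyclically shifted right by s."""
    return [[1 if c == (r + s) % z else 0 for c in range(z)] for r in range(z)]


def gf2_matvec(M, v):
    """Binary matrix-vector product over GF(2)."""
    return [sum(m & b for m, b in zip(row, v)) % 2 for row in M]


# Small example: z = 5, shift 2.
P = shifted_identity(5, 2)
v = [1, 0, 0, 1, 1]
product = gf2_matvec(P, v)
rotated = v[2:] + v[:2]   # the same result obtained by rotating the vector

# Sub-block size used in this work: every row and column has weight 1.
P27 = shifted_identity(27, 3)
row_weights = {sum(row) for row in P27}
col_weights = {sum(col) for col in zip(*P27)}
print(product, rotated, row_weights, col_weights)
```

Multiplying by a shifted identity is therefore just a fixed rotation of the input bits, which is why a constant shift can be realized as wiring rather than with a run-time barrel shifter.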

For the IEEE 802.11n QC-LDPC parity-check matrix, the codeword length is n = 648 bits, the code rate is 1/2, and the sub-block size is Z = 27. The Virtex-5 FPGA resource utilization is shown in Table 4. The encoder reaches a clock frequency of 301.433 MHz and a throughput of 12.12 Gbps.

Table 4: Consumption of Xilinx Virtex-5 FPGA resources for the suggested encoder

| Resource                          | Number | Usage Rate |
|-----------------------------------|--------|------------|
| Number of slice registers         | 2906   | 4%         |
| Number of slice LUTs              | 1335   | 1%         |
| Number of fully used LUT-FF pairs | 1164   | 37%        |
| Number of bonded IOBs             | 64     | 10%        |
| Number of BUFG/BUFGCTRLs          | 1      | 3%         |

Table 5 compares existing IEEE 802.11n LDPC encoders with the proposed design. In terms of encoder throughput, the hard-wired encoder outperforms designs based on cyclic shifters and block memories. The proposed architecture has no barrel shifters, so it is simpler and more area-efficient.

For the 16-bit ripple-carry adder (RCA), an active mode energy saving of 42.28 percent is realized at the optimal RBB (0.7 V), at a clock frequency of 10.91 MHz. The 16-bit RCA circuit contains many off-critical paths, and the difference in path length between critical and non-critical paths is substantial. As a result, an RBB of 0.7 V can be applied to 60.2% of the gates. Figure 6 compares the optimized and non-optimized circuits. At Vddopt (0.32 V), the non-optimized 16-bit RCA circuit consumes 4.29 fJ at 12.041 MHz. As demonstrated in Figure 5, compared with the non-optimized circuit running at the same frequency (12.041 MHz) and Vdd (0.32 V), the EPC of the optimized circuit drops by 41.37%. All of these results were obtained by simulation. Operating the circuit above Vddopt is an option that trades energy efficiency for higher circuit performance.

The operating frequency is low because of the large critical path delay.





Figure 5. Single and dual Vth design EPC of a 16-bit RCA circuit

# 5. CONCLUSIONS

In deep submicron technologies, leakage must be considered even in standby mode. Sub-threshold standby leakage current can have a significant influence on the energy efficiency of a design. This study focuses on 22 nm sub-threshold circuits, whose standby power consumption is not zero. Active power consumption is reduced by using a dual threshold voltage design. A slack-based genetic algorithm finds the best RBB (and hence Vth) assignment to achieve low EPC at the highest allowable frequency at the optimal supply voltage (Vddopt). Dual threshold voltage design remains important for reducing power consumption in 22 nm sub-threshold circuits. The 16-bit RCA achieves an energy saving of 42.28 percent compared with the non-optimized circuit, whereas the 74L85 circuit shows the smallest EPC reduction (14.58 percent). The 74283 and ALU74181 benchmark circuits obtain EPC reductions of 21 percent and 15.27 percent. The EPC reduction depends on the circuit architecture, specifically on the difference in slack time between critical and off-critical paths.

A QC-LDPC encoder was designed for the IEEE 802.11n standard with a code rate of 1/2, a codeword length of 648, and a sub-block size of z = 27. In contrast to earlier solutions, the proposed encoder is simple, parallel, and has a high throughput. The encoder uses no cyclic shifters or block memories and instead relies on multiplication by constant binary matrices to achieve high encoding speed. At 301.433 MHz, the QC-LDPC encoder achieves a throughput of 12.12 Gbps.

## REFERENCES

- 1. Rani NG, Kumar NP, Charles BS, Reddy PC, Ali SM. Design of Near-Threshold CMOS Logic Gates. International Journal of VLSI Design & Communication Systems. 2012 Apr 1;3(2):193.
- 2. A. Wang, B. H. Calhoun, A. P. Chandrakasan. Subthreshold Design for Ultra Low-Power Systems, 1st ed.; Springer: New York, NY 10013, USA, 2006.
- 3. Ahmed R. Adaptive Supply Voltage Management for Low Power Logic Circuitry Operating at Subthreshold. International Journal of VLSI Design & Communication Systems. 2015 Apr 1;6(2):1.
- 4. Yao, J. Dual-Threshold Voltage Design of Sub-Threshold Circuits, Doctoral dissertation, Auburn University, USA, 2014.
- 5. Kim, K. Ultra Low Power CMOS Design, Doctoral dissertation, Auburn University, USA, 2011.
- 6. R. G. Gallager, (1962) "Low-density parity check codes," IRE Transaction Info. Theory, Vol. 8, No. 1, pp. 21–28.
- 7. D. J. C. MacKay and R. M. Neal, (1996) "Near Shannon Limit Performance of Low Density Parity-Check Codes," Electronics Letters, Vol. 32, No. 18, pp. 1645–1646.
- 8. Thomas J. Richardson and Rudiger L. Urbanke, (2001) "Efficient encoding of low-density parity-check codes," IEEE Transactions on Information Theory, Vol. 47, No. 2, pp. 638–656.
- 9. Huxing Zhang, Hongyang Yu, "Multirate QC-LDPC Encoder," IEEE Circuits and Systems International Conference on Testing and Diagnosis, 2009, pp. 1-4.
- 10. Georgios Tzimpragos, Christoforos Kachris and Dimitrios Soudries, "A low-complexity implementation of QC-LDPC encoder in reconfigurable logic," International Conference on Field programmable Logic and Applications, 2013, pp. 1-4.