



## Towards the Integration of Voltage Regulators in Server Applications

PwrSoC, October 19<sup>th</sup>, 2018

<u>Pedro A. M. Bezerra<sup>1</sup></u>, Florian Krismer<sup>1</sup>, Johan W. Kolar<sup>1</sup>, Arvind Sridhar<sup>2</sup>, Thomas Brunschwiler<sup>2</sup>, Thomas Toifl<sup>2</sup>

<sup>1</sup> Power Electronic Systems Laboratory (PES), ETH Zurich, Switzerland

<sup>2</sup> IBM Research, Ruschlikon, Switzerland

The complete version of the slides is available at the PES-ETH website: <a href="https://www.pes-publications.ee.ethz.ch/publications/conferences/">https://www.pes-publications.ee.ethz.ch/publications/conferences/</a>



## **Outline**

- Target application
- Motivation to use IVRs
- **System specifications and target achievements**
- Components' model
- Optimization procedure
- Preliminary experimental results
- Conclusion and outlook




ETH zürich

## **Point of Load Conversion for Server Applications**




# Why Go for IVRs in Modern Microprocessor Applications?

- **Dynamic Voltage and Frequency Scaling (DVFS)**: allows for considerable energy savings
- Reduced number of interconnects to the microprocessor package
- Size reduction
- Reliability improvement
- Allows the use of modern **CMOS technologies for power switches**
- Faster response to load and reference voltage transients

[Figure: an **off-chip VRM** feeds the microprocessor package through the parasitics of the interconnections on the motherboard; inside the **microprocessor chip**, FIVR1 to FIVR4 supply the voltage domains VD1 to VD5. Annotated voltages range from 0.6 V to 1.7 V and annotated currents from ~5 A to ~250 A. An inset shows the supply voltage demand and overhead of VD1 to VD3 for two workloads.]









CarrICool Project (FP7-ICT-619488)



Multi-functional interposer platform that provides scalable cooling, granular chip-level power delivery and optical signaling required for scale-up systems





## **Considered System Specifications and Target Achievements**

#### Specifications

$$V_{in} = 1.6 V$$

$$V_{out,nom} = 0.8 V$$

$$I_{out} = 1 A$$

$$P_{out} = 0.8 W$$

$$\Delta V_{out,max} = 1\% \cdot V_{out}$$

- Specifications are taken from the most power-consuming voltage domain
- Power is scaled down





- Overall efficiency $\eta > 90\%$
- Overall power density $\rho > 1 \text{ W/mm}^2$
- Chip power density $\rho > 20 \text{ W/mm}^2$
  - Only 1% of the microprocessor area is allowed for power management
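The specifications and targets above imply a few budgets that are easy to check numerically. The following is a minimal sketch, assuming an ideal buck conversion ratio and assuming that both power densities refer to output power per area; the helper names `max_loss` and `max_area` are illustrative and do not come from the slides.

```python
# Values taken from the specification and target lists.
V_IN = 1.6         # input voltage in V
V_OUT_NOM = 0.8    # nominal output voltage in V
I_OUT = 1.0        # output current in A
ETA_MIN = 0.90     # targeted overall efficiency
RHO_OVERALL = 1.0  # targeted overall power density in W/mm^2
RHO_CHIP = 20.0    # targeted chip power density in W/mm^2

p_out = V_OUT_NOM * I_OUT      # output power in W
duty_ideal = V_OUT_NOM / V_IN  # ideal (lossless) buck duty cycle
dv_out_max = 0.01 * V_OUT_NOM  # allowed output voltage ripple in V


def max_loss(p_out_w, efficiency):
    """Largest total loss in W that still meets the given efficiency."""
    return p_out_w * (1.0 / efficiency - 1.0)


def max_area(p_out_w, power_density):
    """Largest area in mm^2 that still meets the given power density.

    Assumption: power density is defined as output power per area.
    """
    return p_out_w / power_density


loss_budget_w = max_loss(p_out, ETA_MIN)
overall_area_mm2 = max_area(p_out, RHO_OVERALL)
chip_area_mm2 = max_area(p_out, RHO_CHIP)
```

Under these assumptions the converter may dissipate at most about 89 mW, the whole regulator has to fit into 0.8 mm², and the active chip area into 0.04 mm².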

#### Beyond the state of the art!





## **IVR Design and Optimization**



## **Integration Level and Considered Topology**

**2.5D** integration level

- Better quality factor of the integrated passives compared to 2D and 3D approaches
- Takes advantage of high-FOM deep sub-micron transistors

#### **Four-phase interleaved buck**

- Better heat and loss distribution among the components
- Allows for phase shedding at low-load operation
- Stacked configuration supports the relatively high input voltage

#### Main waveforms

[Figure: output current with peak $I_{\rm out,pk}$, valley $I_{\rm out,val}$ and ripple $\Delta I_{\rm out,pp}$; phase inductor currents $I_{L1}$ and $I_{L3}$ with peak $I_{L,\rm pk}$, valley $I_{L,\rm val}$ and ripple $\Delta I_{L1,\rm pp}$; all currents in A, plotted over one switching period.]

 Output and input current ripple reductions
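The ripple reduction obtained by interleaving can be estimated with the textbook expressions for an ideal, symmetric multiphase buck converter with uncoupled inductors in continuous conduction. The sketch below is not taken from the slides; the function names are illustrative, and treating the 16 nH at 160 MHz design point reported later in the deck as a per-phase inductance is an assumption.

```python
import math


def phase_ripple_pp(v_out, duty, inductance, f_sw):
    """Peak-to-peak inductor current ripple of one uncoupled buck phase."""
    return v_out * (1.0 - duty) / (inductance * f_sw)


def cancellation_factor(duty, n_phases):
    """Summed output-current ripple divided by the single-phase ripple.

    Valid for ideal, symmetric interleaving with identical phases.
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must lie strictly between 0 and 1")
    m = math.floor(n_phases * duty)
    numerator = n_phases * (duty - m / n_phases) * ((m + 1) / n_phases - duty)
    return numerator / (duty * (1.0 - duty))


# Specification values; L and f_sw are assumed per-phase values.
V_IN, V_OUT = 1.6, 0.8
L_PHASE, F_SW = 16e-9, 160e6
duty = V_OUT / V_IN

ripple_phase_a = phase_ripple_pp(V_OUT, duty, L_PHASE, F_SW)
ripple_out_a = ripple_phase_a * cancellation_factor(duty, 4)
```

In this idealized model four phases cancel the output ripple completely at a duty cycle of exactly 0.5, and the cancellation factor rises again as the duty cycle moves away from a multiple of 1/4. Component mismatch and non-ideal timing leave a residual ripple in a real converter.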




## **Racetrack Inductors with Core Material**

#### Dimensions description




## **Deep-Trench Capacitors**





- Capacitance density up to 250 nF/mm<sup>2</sup> with high capacitance stability vs. temperature
- ESR vs. capacitance extracted from experimental data
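The quoted capacitance density gives a quick lower bound on the silicon area of a deep-trench capacitor. This is a minimal sketch with an illustrative helper name; since 250 nF/mm² is stated as an upper limit of the density, the result is the smallest possible area, not a layout estimate.

```python
CAP_DENSITY_NF_PER_MM2 = 250.0  # best-case density quoted for the technology


def min_capacitor_area_mm2(capacitance_nf, density=CAP_DENSITY_NF_PER_MM2):
    """Smallest capacitor area in mm^2 for a capacitance given in nF."""
    if capacitance_nf <= 0 or density <= 0:
        raise ValueError("capacitance and density must be positive")
    return capacitance_nf / density


# Bounds of the output-capacitance range listed in the design-space table.
area_small_mm2 = min_capacitor_area_mm2(0.1)
area_large_mm2 = min_capacitor_area_mm2(500.0)
```

For the 0.1 nF to 500 nF range considered in the design space, the bound spans 0.0004 mm² to 2 mm².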



Lallemand et al., EMPC, 2013



# **Power Transistor Model for Stacked Configuration**

#### **Cadence transient simulations demand high computational effort**

- **Based on measurement results**
- Too long simulation time

#### Necessity of accurate and simplified loss modeling for optimization

- Semiconductor losses depend on the transistors' channel widths (T<sub>wP</sub>, T<sub>wN</sub>; channel length fixed by design rules), dead times (t<sub>d,1</sub>, t<sub>d,2</sub>), and chip temperature
- Low computational effort

#### Cadence-based simplified transistor model!

- Represents the most significant source of losses
- Uses a discrete number of Cadence simulations and a multivariable interpolation algorithm
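The idea of replacing repeated transient simulations by a lookup over a few simulated points can be sketched with multilinear interpolation on a regular grid. The slides do not state which interpolation algorithm is used, so this is one possible choice; `interp_multilinear` is a hypothetical helper, and the tabulated loss values are synthetic placeholders, not Cadence results.

```python
from bisect import bisect_right
from itertools import product


def interp_multilinear(axes, table, point):
    """Multilinear interpolation of gridded data.

    axes  : one sorted list of grid coordinates per variable
    table : dict mapping grid index tuples to the simulated value
    point : coordinates at which the value is requested
    """
    cells = []
    for axis, x in zip(axes, point):
        if not axis[0] <= x <= axis[-1]:
            raise ValueError("point lies outside the simulated grid")
        # Index of the lower grid node and fractional position in the cell.
        i = min(bisect_right(axis, x) - 1, len(axis) - 2)
        w = (x - axis[i]) / (axis[i + 1] - axis[i])
        cells.append((i, w))

    value = 0.0
    # Weighted sum over the 2^n corners of the enclosing grid cell.
    for corner in product((0, 1), repeat=len(axes)):
        weight = 1.0
        index = []
        for (i, w), bit in zip(cells, corner):
            weight *= w if bit else 1.0 - w
            index.append(i + bit)
        value += weight * table[tuple(index)]
    return value


# Grid axes: P-channel width in mm and dead time 1 in ps.
axes = [[5.0, 10.0, 15.0], [20.0, 70.0, 120.0]]
# SYNTHETIC placeholder losses in mW, standing in for simulation results.
table = {
    (i, j): 10.0 + 2.0 * width + 0.1 * dead_time
    for i, width in enumerate(axes[0])
    for j, dead_time in enumerate(axes[1])
}

loss_mw = interp_multilinear(axes, table, (7.5, 45.0))
```

The same function handles more variables, for example both channel widths, both dead times and the chip temperature, at the cost of 2^n table lookups per evaluation, which is still far cheaper than a transient simulation.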






## **Considered Power Stages**



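| Power stage          | Gate drivers | Power devices | Level shifters | Independent gate signals |
|----------------------|--------------|---------------|----------------|--------------------------|
| Conventional CMOS HB | 2            | 4             | 1              | 2                        |
| Proposed CMOS ANPC   | 2            | 6             | 1              | 2                        |

[Figure: drain-source voltage $v_{\rm ds,TN_2}$ and gate signal $v_{G,L}$ versus time.]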



Problem: unequal voltage distribution among the transistors during the switching transients and in steady state.

**Clamping switches are added to ensure voltage balance among the transistors**



Bezerra et al., COMPEL, 2017



## **Considered Power Stages**

- Conventional CMOS HB
- CMOS ANPC
- Proposed CMOS ANPC



Unlike the conventional CMOS ANPC, the proposed bridge keeps the clamping switches off during the entire dead-time period, which ensures soft switching.



## **Considered Power Stages**



- Owing to the voltage balance and the lower losses during the hard-switching event, the proposed bridge gains up to 1% in efficiency at 150 MHz
- The efficiency improvement of the proposed CMOS ANPC over the conventional approach increases with frequency



## **Pre-optimization Loop of the Power Switches**






## **Optimization Procedure**



## Considered design space

#### Inductor

| Sym.           | Description       | Range      | Step      | Unit |
|----------------|-------------------|------------|-----------|------|
| N              | Number of turns   | 1 to 5     | 1         |      |
| t<sub>w</sub>  | Winding width     | 10 to 1400 | 10 or 500 | μm   |
| t<sub>t</sub>  | Winding thickness | 10 to 50   | 20        | μm   |
| t<sub>s</sub>  | Winding spacing   | 10 to 50   | 20        | μm   |
| c<sub>t</sub>  | Core thickness    | 1 to 10    | 3         | μm   |
| c<sub>l</sub>  | Core length       | 1 to 10    | 3         | mm   |

#### Power Switches

| Sym.            | Description     | Range     | Step | Unit |
|-----------------|-----------------|-----------|------|------|
| T<sub>wP</sub>  | P channel width | 5 to 15   | 5    | mm   |
| T<sub>wN</sub>  | N channel width | 5 to 15   | 5    | mm   |
| t<sub>d,1</sub> | Dead-time 1     | 20 to 120 | 50   | ps   |
| t<sub>d,2</sub> | Dead-time 2     | 10 to 50  | 50   | ps   |

#### Capacitor

| Sym.            | Description        | Range      | Step | Unit |
|-----------------|--------------------|------------|------|------|
| C<sub>out</sub> | Output capacitance | 0.1 to 500 | 10.2 | nF   |

Only one transistor size was used for the 14 nm IVR
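The size of the swept design space follows directly from the ranges and steps in the tables. The sketch below enumerates the grids with illustrative helper names (`grid`, `count_points`). The winding width is left out because the table lists two step sizes without saying where each applies, and dead-time 2 is left out because the tabulated step of 50 ps exceeds its range.

```python
import math
from itertools import product


def grid(start, stop, step):
    """Inclusive arithmetic grid from start up to (at most) stop."""
    count = int((stop - start) / step + 1e-9) + 1
    return [start + k * step for k in range(count)]


def count_points(axes):
    """Number of combinations spanned by a dict of parameter grids."""
    return math.prod(len(values) for values in axes.values())


inductor_axes = {
    "N": grid(1, 5, 1),          # number of turns
    "t_t_um": grid(10, 50, 20),  # winding thickness
    "t_s_um": grid(10, 50, 20),  # winding spacing
    "c_t_um": grid(1, 10, 3),    # core thickness
    "c_l_mm": grid(1, 10, 3),    # core length
}
switch_axes = {
    "T_wP_mm": grid(5, 15, 5),     # P channel width
    "T_wN_mm": grid(5, 15, 5),     # N channel width
    "t_d1_ps": grid(20, 120, 50),  # dead-time 1
}
capacitor_axes = {"C_out_nF": grid(0.1, 500, 10.2)}

n_inductor = count_points(inductor_axes)
n_switch = count_points(switch_axes)
n_capacitor = count_points(capacitor_axes)

# Lazily walk the switch designs, e.g. to evaluate a loss model for each one.
switch_designs = [
    dict(zip(switch_axes, values)) for values in product(*switch_axes.values())
]
```

Even with two parameters omitted, the grids above span 720 inductor geometries, 27 switch configurations and 50 capacitor values, which illustrates why each evaluation needs to be computationally cheap.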

## **Performance Comparison Between IVRs**



- At 90% efficiency, the proposed ANPC HB in 14 nm technology achieves 0.2 W/mm<sup>2</sup> higher power density
  - Switches and power stage are more efficient at high-frequency operation
- Selected inductance and switching frequency at 90% efficiency:
  - 32 nm conventional HB: 51 nH @ 70 MHz
  - 14 nm proposed ANPC HB: 16 nH @ 160 MHz

## **Implementation: PMIC**

#### **PMIC in 14 nm CMOS process**



**Four-phase interleaved ANPC buck** 

- Versatile open loop converter for better PMIC characterization
- Compatible with single and coupled inductors




## **Implementation: Active Power Stage**





**Schematics** 

#### **FinFET concept**

- GlobalFoundries' 14 nm bulk CMOS
- **FinFET 3D transistors optimized for digital circuits**
- Design uses exclusively low-voltage devices



## **Implementation: Inductors**

Coupled inductors with magnetic core



#### **Strip-line**

Racetrack



| Design  | L<sub>Self1</sub> | k<sub>1</sub>              | L<sub>Self2</sub> | k<sub>2</sub>               |
|---------|-------------------|----------------------------|-------------------|-----------------------------|
| 100 MHz | 2.4 nH            | 0.95 (k<sub>max</sub>)     | 21.6 nH           | -0.95 (k<sub>max</sub>)     |
| 150 MHz | 1.6 nH            | 0.95 (k<sub>max</sub>)     | 14.4 nH           | -0.95 (k<sub>max</sub>)     |

#### Cross section view



## **Implementation: All-silicon-based Demonstrators**






## **PCB-based demonstrators and Experimental Results**

**Demonstrator** 



Wire-bonded PMIC



► Passive devices







Switching node



#### Efficiency estimation



## Conclusions

- The extraction of the switches' loss models can be transferred to different technologies
- Migration from 32 nm SOI to 14 nm bulk allows for highly efficient and dense designs
- Interposer-based 2.5D integration allows components from different processes to be combined
- A better understanding of the switches' switching behavior allows for improvements in the efficiency and reliability of the power stage

## Outlook

- **Full characterization of the designed demonstrators**
  - Loss characterization of the individual devices and full systems
    - Requires accurate temperature measurements
    - Voltage probes embedded in the chip
- Testing of the designed closed-loop IVRs





Thank you for your attention!

