A 125-ps Access, 4 GHz, 16 KB BiCMOS SRAM
Xuelian Liu, Hadrian Olayvar Aquino, Alexey Gutin, John McDonald
Electrical, Computer and Systems Engineering
Rensselaer Polytechnic Institute
Troy, NY, 12180
Email: firstname.lastname@example.org, email@example.com
Abstract—A 128 Kbit BiCMOS SRAM with a typical access time of 125 ps was developed in IBM 0.13 um silicon-germanium BiCMOS technology. The fast access time, at moderate power dissipation, is achieved using the following techniques: a CML decoder, a CML driver circuit, and a bipolar sense amplifier. A CMOS 6T memory cell is used to achieve high packing density. Post-layout simulations demonstrate that this SRAM macro can operate at 4 GHz. The macro is especially useful for realizing ultrahigh-speed, high-density SRAMs used as cache in supercomputing processors.

I. INTRODUCTION

High-speed microprocessors require fast, large-capacity cache memories. Because the clock rate of CMOS is limited by wire-scaling problems, memories designed in BiCMOS technology, which combines the high density and low power of CMOS memory with the high speed of bipolar memory, are a promising way to meet the memory requirements of high-speed microprocessors.

Previously, a 450 ps access-time 512 Kb SRAM macro was fabricated in a 45 nm SOI technology, and a 0.65 ns access-time 72 Kb ECL-CMOS RAM macro was demonstrated in a 0.3 um BiCMOS technology. In this paper, we present a 128 Kbit BiCMOS SRAM macro that applies CML pre-decoders and decoders, bipolar drivers, a bipolar sense amplifier, and a sub-bank design. For high density, a 6T CMOS memory cell is used. The bipolar circuits and CMOS circuits use different power supply voltages, so a high-speed CML-to-CMOS voltage converter is designed to connect them. The macro features an access time of 125 ps, and the test chip can operate at 4 GHz.

II. ARCHITECTURE

The 128 Kbit SRAM is configured with 512 word lines and 256 bit lines and is divided into four banks of 128 word lines each. Each bank is composed of four blocks, divided along the bit lines, so each block has 128 word lines and 64 bit lines. The block diagram of an 8 Kbit block is shown in Fig. 1. Every block shares one decoder and pre-decoder block. The pre-decoder block produces the pre-decoded signals used to generate the MUX/DEMUX select signals, which select one sub-section of 32 word lines by 64 bit lines within the block. Dividing the word lines within a block in this way reduces the capacitance along each bit line and therefore the delay for driving the gates. In addition, the unselected sub-sections can be powered down to save power.

Fig.1 Block diagram of 8Kbit block (labels: 7to128 decoder with 2to4 and 5to32 stages, word line drivers, four 64-bit-wide by 32-word-line sections, 1-to-4 demux and 4-to-1 mux with Sel[0-1], 64-bit bus, sense amps)

The address access time of the SRAM macro is the sum of the delays of four circuit stages: the word decoder, the driver circuits, the memory-cell array, and the sense circuits. CML logic with three signal levels is used in the design because of its high switching speed and the small voltage swing at each signal level. The bipolar part and the CMOS part use different power rails. Because the voltage swings are small and the lower signal levels depend on the upper supply voltage, a stable upper supply is essential. Consequently, a negative power supply is preferred so that the stable earth ground (0 V) can be used as the upper voltage.

A. Decoder Design

A schematic of the word decoder circuit is shown in Fig. 2. Each address input signal enters a buffer and is then decoded by a wired-OR function. The outputs of the wired-OR function feed the NOR gates shown in Fig. 2. Each branch in the circuit is biased by a current source, and the current set in each branch determines the decoder's speed and power: more current makes the CML gate faster but consumes more power. Fig. 3 shows the relationship between the delay time of the decoder and the power dissipation in the word decoder circuits.
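The array organization given in Section II can be sanity-checked with a short sketch. The sizes below come from the text. The ordering of the row address bits (two bank bits, then a 2-bit sub-section select and a 5-bit word line select, mirroring the 2to4 and 5to32 labels in Fig. 1) is an assumption made purely for illustration; the paper does not specify the address mapping, and the function names are hypothetical.

```python
# Sketch of the array organization described in Section II.
# Sizes are taken from the text; the address bit ordering is an assumption.

WORD_LINES = 512          # total word lines in the macro
BIT_LINES = 256           # total bit lines in the macro
BANKS = 4                 # banks, split along the word lines
BLOCKS_PER_BANK = 4       # blocks per bank, split along the bit lines
SECTIONS_PER_BLOCK = 4    # sub-sections per block, 32 word lines each


def organization():
    """Return the storage capacity (in bits) of each level of the hierarchy."""
    wl_per_bank = WORD_LINES // BANKS                    # 128 word lines
    bl_per_block = BIT_LINES // BLOCKS_PER_BANK          # 64 bit lines
    wl_per_section = wl_per_bank // SECTIONS_PER_BLOCK   # 32 word lines
    return {
        "macro_bits": WORD_LINES * BIT_LINES,
        "bank_bits": wl_per_bank * BIT_LINES,
        "block_bits": wl_per_bank * bl_per_block,
        "section_bits": wl_per_section * bl_per_block,
    }


def split_row_address(row):
    """Split a 9-bit row address into (bank, section, word_line).

    Hypothetical bit layout: [bank:2][section:2][word_line:5].
    """
    if not 0 <= row < WORD_LINES:
        raise ValueError("row address out of range")
    bank = row >> 7                 # top 2 bits select one of 4 banks
    section = (row >> 5) & 0b11     # next 2 bits: 2-to-4 pre-decode
    word_line = row & 0b11111       # low 5 bits: 5-to-32 pre-decode
    return bank, section, word_line


if __name__ == "__main__":
    for name, bits in organization().items():
        print(f"{name}: {bits // 1024} Kbit")
    print(split_row_address(160))
```

Under these assumptions the macro, bank, block, and sub-section hold 128, 32, 8, and 2 Kbit respectively, which agrees with the 8 Kbit block of Fig. 1 and the four 32 Kbit banks.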
Fig.2 schematic of word decoder

Fig.3 power and delay relationship in the word decoder

The graph shows that the decoder delay improves only slightly as the power consumption increases. The reason is that in the IBM 8HP process the transistor cutoff frequency is as high as 210 GHz, so a low steering current is enough for high-speed operation of the decoder. The steering current in the decoder is therefore set to 0.83 mA, with a power consumption of 400 mW. The simulation result of the decoder is shown in Fig. 4(a).

Fig.4 (a) simulation of decoder (b) simulation of word line driver

B. CML to CMOS voltage converter/word line driver

After the decoder, the CML signals must be converted into MOS-level signals, because the memory cells operate at MOS levels. CML-to-MOS level conversion is therefore unavoidable in this configuration. In addition, the capacitance of the word line is large, because the gates of all the pass transistors in the memory cells are connected to the word line; this parasitic capacitance increases the delay in driving the line. A driver with a large current is therefore needed to shorten the charge and discharge time of this capacitance. A new CML-to-CMOS converter, shown in Fig. 5, is proposed. It first uses a level shifter to convert the Level 1 CML voltage level to Level 3, and then uses an amplifier to translate the [-1.8, -2.05] V swing to the [0, -1.5] V level. The current in the converter is 3 mA, which is suitable for driving the word line, so the voltage converter also works as the word line driver. The simulation result of the circuit is shown in Fig. 4(b).

Fig.5 schematic of word line driver

C. Precharge, Memory Cell, Write/Read Circuit

Fig. 6 shows the schematic of the bit line structure, including the precharge, memory cell, and write/read circuits. A six-transistor memory cell is used. Because the line loads are small, owing to the block-oriented architecture, peripheral circuits such as the WRITE circuits can be formed directly from CMOS gates to save layout area.

During a write operation, the bit lines are initially precharged. While the write pulse is low, the transistors on the bit line are off, allowing the bit line to be precharged. The DI and DI' signals pass through the PMOS FETs in the memory cell. After the bit lines are precharged and the write pulse goes high, DI and its complement are allowed to appear on the bit lines. If DI is '1', BL' is pulled down; otherwise BL' is kept high. Likewise, if DI' is '1', BL is pulled down. The write circuit therefore inverts the data: DI is connected to the write circuit that drives BL', and DI' is connected to the write circuit that drives BL.

During a read operation, the bit lines are also initially precharged. Then, when the read pulse and the word line are both high, the value stored in the cell passes through the pass transistors of the 6T memory cell and shifts the voltage levels on the bit lines. The small voltage change on the bit lines passes through the MUX and is amplified by the sense amplifier. The four sections of the block share one sense amplifier block.

Fig.6 schematic of bit line structure

D. Sense Amplifier

As the number of cells on each bit line increases, the capacitance on the bit line increases. This capacitance has a large effect during a read operation because the current from a memory cell is typically low, so the bit line capacitance charges and discharges relatively slowly. A sense amplifier is therefore necessary. A differential amplifier amplifies the resulting potential difference between the bit lines, and a feedback circuit turns the differential amplifier into a latch. Fig. 7 is the schematic of this circuit. Transistors Q1 and Q2 act as a differential amplifier and translate slight changes on the bit lines into CML levels. Q4 and Q5 store this value when the bit line is not being read. In effect, this is a D-latch sense amplifier designed in CML logic, in which two cross-coupled inverters provide positive feedback. The sensitivity of the sense amplifier is important. The simulation waveform in Fig. 8 shows that the sense time is 10 ps.

E. Power-Saving Technique

A power-down technique is used in the design. A 2-to-4 pre-decoder generates the signals that select each block. When the select signal for a block is low, it cuts off the reference voltage generator, which is connected to the base of the current source transistor in each CML gate. As a result, all the CML current of the address input buffer circuits is cut off when the block is unselected, which saves power. Table I lists the power consumption and the delay of each functional block of the 128 Kbit memory.

TABLE I. RESULT OF POWER AND SPEED

         Decoder   Write driver/CML to CMOS   Memory cell   Sense amplifier
power    400 mW    1500 mW                    30 mW         700 mW
delay    25 ps     40 ps                      40 ps         10 ps

The power consumption can be further reduced by lowering the supply voltage. There are two ways to do this: the first is to reduce the three signal levels to two, and the second is to replace the bias transistor in the current source with a MOSFET. With these two changes, the supply for the bipolar part can be changed from -3.4 V to -2.2 V, which reduces the power by around 35%.
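The figures in Table I and the supply-scaling estimate above can be cross-checked with simple arithmetic. The sketch below sums the per-block values and estimates the saving from lowering the bipolar supply. It assumes the bias currents stay unchanged, so that power scales linearly with the magnitude of the supply voltage; that proportionality is an assumption of this sketch rather than something stated in the paper, and the function names are hypothetical.

```python
# Cross-check of Table I and the supply-scaling estimate.
# Values are copied from Table I: (power in mW, delay in ps).
TABLE_I = {
    "decoder": (400, 25),
    "write driver / CML to CMOS": (1500, 40),
    "memory cell": (30, 40),
    "sense amplifier": (700, 10),
}


def totals(table):
    """Sum the power (mW) and delay (ps) columns of the table."""
    power_mw = sum(power for power, _ in table.values())
    delay_ps = sum(delay for _, delay in table.values())
    return power_mw, delay_ps


def supply_scaling_saving(v_old, v_new):
    """Fractional power saving when the supply magnitude changes.

    Assumes constant bias current, so power is proportional to |V|.
    """
    return 1.0 - abs(v_new) / abs(v_old)


if __name__ == "__main__":
    power_mw, delay_ps = totals(TABLE_I)
    print(f"total power: {power_mw} mW, summed stage delay: {delay_ps} ps")
    saving = supply_scaling_saving(-3.4, -2.2)
    print(f"estimated saving for the bipolar part: {saving:.1%}")
```

The summed power of 2630 mW is close to the 2.7 W quoted for the chip, and the constant-current estimate gives about 35.3%, in line with the "around 35%" figure. The four stage delays sum to 115 ps, slightly under the 125 ps access time; the sketch does not attempt to account for the difference.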
Fig.7 schematic of sense amplifier

Fig.8 the operation of the sense amplifier

The layout of the 128 Kbit SRAM macro is shown in Fig. 9. The macro is designed in IBM 8HP (0.13 um) BiCMOS technology and measures 4 mm x 4 mm. It contains four banks of 32 Kbit each. The bipolar part uses a -3.4 V supply and the CMOS part a -1.5 V supply. The power consumption of the chip is 2.7 W.

Fig.9 the layout of the 128kbit macro

Post-layout simulation results are shown in Fig. 10 and Fig. 11. In Fig. 10, after each write, the data is read back from the same address. It takes 125 ps from the write/read enable and input address change until the data is stored in the memory cell or appears at the output buffer. The test runs at 4 GHz: in each clock cycle (250 ps), either a read or a write operation is performed. The operating frequency could be raised further by reducing the precharge time to 75 ps while keeping the read or write access time at 125 ps, which corresponds to a working frequency of 5 GHz (200 ps per cycle). However, owing to test limitations, the test chip is designed for a maximum frequency of 4 GHz. The simulation was carried out at both room temperature and high temperature, and both give correct waveforms. Fig. 11 shows the output waveforms of sequential reads from different addresses. When the address changes, the read enable signal is asserted as well. The read enable signal only needs to be held for 125 ps for the data to appear correctly at the output buffer.

Table II shows the performance of previously published SRAM macros designed in different technologies. The macro presented in this paper is faster, but has a larger area and higher power consumption than the CMOS SRAMs.

TABLE II. COMPARISON OF THIS WORK TO PREVIOUS WORK

             Tech.            Size     Area                Power    Access time
H. Pilo      45 nm CMOS       512 kb   475 um x 482 um     24 mW    450 ps
L. Wissel    65 nm CMOS       256 kb   732 um x 404 um     -        550 ps
H. Nambu     0.3 um BiCMOS    72 kb    1.5 mm x 3.5 mm     3.3 W    650 ps
M. Santoro   0.6 um BiCMOS    64 kb    2.25 mm x 3.8 mm    2.5 W    900 ps
proposed     0.13 um BiCMOS   128 kb   4 mm x 4 mm         2.7 W    125 ps

IV. CONCLUSION

A high-speed 128 Kbit SRAM macro is proposed in IBM 8HP technology, using a CML decoder, a CML driver, a CMOS memory cell, and a CML sense amplifier. An access time of 125 ps is achieved in post-layout simulation, and the test chip is targeted to operate at a maximum frequency of 4 GHz. The power consumption of the macro is 2.7 W and can be further reduced by lowering the supply voltage of the bipolar part. The proposed SRAM macro is designed to be used as cache for an ultra-high-speed processor.

ACKNOWLEDGMENT

This work has been sponsored by an NSF EAGER grant numbered 1031440.
Fig.10 read after write waveforms

Fig.11 Sequential read waveforms

REFERENCES

[1] S. J. Jeng et al., "A 210-GHz fT SiGe HBT with Non-Self-Aligned Structure," IEEE Elect. Dev. Lett., vol. 22, pp. 542-544, 2001.
[2] M. Takada, K. Nakamura, and T. Yamazaki, "High speed submicron BiCMOS memory," IEEE Trans. Elect. Dev., vol. 42, pp. 497-504, 1995.
[3] H. Pilo, V. Ramadurai, G. Braceras, et al., "A 450ps Access-Time SRAM Macro in 45nm SOI Featuring a Two-Stage Sensing-Scheme and Dynamic Power Management," ISSCC Dig. Tech. Papers, pp. 378-379, Feb. 2008.
[4] H. Nambu et al., "A 0.65-ns, 72-kb ECL-CMOS RAM Macro for a 1-Mb SRAM," IEEE J. Solid-State Circuits, vol. 30, no. 4, pp. 491-499, 1995.
[5] K. Nah et al., Jack F. McDonald, "F-RISC/G: AlGaAs/GaAs HBT Standard Cell Library," ICCD 1991, pp. 297-300.
[6] Kuan Zhou, Jong-Ru Guo, Chao You, John Mayega, Russell P. Kraft, T. Zhang, John F. McDonald, and Bryan S. Goda, "Multi-GHz SiGe BiCMOS FPGAs with New Architecture and Novel Power Management Techniques," Journal of Circuits, Systems, and Computers 14.
[7] L. Wissel, H. Pilo, C. LeBlanc, et al., "A 550ps Access-Time Compilable SRAM in 65nm CMOS Technology," Proc. CICC, pp. 21-24, Sep. 2007.
[8] M. Santoro, L. Tavrow, and G. Bewick, "A subnanosecond 64K BiCMOS SRAM," Proc. 1994 Bipolar/BiCMOS Circuits Technol. Meet., 95, 1994.