Double Data Rate SDRAM Takes On Rambus

Bill Gervasi, Technology Analyst

Double Data Rate (DDR) SDRAM is the natural successor to today's SDRAMs, offering a straightforward migration path to much higher performance. DDR addresses the need for higher system performance without adding significantly to overall system cost. Direct Rambus has been positioned by an aggressive marketing campaign to fill this need, and gag rules have largely prevented an unbiased examination of the options for memory technologies. Recently announced delays in the availability of Rambus solutions have made it clear that something is seriously wrong. Let’s see how DDR and Rambus compare, and gain some insight into the reasons for these Rambus delays.

Part of the confusion surely comes from the obvious fragmentation of the memory market. Intel is apparently the primary supporter of the Rambus solution while much of the rest of the industry has chosen DDR. Server and workstation manufacturers chose DDR for peak throughput and ability to support chip kill error correction. Chipset and router companies are just now revealing their DDR plans, and it is anticipated that within the next couple of months we will see the graphics vendors lend their support to DDR for latency reasons. For companies sitting on the fence, a careful analysis of the options is in order, since it is possible that we will have two incompatible standards in the market for the next couple of years.

When comparing memory technologies, a handful of factors must be taken into consideration. Most critical of these are cost, performance, power and heat, board layout considerations, and a roadmap to the future.

A note on terminology: technically, both DDR and Rambus are double data rate devices, transferring data on rising and falling edges of the memory clock. Terminology used in this market can therefore be a little confusing as clock frequency, data rate, and peak throughput are mixed. DDR PC-266 has a 133MHz clock, 266M transfers per second per pin data rate, and 2.1GB per second peak throughput on a memory module. Similarly, Rambus 800 has a 400MHz clock, 800M transfers per second per pin data rate, and 1.6GB per second peak throughput on a memory module.

Cost

Table 1: Memory Cost Comparison, First Quarter 2000
	DDR PC-266	Rambus 800	DDR Advantage
Royalty	None	Negotiated %	No royalties
Package	TSOP(II)	CSP	100%
Die Increase vs PC-100 SDRAM	1%	15%	14%
Premium vs PC-100 SDRAM	5%	60%	55% and growing
Continuity modules, clock chip, heat sink	0	$10-14	$10-14

Cost is naturally at the top of everyone's list, and any proposed change in memory types must first pass this litmus test. This includes the cost of the memory components, the modules, and the other system level items needed to make a complete memory subsystem. This is an area where DDR really shines over its competitor, Rambus. The most obvious cost advantage is in the area of royalties. JEDEC publishes the DDR specification as an open standard, whereas suppliers of Rambus solutions must pay a royalty not just on the controller logic but also on every Rambus memory attached to it.

The use of low cost, high volume Thin Small Outline Package (TSOP) for DDR versus the boutique Chip Scale Package (CSP) used for Rambus is another advantage. Memory testers and handlers for DDR are the same equipment used for older single data rate SDRAMs, whereas the retooling for Rambus will cost millions of dollars in capital equipment. At the system level, the Rambus solution also needs dummy continuity modules, heat sinks, and external clock generators totaling $10 to $14.

Die size is another advantage for the DDR memories. Structurally, a DDR SDRAM is a lot like a standard SDRAM with a two-bit prefetch that doubles the external speed without requiring a faster memory core. The primary difference is the inclusion of a delay locked loop (DLL) for synchronizing the output data to the memory clock. This DLL imposes a small penalty, roughly 1% of the die, and as such there are many designs in the industry where single and double data rate SDRAMs are actually built from the same die. Direct Rambus penalties are much higher, consuming at least 15% of the die area. Less obvious is that the Rambus interface does not scale well with geometry shrinks, so the Rambus die penalty actually grows as the RAM design process improves and the RAM array shrinks. Current price quotes for memories in 2000 show that DDR commands a 5% premium over PC-100 memories, while Direct Rambus commands a 60% premium that is expected to grow to 100% over PC-100 on subsequent die shrinks.

Performance

Table 2: Performance Comparison
	DDR PC-266	Rambus 800	DDR Advantage
Latency	41ns	58ns	41%
Peak Burst	2.1GB/s	1.6GB/s	33%
Low Power Mode	7.5ns	100ns	1200%

DDR SDRAMs offer superior performance in three distinct aspects: latency to first data, peak burst performance, and switch time between power modes. Latency is increasingly important in today's solutions where on-chip caches make the accesses to main memory more random. DDR SDRAMs use the familiar, fast RAS/CAS addressing protocol where real work is accomplished on every clock cycle. Random data can be accessed in as little as 41ns on a PC-266 DDR SDRAM. This contrasts with the Rambus packet protocol, where the entire packet must be received before the RAM starts performing the internal access. A Rambus 800 device delivers random data in 58ns, a latency 41% longer than that of the DDR devices.

An unfortunate misconception in the market is that megahertz equals performance. The megahertz need to be applied across a system bus width to give a meaningful measurement, i.e. GB/s of peak burst performance. Peak burst performance is another key advantage for DDR. Typical DDR bus widths of 64 bits mean that eight bytes are transferred simultaneously. With a 133MHz clock supplied, and transferring data on rising and falling edges, a DDR SDRAM DIMM enjoys a peak burst performance of over 2.1GB per second. Rambus 800 RIMMs with a 16-bit interface running with a 400MHz clock provide only 1.6GB per second. DDR peak performance is a full 33% faster than Rambus.
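
The arithmetic behind these peak figures can be sketched in a few lines of Python. This is only an illustration of the clock, edge, and bus width relationship described above; the function name and its parameters are hypothetical, and the nominal 133MHz and 400MHz clocks are taken from the text.

```python
def peak_throughput_gb_s(clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Peak burst throughput in GB/s (hypothetical helper, not from any spec).

    transfers_per_clock=2 models data moving on both rising and falling edges.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer / 1e9


# DDR PC-266 DIMM: 133MHz clock, 64-bit data bus
ddr = peak_throughput_gb_s(133, 64)
# Rambus 800 RIMM: 400MHz clock, 16-bit data bus
rambus = peak_throughput_gb_s(400, 16)

# Relative advantage of DDR in percent
ddr_advantage_pct = (ddr / rambus - 1) * 100

print(f"DDR PC-266: {ddr:.3f} GB/s")
print(f"Rambus 800: {rambus:.3f} GB/s")
print(f"DDR advantage: {ddr_advantage_pct:.0f}%")
```

The higher Rambus clock is more than offset by its bus being one quarter the width, which is why the comparison has to be made in GB/s rather than in megahertz.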

Power or heat constrained solutions must also deal with the low power modes offered by memories. Both DDR and Rambus offer four operating power levels. However, the time it takes to switch between modes is very different. For example, a DDR memory can slip in and out of Precharge Power Down mode with a single 7.5ns clock delay. Rambus devices, however, take 100ns to enter and exit Nap mode.

At the system level, Rambus suffers another performance setback due to the need for heat throttling. This degradation is so pronounced that systems vendors are discouraged from looping system benchmark programs for fear that the throttles will engage, lowering performance numbers for Rambus solutions as the systems heat up.

Power and heat

Table 3: Comparing Typical Power Consumption
	DDR PC-266	Rambus 800	DDR Advantage
Bursting	335mW	470mW	40%
Standby	50mW	275mW	550%
Spread Over	4 chips	1 chip	300%

Power consumption numbers for DDR and Rambus are dramatically different. A look at two systems with four 256Mb RAMs, one using DDR and one using Rambus, shows a typical bursting power of 335mW for DDR and 470mW for Rambus. Also assume that the memory controller can put memories into a standby power down mode where the DDR solution consumes 50mW and Rambus consumes 275mW, which is 550% of the DDR figure, or 5.5 times as much.
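
As a rough check on the Table 3 figures, the sketch below computes both the plain ratio and the percentage increase of Rambus over DDR for each mode. The helper name is hypothetical; the milliwatt values are the ones quoted above. The two measures differ by 100 percentage points, which matters when reading the standby figure.

```python
def ratio_and_percent_higher(rambus_mw, ddr_mw):
    """Return (ratio, percent higher) of Rambus power relative to DDR.

    Hypothetical helper for illustration only.
    """
    ratio = rambus_mw / ddr_mw
    return ratio, (ratio - 1) * 100


# Typical power figures quoted in the text, in milliwatts
burst_ratio, burst_pct = ratio_and_percent_higher(470, 335)
standby_ratio, standby_pct = ratio_and_percent_higher(275, 50)

print(f"Bursting: {burst_ratio:.2f}x DDR, {burst_pct:.0f}% higher")
print(f"Standby:  {standby_ratio:.2f}x DDR, {standby_pct:.0f}% higher")
```

Under this reading, the 40% bursting figure in Table 3 is a percentage increase, while the 550% standby figure is the ratio expressed as a percentage.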

Dissipating heat is easy for DDR SDRAMs, which burn less power than the older SDRAMs due to their lower supply voltage, 2.5V versus 3.3V. Also, the heat generated by data bursting is spread across all DDR SDRAMs on the bus, so the heat is absorbed into the printed circuit board and carried away through the system. Rambus systems, however, have large contiguous memory blocks in each memory chip. A 128Mb Rambus DRAM has 16MB all on one chip, so when a typical application runs, all of the power is dissipated in that one chip, causing a hot spot in the system design. For this reason, Rambus solutions must provide complex and expensive fans or heat sinking techniques to spread that hot spot through the system chassis. At the recent SO-RIMM announcement, the presenter suggested using bare Rambus dice on the module’s printed circuit board to provide height clearance for a thermal transfer pad, a metal heat sink, and another thermal transfer pad. This assembly is mechanically coupled to a special heat transferring trap door on a notebook computer to dissipate the heat through the user’s thighs.

Module fallout due to bare dice assembly of untested memories will make the SO-RIMM prohibitively costly. Any screening of Direct Rambus memories for mobile application will fragment an already weak yield position and raise costs of both mobile and desktop memories. With these factors and the power considerations taken into account, it is unlikely that we will see Rambus-based mobile solutions for at least two years.

Board layout considerations

Table 4: Board Layout Comparison
	DDR PC-266	Rambus 800	DDR Advantage
PCB Impedance	55Ω ±15%	28Ω ±10%	15% raw PCB cost
Laminate	1 or 2 layers	1 layer only	Contributes to PCB cost
Clock Route	Fanout	Round trip	Less EMI
Clock Frequency	133 MHz	400 MHz	Less EMI

DDR SDRAM board layouts are very similar to existing SDRAM layouts, and can use the same low cost printed circuit board technologies mass produced today. These boards, with their 55Ω nominal impedance and ±15% tolerance, can be cheaply manufactured with one or two laminate layers. Direct Rambus solutions require tightly controlled printed circuit boards, however, with impedance of 28Ω ±10% which virtually requires a single laminate layer. This requires a large change in existing printed circuit board manufacturing processes. Systems manufacturers are wisely instructed to perform an incoming inspection test on 100% of incoming Rambus bare boards. These tests use time-domain reflectometry on special coupon test traces to screen failing boards before assembly. These stricter impedance requirements and incoming screen tests increase the manufacturing costs of Rambus solutions by as much as 50%, according to OEMs.

The number of data and control signals initially appears to be an advantage for Rambus over DDR. However, this advantage quickly fades once ground guard traces are added around each high frequency signal trace. Further complicating the Rambus layout and adding to system cost is the need for ground stitching, or dropping vias every inch along the ground guard traces down to the required ground plane on the inner layers of the printed circuit board. Similarly, the pin count advantage long praised as a Rambus advantage largely fades when sufficient ground pins are added next to each high speed signal on the memory controller.

Clock routing is a simple matter in DDR SDRAM designs where the memory controller supplies a differential clock signal to the memories. Since the DDR SDRAMs provide a data strobe for reading data, clock layout is simpler and less critical than with SDRAMs where an echo clock needed to be routed in order to eliminate flight time differences. DDR clocks run at a relatively relaxed 133MHz. Rambus clock routing, however, requires an external clock source and a 400MHz clock signal that runs past all the memories, to the controller, then back past all the memories to a terminator. This long trace complicates EMI concerns as it acts as a long antenna, radiating high frequency noise into the system chassis.

Roadmap to the future

In 1999 we will see production quantities of DDR SDRAMs operating at 133MHz for a bandwidth of 2.1GB/s. These specifications have already passed through the JEDEC standardization committees. The JEDEC standards approach also recognizes that tighter, point to point networks can operate at a higher frequency due to light signal loading. JEDEC is reviewing DDR specifications for small systems that will operate with 200MHz clocks for 3.2GB/s using the same devices that operate at 133MHz in large system configurations. In addition, definition of the next member of the JEDEC family, DDR II, is well under way. DDR II takes the data rates to 400MHz and 600MHz for large and small systems, respectively, providing peak throughput of 3.2GB/s to 4.8GB/s.

Rambus's future plans are far more restrictive. Production delays have made it unlikely that the 400MHz devices will yield the promised 1.6GB/s in 1999 in any appreciable volumes. There is discussion of a variant in 2000 with a 500MHz clock for small systems that would achieve 2GB/s. DDR small systems solutions in the same time frame, at 3.2GB/s, would still be 60% faster. Beyond 2000, perhaps a circuit board technology can be developed to support a 600MHz clock. Year after year, Rambus solutions will play catch-up with DDR family devices on burst performance, and will never be able to compete on latency as long as Rambus stays with a packet protocol.
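
The size of the gap between the two small-system figures depends on which side is taken as the baseline. The short sketch below, using hypothetical helper names and the 3.2GB/s and 2GB/s figures quoted in this section, shows both views.

```python
def percent_faster(fast_gb_s, slow_gb_s):
    """How much faster the first figure is, relative to the second."""
    return (fast_gb_s / slow_gb_s - 1) * 100


def percent_slower(slow_gb_s, fast_gb_s):
    """How much slower the first figure is, relative to the second."""
    return (1 - slow_gb_s / fast_gb_s) * 100


ddr_small_system = 3.2      # GB/s, DDR with a 200MHz clock
rambus_small_system = 2.0   # GB/s, proposed 500MHz Rambus variant

ddr_faster_pct = percent_faster(ddr_small_system, rambus_small_system)
rambus_slower_pct = percent_slower(rambus_small_system, ddr_small_system)

print(f"DDR is {ddr_faster_pct:.1f}% faster than Rambus")
print(f"Rambus is {rambus_slower_pct:.1f}% slower than DDR")
```

Measured against Rambus, DDR comes out 60% ahead; measured against DDR, the same gap is a 37.5% shortfall for Rambus.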

Some have argued that Rambus could catch up to DDR by adding more Rambus channels to the controller. There are a few reasons why this is not cost effective. The first is the technical difficulty of managing the simultaneous switching transients of two such interfaces on a single controller die. The second is the large and inflexible footprint of the Rambus interface that drives each channel, making floor planning and pin assignment of controller dice difficult. Third, since parallel 16-bit RIMMs are impractical from a board space standpoint, there’s the challenge of creating a module with a 32-bit bus and dealing with the layout and power issues, including the many guard ground traces and the cost of thousands of stitching vias.

Summary

On each of the primary considerations for systems designers, DDR memories provide a superior solution. DDR SDRAMs are low in cost yet provide very high performance in terms of latency, peak performance, and power management protocols. The power and heat profile as well as the printed circuit board layout considerations are very similar to existing PC-100 solutions today, making a simple transition to DDR possible. More importantly, there is plenty of headroom in the DDR family to take the technology to increasingly higher levels of performance without making large changes in industry infrastructure. Once in a while, engineering skill really can overcome marketing hype.

Rambus users, on the other hand, are struggling to produce a design that is affordable given today’s realities of testers, PCB materials, and requirements for low power consumption. The cost of this technology has already relegated it to the very low volume $2500+ Pentium III market segment, and it faces a long battle against increasingly strong competition before it can hope to move into the higher volume markets that are more cost sensitive.

It remains to be seen if the market can sustain two memory architectures. While DDR has the clear technical advantage, Rambus has the attention in the press. It will be fascinating to see which is more compelling in the long run.