Tuesday, July 22, 2008

Nehalem (microarchitecture)



Nehalem is the codename for a future processor microarchitecture being developed by Intel[1]. Nehalem will be released in late 2008 for high-end chips[2] and Q3 2009 for mainstream chips. The microarchitecture is the planned successor to the Core microarchitecture.

Nehalem is a new microarchitecture built on the 45 nm manufacturing process introduced with Penryn. A working system with two Nehalem processors was shown at Intel Developer Forum Fall 2007[3], and a large number of Nehalem systems were shown at Computex in June 2008.

The processor is named after the Nehalem River in Northwest Oregon, which is in turn named after the Nehalem Native American tribe in Oregon. The codename itself had appeared at the end of several roadmaps starting in 2000. At that stage it was supposed to be the latest evolution of the NetBurst architecture. Since the abandonment of NetBurst, the codename has been recycled and now refers to a completely different project.

Intel CPU core roadmaps, from NetBurst to Sandy Bridge.

Technology

Nehalem microarchitecture.

As of its current description (at Spring IDF 2008), Nehalem appears to incorporate the most significant new architectural changes to the x86 microarchitecture since the Pentium Pro debuted in 1995. Nehalem is highly scalable with different components for different tasks. Various sources have stated Nehalem's specification will have:

  • 2, 4, or 8 cores
  • 45 nm manufacturing process
  • Integrated memory controller supporting DDR3 SDRAM and between 1 and 6[citation needed] memory channels
  • Integrated graphics processor (IGP) located off-die, but in the same CPU package[4]
  • A new point-to-point processor interconnect, the Intel QuickPath Interconnect, replacing the legacy front side bus
  • Simultaneous multithreading (SMT), which enables two threads per core. SMT was last present on a consumer Intel processor in 2006, with the Pentium 4 and Pentium EE. Unlike the SMT implementations on the Pentium 4 and the Atom, SMT on Nehalem is referred to as 'MTT'.[5]
  • Native (monolithic, i.e. all processor cores on a single die) quad and octo (8) core processors[6]
  • 32 KB L1 instruction and 32 KB L1 data cache per core
  • 256 KB L2 cache per core
  • 8 MB L3 cache shared by all cores
  • 33% more in-flight micro-ops than Conroe[7]
  • 2nd level branch predictor and 2nd level Translation Lookaside Buffer[7]
  • Modular blocks of components such as cores that can be added and subtracted for varying market segments[8]

Demonstrations at the Shanghai Intel Developer Forum showed A1-silicon Bloomfield-based Nehalem processors running at 3.2 GHz. The processor had 32 KB L1 instruction and 32 KB L1 data cache per core, 256 KB of L2 cache per core, and 8 MB of shared L3 cache.[9]
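The cache figures above can be added up to give the total on-die cache for a given core count. The following is a minimal sketch; the function name and the default of an 8 MB shared L3 are illustrative assumptions, and the shared L3 size can be overridden for variants that list a different value.

```python
# Per-core cache sizes as listed in the text, in KB.
L1_INSTRUCTION_KB = 32
L1_DATA_KB = 32
L2_KB = 256


def total_cache_kb(cores: int, l3_shared_kb: int = 8 * 1024) -> int:
    """Total on-die cache in KB: per-core L1 and L2 plus the shared L3.

    The 8 MB default for the shared L3 is the figure quoted for the
    demonstrated processor; it is an assumption for any other part.
    """
    per_core_kb = L1_INSTRUCTION_KB + L1_DATA_KB + L2_KB
    return cores * per_core_kb + l3_shared_kb


if __name__ == "__main__":
    for cores in (2, 4, 8):
        kb = total_cache_kb(cores)
        print(f"{cores} cores: {kb} KB ({kb / 1024:.2f} MB)")
```

For a four-core part with the quoted sizes this gives 9472 KB, of which the shared L3 accounts for 8192 KB.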


Performance and power improvements

It has been reported that Nehalem will have a focus on performance, which accounts for the increased core size.[10] Compared to Penryn, Nehalem will have:

  • 1.1x to 1.25x the single-threaded performance or 1.2x to 2x the multithreaded performance at the same power level
  • 30% lower power usage for the same performance
  • According to a preview from AnandTech: "expect a 20 - 50% overall advantage over Penryn with only a 10% increase in power usage. It looks like Intel is on track to delivering just that in Q4."[11]
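The claims in the list above can be compared on a common footing by expressing each as a performance-per-watt ratio relative to Penryn. This is a rough sketch that only restates the quoted figures; the helper name is hypothetical and the calculation assumes the quoted percentages apply to the same workload.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt, given performance and power ratios
    versus the baseline (1.0 means unchanged)."""
    return perf_ratio / power_ratio


if __name__ == "__main__":
    # Same performance at 30% lower power.
    print(f"iso-performance: {perf_per_watt_gain(1.0, 0.7):.2f}x")
    # AnandTech preview: 20% to 50% faster for 10% more power.
    low = perf_per_watt_gain(1.2, 1.1)
    high = perf_per_watt_gain(1.5, 1.1)
    print(f"AnandTech range: {low:.2f}x to {high:.2f}x")
```

Under these assumptions the "30% lower power" claim corresponds to roughly 1.43x the performance per watt, while the AnandTech range works out to about 1.09x to 1.36x.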

PC Watch found that a Nehalem "Gainestown" processor has 1.6x the SPECint_rate2006 integer performance and 2.4x the SPECfp_rate2006 floating-point performance of a 3.0 GHz Xeon X5365 "Clovertown" quad-core processor.[10]

A 2.93 GHz Nehalem "Bloomfield" system running the 3DMark Vantage benchmark produced a CPU score of 17966.[1] The 2.66 GHz variant scores 16294.[2] For comparison, a 2.4 GHz Core 2 Duo E6600 scores 4300.

AnandTech tested the Intel QuickPath Interconnect (4.8 GT/s version) and found the copy bandwidth using triple-channel 1066 MHz DDR3 was 12.0 GB/s. A 3.0 GHz Core 2 Quad system using dual-channel 1066 MHz DDR3 achieved 6.9 GB/s.[12]
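The measured copy bandwidth can be put in context by comparing it with the theoretical peak of the memory configuration. The sketch below assumes that "1066 MHz DDR3" means 1066 million transfers per second over a 64-bit (8-byte) channel, and that a QuickPath link carries 16 data bits (2 bytes) per transfer in each direction; the function names are illustrative.

```python
DDR3_BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path per channel
QPI_BYTES_PER_TRANSFER = 2   # assumption: 16 data bits per direction


def ddr3_peak_gbs(channels: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_s = channels * mega_transfers_per_s * 1e6 * DDR3_BYTES_PER_TRANSFER
    return bytes_per_s / 1e9


def qpi_peak_gbs(giga_transfers_per_s: float) -> float:
    """Theoretical peak QuickPath bandwidth in GB/s, per direction."""
    return giga_transfers_per_s * QPI_BYTES_PER_TRANSFER


if __name__ == "__main__":
    triple = ddr3_peak_gbs(3, 1066)
    dual = ddr3_peak_gbs(2, 1066)
    print(f"triple-channel peak: {triple:.1f} GB/s, measured 12.0 ({12.0 / triple:.0%})")
    print(f"dual-channel peak:   {dual:.1f} GB/s, measured 6.9 ({6.9 / dual:.0%})")
    print(f"QuickPath 4.8 GT/s:  {qpi_peak_gbs(4.8):.1f} GB/s per direction")
```

Under these assumptions both systems reach somewhat under half of their theoretical peak in the copy test, and the Nehalem system's measured result is about 1.7 times that of the Core 2 Quad system.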

Overclocking will be possible with Bloomfield processors and the X58 chipset.[13] The mainstream PCH will not have the QuickPath Interconnect so its overclocking potential is called into question.[3][4]


Variants

Nehalem will come in variants for servers, desktops, and notebooks. The four-socket server CPU is codenamed Beckton, the two-socket server CPU is codenamed Gainestown, and the single-socket desktop CPU is codenamed Bloomfield.[14] Server processors will support registered DDR3.[15]

Seven codenames have been associated with the Nehalem microarchitecture in a PC Watch article.[16] These include two server processors, three desktop processors, and two mobile processors. The server processor, Beckton, will have 44-bit physical and 48-bit virtual memory addresses. The mainstream and value processor, Havendale, will have an FDI bus.[5]
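The address widths quoted for Beckton translate directly into addressable memory, since an n-bit address can select 2^n distinct bytes. A small sketch of the arithmetic, with hypothetical helper names:

```python
TIB = 1 << 40  # one tebibyte in bytes


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


def to_tib(num_bytes: int) -> float:
    """Convert a byte count to tebibytes."""
    return num_bytes / TIB


if __name__ == "__main__":
    print(f"44-bit physical: {to_tib(addressable_bytes(44)):.0f} TiB")
    print(f"48-bit virtual:  {to_tib(addressable_bytes(48)):.0f} TiB")
```

A 44-bit physical address therefore covers 16 TiB, and a 48-bit virtual address covers 256 TiB.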

Westmere[17]
  • Market segments: DP server; extreme, performance, and mainstream desktop
  • Process: 32 nm
  • Cores (threads): 6 (12)
  • Cache: 256 KB L2 per core, 12 MB shared L3
  • Memory controller: quad-channel DDR3 (DP server); triple-channel DDR3 (extreme desktop)
  • Bus interface: 4x QuickPath (DP server); 2x QuickPath (extreme desktop)
  • GPU: No
  • TDP: Unknown
  • Socket: LGA1366
  • Release timeframe: H1 2010

Beckton[16]
  • Market segment: MP server
  • Process: 45 nm
  • Cores (threads): 8 (16)
  • Cache: 256 KB L2 per core, 24 MB shared L3
  • Memory controller: quad-channel FB-DIMM2
  • Bus interface: 4x QuickPath
  • TDP: 130 W[18]
  • Socket: LGA1567
  • Release timeframe: Q2 2009[19]

Gainestown[16]
  • Market segment: DP server
  • Cores (threads): 4 (8)
  • Cache: 256 KB L2 per core, 8 MB shared L3
  • Memory controller: dual- and triple-channel DDR3, 800/1066/1333/1600 MHz[20]
  • Bus interface: 2x QuickPath
  • Socket: LGA1366
  • Release timeframe: Q3 2008[6]

Bloomfield[16]
  • Extreme desktop: 3.2 GHz, $999[7], 1x 6.4 GT/s QuickPath[20]
  • Performance desktop: 2.93 GHz, $562[9], 1x 4.8 GT/s QuickPath[20]
  • Mainstream desktop: 2.66 GHz, $284[10]
  • Release timeframe: Oct 2008[8]

Lynnfield[16]
  • Market segments: performance desktop, mainstream desktop
  • Memory controller: dual-channel DDR3, 800/1066/1333 MHz[21]
  • Bus interface: DMI x4/x2; PCI Express 2.0
  • TDP: 95 W
  • Socket: LGA1160[16]
  • Release timeframe: Q3 2009[11]

Clarksfield[16]
  • Extreme mobile: 55 W TDP
  • Performance mobile: 45 W TDP
  • Socket: mPGA 989[16]

Havendale[16]
  • Market segments: mainstream desktop, value desktop
  • Cores (threads): 2 (4)
  • Cache: 256 KB L2 per core, 4 MB shared L3
  • GPU: Yes
  • TDP: 75 W[20]
  • Socket: LGA1160[16]

Auburndale[16]
  • Performance mobile: 45 W TDP
  • Mainstream mobile
  • Power optimized mobile: 35 W TDP
  • Socket: mPGA 989[16]

Note: "Extreme" processors have an unlocked clock multiplier. TDP values for CPUs with integrated GPUs include the GPU. Prices are for batches of 1000.


The successor

Westmere (formerly Nehalem-C) is the name given to the 32 nm shrink of Nehalem. Westmere should be ready for a 2009 release provided that Intel stays on target with its roadmap. However, it appears that the bulk of Westmere's versions, including mobile versions, will be released sometime in 2010. [12][13] From various sources, Westmere's changes and improvements from Nehalem have been reported as follows:

  • 32 nm process.
  • Native hexa (6) core processors.[17]
    • The successor to Bloomfield and Gainestown is hexa-core.
  • A new set of instructions called AES-NI, reported to deliver more than 3x the previous rate of AES encryption and decryption.[22]
    • AES-NI adds six new instructions that accelerate the Advanced Encryption Standard (AES) algorithm, plus one instruction that performs carry-less multiplication (PCLMULQDQ). These instructions let the processor perform hardware-accelerated encryption, which not only executes faster but also protects against software-targeted attacks.
    • AES-NI may be included in the integrated graphics of Westmere.
  • Westmere's integrated graphics may be released at the same time as the processor.
  • Release dates:
    • Late 2009 or early 2010 for DP server chips. [14]
    • H1 2010 for high-end desktop chips (Bloomfield successor). [15]
    • H2 2010 for mainstream and value desktop chips, assuming Westmere is released for that segment. [16]
    • 2010 for mobile chips, assuming Westmere is released for that segment. [17]
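The carry-less multiplication mentioned in the AES-NI item above differs from ordinary multiplication in that partial products are combined with XOR rather than addition, so no carries propagate between bit positions. The following is a software sketch of that operation on 64-bit operands, intended only to illustrate what the arithmetic computes; it is not Intel's implementation, and the function name is hypothetical.

```python
def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit unsigned integers.

    Each set bit of b contributes a shifted copy of a, and the copies are
    combined with XOR instead of addition. The result fits in 127 bits.
    """
    if not (0 <= a < 1 << 64 and 0 <= b < 1 << 64):
        raise ValueError("operands must be unsigned 64-bit integers")
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


if __name__ == "__main__":
    # Ordinary multiplication: 3 * 3 = 9 (0b1001).
    # Carry-less: 0b11 ^ 0b110 = 0b101, because the middle bits cancel.
    print(bin(clmul64(0b11, 0b11)))
```

This is equivalent to multiplying polynomials whose coefficients are single bits, which is why the operation is useful for checksums and for the authentication step of some AES modes.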

The successor to Westmere will be Sandy Bridge, scheduled for release in 2010, according to Intel roadmaps. [18]
