Intel Core i7 3960X (Sandy Bridge E) Review: Keeping the High End Alive
by Anand Lal Shimpi on November 14, 2011 3:01 AM EST
Posted in: CPUs, Intel, Core i7, Sandy Bridge, Sandy Bridge E
If you look carefully enough, you may notice that things are changing. It first became apparent shortly after the release of Nehalem. Intel bifurcated the performance desktop space by embracing a two-socket strategy, something we'd never seen from Intel and only once from AMD in the early Athlon 64 days (Socket-940 and Socket-754).
LGA-1366 came first, but by the time LGA-1156 arrived a year later it no longer made sense to recommend Intel's high-end Nehalem platform. Lynnfield was nearly as fast and the entire platform was more affordable.
When Sandy Bridge launched earlier this year, all we got was the mainstream desktop version. No one complained because it was fast enough, but we all knew an ultra high-end desktop part was in the works. A true successor to Nehalem's LGA-1366 platform for those who waited all this time.
Left to right: Sandy Bridge E, Gulftown, Sandy Bridge
After some delays, Sandy Bridge E is finally here. The platform is actually pretty simple to talk about. There's a new socket (LGA-2011), a new chipset (Intel's X79) and of course the Sandy Bridge E CPU itself. We'll start at the CPU.
For the desktop, Sandy Bridge E is only available in 6-core configurations at launch. Early next year we'll see a quad-core version. I mention the desktop qualification because Sandy Bridge E is really a die-harvested Sandy Bridge EP, Intel's next-generation Xeon part:
If you look carefully at the die shot above, you'll notice that there are actually eight Sandy Bridge cores. The Xeon version will have all eight enabled, but the last two are fused off for SNB-E. The 32nm die is absolutely gigantic by desktop standards. Measuring 20.8 mm x 20.9 mm (~435mm^2), Sandy Bridge E is bigger than most GPUs. It also has a ridiculous number of transistors: 2.27 billion.
Around a quarter of the die is dedicated just to the chip's massive L3 cache. Each cache slice has increased in size compared to Sandy Bridge. Instead of 2MB, Sandy Bridge E boasts 2.5MB cache slices. In its Xeon configuration that works out to 20MB of L3 cache, but for desktops it's only 15MB. That's just 1MB shy of how much system memory my old upgraded 386-SX/20 had.
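The cache and die figures above are simple products, and a short sketch makes the arithmetic easy to check. The function name `l3_total_mb` is our own, and the script assumes one L3 slice per core, so the slice count follows the number of enabled cores.

```python
# Sketch: L3 capacity as (active cache slices) x (slice size).
# Assumption: one L3 slice per core, so slice count tracks enabled cores.
SNB_E_SLICE_MB = 2.5  # per-slice size quoted for Sandy Bridge E
SNB_SLICE_MB = 2.0    # per-slice size quoted for mainstream Sandy Bridge


def l3_total_mb(active_slices: int, slice_mb: float) -> float:
    """Return total L3 capacity in MB for a given number of active slices."""
    return active_slices * slice_mb


xeon_l3 = l3_total_mb(8, SNB_E_SLICE_MB)     # all eight slices enabled
desktop_l3 = l3_total_mb(6, SNB_E_SLICE_MB)  # two slices fused off
quad_snb_l3 = l3_total_mb(4, SNB_SLICE_MB)   # mainstream quad-core part

# Die area from the quoted 20.8 mm x 20.9 mm dimensions.
die_area_mm2 = 20.8 * 20.9

print(f"Xeon L3: {xeon_l3} MB")
print(f"Desktop SNB-E L3: {desktop_l3} MB")
print(f"Quad-core SNB L3: {quad_snb_l3} MB")
print(f"Die area: {die_area_mm2:.1f} mm^2")
```

The product of the two die dimensions lands just under the rounded ~435mm^2 figure quoted above.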
CPU Specification Comparison
CPU | Manufacturing Process | Cores | Transistor Count | Die Size |
AMD Bulldozer 8C | 32nm | 8 | 1.2B* | 315mm2 |
AMD Thuban 6C | 45nm | 6 | 904M | 346mm2 |
AMD Deneb 4C | 45nm | 4 | 758M | 258mm2 |
Intel Gulftown 6C | 32nm | 6 | 1.17B | 240mm2 |
Intel Sandy Bridge E (6C) | 32nm | 6 | 2.27B | 435mm2 |
Intel Nehalem/Bloomfield 4C | 45nm | 4 | 731M | 263mm2 |
Intel Sandy Bridge 4C | 32nm | 4 | 995M | 216mm2 |
Intel Lynnfield 4C | 45nm | 4 | 774M | 296mm2 |
Intel Clarkdale 2C | 32nm | 2 | 384M | 81mm2 |
Intel Sandy Bridge 2C (GT1) | 32nm | 2 | 504M | 131mm2 |
Intel Sandy Bridge 2C (GT2) | 32nm | 2 | 624M | 149mm2 |
Update: AMD originally told us Bulldozer was a 2B transistor chip. It has since told us that the 8C Bulldozer is actually 1.2B transistors. The die size is still accurate at 315mm2.
At the core level, Sandy Bridge E is no different than Sandy Bridge. It doesn't clock any higher, L1/L2 caches remain unchanged and per-core performance is identical to what Intel launched earlier this year.
The Lineup
Processor | Core Clock | Cores / Threads | L3 Cache | Max Turbo | Max Overclock Multiplier | TDP | Price |
Intel Core i7 3960X | 3.3GHz | 6 / 12 | 15MB | 3.9GHz | 57x | 130W | $990 |
Intel Core i7 3930K | 3.2GHz | 6 / 12 | 12MB | 3.8GHz | 57x | 130W | $555 |
Intel Core i7 3820 | 3.6GHz | 4 / 8 | 10MB | 3.9GHz | 43x | 130W | TBD |
Intel Core i7 2700K | 3.5GHz | 4 / 8 | 8MB | 3.9GHz | 57x | 95W | $332 |
Intel Core i7 2600K | 3.4GHz | 4 / 8 | 8MB | 3.8GHz | 57x | 95W | $317 |
Intel Core i7 2600 | 3.4GHz | 4 / 8 | 8MB | 3.8GHz | 42x | 95W | $294 |
Intel Core i5 2500K | 3.3GHz | 4 / 4 | 6MB | 3.7GHz | 57x | 95W | $216 |
Intel Core i5 2500 | 3.3GHz | 4 / 4 | 6MB | 3.7GHz | 41x | 95W | $205 |
Those of you buying today only have two options: the Core i7-3960X and the Core i7-3930K. Both have six fully unlocked cores, but the 3960X gives you a 15MB L3 cache vs. 12MB with the 3930K. You pay handsomely for that extra 3MB of L3. The 3960X goes for $990 in 1K unit quantities, while the 3930K sells for $555.
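To put a number on "pay handsomely", here is a rough sketch of the premium per extra megabyte of L3, using the 1K-unit prices from the lineup table. It deliberately attributes the whole price gap to the cache, which is a simplifying assumption: the table also shows the 3960X carrying 100MHz higher base and turbo clocks.

```python
# Sketch: price premium of the 3960X over the 3930K per extra MB of L3.
# Assumption: the entire price gap is attributed to cache, ignoring the
# 100MHz clock advantage the 3960X also has in the lineup table.
PRICE_3960X_USD = 990
PRICE_3930K_USD = 555
L3_3960X_MB = 15
L3_3930K_MB = 12

price_gap_usd = PRICE_3960X_USD - PRICE_3930K_USD
extra_l3_mb = L3_3960X_MB - L3_3930K_MB
usd_per_extra_mb = price_gap_usd / extra_l3_mb

print(f"Price gap: ${price_gap_usd}")
print(f"Extra L3: {extra_l3_mb} MB")
print(f"Premium per extra MB of L3: ${usd_per_extra_mb:.0f}")
```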
The 3960X has the same 3.9GHz max turbo frequency as the Core i7 2700K with 1 - 2 cores active. With 5 - 6 cores active the max turbo drops to a respectable 3.6GHz. Unlike the old days of many vs. few core CPUs, there are no performance tradeoffs when you buy a SNB-E. Thanks to power gating and turbo, you get pretty much the fastest possible clock speeds regardless of workload.
Early next year we'll see a Core i7 3820, priced around $300, with only 4 cores and a 10MB L3. The 3820 will only be partially unlocked (max OC multiplier = 4 bins above max turbo).
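The "4 bins above max turbo" rule can be worked through as a quick sketch. It assumes a nominal 100MHz base clock, so that one multiplier bin equals 100MHz; the helper name `max_oc_multiplier` is ours, not Intel's.

```python
# Sketch: max overclock multiplier for a partially unlocked part.
# Assumption: nominal 100MHz base clock (BCLK), so one bin = 100MHz.
BCLK_MHZ = 100
BINS_ABOVE_TURBO = 4  # partially unlocked limit described in the text


def max_oc_multiplier(max_turbo_ghz: float) -> int:
    """Return the highest selectable multiplier for a partially unlocked CPU."""
    # round() guards against floating-point error in the GHz -> MHz conversion.
    turbo_multiplier = round(max_turbo_ghz * 1000 / BCLK_MHZ)
    return turbo_multiplier + BINS_ABOVE_TURBO


for name, turbo_ghz in [("Core i7 3820", 3.9),
                        ("Core i7 2600", 3.8),
                        ("Core i5 2500", 3.7)]:
    mult = max_oc_multiplier(turbo_ghz)
    print(f"{name}: {mult}x ({mult * BCLK_MHZ / 1000:.1f}GHz at stock BCLK)")
```

The 42x and 41x limits listed for the Core i7 2600 and Core i5 2500 in the lineup table are consistent with the same rule.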
163 Comments
JlHADJOE - Tuesday, November 15, 2011
On Page 2, 'The Pros and Cons':
> Intel's current RST (Rapid Story Technology) drivers don't support X79,
Rapid Storage, perhaps?
jmelgaard - Tuesday, November 15, 2011
Computers are only getting faster one way today, and that is more cores. Designing for up to a strict number of cores is merely stupidity in today's world.
That said, developing games that support multiple cores might be somewhat more difficult than designing highly concurrent applications that process data or requests for data. (I can't say for sure as I have only briefly touched the game development part of the industry, but I work with the other part on a daily basis.)
But while you might save development cost right now going down that road, you will spend the savings once you suddenly have to think in 8 cores.
Carrying technical debt is never a good thing (and designing with a set number of cores in mind can, in my programming experience, only add to it). It will only get more expensive to remove down the road; that has been proven true again and again.
And that is even considering that Frostbite 3 might be developed from the ground up. They would still have to think up the concept again, whereas had they gone for high concurrency, that concept would already be in place for the next version.
TC2 - Tuesday, November 15, 2011
note,
BD 4x2bc ~ 2B elements, 315mm2
SB-E 6x2hc ~ 2.27B elements ~ +14%, 435mm2 ~ +38% (includes unused space for 2 more cores), up to 15MB cache, ...
impressive at all!
C300fans - Tuesday, November 15, 2011
Intel Gulftown 6C | 32nm | 6 | 1.17B | 240mm2
Intel Sandy Bridge E (6C) | 32nm | 6 | 2.27B | 435mm2
I don't see any impressive thing. Any performance improvements?
Blaster1618 - Tuesday, November 15, 2011
Given QPI @ 3.2 GHz 205 Gb/s (25.6 GB/s) also handled the PCI load, can't we have something in the middle? I'm still a little confused: is DMI 2.0 still mainly a simple parallel interface where QPI is a high-speed serial interface?
C300fans - Tuesday, November 15, 2011
Just imagine DMI 1.0 is a 4pcs pci-e 1x 1.0.
DMI 2.0 is a 4pcs pci-e 1x 2.0
jmelgaard - Tuesday, November 15, 2011
Clearly you didn't read a single one of my points, or simply lack the understanding.
Applications are not developed to target specific cores, your OS handles all that. It is a simple matter of pushing out jobs in threads or processes.
Processing in 10, 100 or 1000 threads/processes is no more difficult than doing it in 4... it just requires you have enough "JOBS" to process (and that term was deliberately chosen)...
This requires a different mindset though, and it might be more difficult to think of games that way right now, mostly because they have been used to running everything in that single game loop, but doing it now could be a rather good ROI down the road.
DarkUltra - Tuesday, November 15, 2011
How about overclocking with turbo boost enabled? I mean, if the 3960X is stable at 4.4GHz, can it be stable at 4.8GHz when games or applications only use four cores? Then it would overclock and perform as well as a 2600K with four heavy threads.
yankeeDDL - Tuesday, November 15, 2011
Guys, there are always people with more money than brains who will purchase just about anything.
That's not the point. Having the fastest CPU makes it a status symbol, and whoever makes it can have the luxury to price it in the $1000 range, for fools to buy.
I don't know about CPUs, but I do know that the top performing GPUs (HD6990 and GTX590) are sold in extremely low volumes, both because of the relatively low ROI and because the market is so small that inventory is scarce to begin with.
So, you may be right on the CPU side, but in general, you're both wrong.
This said, my point was that if AMD had performed and delivered a good CPU instead of the FX8150, OR the FX8150 at a good price point ($170, not $279), then Intel would have had a tougher time pushing out the 3960X at this price, AND it would have had to work harder on the chipset. However, because of the huge lead it has over AMD, Intel can now comfortably rebrand a "mid range" chipset and shove it at the customer, who has no choice but to take it if they want the best CPU.
retoureddy - Wednesday, November 16, 2011
I agree that only two 6Gb/s SATA ports is a disappointment. What is interesting, though, is running two SSDs in RAID 0 on the Intel controller. With two Kingston SSDs I manage really good figures (Crystal Disk Mark): 4000MB test -> 1040MB/s Read and 621MB/s Write in (SEQ) / 675 and 481 (512K) / 28 and 253 (4K) / 279 and 405 4K QD32. I never managed this kind of throughput on the Z68 or P67 on-board controllers. These numbers are getting close to hardware RAID controllers like ARECA and LSI. I would have been interested to see where the bottleneck lies if X79 had more ports. Even though X58 is 3Gb/s SATA, you had no problem bottlenecking the Intel RAID controller at around 800MB/s.