art with code

2010-04-09

Intel 48-core research chip

Intel press release, technical whitepapers.

48 cores at 1GHz, so peak performance is around 48 billion instructions per second (assuming one instruction per cycle per core). If it has SSE and mul-add, that's roughly 190 double-precision GFLOPS and 380 single-precision GFLOPS? Without mul-add, half that, and without SSE, 48 GFLOPS in both double and single precision. Power draw is something like the top Core i7s (which do around 100 GFLOPS), so at raw number crunching it should be anywhere between 0.25 and 4 times as good as a Core i7. On parallel CPU-like scalar workloads, it could possibly reach 4 times the peak performance of a Core i7 980X (depending on instruction throughput). Single-threaded performance is probably pretty bad (think of an underclocked Atom).
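The peak numbers above are just cores × clock × SIMD lanes × flops per lane. Here's a quick sketch of that arithmetic, assuming one (possibly SIMD) arithmetic instruction issued per core per cycle and counting a multiply-add as 2 flops per lane. The exact results (192 and 384) round to the 190 and 380 figures.

```python
# Peak-throughput arithmetic for the 48-core chip, using the post's figures.
# Assumption: each core issues one (possibly SIMD) arithmetic instruction per
# cycle, and a multiply-add counts as 2 flops per SIMD lane.

CORES = 48
CLOCK_GHZ = 1.0


def peak_gflops(simd_lanes: int, mul_add: bool) -> float:
    """Peak GFLOPS = cores * clock (GHz) * SIMD lanes * flops per lane per cycle."""
    flops_per_lane = 2 if mul_add else 1
    return CORES * CLOCK_GHZ * simd_lanes * flops_per_lane


# A 128-bit SSE register holds 2 doubles or 4 singles; scalar code uses 1 lane.
configs = {
    "SSE + mul-add, double": peak_gflops(2, mul_add=True),
    "SSE + mul-add, single": peak_gflops(4, mul_add=True),
    "SSE only, double": peak_gflops(2, mul_add=False),
    "SSE only, single": peak_gflops(4, mul_add=False),
    "scalar, double or single": peak_gflops(1, mul_add=False),
}

for name, gflops in configs.items():
    print(f"{name:26} {gflops:5.0f} GFLOPS")
```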

Nitpicking the press release:
Application software can use this network to quickly pass information directly between cooperating cores in a matter of a few microseconds

A few microseconds? One microsecond is 1000 cycles at 1GHz. If "a few" means 5, that'd be 5000 cycles... I guess they really mean nanoseconds, which would give L2-like latency.
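The conversion is just latency times clock frequency. A minimal sketch at the chip's 1GHz clock shows why microseconds would be an eternity and nanoseconds sound about right:

```python
# Convert a latency into clock cycles: cycles = latency (s) * clock (Hz).
CLOCK_HZ = 1e9  # 1 GHz, the chip's clock from the post


def latency_in_cycles(latency_s: float, clock_hz: float = CLOCK_HZ) -> float:
    """Number of clock cycles that elapse during the given latency."""
    return latency_s * clock_hz


for label, seconds in [("1 us", 1e-6), ("5 us", 5e-6), ("5 ns", 5e-9)]:
    print(f"{label:>5} -> {latency_in_cycles(seconds):,.0f} cycles")
```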

Now, this is a research chip, but let me think a bit about the commercial implementation.

What's the memory bus going to be like? The Core i7 has three memory channels feeding it, and four times the computational power needs four times the bandwidth. So, twelve DDR3 channels, or fewer channels of something more expensive like GDDR5? The current research system has four DDR3 channels, so I guess it's not much faster than a Core i7.
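Spelling out the bandwidth arithmetic: a DDR3 channel is 64 bits wide, so its peak bandwidth is transfers per second × 8 bytes. The sketch below assumes DDR3-1333 on every configuration (an assumption, not a spec of either system). The channel count and the ratios come out the same whatever speed grade you pick.

```python
# Memory-bandwidth arithmetic behind the "twelve DDR3 channels" estimate.
# Assumption: every channel is 64-bit DDR3-1333; peak numbers, integer MB/s.
import math

BYTES_PER_TRANSFER = 8   # a DDR3 channel is 64 bits wide
DDR3_MT_PER_S = 1333     # assumed DDR3-1333 (mega-transfers per second)

channel_mb_s = DDR3_MT_PER_S * BYTES_PER_TRANSFER  # peak MB/s per channel

core_i7_mb_s = 3 * channel_mb_s   # Core i7: three channels
needed_mb_s = 4 * core_i7_mb_s    # 4x the compute -> 4x the bandwidth
channels_needed = math.ceil(needed_mb_s / channel_mb_s)
research_mb_s = 4 * channel_mb_s  # research system: four channels

print(f"Core i7 (3 ch):         {core_i7_mb_s / 1000:6.1f} GB/s")
print(f"Needed for 4x compute:  {needed_mb_s / 1000:6.1f} GB/s = {channels_needed} channels")
print(f"Research system (4 ch): {research_mb_s / 1000:6.1f} GB/s "
      f"({research_mb_s / core_i7_mb_s:.2f}x the Core i7)")
```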

And what's the price? Judged against GPUs, it'd be in the $100 bracket; judged against CPUs, in the $4,000 bracket. From the pictures it looks like a pretty big chip (though it's at 45nm), so maybe the price will reflect that? As it is, you could probably sell it for somewhere between $300 and $5,000, depending on whether you're targeting heterogeneous supercomputers ($300, competing against GPUs) or x86 workloads ($5,000, competing against low-end Xeons and Opterons).

If it came with one or two cores with fast single-threaded performance, it could go places. As it is, it's in a bit of a bad place: worse single-threaded performance than cheap CPUs, and no graphics drivers (?) to let it serve as a GPU replacement. So, x86 scientific computing and web servers as the first target?
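Why one fast core would help is basically Amdahl's law: the serial part of a program runs at single-core speed no matter how many cores you have. A rough sketch, comparing 48 identical slow cores with the same chip where one core is swapped for a hypothetical fast core. The 4x speed of the fast core is a made-up parameter, and the model ignores the area and power such a core would cost and assumes perfect parallel scaling.

```python
# Amdahl-style comparison: 48 identical slow cores vs. the same chip with one
# core swapped for a hypothetical fast core. Speeds are relative work rates.
# Simplifications: perfect parallel scaling, no area/power cost for the fast core.

SLOW = 1.0
FAST = 4.0  # hypothetical: the fast core is 4x a slow core on serial code


def runtime(serial_fraction: float, core_speeds: list[float]) -> float:
    """Time to finish one unit of work.

    The serial part runs on the fastest core; the parallel part is spread
    over all cores in proportion to their speed, so its rate is the sum
    of the core speeds.
    """
    serial_time = serial_fraction / max(core_speeds)
    parallel_time = (1 - serial_fraction) / sum(core_speeds)
    return serial_time + parallel_time


homogeneous = [SLOW] * 48
heterogeneous = [FAST] + [SLOW] * 47

for s in (0.0, 0.01, 0.05, 0.10):
    gain = runtime(s, homogeneous) / runtime(s, heterogeneous)
    print(f"serial fraction {s:4.0%}: fast core gives {gain:.2f}x speedup")
```

Even a few percent of serial code makes the single fast core worth more than the slow core it replaces.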

Plus, if you write code that runs fast on this chip, it's going to run fast on a GPU as well, so it's kinda hard to figure out how this will pan out. There is no legacy code for 50+-core systems (apart from real-time graphics software), so the competitive advantage of the x86 ISA might matter less. Who knows? I don't.
