Multi-Core The Cool Factor at Hot Chips Conference

Wednesday Aug 22nd 2007 by Andy Patrizio

The clock speed race is buried once and for all at Stanford, but programming these multi-core monsters remains an elusive art.

STANFORD, Calif. -- Hot Chips is an appropriate term for a conference on future trends in microprocessors. On a rare hot day here in Memorial Hall on the campus of Stanford University, even the air conditioners fail to counterbalance the heat from notebooks that adorn practically every lap in the auditorium.

Inside, the talk among engineers and computer scientists is about multi-core and all things multi-core. Intel and AMD have shifted their strategy from clocks to cores, and every demonstration, from graphics cards to research projects, showed off multi-core efforts as well.

The problem is that while the hardware engineers have made a monumental effort to build the multi-core machines, the applications have not come. That's because parallel programming is a complicated science that's driving even the impressive collection of PhDs at this show up a wall.

"A lot of it is compiler science that needs to be updated to make programming [multithreaded applications] easier, and it will happen," said Peter Glaskowsky, technology analyst for Envisioneering. "Multi-core is really good at a narrow class of applications. A lot of people are doing a lot of work so multi-core will benefit many kinds of applications."

But just throwing cores at the problem won't help without careful design, said Erik Lindholm, an Nvidia engineer and veteran of Silicon Graphics, in his keynote speech. Lindholm was discussing the scalar design of Nvidia's most recent video chip, the G80, which is found in the 8800 line of cards.

"You can't build infinitely wider hardware, your scalability goes down," he said. There must be balance between workload units. In the case of a video card, that means balancing the pixel processors, vertex engines and triangle animation. "You don't want to emphasize one part of the shader and stall out another. That will cause bubbles in the pipeline."

Nvidia discussed its Compute Unified Device Architecture, or CUDA, a technology for writing applications in the C language that use the computation power of the G80. The company has introduced a line of computers under the Tesla brand name.

The Tesla products are designed to aid in heavy computation projects, especially floating-point calculations, in science and medicine. The G80 can handle up to 12,288 threads and has 128 thread cores. CUDA is designed to address the threading problem by allowing a programmer to write multi-threaded applications with just a few lines of C code.
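
To give a rough sense of what "a few lines of C code" can look like, here is a minimal sketch in CUDA C that adds two arrays element by element, with one GPU thread per element. It is not taken from the conference talk: the kernel name vec_add, the array size and the block size of 128 are illustrative assumptions, and error handling is kept to a bare minimum.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds exactly one pair of elements.
// (vec_add is a hypothetical name chosen for this sketch.)
__global__ void vec_add(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    if (i < n)                                      // guard in case the grid overshoots n
        out[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 12288;                 // illustrative size: one element per thread
    const size_t bytes = n * sizeof(float);

    // Host buffers.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    if (!h_a || !h_b || !h_out) return 1;
    for (int i = 0; i < n; ++i) {
        h_a[i] = (float)i;
        h_b[i] = 2.0f * (float)i;
    }

    // Device buffers.
    float *d_a = NULL, *d_b = NULL, *d_out = NULL;
    if (cudaMalloc((void **)&d_a, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_b, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_out, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks of 128 threads to cover all n elements.
    // The block size is an assumption made for this sketch.
    const int threads_per_block = 128;
    const int blocks = (n + threads_per_block - 1) / threads_per_block;
    vec_add<<<blocks, threads_per_block>>>(d_a, d_b, d_out, n);

    // Copying the result back also waits for the kernel to finish.
    if (cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
        fprintf(stderr, "kernel launch or copy failed\n");
        return 1;
    }

    // Check every element against the value computed on the host.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 3.0f * (float)i) ++mismatches;
    printf("%s\n", mismatches == 0 ? "ok" : "mismatch");

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_out);
    free(h_a); free(h_b); free(h_out);
    return mismatches != 0;
}
```

The programmer writes the body of one thread and a launch line. Spreading those threads across the chip's cores is left to the hardware and runtime, which is the part of the threading problem CUDA aims to take off the programmer's hands.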

AMD followed with a demonstration of its HD 2900 video card, but stuck to promoting it as a graphics processor. "To us, whether you are playing video or doing 3D, it's a form of decoding and decompression… so our view of the graphics chip is it's a decoder and decompressor," said Mike Mantor, a Fellow at AMD.

Intel showed off its 80-core prototype, which was designed to be a network on a chip with teraflop performance while running at under 100 watts. The caveat to this prototype is that it's not compatible with x86 systems. Right now, it remains a lab experiment.

The chip uses a tile design for the cores, in an eight-by-ten grid. Each tile has a router connecting the core to an on-chip network that links all the cores together, rather than making them go through the frontside bus as Intel's Core 2 and Xeon processors do. Intel estimates that the chip's advanced sleep technology cuts power leakage by a factor of two to five.

The many-core speeches continued with Madhu Saravana Sibi Govindan of the University of Texas at Austin, who discussed UT's own multi-core project, TRIPS (The Tera-op, Reliable, Intelligently adaptive Processing System).

TRIPS uses a design known as EDGE, Explicit Data Graph Execution, which executes a stream of individual instructions as a block. Processors today function by executing instructions one at a time, very fast. EDGE attempts to run as many instructions as possible in one block.

TRIPS can execute up to 16 instructions per cycle, whereas the Intel Core 2 processor can only do 4. Because of its large blocks, a 366MHz prototype was able to flatten a Pentium 4 in some benchmarks, while it was flattened in others. At this point, the processor and the code for it are still in the development stages, and Govindan said maximum performance required hand coding, a skill not many people have acquired.

