Kaveri's Potential

  • Posted on: 8 June 2014
  • By: John Franklin

I have a lot of respect for AMD because over the years they have come up with a number of true innovations, leaving Intel (and sometimes the whole industry) to play catch-up for a bit. The x86-64 architecture (a.k.a. AMD64) provided backwards compatibility with the 32-bit x86 instruction set when Intel was ready to move on with Itanium. AMD followed up by baking the memory controller into the CPU. When paired with HyperTransport, this gave multi-CPU servers a significant performance boost while still providing memory coherency.

Their current line of processors -- dubbed APUs -- merges the CPU and GPU into a single chip, trying to leverage that integration for better performance. Up until now, the two might have lived on the same silicon, but there was still a high wall between them. In the latest generation, codenamed Kaveri, AMD has merged the GPU and CPU in a tightly unified architecture called HSA (Heterogeneous System Architecture).

In the benchmarks published in most news articles, the GPUs don’t get a lot of exercise and the CPUs just aren’t beefy enough to go it alone against Intel’s chips: they can’t beat a Core i5 and often struggle against a Core i3. Maybe the CPUs are sluggish because they had to give up space on the chip for their GPU partners. Or maybe AMD’s focus on developing HSA put a hold on CPU optimization.

Innovation comes with costs. In all three cases (AMD64, the on-chip memory controller, and now the APU), old code more or less worked on the new CPUs, but the rest of the computing community had to adapt to fully realize the capabilities of the designs. For the AMD64 architecture, compilers needed to be updated to support the 64-bit instructions, more and wider registers, and other features. Software applications and libraries then followed by optimizing their hand-crafted assembly. On-chip memory controllers created a non-uniform memory access (NUMA) model that the OS has to manage so that code runs, as often as possible, on the CPU with direct access to the memory holding its data.

HSA requires that both be done all over again: the build tools and the operating systems need updating. The prior generations of APUs, Trinity and Richland, have separate memory regions for the CPU and GPU, but Kaveri merges them into a single memory space. Where we used to copy data from CPU memory to GPU memory, page tables now dictate who controls what memory. Data can be moved by simply reassigning the page, if the kernel knows to do so. This is going to require changes to operating systems to better manage memory, just as NUMA did a decade ago. Operating systems may also need to update their process schedulers to understand the difference between scheduling time on the CPU and on the GPU.
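The difference between copying and sharing can be sketched in plain Python. This is only an analogy, not HSA code: the "GPU" is an ordinary function, the names (`gpu_style_double`, `host_data`) are made up for illustration, and a `memoryview` stands in for handing over the same pages rather than duplicating them.

```python
# Analogy only: no real GPU is involved, and none of these names come from HSA.

def gpu_style_double(buf):
    """Double every byte in place, standing in for work done by a GPU kernel."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256

host_data = bytearray([1, 2, 3, 4])

# Separate-memory model: copy in, compute on the copy, copy the result out.
device_copy = bytearray(host_data)   # explicit copy into "GPU memory"
gpu_style_double(device_copy)
copied_result = bytes(device_copy)   # explicit copy back out

# Shared-memory model: pass a view of the very same buffer, with no copy.
shared_view = memoryview(host_data)
gpu_style_double(shared_view)
shared_result = bytes(host_data)     # the original buffer now holds the result
```

Both paths produce the same bytes, but the second never duplicates the data. On large buffers, skipping that copy is where the savings would come from.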

The last piece is to update software to run parallelizable, data-intensive code on the GPU instead of the CPU. If you think that doesn’t happen often, you obviously don’t use a web browser. Every time an image is loaded on a page, decompressing it can be accelerated by the GPU. SSL connections can be encrypted and decrypted faster on the GPU than on the CPU. Of course, video transcoding can be done far faster on the GPU. However, this all depends on updated build tools. The payoff promises to be big.
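What makes code "parallelizable" here is that each piece of data can be processed without looking at its neighbors. Here is a minimal sketch, assuming a made-up per-pixel `brighten` operation and using CPU threads as a stand-in for GPU lanes. It shows the shape of the work, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel, amount=40):
    """Per-pixel work: no pixel depends on any other, so order does not matter."""
    return min(pixel + amount, 255)

def brighten_chunk(chunk):
    """Process one independent slice of the image."""
    return [brighten(p) for p in chunk]

def split(data, parts):
    """Split data into roughly equal contiguous chunks."""
    size = max(1, (len(data) + parts - 1) // parts)
    return [data[i:i + size] for i in range(0, len(data), size)]

# Hypothetical 8-pixel grayscale "image".
pixels = [0, 100, 200, 250, 30, 220, 255, 10]

# Serial version: one worker walks the whole image.
serial = [brighten(p) for p in pixels]

# Parallel version: four workers each take a slice; map() keeps the order.
with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = pool.map(brighten_chunk, split(pixels, 4))
    parallel = [p for chunk in chunks for p in chunk]
```

Because the slices are independent, the serial and parallel results are identical. Work of this shape is the kind a GPU, with its many lanes, is built to take over.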

I own an old Trinity-based laptop and a lightweight Temash one. While Trinity can’t take advantage of the full HSA, it can leverage OpenCL implementations of key libraries, and the Temash’s 1GHz clock leaves it begging for a GPU assist. I’d love to see a performance boost from simply updating packages to versions with optimized GPU code. Looking forward, AMD has recently announced the laptop versions of Kaveri chips, and the A8-7600 will be out Real Soon Now™, which would be perfect for a new work laptop and a custom HTPC.

AMD has always been the underdog to Intel and I like to root for the little guy, even if AMD isn’t quite so little. Personally, I think HSA is an innovation that will change how we look at computing. It’ll be gratifying to see their work pay off. It’ll be exciting to write code for it.