LU Decomposition in C (and under CUDA)

As part of any major project, it occasionally happens that you assume something is a ‘solved problem’ when it’s really not. In my case it was solving small linear systems of the form Ax = B, where A is an n×n matrix and B is an n-vector. This is a problem that has been solved in libraries such as LAPACK, LINPACK, BLAS, etc. The issue appears when you’re trying to do this within a specific hardware environment (CUDA), where you cannot call host functions from the device, and the cuBLAS libraries cater only to large matrices processed in parallel ...
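For the curious, the core of the approach ends up looking something like the sketch below: a plain-C, in-place LU decomposition with partial pivoting that solves a single small Ax = b system, written so the same code can also compile as a CUDA device function. The naming (`lu_solve`) and row-major `float` storage are my own assumptions for illustration, not the exact code from the post.

```c
#include <math.h>

/* Solve A x = b for a small n x n system via in-place LU decomposition
 * with partial pivoting, followed by forward/back substitution.
 * A is row-major and is overwritten with its LU factors; b is overwritten
 * with the solution. Returns 0 on success, -1 if A is numerically singular.
 * Marked __host__ __device__ so it can be called from inside a CUDA kernel. */
#ifdef __CUDACC__
__host__ __device__
#endif
static int lu_solve(float *A, float *b, int n)
{
    for (int k = 0; k < n; ++k) {
        /* Partial pivoting: find the largest remaining entry in column k. */
        int piv = k;
        float maxval = fabsf(A[k * n + k]);
        for (int i = k + 1; i < n; ++i) {
            float v = fabsf(A[i * n + k]);
            if (v > maxval) { maxval = v; piv = i; }
        }
        if (maxval == 0.0f)
            return -1;                       /* singular matrix */

        /* Swap rows k and piv of A, and the matching entries of b. */
        if (piv != k) {
            for (int j = 0; j < n; ++j) {
                float t = A[k * n + j];
                A[k * n + j] = A[piv * n + j];
                A[piv * n + j] = t;
            }
            float t = b[k]; b[k] = b[piv]; b[piv] = t;
        }

        /* Eliminate below the pivot, storing the multipliers (L) in place. */
        for (int i = k + 1; i < n; ++i) {
            float m = A[i * n + k] / A[k * n + k];
            A[i * n + k] = m;
            for (int j = k + 1; j < n; ++j)
                A[i * n + j] -= m * A[k * n + j];
            b[i] -= m * b[k];                /* forward substitution folded in */
        }
    }

    /* Back substitution with the upper-triangular factor U. */
    for (int i = n - 1; i >= 0; --i) {
        float s = b[i];
        for (int j = i + 1; j < n; ++j)
            s -= A[i * n + j] * b[j];
        b[i] = s / A[i * n + i];
    }
    return 0;
}
```

For the small, per-thread systems described above, each thread can simply keep its own copy of A and b in local or shared memory and call a routine like this directly, with no need for cuBLAS.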

April 18, 2011 · Andrew Bolster

CUDA Compute 20 Error and other issues

A quirk of using older CUDA drivers is that the latest NVIDIA SDK code examples are not backward compatible; i.e. compiling the 3.0 SDK against the 2.3 toolkit (which I’ve spent the last day doing) is a fool’s errand (thanks very much to @thebaron on #cuda on freenode and tkerwin on StackOverflow). Basically, the 3.x drivers reclassify newer cards based on their compute capability; previously, the ‘compute’ value (a measure of the card’s CUDA feature support) would max out at 1.3, but the range now extends up to 2.0, and the 2.3 toolkit does not recognise this value, so it craps out. ...
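If you run into this, a quick way to see what compute capability your card actually reports (and therefore which toolkit/`-arch` combination you need) is a small query against the CUDA runtime API, along these lines (a generic sketch, not code from the post):

```c
#include <stdio.h>
#include <cuda_runtime.h>

/* Print the compute capability of each CUDA device, so you can tell whether
 * your toolkit/SDK combination actually understands the card (e.g. a compute
 * 2.0 Fermi card will not be recognised by the 2.3 toolkit). */
int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA devices found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

Compile it with nvcc and run it before fighting the SDK examples; the major.minor value it prints is what the toolkit has to support.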

April 14, 2011 · Andrew Bolster

Ongoing CUDA work, aka, I love this book.

If anyone has any interest in CUDA, or GPU/parallel programming in general, David B. Kirk and Wen-mei Hwu’s groundbreaking “Programming Massively Parallel Processors” is a must. The sub-title of the book is “A Hands-on Approach”, and it wasn’t until a third of the way through the book that I realised that’s exactly what it is. The pairing of Kirk, an NVIDIA Fellow, outgoing NVIDIA Chief Scientist, generally world-weary technologist and all-round ‘hardware guru’, with Hwu, a well-heeled educator and researcher at the University of Illinois, provides a practical but in-depth look at massively parallel processing. Rather than dwelling on the pure ‘programming’, it assumes the reader can work out, for instance, how to do matrix multiplication the ‘basic’ way from the NVIDIA CUDA APIs, and instead looks at how to take advantage of the hardware to give sometimes incredible speed increases. ...
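For context, the ‘basic’ way referred to above is roughly the naive one-thread-per-output-element kernel below. The kernel name and parameters are mine; it is a sketch of the unoptimised baseline that the book’s tiled, shared-memory versions then improve on.

```c
/* Naive matrix multiplication: one thread per output element, each reading a
 * full row of A and column of B straight from global memory. Matrices are
 * square (width x width), row-major floats. */
__global__ void matMulNaive(const float *A, const float *B, float *C, int width)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    if (row < width && col < width) {
        float sum = 0.0f;
        for (int k = 0; k < width; ++k)
            sum += A[row * width + k] * B[k * width + col];
        C[row * width + col] = sum;
    }
}
```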

June 14, 2010 · Andrew Bolster

SEE, Programming Abstractions, Assignment 1

SEE, or Stanford Engineering Everywhere, has turned out to be my favourite e-learning resource; I’ve dipped into it a few times over the past few years, but in light of my recent investment in a CUDA-enabled graphics card, I thought it was high time to brush up on my C++ programming, which I’ve basically left stagnant for two years after advancing no further than function pointers, structures, and templates. So, in the spirit of openness that SEE tries to foster, I’ll be blogging my work through their CS106B course, Programming Abstractions, the second of three programming courses. (I passed on CS106A, Programming Methodology, since I’ve had enough Java shoved down my throat to last a lifetime…) ...

April 28, 2010 · Andrew Bolster

FIX:CUDA on Debian Jessie

Hopefully a super quick one (while I’m procrastinating from procrastinating). Debian Jessie is a lovely operating system until you try to do anything with it; lots of package deprecations, etc. Anyway, I’ve got a history with GPU stuff and I’ve been playing with integrating it into some of my research, but in a bout of insanity I decided, while I’m over in Liverpool (for another 4 hours), to wipe my old workstation and bring it over from Ubuntu 15.04 to Jessie (which I’ve been using on my main laptop for a while now). ...

Andrew Bolster