
Archive for August, 2009

Density is not the problem

Unless you’re in a newly constructed data center, I would argue that compute density isn’t the problem you should focus on. You won’t even have the power and cooling density to fully utilize the densest systems out there.

[There are definitely exceptions to this, such as when the maximum distance for your networking essentially defines the radius of the area where your equipment has to fit.]

But for most HPC users this isn’t the case. You’re not pushing the physical limits for electrical signals, and your power and cooling are limited. If you’re in a data center built some years ago and you’re ready to upgrade to the next generation of hardware, you can already get more performance out of every rack unit than was the case when the data center was built. In other words, you probably have floor space to spare when you move to newer hardware.

So why would you pay extra for higher density?

My take on it is that unless you’re in the very high end of HPC or have some other very special reason to do it, you shouldn’t. Density is not the problem to focus on. Results per watt is.
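To make “results per watt” concrete, here is a minimal sketch of comparing candidate systems on work done per watt instead of work done per rack unit. The system names and all numbers are invented purely for illustration:

```python
# Hypothetical candidates: (name, jobs completed per hour,
# power draw in watts, rack units occupied). Numbers are made up.
candidates = [
    ("dense blade chassis", 1200, 6000, 10),
    ("standard 1U servers", 1000, 4000, 16),
]

def results_per_watt(jobs_per_hour, watts):
    """Work completed per watt-hour of power consumed."""
    return jobs_per_hour / watts

for name, jobs, watts, units in candidates:
    rpw = results_per_watt(jobs, watts)
    density = jobs / units           # work per rack unit
    print(f"{name}: {rpw:.3f} jobs/Wh, {density:.1f} jobs/RU")
```

With these invented figures the standard 1U servers come out ahead on results per watt even though the blade chassis wins on density, which is exactly the distinction being argued here.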

If you follow that train of thought, and assume that you indeed have data center space to spare (or at least don’t need to reduce it), you first start to look at more generic servers that may or may not take up more space per box. You then distribute them more sparsely in the space you have, in one rack or across multiple racks. Remember that you (usually) get more work done per box than you did with the last generation of hardware you installed.

Now, this may not meet your overall performance requirements. If so, it’s time to look at accelerators like GPUs or FPGAs and replace or complement your x86 servers with them. Depending on factors like your applications, whether you have access to source code, and whether you have the skills to deal directly with FPGAs, you’ll end up in your own spot in this range of solutions. Nvidia, for example, has been working on this for a long time and has a nice set of applications ready to take advantage of its Tesla GPUs, as well as good development tools that make it easy to use a GPU with an existing application or to develop for it. Or, if you do have the skills to deal with FPGAs directly, and the volume and budget to support it, you could create a very specific accelerator for your needs.

The important thing is that by deploying accelerators like this you can address your overall performance requirements and still solve for “results per watt”.

At this point you have a so-called “nice” problem to contend with. This is where you need to decide if you want to get maximum performance out of the power/space/money budgets you have to work with, or if you’re OK meeting a certain performance level and instead minimizing the number of boxes you need to get there. That is, do you exceed your performance target within your money/power/space budgets, or do you give something back from your budgets?
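These two ways of spending your budgets can be sketched as a tiny selection exercise. All option names, performance numbers, and budget figures below are invented for illustration:

```python
# Hypothetical candidate configurations:
# (name, performance, power in kW, cost, number of boxes)
options = [
    ("24 standard nodes", 150, 12.0, 240, 24),
    ("12 nodes + GPUs",   200, 11.0, 230, 12),
    ("8 nodes + GPUs",    145,  8.0, 180,  8),
]

POWER_BUDGET_KW = 12.0
MONEY_BUDGET = 250
PERF_TARGET = 140

# Only consider options that fit within both budgets.
within_budget = [o for o in options
                 if o[2] <= POWER_BUDGET_KW and o[3] <= MONEY_BUDGET]

# Strategy 1: maximum performance for the budgets you have.
max_performance = max(within_budget, key=lambda o: o[1])

# Strategy 2: meet the performance target with the fewest boxes,
# giving power and money back to the budgets.
meets_target = [o for o in within_budget if o[1] >= PERF_TARGET]
fewest_boxes = min(meets_target, key=lambda o: o[4])

print("max performance:", max_performance[0])
print("fewest boxes:   ", fewest_boxes[0])
```

With these made-up numbers the two strategies pick different configurations, which is the decision being described: exceed the target, or hand budget back.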

Read Full Post »

This morning I was reading John West’s article about Intel’s acquisition of RapidMind. It’s the latest example of the High Performance Computing (HPC) industry recognizing the need to make it easier to use accelerators, many/multi-core, and cluster parallelism, or, to be specific with regard to the InsideHPC article, to design software for them.

I have always viewed the need to customize your software for a specific accelerator as, in most cases, a dead-end approach, be it GPU, FPGA or anything else. Granted, there’s a set of exception cases where developers and end-users are prepared to go down that route, fully aware of the costs. But to really reach the larger audience you need to make it much easier and essentially hide the complexity. History is littered with the remains of accelerator companies that never really solved that problem and could only take advantage of a limited window of opportunity.

I compare this with the times when I had to do assembly programming and count cycles to get that last ounce of performance needed in the embedded realtime systems I worked on at Ericsson. In our case it made sense to do those time-critical pieces at that low level, but for the most part we used a high-level language with built-in constructs for our most used and critical functions (realtime signaling and communication over a high-speed network designed for telecom and defense applications). Only a very few developers had to deal with the complexity of the assembly level and really know what hardware was underneath. This approach greatly enhanced productivity when designing the actual applications, and the performance was “good enough”, so we came out ahead every time.

I see many similarities between that and where the HPC industry has been with the use of accelerators and many/multi-core in parallel systems. It’s been a journey from having only low-level or hardware-specific tools available for the really dedicated, to where we now have several approaches to upleveling it to the point where the application developer can have essentially one source code and let the “system” take care of translating it so that it takes maximum (or close enough) advantage of the hardware it runs on.

Steve Wallach of Convex and Data General fame, now at Convey Computer, has said it very well: “The architecture which is simpler to program will win.”

Apart from Intel/RapidMind, take a look at what Nvidia is doing with CUDA, OpenCL and integration with PGI compilers; what Convey Computer is doing with their HC-1 system; and for that matter what Apple and Microsoft are doing to promote common APIs (OpenCL and DirectX Compute, respectively).

We’re at an inflection point where the use of various types of accelerators is now easy enough for developers, and we’re getting to a point where it’s also easy to deploy. Essentially “stealth acceleration”, where it “just works” almost regardless of what hardware you have. This opens the door wide for heterogeneous clusters with Grid/Cloud-level software that takes the pain out of scheduling for optimum time to results.

If I compare with my earlier example of what we were doing for realtime networked applications, the next step would be a high-level language that lets the developer stay close to the application code and not worry about things like how to use MPI for best performance and scaling in a cluster. Sun’s Fortress language seemed to address this in a way similar to what Java did for its space. However, with the Oracle acquisition of Sun, you have to wonder if Fortress will survive. I’m hoping it will, as an open source project.

Read Full Post »

(For explanation of the “pig” reference in the title, please read to the end. It’s not meant to be negative)

I’m attending CloudWorld in San Francisco this week. Actually, it’s a collection of three conferences running in parallel: OpenSource World (formerly LinuxWorld), Next Generation Data Center, and CloudWorld. I’ve been focusing on the Cloud Computing side.

For someone like me with a background in HPC and Grid Computing (and distributed computing before the “Grid” term was invented), it’s a little like what Yogi Berra would have said: “This is like deja vu all over again.” Cloud Computing is about making a big computer out of a bunch of smaller ones and giving access to this “service” over a network, often using modern web-based portals, security mechanisms, and so on. Sound familiar?

When you poke under the hood, it looks eerily similar to Grid Computing. The “plumbing” is the same. It evolved out of trying to address the same problem: building that big computer out of distributed parts. So it’s no surprise that there are similarities. What’s different is the scale and the standardization at the different levels that is possible now with how the Web has evolved.

Some challenges still remain, though, and one big one is around culture. How do you gain the trust of a user so that s/he will trust the service enough to place her/his precious data in the cloud? This was the same problem we had with Grids. Things are, however, slowly changing. To some extent people are getting used to having some of their data on the web, through their personal interaction with web commerce and the like. But most people are still wary about putting all their information out there. The same applies to business: people are looking at what they can risk having out there (even through so-called secure mechanisms) and what they aren’t ready to put at that risk (yet). It’s about gaining trust, and I would claim we’re not 100% there yet.

Bill Nitzberg from Altair made the connection with history in his talk, and also with HPC as a pioneer of this technology. He made the observation that every decade since the ’60s has had its version of building a bigger computer out of smaller ones. Yesterday it was called Grid. Today it’s called Cloud. On the surface it looks like a “pig with a different snout” (a saying I picked up from listening to David Feherty and Nick Faldo during last weekend’s golf broadcast. I just love listening to their banter!). They are not quite the same thing though: the scale is different, both in terms of the number of machines that work together and in terms of the scope of the problem being solved. “Tomorrow” we’ll talk about constellations of clouds and have a cute name for that.

Bottom line: if you have a High Performance Computing or Grid background, you’re well positioned to understand the Cloud Computing issues and able to leverage your experience. I’d say this is very different from having a general enterprise data center background. In HPC and Grid you’re used to thinking about thousands of compute resources behind a virtualization layer (Grid); enterprise data centers (in general) don’t deal with that scale. In HPC we’re used to pushing the boundaries just a little beyond what’s comfortable in order to get that last ounce of performance.

Read Full Post »
