art with code

2011-11-08

Favicon notify

Hack of the day from a few weeks back: show a small notification bubble in the favicon. Check out the code at https://github.com/kig/faviconNotify and go to http://fhtr.org/faviconNotify for a demo.

Usage

  FaviconNotify.set(number);
  FaviconNotify.clear();

Longer example of use in HTML

  <html>
    <head>
      <link rel="icon" href="favicon.ico">
      <script src="faviconNotify.js"></script>
      <script>
        window.onblur = function() {
          FaviconNotify.set(1);
        };
        window.onfocus = function() {
          FaviconNotify.clear();
        };
      </script>
    </head>
  </html>

Riffing on the work of +Michael Mahemoff http://softwareas.com/dynamic-favicons
And http://userscripts.org/scripts/show/24430
And +Mathieu Henri http://www.p01.org/releases/DEFENDER_of_the_favicon/

See also http://faviconist.com/favicon-library (updated dynamic favicon library to incorporate badge-setting).

Updated Three.js deck

I updated the "Basics of Three.js" presentation to work with the latest version of the library. My Google Developer Day "Introduction to WebGL" presentation is also online, updated to match the latest changes to Three.js.

In more detail, the changes were (there's a small migration sketch after the list):
  • $FOO.addChild and $FOO.addLight were merged into a single method, $FOO.add
  • Camera was deprecated and split into a PerspectiveCamera and OrthographicCamera
  • Camera no longer has a target that it tracks; you need to call camera.lookAt(vec3) on every frame instead.
  • MeshShaderMaterial was deprecated and renamed to ShaderMaterial
  • material.ambient now needs an AmbientLight in the scene to work (AmbientLight color is multiplied by material.ambient in the shader)
  • ColladaLoader was renamed to THREE.ColladaLoader
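
To make the migration concrete, here's a rough before/after sketch (hedged: this is from memory of the Three.js API at the time, and the variable names are made up):

  // scene, mesh, light, renderer and aspect are assumed to exist.
  // Old: scene.addChild(mesh); scene.addLight(light);
  scene.add(mesh);
  scene.add(light);

  // Old: new THREE.Camera(45, aspect, 1, 1000), with a tracked camera.target.
  var camera = new THREE.PerspectiveCamera(45, aspect, 1, 1000);

  function animate() {
    camera.lookAt(scene.position); // no tracked target, re-aim every frame
    renderer.render(scene, camera);
    requestAnimationFrame(animate);
  }
  requestAnimationFrame(animate);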

2011-09-27

Basics of Three.js Presentation


The slides for my Basics of Three.js talk are now up at fhtr.org/BasicsOfThreeJS

They take you through getting started with Three.js, building small apps and using custom shaders. There are a lot of live examples to play with, including the world's worst 3D modeler (pictured). You can fork the deck at github.com/kig/BasicsOfThreeJS

If you want to show off your awesome Three.js skills, add a new section to the deck and send me a pull request. Remember to add your name to the credits if you do!

2011-08-29

Canvas image filter library

Here's a small image filter library for the canvas element: github.com/kig/canvasfilters

It's based on the HTML5Rocks Canvas Image Filters article.
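
To give an idea of the approach (a minimal sketch of the getImageData-based technique from the article, not necessarily the library's actual API):

// Sketch: grayscale a canvas in place. ctx, width and height are assumed.
function grayscale(ctx, width, height) {
  var id = ctx.getImageData(0, 0, width, height);
  var data = id.data;
  for (var i = 0; i < data.length; i += 4) {
    // Luma-weighted average of the RGB channels.
    var v = 0.299*data[i] + 0.587*data[i+1] + 0.114*data[i+2];
    data[i] = data[i+1] = data[i+2] = v;
  }
  ctx.putImageData(id, 0, 0);
}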

2011-08-27

WebGL Filesystem Visualizer


I put the HTML5WOW 3D filesystem visualizer demo online at fhtr.org/wfsv. It only works on Chrome, as it uses the file input's webkitdirectory attribute to recursively list directory contents.
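
The directory listing part looks roughly like this (a hedged sketch; the element id is made up):

<input type="file" id="dirPicker" webkitdirectory>
<script>
  // With webkitdirectory set, picking a directory fills this.files with
  // every file in the tree, each carrying a webkitRelativePath.
  document.getElementById('dirPicker').onchange = function() {
    for (var i = 0; i < this.files.length; i++) {
      console.log(this.files[i].webkitRelativePath); // e.g. "dir/sub/file.txt"
    }
  };
</script>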

Use the file input button in the top-left corner to select a directory to visualize. The visualizer tries to generate a model of the full directory tree, so it only works on small dir trees (less than 500 files or so). Use the console to navigate by clicking on file names. You can also use some shell commands like ls, cp, rm and mv. The filesystem contents aren't modified though, only the visualization.

Here's a video of the visualizer in action from the Google I/O 2011 "HTML5: The Wow and the How"-presentation:

2011-08-19

Ken Burns effect using CSS


The Ken Burns effect is a special effect used in documentaries when you only have a static photograph of an interesting item. To add some movement and life to the photograph, you zoom into the photo and pan towards a point of interest. It's named after the documentary filmmaker Ken Burns, who used it a lot.

Anyhow.

You can achieve the Ken Burns effect using CSS animations. It's not even particularly difficult. Just create a div with overflow:hidden to hold the image, then change the image's CSS transform property. Or if you want to be totally retro and backwards-compatible, you could also achieve the effect by changing the image's top, left, width and height using a JS setInterval.

So, CSS:

.kenburns {
  overflow: hidden;
  display: inline-block;
}
.kenburns img {
  transition-duration: 5s;
  transform: scale(1.0);
  transform-origin: 50% 50%;
}
.kenburns img:hover {
  transform: scale(1.2);
  transform-origin: 50% 0%; /* pan towards top of image */
}


And the corresponding HTML:

<div class="kenburns" style="width:640px; height:480px;">
  <img src="image.jpg" width="640" height="480">
</div>


If you hover over the image, it will slowly zoom in and pan towards its top edge.
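
And if you do go the retro setInterval route, it'd look something like the sketch below (hedged: assumes the img is absolutely positioned inside the .kenburns div, and the element id is made up):

// Sketch: zoom from 1.0 to 1.2 while panning towards the top edge.
var img = document.getElementById('kenburnsImage');
var t = 0, duration = 5000, interval = 30;
var timer = setInterval(function() {
  t = Math.min(1, t + interval / duration);
  var scale = 1 + 0.2 * t;
  img.style.width = (640 * scale) + 'px';
  img.style.height = (480 * scale) + 'px';
  img.style.left = (-640 * (scale - 1) / 2) + 'px'; // stay horizontally centered
  img.style.top = '0px'; // keep the top edge anchored
  if (t === 1) clearInterval(timer);
}, interval);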

You can see the effect in action here. And a more complex version with a JS-driven lightbox here.

The (quick, hacky) code is on GitHub.

2011-08-11

Spinner library


Ok, uploaded the spinner library to GitHub. There's a demo page too. It's a wee bit buggy though. I think the transition events are not firing when the page is hidden.

Even getting it to its current state was a pain; the interaction between CSS transitions and animations, the DOM, JavaScript and events is flaky at the edges. Once I get an animation or transition going, it works well. But if I want to change some CSS properties in JS with the intent of triggering an animation or transition, it's pretty much guesswork whether it will actually work without glitches.

Hmm... As a workaround, I think I could just create a new element on every show/hide cycle. That way the animation can't possibly render a 100% frame before starting from 0%, right?? (This happened on Firefox... or maybe it was an unanimated frame or something. I ended up switching from animations to transitions because of that.)
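
Something like this, maybe (a hypothetical sketch, not the library's actual code):

// Recreate the element on every show so the animation always starts
// from its 0% keyframe.
function showSpinner(container) {
  var old = document.getElementById('spinner');
  if (old) old.parentNode.removeChild(old);
  var spinner = document.createElement('div');
  spinner.id = 'spinner';
  spinner.className = 'spinner'; // the CSS animation is defined elsewhere
  container.appendChild(spinner);
}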

On Chrome Canary, border-radius and overflow:hidden don't behave like you'd expect. Instead of clipping the contents to the rounded box, it clips on the rectangular box. I don't know why. And sometimes the rendering flickers when using transforms that toggle HW compositing for the page.

2011-07-19

Story of Tomte's New Hat


To break the monotonic programmingification, here's a story about Tomte's new hat.

2011-07-06

Microbenchmark findings

Looping through canvas ImageData pixels: 1D traversal (for (var i=0; i<data.length; i+=4) ...) is slightly faster than 2D traversal (for (var y=0; y<height; y++) { for (var x=0; x<width; x++) ... }). Caching the ImageData width, height and data properties in local variables is slightly faster than not (except in the 2D case, where it's a good bit faster). Stay away from crazy counting hacks; they're more likely to slow you down than speed you up (probably because they make the compiler's job harder and it can't optimize the code properly).

Based on the above, my preferred pixel loop format is:

var width = id.width;
var height = id.height;
var data = id.data;
for (var y=0; y<height; y++) {
  for (var x=0; x<width; x++) {
    var off = (y*width + x) * 4;
    var r = data[off];
    var g = data[off+1];
    var b = data[off+2];
    var a = data[off+3];
    // ...
  }
}

And if you just need to map over the pixels and don't care about the coords, the simple 1D loop:

var data = id.data; // or just use id.data directly, no big perf impact
for (var i=0; i<data.length; i+=4) {
  var r = data[i];
  var g = data[i+1];
  var b = data[i+2];
  var a = data[i+3];
  // ...
}


Sparse blitting vs. dense blitting: If all you need to do is change the values of a small number of pixels, use fillRect. If you need to fill the entire canvas or a significant portion (say, 256x256) of it with custom data, use putImageData. There's no major performance difference between clearing an ImageData buffer before use vs. allocating a new one, so I'd go with clearing the buffer just to avoid extra GC work.
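
In code, the two paths look like this (a minimal sketch; ctx, id, x and y are assumed to exist):

// Sparse: poke individual pixels with fillRect.
ctx.fillStyle = 'rgb(255,0,0)';
ctx.fillRect(x, y, 1, 1);

// Dense: blast a whole buffer of custom data onto the canvas.
ctx.putImageData(id, 0, 0);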

Text drawing: fillText is significantly faster than strokeText. Whether the text is aligned on an integer pixel only seems to matter on Opera's fillText.

Path drawing: Nothing very conclusive, path point count doesn't seem to have a major impact on path point throughput.

Spritesheets: For best performance, cache your spritesheet frames to separate canvases.

Image drawing: Align your images to integer pixels, don't transform them, use canvas elements instead of IMGs. Transformations perform very badly on non-accelerated canvases. Just offsetting an image by fractional pixels causes the browser to use the slow path. On accelerated canvases, transformations don't really matter all that much. You still get the best perf by doing aligned non-transformed draws though.

Clearing the canvas: Just use clearRect.

WebGL texture sources: Use premultiplied textures if possible. Use ImageData if possible, just don't specify it as premultiplied (same goes for typed arrays). Canvases are faster than images on Chrome, images are faster than canvases on Firefox. Typed arrays are about as fast as canvases.
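
For the ImageData case, the upload looks roughly like this (a hedged sketch; gl and imageData are assumed to exist, and a texture is assumed to be bound):

// Upload ImageData as a texture without premultiplying its alpha.
gl.pixelStorei(gl.UNPACK_PREMULTIPLY_ALPHA_WEBGL, false);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, imageData);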

2011-06-08

New articles

I have two new articles up at HTML5 Rocks. The first is about making image filters using the canvas element and the second one is about building a kiosk-style app for showcasing Chrome Experiments at this year's Google I/O.

Go check them out and let me know how you like them.

2011-06-02

Canvas & WebGL microbenchmarks

I've been writing some microbenchmarks for canvas and WebGL over the past few weeks. They're at jsPerf, but jsPerf being what it is, I'm going to collect all the benchmark links here. So, without further ado:

Canvas 2D tests


Clearing the canvas
Drawing images
Spritesheets vs. individual sprites
Path creation
Filling and stroking text
Drawing text at different sizes
Sparse blitting
Dense blitting
Looping through ImageData pixels

There's also this test on the speed of the different canvas composite operations over at GitHub; I should probably port it over to jsPerf as well.

WebGL tests


Texture sources and pixelStorei settings

2011-04-24

Runfield & Remixed Reality

Oh right, I did these demos for Mozilla a couple months back:


Runfield


Runfield is a Canabalt clone with painted graphics (I made a guide on how to do your own graphics for it, but it's a bit "First you sketch a good-looking picture and then you paint it! Done!"). The graphics were painted in MyPaint and GIMP. The renderer is done with Canvas 2D and uses drawImage to draw thin vertical slices from the background images to make up the undulating ground.

The main things I wanted to communicate with Runfield were speed and polish: showing that you can make a fast 2D game with JS and have it look good. Accordingly, most of the dev time was spent making the graphics and optimizing the engine (:

Optimization tips: draw images aligned to the pixel grid, eliminate overdraw (if you know that a part of an image is not going to show, don't draw that part), use the first couple seconds to detect the framerate and drop down to a lighter version if the framerate is low.


Remixing Reality


Remixing Reality is another demo to showcase what you can do with JavaScript today. It processes video frames in real time to locate AR markers and uses WebGL to draw 3D models on top of the markers. If you click the play button on the side, music starts playing and a 3D music visualizer kicks in, powered by BeatDetektor2 and the Mozilla audio API, again analyzing the audio in real time.

The AR library powering the thing is JSARToolKit, a pure-JS port of the Flash FLARToolKit (which uses NyARToolKitAS3, which is a port of the Java NyARToolKit, which is a port of the C ARToolKit. Whew.) Porting it over to JS was pretty quick, since AS3 syntax is close enough to JS syntax that I could write a good-enough syntax translation script in a couple of days. Then I implemented the AS3 class semantics in JS and off we went.

Well, it wasn't quite that easy. The syntax translator is a hack and I had to go and manually fix things. And implement the pertinent parts of Flash's BitmapData. And write a shim to make it work with Canvas. But hey, 14 kloc port in a week!

The job didn't end there though. It was slow. The biggest slowdown was that the library was reading a couple pixels at a time from the canvas, and each of those reads called getImageData. So, cache it, problem solved.
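
The fix was along these lines (a hypothetical sketch of the caching, not the actual JSARToolKit code):

// Read the canvas once per frame, then serve pixel reads from the
// cached buffer instead of calling getImageData for every read.
var cachedData = null, cachedWidth = 0;
function cacheFrame(ctx, width, height) {
  cachedData = ctx.getImageData(0, 0, width, height).data;
  cachedWidth = width;
}
function getPixel(x, y) {
  var off = (y * cachedWidth + x) * 4;
  return [cachedData[off], cachedData[off+1],
          cachedData[off+2], cachedData[off+3]];
}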

It was still a bit slow, mostly due to FLARToolKit using BitmapData's color bbox queries to do feature detection, i.e. finding the smallest rectangle in the bitmap that includes all pixels of a certain color. Each call to BitmapData#getColorBoundsRect needs to go through the pixels in the bitmap and find the first row where the wanted color is found, then the bottom row, then scan the rows in between to find the left-most and right-most columns. This process was not too fast in JS.

But NyARToolKit, the library which FLARToolKit is based on, was doing the feature detection in an entirely different way. Its algo was running on RLE-compressed images (Run-Length Encoding: pack data as [value][number of repetitions], e.g. aaabbbb becomes a3b4). And since the images in question are thresholded to black and white, RLE works very well. Expected result: smaller images => less work for JS => faster.
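
For illustration, run-length encoding a thresholded scanline could look like this (a hypothetical sketch, not NyARToolKit's actual code):

// RLE-encode a thresholded (0/255) scanline into a flat array of
// [value, runLength, value, runLength, ...].
function rleEncodeRow(row) {
  var runs = [];
  var value = row[0], count = 1;
  for (var i = 1; i < row.length; i++) {
    if (row[i] === value) {
      count++;
    } else {
      runs.push(value, count);
      value = row[i];
      count = 1;
    }
  }
  runs.push(value, count);
  return runs;
}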

So... I made the JS version use the NyARToolKit version. And hey, it was 5x faster! Nice!

Another thing that helped performance on Firefox 4 was using typed arrays instead of normal JS arrays. Fx4's JIT generates more efficient machine code for typed arrays. On Chrome 10(? IIRC), typed arrays and normal arrays didn't have much of a performance difference, but the code ran fast enough on normal arrays already.

For the 3D stuff I used my Magi library. With a Blender export script to get the models in. And a slightly tweaked lighting shader to make it fill the unlit areas with some ambient. Fun times.

2011-04-01

Browser rendering loop

The browser rendering loop is how the browser displays the web page to you.

The main stages of the rendering loop are:
  1. Updating the DOM.
  2. Rendering the individual elements.
  3. Compositing the rendered elements to the browser window.
  4. Displaying the browser window to the user.

The DOM updates happen in JavaScript or in CSS transitions and animations. They include things like "Hey, I'd like that header to turn red." and "Please draw a thick line on the canvas."

If you're drawing to a canvas, you might expect the browser to draw as soon as you issue a drawing command. Which is what actually happened in earlier browser versions. But in the latest browsers, it doesn't quite work that way. Nowadays the browser queues up the drawing commands and only starts drawing when it absolutely needs to. Which is usually just before compositing, in the second stage of the rendering loop.

However, if you want to force the browser to finish drawing before continuing JS execution, you can try calling getImageData on a 2D canvas or readPixels in WebGL. As they need to return the finished image, they should force the browser to flush its draw queue. This comes in handy if you ever need to figure out the time it took the browser to execute your drawing commands.

Once all the individual elements are drawn, the browser composites them together to create the final browser window image. And finally, the browser window image is shown to the user via the OS window manager.

The frame rate perceived by the user is the frequency at which step 4 is repeated. In other words, how often the updated browser window is shown to the user.

As most flat panel displays can only update 60 times per second (the TV frequency), the browser tries to display only up to 60 frames per second. Going over 60 when your display can't take advantage of it would only burn more CPU and reduce battery life, so it makes sense to clamp the update frequency to the display's update frequency.

Optimally, the browser would finish all its drawing before doing a new composite, but current browser implementations have slight problems with that. So it's really rather difficult to figure out the actual framerate visible to the user. If you have a high-speed video camera, you could record the display and see how fast it's updating.

But if you want to do it all in the browser, you could try something like this. First, hook up to the frame loop with requestAnimationFrame. Second, flush the drawing queue when your frame is done. Third, measure time from flush to flush. Hopefully browsers will move towards requestAnimationFrame only being called after flushing the previous frame.
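
As a sketch (hedged: ctx and drawScene are assumed to exist, and requestAnimationFrame was still vendor-prefixed in browsers of this era):

// Measure flush-to-flush frame times on a 2D canvas.
var lastTime = 0;
function frame() {
  drawScene(ctx); // issue your drawing commands
  ctx.getImageData(0, 0, 1, 1); // force the draw queue to flush
  var now = Date.now();
  if (lastTime) console.log('frame time: ' + (now - lastTime) + ' ms');
  lastTime = now;
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);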

Firefox and Chrome actually make this a bit easier for you by providing some built-in framerate instrumentation. Chrome dev channel has an about:flags FPS counter that (sadly) only works for accelerated content. Firefox 4 has the window.mozPaintCount property that keeps track of how many times the browser window has been redrawn.

References:
GPU Accelerated Compositing in Chrome
Hardware Acceleration in the latest Firefox 4 beta
ROC: Measuring FPS
Measuring HTML5 Browser FPS, or, You're Not Measuring What You Think You're Measuring

Please send me a note if anything above is wrong / misguided / an affront to your values and I'll try and fix it.

2011-03-31

Detecting new globals in JavaScript

JavaScript has this annoying little feature where if you forget the 'var' keyword when assigning to a variable, it creates a new global. This has a tendency to cause some veeery interesting bugs. So. It would be nice to detect those new globals. And I happen to have just the thing for that.

Paste this snippet to your webpage (preferably after creating all the globals you intended to create):
<script type="text/javascript">
if (true /* MONITOR_GLOBALS */) {
  (function(){
    var globals = {};
    var startGlobals = [];
    for (var j in window) {
      globals[j] = true;
      startGlobals.push(j);
    }
    if (false /* PRINT_INITIAL_GLOBALS */)
      console.log("Initial globals: "+startGlobals.sort().join(', '));
    setInterval(function() {
      var newGlobals = [];
      for (var j in window) {
        if (!globals[j]) {
          globals[j] = true;
          newGlobals.push(j);
        }
      }
      if (newGlobals.length > 0)
        console.log("NEW GLOBALS: "+newGlobals.sort().join(', '));
    }, 1000);
  })();
}
</script>

Now whenever a script creates a new global, you get a notification in your JS console. Hopefully sparing you from some agonizing hours of debugging.

2011-02-05

Trying to make sense of this whole cooling thing

I kind of find CPU heat sinks interesting. Probably because of the noisy fans messing with my concentration. Why do we need those things anyhow?

I'm not a physicist so if I'm saying Very Silly Things below, I'd appreciate if you left a comment to set me straight.

Fans are devices to move ambient-temperature coolant fluid (read: air) into contact with heat sink fins, so as to maximize heat conduction from the heat sink to the environment. Heat conduction = thermal conductivity * area * temperature difference / distance between the different temperatures [according to Wikipedia]. If you think of the heat-sink-surface-to-ambient-air system in terms of that model, a fan tries to minimize the distance between the hot heat sink and the cool air: in practice, pushing off the boundary layer of hot air and replacing it with cooler air.

There are also other methods for optimizing the heat conduction equation; here's a small roundup:
  • Endothermic reactions for soaking up heat and transporting it away: heat pipes.
  • Maximize the temperature difference: dry ice, LN2 (plus the boiling is endothermic).
  • Increase thermal conductivity of the coolant fluid: water cooling, mineral oil cooling.
  • Increase conducting area: finned heat sinks, larger heat sinks. And I guess the increased mass acts as a thermal buffer that soaks up spikes in heat production.
  • Decrease distance between the different temperatures: fan pushing in cold air, chimney enhancing natural convection. (Natural convection: hot air expands, making it less dense than cool air, which lets cool air fall below hot air. As long as you have gravity, that is.)

Sci-fi, where our Author reveals his lack of understanding in The Art of Physicks

  • You could also get cold air close to the heat sink fins by ionizing air and charging the heat sink with reverse polarity to attract the cool air to the heat sink (hopefully neutralizing the air in the process). Maybe that could get cold air more effectively through the boundary layer than mechanical pushing with a fan? They're doing some commercial research into that, and there's also a DIY Ion Cooler.

  • Some sort of channels on the heat sink surface to require less airflow to replace the boundary layer. Funnel the air to a higher velocity to push through the boundary layer more efficiently.

  • This is neat: vibrating piezoelectric heat sink fins.

  • Heat travels as phonons, maybe you could somehow create a heat waveform and sap it out through destructive interference with an audio source. Sort of like noise cancelling headphones. Audible noise is in the kilohertz range, heat is in the gigahertz range. Apparently these guys at MIT are making phononic mirrors.

  • Make the heat do work, cooling the heat sink down in the process. Stuff a heat sink full of piezoelectric crystals to convert the thermal expansion of the heat sink into electricity, conduct the electricity away to cool the heat sink.

  • Can heat be focused? Heat up a small region of the heat sink to a high enough temperature for blackbody radiation to really kick in, then use mirrors to transport the resulting photons away. (Intensity of blackbody radiation grows as the fourth power of absolute temperature, while temperature grows roughly linearly with power input, apart from phase changes.)

  • A meter-high 0.1x0.1m chimney with 50C internal air and 20C room temperature could generate 0.01 m^3/s airflow, or 22 CFM? Plugging in values to Q = C*A*sqrt(2*g*H*(Ti/T0-1)): 0.7 * 0.01m^2 * sqrt(2*9.81m/s^2*1m*(323K/293K-1)) = 0.0099 m^3/s.

  • Rotate the chimney, get a fire tornado!

  • An interesting page on thermal design

More half-baked ideas to keep the year rolling

Tone down flashing ads by rendering the web page to an accumulation buffer at a low opacity, so that there'll be a motion blur effect that evens out the flashing.

2011-01-27

Intel X25-V SSD

Bought an Intel X25-V 40GB SSD to do some random read benchmarking. Don't stick it into a PATA-controller-driven SATA port, as that limits the bandwidth to 70 MB/s and random access times to 5 ms.

Sequential read speed on the unused disk was limited by the SATA bus. So I guess it's not actually reading anything for unused blocks, just generating a string of zeroes on the controller and sending it back. Would be pretty amusing to have a special disk driver that keeps a list of the unused blocks in system RAM and generates the data on the CPU for any read accesses to them. You could probably keep it all in L1 and get nice bogus 4k random read benchmark numbers. "1 ns access latency!?? 100 GB/s random read bandwidth??? Whaaaat?" (An allocation bitfield for 80 gigs in 512k blocks is about 20 kB. Increase blocksize or add RLE for bigger volumes.)

Read speed on actual data was around 200 MB/s. Average random read time for a 4k block was 0.038 ms with 128 threads and 0.31 ms with a single thread. So the controller can do 8 reads in parallel. Which is nice for a cheapo SSD.

I didn't really test write performance apart from a cursory check. And yes, the write bandwidth is low: 40 MB/s streaming writes. So it's best used as a random read drive.

Might make a nice random read array with 6 drives or some such. Hypothetically: 48 parallel reads, 160 000 4k reads per second (650 MB/s). And then all you need is software that can take advantage of that.

Getting flash chip latencies lower would be good.

2011-01-17

Stupid ideas to kick off 2011

They have those spinning disks with a couple gigs of flash as read cache, right. So, how about spending about 30e to put a bunch of fast flash chips and a couple gigs of DRAM on a motherboard to act as SATA read cache. Read the flash into DRAM during boot checks (I guess you have around 5 seconds to do it, so 0.5-1GB/s should suffice.) Ooh, mysteriously computer boots as if from ramdisk.

Or alternatively, if you like software more than hardware, use the fast flash as the OS disk, suck it to RAM during boot checks, write OS to be able to utilize that.

Also, maybe they could make a computer that's not actually a computer but a rock. And then you could have a mouse that's not a mouse but a rat and it'd bite your fingers off and give you rabies.

2011-01-15

How to take advantage of that memory?

[edit] Here's an rc.d init script I'm using to prefetch the FS to page cache: usr-prefetch. Copy it to /etc/init.d and run `update-rc.d usr-prefetch start 99 2 .` to add it to the runlevel 2 init scripts. I currently have 4GB of RAM, and the total size of the prefetched stuff is 3.7GB, so it might actually help. Maybe I should go buy an extra 2GB stick. It takes about a minute to do the prefetch run with a cold cache, so the average read speed is around 60 MB/s. Which is pretty crappy compared to the 350 MB/s streaming read speed; maybe there's some way to make the prefetch a streaming read.

Ubuntu's ureadahead reads in the files accessed during boot in disk order. If you add all your OS files to it, it should be possible to stream the whole root filesystem to page cache in about ten seconds. I dunno.

And now I'm reading through ureadahead's sources to figure out how it sorts the files in disk order. It's using the fiemap ioctl to get the physical extents for the file inodes, then sorts the files according to the first physical block used by each file. To read the files into the page cache, it uses readahead. If there were some way to readahead physical blocks instead of files, you could collapse the physical blocks into spans with > x percent occupancy and do a single streaming read per span (caching only the blocks belonging to the files you want cached).

Another easy way to do fast caching would be to put all system files on a separate partition, then cache the entire partition with a streaming read. Or allocate a ramdisk, dd the root fs onto it at boot, use it through UnionFS with writes redirected to the non-volatile fs. But whether that's worth the bother is another thing.
[/edit]

In the previous post I put together a hypothetical machine with 16GB RAM and 1.5GB/s random access IO bandwidth. Which is total overkill for applications that were built for machines with 100x less resources. What is one to do? Is this expensive hardware a complete waste of money?

The first idea I had on taking advantage of that amount of RAM was to preload the entire OS disk to page cache. If the OS+apps are around 10GB in total, it'll take less than 10s to load them up at 1.5GB/s. And you'll still have 6GB unused RAM left, so the page cache isn't going to get pushed out too easily. With 10s boot time, you'd effectively get a 20GB/s file system 20 seconds after pushing the power button.

Once you start tooling around with actual data (such as video files), that is going to push OS files out of page cache, which may not be nice. But what can you do? Set some sort of caching policy? Only cache libs and exes, use the 1.5GB/s slow path for data files? Dunno.

If you add spinning disks to the memory hierarchy, you could do something ZFS-like and have a two-level cache hierarchy for the filesystem: 10GB L1 in system RAM, 240GB L2 on SSDs, a couple TB on spinning disks. But again, how sensible is it to cache large media files (which will likely be the primary use of the spinning disks)? IT IS A MYSTERY AARGH

It'd also be nice to have a 1-2GB of faster RAM between the CPU caches and the system RAM (with GPU-style 150GB/s bandwidth), but maybe you can only have such things if you have a large number of cores and don't care about latency.

But hmm, the human tolerance for latency is around 0.2s for discrete stuff, 0.015-0.03s for continuous movement. In 0.2s you can do 4GB worth of memory reads and 0.3GB of IO, in 0.02s the numbers are 0.4GB and 0.03GB respectively. On a 3.5 GHz quad-core CPU, you can execute around 3 billion instructions in 0.2s, which'd give you a ratio of 0.75 ops per byte for scalar stuff and 6 ops per byte for 8 op vector instructions. On GPUs the ops/byte ratio is larger, at close to 20 ops/byte for vector ops, which makes them require more computation-heavy algorithms than CPUs.

In 4GB you can fit 60 thousand uncompressed 128x128 images with some metadata. Or hmm, 6 thousand if each has ten animation frames.

2011-01-13

The computer hardware of 2011

Here's a 1500e computer for 2011. It's a build where I tried to minimize the bottlenecks and get a nice memory pyramid with GPU cache 2 TB/s, GPU RAM 300GB/s, CPU cache 200GB/s, RAM 20GB/s, IO 2GB/s. Throw in 100e worth of spinning disks there in RAID-0 for a trailing 0.2GB/s disk IO.

CPU: 4-6 cores, 3.5GHz, 150e.
Motherboard: 150e.
Case: 50e steel box with razor-sharp edges and PSU from hell.

Total: 350e.

Memory: 8GB @ 150e.
Memory bandwidth: 20GB/s.

Total: 500e.

IO subsystem: 4xSSD RAID-0 @ 500e.
2GB/s streaming IO bandwidth.
1.5GB/s random access IO bandwidth at 4k block size.

Total: 1000e.

GPU: two upper middle-class things @ 500e.
Computing power: 3 TFLOPS.
Memory bandwidth: 300GB/s.

Total: 1500e.

(Optionally 2x 500GB disks, 0.2GB/s IO bandwidth @ 100e for a total of 1600e.)


Reasons for the selections:

The performance difference between a 150e CPU and a 300e CPU is ~15%, and around 25% with overclocking. The point is more to get high memory bandwidth with minimal cost. Non-bargain mobo selected for 6Gbps SATA and dual GPUs. Only 8GB memory to cut costs. All money saved was tossed on getting random access IO onto the order-of-magnitude curve and the GPUs to cap the top end of the memory bandwidth curve (and for a disproportionate amount of computing power). The SSD numbers are based on recently announced SATA 6Gbps drives. If you don't care about GPU performance, get a 300e CPU, use IGP, double the RAM and buy an extra SSD.

Now, programming this thing is going to be interesting. Not only do the CPU and GPU require parallelism to extract reasonable performance, but the random access IO subsystem does too. The IO subsystem latencies are probably in the 0.2ms range but it can do 64 parallel accesses.

A single-threaded C program will achieve maybe 1% of peak performance of the hardware, if that.

For legacy software, you could buy a 300e computer and it'd perform just as well.

CPU: 70e, mobo: 60e, case: 50e, RAM: 20e, SSD: 90e. Toss a 40e disk in there for good measure.

Or better yet, quit computers altogether and spend all the money on furniture.
