art with code

2012-12-26

Using Swiffy to create HTML animations in Flash

Introduction


The above animation was drawn in Flash but runs in pure HTML5, so it works on your mobile as well. Go on, try it out! Magic, no? Keep reading to find out how to make your own HTML animations in Flash.

First off, get Flash. The easiest way is to get a Creative Cloud subscription and download it from there. Done? Ok. Now, draw your animation. Don't know how? No problem.

Creating your animation


Here's a five-line tutorial to creating Flash animations: Create a new ActionScript 2.0 project, go to the timeline pane, click the onion skin button (looks like two squares), hammer F6 to create new keyframes, use comma and period to move back and forward in the animation, use the brush tool to draw each frame. For more details, you could check out this Adobe tutorial for Flash basics, Drawn to Life for animation lessons, or for more advanced workflow, the Foundation Flash Cartoon Animation book.

Installing the Swiffy extension


Go to the Swiffy project page, download the Swiffy Extension and open it. The extension opens in the Adobe Extension Manager, which walks you through the installation process. Now the next time you open Flash, you should have an "Export to HTML5" menu item in the Commands menu.

Exporting your animation


To export your animation, go to the Commands menu and click on "Export to HTML5". You get a small HTML file with your animation in it, ready for use.

Embedding your animation on your website


The easiest way to use your animation is to put it in an IFRAME. The name of the Swiffy animation file is <your animation name>.swf.html, so if you had a file MyAnimation.fla, the HTML file would be MyAnimation.swf.html. To use it on your page, you'd write an IFRAME like this: <iframe src="MyAnimation.swf.html" width="500" height="400"></iframe>.
As the animation is now plugin-free, all the usual CSS transforms and animations should work perfectly on it.
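For instance, the whole embed with a CSS transform thrown in might look like this (the filename assumes your Flash file was called MyAnimation.fla):

<!-- MyAnimation.swf.html is the file Swiffy exported above. -->
<iframe src="MyAnimation.swf.html" width="500" height="400"
        style="border: none; -webkit-transform: rotate(3deg); transform: rotate(3deg)">
</iframe>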

Done!


Now go forth to make your own HTML animations and wow the world!

The source is available on GitHub.






2012-12-19

Rethinking digital cameras a bit

Here's the idea: move the sensor instead of the lens.

Ok. Think about the implications for a bit.

Are you thinking? Alright, let's go!

You can use a moving sensor for autofocusing manual-focus lenses. The Contax AX did it in the 90s. The problem then was that the Contax AX was an SLR, so moving the film plane required moving the heavy mirror assembly with it. It was a bit like moving the garage to change the position of your car. Doable, but... wouldn't moving the car be easier?


With mirrorless cameras, you don't have a heavy mirror assembly to move. You only need to move the sensor chip. An electronic viewfinder works the same no matter where the sensor chip is.


One other effect of moving the film plane on the Contax AX was that it could turn non-macro lenses into macro lenses. Moving the sensor is also useful for in-body image stabilization, like on the Olympus m43 cameras. Suppose you take in-body image stabilization one step further and make the sensor angle manually controllable. Bam, you've got yourself a tiltable lens.

When you use an autofocus lens, the motor in the camera body or inside the lens needs to move heavy pieces of glass back and forth. This operation is usually slow, noisy and uses a good deal of battery.

If you use the sensor for focusing and image stabilization, you can make the lenses much lighter and smaller. By moving the light sensor chip instead of heavy glass, you can focus faster and get longer battery life.

A native lens made for a movable sensor camera can be a fixed focus lens with the aperture control on the lens. Simple, small and light. Compare this to a common DSLR lens with autofocus motors and image stabilization motors.

Wide-angle lenses on DSLRs require some optical gymnastics to work with the 40+ mm flange distance[distance from the back of the lens to the sensor] required by the mirror assembly. This is why DSLR lenses shorter than 40 mm are larger than 50 mm primes. Compare a Canon 24 mm f/1.4 to the 50 mm f/1.4. Due to the retrofocus design needed, the 24 mm is 1.5x as large, 1.5x as heavy and four times as expensive. With a mirrorless design, you could move the sensor right up to the back of the lens and make small wide-angle lenses. Well, as long as the sensor works OK with light coming in at oblique angles. [src]

A moving sensor also makes lens adapters easier to make. As long as the lens can be focused to infinity and has aperture controls on the lens, it turns into an image-stabilized tiltable macro autofocus lens on a moving sensor camera. And the moving sensor makes flange distances less of an issue.

On the negative side, telephoto lenses may require too much movement to focus in-camera. You'd have to focus them to a rough range first, then use the in-camera focus to do the rest.

Does anyone have ideas on how to build a prototype of something like this? Starting with a sensor chip that moves back and forth. You can get a decent manual focus lens from eBay for 20 bucks and Arduinos should work for the microcontroller stuff (dunno about contrast-detection AF, guess you could make it real slowww). Precision motors & rails? I've got a mostly unused medium format body lying around here, but maybe it'd be better to just make a big dark box with the lens attached to it.
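For the contrast-detection part, the core loop is just hill-climbing on image sharpness. A rough sketch in JavaScript pseudocode; moveSensor() and captureFrame() are hypothetical stand-ins for whatever the motor and sensor interfaces turn out to be:

function contrast(pixels) {
  // Sum of squared differences between neighboring pixels: higher = sharper.
  var sum = 0;
  for (var i = 1; i < pixels.length; i++) {
    var d = pixels[i] - pixels[i - 1];
    sum += d * d;
  }
  return sum;
}

function focusByHillClimb(stepSize) {
  var best = contrast(captureFrame());
  var direction = 1;
  while (stepSize >= 1) {
    moveSensor(direction * stepSize);       // hypothetical motor interface
    var c = contrast(captureFrame());       // hypothetical frame grab
    if (c > best) {
      best = c;                             // sharper: keep going this way
    } else {
      moveSensor(-direction * stepSize);    // worse: back up,
      direction = -direction;               // reverse,
      stepSize = Math.floor(stepSize / 2);  // and refine the step
    }
  }
}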

2012-11-25

Why HCO?

The question is, why are you primarily made out of hydrogen, carbon and oxygen? Wouldn't some other combination do as well? Hydrogen is understandable, right. It's the lightest and most common element in the universe. So it makes sense that you've got a lot of hydrogen in you. But carbon and oxygen?

Carbon has an atomic number of 6. Hydrogen has an atomic number of 1. Why jump all the way to 6 instead of using one of the elements between? Let's look at them in order. Number two is helium. Helium is a noble gas and doesn't really react all that much. So it's unlikely that you would have much helium bound into your molecules. Number three is lithium. Lithium nuclei verge on unstable, so they get broken easily. For that reason, there's not much lithium around in the solar system. Number four is beryllium, another rare element that's produced primarily through cosmic rays punching heavier nuclei and knocking out protons. It appears in stellar nucleosynthesis, but gets fused away rapidly. Number five is boron. Boron is only produced through cosmic ray spallation, so it's very rare.

Then we get to number six, carbon. Carbon is the fourth most common element in the universe, and can form a massive amount of different compounds. It's produced in stellar nucleosynthesis by the triple-alpha process where two helium nuclei fuse into a highly unstable beryllium nucleus, followed by a third helium nucleus fusing with the beryllium to produce a carbon nucleus. Additionally, if a fourth helium nucleus fuses with the carbon nucleus, they produce oxygen. If carbon gets fused with a hydrogen nucleus instead, the result is nitrogen. Much of the carbon gets further fused into neon, which breaks up into helium and oxygen in neon-burning conditions.

So you get the five most common elements: hydrogen, helium, oxygen, carbon and nitrogen. Helium is not very reactive, but the other four are. And so you too are made up of hydrogen, oxygen, carbon and nitrogen, in that order (by atomic count).

The funny way to think about it is that plants and animals are solids made out of gases. Burn hydrogen and oxygen to get water, add in some carbon dioxide and nitrogen and bake in sunlight.

[Sources: Wikipedia]

2012-11-18

Carbonated air

[copypasta from G+]

The problem is one of controlling the amount of carbon in the atmosphere. Which would be a very handy technology to have. Bye bye ice ages, etc.

There was a bunch of carbon under the ground. People dug it up and put it into the air by coupling it with oxygen, creating an airborne CO2 molecule and releasing a decent amount of heat in the process. The amount of carbon moved from underground into the atmosphere is around 10 GT per year, and it's rising as more and more people want to have heat and heat-byproducts like electricity. A small portion (~2.5%) of CO2 is created in the production of concrete, where the shells of long-dead marine organisms are decomposed from CaCO3 to CaO + CO2.

To break the carbon away from the CO2 molecules, you'd probably have to expend more energy than the act of putting them together released. The other option is to move the CO2 out of the atmosphere.

To move the CO2 out of the atmosphere, we have to separate it from the rest of the air and move it into a place where it can't escape from. To successfully do this, we have to move at least as much carbon out of the atmosphere as we're putting in there.

Trees are one option. A tree is mostly solidified air and water. It takes CO2 from the atmosphere and with the power of sunshine turns it into cellulose, growing a little bit in the process. One square kilometer of forest generates around 300 cubic meters of tree biomass per year, which contains around 75 tons of carbon. To capture 10 GT of carbon per year, we'd have to plant 133 million square kilometers of new forest and bury all the new growth back underground. The land area of the world is 148 million square kilometers.

Another option is chemical weathering to bind the CO2 into silicate rocks. This happens naturally and triggers ice ages. But it's kinda slow and requires exposing a whole lot of rock.

These guys http://pubs.acs.org/doi/full/10.1021/es0701816 propose a method to do a sort of oceanic acid switch. Put the CO2 into the ocean and take HCl out so that the ocean acidity doesn't change. Then get rid of the HCl by using it to weather silicate rocks, a reaction that's much faster than the CO2 weathering. The problem here is that it takes 100-400 kJ per 12 grams of carbon. Taking out 10 GT of carbon per year would use 2.5-10 TW of continuous power (or 15-60% of the world's energy production, which might be a useful number for an atmospheric carbon tax).

Anyway, the stuff is up there and the current biosphere can't use it up fast enough. Industrial-level use of carbon put it there, and it's going to take an industrial-level solution to get it back.

The really nice solution would be a chemical reaction that releases energy and binds CO2 into a heavier-than-air compound that's easy to store. Like http://pubs.acs.org/doi/abs/10.1021/jp205499e . Then you could take the CO2 exhaust, react it further to generate more energy, and put the solid exhaust into a pile. The problem with the Li3N + CO2 reaction is that lithium is pretty rare. Total worldwide production is measured in thousands of tons, compared to the billions of tons required for getting rid of CO2.

[new words]

You could also react CO2 with something else to produce a solid exhaust. Photosynthesis turns CO2 and water into sugar and oxygen. Another possibility might be burning magnesium in CO2, producing MgO and carbon soot: CO2 + 2Mg => 2MgO + C. The resulting MgO turns to MgCl2 + H2O in HCl, which can be electrolysed to separate out the magnesium. The captured magnesium can be used to burn the next batch of CO2. Anyway, the energy required to electrolyse the magnesium is likely going to be more than the energy released in burning the C and Mg in the first place. Say, the electrolysis might require around 18 MJ per kg of Mg. You need 2 Mg @ 24 g/mol to burn 1 C @ 12 g/mol, or about 40 Gt of Mg to burn 10 Gt of C. At that kind of energy use, you'd need around 22 terawatts for the electrolysis. Global energy production is around 15 TW, natch.
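A quick sanity check of those numbers (the 18 MJ/kg electrolysis figure is the same guess as above):

// Quick check of the magnesium numbers above.
var carbonPerYear = 10e12;               // kg of carbon to capture per year (10 Gt)
var molC = carbonPerYear / 0.012;        // mol of carbon, 12 g/mol
var kgMg = molC * 2 * 0.024;             // 2 Mg per C, 24 g/mol
console.log(kgMg / 1e12);                // => 40, i.e. ~40 Gt of Mg

var joulesPerYear = kgMg * 18e6;         // assumed 18 MJ per kg of Mg
var watts = joulesPerYear / (365 * 24 * 3600);
console.log(watts / 1e12);               // => ~23, i.e. ~22-23 TW of continuous power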

As for usable liquid exhausts, CO2 reacts with hydrogen to produce methanol. Methanol is a fuel and can be burned to produce CO2. This is the basis of Methanol Economy.

Other alternatives include capturing CO2 at the factory pipe, using a portion of the generated power to store the CO2 in a tank. Where the tank may be the bottom of the ocean or a drilled gas deposit, as you're going to need a lot of volume. 10 GT of carbon means 36 GT of CO2. Stored as uncompressed gas, 36 GT of CO2 would take 18 thousand cubic kilometers of space. If you freeze the CO2 solid, it'd still take up 23 cubic kilometers. As liquid, around double that. The biggest LNG storage tank in the world is 200,000 cubic meters. You'd need to build 115,000 of those every year to fit 23 cubic kilometers of solid CO2.

Deep ocean waters contain something like 38,000 gigatons of CO2. You could pump all the fossil carbon in the world – around 1,000 gigatons – there and cause just a 3% increase. That's kind of a silly way to go about it though. Carbon is valuable. Today, 90% of the world's energy production is from carbon oxidation (France is the major anomaly here, they're producing 80% of their electricity with nuclear power.) Likewise, control over atmospheric carbon is valuable. Ice age coming? Crank up the CO2. Too much heat in the atmosphere? Sequester some away.

Dealing with carbon is going to require a lot of energy though. You'd want to build a lot of wind, solar or nuclear to produce enough energy to power the carbon capture mechanisms. In the long term, hydrocarbons might work as a battery technology for continuable energy production. Continuable? Why yes, there's a limited amount of carbon readily available. Putting it all into the atmosphere or into the oceans isn't really going to help you keep burning it. Putting the carbon into trees and relying on tree-driven solar power to turn it back into burnable carbon requires lots of trees. In 2010, we moved 9 Gt of carbon into the atmosphere. The current estimated worldwide reserves of carbon are around 800 Gt. The carbon gasification growth rate between 1990 and 2010 was 2% annually. At that rate, all the carbon reserves known to energy companies would be up in the atmosphere and down in the oceans by 2060.

2012-09-01

Hi-res trouble

Retina displays, mobile phones, zoomable browsers. Whatever you call it, the days when you could assume a single image pixel to be displayed on a single display pixel are gone.

Nowadays a normal web site has images that are too big for mobiles and too small for hi-res desktops. The mobile shows your 800x600 image on 588x441 display pixels and the hi-res desktop shows it on 1600x1200 display pixels.

And it's not just images. The <canvas> element suffers from the same issues. Safari renders the <canvas> at display resolution and scales down when zooming out, while Chrome uses a layout-resolution <canvas> that gets scaled to display resolution. By comparison, SVG scales gracefully.

The issue with <canvas> is that if you think of it as a pixel drawing surface, you think layout pixels should be drawing surface pixels. If you think of it as a vector drawing surface with some pixel manipulation commands, you think display pixels should be drawing surface pixels.

What I'd like to have as a web developer is a way to not care. I'd just make my website look good on tablets and desktops using <img src="foo.bar"> like always and use getImageDataHD for hi-res <canvas> pixel manipulation. For mobiles I'd have to make a custom UI as always, but I'd like to use the same photos as on the large-screen site.

Technology-wise I'd want the browser to load enough of each image to fill all the display pixels with at least one image pixel. And stop loading there. For huge images, I'd like the browser to load only the portion of the image that's visible on the screen.

And this all should happen with a single HTTP request per image, have no bandwidth overhead and require no server-side support.

The single request requirement means that either the image element or the image filename need to contain enough information to load the image at the optimal size. The no bandwidth overhead requirement means that the loaded file should contain only the optimal size image. The requirement for no server-side support means that the image should be a static file or a directory.

The <picture> element and <img> srcset-attribute are ways to add more information to the image element. The problem with them is that I want to use my regular <img src="foo.bar"> and not type more stuff.
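For comparison, the srcset version would look something like this (the foo-hd.png filename is made up; 2x means "use this file on 2x-density displays"):

<img src="foo.png" srcset="foo-hd.png 2x" width="800" height="600">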

Making a custom image file format with multiple image sizes would get you <img src="foo.bar">. On the downside you either end up with bandwidth overhead from loading several versions of the image, or require two HTTP requests: one to load the list of image sizes, second to load the wanted image.

With server-side support the browser can send display resolution in the request and the server will pick the appropriate image to send back. That needs server-side support though.

If you have the image stored as a directory with multiple sizes of the image and tiled versions for large images, and you name the image versions by their display resolution, the browser can directly request the optimal resolution image. By having the image name contain the maximum image size, the browser can limit itself to loading only up to that resolution. For backward compatibility, the <img> src-attribute can point to a valid image inside the directory. If you implement that in JavaScript you end up with an extra request for the <img> src-attribute though.
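Here's a rough sketch of how the JavaScript side of that could work; the data-dir attribute, the size list and the WIDTHxHEIGHT.jpg naming are all made up for illustration:

// Hypothetical directory-scheme loader: pick the smallest stored size that
// still covers the element's display pixels.
var sizes = [[400, 300], [800, 600], [1600, 1200], [3200, 2400]];

function loadBestSize(img) {
  var dpr = window.devicePixelRatio || 1;
  var needW = img.clientWidth * dpr;
  var needH = img.clientHeight * dpr;
  var pick = sizes[sizes.length - 1];
  for (var i = 0; i < sizes.length; i++) {
    if (sizes[i][0] >= needW && sizes[i][1] >= needH) {
      pick = sizes[i];
      break;
    }
  }
  img.src = img.getAttribute('data-dir') + '/' + pick[0] + 'x' + pick[1] + '.jpg';
}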


2012-04-30

Hosting static web pages on App Engine


http://github.com/kig/app-engine-demo-static

How to host your website on Google App Engine:
1. Sign up to http://appengine.google.com
2. Create a new application (I named mine webtest-12345).
3. Install the Python SDK https://developers.google.com/appengine/downloads
4. Create a new directory with an app.yaml file like this:
   https://github.com/kig/app-engine-demo-static/blob/master/app.yaml
   (replace webtest-12345 with the name of your app; there's a sketch of a
   minimal app.yaml right after this list).
5. Create a subdir www/ and put your site there, remember to have index.html.
6. Start the Google App Engine Launcher and do "Add Existing Application...",
   pointing it to the directory with the app.yaml file.
7. Test locally.
8. When happy, click deploy.
9. Done! (Check it out: http://webtest-12345.appspot.com/)
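A minimal app.yaml for serving a static www/ directory looks roughly like this; the app.yaml in the linked repo is the authoritative version, this is just a sketch:

application: webtest-12345
version: 1
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /
  static_files: www/index.html
  upload: www/index.html

- url: /(.+)
  static_files: www/\1
  upload: www/(.+)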

If you want to have a more automatic experience, here's a small script:

  export MY_APP_NAME=your-app-name-here;
  git clone https://github.com/kig/app-engine-demo-static.git;
  cd app-engine-demo-static;
  sed -e "s/webtest-12345/$MY_APP_NAME/" -i '' app.yaml;
  appcfg.py update .

To host the site from your own domain:
1. Go to the Application Dashboard (easiest way is to click the Dashboard
   button in App Engine Launcher)
2. Go to Administration > Application Settings.
3. Go to Domain Setup and click on Add Domain...
4. If you're on Google Apps, enter your domain.
5. Otherwise click on "Sign up to Google Apps"
  5.1. Enter your details and optionally register your domain.
  5.2. Go through the rest of the sign up process.
  5.3. Create your website app with your new Google Apps account
       following the instructions above. Or juggle accounts (you need to be
       logged in to Google App Engine to add the app to your domain and you
       need to be logged in to Apps to accept the addition.)
  5.4. Done? Ok, let's continue!
6. Click the "Yes I want to add this app to my domain"-button.
7. Now you're on the Apps app settings page. Click on "Add new URL",
   enter www and click "Add".
8. If you registered your domain via Google Apps:
   Go to "Domain Settings" > "Domain Names" > "Redirect your naked domain"
   Continue >> I've completed these steps >> Save Changes
   (You don't need to change any settings at the DNS, they're all
    correct already)
8.1 Otherwise log into your DNS service provider and
   add a CNAME record from www.YOURDOMAINNAME to ghs.google.com and
   do the "Redirect your naked domain" bit above and in your DNS add
   the four A records for YOURDOMAINNAME. Click Save Changes on the
   Google Apps page.
9. Done! (http://webtest-12345.com)

Wait a couple of minutes for all the DNS changes to settle.

2012-02-06

New things in WebGL land

Compressed textures are coming!
http://www.khronos.org/registry/webgl/specs/latest/#COMPRESSED_TEXTURE_SUPPORT
Draft extension: WEBGL_compressed_texture_s3tc

Extension initiatives for depth textures, uint element indexes and public-webgl discussion about anisotropic filtering!
In the draft extension list: WEBGL_depth_texture, OES_element_index_uint


What is this: OES_vertex_array_object
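Here's roughly how you get at these from JavaScript once your browser ships them (draft extensions may still be behind vendor prefixes; canvas is your <canvas> element):

var gl = canvas.getContext("experimental-webgl");

// getExtension returns null if the extension isn't supported yet.
var s3tc = gl.getExtension("WEBGL_compressed_texture_s3tc");
var depthTex = gl.getExtension("WEBGL_depth_texture");
var uintIndex = gl.getExtension("OES_element_index_uint");
var aniso = gl.getExtension("EXT_texture_filter_anisotropic");

// OES_vertex_array_object: record attribute/buffer bindings into one object
// so you can switch between setups with a single call.
var vaoExt = gl.getExtension("OES_vertex_array_object");
if (vaoExt) {
  var vao = vaoExt.createVertexArrayOES();
  vaoExt.bindVertexArrayOES(vao);
  // ... set up buffers and vertexAttribPointers here, they get recorded into the VAO ...
  vaoExt.bindVertexArrayOES(null);
}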


Uint8ClampedArray is the Typed Array equivalent to Canvas ImageData.

2012-02-04

More basic math

A couple more simple "why is that" breakdowns:

Why does multiplying by -1 give you the negative of a symbol:

(-1xA) = (-1xA)
(-1xA) = ((1xA) + -(1xA)) + (-1xA)
(-1xA) = A x (1 + -(1xA)/A + -1)
(-1xA) = A x (-(1xA)/A)
(-1xA) = -(1xA) x A/A
(-1xA) = -(1xA) x 1
(-1xA) = -(1xA)
(-1xA) = -A

Why is -(B x A) = (-B x A):

-(B x A) = -(B x A)

-(B x A) = -1 x (B x A) 
-(B x A) = (-1 x B) x A
-(B x A) = (-B x A)

Why does multiplying by 0 give you 0:

0A = 0A           | X + -X = 0
0A = (B + -B)A    | (X+Y)Z = XZ + YZ
0A = (BA) + (-BA) | (-BA) = -(BA)

0A = (BA) + -(BA) | X + -X = 0
0A = 0
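If you want a machine to double-check these identities, here's a quick sketch in Lean (assuming Mathlib is available for the ring tactic):

import Mathlib.Tactic.Ring

-- The identities derived above, checked over the integers.
example (A : Int) : (-1) * A = -A := by ring
example (A B : Int) : -(B * A) = (-B) * A := by ring
example (A : Int) : 0 * A = 0 := by ring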


2012-02-01

Exposure lock on Nikon D5100

Go to MENU > Custom Settings > Controls > f2 Assign AE-L/AF-L Button. Set it to AE lock (Hold).

Now the AE-L button acts as a toggle.

Very useful for videos.

2012-01-29

Shooting high ISO in broad daylight


DSLRs these days get usable results even at super-high ISO (sensor sensitivity to light; the higher it is, the less light you need). But, um, what's the use of ISO 25600 if most of your shooting happens in bright daylight (or anything other than pitch-black darkness)? Let's think!

What you get from cranking up ISO is the capability to use faster shutter speeds and tighter apertures. So you get photos with less motion blur and more of the image in focus. So you could shoot in motion and not need to focus. This is starting to sound useful. Shoot while walking without having to stop.

How about framing though? Having your camera up on your eye when you're walking up stairs sounds like a recipe for broken bones. Shooting from the hip would be nicer, but now you can't see what's in frame. Wide-angle lenses to the rescue! Get enough of the scene in the image that you'll likely have your subject in the frame, and do the framing in post.

It's really fast to take photos if you don't have to worry about focus or framing. Point camera at what you're interested in, press shutter, done.

High ISO looks like crap in color though. Go black and white and you'll get smooth usable results at 6400 and noisy results at 25600 on a Nikon D5100 / D7000 / Sony NEX-5N (all have the same sensor AFAIK. I have a D5100). I'd kinda like to try a NEX-5N with a pancake lens for a small setup.

To recap: set ISO to 6400 or 25600, shoot in black and white, use manual focus (set to near-infinity or 20 meters or something), set shutter speed to 1/1000 or 1/500, aperture to f/16, use a 24mm lens or wider, snap away while walking!

Here's a gallery of my results from yesterday. They're not all pure examples of this technique, for some I brought the camera to my eye to do framing.

2012-01-27

Animating a million letters with WebGL





Here's a WebGL animated book demo! It's got just 150,000 letters, but it does scale up to two million.


Writing efficient WebGL is a bit funny. The basic idea is to collect as many operations as possible into a single draw call, as changing the WebGL state machine state and doing WebGL calls is relatively expensive. If you want to draw more than a couple thousand objects at once, you need to adopt a quite different strategy for drawing.

The usual way of drawing with WebGL is to set up your uniforms, buffers and shaders for each object, followed by a call to draw the object. Unless your object is very complex, the time taken by this way of drawing is dominated by the state setup. To draw faster, you can do some buffer editing in JavaScript, followed by re-uploading the buffer and a single draw call. If you need to go even faster, you can push more of the computation to the shaders.

My goal in this article is to draw a million animated letters on the screen at a smooth framerate. This task should be quite possible with modern GPUs. Each letter consists of two textured triangles, so we're only talking about two million triangles per frame.

Ok, let's start. First I'm going to create a texture with the letter bitmaps on it. I'm using the 2D canvas for this. The resulting texture has all the letters I want to draw. Then I'm going to create a buffer with texture coordinates to the letter sprite sheet. While this is an easy and straightforward method of setting up the letters, it’s a bit wasteful as it uses two floats per vertex for the texcoords. A shorter way would be to pack the letter index and corner index into one number and convert that back to texture coordinates in the vertex shader.
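Sketched out, the sprite sheet setup looks something like this (the character set and cell size are arbitrary; gl is the WebGL context):

// Draw the letters into a 2D canvas and upload it as the sprite sheet texture.
var chars = "abcdefghijklmnopqrstuvwxyz.,!? ";
var cellSize = 32;
var canvas = document.createElement('canvas');
canvas.width = cellSize * chars.length;
canvas.height = cellSize;
var ctx = canvas.getContext('2d');
ctx.fillStyle = '#fff';
ctx.font = (cellSize - 4) + 'px sans-serif';
ctx.textBaseline = 'top';
for (var i = 0; i < chars.length; i++) {
  ctx.fillText(chars[i], i * cellSize + 2, 2);
}

// The sprite sheet isn't power-of-two sized, so no mipmaps and clamp-to-edge.
var tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, canvas);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);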

I also upload a two-million triangle array to the GPU. These vertices are used by the vertex shader to put the letters on the screen. The vertices are set to the letter positions in the text so that if you render the triangle array as-is, you get a basic layout rendering of the text.

With a simple vertex shader, I get a flat view of the text. Nothing fancy. Runs well, but if I want to animate it, I need to do the animation in JavaScript. And JavaScript is kinda slow for animating the six million vertices involved, especially if you want to do it on every frame. Maybe there is a faster way.

Why yes, we can do procedural animation. What that means is that we do all our position and rotation math in the vertex shader. Now I don't need to run any JavaScript to update the positions of the vertices. The vertex shader runs very fast and I get a smooth framerate even with a million triangles being individually animated every frame. To address the individual triangles, I round down the vertex coordinates so that all four points of a letter quad map to a single unique coordinate. Now I can use this coordinate to set the animation parameters for the letter in question.
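A WebGL 1.0 vertex shader along these lines does the trick; the uniform names and the bobbing motion here are made up, the demo's real shader does quite a bit more:

// Vertex shader (GLSL): all the animation math happens here, per vertex.
attribute vec2 position;   // letter position in the text layout
attribute vec2 texCoord;   // coordinates into the letter sprite sheet
uniform float uTime;
uniform mat4 uProjection;
varying vec2 vTexCoord;

void main() {
  // All four vertices of a letter quad floor down to the same "letter id".
  vec2 letterId = floor(position);
  // Derive a per-letter animation phase from the id (cheap hash).
  float phase = fract(sin(dot(letterId, vec2(12.9898, 78.233))) * 43758.5453);
  // Bob each letter up and down on its own phase.
  vec2 animated = position + vec2(0.0, 0.1 * sin(uTime + phase * 6.2831));
  gl_Position = uProjection * vec4(animated, 0.0, 1.0);
  vTexCoord = texCoord;
}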

The only problem now is that JavaScript doesn’t know about the particle positions. If you really need to know where your particles are, you could duplicate the vertex shader logic in JavaScript and update them in, say, a web worker every time you need the positions. That way your rendering thread doesn’t have to wait for the math and you can continue animating at a smooth frame rate.

For more controllable animation, we could use render-to-texture functionality to tween between the JavaScript-provided positions and the current positions. First we render the current positions to the texture, then tween from the JS array towards these positions, updating the texture on each frame. The nice thing about this is that we can update a small fraction of the JS positions per frame and still continue animating all the letters every frame. The vertex shader is tweening the positions.

Using MediaStream API

Using the MediaStream API to access webcam from JavaScript:

// Early WebKit-prefixed form: the first argument is a comma-separated string
// of the media types to capture. videoTag is an existing <video> element.
navigator.webkitGetUserMedia("video,audio",
  function(stream) {
    // Success: wrap the MediaStream in an object URL and play it in the <video>.
    var url = webkitURL.createObjectURL(stream);
    videoTag.src = url;
    videoTag.onerror = function() {
      stream.stop();
      alert('camera error');
    };
  },
  function(error) {
    // Failure: e.g. the user denied camera access.
    alert(error.code);
  }
);
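For comparison, the constraints-object form of the API, which later became the standard, unprefixed navigator.mediaDevices.getUserMedia, looks like this:

// Constraints object in, promise out; attach the stream via srcObject.
navigator.mediaDevices.getUserMedia({ video: true, audio: true })
  .then(function(stream) {
    videoTag.srcObject = stream;
  })
  .catch(function(error) {
    alert(error.name);
  });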

Very basic math

I was playing around with the idea of presenting fractions in the same way as negative numbers. Instead of 1/x, you'd write /x. Just like instead of 0-x, you write -x. And since multiplication with single-letter symbols is often annotated with putting the symbols next to each other, marking the inverse with /x looks quite natural: A x /B = A/B, 9 x /7 = 9/7.

It also makes you think of the inverse in less magical terms. Consider the addition rule for fractions:


A   C   AD   BC   AD + BC
- + - = -- + -- = -------
B   D   BD   BD     BD

There's some crazy magic happening right there. The literal meaning is (A x D x 1/B x 1/D) + (C x B x 1/D x 1/B), but you wouldn't know from looking at that formula. And it gets even more confusing when you start multiplying and dividing with fractions. Think about the following for a moment:

A   C   AD
- / - = --
B   D   BC


Right?


In linear notation with /B and /D and suchlike, this all actually sort of makes sense in a non-magical way. Here's the first of the above two examples (with intermediate phases written out):


(A x /B) + (C x /D)
= [1 x (A x /B)] + [1 x (C x /D)]
= [(D x /D) x (A x /B)] + [(B x /B) x (C x /D)]
= [(A x D) x (/B x /D)] + [(B x C) x (/B x /D)]
= (/B x /D) x [(A x D) + (B x C)]

 [here's where you go: "oh right, /7 x /4 = /28", analogous to 7 x 4 = 28]

And the second one:

A x /B x /(C x /D)
= A x /B x /C x D
= (A x D) x (/B x /C)


Note the similarity with addition:


A + -B + -(C + -D)
= A + -B + -C + D
= (A + D) + (-B + -C)


Now, you might notice that there is a bit of magic there. How does /(C x /D) magically turn into (/C x D)? Or -(C + -D) to (-C + D) for that matter. Let's find out! Here's how it works:


/(C x /D)
= 1 x /(C x /D)
= [(/C x D) x /(/C x D)] x /(C x /D)
= (/C x D) x /(/C x C x D x /D)
= (/C x D) x /(1 x 1)
= (/C x D) x /1 -- Remember the axioms 1 x N = N and N x /N = 1. Since 1 x /1 = 1 we get /1 = 1.
= (/C x D) x 1 = (/C x D)


For the -(C + -D) case, replace / with -, x with + and 1 with 0.


And there you have it, my small thought experiment. And derivations for some basic arithmetic rules. I kinda like how breaking the magic bits down into the basic field axioms makes things clearer.


[edit]

Why is /A x /B = /(A x B)?

/(A x B) x (A x B) = 1 


1 x (/A x /B) = (/A x /B)
/(A x B) x (A x B) x (/A x /B) = (/A x /B)
/(A x B) x (A x /A) x (B x /B) = (/A x /B)
/(A x B) x 1 x 1 = (/A x /B)
/(A x B) = (/A x /B)

2012-01-21

Fast code

I was thinking of the characteristics of high-performance language runtimes (read: execution times close to optimal for hardware) and came up with this list:
  • Flat data structures (err, like an array of structs where struct N is in memory right before struct N+1)
    • streaming memory reads are prefetcher-friendly, spend less time chasing pointers
  • Tightly-packed data
    • memory fetches happen in cache line -sized chunks, tightly-packed data gives you more payload per memory fetch
    • fit more payload into cache, faster subsequent memory accesses
  • Reusable memory
    • keep more of the working set of data in cache
  • Unboxed values
    • spend less time chasing pointers
    • generate tight code for data manipulation because data type known (float/double/int/short/byte/vector)
  • Vector instructions
    • more bytes manipulated per instruction, data moves faster through an execution unit
  • Parallel execution
    • more bytes manipulated per clock cycle, data moves faster through the processor
  • Keep data in registers when possible
    • less time spent waiting for caches
  • Keep data in cache when possible
    • less time spent waiting for memory
    • instead of going over the full data set several times end-to-end, split it into cache-sized chunks and process each one fully before moving onto the next one
  • Minimize amount of unnecessary data movement between processing units
    • keep data close to processor until you're done with it, even more important with GPUs
  • Flat code layout
    • low amount of jumps per byte processed
  • Tight code
    • keep more of the program in cache
  • Interleaved I/O
    • work on already loaded data while loading in new data
    • minimum amount of time spent waiting for I/O
You might notice that the major theme is optimizing memory use. I started thinking of program execution as a way to read in the input data set and write out the output data set. The sizes of the input and output data give you a nice optimum execution time by dividing the data set size by memory bandwidth (or I/O bandwidth if you're working on big things). The flow of the program then becomes pushing this river of data through the CPU.

Suck in the data in cache line -sized chunks, process the entire cache line before moving to the next, preload the next cache line while processing the current one. Use vector instructions to manipulate several bytes of data at the same time, and use parallelism to manipulate several streams of data at the same time. Make your processing kernel fit into the L1 instruction cache.
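In JavaScript terms, much of this boils down to typed arrays processed in chunks. A toy sketch (the chunk size is an arbitrary cache-ish guess):

// Flat, tightly-packed data: one Float32Array instead of an array of objects.
// Each "particle" is four floats: x, y, dx, dy.
var N = 1 << 20;
var particles = new Float32Array(N * 4);

// Process the data in chunks so the working set stays cache-sized, and run
// every pass over a chunk before moving on to the next chunk.
var CHUNK = 8192 * 4;

function step(dt) {
  for (var start = 0; start < particles.length; start += CHUNK) {
    var end = Math.min(start + CHUNK, particles.length);
    var i;
    // Pass 1: integrate positions.
    for (i = start; i < end; i += 4) {
      particles[i]     += particles[i + 2] * dt;
      particles[i + 1] += particles[i + 3] * dt;
    }
    // Pass 2: wrap positions, while the chunk is still hot in cache.
    for (i = start; i < end; i += 4) {
      if (particles[i] > 1.0) particles[i] -= 2.0;
      if (particles[i + 1] > 1.0) particles[i + 1] -= 2.0;
    }
  }
}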

Gnnnn ok, back to writing JavaScript.


Blog Archive