Hah.
The motion of your speaker will look like 'that ugly thing' because the cone has mass, and therefore inertia - there is no way it can cease to exist in one location and then appear in another, as a true square wave would demand.
Try putting a synthesized square wave (at a relatively low frequency) through a one-pole low-pass filter, to emulate the limited speed at which your speaker can move. The rounded edges that come out are the same thing you are seeing.
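If you want to try the filter experiment without any audio tools, here is a rough sketch in plain Python. The function names, the 100 Hz / 1 kHz / 48 kHz values and the exponential formula for the smoothing coefficient are my own choices for illustration. This is a crude stand-in for a speaker, not a model of any particular driver.

```python
import math

def square_wave(freq_hz, sample_rate, n_samples):
    """Naive square wave: +1 for the first half of each cycle, -1 for the second."""
    out = []
    for n in range(n_samples):
        phase = (n * freq_hz / sample_rate) % 1.0
        out.append(1.0 if phase < 0.5 else -1.0)
    return out

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """One-pole low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    # Assumed coefficient choice: a = 1 - exp(-2*pi*fc/fs).
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y = 0.0
    out = []
    for x in samples:
        y += a * (x - y)  # output can only move part of the way toward the input
        out.append(y)
    return out

# Two cycles of a 100 Hz square wave at 48 kHz, filtered at 1 kHz (illustrative values).
sq = square_wave(100.0, 48000, 960)
lp = one_pole_lowpass(sq, 1000.0, 48000)

# Print a few samples around the first falling edge (sample 240).
for n in range(236, 246):
    print(n, sq[n], round(lp[n], 4))
```

The input flips from +1 to -1 in a single sample, but the filtered output needs many samples to follow it, which is the 'inertia' showing up as rounded corners.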
You should read about Fourier series (not Taylor series, whatever else you want to call them) - the fact is that the theoretical harmonic components of a square wave, which are the odd multiples of the fundamental, go up to infinity. Considering harmonic components is pertinent because the instrumentation in your ear (ear drum -> bones in your ear -> hairs in cochlea) effectively performs a kind of frequency analysis, something like a wavelet transform, on the incoming sound waves. Obviously you can't represent all of these harmonics in a finite-bandwidth system (i.e. human hearing, sampled audio, or in fact any moving system with real mass), and the loss of the higher-frequency components of a 'true' square wave creates the ripples you see in a band-limited square wave. At the high end of the scale (C9, C10) only a small number of harmonics are audible or transmitted, so what you see will truly look like a combination of only 2 or 3 sine waves, or even a single one...
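To see the ripples for yourself, here is a small sketch that builds a square wave from its Fourier series, (4/pi) * sum of sin(2*pi*k*f*t)/k over odd k, and simply stops adding harmonics at a bandwidth limit. The function name and the 20 kHz / 48 kHz figures are assumptions picked for the example.

```python
import math

def bandlimited_square(freq_hz, max_freq_hz, sample_rate, n_samples):
    """Sum the odd harmonics of a square wave up to max_freq_hz.

    Returns (harmonic_numbers, samples).
    """
    harmonics = []
    k = 1
    while k * freq_hz <= max_freq_hz:
        harmonics.append(k)
        k += 2  # a square wave contains only odd harmonics
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        s = sum(math.sin(2.0 * math.pi * k * freq_hz * t) / k for k in harmonics)
        samples.append(4.0 / math.pi * s)
    return harmonics, samples

# Low note: plenty of harmonics fit under 20 kHz, so it looks square but with ripples.
h_low, w_low = bandlimited_square(100.0, 20000.0, 48000, 480)
# High note: only two harmonics fit, so it is just two sine waves added together.
h_high, w_high = bandlimited_square(5000.0, 20000.0, 48000, 480)

print(len(h_low), "harmonics at 100 Hz, peak value", round(max(w_low), 3))
print(h_high, "are the only harmonics at 5000 Hz")
```

The low note peaks above 1.0 even though the ideal square wave never exceeds 1.0; that overshoot is the ripple at the edges. The high note has so few harmonics left that it no longer looks square at all.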
Hmm, that was a bit of a ramble but I think it's all in there...