In the early years of filmmaking, the best way to ensure a smooth moving camera shot was a camera dolly, which rolled along what looked like miniature railway tracks. As you can imagine, this was quite restrictive, both in cost and in the locations where the rail-like trappings could actually be set up. So in 1975 Garrett Brown invented the Steadicam, a brand of camera stabilizer that mechanically isolates the camera from the movement of the operator. The Steadicam was used to produce many iconic scenes in films like Rocky and The Shining. Over the last 20 years, video stabilization methods have moved away from mechanical isolation and towards mathematical analysis, to the point that in the last year alone two methods have been released freely to the public. Unfortunately for us, both are called Hyperlapse: one developed by Microsoft, the other by Instagram.

Instagram's Hyperlapse bears a strong resemblance to existing video stabilization algorithms in that it warps each video frame to remove subtle camera shake. The algorithm does have some subtle differences from existing methods: for example, it does not use image analysis to determine the per-frame correction (as Adobe After Effects does), but instead reads the camera's built-in gyroscope. This technique is both innovative and problematic, because the gyroscope in each camera or smartphone is different, so the algorithm needs to be adjusted per device type. Furthermore, by definition this technique does not lend itself to prerecorded videos.
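
To give a feel for the gyroscope approach, here is a minimal sketch: integrate the angular-velocity samples into the camera's net rotation, then undo that rotation with a homography. Everything here (the intrinsics `K`, the gyro trace, the function names) is a hypothetical illustration rather than Instagram's actual code, and the first-order integration is only valid for small time steps.

```python
import numpy as np

def rotation_from_gyro(omega, dt):
    """Turn one angular-velocity sample (rad/s) over interval dt into a
    small-angle rotation matrix: R ~= I + [w]_x * dt (first-order only)."""
    wx, wy, wz = omega * dt
    skew = np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])
    return np.eye(3) + skew

def stabilizing_homography(omegas, dt, K):
    """Accumulate gyro samples into the camera's net rotation R, then
    return the homography H = K R^T K^-1 that warps the shaken frame
    back (a pure-rotation undo; translation is ignored)."""
    R = np.eye(3)
    for w in omegas:
        R = rotation_from_gyro(np.asarray(w, float), dt) @ R
    return K @ R.T @ np.linalg.inv(K)

# Hypothetical intrinsics and gyro trace for a phone camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
omegas = [[0.0, 0.02, 0.0]] * 10   # slow drift about the y axis
H = stabilizing_homography(omegas, 1 / 100, K)
```

In a real pipeline `H` would then be handed to an image-warping routine; the per-device calibration problem mentioned above shows up here as the need for accurate `K` and gyro-to-frame timestamp alignment.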

With these limitations acknowledged, the following video highlights an extreme case where Instagram's algorithm does a relatively good job of warping each frame to produce a smooth video, but that smoothness comes at the cost of cropping to a relatively small area (the stabilization window), leaving blank pixels around the edges of the frame.

Microsoft's Hyperlapse

We present a novel video stabilization method which models camera motion with a bundle of (multiple) camera paths. The proposed model is based on a mesh-based, spatially-variant motion representation and an adaptive, space-time path optimization. Our motion representation allows us to fundamentally handle parallax and rolling shutter effects while it does not require long feature trajectories or sparse 3D reconstruction. We introduce the ‘as-similar-as-possible’ idea to make motion estimation more robust. Our space-time path smoothing adaptively adjusts smoothness strength by considering discontinuities, cropping size and geometrical distortion in a unified optimization framework. The evaluation on a large variety of consumer videos demonstrates the merits of our method.

Looking closer at the mathematics behind this Hyperlapse technique, we see three stages of analysis: Scene Reconstruction, Path Planning, and finally Rendering.

Scene Reconstruction creates an approximate 3D model of the world using structure from motion. A k-d tree is used to build a match graph, which in turn helps detect and remove redundant frames (e.g. when the camera has stopped). Path Planning finds a camera path through the video that follows the original path while simultaneously promoting smoothness. Several competing objectives are factored in across two stages:

Stage one

  • Spatially smooth
  • Close to input camera path

Stage two

  • Smooth rotation
  • Rendering quality
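
The stage-one trade-off (stay close to the input camera path while remaining spatially smooth) can be sketched as a least-squares problem on a toy 1D path. The penalty weight `lam` and the closed-form solve below are illustrative assumptions, not the paper's actual optimizer, which works on full 6-DOF camera poses with the stage-two terms added:

```python
import numpy as np

def smooth_path(c, lam=50.0):
    """Toy 1D version of the stage-one objectives: find a path p that
    stays close to the input path c while penalising curvature.
    Minimises sum (p_i - c_i)^2 + lam * sum (p_(i-1) - 2 p_i + p_(i+1))^2,
    a linear least-squares problem with a closed-form solution."""
    c = np.asarray(c, float)
    n = len(c)
    # Second-difference operator: measures the curvature of the path.
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Normal equations of the combined objective.
    A = np.eye(n) + lam * D.T @ D
    return np.linalg.solve(A, c)

# A jittery input path: a straight line plus hand-shake noise.
rng = np.random.default_rng(0)
c = np.linspace(0.0, 10.0, 50) + rng.normal(0.0, 0.5, 50)
p = smooth_path(c)
```

Raising `lam` weights smoothness over fidelity to the original path, which is exactly the tension the adaptive optimization in the paper is balancing.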

Rendering is where this new technique differs the most: instead of using a single input frame to render each output frame, it uses multiple. In this way each frame can be rendered beyond the narrow stabilization window. Rendering itself proceeds in several stages, from choosing source frames through to blending them together.
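
To see why multiple source frames widen the usable area, here is a toy fusion sketch in which each warped candidate frame covers only part of the output and valid pixels are averaged per location. The masks and pixel values are made up for illustration, and this is far simpler than the actual stitching and blending in Microsoft's renderer:

```python
import numpy as np

def fuse_frames(frames, masks):
    """Toy multi-frame fusion: each candidate frame covers only part of
    the output (mask is True where pixels survive the warp). Average the
    valid contributions per pixel, so regions one frame cannot cover are
    filled in from its neighbours instead of being left blank."""
    frames = np.asarray(frames, float)
    masks = np.asarray(masks, float)
    weight = masks.sum(axis=0)                      # how many frames cover each pixel
    blended = (frames * masks).sum(axis=0) / np.maximum(weight, 1.0)
    return blended, weight > 0

# Two hypothetical warped frames, each leaving a blank stripe at one edge.
a = np.full((4, 6), 10.0); ma = np.ones((4, 6), bool); ma[:, 4:] = False
b = np.full((4, 6), 20.0); mb = np.ones((4, 6), bool); mb[:, :2] = False
out, covered = fuse_frames([a, b], [ma, mb])   # every pixel is now covered
```

Neither single frame covers the whole output here, but their union does, which is the essence of rendering beyond one frame's stabilization window.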

The most compelling thing about Microsoft's Hyperlapse technique is that it is widely available on multiple platforms.

The associated technical paper is a fascinating read and can be found here. The following is my humble attempt at a Hyperlapse video (aided by my niece):