How does AR work?

In a nutshell, AR relies on image recognition to scale and rotate a virtual object so that it matches the environment it’s intended to be part of. When the object is overlaid on live video from the mobile device’s camera, the result is a convincing illusion that the real world has been augmented with virtual objects. So how does it work?

The first step is for the AR app to find an object that is anchored in the real world, and against which it can work out its own position. This is where the trigger image comes in: it provides the anchor point for the virtual object.

The trigger image is pre-processed to calculate detail points, which together form a unique electronic fingerprint for that image. That fingerprint is then embedded into the AR app. The fingerprint needs a random enough distribution of detail points for the AR app to work out its orientation and distance in relation to the trigger image.
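
To make the idea of detail points a little more concrete, here is a toy sketch in Python. It is not how any particular AR toolkit builds its fingerprints; the functions `detail_points` and `coverage`, the contrast threshold and the 2x2 grid are all assumptions made up for illustration. It simply treats a pixel as a detail point when the contrast around it is strong both horizontally and vertically, and then checks how well those points are spread across the image.

```python
def detail_points(image, threshold=60):
    """Return (x, y) positions with strong contrast in both directions.

    `image` is a list of rows of grayscale values (0-255). This is a toy
    stand-in for a real feature detector, not a production algorithm.
    """
    points = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            dx = abs(image[y][x + 1] - image[y][x - 1])  # horizontal contrast
            dy = abs(image[y + 1][x] - image[y - 1][x])  # vertical contrast
            if dx >= threshold and dy >= threshold:
                points.append((x, y))
    return points


def coverage(points, width, height, cells=2):
    """Fraction of grid cells (cells x cells) holding at least one point."""
    occupied = {(x * cells // width, y * cells // height) for x, y in points}
    return len(occupied) / (cells * cells)


# Toy 8x8 "trigger image": one bright 2x2 block in each quadrant.
image = [[0] * 8 for _ in range(8)]
for top, left in [(1, 1), (1, 5), (5, 1), (5, 5)]:
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            image[y][x] = 255

fingerprint = detail_points(image)
print(len(fingerprint), coverage(fingerprint, 8, 8))
```

An image whose detail points all sit in one corner would score a low coverage here, which is a rough way of seeing why a plain or lopsided trigger image makes a poor anchor.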

It does this by generating a new fingerprint from the camera feed and comparing it to the known fingerprints stored in the app. To find a match, complex calculations scale and rotate the fingerprint derived from the camera feed in various ways, as the device camera could be looking at the trigger image from any number of positions. If a match is found, the AR app knows which trigger it’s seeing and, more importantly, can work out its own position and orientation relative to the trigger image from the scaling and rotation required to get that match.
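
As a rough illustration of how scaling and rotation can be recovered from a match, the Python sketch below fits a scale and an in-plane rotation to pairs of matching points. It rests on some big simplifying assumptions: the matching pairs are already known, and the camera only sees the trigger scaled, rotated and shifted in the image plane, with no perspective tilt. A real AR toolkit has to deal with both of those problems. The function name `estimate_scale_rotation` is hypothetical.

```python
import cmath
import math


def estimate_scale_rotation(reference, observed):
    """Least-squares fit of observed ~ a * reference + b.

    Points are (x, y) tuples treated as complex numbers, so the single
    complex factor `a` encodes both scale (its magnitude) and rotation
    (its angle). Returns (scale, rotation in degrees).
    """
    ref = [complex(x, y) for x, y in reference]
    obs = [complex(x, y) for x, y in observed]
    n = len(ref)
    ref_mean = sum(ref) / n
    obs_mean = sum(obs) / n
    # Centring both point sets removes the translation b from the problem.
    num = sum((o - obs_mean) * (r - ref_mean).conjugate()
              for r, o in zip(ref, obs))
    den = sum(abs(r - ref_mean) ** 2 for r in ref)
    a = num / den
    return abs(a), math.degrees(cmath.phase(a))


# Stored fingerprint points, and the same points as seen by the camera:
# twice as large, turned 90 degrees and shifted by (5, 5).
stored = [(0, 0), (1, 0), (1, 1), (0, 1)]
seen = [(5, 5), (5, 7), (3, 7), (3, 5)]
scale, angle = estimate_scale_rotation(stored, seen)
print(round(scale, 3), round(angle, 1))
```

A larger scale means the trigger fills more of the frame, so the camera is closer, and the angle says how the device is turned relative to the trigger.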

Once the device knows how far away it is from the trigger, and how the trigger image seen by the camera has been rotated to get a matching fingerprint, it’s able to use this information to scale and rotate the virtual object in a way that makes it appear to be anchored to the trigger image. This process of image recognition and mathematical calculation is performed 30 times a second, to keep up with the frame rate of the camera, and allows the virtual object’s position and rotation values to be constantly updated as the device moves around the trigger image.
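
To put some numbers on this, the short Python sketch below estimates the distance to a trigger from how large it appears in the frame, and works out how much time is available per frame at 30 frames a second. The distance formula assumes a simple pinhole camera model with the trigger facing the camera squarely, and the focal length, trigger width and function names are made-up example values rather than figures from any real device.

```python
def distance_to_trigger(focal_length_px, trigger_width_m, apparent_width_px):
    """Pinhole-camera estimate of distance (metres) to a trigger image.

    Assumes the trigger faces the camera squarely; the wider it appears
    in pixels, the closer it must be.
    """
    return focal_length_px * trigger_width_m / apparent_width_px


def frame_budget_ms(frames_per_second=30):
    """Milliseconds available to process each camera frame."""
    return 1000.0 / frames_per_second


# Hypothetical example: a 20 cm wide trigger that appears 400 px wide
# through a camera with a focal length of 1000 px.
distance = distance_to_trigger(1000, 0.2, 400)
budget = frame_budget_ms(30)
print(round(distance, 2), "m away;", round(budget, 1), "ms per frame")
```

At 30 frames a second, everything from reading the frame to redrawing the virtual object has to fit into roughly 33 milliseconds.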

In addition, where the hardware supports it, sensor information from gyroscopes and accelerometers is factored into the equation. This helps to maintain a lock on the trigger image in case the image tracking fails briefly, perhaps due to quick motion, lighting changes or other environmental factors. This ensures a solid tracking experience, and maintains the illusion of a virtual object existing in the real world.
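
One simple way to picture this blending of sensors and image tracking is a complementary filter, sketched below in Python. This is an assumed technique chosen for illustration, not necessarily what any given AR toolkit uses, and the function `fuse_orientation` and its `image_weight` parameter are hypothetical. The gyroscope predicts how the orientation has changed since the last frame, and the image tracking result, when there is one, gently corrects that prediction.

```python
def fuse_orientation(previous_deg, gyro_rate_dps, dt_s,
                     image_deg=None, image_weight=0.1):
    """Blend a gyroscope prediction with an image-tracking measurement.

    previous_deg  : last fused orientation, in degrees
    gyro_rate_dps : gyroscope reading, in degrees per second
    dt_s          : time since the last update, in seconds
    image_deg     : orientation from image tracking, or None if tracking
                    was lost for this frame
    Angle wrap-around at 360 degrees is ignored to keep the sketch short.
    """
    predicted = previous_deg + gyro_rate_dps * dt_s
    if image_deg is None:
        # Tracking failed briefly: coast on the gyroscope alone.
        return predicted
    return (1 - image_weight) * predicted + image_weight * image_deg


# One frame with tracking lost, then one frame with tracking restored.
angle_deg = 10.0
angle_deg = fuse_orientation(angle_deg, 30.0, 1 / 30)
angle_deg = fuse_orientation(angle_deg, 30.0, 1 / 30, image_deg=12.5)
print(round(angle_deg, 2))
```

The gyroscope drifts over time, so it cannot be trusted on its own for long, but it is good enough to bridge a few frames where the camera loses sight of the trigger.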
As you can see, AR is a very computationally heavy technology, which also relies on many hardware systems to provide a seamless experience. Mobile devices have only become powerful enough to deliver a convincing experience during the last few years, and with the decreasing cost of this technology, it’s now an experience that most mobile device consumers can enjoy.