How does a Kinect or Asus Xtion work? Take a look at the front:

The elements of a depth sensor

Some depth sensors have an RGB (Red Green Blue) camera, some don't. But that's fairly unremarkable, so let's ignore it for now.

There are two other elements that must always be present for depth sensing: an IR (Infra-Red) projector and an IR camera.

The IR projector casts a pattern of IR light which falls on objects around it like a sea of dots. We can't see the dots because the light is projected in the infra-red range, outside what our eyes can detect:

Depth sensors work by projecting a pattern of dots in infra-red
Credit (and more info): Matthew Fisher

But the IR camera can see the dots. An IR camera is essentially the same as a regular RGB camera, except that the images it captures are in the infra-red range. So nothing too fancy is going on there, and still no actual depth sensing.

The camera sends its video feed of this distorted dot pattern to the depth sensor's processor, and the processor works out depth from the displacement of the dots: the closer an object is, the further its dots are shifted sideways compared to a reference pattern recorded at a known distance. Because the projector and camera sit a known distance apart, that shift can be turned into depth by triangulation.
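The triangulation step above can be sketched in a few lines. This is a simplified stereo model, and the focal length and projector-camera baseline below are illustrative values, not the exact specs of any particular sensor:

```python
# Minimal sketch of depth-from-disparity triangulation.
# Assumed (illustrative) parameters: focal length in pixels and the
# baseline (projector-to-camera distance) in metres.

def depth_from_disparity(disparity_px, focal_px=580.0, baseline_m=0.075):
    """Triangulate depth (in metres) from a dot's horizontal shift.

    disparity_px: how far the dot moved, in pixels, relative to its
    position in a reference pattern recorded at a known distance.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# With these assumed parameters, a dot that shifted 43.5 px
# corresponds to an object about 1 metre away.
print(depth_from_disparity(43.5))
```

Note how depth and disparity are inversely related: doubling the shift halves the distance, which is why depth resolution degrades on far-away objects.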

The sensor internally builds a depth map

This 'worked out' depth map can be read from the depth sensor into your computer, or you can take the feed directly from the IR camera; it's up to you. When calibrating the RGBDToolkit, during the correspondence calibration phase we need both the depth map and the raw IR camera feed.
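When you read the depth map on the computer side, many sensors deliver raw integer readings rather than metres. As an illustration, here is one commonly cited approximation from the OpenKinect community for converting the Kinect's raw 11-bit depth values to metres; treat the constants as an assumption, since the exact mapping depends on the device and its calibration:

```python
import math

def raw_depth_to_meters(raw):
    """Approximate conversion of a raw 11-bit Kinect depth value to metres.

    Constants come from a community-derived fit (OpenKinect); they are
    illustrative and vary per device and calibration.
    """
    if raw >= 2047:
        # 2047 is the sensor's "no reading" value (shadowed or out of range).
        return float('nan')
    return 0.1236 * math.tan(raw / 2842.5 + 1.1863)

# Mid-range raw values map to well under a couple of metres.
print(raw_depth_to_meters(600))
```

The tangent shape reflects the inverse relationship between disparity and depth: raw values near the top of the range blow up quickly, which is why depth precision falls off sharply at distance.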