How does a Kinect, or Asus Xtion work? Take a look at the front:
Some depth sensors have an RGB (Red Green Blue) camera, some don't. But that camera plays no part in sensing depth, so let's ignore it for now.
There are two other elements which must always be present for depth sensing: an IR (Infra-Red) projector, and an IR camera.
The IR projector projects a pattern of IR light which falls on objects around it like a sea of dots. We can't see the dots because the light is projected in the Infra-Red color range:
But the IR camera can see the dots. An IR camera is essentially the same as a regular RGB camera, except that the images it captures are in the Infra-Red color range. So nothing too fancy going on there, and still no actual depth sensing.
The camera sends its video feed of this distorted dot pattern to the depth sensor's processor, and the processor works out depth from the displacement of the dots: because the projector and camera sit a small distance apart, each dot shifts from its expected position in a known reference pattern, and dots falling on near objects are displaced more than dots falling on far objects.
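The displacement-to-depth step is ordinary stereo triangulation. Here's a minimal sketch in Python; the focal length and baseline numbers are illustrative assumptions, not the Kinect's actual calibration values:

```python
# Sketch of how a structured-light sensor turns dot displacement into depth.
# depth = focal_length * baseline / disparity (standard triangulation).

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters from a dot's displacement (disparity) in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Illustrative numbers: ~580 px focal length, 7.5 cm projector-camera baseline.
FOCAL_PX = 580.0
BASELINE_M = 0.075

# A dot that shifted 40 px is on a nearer surface than one that shifted 10 px.
near = depth_from_disparity(40.0, FOCAL_PX, BASELINE_M)
far = depth_from_disparity(10.0, FOCAL_PX, BASELINE_M)
print(near, far)
```

Notice the inverse relationship: bigger displacement means a closer surface, which is exactly why the processor can read depth straight off the dot pattern.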
This 'worked out' depth map can be read from the depth sensor into your computer, or you can take the feed directly from the IR camera instead; it's up to you. When calibrating with the RGBDToolkit, the correspondence calibration phase needs both: the depth map and the raw IR camera feed.
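One practical detail when you read the depth map yourself: depending on the driver, the Kinect hands you raw 11-bit sensor values rather than meters. A widely cited empirical conversion from the OpenKinect community looks like this (assumption: your driver exposes the raw 11-bit values, not pre-converted millimeters):

```python
# Convert a raw 11-bit Kinect depth value to approximate meters,
# using the empirical fit published by the OpenKinect community.

def raw_depth_to_meters(raw):
    """Approximate depth in meters for a raw 11-bit Kinect value."""
    if 0 < raw < 2047:
        return 1.0 / (raw * -0.0030711016 + 3.3309495161)
    # 2047 (and 0) mark pixels with no reading, e.g. IR shadow or out of range.
    return None

print(raw_depth_to_meters(600))   # a mid-range reading, roughly 0.7 m
print(raw_depth_to_meters(2047))  # no depth measured at this pixel
```

Those "no reading" pixels show up as black holes around object edges, where the camera can see a surface that the projector's dots can't reach.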