Not necessarily. This is a pretty tough problem. Naively, yes, you can just compute the distance along the x-axis, but what if the target is moving diagonally through the visual plane? Then the measurement is badly distorted. Or what if the object is rotating, or moving through at least one plane that the two cameras can't track?
Direct Linear Transformation (DLT) is the standard way to do this, using a control object (an object of precisely known size and measurements) and cameras that don't move. The cameras film the control object (usually a big cube), the object is removed while the cameras stay stationary, and anything that then moves through the volume the object occupied can be tracked in all three dimensions. However, the cameras must not be collinear, and any point you want to track must be visible to at least two cameras at the same time. That rules out two front-facing cameras, and realistically you need 4-5 cameras (though 3 can sometimes be made to work).
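To make the calibrate-then-track idea concrete, here is a rough sketch of the common 11-coefficient form of DLT in Python with NumPy. Treat it as an illustration under assumptions, not a finished tool: the function names (`dlt_calibrate`, `dlt_reconstruct`, `project`) are made up for this example, the two "cameras" are invented coefficient sets rather than real hardware, the control object is a unit cube sampled on a 3x3x3 grid, and lens distortion is ignored.

```python
import numpy as np

def dlt_calibrate(xyz, uv):
    """Estimate the 11 DLT coefficients of one camera from control points.

    xyz: known 3D control-point coordinates (N x 3, N >= 6, not all coplanar)
    uv:  the same points as digitized in this camera's image (N x 2)
    """
    rows, rhs = [], []
    for (x, y, z), (u, v) in zip(xyz, uv):
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z])
        rows.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z])
        rhs.extend([u, v])
    coeffs, *_ = np.linalg.lstsq(np.array(rows, dtype=float),
                                 np.array(rhs, dtype=float), rcond=None)
    return coeffs

def dlt_reconstruct(cams, uvs):
    """Recover one 3D point from its image coordinates in two or more cameras."""
    rows, rhs = [], []
    for L, (u, v) in zip(cams, uvs):
        rows.append([L[0] - u * L[8], L[1] - u * L[9], L[2] - u * L[10]])
        rows.append([L[4] - v * L[8], L[5] - v * L[9], L[6] - v * L[10]])
        rhs.extend([u - L[3], v - L[7]])
    point, *_ = np.linalg.lstsq(np.array(rows, dtype=float),
                                np.array(rhs, dtype=float), rcond=None)
    return point

def project(L, xyz):
    """Forward DLT model: 3D point -> image coordinates (used to fake data)."""
    x, y, z = xyz
    w = L[8] * x + L[9] * y + L[10] * z + 1.0
    return ((L[0] * x + L[1] * y + L[2] * z + L[3]) / w,
            (L[4] * x + L[5] * y + L[6] * z + L[7]) / w)

# Synthetic stand-ins for two fixed, non-collinear cameras (invented values).
true_a = np.array([800, 0, 100, 320, 0, 800, 100, 240, 0.0, 0.0, 0.1])
true_b = np.array([100, 0, -800, 900, 0, 800, 100, 240, 0.1, 0.0, 0.02])

# Control object: a unit cube sampled on a 3x3x3 grid of known points.
g = np.linspace(0.0, 1.0, 3)
control = np.array([(x, y, z) for x in g for y in g for z in g])

# Step 1: each camera "films" the control object and is calibrated from it.
est_a = dlt_calibrate(control, [project(true_a, p) for p in control])
est_b = dlt_calibrate(control, [project(true_b, p) for p in control])

# Step 2: the cube is removed; a target inside the same volume is tracked.
target = np.array([0.3, 0.6, 0.2])
recovered = dlt_reconstruct([est_a, est_b],
                            [project(true_a, target), project(true_b, target)])
print(recovered)
```

With real footage, the `uv` inputs would come from digitizing the control points and the target in each camera's video, and extra cameras simply add more rows to the least-squares system in `dlt_reconstruct`, which is one reason more than two views helps.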