DeepTAM: Deep Tracking and Mapping (uni-freiburg.de)
81 points by lainon on Aug 8, 2018 | 7 comments


I started reading this thinking "oh, not another deep learning / SLAM paper with no code or data". Ironically, they reference and build on the concepts from DTAM:

https://github.com/magican/OpenDTAM.git

If you don't publish code, your paper will have less impact. We want to test your algorithm on different data, not painstakingly reimplement it from incomplete descriptions.

There are literally thousands of papers in the field every year; you want to be accessible, not baroque.


The same authors did publish code for another recent paper [1]. So it doesn't look like they are averse to publishing their code in general.

[1]: https://lmb.informatik.uni-freiburg.de/people/ummenhof/depth...


That is positive. It just needs to become the norm.


The conference where it is one of many: https://eccv2018.org/program/sessions/


SLAM is one of the few computer vision problems not conquered by deep learning yet... guess that's changing.


How about reconstructing a high-resolution mesh from pictures/video? The best approach I know of so far is to use patch-based reconstruction of depth maps and feed them to floating-scale surface reconstruction, or a similar patch-size-aware Poisson-style mesh generator. Is there any code you know of that handles the depth-map reconstruction using deep learning? Feeding it precise camera parameters/undistorted views, as well as precise locations of these views with some known matching points (artifacts of previous processing steps), is not a problem. Even pre-selecting only somewhat well-matching views is not a problem, as that is likely better done out-of-core anyway, due to the size of datasets where nice things become possible (e.g., capturing a small part (100 * 100 m, 5 stories, or equivalent surface area) of a neighborhood at sufficient precision to max out the resolution of likely all VR headsets you can get for this year's Christmas).
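For concreteness, here is a rough sketch of the kind of classical depth-map step I mean: a brute-force fronto-parallel plane sweep over two undistorted grayscale views with known intrinsics K and a relative pose (R, t) mapping reference-camera points into the second camera. This is not DeepTAM and not learned; the function and variable names are purely illustrative.

    import numpy as np
    import cv2

    def plane_sweep_depth(ref, other, K, R, t, depths, patch=7):
        """Brute-force plane sweep: ref/other are undistorted grayscale views,
        (R, t) maps reference-camera points into the other camera (X2 = R X1 + t),
        depths is an iterable of candidate depths for fronto-parallel planes."""
        K_inv = np.linalg.inv(K)
        n = np.array([0.0, 0.0, 1.0])          # plane normal in the reference frame
        best_cost = np.full(ref.shape, np.inf, dtype=np.float32)
        depth_map = np.zeros(ref.shape, dtype=np.float32)
        for d in depths:
            # Homography induced by the plane z = d: x2 ~ K (R + t n^T / d) K^-1 x1
            H = K @ (R + np.outer(t, n) / d) @ K_inv
            warped = cv2.warpPerspective(other, H, ref.shape[::-1],
                                         flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
            # Photometric cost: absolute difference aggregated over a small patch (SAD)
            diff = np.abs(ref.astype(np.float32) - warped.astype(np.float32))
            cost = cv2.boxFilter(diff, -1, (patch, patch))
            better = cost < best_cost
            best_cost[better] = cost[better]
            depth_map[better] = d
        return depth_map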

The current outlook, from what I found, for reconstructing such depth maps is bleak as far as speed goes, with the alternative being a drop in density/resolution too low to be useful (capturing is cheap but not free). If there is some magic based on deep learning, I'd like to forgo having to offload parts of these algorithms to e.g. an FPGA (the parts that sort and arrange the patches that should be hit with brute-force number crunching), as from what I understand it seems near-impossible to decide these arrangements efficiently on a CPU or even a GPU, given how sparse the math and how dense/wide the branching is. (I'm considering feeding lists of to-be-compared patches to a GPU and the results back. The technique is somewhat similar to Dijkstra's algorithm for deciding in what order to compare patches and what starting values to use in the iterative optimization of depth and surface normals. That branching currently takes 80% of CPU time without the actual number crunching even using vectorization; combined with GPU speed I expect at least two orders of magnitude of improvement, and hope for closer to three.)
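The brute-force half is the easy part to ship to the GPU: once the ordering is decided, the patch scoring vectorizes trivially. A rough sketch of what I have in mind, written against NumPy (CuPy is a near drop-in replacement that would move the same arithmetic to the GPU); shapes and names are illustrative only:

    import numpy as np

    def batched_zncc(patches_a, patches_b, eps=1e-6):
        """Score N candidate patch pairs at once with zero-mean NCC.
        patches_a, patches_b: (N, p, p) arrays of corresponding patches."""
        a = patches_a.reshape(len(patches_a), -1).astype(np.float32)
        b = patches_b.reshape(len(patches_b), -1).astype(np.float32)
        a -= a.mean(axis=1, keepdims=True)
        b -= b.mean(axis=1, keepdims=True)
        num = (a * b).sum(axis=1)
        den = np.sqrt((a * a).sum(axis=1) * (b * b).sum(axis=1)) + eps
        return num / den                       # one score in [-1, 1] per pair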


There are lots of papers that attempt to reconstruct depth maps from monocular RGB images. It's not really my field, but here is a Google Scholar search for recent papers that cite a seminal older paper on the topic: https://scholar.google.com/scholar?as_ylo=2017&hl=en&as_sdt=... That should be a decent start.

If you know the depth of a few pixels in the image, e.g. from a sparse keypoint-based SLAM / visual-inertial odometry system, then you can do better: https://arxiv.org/pdf/1709.07492.pdf
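(Not the learned method from that paper, just a classical baseline to make the problem setup concrete: densifying a handful of known depths over the full image grid can be as simple as interpolation, which is also a useful baseline to beat. Names and coordinate conventions below are illustrative.)

    import numpy as np
    from scipy.interpolate import griddata

    def densify_sparse_depth(sparse_uv, sparse_z, height, width):
        """sparse_uv: (N, 2) pixel coordinates (u, v); sparse_z: (N,) depths."""
        grid_v, grid_u = np.mgrid[0:height, 0:width]
        dense = griddata(sparse_uv, sparse_z, (grid_u, grid_v), method='linear')
        # Linear interpolation leaves NaNs outside the convex hull of the samples;
        # fall back to nearest-neighbour values there.
        nearest = griddata(sparse_uv, sparse_z, (grid_u, grid_v), method='nearest')
        return np.where(np.isnan(dense), nearest, dense)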

If you already have accurate camera positions, you can use something like occupancy grid mapping or Poisson reconstruction to build the mesh.
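For that last step, a minimal Poisson-reconstruction sketch using Open3D, assuming you already have a fused point cloud in world coordinates (the file name and parameters are illustrative):

    import open3d as o3d

    # Load a fused point cloud (hypothetical file), estimate normals, run Poisson.
    pcd = o3d.io.read_point_cloud("fused_points.ply")
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=10)
    o3d.io.write_triangle_mesh("mesh.ply", mesh)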



