
harharveryfunny t1_j9aydo9 wrote

Here's the key, thanks to ChatGPT:

Data preparation: First, the training data is preprocessed to convert the 2D images and camera poses into a set of 3D points and corresponding colors. Each 2D image is projected onto a 3D point cloud using the corresponding camera pose, resulting in a set of 3D points with associated colors.
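
For concreteness, here's a minimal sketch of the projection step described above, assuming a pinhole camera with known intrinsics `K`, a camera-to-world pose, and per-pixel depth (which a plain photo collection doesn't actually provide, so treat this purely as an illustration; all names are hypothetical):

```python
import numpy as np

def backproject_to_pointcloud(image, depth, K, cam_to_world):
    """Lift each pixel of a 2D image into a colored 3D point.

    image        : (H, W, 3) RGB values
    depth        : (H, W) assumed per-pixel depth along each camera ray
    K            : (3, 3) pinhole intrinsics
    cam_to_world : (4, 4) camera pose (camera-to-world transform)
    """
    H, W = depth.shape

    # Pixel grid in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)

    # Camera-space points: X_cam = depth * K^{-1} [u, v, 1]^T
    rays = (np.linalg.inv(K) @ pix.T).T
    pts_cam = rays * depth.reshape(-1, 1)

    # World-space points via the camera pose.
    pts_hom = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    pts_world = (cam_to_world @ pts_hom.T).T[:, :3]

    colors = image.reshape(-1, 3)
    return pts_world, colors
```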

−8

harharveryfunny t1_j9b30et wrote

Not sure why this got downvoted given that it's correct. ChatGPT is also quite capable of explaining how this mapping is learnt (using a view-consistency loss: mapping from the 3D voxels back to a 2D view and comparing it to the input image).
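
For anyone curious, here's a minimal sketch of that kind of view-consistency (photometric) loss in PyTorch; `render_view` is a hypothetical stand-in for whatever differentiable renderer maps the learned 3D representation back to a 2D image:

```python
import torch

def view_consistency_loss(rendered_rgb: torch.Tensor,
                          target_rgb: torch.Tensor) -> torch.Tensor:
    """Photometric loss: penalize differences between the view rendered
    from the learned 3D representation and the ground-truth image."""
    return torch.mean((rendered_rgb - target_rgb) ** 2)

# Hypothetical training step (render_view is assumed, not a real API):
#   rendered = render_view(model, camera_pose)  # differentiable rendering
#   loss = view_consistency_loss(rendered, image)
#   loss.backward()
```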

−3

tdgros t1_j9b43pe wrote

It's downvoted because it doesn't add anything to the conversation. OP has already stated that they know what info is input; they just don't know where to get it from. Someone already answered correctly at the top.

8

harharveryfunny t1_j9bf30y wrote

OP's question seems to be how to get from the 2D images to the 3D voxels, no? But anyway, if they've got their answer, that's good.

Edit: I guess they were talking about camera position for the photos, not mapping to 3D.

−4

tdgros t1_j9bfds3 wrote

Just read the post!

>However, the paper itself builds a network that takes as input 5D vectors (3 location coordinates + 2 camera angles) and outputs color and volume density for each such coordinate. I don't understand where I'm supposed to get those 5D coordinates from - my training data surely doesn't have those; I only have a collection of images.
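
To make the quoted question concrete: those 5D coordinates aren't in the dataset. The camera poses are typically estimated from the photo collection with a structure-from-motion tool such as COLMAP, and the 5D inputs are then generated by casting a ray through each pixel and sampling points along it. A minimal sketch, assuming a pinhole camera with intrinsics `K` and a camera-to-world pose (all names illustrative):

```python
import numpy as np

def make_5d_inputs(H, W, K, cam_to_world, n_samples=64, near=2.0, far=6.0):
    """Turn one image's camera pose into NeRF-style 5D inputs.

    For every pixel, cast a ray through the scene; each sampled point on
    the ray contributes (x, y, z), and the ray's viewing direction is
    expressed as two angles (theta, phi).
    """
    # Ray directions in camera space, one per pixel.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    dirs_cam = np.stack([(u - K[0, 2]) / K[0, 0],
                         (v - K[1, 2]) / K[1, 1],
                         np.ones_like(u, dtype=float)], axis=-1)

    # Rotate into world space; the ray origin is the camera center.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs = dirs_cam @ R.T
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Sample points along each ray between the near and far planes.
    depths = np.linspace(near, far, n_samples)
    pts = t + dirs[..., None, :] * depths[:, None]  # (H, W, n_samples, 3)

    # Viewing direction as two angles (azimuth, elevation).
    theta = np.arctan2(dirs[..., 1], dirs[..., 0])
    phi = np.arcsin(np.clip(dirs[..., 2], -1.0, 1.0))

    return pts, theta, phi
```

Each sampled point supplies the 3 location coordinates, and the ray it lies on supplies the 2 viewing angles, which together form the 5D input the paper describes.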

7