Submitted by RepresentativeCod613 t3_zrfy75 in MachineLearning

It's only been a month since OpenAI released ChatGPT, and yesterday they launched Point-E, a new DALL-E-like model that generates 3D point clouds from complex prompts. As someone who is always interested in the latest advancements in machine learning, I was really excited to dig into this paper and see what it had to offer.

One of the key features of Point-E is its use of diffusion models to generate both synthetic views and 3D point clouds. A text-to-image diffusion model first generates a single synthetic view from the text prompt, and a second diffusion model, conditioned on that image, then produces the 3D point cloud. The whole process takes only 1-2 minutes on a single GPU, making it much faster than previous state-of-the-art methods.
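To make the two-stage pipeline concrete, here's a minimal sketch of text-to-point-cloud sampling with the released `point-e` package, loosely adapted from the example notebook in their repo. The model names and sampler arguments below are my best reading of their code, so double-check against the repo before relying on them:

```python
import torch
from tqdm.auto import tqdm

from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stage 1: a 40M-parameter base model conditioned on text embeddings.
base_name = 'base40M-textvec'
base_model = model_from_config(MODEL_CONFIGS[base_name], device)
base_model.eval()
base_model.load_state_dict(load_checkpoint(base_name, device))
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS[base_name])

# Stage 2: an upsampler that grows the coarse 1024-point cloud to 4096 points.
upsampler_model = model_from_config(MODEL_CONFIGS['upsample'], device)
upsampler_model.eval()
upsampler_model.load_state_dict(load_checkpoint('upsample', device))
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS['upsample'])

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],
    aux_channels=['R', 'G', 'B'],
    guidance_scale=[3.0, 0.0],  # classifier-free guidance on the base model only
    model_kwargs_key_filter=('texts', ''),  # the upsampler is unconditional
)

# Run the progressive sampler and keep the final denoised batch.
samples = None
for x in tqdm(sampler.sample_batch_progressive(
        batch_size=1, model_kwargs=dict(texts=['a red motorcycle']))):
    samples = x

pc = sampler.output_to_point_clouds(samples)[0]  # a PointCloud with coords + RGB
```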

While the quality of the samples produced by Point-E may be lower than those produced by other methods, the speed of generation makes it a practical option for certain use cases.

If you're interested in learning more about this new model and how it was developed, I highly recommend giving the full paper a read. But if you'd rather just get the gist of it, I've added a link to an overview blog post I published about it.

The blog: https://dagshub.com/blog/point-e/

The paper: https://arxiv.org/abs/2212.08751

I'm sure I haven't captured all the insights in the blog post, and I'd love to hear your thoughts about the model and how OpenAI developed it.

109

Comments


master3243 t1_j152mmj wrote

The abstract puts this project into perspective: their method is much faster but still doesn't beat the state of the art.

> While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at https://github.com/openai/point-e

11

RepresentativeCod613 OP t1_j17t7g0 wrote

Though for 3D rendering, running time is a major consideration, and they've managed to cut it by almost 10X.

2

busbysbsbsusbsbsusbs t1_j16zjan wrote

Why do they choose to represent this in point clouds rather than a mesh or voxels? It seems like that would require more points/computation for less aesthetic quality.

1

prato_s t1_j17o53b wrote

I tried it on my own img2img generative art. Meh results, mostly due to the cap on the number of points (4096). Unless you tinker around that limit, it's hard to get a decent point cloud (and a resultant mesh from that).
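For anyone wanting to try the mesh step, this is roughly the conversion path, based on the pointcloud2mesh example in their repo. The `'sdf'` model name and the `marching_cubes_mesh` arguments are from my reading of that notebook, so verify them against the repo:

```python
import torch

from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint
from point_e.util.pc_to_mesh import marching_cubes_mesh

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Point-E ships a small SDF regression model for point-cloud-to-mesh conversion.
sdf_model = model_from_config(MODEL_CONFIGS['sdf'], device)
sdf_model.eval()
sdf_model.load_state_dict(load_checkpoint('sdf', device))

# `pc` is a point_e.util.point_cloud.PointCloud, e.g. from the text-to-3D sampler.
mesh = marching_cubes_mesh(
    pc=pc,
    model=sdf_model,
    batch_size=4096,
    grid_size=32,  # a higher grid size gives a finer mesh but takes longer
    progress=True,
)

with open('mesh.ply', 'wb') as f:
    mesh.write_ply(f)
```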

1

ninjasaid13 t1_j17pb7z wrote

>It's only been a month since OpenAI released ChatGPT

More like 3 weeks.

But I'll wait for Point-E 2 before I start playing with it.

1

MangoMo3 t1_j151jmx wrote

Dude, I was saying to my friends that they should make a DALL-E for 3D modeling! It will be awesome for 3D printing custom minis!

−3