Comments

thatpizzatho t1_irdrzbk wrote

This is amazing! But as a PhD student, I can't keep up anymore :/

58

hardmaru OP t1_ircu02e wrote

The original paper from Google Brain came out less than a week ago and was discussed on r/MachineLearning:

  • DreamFusion: Text-to-3D using 2D Diffusion

https://old.reddit.com/r/MachineLearning/comments/xrny8s/r_dreamfusion_textto3d_using_2d_diffusion/

Within a week, someone made this working implementation in PyTorch, which uses Stable Diffusion in place of Imagen.
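
The core idea, Score Distillation Sampling (SDS), is simple enough to sketch in a few lines of PyTorch. This is only an illustrative sketch, not the repo's actual code; `nerf` (the radiance field being optimized), `sd` (a Stable Diffusion wrapper with VAE encoder, UNet and noise scheduler) and `text_emb` are placeholders:

    import torch

    # Illustrative sketch of one Score Distillation Sampling (SDS) step.
    # `nerf`, `sd` and `text_emb` are placeholder objects, not real APIs.
    def sds_step(nerf, sd, text_emb, camera_pose, optimizer, num_timesteps=1000):
        # 1. Render the current NeRF from the sampled camera pose.
        image = nerf.render(camera_pose)                  # (1, 3, H, W) in [0, 1]

        # 2. Encode into the latent space and add noise at a random timestep.
        latents = sd.encode_images(image)                 # (1, 4, h, w)
        t = torch.randint(0, num_timesteps, (1,), device=latents.device)
        noise = torch.randn_like(latents)
        noisy_latents = sd.scheduler.add_noise(latents, noise, t)

        # 3. Ask the frozen diffusion model to predict the noise, conditioned on the text.
        with torch.no_grad():
            noise_pred = sd.unet(noisy_latents, t, encoder_hidden_states=text_emb).sample

        # 4. SDS: treat (predicted noise - true noise) as the gradient on the latents,
        #    skipping the UNet Jacobian (timestep weighting w(t) omitted for brevity),
        #    and backprop it through the encoder and renderer into the NeRF parameters.
        grad = noise_pred - noise
        loss = (grad.detach() * latents).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The key point is step 4: the frozen diffusion model's noise residual is used directly as a gradient on the rendered image, so only the NeRF parameters are ever updated.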

Saw more discussion about this project on Hacker News:

https://news.ycombinator.com/item?id=33109243

24

master3243 t1_irdi8o7 wrote

In the paper, Appendix A.4 derives the loss and gradients.

I don't see how this step (eq. 14) is true: https://i.imgur.com/ZuN2RC2.png

As far as I can tell, the RHS equals (2 * alpha_t) * LHS.

I'm also unsure how this step in the same equation follows: https://i.imgur.com/DHixElF.png
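
(For context, the end result the appendix is building toward, as I read the paper's notation, is the SDS gradient; my confusion is about the intermediate steps that get there:

    \nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\phi, \mathbf{x} = g(\theta))
        = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(\mathbf{z}_t; y, t) - \epsilon\big)\,\frac{\partial \mathbf{x}}{\partial \theta} \right],
    \quad \text{with } \mathbf{z}_t = \alpha_t \mathbf{x} + \sigma_t \epsilon

)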

9

dkangx t1_irdmnsp wrote

Well, someone’s gonna fire it up and test it out and we will see if it’s real

2

master3243 t1_irdoq0o wrote

Empirical results don't necessarily prove theoretical results. In fact, most deep learning research (mine included) is trying out different things based on intuition and past experience of what has worked, until you have something that achieves really good results.

Then you attempt to show formally and theoretically why the thing you did is mathematically justified.

And often enough, once you start going through the formal math you get ideas on how to further improve the model, or different paths to take, so it's a back and forth.

However, someone could just as easily get good results with a certain architecture/loss and then fail to justify it formally, or skip certain steps, or make an invalid jump from one step to another, which results in theoretical work that is wrong but works great empirically.

17

master3243 t1_irddrxw wrote

Is this an implementation of the model architecture/training or does it have a final/checkpoint model that I can use for generation right now?

3

DigThatData t1_irds2na wrote

it's a method that uses pre-trained models, so you can use it right now
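
If I remember the repo's README correctly, generation is kicked off with something like the command below (this is from memory, so double-check the README for the exact flags):

    python main.py --text "a hamburger" --workspace trial -O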

3

master3243 t1_irhlkct wrote

Pretty cool, I just tested with the prompt

"a DSLR photo of a teddy bear riding a skateboard"

Here's the result:

https://media.giphy.com/media/eTQ5gDgbkD0UymIQD6/giphy.gif

Reading the paper and understanding the basics of how it works, I would have guessed it would tend to create a Neural Radiance Field where the front of the object is duplicated across many different camera angles, since when the NeRF is updated from a new angle, the diffusion model will output an image that closely matches an angle it has already created.

I think Imagen can prevent this simply because of its sheer power: even when given a noisy image of the backside of a teddy bear, it can figure out that it truly is the backside and not just the front again. Not sure if that made sense; I did a terrible job articulating the point.
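
The paper also fights exactly this failure mode with view-dependent prompting: the text prompt gets a suffix like "front view", "side view", "back view", or "overhead view" depending on the sampled camera. A rough sketch of the idea (the threshold values here are my own guesses, not the paper's exact ones):

    def view_dependent_prompt(prompt, azimuth_deg, elevation_deg):
        # Append a view suffix based on the sampled camera direction.
        # Threshold values are illustrative, not the paper's.
        if elevation_deg > 60:
            return prompt + ", overhead view"
        if abs(azimuth_deg) < 45:
            return prompt + ", front view"
        if abs(azimuth_deg) > 135:
            return prompt + ", back view"
        return prompt + ", side view"

    # e.g. view_dependent_prompt("a DSLR photo of a teddy bear riding a skateboard", 170, 10)
    # -> "a DSLR photo of a teddy bear riding a skateboard, back view"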

3