Below is what I know about Stable Diffusion.

The base model of Stable Diffusion is originally trained to remove noise from images.
Given a training image, a series of images is created by repeatedly adding noise to the previous image.
The model is trained to reverse this process, repeatedly removing noise to recover the original image.
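
A minimal sketch of that setup, following the description above rather than the exact Stable Diffusion recipe (real diffusion training also rescales the signal at each step and usually has the model predict the noise itself). The names `forward_noising` and `training_pairs`, the toy 4-pixel "image", and the noise level are all illustrative assumptions:

```python
import random

def forward_noising(image, steps, noise_std=0.1, seed=0):
    """Return [x_0, x_1, ..., x_steps]; each image is the previous one plus Gaussian noise."""
    rng = random.Random(seed)
    series = [list(image)]
    for _ in range(steps):
        prev = series[-1]
        series.append([p + rng.gauss(0.0, noise_std) for p in prev])
    return series

def training_pairs(series):
    """Pair each noisier image (model input) with the image one step earlier (target)."""
    return [(series[t], series[t - 1]) for t in range(1, len(series))]

image = [0.0, 0.5, 1.0, 0.5]  # toy 4-pixel "image", made up for illustration
series = forward_noising(image, steps=3)
pairs = training_pairs(series)
```

Each pair asks the model to undo exactly one noising step, and every image in the series has the same number of pixels.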

Can't this training method be used for training a reverse engineering model?

A model that can produce C, C++, or other source code from binary code?

Make the compiler output not only the binary code but also every intermediate form the code takes along the way; this yields a series of code that begins with the original source code and ends with the binary code (or just assembly code).
Train a model to revert each form to the previous one in the series.
The result is a model that can recover source code from binary code.
Maybe it could be trained further to accept text instructions, like Stable Diffusion, and modify the source as instructed in the text.
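
To make the step-by-step reverting idea concrete, here is a rough sketch of how the training pairs and the chained reversal could look. Everything in it is hypothetical: the stage strings are placeholders rather than real compiler output, and a dictionary lookup stands in for the trained model.

```python
def reverse_pairs(stages):
    """stages[0] is the source and stages[-1] the final output.
    Return (input, target) pairs mapping each stage to the one before it."""
    return [(stages[i], stages[i - 1]) for i in range(len(stages) - 1, 0, -1)]

def revert(code, step_model, max_steps):
    """Apply a one-step reverting model repeatedly until it has nothing earlier to return."""
    for _ in range(max_steps):
        prev = step_model(code)
        if prev is None:
            break
        code = prev
    return code

# Placeholder stages for one function (not real compiler output).
stages = [
    "SOURCE: int add(int a, int b) { return a + b; }",
    "STAGE 1: <intermediate representation placeholder>",
    "STAGE 2: <assembly placeholder>",
]
pairs = reverse_pairs(stages)

# Stand-in for a trained model: it only memorizes the pairs it has seen.
lookup = dict(pairs)
recovered = revert(stages[-1], lookup.get, max_steps=len(stages))
```

A real model would have to generalize to unseen code, which is the hard part; the sketch only shows how the data and the repeated reverting fit together.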

Is this not plausible?

Or is there already research on this idea?

Comments

howtorewriteaname t1_jdmy2ik wrote on March 25, 2023 at 4:27 PM

#2,355,014

This could definitely work, provided you have the right data in large amounts. I believe that is the biggest challenge for this kind of model, more than the learning method.

Dylanica t1_jdo5jnf wrote on March 25, 2023 at 9:40 PM

#2,359,673

A sequence-based model like a transformer (what GPT is based on) would probably work better for this particular task. In fact, GPT-3/4 would probably be pretty darn good at this task right out of the box.

elbiot t1_jdo7ndu wrote on March 25, 2023 at 9:56 PM

#2,359,874

Compilation isn't a process of noising, and diffusion doesn't have any relevance here. An LLM is what you would use.

OraOraP OP t1_jdpahia wrote on March 26, 2023 at 2:56 AM

#2,363,943

Replying to elbiot (#2,359,874)

I didn't mean using the model from the Stable Diffusion process for reverse engineering.

I was just thinking this step-by-step reverting training process could be used in some model for reverse engineering.

OraOraP OP t1_jdpc3du wrote on March 26, 2023 at 3:10 AM

#2,364,086

Replying to howtorewriteaname (#2,355,014)

Just crawling open-source code and compiling it with the special compiler would produce a massive amount of training data, provided the special compiler I mentioned in the post is easy to make.
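
As a rough starting point, an ordinary GCC already exposes a few coarse stages through its standard `-E` (preprocess only), `-S` (stop at assembly), and `-c` (stop at the object file) options. This is far from every internal step, and the helper name `stage_commands` and the file name `add.c` below are made up for illustration. The sketch only builds the command lines; it does not run them:

```python
from pathlib import Path

def stage_commands(src_path, compiler="gcc"):
    """Build one command per coarse compilation stage for a single C file."""
    src = Path(src_path)
    stem = str(src.with_suffix(""))
    return [
        [compiler, "-E", str(src), "-o", stem + ".i"],  # preprocessed source
        [compiler, "-S", str(src), "-o", stem + ".s"],  # assembly
        [compiler, "-c", str(src), "-o", stem + ".o"],  # object file
    ]

cmds = stage_commands("add.c")
# Each command could be passed to subprocess.run(cmd, check=True)
# on a machine that has the compiler installed.
```

Running these over a crawled corpus would give (source, preprocessed, assembly, object) series, though finer-grained stages would need compiler-specific dump options or a modified compiler.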

elbiot t1_jdpgqoz wrote on March 26, 2023 at 3:53 AM

#2,364,623

Replying to OraOraP (#2,363,943)

I'm just talking about diffusion models in general and the concept of denoising. An LLM is what you would use, trained the way you'd train an LLM, not the way you'd train a diffusion model.

mikonvergence t1_jduf732 wrote on March 27, 2023 at 7:41 AM

#2,386,548

You are definitely stepping outside of the domain of what is understood as denoising diffusion because it seems that your data dimensionality (shape) needs to change during the forward process.

The current definition of diffusion models is that they estimate the gradient of the log-likelihood of your data (equivalent to predicting the standard noise in the sample), and then take a step in that constant data space. So every network has the same output shape as its input.
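
A toy illustration of that shape constraint, under simplifying assumptions: the update rule below is a bare "subtract the predicted noise" step rather than any specific sampler, and the token lists are invented placeholders, not real compiler output.

```python
def denoise_step(x, predicted_noise, step_size):
    """One simplified denoising update: move x against the predicted noise, element by element."""
    return [xi - step_size * ni for xi, ni in zip(x, predicted_noise)]

x = [1.0, 2.0, 3.0]
predicted_noise = [0.5, 0.5, 0.5]  # stand-in for a network's output, same shape as x
x_next = denoise_step(x, predicted_noise, step_size=1.0)

# Code, by contrast, changes length from one stage to the next.
source_tokens = ["return", "a", "+", "b", ";"]            # invented placeholder
assembly_tokens = ["<op1>", "<op2>", "<op3>", "<op4>",
                   "<op5>", "<op6>", "<op7>"]             # invented placeholder
same_shape = len(x_next) == len(x)
length_changes = len(source_tokens) != len(assembly_tokens)
```

The element-wise update only makes sense when input and output line up position by position, which is what breaks once each stage has a different length.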

Perhaps you can use transformers to handle evolving data lengths, but as far as I can tell, you're entering uncharted research territory.

I can recommend this open-source course I made for understanding the details of denoising diffusion for images https://github.com/mikonvergence/DiffusionFastForward

OraOraP OP t1_jdufnll wrote on March 27, 2023 at 7:48 AM

#2,386,602

Replying to mikonvergence (#2,386,548)

I didn't mean applying the denoising process directly to reverse engineering. I was just thinking the idea of `step-by-step reverting` could be used in some ML model for reverse engineering.

Though you have a point. Unlike the denoising process, reverse engineering would require the dimensions to change in the middle steps, making it more difficult than denoising.

mikonvergence t1_jdufvlv wrote on March 27, 2023 at 7:52 AM

#2,386,623

Replying to OraOraP (#2,386,602)

Right, I am using denoising diffusion as a term for a wide range of methods based on reversing some forward process. Some interesting work (such as cold diffusion) has been done on using other types of degradation apart from additive Gaussian noise.

And yeah, changing both content and dimensionality requires you to put together some novel and non-obvious techniques.