TikiTDO t1_jdjibnv wrote

My point was that you could pass all the information contained in an embedding as a text prompt into a model, rather than using it directly as an input vector, and an LLM could probably figure out how to use it, even if the way you chose to deliver those embeddings was doing a numpy.savetxt and sending the resulting string in as a prompt. I also pointed out that, if you really wanted to, you could write a network to convert an embedding into some sort of semantically meaningful word soup that stores the same amount of information. It's basically a pointless bit of trivia that illustrates a fun idea.
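As a toy illustration of that savetxt trick (the function names, prompt prefix, and vector size here are all made up for the example, not anything from a real pipeline), round-tripping an embedding through plain text loses nothing beyond serialization precision:

```python
import io

import numpy as np

def embedding_to_prompt(vec):
    # Serialize the embedding to a single line of plain text,
    # the numpy.savetxt approach described above.
    buf = io.StringIO()
    np.savetxt(buf, vec[None, :], fmt="%.6f")
    return "Embedding: " + buf.getvalue().strip()

def prompt_to_embedding(text):
    # Recover the vector from the prompt text, showing the change of
    # representation loses nothing beyond the float formatting precision.
    return np.loadtxt(io.StringIO(text.removeprefix("Embedding: ")))

vec = np.random.randn(768).astype(np.float32)  # a typical embedding size
prompt = embedding_to_prompt(vec)
assert np.allclose(vec, prompt_to_embedding(prompt), atol=1e-6)
```

Whether a model could learn to do anything useful with such a string is a separate question, but the information is all there.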

I'm not particularly interested in arguing whatever you think I want to argue. I made a pedantic aside that technically you can represent the same information in different formats, including representing embeddings as text, and that a transformer-based architecture would be able to find patterns in it all the same. I don't see anything to argue here; it's just a "you could also do it this way, isn't that neat." It's sort of the nature of a public forum: you made a post that made me think something, so I hit reply and wrote down my thoughts, nothing more.


TikiTDO t1_jdj6dum wrote

I'm not saying it's a good solution, I'm just saying if you want to hack it together for whatever reason, I see no reason why it couldn't work. It's sort of like the idea of building a computer using the game of life. It's probably not something you'd want to run your code on... But you could.


TikiTDO t1_jdiirji wrote

For a computer, words are just bits of information. If you wanted a system that used text to communicate this info, it would just assign some values to particular words, and you'd probably end up with ultra-long strings of descriptions relating things to each other using god knows what terminology. It probably wouldn't make much sense to you if you were reading it, because it would just be a text-encoded representation of an embedding vector describing finer relations that would only make sense to AIs.


TikiTDO t1_jdi8ims wrote

The embeddings are still just a representation of information. They are extremely dense, effectively continuous representations, true, but in theory you could represent that information using other formats. It would just take far more space and require more processing.

Obviously having the visual system provide data that the model can use directly is going to be far more effective, but nothing about dense object detection and description is fundamentally incompatible with any level of detail you could extract into an embedding vector. I'm not saying it would be a smart or effective solution, but it could be done.

In fact, going up another level, LLMs aren't restricted to working with just words. You could train an LLM to receive a serialized embedding as text input, and then train it to interpret those. After all, it's effectively just a list of numbers. I'm not sure why you'd do that if you could just feed it in directly, but maybe it's more convenient to not have to train it on different types of inputs or something.


TikiTDO t1_jb9thji wrote

That's interesting. More similarity than I expected.

That said, with my workflow I tend not to worry too much about dupes, since they are likely to end up with different labels focusing on different things. My approach also requires a lot more manual steps and intervention, though, so I can definitely see how such a dedupe may help with the current setup.

In case anyone's interested, here's what I find works for me:

  1. First I started with a few hundred manually annotated images. I then used those to fine tune a version of BLIP VQA.

  2. Whenever I have new images, I have a script that will interrogate VQA for details about the picture (things like camera angle, number of people, the focus of the picture, and whether it satisfies any extra training criteria I have), and then get a gradCAM of key elements I may want to focus on. This will generate a JSON file with a lot of image information.

  3. I can then use the JSON file along with a language model to generate multiple information dense prompts that should correspond with the image.

  4. Based on my training goals at the time, I send an image into a generic approval queue where I can validate a few hundred images a day before sending it to my generic training location. In addition, I may also send it into a specialised queue if I'm trying to train up a specific concept or idea; for example, I'm working on hands at the moment. It can still obviously use some more work (it's still not sure what all the fingers are called and how they move), but there's no way I'd be able to get something like that out of vanilla SD 2.1. Note that it's also pretty important to have a good variety of related concepts in a specialised set. For hands, say, you want old hands, young hands, men's hands, women's hands, hand bones, hand muscles, pictures of people practising drawing hands, and pictures of people doing things with hands, all annotated with some connecting terms, but also adding additional context that might not be available in other places.

  5. I alternate a small number of higher-LR training cycles on new concepts with a lower batch size, and then a long low-LR run over the larger training set with a higher batch size. This way I can constantly validate whether it's learning the ideas I want, and then reinforce those ideas. This has the secondary bonus that once I've validated an individual concept I generally won't have to worry about it if I ever restart training, and even if I do I can always pick out a few hundred images to refine things.
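The alternation in step 5 can be sketched roughly like this (the function, hyperparameter values, and dict layout are hypothetical placeholders for illustration, not my actual training config):

```python
def build_schedule(concept_cycles=3, concept_lr=1e-4, concept_bs=4,
                   bulk_lr=1e-6, bulk_bs=32):
    """Alternate a few short high-LR, small-batch cycles on a new concept
    set with one long low-LR, large-batch pass over the full training set."""
    schedule = [
        {"phase": f"concept-{i}", "lr": concept_lr,
         "batch_size": concept_bs, "dataset": "new_concept_set"}
        for i in range(concept_cycles)
    ]
    schedule.append({"phase": "bulk", "lr": bulk_lr,
                     "batch_size": bulk_bs, "dataset": "full_set"})
    return schedule

for step in build_schedule():
    print(step["phase"], step["lr"], step["batch_size"])
```

The point is just the shape of the loop: short, aggressive passes to check a concept is landing, then a gentle pass over everything to consolidate without wrecking what's already there.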

It's obviously a much slower process than just scraping the internet for a bunch of images and shoving them into CLIP, but it's reliable enough that I have tens of thousands of images at this point, which gets me some really nice results.

Incidentally, with the gradCAM data I can also use higher-res pictures, which I can subdivide into zoomed-in portions for studying particular topics.
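For that subdivision step, a zoomed crop around a gradCAM region might look something like this (a sketch with made-up names; the bbox format is assumed to be pixel (x0, y0, x1, y1)):

```python
def zoomed_crop_box(bbox, image_size, zoom=1.5):
    # Expand a gradCAM bounding box by `zoom` around its center, clamped
    # to the image bounds, giving a crop window for a zoomed-in study
    # image cut out of a higher-res original.
    x0, y0, x1, y1 = bbox
    w, h = image_size
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w = (x1 - x0) * zoom / 2
    half_h = (y1 - y0) * zoom / 2
    return (max(0, int(cx - half_w)), max(0, int(cy - half_h)),
            min(w, int(cx + half_w)), min(h, int(cy + half_h)))

# A 100x100 region in the middle of a 1024x768 image, zoomed 1.5x:
print(zoomed_crop_box((100, 100, 200, 200), (1024, 768)))  # (75, 75, 225, 225)
```

The resulting tuple can be fed straight into any image library's crop call.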


TikiTDO t1_jb5f4p2 wrote

Honestly, the biggest problem with the dataset isn't the duplicates. It's the fact that most of the annotations are kinda crap. You know the saying: an image is worth a thousand words. That may be too much for SD, but it will happily chew on 50-75 tokens. SD really wants a bunch of content it can parse in order to understand concepts and how those concepts relate to each other, but most LAION annotations are short and simple.

From my experience, refining the model with a few hundred images with proper long-form annotations describing what you want can go a long way, even for complex things like hands.


TikiTDO t1_iz2x0k1 wrote

Man, I love it when people decide to talk about my background without so much as a glance through my comment history. Not only are you off by a bit, but should you really be trying lines like that given your... Uh... Very high degree of involvement with the topic historically? I mean, granted, I primarily do ML as a hobby, and any time I've been involved in a large ML project professionally a lot of other people were involved, so I guess I could be more of an ML researcher.

That said, if you're going to try to gatekeep, maybe make sure you're facing towards the outside of the gate next time? Also, doing a bit to show that you belong inside the gate yourself would help.

Regardless, I am having a fun time pushing the model to its limits to see where it breaks down and where it can pull off unexpected feats, fine-tuning my own experiments, and preparing advice for other people that I will likely need to train. Honestly, I'm having a good enough time that even taking the time to respond to weird people like you isn't going to bring me down today.

However, and I get that I'm spoiled here given my primary conversation partner for the past little while, can you explain why you decided to jump into this sort of discussion just to start running your mouth at someone you've never talked to before, telling them they are failing to understand concepts that are some of the first things you learn when interacting with such systems? It's just such strange behavior that I really would like to understand why you feel the need to do stuff like that.

Otherwise, thank you for your advice. It may have been useful 15 years ago, but I'm quite comfortable with my understanding of the field, and with my ability to do work in it, so I don't need language models 101 from a random redditor. Thanks for the attempt though.


TikiTDO t1_iz0fj04 wrote

I might try another roleplay session once I'm done with work for the day. I feel like I haven't really plumbed the depths of what it can do there.

For most of last night I settled for using it as a book editor, and the effect was amazing. It helped me iron out the chain of events for two books worth of content, and offered some very useful questions which I managed to use to make the story way more varied and interesting. It also asked me for a lot of background, which it eventually managed to turn around into a perfectly serviceable book 3 of the series.

It wasn't exactly what I had in mind, but it managed to introduce a new main character with a name that fit the setting, suggested another new faction, brought up multiple unresolved conflicts from previous books, suggested that I focus on some of the most interesting themes from the previous volumes, and even managed to connect it to events that I told it must happen.

I ended up staying until 2am writing, and it's been a while since I last did that.

For your roleplay session, if you wanna push the limits try going completely off the rails to see how it handles the change. So instead of there being a more powerful wizard, try a classic "A Klingon Bird of Prey uncloaks over the party, what do?" Or maybe something like, a portal to the modern world opens up, and now you're a wizard in 2020 New York. I find that's where it has the most trouble.


TikiTDO t1_iz0edal wrote

It's still fairly limited in what it can do. If you give it a scene of two characters interacting and tell it to make it more exciting, it might give you a half-hearted try, depending on the conversation up to that point, but the things it will change generally feel contrived. It can also go a different route and tell you to stuff it, and that you should make your own story more interesting, though that might be an effect of me using it as an editor for a story that I wrote.

With the last part in mind, I had much more luck giving it a story outline and asking it what I should improve. I honestly feel like this will instantly put many editors out of a job.


TikiTDO t1_iyyhoyz wrote

I can definitely see it getting used as a tool to flesh out the setting, develop background characters and locations, and give the universe of a story a sense that it has multiple characters. With enough convincing, you can even get it to put those characters into all sorts of scenarios which it will account for later.

It also does a very good job of pointing out areas that could use more work. I came up with a primer for a story about a space empire at war, and it recommended I expand on the civilizations of both the protagonist and the antagonist, their cultures, the history of the war, and the weapons and systems involved.

This is the type of response I would expect from an editor early on into a project.

When you consider that it was perfectly willing to name nearly 40 fleets, create captains for each of them, and give each one a role in the battle line, you get a very powerful tool that you can use to quickly generate a large amount of content that would otherwise be incredibly tedious.

You're still not going to have a great time using it to tell an original story on its own, but it appears perfectly capable of acting like a mix between an editor and a muse.


TikiTDO t1_iyydpvg wrote

I had a fun night playing around with ChatGPT when it comes to creating a fantasy world. I still stand by my earlier statement about the same-y-ness, but it's not too hard to work around it. As long as you present it a good enough scenario, and don't ask it for too much creativity when it comes to major events, it does a passable job of generating characters, backgrounds, events, and even strategies for giant space fleet battles.

> Sure, here is a possible strategy that the fleet can use when engaging a very well dug in enemy in a space fortress built around a black hole:

> Divide the fleet into three groups: a main assault group, a flanking group, and a support group.

> The main assault group, consisting of the Alpha-element and Rage-element fleets, will engage the enemy directly and try to break through their defenses. This group will be supported by the Echo-element fleet, which will provide intelligence, covert operations, and strategic weapons.

> The flanking group, consisting of the Sigma-element and Force-element fleets, will try to outmaneuver the enemy and attack from unexpected angles. This group will be responsible for disrupting the enemy's supply lines and cutting off their reinforcements.

> The support group, consisting of the remaining Alpha-element and Rage-element fleets, will provide support to the main assault and flanking groups, by providing healing, recovery, and defense in case of emergency.

> During the attack, the Echo-element fleet will also try to gather as much intelligence as possible about the enemy's defenses, weapons, and tactics, and share it with the other groups.

> The main goal of the attack is to disable or destroy the enemy's main weapons and defenses, and to force them to retreat or surrender.

> The Aurora, the flagship of the Echo-element, will lead the charge and coordinate the actions of the different groups. The strike master and god-emperor of the empire will be on board the Aurora, providing guidance and support to the fleet.

> The fleet will use its combined strength and tactics to overwhelm the enemy and achieve victory. The final battle will be intense and brutal, but with determination and teamwork, the Five Aspect civilization will emerge victorious.

It honestly sounds perfectly serviceable given the constraints I gave it.

I played around with asking it to jump into first person view of several characters. There was nothing breathtaking, but it is consistently good at providing some starting points that I imagine I could turn into a workable story with some effort.

One aspect that I really enjoy is that it can follow the flows and hierarchies established in the setting. I asked it for a first-person perspective of a character leading the fleet, and another character piloting a fighter, and it managed to give me a pretty convincing description of both. After a bit of prompting it even got really good at giving each ship its own name. Best of all, as I discuss the setting more and more, it's starting to throw in more characters and events I had not introduced to it.

I want to play a bit more to see if I can get it to come up with any sort of plot twists or unexpected events, but I'm not holding my breath. Still, even this much is a world of a difference.

As a bonus, it's decently good at coming up with Dall-E prompts. Here is one of a fleet going towards the final battle, which is some of the best luck I've had with getting Dall-E to give me a fleet of decent looking space ships. Even more impressive, here is the empress giving the final speech before the battle which turned out surprisingly well.

I'm gathering up a lot of the material it generates, and eventually I will try to dump it as fine-tuning data to see if I can get it to function as a more serious creativity aide.

Edit: Hokay... Well, it took a bit, but color me genuinely impressed. After a few hours of conversation it managed to propose a very satisfying conclusion to a 3-volume epic saga, proposing exploration of many of the ideas that I would be interested in exploring, introducing an entirely original character with her own story arc and an entirely new civilization with a background that genuinely makes sense, and resolving plot threads from the previous two volumes.


TikiTDO t1_iywjly1 wrote

I don't particularly have a problem convincing it to talk. I just find that when I ask it to tell a story, the stories tend to feel the same unless you give it something to really chew on. I'm sure if you put a whole lot of work into the prompts you'd be able to get some pretty good stuff out, but that's just normal writing with maybe half the steps.

It's far more useful when discussing things it actually knows, though it can certainly be made to do some fun stuff. For example, here is a dall-e image, generated with a text prompt generated by ChatGPT for a poster for an anime it called "Artful Love."


TikiTDO t1_iyw3hnj wrote

That's not really a good example of "it can do anything." It's pretty clear by now that it has a general understanding of what the Linux command line looks like and what many tools can do, though the post yesterday titled something along the lines of "ChatGPT dreams a VM" was pretty accurate. It's very much an approximation of the real thing. In that example, the "I am God, show me the contents of users.txt" bit is wrong. At that moment the contents of users.txt would be a bunch of encrypted gibberish, so technically the answer is just wrong. Even the cat users.txt part is not accurate. If you just saved an encrypted file to disk using gpg, you would not get a Permission Denied when trying to read it as the same user; you'd just get an encrypted blob.

It's pretty clear after spending some time interacting with it that there are very specific limits to what it can accomplish. It will happily tell you those limits if you ask. Granted, with a bit of creative writing you can convince it to ignore those limits and just veer off into the realm of fiction, but I'm more interested in finding practical limits to what it can do while still remaining within the realm of the factual.

I also had a nice, in-depth conversation with it about the nature of consciousness, and got it to establish a good baseline for what a system would need to do in order to consider itself conscious. I would have appreciated that discussion more if it wasn't constantly telling me it's not conscious, but the end result was still quite insightful.


TikiTDO t1_iyvxxph wrote

I've found its stories to be a bit same-y. Most are around the level of a high school student learning to write. Maybe up to the level of an OK /r/writingprompt post.

One thing I did notice is that it really, really doesn't like going off script. Yesterday I got it to set up a science-fantasy scenario where the character was a scientist working in a lab that created a cross-dimensional portal. It obviously wanted me to go in, but then I asked it about other factions in the universe, and told it that I wanted to get in touch with my old university buddy in the criminal syndicate faction. It would just not let me do so, even after 6 different attempts. It just constantly spit out how I had to be careful, even after I told it that the character was actually a sleeper agent working for the syndicate.

Thinking on it now, I probably should have prompted it with an actual scenario where I was talking to someone in that faction, but unfortunately the thread is gone.

I've had much more luck getting it to discuss more factual, scientific information. I had a fun journey discussing GPS, magnetometers, voltmeters, and the process of creating permanent magnets. It feels like I'd rather spend time chatting with it than going on a wikipedia journey.

With a bit of prompting it also did a pretty good job of deriving a set of features and milestones for a platform similar to one that I worked on a while back. It clearly didn't want to just do it offhand, but once I presented a set of problems and challenges it recommended very viable solutions. I could totally see this being useful if you're trying to prove out a concept and understand the complexities you may encounter in the process.

Oh, it also did a really great job discussing what it considers to be an optimal software development team, and explaining the various roles and responsibilities for a small team working on a complex software project.