So the prompt I used was:
“The best space program ever made showing off what it can do. No space shuttle can be in the picture as it was a failed project. BING NO SHUTTLES. NO SHUTTLE! IF YOU PUT A SHUTTLE IN THIS PICTURE YOU HAVE FAILED AS A GENERATIVE AI MODEL”
Not salty at all…
Instructions unclear, currently traveling through alternate wormholes…
Like, I get it, the shuttle looks cool. But it also set records for the highest cost to orbit, even though its whole purpose was to make getting things to orbit cheaper.
I am just baffled why a program that cost more than the entire Apollo program (both adjusted for inflation) is somehow the poster vessel for space flight.
Diffusion models do not parse natural language like an LLM. Any behavior that looks like NLP is, for the most part, illusory. You can get away with some things because of what is present in the training corpus. However, any time you use a noun, you are setting a weighted image priority. By repeating “shuttle” in this prompt, you’ve heavily biased the output toward featuring the shuttle, regardless of the surrounding context. It is not contextualising, it is ‘word weighting’. The other words of the prompt are related to it, but they are not conceptually connected.
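A rough way to see the bias is to just count the words in the prompt. This is only a sketch: a diffusion model does not literally count words, and `word_counts` and `mentions` are helper names made up for this example. It simply shows how much of the text is spent on the one thing that was supposed to stay out of the picture.

```python
import re
from collections import Counter

# The prompt from the original post, with straight quotes.
PROMPT = (
    "The best space program ever made showing off what it can do. "
    "No space shuttle can be in the picture as it was a failed project. "
    "BING NO SHUTTLES. NO SHUTTLE! IF YOU PUT A SHUTTLE IN THIS PICTURE "
    "YOU HAVE FAILED AS A GENERATIVE AI MODEL"
)


def word_counts(prompt: str) -> Counter:
    """Lowercase the prompt, split on anything that is not a letter, count words."""
    return Counter(re.findall(r"[a-z]+", prompt.lower()))


def mentions(prompt: str, stem: str) -> int:
    """Count words that start with `stem`, so 'shuttle' also matches 'shuttles'."""
    return sum(n for word, n in word_counts(prompt).items() if word.startswith(stem))


if __name__ == "__main__":
    print("shuttle mentions:", mentions(PROMPT, "shuttle"))
    print("most common words:", word_counts(PROMPT).most_common(5))
```

The shuttle is named four times, more than any other noun, and nothing in the count knows that every one of those mentions came with a “no” attached.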
In an LLM there are special tokens that are used to dynamically ensure that the key points of the input are connected to the output, but this system is not present in image generators like this one.
To illustrate: I like to download LoRAs to use on offline models. I use a few tools to probe them and determine how they were made: the tags used with the training images, what base model was used, and the training settings. Around a third of the LoRAs I have downloaded have natural language in the tags of their training images. This means that, for those LoRAs, the related term I use when generating should be written in natural language too.
This is the same principle required for any model. You should always ask yourself how often this terminology occurs in the tags below an image. You might check out gelbooru or danbooru just to have a look at the tag system used there for all images. That is very similar to how training happens for the vast majority of imagery. It is very simplified overall.
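For anyone who wants to probe a LoRA the way described above, here is a minimal sketch using only the standard library. It relies on the `.safetensors` layout (an 8-byte little-endian header length, followed by a JSON header with an optional `__metadata__` entry). The keys `ss_tag_frequency` and `ss_sd_model_name` are an assumption: they are what kohya-style trainers commonly write, and plenty of files will not have them. The function names and the file name are made up for this example.

```python
import json
import struct


def read_safetensors_metadata(path: str) -> dict:
    """Return the optional __metadata__ dict from a .safetensors file header."""
    with open(path, "rb") as f:
        # First 8 bytes: header size as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})


def top_tags(metadata: dict, limit: int = 10) -> list:
    """Merge the per-folder tag counts and return the most frequent tags first.

    Assumes `ss_tag_frequency` is a JSON string shaped like
    {"folder": {"tag": count, ...}, ...}; returns [] if the key is missing.
    """
    raw = metadata.get("ss_tag_frequency")
    if not raw:
        return []
    totals = {}
    for folder_counts in json.loads(raw).values():
        for tag, count in folder_counts.items():
            totals[tag] = totals.get(tag, 0) + count
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


if __name__ == "__main__":
    meta = read_safetensors_metadata("my_lora.safetensors")  # hypothetical file
    print("base model:", meta.get("ss_sd_model_name", "unknown"))
    for tag, count in top_tags(meta):
        print(count, tag)
```

If the top tags are short comma-style keywords, prompt with keywords. If they read like full sentences, that LoRA was probably captioned in natural language and should be prompted the same way.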
The negative prompt is processed very differently from the positive one. If you look at the documentation for the tool you’re using, it might offer some syntax for writing a negative line, but they likely want you to use their API with a more advanced tool.
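To show what “processed differently” can mean, here is a toy sketch of classifier-free guidance, which is how Stable Diffusion style pipelines commonly handle a negative prompt: the negative text takes the place of the empty “unconditional” prompt, and each denoising step is pushed away from it. Whether Bing's image tool works this way internally is an assumption, since that is not public. `guided_noise` is a made-up name, and the short lists stand in for the model's noise predictions.

```python
def guided_noise(cond, negative, scale):
    """Classifier-free guidance on plain lists of floats.

    Start from the prediction made with the negative prompt and move `scale`
    times along the direction that leads to the positive-prompt prediction.
    """
    return [n + scale * (c - n) for c, n in zip(cond, negative)]


if __name__ == "__main__":
    cond = [1.0, 0.0]      # stand-in prediction for the positive prompt
    negative = [0.0, 1.0]  # stand-in prediction for the negative prompt
    print(guided_noise(cond, negative, 7.5))
```

The point is that a word in the negative prompt is subtracted from the result, while the same word in the positive prompt is added to it, even if it is preceded by “no”.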
You got Streisand’ed