Some recent inspiration led me to add a new mode to the Gallery, "This from That", and to worm our house AI, Meg, into the Gallery along with it. I will write more about Meg in another post. For the purposes of this one, just understand that "she" is an AI persona based on a Large Language Model that runs on one of my servers.

This from That, like most Gallery modes, is built as a few arrays that are pulled from randomly each generation.

{% if is_state('input_boolean.gallery_mode_this_from_that', 'on') %}

Artistic photo showing a detailed view, 
{{ ( [ "bulldozer", "skyscraper", "truck", "pizza", "mouse", "tree", "forest", "planet", "table", "bear", "rescue", "toothbrush", "old chair", "modern art", "CRT television", "computer", "shrek", "pickle", "wedding cake", "three hams", "puzzle", "the family pet", "the family", "mom", "flower garden", "spaceship", "meteor", "meteorite", "meteor crater", "racetrack", "hurricane", "explosion", "biggest regrets", "greatest treasure in all the land", "one million dollars", "stapler", "the end of the world", "apocalypse", "coffee mug", "forrest gump", "park bench", "small village", "humpty dumpty", "trash", "one man's trash", "another man's treasure", "the humpty dance", "classic art", "birthday cake", "dinner", "the world", "dining room set", "bedroom furniture", "dungeon", "the floor is", "mushroom", "statue", "defensive perimeter", "fortress", "waterfall", "river", "ocean", "lake", "flag", "pond", "crater", "mountain", "house", "castle", "final resting place", "beautiful woman", "sexy man", "victory", "nicolas cage", "olympic sport", "swimsuit", "tuxedo", "tamales", "sandwich", "scenic vista", "eye bleach" ] | random ) }}

{{kodi}}, made entirely from

{{ ( [ "butter", "cheese", "chocolate", "milk", "cottage cheese", "lizards", "pickles", "slime", "boobs", "hair", "yarn", "lapis lazuli", "lava", "magma", "granite", "marble", "secret sauce", "slurm", "muck", "butterscotch", "caramel", "evil", "diamond", "ice cream", "flesh", "water", "silicon", "silicone", "layers", "onions", "flour", "grits", "oatmeal", "fluffalo", "mayonnaise", "sour cream", "cream cheese", "tomato sauce", "small pebbles", "deepest regrets", "cake", "oobleck", "ketchup", "mustard", "win", "devastation", "maple syrup", "strawberry syrup", "glue", "rat king", "clay", "wood", "ginger", "turmeric", "oil", "obsidian", "glass", "crystal", "wax", "pudding", "ducks", "numbers", "ceramic", "huts", "bones", "emerald", "tile mosaic", "money", "dough", "bread", "noodles", "mushrooms", "silly things", "yogurt", "concrete", "coal", "snow", "sherbert", "cottonballs", "ice", "fire", "vapor", "smoke", "birds", "marbles", "curves", "smiles", "fruit rollups", "leaves", "birds", "owls", "toothpaste", "sheep", "sand", "starstuff", "imagination", "fantasy", "sheer will", "tar", "sexiness", "heat", "cold", "alpaca", "feathers", "wool", "raisins", "beer", "honey", "pineapples", "mystery meat", "nicolas cage", "shaving cream", "fabric", "felt", "llamas", "purple and green" ] | random ) }}

{% endif %}

This is the whole function as it stands right now.

So essentially, if the mode is on, it adds the first line, "Artistic photo(...)", then picks randomly from the first array ("rescue", in this example), then adds the middle line, "made entirely from", and finally picks randomly from the second array ("llamas"). The {{kodi}} in the middle line is a variable that is defined only if we are watching a show or movie at that moment; if we are, the title of the show or movie goes into the prompt. This gives us the prompt, "Artistic photo showing a detailed view of, rescue, made entirely from llamas"

Bob's Prompt: "Artistic photo showing a detailed view of, rescue, made entirely from llamas"
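
For anyone who would rather read the logic outside of a template, here is a rough Python sketch of the same idea. It is not the Gallery code: the arrays are shortened stand-ins, the names this_from_that, mode_on, and rng are made up for illustration, and the whitespace and comma tidying at the end is an assumption about what it takes to end up with a single clean line.

```python
import random
import re

# Shortened stand-ins for the two arrays in the template.
THIS = ["bulldozer", "rescue", "ocean", "wedding cake"]
THAT = ["butter", "llamas", "mushrooms", "lapis lazuli"]


def this_from_that(mode_on, kodi="", rng=random):
    """Build one prompt the way the template does, or "" when the mode is off.

    mode_on stands in for the input_boolean check, kodi for the optional
    show or movie title. Both names are hypothetical.
    """
    if not mode_on:
        return ""
    pieces = [
        "Artistic photo showing a detailed view,",
        rng.choice(THIS),                  # random pick from the first array
        f"{kodi}, made entirely from",     # kodi is empty when nothing is playing
        rng.choice(THAT),                  # random pick from the second array
    ]
    text = re.sub(r"\s+", " ", " ".join(pieces)).strip()  # collapse whitespace
    return re.sub(r"\s+,", ",", text)                     # no space before commas
```

Calling this_from_that(True) returns one line in the same general shape as the prompt above, with a different pairing each time. Passing kodi="Some Title" drops the title in just before "made entirely from".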

Now this was a lot of fun, and we made some interesting images from this setup (or rather, Bob did. I just sat on my ass and giggled). However, after a couple dozen generations, I wanted something a bit more.

Enter Meg -

Meg is great to have around when you need more words. LLMs have been described as "calculators for language" - I kinda like that analogy.

I won't get into the nitty-gritty of how Meg works here, but here are some of the things I tried:

take the following description of art and restate it with more description words. add words describing the colors of things, the sizes of things, their positions relative to one another. add missing details that contextually fit within the scene. do not describe sequences or future events. do not remove or change any existing text, only add.
here is the text - {{prompt}}

This "worked" in that she produced output that Bob took and made a picture with, but the pictures seldom resembled what I was looking for with the Gallery, so I revised her instructions:

You are a visual interpreter. your job is to take a list of inferences and details, and use them to describe a single moment in time to those who cannot see it. Do not respond in prose. just list the details needed for someone to illustrate the scene. be creative and descriptive and include descriptions, colors, actions. Avoid non-visual details like motivations, sounds, or thoughts. don't respond as a numbered list. Here is your input - {{prompt}}

This worked okay as well but, again, changed the image too much. I also tried things like telling Meg to behave as a kindergarten student just learning to describe pictures, an art critic, and even an "ai art prompt engineer". None of those worked particularly well.

Then I came up with this one:

You are a helpful professional writer ai whose job is to help expand vocabulary. Take the following art description, and produce a list of adjectives and nouns that could provide more detail to the description. Respond only with a list of details. Output no other text. Do not number or format your list. Do not output JSON. Just state words. Here is your description - {{prompt}}

This actually worked very well. So well that I added a call to Meg with this prompt for just about everything Bob does around the house. (I am still working out a good way to do that with Fine Art; that will require a bit of a rewrite.) Suddenly Bob was inundated with details for images that were more or less synonyms for what the prompt was already describing. The output looks a bit like this:

Input Prompt: Artistic photo showing a detailed view, ocean, made entirely from mushrooms

Meg's Output: Detailed, Artistic, Photo, Ocean, Mushroom, Realistic, Organic, Intricate, Scenic, Imaginary, Abstract, Creative, Surreal, Fantasy, Mysterious, Realistic, Imaginative, Vivid, Expressive, Detailed, Impressive, Coastal, Maritime, Creamy, Whimsical, Mushroomy, Intriguing, Striking, Unique, Inventive, Ethereal, Bold, Intricate, Dramatic, Vibrant, Elegant, Ambitious, Nautical, Surreal, Fanciful

In the code, I glue the two together. Meg's output is just tacked onto the end of the input prompt, with a bit of regular expression cleanup for formatting.

The result, while kind of all over the place, made for a nice image:

Bob's prompt: "Artistic photo showing a detailed view, ocean, made entirely from mushrooms, Detailed, Artistic, Photo, Ocean, Mushroom, Realistic, Organic, Intricate, Scenic, Imaginary, Abstract, Creative, Surreal, Fantasy, Mysterious, Realistic, Imaginative, Vivid, Expressive, Detailed, Impressive, Coastal, Maritime, Creamy, Whimsical, Mushroomy, Intriguing, Striking, Unique, Inventive, Ethereal, Bold, Intricate, Dramatic, Vibrant, Elegant, Ambitious, Nautical, Surreal, Fanciful"
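
The gluing step could look something like the Python sketch below. This is a guess at the shape of it, not the actual Gallery code: the instruction text is shortened, the function names are made up, the cleanup rules (stripping list markers, turning newlines into commas) are assumptions about what the regular expressions need to handle, and ask_meg is a stand-in for however Meg actually gets called.

```python
import re

# Shortened version of the instruction; {{prompt}} is the placeholder from the post.
INSTRUCTION = (
    "Take the following art description, and produce a list of adjectives "
    "and nouns that could provide more detail to the description. "
    "Here is your description - {{prompt}}"
)


def build_request(prompt):
    """Drop the image prompt into the instruction text."""
    return INSTRUCTION.replace("{{prompt}}", prompt)


def clean_reply(reply):
    """Flatten a reply into "word, word, word" (hypothetical cleanup rules)."""
    # Strip list markers such as "1.", "2)", "-" or "*" at the start of a line.
    reply = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", reply, flags=re.MULTILINE)
    # Turn any run of newlines and commas into a single ", ".
    reply = re.sub(r"\s*[\n,]+\s*", ", ", reply)
    return reply.strip(" ,.")


def enrich(prompt, ask_meg):
    """Tack the cleaned reply onto the prompt.

    ask_meg is any callable that takes the request text and returns a string.
    """
    reply = clean_reply(ask_meg(build_request(prompt)))
    return f"{prompt}, {reply}" if reply else prompt
```

Keeping the original prompt first and only appending means the image still starts from the same description, and an empty or useless reply simply falls back to the untouched prompt.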

I've been pretty happy with both additions so far. Here are a few more fun images from this mode (we won't talk about the last one):