Code: https://editor.p5js.org/Janeliu13/sketches/s0pXjNGKr
Demo:
https://editor.p5js.org/Janeliu13/full/s0pXjNGKr
For this week’s assignment, I built an interactive doodle story generator inspired by one of the examples shown in class, in which speech was turned into doodles on the screen to create a visual story. In my sketch, users draw in four panels (Panel 1: an animal, Panel 2: a setting, Panel 3: food, Panel 4: an object). Each doodle is classified by a pre-trained CNN (DoodleNet), and the sketch then generates a short story from the predicted words.
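A minimal sketch of the story step, assuming each panel’s top prediction is stored under a slot name. The template wording and the slot names (animal, setting, food, object) are placeholders made up for illustration, not the text of the linked sketch. The function keeps every filled-in word as its own segment with a highlight flag, so the classified words can later be drawn in a different style.

```javascript
// Hypothetical story template; the real sketch's wording is not shown in this post.
// Each {slot} is replaced by the label predicted for the matching panel.
const STORY_TEMPLATE =
  "Once upon a time, there was a little {animal} who lived near the {setting}. " +
  "It was hungry, so it went looking for some {food}. " +
  "Along the way, it found the {object} it had always wanted.";

// Split the template into segments so classified words can be styled separately.
function buildStory(template, predictions) {
  const segments = [];
  const slotPattern = /\{(\w+)\}/g;
  let lastIndex = 0;
  let match;
  while ((match = slotPattern.exec(template)) !== null) {
    // Plain text between the previous slot and this one.
    if (match.index > lastIndex) {
      segments.push({ text: template.slice(lastIndex, match.index), highlight: false });
    }
    // Assumed fallback for a panel that has not been classified yet.
    const word = predictions[match[1]] ?? "something";
    segments.push({ text: word, highlight: true });
    lastIndex = slotPattern.lastIndex;
  }
  // Any trailing text after the last slot.
  if (lastIndex < template.length) {
    segments.push({ text: template.slice(lastIndex), highlight: false });
  }
  return segments;
}

// Usage: mark highlighted words with brackets to check the result in a console.
const story = buildStory(STORY_TEMPLATE, {
  animal: "cat",
  setting: "beach",
  food: "birthday cake",
  object: "umbrella",
});
console.log(story.map((s) => (s.highlight ? `[${s.text}]` : s.text)).join(""));
```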
One issue I came across was that DoodleNet was classifying all four panels simultaneously at the start, and the classifier kept running even after I switched to a different panel to draw on. To fix this, I set it up so that the current classification stops before a new one starts.
Another, smaller issue was deciding how to show the story once the doodles are classified. My first approach was to simply dump the text at the bottom of the canvas, but the text had no visual hierarchy to show which parts of the story the user contributed (the specific animal, setting, food, and object). So I wrote a function that applies a slightly darker color and bold lettering to only the classified words. This was more complex than I anticipated, and I debugged it with Claude. Because each p5.js text() call draws its whole string in a single style, I had to manually track the current X and Y position for each word, calculate text widths, and handle line breaks.
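The fix for the simultaneous classification can be sketched as a small controller that stops the running loop before starting one for the newly selected panel. The classifier object here is hypothetical: start() and stop() stand in for whatever the real sketch calls (with ml5.js 1.x, roughly classifyStart() and classifyStop()). The check that ignores late results from a panel the user has already left is an extra safeguard added for illustration, not something described above.

```javascript
// Minimal sketch: keep at most one classification loop running at a time.
// `classifier` is an assumed interface with start(input, callback) and stop().
class ActivePanelClassifier {
  constructor(classifier) {
    this.classifier = classifier;
    this.activePanelId = null;
    this.generation = 0; // incremented on every switch so late results can be detected
    this.latest = {};    // most recent top result, keyed by panel id
  }

  switchTo(panel) {
    // Stop whatever loop is running before starting the next one.
    this.classifier.stop();
    this.activePanelId = panel.id;
    const generation = ++this.generation;
    this.classifier.start(panel.canvas, (results) => {
      // Drop results from a loop that belongs to a panel the user has left.
      if (generation !== this.generation) return;
      this.latest[panel.id] = results[0];
    });
  }
}

// Usage with a fake classifier that only records what was requested.
const log = [];
const fakeClassifier = {
  start(input, callback) { log.push(`start ${input}`); },
  stop() { log.push("stop"); },
};
const controller = new ActivePanelClassifier(fakeClassifier);
controller.switchTo({ id: "animal", canvas: "panel1" });
controller.switchTo({ id: "setting", canvas: "panel2" });
console.log(log.join(", "));
```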
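The word-by-word layout described above can be sketched as a plain function that walks through the story, measures each word, advances an X position, and moves Y down one line when the next word would overflow the box. This is an illustration under assumptions, not the code from the linked sketch: measure() stands in for p5’s textWidth(), and the segment format ({ text, highlight }) is hypothetical. Keeping layout separate from drawing means it can be checked without a canvas.

```javascript
// Minimal word-wrap layout for mixed regular/highlighted text.
// `measure(text, bold)` stands in for p5's textWidth(); in p5 you would call
// textStyle(BOLD) or textStyle(NORMAL) before measuring, since bold text is wider.
function layoutWords(segments, measure, maxWidth, lineHeight) {
  const placed = [];
  const spaceWidth = measure(" ", false);
  let x = 0;
  let y = 0;
  let pendingSpace = false; // was there whitespace before the next word?
  for (const seg of segments) {
    for (const token of seg.text.match(/\s+|\S+/g) ?? []) {
      if (/^\s/.test(token)) {
        pendingSpace = true;
        continue;
      }
      const w = measure(token, seg.highlight);
      const gap = pendingSpace && x > 0 ? spaceWidth : 0;
      if (x > 0 && x + gap + w > maxWidth) {
        // Line break: move down one line and restart at the left edge.
        x = 0;
        y += lineHeight;
      } else {
        x += gap;
      }
      placed.push({ text: token, x, y, highlight: seg.highlight });
      x += w;
      pendingSpace = false;
    }
  }
  return placed;
}

// Usage with a fake monospace font (10 px per character) and a 100 px wide box.
const placed = layoutWords(
  [
    { text: "Hello ", highlight: false },
    { text: "cat", highlight: true },
    { text: ". Bye now", highlight: false },
  ],
  (text) => text.length * 10,
  100,
  20,
);
console.log(placed);

// In a p5 draw() loop the result could be rendered roughly like this
// (`story`, the margins, and the fill values are arbitrary placeholders):
// for (const p of layoutWords(story, (t, bold) => { textStyle(bold ? BOLD : NORMAL); return textWidth(t); }, width - 40, 24)) {
//   textStyle(p.highlight ? BOLD : NORMAL);
//   fill(p.highlight ? 20 : 90); // darker fill for classified words
//   text(p.text, 20 + p.x, 400 + p.y);
// }
```

Punctuation that directly follows a highlighted word (such as the period after “cat”) stays attached because no space is inserted when there was no whitespace in the source text.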
Overall, working with DoodleNet was a lot of fun. I could see the model recognize rough drawings in real time, which really shows the power of convolutional layers to extract important features from pixel data. For example, when I drew candles with dots of fire on top of a cake shape, the model immediately classified it as a birthday cake rather than a regular cake. Watching the confidence score fluctuate was also interesting, because sometimes adding even the smallest detail, such as dots on a strawberry or whiskers on a cat, made the confidence score go up. However, I noticed that there are a few specific words it keeps guessing no matter what I draw, such as “Cactus,” “Beach,” “Animal Migration,” and “Rain.” Sometimes, as I made my doodles more elaborate, the classifier’s predictions actually became less accurate. Maybe too many overlapping features confuse the network.
Resource: Doodle Classification with DoodleNet (my sketch is built on top of this example)
Here are some of the possible things to draw in each panel of my doodle story game:
Animals: ant, bear, bee, bird, butterfly, camel, cat, cow, crab, crocodile, dog, dolphin, dragon, duck, elephant, flamingo, fish, frog, giraffe, hedgehog, horse, kangaroo, lion, lobster, monkey, mosquito, mouse, octopus, owl, panda, parrot, penguin, pig, rabbit, raccoon, rhinoceros, scorpion, sea turtle, shark, sheep, snail, snake, spider, squirrel, swan, tiger, whale, zebra
Settings: beach, bridge, campsite, castle, church, fence, garden, hospital, house, lighthouse, mountain, ocean, park, pond, river, sea, lake (via ocean), barn, tent, tree (like a forest), bush, grass, cloud (sky), sun (outdoors)
Food: apple, banana, bread, broccoli, cake, carrot, cookie, donut, grapes, hamburger, hot dog, ice cream, lollipop, mushroom, onion, peanut, pear, pizza, potato, sandwich, steak, strawberry, watermelon, birthday cake
Objects: ball, basketball, baseball, soccer ball, beach ball, skateboard, bicycle, book, boomerang, bucket, calendar, camera, candle, car, clock, compass, computer, cup, drums, eyeglasses, fan, flashlight, guitar, hammer, hat, key, laptop, ladder, microphone, paintbrush, parachute, pencil, pillow, roller skates, scissors, shoe, shovel, telephone, television, toothbrush, trumpet, umbrella, violin, wheel, yoga
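One possible way to use these lists (an assumption, not something the sketch is described as doing) is to filter DoodleNet’s ranked guesses per panel, so that an off-topic guess like “Cactus” is skipped in favor of the best label that fits the panel’s category. The allow-lists below are shortened, hypothetical excerpts of the lists above, the results are assumed to arrive as { label, confidence } objects sorted best first, and the label normalization hedges against not knowing whether multi-word labels use spaces or underscores.

```javascript
// Hypothetical per-panel allow-lists, shortened from the category lists above.
const PANEL_LABELS = {
  animal: new Set(["cat", "dog", "owl", "penguin", "shark"]),
  setting: new Set(["beach", "castle", "campsite", "mountain", "ocean"]),
  food: new Set(["birthday cake", "cake", "pizza", "strawberry", "watermelon"]),
  object: new Set(["umbrella", "guitar", "book", "candle", "key"]),
};

// Accept either "birthday_cake" or "Birthday Cake" style labels.
function normalizeLabel(label) {
  return label.replace(/_/g, " ").toLowerCase();
}

// Return the highest-ranked result that belongs to the panel's category,
// or null if none qualifies (optionally requiring a minimum confidence).
function bestLabelForPanel(results, panel, minConfidence = 0) {
  const allowed = PANEL_LABELS[panel];
  if (!allowed) throw new Error(`Unknown panel: ${panel}`);
  for (const { label, confidence } of results) {
    const name = normalizeLabel(label);
    if (allowed.has(name) && confidence >= minConfidence) {
      return { label: name, confidence };
    }
  }
  return null;
}

// Usage with made-up scores for a food-panel doodle.
const doodleResults = [
  { label: "cactus", confidence: 0.41 },
  { label: "birthday_cake", confidence: 0.33 },
  { label: "cake", confidence: 0.12 },
];
console.log(bestLabelForPanel(doodleResults, "food"));
```

Returning null leaves room for the sketch to keep its previous guess, or to prompt the user to keep drawing, when nothing in the category is confident enough.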