storybook

A program that uses generative models on a Raspberry Pi to create fantasy storybook pages on the Inky Impression e-ink display.

Hardware

Raspberry Pi 5 8GB. Other hardware should work, but may be slower and require simpler models.
Inky Impression 5.7". Code can be modified to support other resolutions.
SD Card. 32GB is probably the minimum. Use a bigger one to support experimenting with multiple models and installing desktop components if desired.

Image the SD card with RPi OS, then boot and update the OS
Enable I2C and SPI interfaces: sudo raspi-config
Install Ollama
Pull and serve an Ollama model. I find that Mistral and Gemma models work well: ollama run gemma:7b
Build/install XNNPACK and Onnxstream
Download a Stable Diffusion (SD) model. I find that Stable Diffusion XL Turbo 1.0 works well.
Clone this repository: git clone https://github.com/tvldz/storybook.git
Create a Python virtual environment: cd storybook && python -m venv .venv
Activate the environment: source .venv/bin/activate
Install the Inky libraries. Follow these instructions for RPi 5 compatibility: pimoroni/inky#182
Install requests and pillow: pip install requests pillow
Modify the constants (paths) at the top of main.py to match your own environment.
Execute main.py: python main.py. Execution takes ~5 minutes.
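
For orientation, the text-generation half of the pipeline can be sketched as a single HTTP request to the local Ollama server. This is not the code in main.py. It assumes Ollama is listening on its default address (http://localhost:11434), uses the /api/generate endpoint with streaming turned off, and targets the gemma:7b model pulled above. The function names are illustrative, and the sketch uses the standard library's urllib instead of requests so that it runs on its own.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address
MODEL = "gemma:7b"  # the model pulled in the steps above


def build_payload(prompt, model=MODEL):
    """Build the JSON body for a single, non-streamed generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate_text(prompt, model=MODEL, url=OLLAMA_URL, timeout=600):
    """Send the prompt to Ollama and return the generated text."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    # Generation on a Pi can take minutes, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

With the Ollama server running, generate_text("Write one short page of a fantasy storybook.") returns the model's text once generation finishes.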

Currently, the program just renders a single page at a set interval. It would certainly be possible to ask Ollama to generate multiple pages for a complete "story", and then generate illustrations for each page. The entire "story" could be saved locally and "flipped" through more rapidly than discrete page generation.
The output lacks some diversity, with many of the same characters and themes. This may be improved with a higher quality prompt, modifying the model temperature, or creating a prompt generator that randomly generates prompts from a set of themes, characters, creatures, artifacts, etc.
The current font doesn't look great on the display. Finding a better font, or perhaps rendering the page horizontally instead of rotating it, might give a better result.
Fitting the text on the screen doesn't always work, since I'm requesting that the model limit itself and naively splitting the output programmatically.
This could easily be modified to create other things like sci-fi stories, weird New Yorker cartoons, or off-brand Pokemon.
This may be thermally taxing on the RPi. Inference consumes all CPUs for many minutes, then sits idle for the set interval.
The code isn't very resilient but seems to work reliably.
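
The prompt generator suggested in the note on output diversity could look something like the sketch below. It is not part of main.py; the word lists, the prompt wording, and the function name are all made up for illustration.

```python
import random

# Illustrative word lists; swap in whatever fits the stories you want.
THEMES = ["a forgotten kingdom", "a cursed forest", "a floating city"]
CHARACTERS = ["a reluctant knight", "a young cartographer", "an exiled witch"]
CREATURES = ["a clockwork dragon", "a talking heron", "a shy troll"]
ARTIFACTS = ["a cracked crown", "a map that redraws itself", "a silver key"]


def random_prompt(rng=random):
    """Combine one random entry from each list into a story prompt."""
    return (
        "Write one short page of a fantasy storybook set in "
        f"{rng.choice(THEMES)}, about {rng.choice(CHARACTERS)} who meets "
        f"{rng.choice(CREATURES)} and finds {rng.choice(ARTIFACTS)}."
    )
```

Passing a seeded random.Random instance as rng makes the output repeatable, which helps when comparing models or temperatures on the same prompt.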
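
For the text-fitting problem noted above, one option is to enforce the limit in code instead of relying on the model to keep itself short. The sketch below uses the standard library's textwrap to wrap the text and cut it off after a fixed number of lines. The function name and the default limits are assumptions, not values taken from main.py, and counting characters is only an approximation for a proportional font.

```python
import textwrap


def fit_text(text, chars_per_line=40, max_lines=12):
    """Wrap text to the page width and truncate it to max_lines.

    If the text is too long, the last line ends with " ..." so the cut
    is visible on the page.
    """
    return textwrap.wrap(
        text, width=chars_per_line, max_lines=max_lines, placeholder=" ..."
    )
```

Each returned string is one line to draw. For a tighter fit, the character limit could be replaced with pixel measurements, for example via Pillow's ImageFont.getlength.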
