Check out my painting recommender project here

Building a full-stack painting recommender app turned out to be more challenging and more rewarding than I initially expected. Here's how I improved my web app.

Web App Idea

TLDR: Tinder for paintings

Walking through a local museum to look at paintings can be fun, but you get tired and want to sit down after a few hours, and you are limited to the art museums near you. So I decided to build a web app that displays one random painting at a time from a dataset of ~37,000 paintings. The user can either like the painting or move on to the next one. As they like paintings, they may be shown paintings similar to the ones they've liked.

The Initial Design

My first version of the paintings recommender was quite basic:

Used one central MongoDB database to store liked paintings
Limited dataset of ~4,000 paintings (some of low quality) from 128 artists
Image fetching was too slow (100 ms to 3 s)
Intended for local deployment, with no user sessions to differentiate user activity

Fetching paintings

It turns out many art APIs return data too slowly for the requirements of this app. Although it is possible to preload a few images in the background, the recommendation would be difficult to compute in time if a user clicks "Like" and then immediately clicks "Next". Storing the 37,000 paintings in an S3 bucket with transfer acceleration (https://aws.amazon.com/s3/transfer-acceleration/) fixed this latency problem.
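
As a rough illustration, objects in a bucket with Transfer Acceleration enabled are reached through the `s3-accelerate` endpoint rather than the regional one. The sketch below only builds that URL. The bucket name and key layout are hypothetical, and it assumes the objects are publicly readable (otherwise a presigned URL would be needed).

```python
from urllib.parse import quote

# Hypothetical bucket name; the real one is not given in this post.
BUCKET = "example-paintings-bucket"


def accelerated_url(key: str, bucket: str = BUCKET) -> str:
    """Build the Transfer Acceleration URL for an object key."""
    # quote() percent-encodes spaces and other unsafe characters but keeps "/".
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"
```

The frontend could then point an image tag at `accelerated_url("paintings/some-id.jpg")` instead of calling a third-party art API on every click.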

Repurposing Usage of MongoDB

The initial design only saved liked paintings to the database. Adding user sessions on top of that would create a scalability issue: with every user's likes stored in one database, retrieving the rows for a single user can become slow as the number of entries grows.

I decided to repurpose MongoDB as a painting store. Each entry contains some metadata about the painting and a reference to the image's S3 object.
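
To make that concrete, one entry might look roughly like the sketch below. The field names are my own placeholders rather than the actual schema; the point is only that the document holds metadata plus an S3 key, not the image bytes.

```python
from typing import TypedDict


class PaintingDoc(TypedDict):
    """Assumed shape of one painting entry; all field names are hypothetical."""
    _id: str      # primary key of the painting
    title: str
    artist: str
    year: int
    s3_key: str   # location of the image in S3; the bytes stay out of MongoDB


def make_painting_doc(painting_id: str, title: str, artist: str,
                      year: int, s3_key: str) -> PaintingDoc:
    """Assemble a document ready to be inserted into the painting collection."""
    return {
        "_id": painting_id,
        "title": title,
        "artist": artist,
        "year": year,
        "s3_key": s3_key,
    }
```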

Adding Redis Cache

To store each user's liked paintings, the app now creates a unique user id, stored in localStorage, and uses it as a Redis key that maps to a set. The set contains the ids of all paintings liked by that user; each id is the painting's primary key in the database. The TTL for each Redis entry is set to 1 week.
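
The like-tracking logic can be sketched as follows. This assumes a redis-py style client (its `sadd`, `expire`, and `smembers` methods) and a hypothetical `likes:<user id>` key naming scheme. It also assumes the one-week TTL is refreshed on every like, which the text above does not specify. The id is generated here in Python only as a stand-in for what the browser would do.

```python
import uuid

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60  # TTL from the text: 1 week


def new_user_id() -> str:
    """Stand-in for the id the browser would generate and keep in localStorage."""
    return uuid.uuid4().hex


def likes_key(user_id: str) -> str:
    # Hypothetical key naming scheme.
    return f"likes:{user_id}"


def record_like(client, user_id: str, painting_id: str) -> None:
    """Add a painting id to the user's set and (re)set the one-week TTL."""
    key = likes_key(user_id)
    client.sadd(key, painting_id)         # a set ignores duplicate likes
    client.expire(key, ONE_WEEK_SECONDS)  # assumption: TTL resets on every like


def liked_ids(client, user_id: str) -> set:
    """Return the ids of all paintings this user has liked."""
    return set(client.smembers(likes_key(user_id)))
```

With redis-py installed, `client` could be `redis.Redis(decode_responses=True)`. Using a set means liking the same painting twice costs nothing extra.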

Creating the recommendation system

As a user likes paintings, the next painting shown will be either a random one or one similar to those they've liked. Similar paintings are found by querying a vector database. I opted for ChromaDB because it is very low cost and gets the job done. Since each painting came with some metadata, I embedded every painting's metadata into 1536 dimensions using text-embedding-3-small from OpenAI, then used cosine similarity to find similar vectors for the recommendation.
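
To show what the vector database is doing conceptually, here is a minimal pure-Python sketch of cosine similarity and a "random or similar" picker. It is not the ChromaDB API: the in-memory `embeddings` dict, the function names, and the `p_similar` mixing probability are all assumptions made for illustration, and the vectors are assumed to be non-zero.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(liked_id, embeddings, exclude=(), k=1):
    """Return the k painting ids closest to liked_id by cosine similarity."""
    query = embeddings[liked_id]
    scored = [
        (cosine_similarity(query, vec), pid)
        for pid, vec in embeddings.items()
        if pid != liked_id and pid not in exclude
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:k]]


def next_painting(liked, embeddings, p_similar=0.5, rng=random):
    """Pick a painting similar to a liked one with probability p_similar, else a random one."""
    if liked and rng.random() < p_similar:
        seed = rng.choice(sorted(liked))
        similar = most_similar(seed, embeddings, exclude=liked)
        if similar:
            return similar[0]
    return rng.choice(sorted(embeddings))
```

A brute-force scan like this is fine for a toy example; a vector database exists so the same query stays fast across tens of thousands of 1536-dimensional vectors.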

Adding New Functionality

With the current craze about GenAI, I also had to try adding it to this project.

Given that a user can browse and get recommended paintings, could one generate a mashup painting from their liked paintings? Perhaps view the painting they would like the most, but that hasn't been painted yet?

I gathered metadata across different paintings and, with some prompt refining, used Google's Imagen model to generate paintings.
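
One way the metadata gathering could work is sketched below: count the most common values across the liked paintings and fold them into a text prompt. The `style` and `subject` fields and the prompt wording are hypothetical, not the prompt I actually ended up with, and the call to Imagen itself is left out.

```python
from collections import Counter


def build_mashup_prompt(liked_paintings, top_n=2):
    """Combine the most common metadata values into one text prompt."""

    def most_common(field):
        # Count non-empty values of this field across all liked paintings.
        counts = Counter(p[field] for p in liked_paintings if p.get(field))
        return [value for value, _ in counts.most_common(top_n)]

    styles = most_common("style")      # hypothetical metadata field
    subjects = most_common("subject")  # hypothetical metadata field

    parts = ["An original painting"]
    if styles:
        parts.append("in the style of " + " and ".join(styles))
    if subjects:
        parts.append("depicting " + " and ".join(subjects))
    return " ".join(parts)
```

The resulting string would then be sent to the image model as the generation prompt.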

Deployment

To bring the application to life, I chose a combination of AWS services for a scalable and cost-effective setup.

The core backend, which includes the Node.js API server and the Python-based ChromaDB recommendation service, is hosted on an AWS EC2 instance. This provides the persistent compute power needed for the main application logic.

For the AI image generation feature, I opted for a serverless approach. The logic that calls Google's Imagen API is encapsulated in an AWS Lambda function. This function is triggered via an AWS API Gateway endpoint, meaning it only runs when a user requests a new painting, so nothing sits idle between requests.
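
The shape of such a function is sketched below, assuming a Python runtime and API Gateway's Lambda proxy integration (the request body arrives as a JSON string in `event["body"]`, and the response is a dict with `statusCode` and `body`). The `generate_image` stub and the `prompt` and `image_url` field names are hypothetical; the real Imagen call is not shown.

```python
import json


def generate_image(prompt: str) -> str:
    """Placeholder for the Imagen API call; returns a fake image URL here."""
    # Hypothetical stub: the real call and its response format are not shown.
    return "https://example.com/generated.png"


def handler(event, context):
    """Entry point that Lambda invokes for each API Gateway proxy request."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}

    prompt = body.get("prompt") if isinstance(body, dict) else None
    if not prompt:
        return {"statusCode": 400, "body": json.dumps({"error": "prompt is required"})}

    image_url = generate_image(prompt)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"image_url": image_url}),
    }
```

Keeping the handler this thin means the EC2-hosted backend never has to wait on image generation itself; the browser can call the API Gateway endpoint directly.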

System Diagram of Application