I was inspired by this comment https://news.ycombinator.com/item?id=43745615 and built a simple workflow to process all my photos: for each photo it generates a text description, a list of keywords, and the mood.
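For anyone curious what that loop looks like, here's a rough sketch. This is not the code from the repo: `scan_folder`, `find_images`, and the `describe` callback are names I made up for illustration, and `describe` stands in for whatever model call returns the description, keywords, and mood. Saving the JSON index after every image is an assumption that makes sense when each image takes 30+ seconds, since an interrupted run can pick up where it left off.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

# Assumed set of extensions; extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_images(root: Path) -> List[Path]:
    """Return all image files under root, in a stable order."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def scan_folder(
    root: Path,
    describe: Callable[[Path], Dict],
    out_file: Path,
) -> Dict[str, Dict]:
    """Run describe() on every image and keep the results in a JSON index.

    describe is a hypothetical callback that returns something like
    {"description": ..., "keywords": [...], "mood": ...}.
    """
    results: Dict[str, Dict] = {}
    if out_file.exists():
        # Resume from a previous run instead of redoing slow model calls.
        results = json.loads(out_file.read_text())

    for image in find_images(root):
        key = image.relative_to(root).as_posix()
        if key in results:
            continue  # already processed
        results[key] = describe(image)
        # Save after each image so an interrupted run loses nothing.
        out_file.write_text(json.dumps(results, indent=2))
    return results
```

Passing the model call in as a function keeps the loop testable with a stub, and lets you swap models without touching the bookkeeping.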
My plan for the next step is to detect faces, ask the user to label the most frequently occurring ones, and then label all images accordingly. This step seems a bit harder than just feeding the image through Gemini and asking it to create labels.
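The bookkeeping half of that plan is the easy part, and could be sketched like this. It assumes the hard part is already done: some face detection and clustering step (not shown, and not in the repo as far as this comment goes) has assigned an integer cluster id to every face in every image. `most_common_faces` and `label_images` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List


def most_common_faces(
    faces_per_image: Dict[str, List[int]], top_n: int
) -> List[int]:
    """Return the top_n face cluster ids, ranked by how many images they appear in."""
    counts = Counter(
        cluster_id
        for ids in faces_per_image.values()
        for cluster_id in set(ids)  # count each face once per image
    )
    return [cluster_id for cluster_id, _ in counts.most_common(top_n)]


def label_images(
    faces_per_image: Dict[str, List[int]], names: Dict[int, str]
) -> Dict[str, List[str]]:
    """Attach the user-supplied names to every image containing a named face."""
    return {
        image: sorted({names[c] for c in ids if c in names})
        for image, ids in faces_per_image.items()
    }
```

The idea would be to show the user one sample face for each id returned by `most_common_faces`, collect the names into a dict, and then call `label_images` once over the whole collection. Faces the user never named are simply left unlabeled.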
I doubt it's useful to anyone but myself at this point, but it does work on my machine (taking 30+ sec per image). Here's the git repo: https://github.com/gdoct/batchscan