Manager: @Brett Butterfield

Execution: 42 Hackathon - AI Track: Computer Vision Execution

Computer Vision Tool - Wiki Summary

We will explore several Computer Vision (CV) topics, including object detection and recognition in photos and videos, and build a functional photo search engine that covers everything from image embedding to serving and retrieval from a Chroma vector store.
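To make the embed, store, and retrieve flow concrete, here is a minimal sketch of nearest-neighbour search over image embeddings. It is an illustration only: `ToyVectorStore` is a hypothetical in-memory stand-in and not the Chroma API, the three-number embeddings are made up, and a real embedding model would produce much longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store; a real vector DB replaces this class."""

    def __init__(self):
        self._items = {}

    def add(self, photo_id, embedding):
        # Store one embedding per photo id.
        self._items[photo_id] = list(embedding)

    def query(self, embedding, n_results=3):
        # Score every stored photo against the query, best match first.
        scored = [
            (cosine_similarity(embedding, vector), photo_id)
            for photo_id, vector in self._items.items()
        ]
        scored.sort(reverse=True)
        return [photo_id for _, photo_id in scored[:n_results]]


# Made-up embeddings, standing in for the output of an image model.
store = ToyVectorStore()
store.add("beach.jpg", [0.9, 0.1, 0.0])
store.add("forest.jpg", [0.1, 0.9, 0.1])
store.add("city.jpg", [0.0, 0.2, 0.9])

# The query vector would normally come from embedding a search image or text.
top_matches = store.query([0.8, 0.2, 0.1], n_results=2)
print(top_matches)
```

A vector store such as Chroma takes over the storage and similarity search, so only the embedding step and the query call need to live in the application code.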

Description

This is a fun and challenging project that uses various open source Computer Vision (CV) AI models to identify and label media in real time so it can later be searched in a vector DB. You will gain experience fine-tuning LLMs and set up a functional framework on which to build your dream RAG or media search project.

Goal/Values

You will be able to quickly set up and interact with a number of photo and video streams that are analyzed by open source models and datasets, see how the media is interpreted by CV models, and get a summary of what the model predicts at the end.

The key value here is that you can experiment with improving this process, either by tuning the existing code/models or by introducing new models that help you get better results!


🕵️ Agent Deploy/Test/Experimentation

Results, Findings & Improvements

This is where you document findings & learnings relating to the existing features/modules. Think of it as our shared Wikipedia focused on the inner workings of this module.

Found a really cool open-source LLM on Ollama, or a YOLO model, that turns out to work really well on the evaluation data?

If you discover something valuable that you think the team should know about too, add that here as well!

Document away, and reference resources (sites, documents, dashboards) where possible!

This is what we did this week:

  1. Object recognition and labeling in real time on edge devices:

    https://github.com/brettb/cere-vision

  2. Image tagging, OCR and image search based on content and colors:

    https://github.com/brettb/cere-vision-search

  3. Uploading media to the CERE SDK for private storage and sharing (TBA)
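As an illustration of the color-based search mentioned in item 2, the sketch below compares photos by coarse color histograms. This is one common generic technique and not necessarily how cere-vision-search does it. The function names, the file names, and the pixel lists are all hypothetical; in practice the pixels would be read from image files with an imaging library.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Share of pixels falling into each coarse (r, g, b) bin."""
    if not pixels:
        return {}
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {color_bin: n / total for color_bin, n in counts.items()}


def histogram_intersection(h1, h2):
    """Overlap between two normalized histograms, from 0.0 to 1.0."""
    return sum(min(share, h2.get(color_bin, 0.0)) for color_bin, share in h1.items())


# Made-up pixel data standing in for decoded images.
library = {
    "red_photo.jpg": [(250, 10, 10)] * 8 + [(10, 10, 250)] * 2,
    "blue_photo.jpg": [(10, 10, 250)] * 9 + [(250, 10, 10)] * 1,
}
query_pixels = [(240, 20, 20)] * 10  # a mostly red query image

# Score each library photo by how much its colors overlap with the query.
query_hist = color_histogram(query_pixels)
scores = {
    name: histogram_intersection(query_hist, color_histogram(pixels))
    for name, pixels in library.items()
}
best_match = max(scores, key=scores.get)
print(best_match, scores)
```

Coarse bins make the comparison tolerant of small shade differences, at the cost of treating visibly different tones as the same color; `bins_per_channel` is the knob to experiment with.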


Quickstart guide