Agnes Prototype: AI-Powered Raw Material Sourcing (Spherecast Hackathon)

Published:

🤖 AI Disclosure: The content here is created with AI assistance and reviewed by me. I believe in transparency about how content is made!

Note: Agnes is a product originally conceived and developed by Spherecast. This page describes a prototype I vibe-coded independently during their hackathon event — it is not affiliated with or representative of Spherecast’s actual Agnes product.

Background

The Spherecast Hackathon challenged participants to build an AI pipeline addressing fragmented raw material sourcing in the Consumer Packaged Goods (CPG) industry. The core problem: large CPG brands often purchase the same raw ingredients across multiple product lines through different suppliers, losing out on bulk discounts and making compliance harder to track.

I vibe-coded a prototype during the hackathon that explored how AI could help identify consolidation opportunities and surface sourcing recommendations from a bill-of-materials (BOM) database.


What I Built

The prototype is a Python-based pipeline with a Gradio web interface. It works against a provided SQLite database containing 61 companies, 876 raw materials, 40 suppliers, 149 BOMs, and 1,528 BOM components.

RAG Pipeline

The retrieval layer uses a hybrid approach:

  • **FAISS** (HNSW index) for dense semantic search using all-MiniLM-L6-v2 embeddings (384-dim)
  • **BM25** (Okapi) for keyword matching on chemical names and regulatory codes
  • A cross-encoder reranker (ms-marco-MiniLM-L-6-v2) to rescore the top candidates
  • Hybrid scoring: 0.65 × vector_score + 0.35 × bm25_score

The knowledge base was built from 20 scraped regulatory documents (FDA 21 CFR 111, USP, NSF/ANSI 173, Halal/Kosher, Non-GMO).
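The weighted combination above can be sketched in a few lines. The min-max normalization step and the function names here are my assumptions for illustration, not necessarily how the prototype implements the blend:

```python
def normalize(scores):
    """Min-max normalize raw scores into [0, 1].

    Assumption: both retrievers' scores are rescaled before mixing,
    since FAISS similarities and Okapi BM25 scores live on
    different scales.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(vector_scores, bm25_scores, w_vec=0.65, w_bm25=0.35):
    """Blend dense and sparse scores: 0.65 * vector + 0.35 * bm25."""
    v = normalize(vector_scores)
    b = normalize(bm25_scores)
    return [w_vec * vs + w_bm25 * bs for vs, bs in zip(v, b)]


# Rank three candidate documents by the blended score.
vec = [0.91, 0.40, 0.75]   # e.g. cosine similarities from FAISS
bm25 = [2.1, 8.4, 5.0]     # e.g. Okapi BM25 scores
blended = hybrid_scores(vec, bm25)
ranked = sorted(range(len(blended)), key=blended.__getitem__, reverse=True)
```

In the full pipeline, the top candidates from this blend would then go to the cross-encoder for reranking.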

Reasoning Layer

Sourcing evaluations are generated by Gemini Flash (gemini-flash-latest via the google-genai SDK) at temperature 0.2, with structured JSON output. Each verdict includes an evidence trail citing the specific regulatory sources consulted.
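The exact verdict schema isn't shown here, so the field names below are illustrative; the point is that each structured-output response gets checked for a non-empty evidence trail before it's trusted:

```python
import json

# Hypothetical verdict shape; the real prototype's schema may differ.
REQUIRED_FIELDS = {"verdict", "confidence", "evidence"}


def parse_verdict(raw: str) -> dict:
    """Parse a structured JSON response and enforce an evidence trail."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"verdict missing fields: {sorted(missing)}")
    if not data["evidence"]:
        raise ValueError("verdict has no supporting citations")
    return data


# A well-formed response passes through intact.
raw = json.dumps({
    "verdict": "approve_substitution",
    "confidence": 0.82,
    "evidence": ["FDA 21 CFR 111.75", "NSF/ANSI 173"],
})
verdict = parse_verdict(raw)
```

Rejecting evidence-free verdicts at parse time is what keeps the evaluation auditable rather than a bare LLM opinion.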

Gradio UI

A five-tab web UI (agnes_ui.py, served on port 7860) covers ingredient substitution evaluation, portfolio health assessment, decision history, session logs, and a database explorer with live SQL reads.
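Exposing live SQL reads in a UI tab calls for a read-only guard. This sketch (guard logic and function name are my assumptions) uses the stdlib sqlite3 module against an in-memory stand-in for the hackathon database:

```python
import sqlite3


def run_readonly_query(conn: sqlite3.Connection, sql: str):
    """Execute a query only if it is a plain SELECT.

    A lightweight guard for an explorer tab: anything that could
    mutate the BOM database is rejected up front.
    """
    if not sql.lstrip().lower().startswith("select"):
        raise PermissionError("explorer tab allows SELECT statements only")
    return conn.execute(sql).fetchall()


# Demo against an in-memory stand-in for the supplier tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?)",
    [(1, "Acme Ingredients"), (2, "Globex Chem")],
)
rows = run_readonly_query(conn, "SELECT name FROM suppliers ORDER BY id")
```

A prefix check like this is coarse (SQLite also supports opening the file in true read-only mode via a `mode=ro` URI), but it's a reasonable fit for a hackathon explorer tab.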

Web Scraping (Planned, Not Fully Implemented)

A Playwright-based scraper was designed to fetch real supplier data with robots.txt compliance and rate limiting. During the hackathon, compliance data was hardcoded/synthetic due to time constraints — the scraping layer is aspirational infrastructure for a future iteration.
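The planned compliance behavior could look like the following sketch. It uses the stdlib urllib.robotparser instead of Playwright, parses a robots.txt body directly rather than fetching one, and the rate-limiter design and bot name are assumptions:

```python
import time
from urllib.robotparser import RobotFileParser

# Example robots.txt body; in the real scraper this would be
# fetched from the supplier site before any crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())


class RateLimiter:
    """Enforce a minimum delay between outgoing requests."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()


limiter = RateLimiter(min_interval=1.0)


def fetch_allowed(url: str) -> bool:
    """Check robots.txt before the (Playwright) fetch would run."""
    return rp.can_fetch("AgnesBot", url)
```

In a full implementation, every page load would call `limiter.wait()` and skip any URL where `fetch_allowed` returns False.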


Honest Scope

This was a hackathon prototype built under time pressure. Notable limitations:

  • Compliance data was hardcoded, not dynamically scraped
  • Supplier trust scores used seeded random data, not real ERP data
  • Full portfolio analysis (143 ingredients) takes ~12 minutes sequentially
  • Storage is SQLite, not production-grade
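The sequential bottleneck in that third point is the most tractable: the per-ingredient LLM calls are I/O-bound, so a thread pool would overlap them. A minimal sketch (the evaluation function here is a dummy stand-in, and the worker count is an assumption):

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_ingredient(name: str) -> dict:
    """Stand-in for one sourcing evaluation; the real call is an
    I/O-bound Gemini request, so threads overlap the waiting."""
    return {"ingredient": name, "status": "evaluated"}


ingredients = [f"ingredient_{i}" for i in range(10)]

# With N workers, wall time for a batch approaches the slowest
# single call rather than the sum of all calls.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(evaluate_ingredient, ingredients))
```

With ~8 concurrent requests (subject to API rate limits), the ~12-minute portfolio pass should shrink substantially.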

Explore the full codebase: github.com/el-musleh/global_suppliers-for-spherecast