Try it live: The chatbot runs on Hugging Face Spaces with a clean Gradio UI.
👉 https://huggingface.co/spaces/Sanuwar/meet-sanuwar

🚀 What This Project Does

Meet Sanuwar is a personal Q&A chatbot built entirely in Python. It converts a single Markdown profile into a searchable index and answers career questions with strict context grounding: no vector database, no heavy frameworks.

  • Reads activities.md → builds embeddings → saves data/retrieval_index.json
  • Retrieves top-k chunks with cosine similarity
  • Answers with a concise, context-only prompt
  • Logs leads and unknown questions to simple CSVs

Why it’s different: Minimal code, transparent retrieval, production-friendly guardrails, and easy deploys to Hugging Face Spaces.

🤖 Agentic AI Workflow (Pure Python)

This career chatbot is a multi-step, agentic LLM workflow implemented without frameworks. The model has narrow, purposeful autonomy: it chooses between answering from context or invoking small tools when needed.

  • Perception → Reasoning → Action loop
    • Perception: Retrieve top-k chunks from activities.md (cosine similarity).
    • Reasoning: Apply a strict, context-only prompt with synonym + timeframe logic.
    • Action:
      • If the answer is known → reply concisely.
      • If unknown → log the question to unknown_questions.csv.
      • If the user shares contact → record name/email to leads.csv.
  • Autonomy with guardrails
    • The LLM never invents facts; it answers only from the provided Context.
    • “Niceties” (hi/thanks/bye) are handled conversationally without tools.
    • Year/timeframe questions synthesize overlapping roles (e.g., 2020 transitions).
  • Why this is agentic
    • The model decides when to answer vs. when to call tools (logging, lead capture).
    • Each tool is a tiny, auditable Python function (CSV writes)—no external services.
    • The loop is transparent and easy to extend (e.g., add email alerts or a task queue).
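The answer-or-call-a-tool decision above can be sketched as a small tool registry. This is a minimal sketch that assumes OpenAI-style tool calling, where the model returns a tool name plus a JSON string of arguments. The tool names record_user_details and record_unknown_question are hypothetical. To keep the sketch self-contained, the tools append to in-memory lists rather than writing leads.csv and unknown_questions.csv.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory stand-ins for leads.csv and unknown_questions.csv.
LEADS: list[dict[str, str]] = []
UNKNOWN: list[str] = []


def record_user_details(name: str, email: str) -> dict[str, str]:
    """Tool (hypothetical name): capture a lead when the user shares contact details."""
    LEADS.append({"name": name, "email": email})
    return {"status": "recorded"}


def record_unknown_question(question: str) -> dict[str, str]:
    """Tool (hypothetical name): log a question the Context could not answer."""
    UNKNOWN.append(question)
    return {"status": "logged"}


# Registry the loop consults when the model requests a tool by name.
TOOLS: dict[str, Callable[..., dict[str, str]]] = {
    "record_user_details": record_user_details,
    "record_unknown_question": record_unknown_question,
}


def dispatch(tool_name: str, raw_arguments: str) -> str:
    """Run one model-requested tool call; arguments arrive as a JSON string."""
    func = TOOLS.get(tool_name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    args: dict[str, Any] = json.loads(raw_arguments or "{}")
    return json.dumps(func(**args))


# Example: the model decided the Context does not contain the answer.
print(dispatch("record_unknown_question", '{"question": "What is his favorite color?"}'))
# → {"status": "logged"}
```

The JSON string returned by dispatch would be sent back to the model as the tool result, so it can finish its reply. Adding a new action (for example, an email alert) means adding one function and one registry entry.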

🏗️ Architecture Overview

Index Builder

Parses headings → creates chunks → generates OpenAI embeddings → writes data/retrieval_index.json.
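A minimal sketch of the index builder, assuming activities.md uses standard Markdown "#" headings and that each chunk is one heading plus the text beneath it. In the project, the embed callable would call OpenAI's embeddings endpoint. Here toy_embed, a hypothetical hashed bag-of-words stand-in, keeps the sketch offline. The JSON layout (title, text, embedding) is also an assumption.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Callable


def split_by_headings(markdown: str) -> list[dict[str, str]]:
    """Split Markdown into chunks: each heading plus the text beneath it."""
    chunks: list[dict[str, str]] = []
    title, body = "Intro", []  # text before the first heading goes under "Intro"

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append({"title": title, "text": text})

    for line in markdown.splitlines():
        heading = re.match(r"^#{1,6}\s+(.+)", line)
        if heading:
            flush()
            title, body = heading.group(1).strip(), []
        else:
            body.append(line)
    flush()
    return chunks


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Hypothetical stand-in for an embeddings API: word counts hashed into buckets."""
    vec = [0.0] * dims
    for word in re.findall(r"\w+", text.lower()):
        vec[sum(map(ord, word)) % dims] += 1.0
    return vec


def build_index(markdown: str, embed: Callable[[str], list[float]], out_path: Path) -> list[dict]:
    """Embed each chunk (title + body) and save the whole index as JSON."""
    index = []
    for chunk in split_by_headings(markdown):
        index.append({**chunk, "embedding": embed(f"{chunk['title']}\n{chunk['text']}")})
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index


sample = "# Industry\nData scientist at Humana.\n\n# Teaching\nTaught statistics."
out = Path(tempfile.mkdtemp()) / "data" / "retrieval_index.json"
index = build_index(sample, toy_embed, out)
print([c["title"] for c in index])  # → ['Industry', 'Teaching']
```

Embedding the heading together with the body gives short sections some extra signal. Because embeddings are computed once and saved, rebuilding is only needed when activities.md changes.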

Chat Engine

Cosine search → top-k context → concise answer with strict RAG prompt (synonyms + timeframe logic).
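Without a vector database, retrieval reduces to scoring every chunk against the query embedding. The sketch below uses plain Python and assumes the index is a list of dicts with title, text and embedding keys. The tiny 3-dimensional vectors are made up for illustration, whereas real embeddings have far more dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[dict], k: int = 3) -> list[dict]:
    """Rank indexed chunks by similarity to the query and keep the best k."""
    return sorted(index, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)[:k]


def build_context(chunks: list[dict]) -> str:
    """Join the retrieved chunks into the Context block passed to the prompt."""
    return "\n\n".join(f"## {c['title']}\n{c['text']}" for c in chunks)


# Made-up toy vectors for illustration only.
index = [
    {"title": "Industry", "text": "Humana (2020–Present).", "embedding": [1.0, 0.0, 0.2]},
    {"title": "Teaching", "text": "Statistics instructor.", "embedding": [0.0, 1.0, 0.1]},
    {"title": "Research", "text": "Graduate research.", "embedding": [0.3, 0.3, 1.0]},
]
hits = top_k([0.9, 0.1, 0.1], index, k=2)
print([h["title"] for h in hits])  # → ['Industry', 'Research']
```

A linear scan is fine at the scale of a single profile (tens of chunks). A vector database only starts to pay off with far larger corpora.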

Lightweight Logging

Captures name/email to leads.csv and unknown questions to unknown_questions.csv—no database required.
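The CSV logging can be as small as one append helper built on the standard csv module. This is a sketch under stated assumptions: the column names and the UTC timestamp column are hypothetical, and the files are written to a temporary directory here instead of the project root.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_row(path: Path, header: list[str], row: list[str]) -> None:
    """Append one row, writing the header first if the file is new or empty."""
    new_file = not path.exists() or path.stat().st_size == 0
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(header)
        writer.writerow(row)


def log_lead(name: str, email: str, path: Path = Path("leads.csv")) -> None:
    """Record a contact the user chose to share (hypothetical column layout)."""
    stamp = datetime.now(timezone.utc).isoformat()
    append_row(path, ["timestamp", "name", "email"], [stamp, name, email])


def log_unknown_question(question: str, path: Path = Path("unknown_questions.csv")) -> None:
    """Record a question the Context could not answer (hypothetical column layout)."""
    stamp = datetime.now(timezone.utc).isoformat()
    append_row(path, ["timestamp", "question"], [stamp, question])


tmp = Path(tempfile.mkdtemp())
log_lead("Ada", "ada@example.com", tmp / "leads.csv")
log_unknown_question("What is his favorite color?", tmp / "unknown_questions.csv")
```

Opening in append mode with newline="" is the pattern the csv module documentation recommends. It also keeps each write independent, so no database or long-lived file handle is needed.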

⚡ Key Capabilities

🎯 Context-Only Answers (Strict RAG)

The bot is instructed to answer only from the provided context, which guards against hallucinated facts. Synonym mapping handles phrases like “professional experience” → Industry/Research/Teaching.
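Synonym mapping and the strict prompt can be sketched together. The one SYNONYMS entry mirrors the example above, and further entries would follow the same pattern. The FALLBACK sentinel and the exact prompt wording are hypothetical, not the project's actual prompt. The sketch also assumes the expanded query is what gets embedded for retrieval.

```python
# Synonym map: phrase in a question -> profile sections it should pull in.
SYNONYMS: dict[str, list[str]] = {
    "professional experience": ["Industry", "Research", "Teaching"],
}

# Hypothetical fixed reply; the loop could compare against it to decide when to log.
FALLBACK = "I don't have that information in my profile."


def expand_query(question: str) -> str:
    """Append section names for any synonym phrase found in the question."""
    q = question.lower()
    extra = [s for phrase, sections in SYNONYMS.items() if phrase in q for s in sections]
    return f"{question} ({', '.join(extra)})" if extra else question


def build_prompt(question: str, context: str) -> str:
    """Strict, context-only prompt: answer from Context or return the fallback."""
    return (
        "Answer ONLY from the Context below. Be concise.\n"
        "Treat 'professional experience' as Industry, Research, and Teaching.\n"
        "For year or timeframe questions, include every role whose dates overlap that period.\n"
        f"If the Context does not contain the answer, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


print(expand_query("Summarize his professional experience"))
# → Summarize his professional experience (Industry, Research, Teaching)
```

A fixed fallback string is easy to test for programmatically, which is one way to connect "unknown" answers to the logging step.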

```text
User: “Which company does Sanuwar work for?”
Bot: “Humana (2020–Present).”
```

Updated: