2/08/2025

RAG based AI ChatBot






In this blog post we explain how to create a RAG (Retrieval-Augmented Generation) chatbot using Python technologies.




Both diagrams above explain how RAG works behind the chatbot application.


Key Takeaways


What is an Embedding?

An embedding is a way to represent text, images, or other data as a list of numbers (a vector) so that computers can understand relationships between them. Instead of raw words or sentences, embeddings store meaning in a mathematical format that makes it easier for AI models to process and compare information.


  • In our case we use OpenAI's "text-embedding-3-small" model for data embedding.

What is OpenAI's text-embedding-3-small?

text-embedding-3-small is an embedding model released by OpenAI, optimized for creating vector representations of text.

🔹 Purpose:

  • Converts text into dense vector embeddings, making it easier to perform semantic searches, recommendations, and RAG applications.
  • Smaller and more efficient than larger models, making it great for fast and cost-effective applications.

🔹 How It Works:

  1. Input a text sentence into the model.
  2. The model converts the text into a high-dimensional vector representation (a list of numbers).
  3. These embeddings can then be stored in a vector database for fast similarity searches.
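The steps above can be sketched in a few lines of Python. This is a toy illustration, not the blog's actual code: the hand-made 3-dimensional vectors stand in for real embeddings (text-embedding-3-small actually returns 1536-dimensional vectors), and `get_embedding` shows the OpenAI v1 client call, which requires an `OPENAI_API_KEY`:

```python
import math

def get_embedding(text: str) -> list[float]:
    """The real call to text-embedding-3-small (requires OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily; pip install openai
    client = OpenAI()
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding  # a list of 1536 floats by default

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embeddings: values near 1.0 mean similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" to show the idea:
dog = [0.9, 0.1, 0.0]
puppy = [0.8, 0.2, 0.0]
car = [0.0, 0.1, 0.9]

# "dog" is closer to "puppy" than to "car" in vector space.
print(cosine_similarity(dog, puppy) > cosine_similarity(dog, car))  # True
```
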

🔹 Use Cases:

  • RAG systems (fetching relevant documents to improve AI responses).
  • Semantic search (finding related content).
  • Clustering and classification (grouping similar documents).
  • Recommendation engines (finding similar items).

What Does "Retrieval" Mean?

Retrieval means finding and fetching relevant information from a source. In AI and computing, retrieval typically refers to searching for and retrieving data from a database, a document store, or the internet to use in answering questions, making recommendations, or improving AI-generated responses.

Vector-Based Retrieval (Used in AI & RAG)

  • Instead of searching for exact keyword matches, vector retrieval finds similar meanings using embeddings (numerical representations of text).
  • Used in semantic search, chatbots, and recommendation systems.

Example:

  • You search "best Italian food" → The system retrieves restaurant reviews related to "pizza", "pasta", and "Italian cuisine", even if they don’t contain the exact words.
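The Italian-food example can be simulated with a tiny in-memory "vector database". This is a conceptual sketch with hand-made 3-dimensional vectors; in the real system the embeddings would come from text-embedding-3-small and live in Pinecone:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy "vector database": (text, embedding) pairs.
documents = [
    ("Great pizza and pasta at Luigi's", [0.9, 0.1, 0.1]),
    ("Authentic Italian cuisine downtown", [0.8, 0.3, 0.1]),
    ("Oil change and tire rotation deals", [0.1, 0.1, 0.9]),
]

def retrieve(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(documents, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

query = [0.85, 0.2, 0.1]  # pretend this is the embedding of "best Italian food"
print(retrieve(query))  # the pizza and Italian-cuisine reviews rank first
```

Note that the oil-change document is filtered out even though no keyword matching happened: the ranking is purely by vector similarity.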



What is Augmenting?

Augmenting means adding or enhancing something to improve its quality or functionality. In the context of AI, augmenting usually refers to providing additional information or modifying input data to improve results.

Augmenting Embeddings (Vector-Based AI)

  • Embeddings represent words, sentences, or images as numerical vectors.
  • Augmenting embeddings can involve combining multiple embeddings, adding contextual data, or enriching vectors to improve search and recommendation results.

Example:

  • If you’re building a semantic search engine, you can augment query embeddings with synonyms or extra context to improve search accuracy.
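One simple way to augment a query embedding, sketched below, is to average it with the embeddings of synonyms or extra context. The 3-dimensional vectors are again toy stand-ins for real embeddings:

```python
def average_vectors(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of several equal-length embedding vectors."""
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]

query_vec = [0.9, 0.0, 0.1]   # e.g. embedding of "Italian food"
synonym_vecs = [
    [0.8, 0.2, 0.0],          # e.g. embedding of "Italian cuisine"
    [0.7, 0.3, 0.0],          # e.g. embedding of "pasta dishes"
]

# The augmented vector blends the query's meaning with its synonyms,
# which can improve recall in a semantic search.
augmented = average_vectors([query_vec] + synonym_vecs)
print(augmented)
```
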


What Does "Generative" Mean?

The term "generative" refers to a system or model that can create (or generate) new content based on learned patterns. In AI, Generative AI refers to models that can generate text, images, music, code, or even videos from a given input.


Generative vs. Traditional AI

🔹 Traditional AI → Analyzes and classifies data (e.g., spam detection, fraud detection).
🔹 Generative AI → Creates new data similar to what it has learned (e.g., ChatGPT, DALL·E).

For example:
✅ A traditional AI model might classify an image as "dog" or "cat."
✅ A generative AI model can create a new image of a dog or cat that has never existed before.


  • In our case we use the "gpt-3.5-turbo" LLM to generate the final response.


Explaining Prompting with a RAG Chatbot

A RAG (Retrieval-Augmented Generation) chatbot is an AI assistant that retrieves relevant information from an external knowledge source before generating a response. This helps the chatbot provide more accurate, updated, and fact-based answers.


🔗 How Prompting Works in a RAG Chatbot

Step 1: User Sends a Prompt (Query)

  • The user asks a question.
  • Example:
    "What are the benefits of AI in healthcare?"

Step 2: Retrieval (Fetching Relevant Data)

  • The chatbot searches its document store, database, or vector embeddings for relevant text.
  • It retrieves medical research papers, articles, or previous chatbot conversations about AI in healthcare.

Step 3: Augmentation (Adding Retrieved Data to the Prompt)

  • The chatbot enhances the original prompt by including the retrieved information.
  • Example of an augmented prompt:

    "User asked: 'What are the benefits of AI in healthcare?' Retrieved knowledge: 'AI helps in early disease detection, robotic surgeries, and personalized treatments.' Generate a response based on this information."

Step 4: Generation (AI Creates the Response)

  • The chatbot uses both the prompt and retrieved data to generate a factual and informed response.
  • Example AI response:
    "AI in healthcare provides several benefits, including early disease detection, AI-assisted surgeries, and personalized treatments. For example, AI-powered imaging tools can help detect cancer at an early stage, improving patient outcomes."


**************************  How To Guide  *****************************


Enough theory! Let's do some practical stuff 😀


Steps:


1. Create a vector database index in the Pinecone vector database.

Index name: langchain-doc-index

Then save the PINECONE_API_KEY in the .env file.




2. Create an OpenAI account and save the OPENAI_API_KEY in the .env file.





3. Create a LangSmith account and save the LANGCHAIN_API_KEY in the .env file.




  • You can use a Google account to sign up for all three services.


Now the .env file should look like the one below.
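A sample .env with the three keys from the steps above (the values here are placeholders; use your own keys):

```
PINECONE_API_KEY=your-pinecone-api-key
OPENAI_API_KEY=your-openai-api-key
LANGCHAIN_API_KEY=your-langsmith-api-key
```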





4. Install Python 3.11

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.11

5. Install pip on Linux.

sudo apt install python3-pip

6. Install pipenv.

pip install --user pipenv

7. Go to the project location and run the command below to create a pipenv virtual environment.

$ pipenv --python /usr/bin/python3.11

8. Create Pipfile.lock based on requirements.txt.

$ pipenv install

9. Activate the project's virtual environment.

$ pipenv shell

10. Install dependencies.

$ pipenv install urllib3
$ pipenv install langchain_community
$ pipenv install streamlit


11. Run the ingestion module to send embedded data to the vector database index.

$ python ingestion.py

This will index the scraped data from the LangChain documentation, stored inside the folder below.
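The blog does not show ingestion.py itself, so here is a minimal sketch of what such a module can look like, assuming the scraped docs live in a local folder (the folder name "langchain-docs", the chunk sizes, and the helper names are assumptions, not the blog's actual code):

```python
# ingestion.py -- a minimal sketch; the real module may differ.

def chunk_text(text: str, size: int = 600, overlap: int = 100) -> list[str]:
    """Split long text into overlapping chunks so each fits an embedding call."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def ingest(docs_folder: str, index_name: str = "langchain-doc-index") -> None:
    # Third-party imports kept inside the function; install them with pipenv.
    from dotenv import load_dotenv
    from langchain_community.document_loaders import ReadTheDocsLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_openai import OpenAIEmbeddings
    from langchain_pinecone import PineconeVectorStore

    load_dotenv()  # reads PINECONE_API_KEY and OPENAI_API_KEY from .env
    raw_docs = ReadTheDocsLoader(docs_folder).load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=600, chunk_overlap=100)
    chunks = splitter.split_documents(raw_docs)
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    # Embed every chunk and upsert it into the Pinecone index.
    PineconeVectorStore.from_documents(chunks, embeddings, index_name=index_name)

# To run after filling .env:  python ingestion.py
# ingest("langchain-docs")  # "langchain-docs" is a hypothetical folder name
```
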






  • If you open a single record, you can see both the text and the embedded representation of the record.





12. Boot up the chatbot UI and ask a question.


$ streamlit run streamlit_app.py


Now we can ask a question.



Now you can see the chatbot give an answer based on the content we embedded into the vector database index.
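For reference, a minimal streamlit_app.py along these lines can look like the sketch below (the helper names, the retrieval prompt, and the k=4 setting are assumptions, not the blog's actual code; it needs the .env keys to run):

```python
# streamlit_app.py -- minimal sketch of the chatbot UI; the real app may differ.

def create_sources_string(sources: set[str]) -> str:
    """Format retrieved source URLs as a numbered list for display."""
    if not sources:
        return ""
    lines = [f"{i}. {src}" for i, src in enumerate(sorted(sources), start=1)]
    return "Sources:\n" + "\n".join(lines)

def main():
    # Third-party imports kept inside the function; install them with pipenv.
    import streamlit as st
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    from langchain_pinecone import PineconeVectorStore

    st.header("LangChain Documentation ChatBot")
    question = st.text_input("Ask a question")
    if question:
        store = PineconeVectorStore(
            index_name="langchain-doc-index",
            embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
        )
        docs = store.similarity_search(question, k=4)          # retrieval
        context = "\n".join(d.page_content for d in docs)      # augmentation
        llm = ChatOpenAI(model="gpt-3.5-turbo")                # generation
        answer = llm.invoke(
            f"Answer using this context:\n{context}\n\nQuestion: {question}"
        )
        st.write(answer.content)

# Streamlit executes this script top-level via:  streamlit run streamlit_app.py
# main()
```
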

