In this blog post, we explain how to create a RAG (Retrieval-Augmented Generation) chatbot using Python technologies.
Key Takeaways
An embedding is a way to represent text, images, or other data as a list of numbers (a vector) so that computers can understand relationships between them. Instead of raw words or sentences, embeddings store meaning in a mathematical format that makes it easier for AI models to process and compare information.
- In our case, we use OpenAI's "text-embedding-3-small" model for data embedding.
What is OpenAI's text-embedding-3-small?
text-embedding-3-small is an embedding model released by OpenAI, optimized for creating vector representations of text.
🔹 Purpose:
- Converts text into dense vector embeddings, making it easier to perform semantic searches, recommendations, and RAG applications.
- Smaller and more efficient than larger models, making it great for fast and cost-effective applications.
🔹 How It Works:
- Input a text sentence into the model.
- The model converts the text into a high-dimensional vector representation (a list of numbers).
- These embeddings can then be stored in a vector database for fast similarity searches.
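A minimal sketch of this flow, assuming the official openai Python package (v1+) and an OPENAI_API_KEY set in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Input a text sentence
text = "RAG combines retrieval with generation."

# 2. The model converts it into a high-dimensional vector
response = client.embeddings.create(model="text-embedding-3-small", input=text)
vector = response.data[0].embedding

# 3. The vector (1536 floats by default for this model) can now be stored in a vector database
print(len(vector), vector[:5])
```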
🔹 Use Cases:
- RAG systems (fetching relevant documents to improve AI responses).
- Semantic search (finding related content).
- Clustering and classification (grouping similar documents).
- Recommendation engines (finding similar items).
What Does "Retrieval" Mean?
Retrieval means finding and fetching relevant information from a source. In AI and computing, retrieval typically refers to searching for and retrieving data from a database, a document store, or the internet to use in answering questions, making recommendations, or improving AI-generated responses.
Vector-Based Retrieval (Used in AI & RAG)
- Instead of searching for exact keyword matches, vector retrieval finds similar meanings using embeddings (numerical representations of text).
- Used in semantic search, chatbots, and recommendation systems.
✅ Example:
- You search "best Italian food" → The system retrieves restaurant reviews related to "pizza", "pasta", and "Italian cuisine", even if they don’t contain the exact words.
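To make the idea concrete, here is a small illustrative sketch of vector retrieval with cosine similarity. The tiny 3-dimensional vectors and document texts are made up for illustration; real embeddings would come from an embedding model such as text-embedding-3-small.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these vectors came from an embedding model
documents = {
    "Great pizza and pasta downtown":    np.array([0.90, 0.10, 0.20]),
    "Authentic Italian cuisine reviews": np.array([0.80, 0.20, 0.10]),
    "Best hiking trails nearby":         np.array([0.10, 0.90, 0.30]),
}
query = np.array([0.85, 0.15, 0.15])  # embedding of "best Italian food"

# Rank documents by semantic similarity, not by keyword overlap
ranked = sorted(documents, key=lambda d: cosine_similarity(query, documents[d]), reverse=True)
print(ranked[0])  # a pizza/pasta review, even though it never contains "Italian food"
```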
What is Augmenting?
Augmenting means adding or enhancing something to improve its quality or functionality. In the context of AI, augmenting usually refers to providing additional information or modifying input data to improve results.
Augmenting Embeddings (Vector-Based AI)
- Embeddings represent words, sentences, or images as numerical vectors.
- Augmenting embeddings can involve combining multiple embeddings, adding contextual data, or enriching vectors to improve search and recommendation results.
✅ Example:
- If you’re building a semantic search engine, you can augment query embeddings with synonyms or extra context to improve search accuracy.
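One simple way to illustrate this (a sketch of one possible approach, not the only one) is to enrich the raw query with synonyms or extra context before embedding it:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def embed(text: str) -> list[float]:
    return client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

# Hypothetical augmentation: append synonyms / extra context to the raw query
query = "best Italian food"
augmented_query = query + " (pizza, pasta, Italian cuisine, restaurant reviews)"

plain_vector = embed(query)
augmented_vector = embed(augmented_query)  # often retrieves broader, more relevant matches
```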
What Does "Generative" Mean?
The term "generative" refers to a system or model that can create (or generate) new content based on learned patterns. In AI, Generative AI refers to models that can generate text, images, music, code, or even videos from a given input.
Generative vs. Traditional AI
🔹 Traditional AI → Analyzes and classifies data (e.g., spam detection, fraud detection).
🔹 Generative AI → Creates new data similar to what it has learned (e.g., ChatGPT, DALL·E).
For example:
✅ A traditional AI model might classify an image as "dog" or "cat."
✅ A generative AI model can create a new image of a dog or cat that has never existed before.
- In our case, we use the "gpt-3.5-turbo" LLM to generate the final response, as sketched below.
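A minimal sketch of that generation call with the openai package (the prompt content is illustrative):

```python
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": "Context: ...retrieved documents...\n\nQuestion: How is AI being used in healthcare?"},
    ],
)
print(completion.choices[0].message.content)
```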
Explaining Prompting with a RAG Chatbot
A RAG (Retrieval-Augmented Generation) chatbot is an AI assistant that retrieves relevant information from an external knowledge source before generating a response. This helps the chatbot provide more accurate, updated, and fact-based answers.
🔗 How Prompting Works in a RAG Chatbot
Step 1: User Sends a Prompt (Query)
- The user asks a question.
- Example: "How is AI being used in healthcare?"
Step 2: Retrieval (Fetching Relevant Data)
- The chatbot searches its document store, database, or vector embeddings for relevant text.
- It retrieves medical research papers, articles, or previous chatbot conversations about AI in healthcare.
Step 3: Augmentation (Adding Retrieved Data to the Prompt)
- The chatbot enhances the original prompt by including the retrieved information.
- Example of an augmented prompt: "Context: [retrieved passages about AI in healthcare]. Question: How is AI being used in healthcare?"
Step 4: Generation (AI Creates the Response)
- The chatbot uses both the prompt and retrieved data to generate a factual and informed response.
- Example AI response: an answer that summarizes the retrieved research on how AI is currently applied in healthcare.
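Putting the four steps together, a condensed sketch of the whole flow might look like this (the retrieval function is a stand-in for a real vector-database query):

```python
from openai import OpenAI

client = OpenAI()

def retrieve_context(query: str) -> str:
    # Stand-in for Step 2: embed the question and query a vector database.
    # A real system would return the most similar stored chunks.
    return "AI is used in healthcare for diagnostic imaging, drug discovery, and triage chatbots."

question = "How is AI being used in healthcare?"                    # Step 1: user prompt
context = retrieve_context(question)                                # Step 2: retrieval
augmented_prompt = f"Context:\n{context}\n\nQuestion: {question}"   # Step 3: augmentation

answer = client.chat.completions.create(                            # Step 4: generation
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": augmented_prompt}],
)
print(answer.choices[0].message.content)
```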
How-To Guide
- In this example, I am using Ubuntu 22.04 with Python 3.11
- Please check out the https://github.com/dhanuka84/rag-based-chatbot/tree/main project.
Steps:
- You can use a Google account to sign up for all three accounts.
4. Install Python 3.11 (e.g., from the deadsnakes PPA).
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.11
5. Install pip on Linux.
sudo apt install python3-pip
6. Install pipenv
pip install --user pipenv
7. Go to the project location and run the command below to create the virtual environment.
$ pipenv --python /usr/bin/python3.11
8. Create Pipfile.lock based on requirements.txt
$ pipenv install
9. Activate the project's virtual environment
$ pipenv shell
10. Install dependencies
$ pipenv install urllib3
$ pipenv install langchain_community
$ pipenv install streamlit
11. Run the ingestion script to embed the documents into the vector database index.
$ python ingestion.py
- If you open a single record in the vector database, you can see both the text and its embedded representation.
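For orientation, here is a simplified sketch of what an ingestion script like this typically does. It is illustrative, not the repository's actual ingestion.py; it assumes the langchain_community package installed above plus faiss-cpu for a local index (the project may use a hosted vector database instead), and the file path is hypothetical.

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs = TextLoader("data/sample.txt").load()            # 1. load raw documents (hypothetical path)
texts = [d.page_content for d in docs]

# 2. naive fixed-size chunking (a real script would likely use a proper text splitter)
chunks = [t[i:i + 500] for t in texts for i in range(0, len(t), 500)]

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")  # 3. embedding model
index = FAISS.from_texts(chunks, embeddings)                   # 4. embed and store the vectors
index.save_local("vector_index")                               # persist so the chatbot can query it
```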
12. Run the Streamlit chatbot application.
$ streamlit run streamlit_app.py
Now you can see the chatbot answer questions using the content that we embedded into the vector database index.
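For reference, a minimal Streamlit front end for this flow could look roughly like the sketch below. Again, this is illustrative rather than the repository's actual streamlit_app.py, and it assumes the local FAISS index created by the ingestion sketch above.

```python
import streamlit as st
from openai import OpenAI
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

st.title("RAG Chatbot")
question = st.text_input("Ask a question")

if question:
    # Retrieval: load the persisted index and fetch the most similar chunks.
    # allow_dangerous_deserialization is required by recent langchain_community versions.
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    index = FAISS.load_local("vector_index", embeddings, allow_dangerous_deserialization=True)
    context = "\n".join(d.page_content for d in index.similarity_search(question, k=3))

    # Augmentation + generation: send the retrieved context and the question to the LLM.
    answer = OpenAI().chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    st.write(answer.choices[0].message.content)
```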