Building a Conversational AI Using OpenAI, Faiss, and Flask
A step-by-step guide for next-generation full-stack developers
In this blog post, we'll dive into a Python script that builds a conversational AI. We're using an OpenAI large language model (LLM), the Faiss library for efficient similarity search over vectors, and Flask to create a web server that communicates with our chatbot.
Setup
Before we dive into the script, let's list the Python libraries we'll need. You can install them with pip:
!pip install langchain
!pip install unstructured
!pip install openai
!pip install python-dotenv
!pip install faiss-cpu
!pip install tiktoken pyngrok==4.1.1 flask_ngrok requests
Also, you need to set up your OpenAI and Ngrok API keys in your environment variables as follows:
from dotenv import load_dotenv
import os

# Load variables defined in a local .env file, if present
load_dotenv()

# Set the key directly, or define OPENAI_API_KEY in your .env file instead
os.environ['OPENAI_API_KEY'] = '<YOUR_OPENAI_API_KEY>'
API_KEY = os.environ.get("OPENAI_API_KEY")

!ngrok authtoken '<YOUR_NGROK_TOKEN>'
Loading Custom Data
The first step is to load the data to be used with the LLM. LangChain provides loaders for many data formats; check the documentation for the full list of what can be loaded.
This script uses the langchain.document_loaders module to load data from an Excel file:
from langchain.document_loaders import UnstructuredExcelLoader
loader = UnstructuredExcelLoader("./sample_data/customDataOnExcel.xlsx")
docs = loader.load()
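As a quick sanity check, you can inspect what the loader returned; docs is a list of LangChain Document objects (this inspection snippet is illustrative):

# loader.load() returns a list of Document objects
print(f"Loaded {len(docs)} document(s)")
print(docs[0].page_content[:200])  # preview the first 200 characters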
Then, the loaded data is split into chunks using the RecursiveCharacterTextSplitter class:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=500)
documents = text_splitter.split_documents(docs)
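To get a feel for what the splitter produces, here is a small inspection sketch (the variable names come from the snippet above):

# Inspect the chunks produced by the splitter
print(f"Split into {len(documents)} chunks")
for i, chunk in enumerate(documents[:3]):
    print(f"Chunk {i}: {len(chunk.page_content)} characters")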
Splitting is done for better processing and to keep each chunk within the LLM's token limit; a deeper explanation is beyond the scope of this tutorial. Do follow me on Twitter, where I deep dive into these topics in my threads.
Embeddings
Instead of storing text data as-is in the database, we convert the text into vector representations, or "embeddings." This script uses the OpenAIEmbeddings class from the langchain.embeddings module to generate embeddings using OpenAI's API:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(openai_api_key=API_KEY)
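To see what an embedding looks like, you can embed a sample string directly with embed_query (the query text below is just an illustration):

# Embed a sample string; the result is a plain list of floats
sample_vector = embeddings.embed_query("What is in this spreadsheet?")
print(len(sample_vector))  # e.g. 1536 dimensions for text-embedding-ada-002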
Loading Vectors into Vector Database (FAISS)
After creating vector embeddings, the script stores them in a database using the Facebook AI Similarity Search (Faiss) library:
from langchain.vectorstores.faiss import FAISS
import pickle
vectorstore = FAISS.from_documents(documents, embeddings)
with open("vectorstore.pkl", "wb") as f:
pickle.dump(vectorstore, f)
The database can then be loaded as needed:
with open("vectorstore.pkl", "rb") as f:
vectorstore = pickle.load(f)
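Once loaded, the vector store can answer similarity queries directly, which is a quick way to confirm the embeddings work before wiring up the full chain (the query string here is illustrative):

# Retrieve the chunks most similar to a query
results = vectorstore.similarity_search("your question here", k=3)
for doc in results:
    print(doc.page_content[:100])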
Preparing Prompts
Prompts help define the identity and conversation flow of the LLM. Here, a prompt template is defined and instantiated:
from langchain.prompts import PromptTemplate
basePrompt = """
Put your prompt here
{context}
Question: {question}
Answer here:
"""
PROMPT = PromptTemplate(template=basePrompt, input_variables=["context", "question"])
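You can render the template with sample values to preview the final prompt text (the values below are placeholders):

# Fill in the template variables to preview the rendered prompt
print(PROMPT.format(
    context="...retrieved document chunks go here...",
    question="What does the data say?",
))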
Setting up the LLM and Chains
The script sets up an instance of the OpenAI language model and a retrieval question-answering chain, which retrieves relevant documents based on the user's input:
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
llm = OpenAI(openai_api_key=API_KEY)
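The snippet above only instantiates the model; a minimal sketch of the retrieval QA chain itself could look like this, reusing the vectorstore and PROMPT from earlier (the "stuff" chain type simply stuffs the retrieved chunks into the prompt's {context} slot):

# Retrieve relevant chunks, then "stuff" them into the prompt as {context}
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": PROMPT},
)
print(qa_chain.run("What does the data say?"))  # illustrative question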
Setting Up the Conversation Memory
A Conversation Buffer Memory object is created to keep track of the conversation history:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, output_key="answer")
It is then used in a ConversationalRetrievalChain, which combines document retrieval with conversational memory:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

# gpt-3.5-turbo is a chat model, so we use the ChatOpenAI wrapper
qa = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, openai_api_key=API_KEY),
    memory=memory,
    retriever=vectorstore.as_retriever(),
    combine_docs_chain_kwargs={"prompt": PROMPT},
)
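Before exposing the chain over HTTP, it's worth exercising it locally; in this sketch the follow-up question relies on the memory to resolve "them" (both questions are illustrative):

# Ask a question, then a follow-up that only makes sense with memory
first = qa({"question": "What products are listed in the spreadsheet?"})
print(first["answer"])
followup = qa({"question": "Which of them is the cheapest?"})
print(followup["answer"])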
Running the Python Web Server
Finally, the script sets up a Flask web server and uses Ngrok to make the server publicly accessible. The server hosts an API endpoint that allows a client to interact with the chatbot:
from flask import Flask, request, jsonify
from flask_ngrok import run_with_ngrok
app = Flask(__name__)
run_with_ngrok(app)
@app.route('/submit-prompt', methods=['POST'])
def generate():
    data = request.get_json()
    query = data.get('prompt', '')
    print("Question Asked:", query)
    response = qa({"question": query})
    print("Sending Response...")
    return jsonify({"response": response["answer"]})

if __name__ == '__main__':
    app.run()
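With the server running, any HTTP client can hit the endpoint. Here is a hypothetical client snippet; replace the URL with the public one Ngrok prints at startup:

import requests

# Use the public URL printed by Ngrok when the server starts
url = "https://<your-ngrok-subdomain>.ngrok.io/submit-prompt"
resp = requests.post(url, json={"prompt": "What does the data say about sales?"})
print(resp.json()["response"])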
In this walkthrough, we covered how to build a conversational AI using OpenAI, Faiss, and Flask.
This setup lets OpenAI's language model answer questions over your own data, providing a seamless conversational experience about content the model was never trained on.
Summary
Here is the entire code for your reference. I suggest you make whatever edits your use case needs, and hit me up on Twitter if you have any doubts.
If this post provided you any value, do drop a follow on Twitter. I talk extensively about full-stack engineering, JavaScript, and developer growth.