Building a Conversational AI Using OpenAI, Faiss, and Flask


A step-by-step guide for next-generation fullstack developers

Jul 3, 2023·



In this blog post, we'll walk through a Python script that builds a conversational AI. It uses an OpenAI large language model (LLM), the Faiss library for efficient vector similarity search, and Flask to run a web server that exposes the chatbot.


Before we dive into the script, let's list the Python libraries we'll need. You can install them with pip (the leading ! runs each command from a notebook cell; drop it when using a regular terminal):

!pip install langchain
!pip install unstructured
!pip install openai
!pip install python-dotenv
!pip install faiss-cpu
!pip install tiktoken pyngrok==4.1.1 flask_ngrok requests

You also need to register your Ngrok auth token and make your OpenAI API key available as the API_KEY environment variable:

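from dotenv import load_dotenv
import os

# Read variables from a local .env file (if present) into the environment
load_dotenv()

!ngrok authtoken '<YOUR-NGROK_TOKEN>'

# The OpenAI key is expected under the name API_KEY
API_KEY = os.environ.get("API_KEY")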
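Since os.environ.get returns None when a variable is missing, a misconfigured key tends to show up only later as an authentication error. Here is a minimal sketch of failing early instead. The require_env helper is hypothetical and is not part of the original script or of python-dotenv:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Hypothetical usage, in place of the bare os.environ.get("API_KEY") lookup:
# API_KEY = require_env("API_KEY")
```

With this in place, a missing key stops the script at startup rather than at the first OpenAI call.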

Loading Custom Data

The first step is to load the data to be used with the LLM. LangChain provides document loaders for many data formats; its documentation lists what can be loaded.

This script uses the langchain.document_loaders module to load data from an Excel file:

from langchain.document_loaders import UnstructuredExcelLoader

loader = UnstructuredExcelLoader("./sample_data/customDataOnExcel.xlsx")
docs = loader.load()

Then, the loaded data is split into chunks using the RecursiveCharacterTextSplitter class:

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=500)
documents = text_splitter.split_documents(docs)

Splitting keeps each piece small enough to fit within the LLM's token limit, and the overlap between neighboring chunks preserves context across chunk boundaries. A fuller explanation is beyond the scope of this tutorial.
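To get a feel for what chunk_size and chunk_overlap control, here is a simplified sketch using plain fixed-size windows. It is not how RecursiveCharacterTextSplitter works internally (that class prefers to split on separators such as paragraph breaks and spaces); chunk_text is a hypothetical helper written only for illustration:

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_text(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each printed chunk starts with the last four characters of the previous one, which is the same idea as the 500-character overlap used above, at a smaller scale.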

Follow me on Twitter, where I dive deeper into these topics in my threads.


Instead of storing text data as-is in the database, we convert the text into vector representations, or "embeddings." This script uses the OpenAIEmbeddings class from the langchain.embeddings module to generate embeddings using OpenAI's API:

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=API_KEY)

Loading Vectors into Vector Database (FAISS)

After creating vector embeddings, the script stores them in a database using the Facebook AI Similarity Search (Faiss) library:

from langchain.vectorstores.faiss import FAISS
import pickle

vectorstore = FAISS.from_documents(documents, embeddings)

with open("vectorstore.pkl", "wb") as f:
    pickle.dump(vectorstore, f)

The database can then be loaded as needed:

with open("vectorstore.pkl", "rb") as f:
    vectorstore = pickle.load(f)
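Conceptually, a vector store answers the question "which stored vectors are closest to this query vector?". The sketch below does that by brute force with cosine similarity in plain Python. The three-dimensional vectors and the names toy_store and top_k are made up for illustration; Faiss uses optimized indexes and may use a different distance metric:

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[float, str]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in store.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Made-up 3-dimensional "embeddings"; real embeddings have far more dimensions.
toy_store = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.0],
    "office address": [0.0, 0.1, 0.9],
}
query_vector = [0.8, 0.2, 0.0]  # pretend this encodes a question about refunds
best_score, best_text = top_k(query_vector, toy_store, k=1)[0]
print(best_text)
```

The chunk whose vector points in nearly the same direction as the query wins, and that chunk is what gets handed to the LLM as context.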

Preparing Prompts

Prompts help define the identity and conversation flow of the LLM. Here, a prompt template is defined and instantiated:

from langchain.prompts import PromptTemplate

# {context} receives the retrieved chunks, {question} the user's question
basePrompt = """
    Put your prompt here
    Context: {context}
    Question: {question}
    Answer here:
"""

PROMPT = PromptTemplate(template=basePrompt, input_variables=["context", "question"])
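A prompt template is essentially a string with named placeholders that get filled in before the text is sent to the model. The sketch below shows that substitution with Python's built-in str.format rather than LangChain itself; the prompt wording, base_prompt, and render_prompt are illustrative assumptions:

```python
base_prompt = """Use the context below to answer the question.
Context: {context}
Question: {question}
Answer here:"""


def render_prompt(context: str, question: str) -> str:
    """Fill both placeholders; a missing value would raise KeyError."""
    return base_prompt.format(context=context, question=question)


rendered = render_prompt(
    context="Orders ship within 2 days.",
    question="When will my order ship?",
)
print(rendered)
```

Every placeholder named in the template needs a value at render time, which is why the template text and the declared input variables should list the same names.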

Setting up the LLM and Chains

The script sets up an instance of the OpenAI language model and a retrieval question-answering chain, which retrieves relevant documents based on the user's input:

from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

llm = OpenAI(openai_api_key=API_KEY)

Setting Up the Conversation Memory

A Conversation Buffer Memory object is created to keep track of the conversation history:

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, output_key="answer")
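The idea behind a buffer memory can be sketched as a list of turns that is replayed on every new question. ToyBufferMemory below is a hypothetical stand-in written for illustration, not LangChain's implementation:

```python
from typing import List, Tuple


class ToyBufferMemory:
    """Keeps every question/answer pair in the order it happened."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def save_turn(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_text(self) -> str:
        """Render the history as a transcript to prepend to the next prompt."""
        lines = []
        for question, answer in self.turns:
            lines.append(f"Human: {question}")
            lines.append(f"AI: {answer}")
        return "\n".join(lines)


toy_memory = ToyBufferMemory()
toy_memory.save_turn("What is your refund policy?", "Refunds are accepted within 30 days.")
toy_memory.save_turn("Does that include sale items?", "Yes, sale items are included.")
print(toy_memory.as_text())
```

Because the whole transcript is kept, a follow-up such as "Does that include sale items?" can be interpreted in light of the earlier refund question. The trade-off is that the history grows with every turn and counts against the token limit.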

It is then used in a Conversational Retrieval Chain, which is a complex chain that combines document retrieval and conversational memory:

from langchain.chains import ConversationalRetrievalChain

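qa = ConversationalRetrievalChain.from_llm(
    llm=OpenAI(model_name="gpt-3.5-turbo", temperature=0, openai_api_key=API_KEY),
    # Assumed completion: the call was cut off in the original listing
    retriever=vectorstore.as_retriever(),  # fetch relevant chunks from Faiss
    memory=memory,  # the conversation memory created above
    combine_docs_chain_kwargs={"prompt": PROMPT},
)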

Running the Python Web Server

Finally, the script sets up a Flask web server and uses Ngrok to make the server publicly accessible. The server hosts an API endpoint that allows a client to interact with the chatbot:

from flask import Flask, request, jsonify
from flask_ngrok import run_with_ngrok

app = Flask(__name__)

@app.route('/submit-prompt', methods=['POST'])
def generate():
    data = request.get_json()
    prompt = data.get('prompt', '')
    query = prompt
    print("Question Asked: ", query)
    response = qa({"question": query})
    print("Sending Response...")
    data = {"response": response["answer"]}
    return jsonify(data)

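if __name__ == '__main__':
    # Assumed completion: open the ngrok tunnel, then start the Flask server
    run_with_ngrok(app)
    app.run()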
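Once the server is running, a client only needs to POST a JSON body with a prompt field to /submit-prompt and read the response field from the reply. Here is a minimal client sketch using only the standard library. The helper names build_request and ask, and the localhost URL, are assumptions; use the public URL that ngrok prints at startup:

```python
import json
import urllib.request


def build_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Prepare the POST request the /submit-prompt endpoint expects."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/submit-prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, prompt: str) -> str:
    """Send the prompt and return the chatbot's answer text."""
    with urllib.request.urlopen(build_request(base_url, prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Hypothetical usage; requires the Flask server above to be running.
# print(ask("http://localhost:5000", "What data do you have?"))
```

The Content-Type header matters here: Flask's request.get_json() only parses the body when the request declares it as JSON.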

In this walkthrough, we have covered how to build a conversational AI using OpenAI, Faiss, and Flask.

This setup grounds the OpenAI language model's answers in your own data by retrieving relevant chunks from Faiss, while the conversation memory keeps follow-up questions in context.


Here is the entire code for your reference. I suggest editing it as needed for your use case; reach out to me on Twitter if you have any questions.

If you found this valuable, consider following me on Twitter, where I talk extensively about Fullstack Engineering, JavaScript, and Developer Growth.


Support Sagar by becoming a sponsor. Any amount is appreciated!