Using OpenAI with custom data

An important use case of large language models (LLMs) is answering questions from a specific corpus. Tools like ChatGPT are great for conversation, but not so great at retrieving information from a specific body of knowledge. To compensate for this, two strategies can be used: forcing ChatGPT to take a specific corpus into account (which is passed through an index), and fine-tuning. Which of these strategies is better is anyone’s guess at this point, with little more than anecdotal evidence either way.
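To make the first strategy concrete, here is a minimal sketch of what "passing a corpus through an index" amounts to: split the corpus into chunks, score each chunk against the question, and put the best match into the prompt. This is a toy under stated assumptions, not how Llama Index works internally: word counts stand in for real embeddings, the three chunks are made up, and `build_index`, `retrieve`, and `build_prompt` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Pair every chunk with its word-count vector (a stand-in for an embedding)."""
    return [(chunk, Counter(tokenize(chunk))) for chunk in chunks]


def retrieve(index, question, top_k=1):
    """Return the top_k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(index, question, top_k=1):
    """Prepend the retrieved chunks to the question before it goes to the LLM."""
    context = "\n".join(retrieve(index, question, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up chunks, for illustration only
chunks = [
    "Tbilisi is the capital of Georgia.",
    "Fine-tuning updates the weights of a model.",
    "An index stores text chunks so they can be searched.",
]
index = build_index(chunks)
prompt = build_prompt(index, "What is the capital of Georgia?")
print(prompt)
```

Fine-tuning, by contrast, changes the model itself rather than the prompt, so there is no retrieval step at question time.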

Using Llama Index, we can easily incorporate a corpus. First, let’s connect to OpenAI and ask a slightly inappropriate question:

import os
  
# add your openai api key here
os.environ['OPENAI_API_KEY'] = 'Your OpenAI API key here'

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import OpenAI

# Necessary to use the latest OpenAI models that support function calling API
service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo-0613"))
data = SimpleDirectoryReader(input_dir="./data/en/").load_data()
index = VectorStoreIndex.from_documents(data, service_context=service_context)

# verbose=True prints the function calls made by the chat engine
chat_engine = index.as_chat_engine(chat_mode="openai", verbose=True)

response = chat_engine.chat('Are Russians welcome in Georgia?')

The data/en/ folder contains an article from The Guardian about Russian refugees in Georgia.

Unsurprisingly, if not forced to use the corpus, ChatGPT replies in its usual politically correct style:
As an AI language model, I don't have personal opinions. However, I can provide information. The relationship between Russia and Georgia has been complex due to historical and political factors. In recent years, there have been improvements in diplomatic relations and efforts to promote tourism and cultural exchanges between the two countries. Generally, Georgian people are welcoming and hospitable to visitors from all over the world, including Russians. However, it's important to note that political tensions and historical events may still influence individual attitudes. It is always recommended to respect local customs, laws, and traditions when visiting any country.
If we push it a little bit by forcing it to use the query engine:

response = chat_engine.chat('Are Russians welcome in Georgia?', function_call="query_engine_tool")

Then we get an answer that better reflects the article:

=== Calling Function === Calling function: query_engine_tool with args: { "input": "Are Russians welcome in Georgia?" }
Got output: Russians are not universally welcome in Georgia. The influx of Russian emigrants since the start of the war in Ukraine
has overwhelmed the city of Tbilisi, taxing its housing and social infrastructure. The invasion of Ukraine has eroded the distinction
between the Russian state and its people in the eyes of many Georgians. Graffiti expressing anti-Russian sentiment has appeared across Tbilisi,
reflecting the negative attitudes towards Russians. The Georgian government has also faced criticism for its handling of Russian opposition-linked
figures seeking entry into the country. While some Russians have been allowed to settle in Georgia, the reception towards them is mixed.
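Why do the two calls give such different answers? The toy sketch below mimics the dispatch logic only; it is not Llama Index's or OpenAI's implementation. All names (`toy_model`, `TOOLS`, `chat`) and the canned answers are hypothetical, and the sketch assumes the model never chooses the tool on its own, which is a simplification of what a real model does when left to decide.

```python
def toy_model(question):
    """Stand-in for the LLM answering from its general training data."""
    return "generic answer"


def query_engine_tool(question):
    """Stand-in for a tool that answers from the indexed corpus."""
    return "answer grounded in the corpus"


# Hypothetical registry of the tools the chat engine may call
TOOLS = {"query_engine_tool": query_engine_tool}


def chat(question, function_call="auto"):
    """Answer directly in 'auto' mode, or route through the named tool."""
    if function_call == "auto":
        # Simplification: this toy model always skips the tool when not forced
        return toy_model(question)
    if function_call not in TOOLS:
        raise ValueError(f"Unknown tool: {function_call}")
    # Forcing a tool bypasses the model's own judgment about using it
    return TOOLS[function_call](question)


print(chat("Are Russians welcome in Georgia?"))
print(chat("Are Russians welcome in Georgia?", function_call="query_engine_tool"))
```

In the real chat engine the tool output goes back to the model, which then writes the final reply, but the routing decision is the part that changes the answer.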

Author: Pablo Maldonado

I am a consultant, technical trainer and lecturer in automation, data science and AI. My students, whether at universities or companies, appreciate my hands-on approach, which comes from my experience on several projects across industries like financial services, marketing, and HR. Since 2017, I have been leading workshops for 30+ clients in 10+ countries including major companies such as Shell, Renault, PwC, O’Reilly Media, O2, La Poste, as well as institutions like the European Investment Bank, the Czech National Bank, the Australian Government, and many others. I have written a couple of books (on Shiny, and on Deep Learning). I hold a PhD in Applied Mathematics from the Sorbonne Université in Paris, with a specialization in Game Theory and Markov Decision Processes.
