How do Large Language Models help my chatbot?
TL;DR: Think of an LLM as an education. We study language, sciences, maths, humanities and more at school, and that schooling is the human equivalent of an LLM's training.
When I think of LLMs, I recall Keanu Reeves in The Matrix, sitting in the chair, plugged in and instantly knowing karate.
Now imagine you want to create your own artificial intelligence (AI) model to power a chatbot to respond to questions from your customers. You run a chain of florists. Would knowing karate help your chatbot? Perhaps, if you want it to suggest a knockout combination of roses and lilies, guaranteed to kickstart your celebration with a roundhouse.
Or let’s say I’m travelling to France. I want to communicate with people but I don’t speak French (I speak English, American, Canadian and Australian). Do I back up my conversational knowledge with a French interpreter or a Japanese interpreter?
Why do I need a large language model?
Before you invest time in learning what LLMs are, it's worth understanding why you'd use one.
If I wanted to create a bot from scratch, how would I do it? I could code my bot to handle the responses to different questions.
Q: What day is it?
A: It’s Friday.
Easy. But what if the question is phrased in a different way?
Q: Which day of the week is it?
Do I need to code every possible way users might ask every single question? This is where natural-language processing (NLP) comes in: AI that can understand natural-language input, tell my code what is being asked, or even answer the question for me.
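To make the problem concrete, here's a minimal sketch of that hard-coded approach (hypothetical Python, with a made-up `naive_bot` function), showing exactly where it breaks down:

```python
import datetime

# A naive, hard-coded bot: every phrasing must be anticipated in advance.
RESPONSES = {
    "what day is it?": lambda: f"It's {datetime.date.today().strftime('%A')}",
}

def naive_bot(question: str) -> str:
    # Look up the question verbatim (lower-cased); no understanding involved.
    handler = RESPONSES.get(question.strip().lower())
    return handler() if handler else "Sorry, I don't understand."

print(naive_bot("What day is it?"))               # exact phrasing: answered
print(naive_bot("Which day of the week is it?"))  # same intent, but the bot fails
```

The second question means the same thing as the first, but because the bot matches strings rather than meaning, it falls over.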
LLMs can do this hard work for you.
What are LLMs?
Now you understand why using an LLM would be useful. What are they?
LLMs, such as OpenAI’s GPT-4o, are trained on huge volumes of data, enabling them to understand and generate human-like output. This is a form of natural-language processing, and these models have made tasks such as question answering, language translation, sentiment analysis, summarisation and conversational agents far more effective.
Azure offers a large number of commonly used LLMs as foundation models in the Azure Machine Learning model catalog (I told you I speak American). They are called foundation models because they are already trained on large bodies of data and can be used for many different AI applications. Different models have different licensing and costs associated with them. Many are open source and publicly available through communities like Hugging Face.
Choose your LLM carefully. Some models are better at translating text, whereas others excel at answering questions. To illustrate this, you might come across models with SQuAD in their name. This refers to the Stanford Question Answering Dataset, a large body of Wikipedia articles with associated questions and answers. These models are trained (practised) in answering questions about data.
Another family of foundation models, called Virchow, is produced by Paige AI and is good at analysing pathology images to aid in cancer diagnostics. It’s unlikely to be a suitable model for a chatbot about horticulture.
In the next blog post, I’ll look at how you can extend the ‘education’ of your chosen LLM so it knows about your own data.