Understanding Large Language Models
To build effective LLM-based applications, it’s essential to understand what a large language model is and how it works. The explanation below should give you enough knowledge to get through the workshop. If you want to learn more, please refer to the book.
The basics of an LLM
A large language model (LLM) is a deep neural network built to generate a text sequence based on an input text sequence. We call the input text sequence a prompt and the output sequence a response.
When you want to generate output with an LLM, you need to give it a sequence of tokens. You obtain these tokens by splitting the text into small chunks. These chunks, or tokens, can be a delimiter, a short word, or part of a longer word. The model produces only a single token as output. To produce a complete sentence, you repeat the prediction process, each time feeding in the original input sequence plus the newly generated token to obtain the next one.
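A minimal sketch of this generation loop is shown below. The `ToyModel` class is a made-up stand-in so the loop actually runs; in a real application, a trained model and tokenizer take its place:

```python
import random

# Toy stand-in so the loop runs; a real model and tokenizer replace this.
class ToyModel:
    EOS = 0  # a special token id that marks the end of the sequence

    def predict_next_token(self, tokens: list[int]) -> int:
        # Pretend inference: pick a random token id, sometimes the EOS token.
        return random.choice([self.EOS, 1, 2, 3, 4])

def generate(model: ToyModel, prompt_tokens: list[int], max_tokens: int = 20) -> list[int]:
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        next_token = model.predict_next_token(tokens)  # one token per prediction
        if next_token == ToyModel.EOS:                 # the model signals it's done
            break
        tokens.append(next_token)                      # output becomes part of the input
    return tokens

print(generate(ToyModel(), [7, 8, 9]))
```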
An LLM has no internal state or notion of a conversation. So while you think you’re sending a single prompt to ChatGPT, the application sends the whole conversation over each time you ask a question.
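Because the model is stateless, a chat application keeps the history itself and resends it on every turn. Here’s a sketch using the OpenAI Python SDK’s chat completions API (the model name is just an example, and the call needs an API key to run):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    # The full conversation goes over the wire on every single call.
    response = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```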
LLMs can process input and output sequences only up to a maximum number of tokens. This limit is called the context window size. If you chat with an LLM long enough, you will hit this limit. What happens then is usually up to the LLM-based application: it can discard old messages or summarize the conversation.
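One simple strategy an application might use is to drop the oldest turns until the conversation fits again. A rough sketch (the token count here is a crude whitespace approximation; real applications use the model’s own tokenizer):

```python
def trim_history(history: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest non-system messages until the estimated size fits."""

    def estimate(messages: list[dict]) -> int:
        # Crude approximation; use the model's tokenizer in practice.
        return sum(len(m["content"].split()) for m in messages)

    trimmed = list(history)
    while estimate(trimmed) > max_tokens and len(trimmed) > 1:
        # Keep the system message at index 0, discard the oldest turn.
        del trimmed[1]
    return trimmed
```

The summarization strategy works the same way, except the discarded turns are first condensed into a single message that stays in the history.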
Making LLMs sound more human
When you use ChatGPT and similar tools, entering the same prompt twice yields different responses. This variation comes from how LLMs predict the next likely token: they assign a probability to every possible next token in the sequence and then use a sampling technique to randomly select one of the most probable tokens. This sampling makes the LLM sound more human, but underneath it remains a statistical pattern-matching machine.
You can control the randomness by setting a temperature for the model. A higher temperature makes the output more random, while a lower temperature makes it more deterministic.
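To make the effect of temperature concrete, here’s a sketch of sampling from next-token scores. The logit values are made up for illustration:

```python
import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Sample a token id from raw model scores (logits)."""
    scaled = logits / temperature        # low temp sharpens, high temp flattens
    exp = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs = exp / exp.sum()
    return int(np.random.choice(len(probs), p=probs))

# Made-up scores for four candidate tokens.
logits = np.array([2.0, 1.5, 0.3, -1.0])
print(sample_token(logits, temperature=0.2))  # almost always token 0
print(sample_token(logits, temperature=1.5))  # much more varied
```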
Training LLMs
LLMs are special because they’re trained on a massive amount of data scraped from the internet and books. The training process involves three steps that are important to understand to get a good grasp of the power of an LLM:
- First, we take sentences from the training dataset, show the model the beginning of each sentence, and ask it to predict the next word (more precisely, the next token); see the sketch after this list. This step is called pretraining the model. After this step, it knows how to produce sequences of text.
- Next, we teach the LLM how to perform various tasks such as writing code, poems, and TODO lists, or working through reasoning tasks. This step is called post-training. After this step, the LLM is better at reasoning and completing simple tasks. However, it still requires one more training step before it behaves the way we want.
- The final step in the training process involves aligning the LLM with preferred responses. In this step, we teach the LLM to avoid harmful output, such as racist language, and to follow a specific writing style. The alignment step uses an algorithm called reinforcement learning from human feedback (RLHF). After this step, the LLM is ready to be used in production.
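The core idea behind pretraining is that every position in a tokenized sentence becomes a training example: given everything up to that position, predict the next token. A tiny illustration with made-up token ids:

```python
# Pretraining reduces to next-token prediction: every position in a
# token sequence becomes a training example. Token ids are made up.
tokens = [17, 52, 8, 93, 4]  # a tokenized training sentence

pairs = [(tokens[: i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]
for context, target in pairs:
    print(f"given {context} -> predict {target}")
# given [17] -> predict 52
# given [17, 52] -> predict 8
# ...
```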
Calling tools with LLMs
Many LLMs can call functions if you provide them with the necessary metadata. Tool calling works because this capability is added during the post-training step. It works like this (a code sketch follows the list):
- You invoke the LLM with your prompt and tools. Each tool needs to have a name, description, and a set of parameter descriptions.
- When the LLM detects that it should call one of your tools, you’ll get a `tool_call` response. You need to handle this response.
- When you get a `tool_call` response, you need to call the tool and provide the result as an extra chat message to the LLM.
- The LLM uses the tool output to generate a final response.
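Putting those steps together, here’s a sketch of a single tool-calling round trip using the OpenAI Python SDK. The `get_weather` tool and its hard-coded result are made up for illustration, and the model name is just an example:

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical tool: name, description, and parameter descriptions.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Amsterdam?"}]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
message = response.choices[0].message

if message.tool_calls:  # the model decided to call a tool
    messages.append(message)
    for call in message.tool_calls:
        # With multiple tools, you would dispatch on call.function.name here.
        args = json.loads(call.function.arguments)
        result = f"It is 18°C in {args['city']}"  # stand-in for a real lookup
        # Feed the tool output back as an extra chat message.
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    # The LLM uses the tool output to generate the final response.
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```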
Tools are a powerful concept we’ll use in this workshop to build our agent and give it access to content search. While search is one of the most commonly used tools in the toolbox, it’s not the only kind of tool an LLM can use.
You can provide the LLM with tools that look up information and tools that manipulate something in the environment, such as switching on a light or sending an email. There’s practically no limit to what you can do with tools.