# Context integration
In this module, we’ll explore how to use the information we processed in the previous module as extra context for the agent.
The pattern we’re going to use is called Retrieval Augmented Generation (RAG). The agent retrieves extra information based on the prompt and uses it to augment the context given to the model, resulting in the LLM generating a more grounded response.
## Semantic search
The RAG pattern is often implemented with semantic search as the retrieval step.
Semantic search utilizes vector databases to understand and retrieve information based on the meaning and context of the query rather than just matching keywords. This method allows the search engine to grasp the nuances of language, such as synonyms, related concepts, and the overall intent behind a query.
Semantic search excels in environments where user queries are complex, open-ended, or require a deeper understanding of the content. For example, searching for “best smartphones for photography” would yield results that consider the context of photography features in smartphones, rather than just matching the words “best,” “smartphones,” and “photography.”
For more background, see the Semantic Kernel documentation on plugins for RAG.
Semantic search builds on the same structure as the chunking lab. For chunking, we saw that we need to convert the content of the document into embeddings. Integrating context follows the same pattern: we convert the question into an embedding and then use the vector database to find the most relevant chunks.
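To make the retrieval step concrete, here’s a minimal sketch using the prerelease Microsoft.Extensions.VectorData abstractions we configure later in this module. The method shape is illustrative rather than the workshop’s exact code; `TextUnit` is the record type introduced in Lab 2 below.

```csharp
using InfoSupport.AgentWorkshop.Chat.Models;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Embeddings;

// Illustrative sketch, not the workshop's exact code: embed the question,
// then ask the vector store for the chunks whose embeddings are closest to it.
public static class RetrievalSketch
{
    public static async Task<List<TextUnit>> FindRelevantChunksAsync(
        IVectorStore vectorStore,
        ITextEmbeddingGenerationService embeddingGenerator,
        string question)
    {
        var collection = vectorStore.GetCollection<ulong, TextUnit>("Content");

        // Convert the question into an embedding, just like we did for the chunks.
        ReadOnlyMemory<float> questionEmbedding =
            await embeddingGenerator.GenerateEmbeddingAsync(question);

        // Retrieve the chunks whose embeddings are most similar to the question.
        var searchResults = await collection.VectorizedSearchAsync(
            questionEmbedding, new VectorSearchOptions { Top = 3 });

        var chunks = new List<TextUnit>();

        await foreach (var result in searchResults.Results)
        {
            chunks.Add(result.Record);
        }

        return chunks;
    }
}
```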
## Labs
### Lab 1: Implement semantic search
Goal: Learn how to integrate context into our prompts and use the vector database to generate more accurate and relevant responses.
In this lab we’ll connect the vector store we created earlier to the chat application.
#### Exercise: Integrating Context
Make sure to copy the starter code from the `code/module-03` folder. You can also continue working on the files from the previous module.

Also download the book’s raw markdown files as the dataset. Place these files in the `apps/indexer/InfoSupport.AgentWorkshop.Indexer/Content` directory if you haven’t downloaded the content into the `Content` directory before. You’ll need these files for the indexer to work correctly.
The first step is to integrate the vector store into the `apps/chat/InfoSupport.AgentWorkshop.Chat` application so we can use it to provide the agent with context information.
1. Add the following packages to the `apps/chat/InfoSupport.AgentWorkshop.Chat` project:

   ```bash
   dotnet add package Microsoft.SemanticKernel.Connectors.Qdrant --prerelease
   dotnet add package Aspire.Qdrant.Client
   ```

   This enables us to communicate with the Qdrant vector store we created in the previous module. We’ll need to include some configuration to make this work.
2. Configure the Qdrant client in the chat application. Add the following statement to the `apps/chat/InfoSupport.AgentWorkshop.Chat/Program.cs` file, below the line that says `AddAzureOpenAIClient("languagemodel")`:

   ```csharp
   builder.AddQdrantClient("qdrant");
   ```

   This statement enables the Qdrant client dependency in the application. We’ll need it for the next step, where we configure the vector store connector for Semantic Kernel.
3. Configure the vector store connector. Add the following statement to `apps/chat/InfoSupport.AgentWorkshop.Chat/Program.cs`, below the line where we configure the `Kernel` object:

   ```csharp
   builder.Services.AddQdrantVectorStore();
   ```

   The sketch after this list shows how these registrations fit together.
### Lab 2: RAG as a function
Goal: Learn how to provide context information to an agent.
In this lab we’re going to use the configured vector store with a feature called function calling to provide relevant context information to the agent.
#### Function calling in Semantic Kernel
Function calling is a feature that almost every modern LLM supports. Alongside your prompt, you provide metadata listing the functions that the LLM can call.
Functions come in two variants:
- Data retrieval functions
- Task automation functions
For each function you want the LLM to be able to call, you provide a name, a description of what the function does, and information about the parameters the LLM must supply values for.
The LLM doesn’t call functions itself; instead, you get a response from the LLM indicating that you should make a function call. This gives you the opportunity to validate the proposed call and invoke the function if you approve it.
Semantic Kernel handles function calls for you while providing you with the necessary controls to limit what the LLM can do.
You can provide the `Kernel` object with one or more collections of functions called plugins. This enables you to reuse related functions across multiple agent applications if you like.
Plugins are a key component of Semantic Kernel. If you have used plugins in ChatGPT or Copilot extensions in Microsoft 365, you’re already familiar with the concept. With plugins, you can encapsulate your existing APIs into a collection that can be used by an AI. This allows you to give your AI the ability to perform actions that it wouldn’t be able to do otherwise.
In the context of RAG, using a plugin allows the LLM to perform a vector search and retrieve information when it decides to do so. This means that the information in the vector store is not used in every question that is asked.
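To make the function metadata concrete, here is a minimal, hypothetical plugin. The `WeatherPlugin` class and its method are invented for illustration; the `[KernelFunction]` and `[Description]` attributes are the standard Semantic Kernel way to supply the name, description, and parameter information described above.

```csharp
using System.ComponentModel;
using Microsoft.SemanticKernel;

// Hypothetical plugin, invented for illustration. The attributes supply the
// metadata (name, description, parameters) the LLM uses to decide whether
// and how to call the function.
public class WeatherPlugin
{
    [KernelFunction("get_current_weather")]
    [Description("Gets the current weather for a city.")]
    public string GetCurrentWeather(
        [Description("The name of the city.")] string city)
    {
        // A real data retrieval function would query a service here.
        return $"It is 21 degrees Celsius and sunny in {city}.";
    }
}
```

Registering it with `kernel.Plugins.AddFromType<WeatherPlugin>("Weather")` makes the function available to the agent; with `FunctionChoiceBehavior.Auto()`, as used in the exercise below, Semantic Kernel automatically invokes the function when the model requests it.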
#### Exercise: RAG as a plugin
In this exercise, we’ll learn how to implement the semantic search we did in the previous lab as a plugin. You’re going to replace the previous implementation of RAG with a plugin.
1. Add a new file `TextUnit.cs` to the directory `apps/chat/InfoSupport.AgentWorkshop.Chat/Models` with the following content:

   ```csharp
   using Microsoft.Extensions.VectorData;

   namespace InfoSupport.AgentWorkshop.Chat.Models;

   public class TextUnit
   {
       [VectorStoreRecordKey]
       public ulong Id { get; set; }

       [VectorStoreRecordData]
       public required string OriginalFileName { get; set; }

       [VectorStoreRecordData(IsFullTextIndexed = true)]
       public required string Content { get; set; }

       [VectorStoreRecordVector(1536)]
       public ReadOnlyMemory<float> Embedding { get; set; }
   }
   ```

   This class is an exact copy of the `TextUnit` class we used in the indexer application and provides the necessary structure for the retrieval function that we’re going to use.
2. Modify `apps/chat/InfoSupport.AgentWorkshop.Chat/Services/AgentCompletionService.cs` to include a constructor parameter for the `IVectorStore`:

   ```csharp
   public class AgentCompletionService(
       Kernel kernel,
       ApplicationDbContext applicationDbContext,
       IVectorStore vectorStore) : IAgentCompletionService
   {
       // ... Rest of the code
   }
   ```

   We’ll need this vector store to set up the text search tool.
3. Modify `apps/chat/InfoSupport.AgentWorkshop.Chat/Services/AgentCompletionService.cs` to include a new method called `ConfigureAgent` with the following content:

   ```csharp
   private ChatCompletionAgent ConfigureAgent()
   {
       var contentCollection = vectorStore.GetCollection<ulong, TextUnit>("Content");
       var embeddingGenerator = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
       var textSearch = new VectorStoreTextSearch<TextUnit>(contentCollection, embeddingGenerator);

       kernel.Plugins.Add(textSearch.CreateWithGetTextSearchResults("BookContent"));

       var agent = new ChatCompletionAgent()
       {
           Name = "Mike",
           Instructions = EmbeddedResource.Read("instructions.txt"),
           Kernel = kernel,
           Arguments = new KernelArguments(new AzureOpenAIPromptExecutionSettings
           {
               FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
           })
       };

       return agent;
   }
   ```

   Make sure to remove the private field `_chatCompletionAgent` from the class; we no longer initialize the agent in the constructor.

   The new `ConfigureAgent` method is quite a bit larger than before. The main difference is that we now look up the `Content` collection in the vector store and connect it to a vector text search component. We then add the text search as a plugin to the kernel object of the agent, which enables the agent to perform the text search.
4. Modify the `GenerateResponseAsync` method to use the new agent initialization method. Add the following code to the start of the method:

   ```csharp
   var agent = ConfigureAgent();
   ```

   Then replace all references to `_chatCompletionAgent` with the new `agent` variable. The updated method will look roughly like the sketch below.
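This sketch shows one possible shape of the updated method. The signature and the conversation handling (including any persistence through `applicationDbContext`) are assumptions here; adapt it to the code you already have.

```csharp
// Hypothetical shape of the updated method; your actual signature and
// chat-history handling may differ.
public async IAsyncEnumerable<ChatMessageContent> GenerateResponseAsync(string prompt)
{
    var agent = ConfigureAgent();

    var history = new ChatHistory();
    history.AddUserMessage(prompt);

    // The agent now has the BookContent plugin available and decides on its
    // own whether to call the vector search before answering.
    await foreach (var response in agent.InvokeAsync(history))
    {
        yield return response;
    }
}
```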
All that’s left now is to start the application and see the results. You can run the application from `hosting/InfoSupport.AgentWorkshop.AppHost`. After the application host has started, go to http://localhost:3000/ to view the frontend and start chatting.
## Summary and next steps
In this module we covered how to implement semantic search for the content we indexed in the previous module. We also learned the difference between running semantic search directly as part of the prompt and exposing it as a function, where the agent can choose to use the search or answer the prompt directly.
This marks the end of this workshop. If you’re interested in learning more about Semantic Kernel, we recommend reading the book *Building Effective LLM-Based Applications with Semantic Kernel*!