The ultimate guide to using vector search with knowledge graphs
**Moderator (Andy Ellicott):**
Hello, everybody. Welcome to today’s webinar on using vector search with knowledge graphs. For the next 30 to 45 minutes, you’ll be hearing from two experts in the field: Rowan Curran, who’s a Senior Analyst at Forrester Research, and Adam Hevenor, Director of Product Management at Aerospike.
Rowan focuses on AI, machine learning, and data science. He covers research into generative AI technologies, as well as AI/ML platforms, cognitive search, and synthetic data. Prior to Forrester, he spent a number of years in the software development field, working for the Commonwealth of Massachusetts. We’ll also be hearing from Adam, who’s responsible for defining and managing the evolution of the Aerospike Database Management System. He has spent the vast majority of his career defining and managing distributed systems infrastructure and bringing it to market in areas such as vector database management, container orchestration, observability, machine learning infrastructure, and more.
Next slide. Our agenda today will start with a look at the current state of AI application development—who’s doing what. We’ll follow that with a short primer on generative AI, retrieval-augmented generation (better known as RAG), and vector data. We’ll also talk about how knowledge graphs are used in AI applications, followed by a discussion of the data infrastructure technology required to manage AI data. We’ll close with Q&A. If you have questions during the talk, please enter them into the Q&A panel, and we’ll get to as many of those as we can at the end. Since one of the most popular questions is whether the slides and recording will be made available, the answer is yes. Look for a link to those in your inbox a day or two following the event.
Next slide. Our expectation today is that after listening to Rowan and Adam share their experience and advice, you should have a solid understanding of the data management requirements that AI applications present. Those requirements are quite distinct, and meeting them involves a lot of new knowledge to help you make the best data infrastructure choices for your AI systems. So without further ado, let’s jump into it. Rowan, do you want to start by giving an overview of AI application development? It seems like it’s all about RAG today—do you want to talk about what you’re seeing?
**Rowan Curran:**
Yeah, so there are many other areas aside from RAG that we’ll talk about throughout the conversation, but I think a good place to start is probably the most common architecture we’re seeing in the AI application development space today, especially in the wake of the excitement around generative AI. Obviously, there are lots of traditional machine learning use cases, but retrieval-augmented generation (RAG) is really the core way we’re seeing a lot of folks extract value from things like large language models (LLMs) and other foundation models.
So what is retrieval-augmented generation? At its core, you can boil it down in two very simplistic ways. One way to look at it is as a simple search, similar to what we do in all different contexts of our lives. It’s just a different way of retrieving information from an indexed set of content. The other way of looking at it is as a single- or few-shot learning approach to getting better behavior out of a large language model. Essentially, when you give a large language model a task or query, like “Hey, answer this question,” if you give it the information to answer that question or a basis to answer it, the model will be more effective in generating a useful output.
When we look at the architecture of this, it’s actually pretty simple and straightforward. At the core, you have your question in the upper left, which can be anything. For example, “What is a good strategy in AI?” That is turned into a high-dimensional vector embedding, a list of numbers that corresponds to your query. Then that’s matched against a data source to extract an answer. So it’s like doing a search—while not the same mathematically, you can think of it as analogous to a standard search that finds similar results. You pull back those results and combine them with your original prompt and perhaps some additional context. For example, if this is a customer query on your website and they’re looking through your inventory for some type of commercial search use case, you might want to combine the search results with that customer’s propensity to buy another insurance policy.
Then that content is all given back to the query and used to generate an answer for the end user, grounded in the content you pulled back from the database. What this allows enterprises to do is to take advantage of the power of foundation models, particularly large language models, and ground them in specific enterprise data, directing their behavior toward the answers you want.
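To make that flow concrete, here is a minimal sketch of the retrieve-then-generate loop, assuming a tiny in-memory corpus, the open-source sentence-transformers library, and an assumed embedding model name; the final generation step is left as a hypothetical placeholder call:

```python
# Minimal RAG sketch: embed the question, retrieve the closest chunks,
# and ground the model's answer in them. Illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

corpus = [
    "Vector search finds items whose embeddings are closest to a query.",
    "Knowledge graphs store entities and the relationships between them.",
    "RAG combines retrieval with generation to ground model answers.",
]
corpus_vecs = model.encode(corpus, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = corpus_vecs @ q  # cosine similarity, since vectors are normalized
    return [corpus[i] for i in np.argsort(-scores)[:k]]

question = "What is a good strategy in AI?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# answer = call_llm(prompt)  # hypothetical chat-completion call
```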
We’re seeing this applied across enterprises horizontally. There are tons of implementations for customer service and support use cases, whether for customer service agents serving external consumer customers or B2B customers, or for internal customer support use cases like internal help desks. What we’ve seen from folks who have implemented this type of application architecture, or who have combined it with their traditional search architectures—using a combination of keyword search and vector search to pull back the correct content—is that they’re getting very good results. We’re seeing folks achieve ticket deflection rates of 50%, 60%, even 70%, with increased customer satisfaction on top of that from these knowledge retrieval use cases.
It’s really no surprise why retrieval-augmented generation has become the de facto standard for building generative AI-infused applications. But what’s important to recognize is that this can be done very badly or very well. While this is the core architecture we’re looking at here, a lot of what will drive success in specific scenarios comes from the additional pieces that come into play. Beyond the vector embeddings, there are things to think about, such as how you’re chunking up your information—how you’re breaking up long-form documents to get the correct similarity results. Likewise, the additional context you add to that initial model prompt, and how you filter and rerank your results, will really improve the quality of that final human-like answer with citations.
So, while there’s a core architecture common across all retrieval-augmented generation implementations, there are a lot of additional details and nuances that determine how good the architecture is. To preview something we’ll come back to: retrieval-augmented generation was developed to support vector similarity search and answers from large language models. But we’re entering a stage where retrieval-augmented generation is being used colloquially to refer to any type of retrieval occurring in the context of answering a question with a large language model.
I’m starting to see some folks use retrieval-augmented generation with text-to-SQL generation or just a standard keyword search, which I think is a reasonable approach. I don’t see anything wrong with that, but it’s important to be aware of how we started with a very specific technical definition, and now the market may be taking it in a more colloquial direction.
**Adam Hevenor:**
And I’ll add to that. There are a lot of parallels with web technology that people talk about. I do think RAG is the architecture that’s like having a website in the early 2000s—it can go so many ways. There are so many things you can do with a website, and as the client-server architecture has evolved, it’s now a whole application. People think of browsers as operating systems, and we’re entering that same trajectory with the RAG architecture. We’re just at the beginning.
I’m excited to talk about why we’re seeing this now, what trends are unlocking these capabilities, and what tools are unlocking these capabilities. But, as Rowan said, there’s a full spectrum. It’s super easy to build a RAG, just like it was easy to build a website in 2000. Well, “easy”—it’s probably “air quotes easy” to get started, but it’s also quite hard to perfect a product and bring something to market that’s going to solve a real problem. But like I said, there’s a whole set of new tools we’re going to talk about, and I’m excited to educate on how to use them and what’s available.
**Rowan Curran:**
Adam, I love that metaphor. It made me think of something that’s very apt here. In the same way that I and many others were putting together our own websites as teenagers in the early 2000s, sites we thought were quite good, no company would have wanted to use them for its corporate website. Similarly, I’m seeing plenty of people put together these really complex RAG architectures that are interesting, but when you really want a robust, enterprise-secure approach, you want a partner and good guidance on how to do it effectively.
I think that’s a very apt metaphor with the early websites, where people weren’t thinking about permissions, access controls, and all that, which is super crucial in these architectures.
**Moderator (Andy Ellicott):**
Both of you mentioned vectors, vector embedding, vector data. Adam, do you want to talk a little bit about that in more detail? What’s a vector, and how does it work in vector search?
**Adam Hevenor:**
Sure, let’s hop to the next slide. A vector embedding is a statistical representation of any piece of data. In the example you see here, this is using a simple compression model to take an image—you can just barely make out that it’s Abraham Lincoln—and convert it into a set of integers, which you see on the right-hand side of the screen.
This is important because it allows us to take this data and make it easily searchable and use algorithms to find things that are similar to other things. In vector searching, it’s called a similarity search. One of the things really accelerating capabilities here is the availability of not only the language models but what are referred to as embedding models.
To power the language models and retrieval-augmented generation systems, you also need the ability to take these pieces of data, convert them into searchable vectors, and perform a similarity search. That piece of data can be any type of unstructured data. Here, we’re looking at an image embedding model, but you can do this with text embedding models, video, audio files, and more.
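To make “similarity search” concrete: under the hood it is a distance computation over embedding vectors, most commonly cosine similarity. A toy sketch with made-up four-dimensional embeddings (real embeddings typically have hundreds or thousands of dimensions):

```python
# Cosine similarity: the angle-based metric most similarity searches use.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings; real embedding models emit far higher-dimensional floats.
lincoln_photo    = np.array([0.12, 0.87, 0.33, 0.05])
lincoln_portrait = np.array([0.10, 0.80, 0.40, 0.07])
dog_photo        = np.array([0.90, 0.02, 0.11, 0.64])

print(cosine_similarity(lincoln_photo, lincoln_portrait))  # high: similar
print(cosine_similarity(lincoln_photo, dog_photo))         # low: dissimilar
```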
There are a wide variety of ways to access these models. All the AI companies you hear about—OpenAI, Anthropic, Mistral—offer embedding models as a service. Additionally, companies like Meta are investing in open-source models that offer embedding capabilities.
Every time I do one of these, I go check the current count of models on Hugging Face, and it’s over a million now. Hugging Face is a go-to place for models, especially open-source ones. The availability of embedding models is unlocking a lot of capabilities.
**Moderator (Andy Ellicott):**
Rowan, you started off by giving us a conceptual overview of a RAG application. To help people understand what’s involved in AI application infrastructure choices, could you go into the architecture of a generative AI application and help people understand some of the infrastructure choices they’ll have to make?
**Rowan Curran:**
Sure. This slide here is showing one of the pilot architectures we’re seeing emerge as folks want to do more than just retrieval-augmented generation with their AI applications. Typically, where we are today is folks doing RAG on its own—either just for that knowledge retrieval piece or for some other type of information delivery—or they’re doing what I sometimes call RAG-plus, where you have retrieval-augmented generation plus one or two singular actions within an external system. These RAG-plus architectures often involve writing a ticket into some enterprise portfolio management system, help desk ticketing system, or similar.
We’re also seeing RAG-plus with folks who are building retrieval-augmented generation on top of their existing machine learning-based AI applications. For example, both in factories and in heavy industries, we’re seeing companies take predictive maintenance algorithms that have been implemented for years and extend them. Instead of just alerting factory floor workers to a problem, they now say, “Hey, we’ve detected an issue and already run a similarity search. Here’s the potential resolution to the anomaly.”
At the cutting edge of this, people are starting to use this to automatically write maintenance requests into scheduling and backend systems, with a human’s approval before final launch. This flow allows potential issues to be resolved more efficiently. So RAG-plus is where we’re seeing a lot of cutting-edge innovation today.
But if we’re talking about the bleeding edge, this architecture is what we’re seeing from very sophisticated companies. You can see a central data model retrieval layer, sometimes called a planning model, which often involves a language model developing a plan for solving the user query or executing the workflow that has been launched. From that point, it interacts with external tools—API calls to internal or external systems—to either execute actions or retrieve useful information.
For example, you might retrieve time-series or remote-sensing data from semi-structured sources, and because this core data model is connected to something like a knowledge graph, it semantically understands and relates different pieces of information. This is more sophisticated than what’s possible with just standard similarity search. It also incorporates source attribution, parsing content, and other enterprise-critical tasks.
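One common way to wire a planning model to external tools is function calling. A hedged sketch using the OpenAI chat-completions API; the tool, model name, and maintenance scenario are illustrative assumptions, not a prescribed design:

```python
# Hedged sketch of the "planning model + external tools" pattern.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_maintenance_history",   # hypothetical internal API
        "description": "Fetch recent maintenance records for a machine.",
        "parameters": {
            "type": "object",
            "properties": {"machine_id": {"type": "string"}},
            "required": ["machine_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",                      # assumed model name
    messages=[{"role": "user", "content": "Why is press 7 vibrating?"}],
    tools=tools,
)

# If the model decides a tool is needed, it returns a structured call that
# your application executes against the real backend system.
for call in resp.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(f"Planner wants {call.function.name}({args})")
```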
On the right side of the diagram, we see folks getting more sophisticated with output generation. This is an example of what’s called “LLM as a judge,” where content is generated twice (or more), and then another large language model judges which of the outputs is the best to provide the end user. This content could be anything: a response to a customer, a maintenance procedure, a piece of code, or even a recommendation for advertising content.
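A hedged sketch of that “LLM as a judge” pattern: sample two candidate answers, then make a second call to pick the better one. The model name and prompts are assumptions:

```python
# "LLM as a judge": generate candidates, then have a model pick the winner.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, temperature: float = 0.0) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                  # assumed model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

question = "Draft a maintenance procedure for replacing a worn drive belt."
# Higher temperature so the two candidates actually differ.
candidates = [ask(question, temperature=0.9) for _ in range(2)]

verdict = ask(
    "Pick the better answer to the question below. Reply with exactly A or B.\n"
    f"Question: {question}\nAnswer A: {candidates[0]}\nAnswer B: {candidates[1]}"
)
best = candidates[0] if verdict.strip().startswith("A") else candidates[1]
print(best)
```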
We’re also starting to see RAG being used to augment real-time applications. This is where the bleeding edge of this technology is emerging.
**Moderator (Andy Ellicott):**
Great. Since the topic of the webinar is vector search and knowledge graphs, how do you see knowledge graphs joining up with vectors at the database level? Where are you seeing that happening?
**Rowan Curran:**
I think knowledge graphs are crucial for many of these next-generation AI use cases. There isn’t a particular use case where they’re necessarily bad, but there are specific use cases where they’re more evidently useful and valuable today. For example, information retrieval, product planning and development, or research in healthcare and biosciences are areas where retrieval-augmented generation combined with knowledge graphs can provide additional context. Knowledge graphs allow you to give models better grounding, improving the quality of their outputs.
**Adam Hevenor:**
I always think about it from a data infrastructure standpoint and in terms of what data is available. If you have a known set of structured data, that’s when you’ll want to use a graph. A typical example is when you know that a particular airport has a flight route to another airport. In such cases, graphs give you a really powerful query language for traversing the data. Graph data can be very useful in AI, but it requires structured data.
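As a sketch of what that query language looks like, here is the flight-route example in Gremlin via the gremlinpython client (Apache TinkerPop), which works against Gremlin-compatible stores such as Aerospike Graph; the endpoint URL and property names are assumptions:

```python
# Hedged sketch: traversing flight routes in a property graph with Gremlin.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")  # assumed endpoint
g = traversal().withRemote(conn)

# All airports reachable from JFK with exactly one connection.
one_stop = (
    g.V().has("airport", "code", "JFK")
         .out("route")          # direct destinations
         .out("route")          # one hop further
         .dedup()
         .values("code")
         .toList()
)
print(one_stop)
conn.close()
```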
**Moderator (Andy Ellicott):**
So how do graphs and vectors work together in a database environment? Adam, do you want to speak to that?
**Adam Hevenor:**
Sure. Starting from unstructured data, vectors give you a statistical representation of any piece of unstructured data, and you can use similarity search to find relevant data points. This is useful in cases like customer support or product recommendations, where you need to find relevant snippets of information from a large corpus of unstructured data.
On the other hand, graphs represent structured data, where relationships between entities are well-defined. The two approaches—vector search for unstructured data and graphs for structured data—can be used together to create multimodal AI applications. This allows you to get the best of both worlds: unstructured data exploration using vector search and precise structured data queries using graph databases.
Aerospike, for instance, allows you to combine key-value data, vector search, and graph data all in one system. This multimodal approach is increasingly important for enterprises looking to integrate various data types into their AI infrastructure.
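Putting the two together, a hedged sketch of the combined pattern: vector search finds relevant unstructured snippets, then a graph traversal pulls in structured facts about the entities they mention. The `vector_index` handle, `embed` function, and their APIs are hypothetical; `g` is a Gremlin traversal source as in the earlier sketch:

```python
# Hypothetical GraphRAG-style retrieval: vector search seeds the context,
# graph traversal enriches it with structured relationships.
def graph_rag_context(question: str) -> list[str]:
    # 1. Similarity search over unstructured docs (hypothetical index API).
    hits = vector_index.search(embed(question), top_k=3)
    context = [hit.text for hit in hits]
    # 2. For each entity a hit mentions, pull its graph neighborhood.
    for hit in hits:
        neighbors = (
            g.V().has("entity", "name", hit.entity)
                 .both()                # related entities, either direction
                 .values("name")
                 .toList()
        )
        context.append(f"{hit.entity} is related to: {', '.join(neighbors)}")
    return context
```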
**Moderator (Andy Ellicott):**
We’re nearing the end here, so before we jump into Q&A, let’s get some closing thoughts. Rowan, for anybody starting their AI application development journey, what would you suggest? What skills should they acquire, and what resources would you recommend to speed up their learning curve?
**Rowan Curran:**
First and foremost, it’s crucial to get your data in order. That’s always a priority, but it’s especially important when building a RAG architecture. Having a bunch of unstructured data is great, but having it well-tagged, well-governed, and with good metadata is even better. That will make it easier to embed and retrieve the right content for your AI models.
You should also consider how you chunk up your data. If you have large documents, breaking them down into smaller, semantically meaningful chunks is essential for getting good results from vector search. For example, in a legal context, you might chunk documents by clause to ensure more granular retrieval of relevant sections. Different chunking strategies will work better for different use cases.
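One simple chunking strategy, sketched below: split on paragraph boundaries, then pack paragraphs into chunks under a size budget, with a small overlap so context isn’t lost at the seams. The size and overlap values are arbitrary starting points; clause- or section-based splitting often works better for domains like legal text:

```python
# Hedged sketch of paragraph-based chunking with overlap.
def chunk(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) > max_chars:
            chunks.append(current)
            current = current[-overlap:]   # carry a tail of the previous chunk
        current = (current + "\n\n" + p).strip()
    if current:
        chunks.append(current)
    return chunks
```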
Lastly, understand that building a successful RAG architecture is a process. You may need to fine-tune your model over time to achieve optimal accuracy. While it’s not as intensive as training a model from scratch, it still requires effort.
**Adam Hevenor:**
To add to Rowan’s points, if you’re thinking about starting an AI project at the enterprise level, begin by evaluating your data. Determine how you can leverage both your unstructured and structured data effectively. On the developer side, if you want to start experimenting with these technologies, there are plenty of resources available. Hugging Face is a great place to explore open-source models, and GitHub has many example applications you can try. Experimenting with small projects will help you understand the tools better.
A fun way to start is by building something personal—like a photo search engine for your personal photos. I did something like this to build a local image search for my vacation photos, using an OpenAI image embedding model. By playing around with personal projects, you’ll get familiar with the tools and be able to think more strategically about how to apply them in your enterprise context.
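A hedged sketch of that kind of personal project. It substitutes OpenAI’s open-source CLIP model (served here through sentence-transformers), which embeds images and text into the same space so you can search photos with plain-language queries; the folder name is an assumption:

```python
# Toy personal photo search: embed photos once, then query with text.
from pathlib import Path

import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # CLIP: joint image/text space

photos = list(Path("vacation_photos").glob("*.jpg"))  # assumed folder
photo_vecs = model.encode([Image.open(p) for p in photos],
                          normalize_embeddings=True)

def search(query: str, k: int = 3) -> list[Path]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = photo_vecs @ q                   # cosine similarity
    return [photos[i] for i in np.argsort(-scores)[:k]]

print(search("sunset over the beach"))
```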
**Moderator (Andy Ellicott):**
Unfortunately, we’re out of time. Thanks again, Rowan and Adam, for sharing your expertise today, and thanks to our audience for joining us. We’ll follow up on any questions we couldn’t get to, and we’ll make sure you receive the slides and recording soon. We hope to see you again at a future webinar. Thanks, everyone!
About this webinar
When vector search is combined with other search methods, it can greatly improve the capabilities and accuracy of AI applications. In this webinar, industry experts explain how to combine vector search, knowledge graphs, and key-value queries in different database application architectures (e.g., RAG, GraphRAG) to address popular use cases like recommendation engines, personalization, fraud detection, and others.