From data to insights: Visualizing Aerospike Graph with G.V()
Ishaan Biswas:
Hello and welcome. Thanks for joining this webinar, From Data to Insights: Visualizing Aerospike Graph with G.V(). I'm Ishaan Biswas, Director of Product Management at Aerospike. I'm responsible for Aerospike Graph. I'm also joined here by Arthur from G.V(). I'll let him introduce himself here.
Arthur Bigeard:
Hi everyone. Hi Ishaan. Thanks for having me. As Ishaan mentioned, my name's Arthur Bigeard. I am the founder of G.V(). Originally, I come from a background of identity and access management, where I developed a bit of a passion for graph databases that I've then poured into the making of the G.V() software that we'll be presenting today. With that said, do you want to tell us a bit more about Aerospike Graph today?
Ishaan Biswas:
Sure. In this webinar, we'll talk about how to harness the full potential of Aerospike Graph, especially when it comes to visualizing your graphs. We'll talk through some examples of how to build applications using a graph database. Whether you're a developer, an architect, or a data professional, this webinar will equip you with the knowledge and resources necessary to leverage these two tools effectively to build your graph applications. Aerospike Graph is a high-performance graph database that's built for really large data volumes, and it provides predictable low latency and extreme query throughput with an industry-leading total cost of ownership. This is the architecture of Aerospike Graph. The key point here that people might appreciate is that Aerospike Graph is built in a way where you can independently scale compute and storage for maximum efficiency of resource usage, either in the cloud or on-prem. You can run your queries, especially your online transactional processing queries, in real time at any scale. Scale goes along two dimensions, whether it's extreme throughput or extremely large data volumes. The really nice thing is that you can now leverage Gremlin as the query language to talk to Aerospike Graph and express your business logic using Gremlin. We'll talk through several examples of this in this webinar.
With that, I'll hand it back to Arthur to talk about G.V() a little bit and share its key capabilities and features.
Arthur Bigeard:
Thanks Ishaan. What I'll do to introduce G.V() here is I'm just going to share my screen so you can visually see what it looks like. But just to get started with it, G.V() is a Gremlin integrated development environment. What that means is it's a tool that you can use to connect to your database, Aerospike Graph in particular, in order to query, visualize and edit your data. The idea is that as a developer or a technical user, when you're connecting to a database, the first thing that you want to do is use a tool that will allow you to better understand how it works and how it looks.
G.V() is a software executable. You can install it on Windows, macOS, or Linux. If you're familiar with database IDEs, you can think, for instance, of MySQL Workbench or Microsoft SQL Server. It's software that delivers the same type of features, but for Gremlin and Apache TinkerPop-based databases. Just to give you a quick example here, I'm just going to demo a very small data set, and we're going to go over some very basic features of G.V(). For instance, here I've got a small graph called TinkerPop Modern that has a little bit of data. One of the things that you can do immediately with G.V() is visualize the data model for your data. The idea is that as an engineer, when you're working on a graph database, you're going to want to design your data model and you're going to want to visualize how it looks.
For instance, here we've got a small and simple data set with two vertices and a couple of edges. Now with G.V(), the most basic thing you can do, of course, is to just write your queries and query your data. I'm just going to do a really, really simple example here. I've got a little query editor opened right there, and I've got a little query that I'm going to type a little bit on and apologies in advance for any typos. But as you can see, as I'm typing my query, G.V() will do a number of things. First of all, it will provide suggestions to the user so they know what available Gremlin steps they can use during their query, along with a little handy documentation there as extracted from the Gremlin documentation to help users write their queries with all the information available about the different steps that they're going to be using and how they work.
Now, on top of that, as you're writing your queries, G.V() is capable of making context-aware suggestions that are based on the data model that you're using so that as you're writing it, you don't have to guess the name of your labels or your property keys. G.V() is just going to offer you all that right there. For instance, here, I'm going to type a really, really simple query. I'm just fetching our person vertices and the edges connecting in and out of them. As I'm running the query, you can see that it's going to return a graph visualization, which is of course really useful with graph databases, and this really handy visualization is one of the reasons why graph databases are so useful. In this visualization, there's a whole lot of things that we can do, and I'm not going to go over them entirely right now, but you can quickly view the properties on your elements or edit them. For instance, here, I can change that to Python, save that, and edit it directly against the database. I can also search the graph.
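As an illustration of what that simple query asks for, here is a minimal sketch in plain Python over a hand-copied version of the TinkerPop Modern sample graph. It is not how G.V() or Aerospike Graph executes anything; the helper names `person_vertices` and `both_edges` are made up for this sketch. In Gremlin the shape would be roughly g.V().hasLabel('person').bothE().dedup(), assuming the demo query looked like that.

```python
# Toy in-memory copy of the TinkerPop "modern" sample graph.
vertices = {
    1: {"label": "person", "name": "marko", "age": 29},
    2: {"label": "person", "name": "vadas", "age": 27},
    3: {"label": "software", "name": "lop", "lang": "java"},
    4: {"label": "person", "name": "josh", "age": 32},
    5: {"label": "software", "name": "ripple", "lang": "java"},
    6: {"label": "person", "name": "peter", "age": 35},
}

# Each edge is (out_vertex_id, edge_label, in_vertex_id).
edges = [
    (1, "knows", 2),
    (1, "knows", 4),
    (1, "created", 3),
    (4, "created", 5),
    (4, "created", 3),
    (6, "created", 3),
]


def person_vertices():
    """Ids of all vertices carrying the 'person' label."""
    return [vid for vid, props in vertices.items() if props["label"] == "person"]


def both_edges(vertex_ids):
    """Edges touching any of the given vertices, in either direction (each edge once)."""
    ids = set(vertex_ids)
    return [e for e in edges if e[0] in ids or e[2] in ids]


people = person_vertices()
incident = both_edges(people)

for out_v, label, in_v in incident:
    print(vertices[out_v]["name"], f"-{label}->", vertices[in_v]["name"])
```

The list of people plus their incident edges is exactly the material a graph view needs to draw the picture shown in the demo.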
Now this is a rather small graph, so there's not much need for actually searching it, but on larger visualizations that comes in really useful. It also comes with a number of other useful visualizations. For instance, a simple tabular view, which is something that everyone that deals with data will be used to on a day-to-day basis. On top of all that, there's a number of other features that are really useful. For instance, as you're running queries on G.V(), you'll get access to query history. You can go back to queries that you've written before, you can format your queries, you can translate them to a target language. You can also save your queries in G.V() so that you can access them later, and save them directly on your database so that users connecting their graph databases to G.V() gain access to the same reporting features. With all that in mind and G.V() offering all these features, I want to just swing back to Ishaan there and ask, how do you go about building a graph application with tools such as Aerospike Graph and G.V()?
Ishaan Biswas:
That's a very pertinent question. G.V() has been super helpful in doing many of these steps that we just talked about. What we've seen working with customers is there's usually four or five discrete steps when you're building a graph application. The first one is actually understanding your problem statement really well. So define the business process that you want to improve, introduce or make more efficient. That means understanding what questions you are going to ask of the data. Whether you're building a customer 360 application or an identity graph application, you need to have a very crisp definition of your problem statement.
Once you know that, you also then need to understand what data sources you have access to. Depending on the organization you're in, the levels of access you'll have will be different. You need to understand how you're going to get the data, and more importantly, what data you'll have access to. These two questions go hand in hand. The more data you have access to, the more questions you can ask of it. Once you know these two things, that will help. The next step is to define the graph schema. Typically, even without a graph database, the most natural thing people will do is just whiteboard the schema. The nice thing about graph databases is that whatever you're whiteboarding as a team about your schema, you can exactly replicate that in a tool like G.V() to create your schema and then fit your data set into that as well.
Once you know your business process, you know the data sources you have access to and you have a schema, you write out some queries, and this is just a further distillation of the first step. You write out your queries in the Gremlin query language and you start playing around with your data set. We'll look at some examples later on to see how simple this is to do using G.V(). This really helps hone in on the specific queries you're eventually going to run in your application. In this case, you might end up getting into an iterative cycle here, which is good because then you can modify your schema. You might think about new data sets or data sources you can add and so on.
Once you have these two things established, a graph schema and some queries, you need to create a sample data set, which you can script. There are third-party tools available to help you create data sets, or you can build your own tools or just script out a fake data generator, then run these sample queries and see functionally if things work out for you. Then of course, it's scale, test and iterate. You scale your data set to larger and larger data volumes. You test it to see if it's performant enough, and then you iterate. You find issues, whether in your schema, in your queries or in the database, and then you iterate till you reach your desired performance metrics.
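To make the "script out a fake data generator" step concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the function name `generate_sample_graph`, the property names, the edge-label spellings (`has_brewery`, `has_style`, `has_category`) and the sample style and category names are not taken from any real data set or tool. It borrows the beer and brewery model that is demoed later in this webinar.

```python
import random

# Hypothetical sample values; a real data set would have far more of each.
STYLES = ["Fruit Beer", "Pale Ale", "Stout", "Pilsner"]
CATEGORIES = ["German Lager", "British Ale", "Other Style"]


def generate_sample_graph(n_breweries, n_beers, seed=42):
    """Return (vertices, edges) for a fake beer graph of the requested size.

    Vertices are dicts with an id, a label and a few properties.
    Edges are (out_vertex_id, edge_label, in_vertex_id) tuples.
    A fixed seed keeps the output reproducible between test runs.
    """
    rng = random.Random(seed)
    vertices, edges = [], []

    def add_vertex(label, **props):
        vertex_id = len(vertices) + 1
        vertices.append({"id": vertex_id, "label": label, **props})
        return vertex_id

    style_ids = [add_vertex("style", name=s) for s in STYLES]
    category_ids = [add_vertex("category", name=c) for c in CATEGORIES]
    brewery_ids = [
        add_vertex("brewery", name=f"Brewery {i}", city=f"City {i % 5}")
        for i in range(n_breweries)
    ]

    for i in range(n_beers):
        beer_id = add_vertex(
            "beer", name=f"Beer {i}", strength=round(rng.uniform(3.0, 9.0), 1)
        )
        # Every edge points out of the beer, towards what describes it.
        edges.append((beer_id, "has_brewery", rng.choice(brewery_ids)))
        edges.append((beer_id, "has_style", rng.choice(style_ids)))
        edges.append((beer_id, "has_category", rng.choice(category_ids)))

    return vertices, edges


vertices, edges = generate_sample_graph(n_breweries=10, n_beers=50)
print(len(vertices), "vertices,", len(edges), "edges")
```

Scaling the test is then a matter of raising `n_breweries` and `n_beers`; how the generated rows get loaded into the database is a separate step that this sketch leaves out.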
That's in general what we've seen people do. Arthur, can we go through a data modeling exercise? I believe you have a data set or a data model that you can walk us through.
Arthur Bigeard:
Absolutely. Obviously today our time is a little bit short and we're not going to go through the entire process, so what we've got for you today is an Aerospike Graph instance here that I've got running locally, loaded with some sample data. Now as far as the steps go, we're going to jump straight to the querying stage. But before we do that, I'm just going to quickly present the data set that we're going to be working with. The data set that we're working with today is really simple. It is a collection of beers, breweries, styles, and categories. You can see here in the graph view, the relationships are really straightforward. For instance, a beer is brewed by a brewery, so it has a brewery. A beer also has a category and a style. Along with all this, we've got a number of properties to describe these different elements.
For instance, a beer has a name, a description, and a strength. A brewery similarly has an address, a name, a city or a country. Style and category both have names. Before we jump into querying, I just want to quickly show you some exploration of our data set. We've loaded this Aerospike Graph instance with a little bit of sample data, and we can use G.V() to visualize that data interactively. What I'm going to do here is I'm just going to open our Graph Data Explorer. Now, the Graph Data Explorer is going to allow users to visualize their data right away without having to write a Gremlin query. You can see here it's loading some of the data that's available on the database, and we can visually see how the data relates to one another.
For instance, here in red are breweries and in orange are beers. If I zoom in, for instance, on a specific brewery, and I'll pick this one right here, we can see that this brewery here, F.X. Matt Brewing, is brewing a number of beers. We've got quite a few there. Similarly, if we look at, for instance, the German Lager category of beer, we can see here that we've got a few that are available with that category, or also that this specific one has the Traditional German-Style Bock style, which I don't know much about, but I'm sure exists. Now with this Graph Explorer view, we can also load some data interactively. We're just going to quickly do a little check here. We can load some data, load some relationships, and see how your data correlates.
Now with all that being said, at this stage of the process, we have a sample data set and we have a data model that is available again right here. Why don't we just try and write some queries, following on from the steps that Ishaan outlined earlier. One of the things that Ishaan mentioned there was to try and write down your queries, not in the Gremlin language, but rather just verbally. What is your query trying to achieve? What we can do in G.V(), which is quite cool, is there's an OpenAI integration that allows you to generate those Gremlin queries from those text prompts. The idea is that to start our design process with our queries, rather than starting from scratch and writing a query from zero, we can get a little hand from ChatGPT to start generating our query, test it out, and see how it works. What I'm going to do is I'm just going to run a little ChatGPT prompt there on G.V(). It's going to generate a query from the prompt based on the data model of the database, and we'll see how that works out.
Let me just enter this just now. What we're going to say is, our beers have styles and categories, but what if I want to find out all the beers and the breweries that make beers of a specific style? For instance, there's a style called the fruit beer, which as the name suggests is a style of beer that contains fruit. Now if I want to find all the breweries that make fruit beers, I can type a little prompt here and see what sort of query we get out of it. Let me just type this out real quick. I'm going to go for: for all beers of style fruit beer, find the breweries that make them. I'm just going to go ahead and generate the query right here. I'm going to give ChatGPT a second to think about it, and there we go. As soon as it's done, it's going to give us a little query there.
I'm just going to jump straight in and run it to see what happens. There we go. We found a number of breweries here. Now, by default it's going to our graph view, but I think in this instance what makes more sense is possibly to go to this Vertex view here, which is just going to show all of these breweries in a really simple to understand table format. For instance, if I look at it right here, those are the breweries in our data set that create fruit beers. If I scroll a little bit, we can get the names of the breweries. You can also see the descriptions right there, and you can see the websites. Really quick, really easy. We've just got our data set, we've got our data model, we've run a little text prompt, and what we've got out of it is a Gremlin query that works.
There's a number of different outputs that we can use here, but I believe in this instance, the Vertex view will be the most useful. However, if we want to slightly modify our query and say just get the names of the breweries rather than the full graph visualization, we can do that. We can just modify the queries and make sure that it just returns the name property. I'm just going to quickly change my query here, just like that. I'm going to rerun it and there we go. Those are the actual names of our breweries right there.
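The generated query itself is not shown in the transcript, so the following is only a sketch of the traversal it describes, written in plain Python over a toy graph: start at the fruit beer style, walk back to the beers that have it, walk out to their breweries, then return either the whole brewery vertices or just their names. The vertex data, the edge-label spellings (`has_style`, `has_brewery`) and the helper names are all assumptions. In Gremlin this would look roughly like g.V().hasLabel('style').has('name', 'Fruit Beer').in('has_style').out('has_brewery').dedup(), with .values('name') appended for the names-only variant.

```python
# Toy data; every name and label here is invented for illustration.
vertices = {
    "s1": {"label": "style", "name": "Fruit Beer"},
    "s2": {"label": "style", "name": "Stout"},
    "b1": {"label": "beer", "name": "Cherry Ale"},
    "b2": {"label": "beer", "name": "Raspberry Wheat"},
    "b3": {"label": "beer", "name": "Night Stout"},
    "b4": {"label": "beer", "name": "Peach Sour"},
    "w1": {"label": "brewery", "name": "Brewery A", "website": "http://a.example"},
    "w2": {"label": "brewery", "name": "Brewery B", "website": "http://b.example"},
    "w3": {"label": "brewery", "name": "Brewery C", "website": "http://c.example"},
}

# Each edge is (out_vertex_id, edge_label, in_vertex_id).
edges = [
    ("b1", "has_style", "s1"), ("b1", "has_brewery", "w1"),
    ("b2", "has_style", "s1"), ("b2", "has_brewery", "w1"),
    ("b3", "has_style", "s2"), ("b3", "has_brewery", "w3"),
    ("b4", "has_style", "s1"), ("b4", "has_brewery", "w2"),
]


def follow_out(ids, label):
    """Follow edges with this label from their out vertex to their in vertex."""
    return [i for v in ids for (o, l, i) in edges if o == v and l == label]


def follow_in(ids, label):
    """Follow edges with this label backwards, from in vertex to out vertex."""
    return [o for v in ids for (o, l, i) in edges if i == v and l == label]


def dedup(items):
    """Drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(items))


fruit_styles = [
    vid for vid, props in vertices.items()
    if props["label"] == "style" and props["name"] == "Fruit Beer"
]
fruit_beers = follow_in(fruit_styles, "has_style")
brewery_ids = dedup(follow_out(fruit_beers, "has_brewery"))

# Variant 1: full vertices, the kind of rows a table view would show.
brewery_vertices = [vertices[vid] for vid in brewery_ids]
# Variant 2: only the name property, as in the modified query.
brewery_names = [vertices[vid]["name"] for vid in brewery_ids]

print(brewery_names)
```

Returning only the name is a one-step change at the end of the traversal, which is why the edit in the demo is so small.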
Now, why don't we just go ahead and give another text prompt a try and then see if we can better understand our query, run through it and debug it. That way you can see how G.V() helps you not only design your queries, but also troubleshoot them and make sure that you understand how they run. What I'm going to do is I'm just going to type out another prompt here. Similar style. We're just looking to leverage the relationships in the graph to produce a query that makes good use of traversing the data. I'm going to say, let's see, list all the styles of beers that the Thirsty Dog Brewing brewery brews. Sorry, it's a bit of a tongue twister, this one. Let's give it a try and let's see what happens here.
Again, we're going to give it a second for the query to generate. There we go. That's our query generated. We're going to try and run it again, see what happens. We've got nothing. Now, this is interesting, right, because in our dataset, surely all of our breweries are making beer. We don't have any breweries that don't make anything. But what we're going to do is we're going to use G.V()'s query debugging features to try and better understand if something went wrong here and if we've got the right query. G.V() allows you to run through your query step by step and this is really useful because if you write a query and you're not sure if perhaps something has gone wrong because you're not getting the result you're expecting, you can use this query debugging tool to just step through the query.
What we've got here is the query broken down into steps. I'm just going to take it step by step quite simply. The first step is going to be g.V().hasLabel('brewery'), and if you're not familiar with the Gremlin query language, it's a very basic query that just says find me all the vertices with the brewery label. What we've got here is a little graph visualization that represents just that. We've got a total of 1,329 breweries in our data set. Now at this point, there's no surprise. This is just working. The next step is to then filter these breweries down to just the Thirsty Dog Brewing company. If you go and click on that step, you'll see, once again, no surprise, we've got our Thirsty Dog Brewing vertex returned by the query. Our next step is going to be to navigate the has brewery relationship.
I'm just going to click here, and there's the problem. Our relationship here is not leading to our actual beers. You see here, it's navigating out from the brewery to the beers. Now, there's a few ways we can troubleshoot this, but first of all, let's go back to our data model. If we go back to our data model, we can see that the relationship between beer and brewery is has brewery, but we can see also that the beer goes out to the brewery, not the other way around. If we go back to our query, we can actually see this in G.V() quite easily. If I just trigger an autocomplete here, you'll see that it suggests the label, but it's not highlighted. Whenever G.V() makes a suggestion, if it makes sense within the context of the data, if the relationship exists, then the suggestion is going to be highlighted.
For instance, here you can see that the has brewery relationship between our brewery and beer does exist, and the problem was that it was an in relationship rather than an out relationship. What we can do now is just resume our debugging process. The last time we got this far, so now we're going to go back to the step that we've just modified to see how much further we can get. There we go. Now we've got some beer. At this point of the query, we've successfully found all of the beers made by the Thirsty Dog Brewing company, and the next step is that we want to find the styles of these actual beers. So we're just going to go over to the next step here and we'll see, once again, the same issue. We're not seeing the data that we want.
We're just going to speed this up a little bit, but if you go again here, you'll see that the relationship is once again in the wrong direction. We're just going to quickly fix that and confirm. As you can see, the beer at this stage of the query has three different relationships that are all available: has brewery, has category, has style. The issue once again was the direction of the edge. One quick mention here as well, by the way, is that when you're writing your query, you can easily see the data model on the right-hand side, and it just helps confirm that what you've written matches what is actually in your data model. But let's just go back to our debugging once more. We're now up to this stage here and we'll see that we've got 14 different styles of beers that are returned by this query. What that means is, going back to this one, we've got 14 beers, I believe. 17, apologies. 17 beers made by this brewery, of 14 different styles.
We're just going to continue stepping through our query because it's going smoothly now, and we're starting to get the results that we're expecting. The next step is to get our style name from the style vertex. We'll see here that we've got our list. Now, we will notice quickly there that we've got some duplicates, and this is where this little step here is going to come in handy and make sure that we've got a unique list of beer styles returned, which you can see right there. There we go. This is a query that's just been generated initially by ChatGPT based on the data model that G.V() understands from your Aerospike Graph data. Then with G.V()'s tools, if the query does not meet our criteria, which in this case it didn't because it didn't actually return the data that we wanted, we can easily debug it and find out what went wrong.
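The debugging session above can be sketched as a tiny step-through runner in plain Python. This is not G.V()'s debugger, and the real generated query is not shown in the transcript, so the two step lists below are reconstructions from the narration. The toy data, the edge-label spellings (`has_brewery`, `has_style`) and every helper name are assumptions. The point it illustrates is that both edges point out of the beer, so a traversal starting at the brewery has to go in along has brewery and then out along has style.

```python
# Toy data; only the brewery name comes from the demo, the rest is invented.
vertices = {
    "w1": {"label": "brewery", "name": "Thirsty Dog Brewing"},
    "w2": {"label": "brewery", "name": "Some Other Brewery"},
    "b1": {"label": "beer", "name": "Beer One"},
    "b2": {"label": "beer", "name": "Beer Two"},
    "b3": {"label": "beer", "name": "Beer Three"},
    "b4": {"label": "beer", "name": "Beer Four"},
    "s1": {"label": "style", "name": "Stout"},
    "s2": {"label": "style", "name": "Pale Ale"},
}

# Each edge is (out_vertex_id, edge_label, in_vertex_id): beer -> brewery, beer -> style.
edges = [
    ("b1", "has_brewery", "w1"), ("b1", "has_style", "s1"),
    ("b2", "has_brewery", "w1"), ("b2", "has_style", "s1"),
    ("b3", "has_brewery", "w1"), ("b3", "has_style", "s2"),
    ("b4", "has_brewery", "w2"), ("b4", "has_style", "s2"),
]


# Each helper returns a function from the current list of items to the next one.
def has_label(label):
    return lambda ids: [v for v in ids if vertices[v]["label"] == label]

def has(key, value):
    return lambda ids: [v for v in ids if vertices[v].get(key) == value]

def out(label):
    return lambda ids: [i for v in ids for (o, l, i) in edges if o == v and l == label]

def in_(label):
    return lambda ids: [o for v in ids for (o, l, i) in edges if i == v and l == label]

def values(key):
    return lambda ids: [vertices[v][key] for v in ids]

def dedup():
    return lambda items: list(dict.fromkeys(items))


def step_through(steps):
    """Run the steps in order, recording how many items survive each one."""
    current = list(vertices)
    trace = []
    for name, step in steps:
        current = step(current)
        trace.append((name, len(current)))
    return current, trace


# As generated: both edge directions are the wrong way round.
generated = [
    ("hasLabel brewery", has_label("brewery")),
    ("has name", has("name", "Thirsty Dog Brewing")),
    ("out has_brewery", out("has_brewery")),
    ("in has_style", in_("has_style")),
    ("values name", values("name")),
    ("dedup", dedup()),
]

# After the fix: in along has_brewery, then out along has_style.
corrected = generated[:2] + [
    ("in has_brewery", in_("has_brewery")),
    ("out has_style", out("has_style")),
] + generated[4:]

broken_result, broken_trace = step_through(generated)
fixed_result, fixed_trace = step_through(corrected)

for name, count in fixed_trace:
    print(f"{name}: {count}")
print(fixed_result)
```

Looking at the per-step counts makes the fault obvious: in the generated version the count drops to zero at the first edge step and stays there, which is the same signal the step-by-step view gave in the demo.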
With all that mentioned, I think what we can do now is probably get back to some of the other features that G.V() offers. I briefly mentioned that the IDE allows you to view the documentation, so you can see here that as I'm stepping through the query, I can also see the Gremlin documentation available on the right-hand side. It's a great way of just confirming that we understand, as we're writing the query, how the language works and what we're to expect from the language. There you go. Once again there's a description of our vertex steps. All this really simplifies the process of designing your queries.
With all that demonstration done, I think what we can do is wrap up and see if anyone's got any questions about all of this. Thanks for joining on the demo.
Ishaan Biswas:
That was super helpful, Arthur. One thing I've learned is you have a potential career running a bar in the future. You can stock the required beers for that season.
Arthur Bigeard:
I should maybe make a data set for gin distilleries. There's quite a lot in Scotland where I'm based, but there's also a fair amount of beers. I maybe should revise this data set. I don't think it's representing my hometown too well.
Ishaan Biswas:
So if this G.V() thing doesn't go too well, you know what your next step is. One question here is, in this data set, how do I know how many breweries I have, how many styles I have, and so on? Is there an easy way to find that?
Arthur Bigeard:
That's a good question. The cool thing about Aerospike Graph is that it actually provides something called a graph summary API. The thing with graph databases, they're quite often unconstrained, and that means that users can insert any data. But the disadvantage of that is it makes it difficult sometimes to track the data model. And what Aerospike Graph does is it gives us this graph summary API that actually keeps track of the data in the database and indicates quite easily what is available in it.
You can see here, for instance, a raw output of the graph summary as displayed by Aerospike Graph, and it will tell us a lot of useful information. Now of course, in the context of G.V(), because we integrate directly with Aerospike Graph, you can actually see all this visually from the data model editor. What you're seeing here on the screen is leveraged directly from the Aerospike Graph summary API, and that allows G.V() to load the model for your database instantly without having to actually discover the data. For instance, in this example, you can see that there's a total of a bit over 5,000 beers in our data set and a bit over 1,000 breweries, and you can also see the counts of relationships. For instance, we've got over 4,000 has style relationships.
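To show the kind of information such a summary conveys, here is a minimal sketch in plain Python that counts vertices and edges per label over a toy graph. It is not the Aerospike Graph summary API and does not reproduce its output format; the function name `summarize` and the data are made up. A generic Gremlin way to get comparable counts by scanning the data would be g.V().groupCount().by(label), which is the slower discovery approach that a precomputed summary avoids.

```python
from collections import Counter

# Toy graph: (vertex_id, label) pairs and (out_id, edge_label, in_id) triples.
vertices = [
    ("b1", "beer"), ("b2", "beer"), ("b3", "beer"),
    ("w1", "brewery"), ("w2", "brewery"),
    ("s1", "style"),
    ("c1", "category"),
]
edges = [
    ("b1", "has_brewery", "w1"), ("b2", "has_brewery", "w1"), ("b3", "has_brewery", "w2"),
    ("b1", "has_style", "s1"), ("b2", "has_style", "s1"),
    ("b1", "has_category", "c1"),
]


def summarize(vertices, edges):
    """Count vertices and edges per label, the core of a data-model overview."""
    return {
        "vertex_counts": dict(Counter(label for _, label in vertices)),
        "edge_counts": dict(Counter(label for _, label, _ in edges)),
    }


summary = summarize(vertices, edges)
for kind, counts in summary.items():
    print(kind, counts)
```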
Ishaan Biswas:
Cool. The next question I see here is how do I get started with Aerospike Graph and G.V()?
Arthur Bigeard:
That's a good point. First of all, we're partnering. G.V() and Aerospike Graph are in a partnership, and the goal of this partnership is to ensure that Aerospike Graph users get the best value out of the database. To do that, what we're offering along with Aerospike Graph's own 60-day trial is a free 60-day trial of G.V() as well. That means that if you want to start with graph databases for free and get plenty of time to play with them with the right tooling, you can sign up for both a free trial of Aerospike Graph for 60 days and a free trial of G.V() for 60 days. You can download G.V() from the G.V() website, which is gdotv.com, and you can get your Aerospike Graph trial from the Aerospike website.
Ishaan Biswas:
Cool, thanks, Arthur. The next question is, are there other tools that Aerospike Graph integrates with? I think this is in the context of visualization. I can take this question. There are a lot of open source libraries available out there that you can easily integrate with Aerospike Graph to visualize your graphs, given that we integrate with the Gremlin query language. That part's not very hard, but as we've seen, G.V() is a much more advanced and sophisticated tool in that you can do many parts of your application development process using G.V(). There are certainly other tools that you can also use, with different levels of functionality, depending on the needs of your application and your development environment.
Arthur Bigeard:
Not to ring our own bell here at G.V(), but we like to think that G.V() is the most feature-rich Gremlin IDE that you'll see out there. If you really want to get the best value out of your Aerospike Graph database, look no further than G.V() because it will offer the most features you can get out of any of the tools available out there.
Ishaan Biswas:
Similarly, I'll turn back to you, Arthur, for the next question. What are the most common use cases you see people using G.V() for?
Arthur Bigeard:
This is an interesting one because different customers have different problems. You have some companies, for instance, that are just getting started with graph databases. Then you have other companies that have been doing graph databases for a very long time. Where I find it really interesting is these companies that are trying to get started with graph. For instance, I've got a customer that has used G.V() to do the entire end-to-end of their graph project. They've gone from ideating their graph database all the way to presenting it to stakeholders, creating their queries and creating their production environments around it.
Typically my favorite situation is where I've had customers that tell me, hey, we've got our graph database. We've started using it with G.V(). We've designed our data model using G.V(). We started playing with our graph on G.V(), visualizing it. We then used all of that information, all these cool visualizations that you see there, to present it to, for instance, our external stakeholders.
Folks that are not necessarily technical might not understand the concept of graph data so easily, and they've been able to use G.V() to just represent quite visually what a graph is. That's the great thing: you've got it right there under your eyes. This is what a graph is, so when you're trying to present graph data, for instance, to a non-technical audience, it becomes a great presentation tool to use. But on top of all that, engineers get to design their entire queries through G.V(). They can also optimize them with our profiling and debugging functionality, so they've got a lot that they can do here.
Ishaan Biswas:
Thanks Arthur. Just to add to that, the main reason this partnership exists between Aerospike and G.V() is that we think G.V() has very nice complementary capabilities that help Aerospike Graph, and vice versa. Aerospike Graph provides really high-performance graph queries for large datasets. But to get to that point, you have to first build your application and do demos to people in your organization, and that's where G.V() comes in to kickstart your application development process.
Arthur Bigeard:
That's right, and this is just a simple partnership of Aerospike providing the database and G.V() providing the IDE, so you get that full developer experience where you get the database and the IDE to develop with it.
Ishaan Biswas:
I think that's all the time we have for questions. That was From Data to Insights: Visualizing Aerospike Graph with G.V(). We hope you enjoyed this session and we look forward to the next one. Thanks.
Arthur Bigeard:
Thanks everyone.
About this webinar
This webinar explores how our high-performance graph database, Aerospike Graph, works with G.V(), an all-in-one Gremlin Integrated Development Environment (IDE). Experts Ishaan Biswas, Director of Product Management at Aerospike, and Arthur Bigeard, founder of gdotv, Ltd., share how this powerful combination of tools empowers organizations to meet today’s demands for real-time insights with seamless scalability.
In this webinar, you will learn:
How to harness the full potential of Aerospike Graph, with best practices for graph data modeling and optimizing Gremlin queries.
Setup, data exploration, Gremlin query debugging, and powerful data visualization using G.V().
Why G.V() is an ideal tool for developers and architects to get started with Aerospike Graph and unlock the potential of their enterprise-scale datasets.
Whether you’re a developer, architect, or data professional, this webinar will equip you with the knowledge to leverage these tools effectively for your data-driven applications. Watch the webinar on-demand now.