Cracking the code: Defeat signal loss in AdTech with graph databases
George Demarest:
Welcome to the webinar, Cracking the Code: Graph Databases in the Age of Signal. We're going to be talking about graph databases and we're going to be talking about AdTech. And for that panel discussion, we have two experts with us from Aerospike. First is Daniel Landsman, global director of AdTech at Aerospike. Welcome, Daniel.
Daniel Landsman:
Thanks, George. Great to be here. Looking forward to the conversation.
George Demarest:
And we have Ishaan Biswas, director of product management for Aerospike Graph. Ishaan is one of those responsible for bringing Aerospike Graph to market, working on the product, working on go-to-market. So welcome, Ishaan.
Ishaan Biswas:
Hey, George. Hey, Daniel. Nice to be with you guys here.
George Demarest:
So Cracking the Code, it's a good title because there's a lot of code and a lot of acronyms and whatnot in the AdTech space and we're happy to have Daniel with us here to crack those codes and to talk about the topic. So first, Daniel, why are identity graphs important in marketing and advertising in AdTech?
Daniel Landsman:
Yeah, I think it's important to give a little context as to how we got here and what I've been a part of in the industry up to this point to give everyone a little more historical context. So I've been in the industry for over a decade now, helped grow and scale a couple mobile programmatic exchanges. I've been with Aerserv for around two years or so and have really seen the evolution of the marketplace. I think it's definitely something that's of focus right now for many different AdTech providers due to the fact that there's signal loss. Actually read a statistic the other day that said 47% of the open internet is already unaddressable because of things like Safari deprecating cookies and Apple deprecating device IDs. With Google coming forward and saying that they're going to deprecate cookies, it's going to make for some large changes in the marketplace.
It's really important to actually maintain the capability to be able to target users in the open internet. I actually did a webinar a little while back with a partner of ours and we discussed my personal approach to taking care of that. I think there's going to be a link to that a little bit later on, but before we get too deep into it, I want to have Ishaan come in here and talk a little more about the technology and the graph data model that we believe is actually going to be able to really help everybody continue to address audiences at scale for the open internet. Ishaan?
Ishaan Biswas:
Yeah, thanks Daniel. As Daniel just said, what's happening is that the predictable signals for identifying users are going away or decreasing a lot. So more and more companies are starting to look at probabilistic models rather than the more deterministic ones that you have with cookies. So with that, what you can use a graph data model for is to get a holistic customer view. You can consolidate views from the different touch points that you have with your customers. You can do cross-device tracking or consolidate individuals from multiple signals you're seeing in the open internet. And this 360-degree view allows marketers to understand customer behavior more comprehensively, allowing them to tailor more personalized and effective ad campaigns. The other thing, as I mentioned briefly, is that you can use these graphs to do cross-device tracking even in the absence of cookies and deterministic signals. So you get a much more holistic view.
On the demand side, then you can do better attribution modeling and ad campaign measurement to optimize your marketing spend. So all across the board in the AdTech and martech space graphs really allow you to very simply model your customer behavior and customer journeys and optimize your ad spend and provide personalized advertising. So to summarize all of this, people have started to look at graph database technologies that can process enormous amounts of data and get access to that in milliseconds and do that with predictable latencies. That's what people really want.
George Demarest:
Thanks, Ishaan. I want to move to a second question. We have been talking about cookie deprecation and the coming cookie apocalypse and all that stuff. We've heard it for a few years now. Daniel, can you put into perspective what it means for the leading AdTech companies, practically speaking?
Daniel Landsman:
Yeah, I think it goes back to some of the things that Ishaan touched on. You have to focus on ROAS. I think that there's media mix modeling that's coming into the conversation a lot more. For those on the call who don't know what ROAS is, it's return on advertiser spend, a metric that you can track in order to understand how the dollars that you're spending are helping you grow top line. Something else to touch on as well, as Ishaan mentioned, is attribution. A lot of the time people have been using last-click attribution and working with multitouch attribution, which with signal loss is becoming ever more challenging. As Ishaan alluded to, the ability to leverage probabilistic models to understand the customer journey through their user flow and see how they're interacting with the ad creatives and the units that you are pushing out into the ecosystem, dependent on channel, is going to be really important.
And I think the companies who are most effective are going to be able to holistically look at that customer journey, pulling signals from everywhere and anywhere that they possibly can and what I like to call the layer cake approach, leaning into first party data and second party data and third party data, wherever you can get it as signal loss continues, then moving down the chain into things like probabilistic and attention. For the buy side explicitly, you have the ability to go wider, but with the attribution of these media impressions, it's important to continue to focus on the deterministic data.
The farther you get away from the deterministic data on the attribution front, the more speculative that measurement becomes. And doing things like panels or brand lift surveys I think can help fill those gaps. But either way, you have to continue to look at the maximum amount of data possible, and I personally believe if you do it in a shorter period of time, you can make more effective decisions and through making more effective decisions, you can change direction or optimize creatives faster at scale. So hope that helps clarify, George.
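As a rough illustration of the metrics mentioned above, the sketch below computes ROAS as attributed revenue divided by ad spend and contrasts last-click attribution with a simple linear multitouch split. The journey, channel names, and dollar amounts are made up for the example, and linear splitting is only one of several multitouch schemes; it is not presented here as the approach any speaker uses.

```python
from __future__ import annotations
from collections import defaultdict


def roas(revenue: float, ad_spend: float) -> float:
    """Return on ad spend: revenue attributed to ads divided by the spend."""
    if ad_spend <= 0:
        raise ValueError("ad_spend must be positive")
    return revenue / ad_spend


def last_click(touchpoints: list[str], revenue: float) -> dict[str, float]:
    """Give all credit for a conversion to the final touchpoint."""
    return {touchpoints[-1]: revenue}


def linear_multitouch(touchpoints: list[str], revenue: float) -> dict[str, float]:
    """Split credit evenly across every touchpoint in the journey."""
    credit: dict[str, float] = defaultdict(float)
    share = revenue / len(touchpoints)
    for channel in touchpoints:
        credit[channel] += share
    return dict(credit)


# Hypothetical journey: three ad exposures before a 120.00 purchase.
journey = ["ctv", "display", "search"]
print(roas(revenue=120.0, ad_spend=40.0))
print(last_click(journey, 120.0))
print(linear_multitouch(journey, 120.0))
```

With fewer deterministic signals, fewer touchpoints can be tied to the same person, which is why the journey list itself becomes the hard part to assemble.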
George Demarest:
Yeah, so I guess the obvious question for Ishaan then is how do graph databases in particular help with that?
Ishaan Biswas:
We hear this question very often from customers. They're trying to figure out how to solve this business problem and how to use graph databases for it. I'll first touch upon what you should look for in a graph database, and then later we'll cover what specific steps you can take. The first thing, and this is the simpler one to get out of the way, is that you need a graph database that can help you do data modeling. So how do you model users, devices, and cookies as vertices and edges? The property graph data model is really good at that. The good thing is most graph databases out there in the market can do this. So this is well-established technology, so that's the easy part.
The second thing is whatever technology or database you use, it should support a model-native query language, which specifically means a graph query language like Gremlin or Cypher or any of the other graph languages, because these languages are designed to help you write queries that traverse the graph, so it's easier to express relationships and retrieve your data efficiently. So again, these two are the easy parts.
Now where it starts becoming harder is how do you scale this. Scalability is critical for identity graphs, as your data is continuously growing with new user interactions and other touch points that you're getting with your users. Your chosen graph database should provide horizontal scalability to accommodate growth in both data volume and query throughput while maintaining low latency. And you should be able to do this with pretty much real-time performance. So in milliseconds, single-digit or low double-digit milliseconds, you should be able to get any data you want in your graph.
So there are several other factors to consider, but those are the few things, just ease of use and a model native or a graph native query language and the ability to express your graph and model your graph, model your data as a graph and just really high performance characteristics. All that said, you can't break the bank if you're trying to build a solution and it's really expensive, that wouldn't work for you either. So it all has to be done within a cost-effective manner as well.
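To make the "users, devices, cookies as vertices and edges" idea concrete, here is a minimal in-memory sketch of a property graph. The class names, labels (`person`, `device`, `cookie`), edge labels, and property keys are all assumptions chosen for illustration; a real graph database provides this model natively, and this is not how Aerospike Graph stores data.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str                      # e.g. "person", "device", "cookie"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str                      # e.g. "has_device"
    out_v: str                      # id of the vertex the edge leaves
    in_v: str                       # id of the vertex the edge enters


@dataclass
class PropertyGraph:
    vertices: dict[str, Vertex] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_vertex(self, id: str, label: str, **properties) -> Vertex:
        vertex = Vertex(id, label, properties)
        self.vertices[id] = vertex
        return vertex

    def add_edge(self, out_v: str, label: str, in_v: str) -> Edge:
        # Refuse dangling edges so the sample data stays consistent.
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = Edge(label, out_v, in_v)
        self.edges.append(edge)
        return edge


# Hypothetical identity fragment: one person linked to a device and a cookie.
graph = PropertyGraph()
graph.add_vertex("p1", "person", age_band="35-44")
graph.add_vertex("d1", "device", device_id="device-123")
graph.add_vertex("c1", "cookie", cookie_id="cookie-abc")
graph.add_edge("p1", "has_device", "d1")
graph.add_edge("p1", "has_cookie", "c1")

print(len(graph.vertices), len(graph.edges))
```

Properties live on the vertices while relationships are first-class edges, so a new identifier type can be added as another vertex label without reshaping existing records.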
George Demarest:
So I think we're actually able to demonstrate, we've been able to benchmark the Aerospike Graph. It was announced and released in June and we have done some internal benchmarking, but how else can we demonstrate our capabilities, Ishaan?
Ishaan Biswas:
Yeah, I actually have a demo here and I'll ask my friend Daniel to help me with validating what we've built here. Let me quickly bring up my screen.
Daniel Landsman:
So Ishaan, is what you're going to show us actually representative of the real world?
Ishaan Biswas:
I'm going to ask you that as a question, so you tell me. This is what I've learned from talking to our customers, but what I've done here is built a dataset that models an identity graph. You have a person and there are multiple attributes of that person that I've gotten from multiple different data sources. You talk to multiple customers as well and you're more in tune with the AdTech space, so you tell me, does this look like what you see most people do? So what I have here is a person, and just for fun, I've modeled what dessert this person likes, what vehicle this person drives, what devices this person has, and also what household this person is a part of. Do you think this kind of data modeling is helpful in the AdTech space?
Daniel Landsman:
Yeah, absolutely. From my experience throughout years being in the industry, when you're stitching data together, it's important to understand the psychographic and demographic data of said individual that you're targeting for advertising purposes. And the reason for that is because you can do different things like household targeting or dynamic creative optimization, for instance, to better elicit responses from them. It's important to pull all the information that you can together to round out the picture of that individual, because if you're targeting soda drinkers for the sake of discussion and that information isn't appended to the specific identifier that one has, or a set of identifiers that one has for a set individual, it makes it challenging to include them in the audience segment. So it's really important to try to get, in my opinion, as much data as you can within the realms of possibility to make the most effective decisions at scale for audience targeting.
George Demarest:
I want to talk about scale for a moment. I'm staggered by some of the customers that we have in the space. One of our AdTech customers mentioned that they do more database transactions in a day than NASDAQ, the New York Stock Exchange, and Visa do in a month. Just staggering numbers and very high requirements for availability and latency. Daniel, any favorite stories that you can think of, of one of our customers?
Daniel Landsman:
I have lots of favorites. For this conversation in particular, it's probably best to focus more on graph, but to touch on some of the clients that we do have, we see tens of millions of queries a second at petabytes of scale and hundreds of terabytes at the edge. So I think that there's a lot of clients that have been with us for a very long time that have grown their business with us. We've had clients that haven't gone down for over a decade, but this is where we came from and this graph that we're talking about is where we are now and I think where the industry is going. So Ishaan, maybe we can focus in on the graph portion of the conversation.
George Demarest:
Sure, yeah. So that raises the question, we have been in this space for a while and been using our NoSQL database and so now why is Graph better than other models for tackling this?
Ishaan Biswas:
I actually did a webinar recently with our chief developer advocate at Aerospike, Tim Fox, which goes into a lot more detail here. We talk about a specific application, where we're building an application using a graph versus key value. So I highly recommend people check that out. But to put it simply, just in this example here, I have a person and I've accumulated a lot of information about this person from multiple different sources. And it's really easy to traverse this graph. I can start from a device, for instance, and I can write a very straightforward query that finds this device vertex, goes to this person, and, let's say, gives me some demographic information about that person. But I can also then go and find out information about what vehicle this person drives or what beer this person likes, for that matter. So it becomes really easy, and we'll see some examples later to see exactly how you do it.
In contrast, you do this on key value, you can still do it, but the challenge is we are writing a whole lot of code to do something that we've already done and we've optimized for. So the benefit is you can collaborate more closely with your developers as a business stakeholder and your developers understand your business process as opposed to understanding and spending all their time in figuring out how to model this as a key value. And even once they do it, it's usually really inflexible as you want to do different kinds of queries later on. So as you get more data sources, you have to stick to this data model that you've built for a specific purpose. So the inflexibility really kills most of these solutions over time, even if it can scale. What we hear very often is if you can do this simply for us and maintain the scale and performance that Aerospike guarantees, it's a no-brainer for people to adopt a graph data model.
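The contrast with key-value modeling can be sketched as follows. The key naming scheme (`device:…`, `person:…`, `vehicle:…`) and the record layout are assumptions invented for this example; the point is only that each access path has to be designed into the records and walked by hand.

```python
from __future__ import annotations

# Key-value layout: every relationship is a key stored inside another record.
kv = {
    "device:device-123": {"person": "person:p1"},
    "person:p1": {"age_band": "35-44", "vehicles": ["vehicle:v1"]},
    "vehicle:v1": {"body_style": "SUV"},
}


def vehicles_for_device(store: dict, device_id: str) -> list[str]:
    """Follow device -> person -> vehicles with one explicit lookup per hop."""
    device = store.get(f"device:{device_id}")
    if device is None:
        return []
    person = store.get(device["person"], {})
    return [
        store[key]["body_style"]
        for key in person.get("vehicles", [])
        if key in store
    ]


print(vehicles_for_device(kv, "device-123"))
```

This works for the one question it was built for. Asking the reverse question, such as which devices belong to people who drive an SUV, would need either a full scan or an additional hand-maintained index record, which is the inflexibility described above.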
George Demarest:
The queries that you use, are they difficult? If someone is used to another query language for example, what are they going to encounter?
Ishaan Biswas:
Yeah, that's a great question. So let's do a little bit of a workshop here real quick. Daniel, you see a lot of specific use cases in AdTech and what people want to do. Why don't you describe a couple of these use cases and I'll see if I can figure out the queries real quick and show how to do this.
Daniel Landsman:
Yeah, sure. One of the large advertising segments is automotive. And let's say I'm a car company and I want to understand what's the best way to show a creative to someone.
Ishaan Biswas:
Okay, let's try that. So lost my share. Let me share again.
Okay, let me describe what I'm about to show you. I'm going to start with a device ID. So that's a signal I have that I get when I need to display this ad. What I'm going to do is start from a device ID and find this device ID in my graph. So this graph is enormous. This just shows a very small subsection of the graph, but this has thousands of people with hundreds of edges between them connecting different components or information about the person. I'm just going to zoom into one of them. So what I'm going to do is start from this device, traverse over to this person, and then find some information about this person, let's say a vehicle model. It looks like this person drives an SUV, an Audi SUV, but irrespective of the brand, we are going to find what kind of vehicle this person drives today and some preferences around them and then provide that information back to you.
So then you can personalize the ad. And in the age of generative AI, you can easily create a personalized ad based on some facts about the person. So I built this query previously, but I'm going to walk through it step by step. I'm using, and this is a short plug for our partner here, a tool called G.V(). It's a Gremlin IDE. Gremlin is the language we support in Aerospike Graph today. So I'm going to find this person. So g.V(), and has device ID; that's the common way to express finding a vertex with a certain device ID. So let's run this part of the query first. I can see a visual view of this. So I have found this device ID. Now I want to traverse into the person. So remember the device ID is the edge vertex and that's connected to a person.
So I'm going to click on this and that's going to take me to the person vertex. From this person vertex, I'm going to find out, just for fun, what kind of dog this person has, what music this person likes, and what kind of car this person drives today. So I'm going to run this and I'll find all the information, these specific pieces of information. So it looks like this person likes Billy Idol and flute, and has two cars. One is a minivan and the other one is a wagon. And it also looks like this person has a Pembroke puppy. His name is Marley.
Now it's really easy for me to get all of this information so I can run this query really quick and I can get all of this information programmatically. I'm just showing this step by step, but you can embed this in your application and you get this programmatically. You pick the information from here, throw it into a generative AI model, and you can very easily create a personalized ad in a matter of milliseconds or seconds when you're creating a creative. So that's just one example of what you can do with this.
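The traversal just described (device, then person, then facts about that person) can be sketched without a database. The Gremlin shown in the comment is an approximation: the labels (`device`, `has_device`, `drives`, `likes`, `owns`) and property keys are assumptions, since the demo's actual schema is not shown in the transcript. The small helper functions below only imitate the `has`, `in`, and `out` steps over an in-memory edge list; the data loosely mirrors the demo output.

```python
from __future__ import annotations

# Vertices as id -> (label, properties); edges as (out_vertex, label, in_vertex).
vertices = {
    "d1": ("device", {"device_id": "device-123"}),
    "p1": ("person", {}),
    "v1": ("vehicle", {"body_style": "minivan"}),
    "v2": ("vehicle", {"body_style": "wagon"}),
    "m1": ("music", {"name": "Billy Idol"}),
    "pet1": ("pet", {"name": "Marley", "kind": "dog"}),
}
edges = [
    ("p1", "has_device", "d1"),
    ("p1", "drives", "v1"),
    ("p1", "drives", "v2"),
    ("p1", "likes", "m1"),
    ("p1", "owns", "pet1"),
]


def has(label: str, key: str, value: object) -> list[str]:
    """Find vertices with the given label and property value."""
    return [v for v, (lbl, props) in vertices.items()
            if lbl == label and props.get(key) == value]


def in_(ids: list[str], label: str) -> list[str]:
    """Walk edges backwards: from each vertex to whatever points at it."""
    return [o for (o, lbl, i) in edges if lbl == label and i in ids]


def out(ids: list[str], *labels: str) -> list[str]:
    """Walk edges forwards along any of the given edge labels."""
    return [i for (o, lbl, i) in edges if lbl in labels and o in ids]


# Roughly equivalent Gremlin (labels are assumptions):
#   g.V().has('device', 'device_id', 'device-123')
#        .in('has_device').out('drives', 'likes', 'owns').valueMap()
start = has("device", "device_id", "device-123")
person = in_(start, "has_device")
facts = [vertices[v] for v in out(person, "drives", "likes", "owns")]

for label, props in facts:
    print(label, props)
```

The resulting list of facts is the kind of payload an application could hand to a creative-generation step.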
Daniel Landsman:
Hey Ishaan, quick question. As you're going through this, I'm thinking how can you tell which car they are looking for? Is that possible? Or do you have to make the assumption that they're looking for a car?
Ishaan Biswas:
That just depends on the signal that you have. If that person is on your website looking for a car, that's a signal that you've got. And so now you can get all this information about this person. So you already know this person has a minivan and a wagon, and you know the fact that this person is on your website looking at and researching cars, so you can give them options and see what eventually leads them to convert. So the message here is you can do a lot of fast iterations and eventually get to a high conversion rate.
Daniel Landsman:
That's super helpful, thank you.
Ishaan Biswas:
So give me another example of someone who's not into cars. Give me another example and we can try that out.
Daniel Landsman:
Yeah, sure. It's starting to be fall and I think everybody enjoys a pumpkin spice latte every now and again. So maybe we could take a look at a household of individuals and see which ones drink pumpkin spice lattes.
Ishaan Biswas:
Okay, let's try that. One thing I wanted to show here in the previous example is how long this query took. It took seven milliseconds to do this two-hop query. With Aerospike, what you can expect is you'll get this seven milliseconds, or even lower perhaps, no matter what your data size is. So whether you have one terabyte or a hundred terabytes of data, you'll get this kind of performance. Now that's really powerful because now you can glom a lot of data together about the person and get a lot of value out of it.
So let's talk about this example. What I'm going to do is go back to this instance or this identity graph of one person. Now I know that this person is associated with a household, so what I'll do is start from a device ID, that's my signal to see who this person might be, go to this person, go to this household, and then see who the other people in this household are. Find out what coffee preferences they have. I don't know if this person has coffee preferences; it looks like this person doesn't have a coffee vertex associated, but let's find another one that does.
So okay, it looks like this person likes coffee. This person has a few parameters about coffee preferences. I'm going to run this query, and this is a little more complicated query, but I'll walk through it again. What I'll do is, again, I'll find this person using the device ID, go back to the person vertex using this in step, then go out along that lives-in edge so I can go to the household. From the household, I now have one household, and I could have multiple people living in that household. So I'll traverse into that household and again traverse into the people connected to that household and find out what coffee they like. And I'll run a filter in there saying, who likes pumpkin-flavored coffee?
So let me run this query. And okay, so it looks like for this person, whose device ID is this, there were two other people in their household and it could be that one other person is that person himself or herself that likes coffee. So what you can do is based on signal, let's say it's a couple living in a household, based on that signal that you'll get from one person, you can target the other person and the other person says, "Hey, I got this really good offer from Starbucks and we can go get a pumpkin spice latte."
This person is a married female. And then actually, let's play around with it and verify that this person actually likes pumpkin spice latte. So this person likes coffee, and sure enough they have a flavor preference of pumpkin, and I'm sure the other vertex found here also has the same preference. This is just a very simple, fun example to show what you can do with this, but the possibilities are endless. The more data you have, if you can employ that data to get a lot of value from it, that's really the message. And again, even this one, you can do this in milliseconds really, irrespective of the size of data you have.
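The household query walked through above can be sketched the same way: device to person, person to household, household back to all of its members, then a filter on coffee flavor. The Gremlin in the comment is an approximation with assumed labels (`has_device`, `lives_in`, `likes`, `coffee`, `flavor`), and the sample household is invented.

```python
from __future__ import annotations

# Vertices as id -> (label, properties); edges as (out_vertex, label, in_vertex).
vertices = {
    "d1": ("device", {"device_id": "device-123"}),
    "p1": ("person", {}),
    "p2": ("person", {}),
    "p3": ("person", {}),
    "h1": ("household", {}),
    "c1": ("coffee", {"flavor": "pumpkin"}),
    "c2": ("coffee", {"flavor": "pumpkin"}),
    "c3": ("coffee", {"flavor": "vanilla"}),
}
edges = [
    ("p1", "has_device", "d1"),
    ("p1", "lives_in", "h1"),
    ("p2", "lives_in", "h1"),
    ("p3", "lives_in", "h1"),
    ("p1", "likes", "c1"),
    ("p2", "likes", "c2"),
    ("p3", "likes", "c3"),
]


def step_in(ids: list[str], label: str) -> list[str]:
    """Follow edges with this label backwards from the given vertices."""
    return [o for (o, lbl, i) in edges if lbl == label and i in ids]


def step_out(ids: list[str], label: str) -> list[str]:
    """Follow edges with this label forwards from the given vertices."""
    return [i for (o, lbl, i) in edges if lbl == label and o in ids]


def household_members_who_like(device_id: str, flavor: str) -> list[str]:
    # Roughly equivalent Gremlin (labels are assumptions):
    #   g.V().has('device', 'device_id', device_id)
    #        .in('has_device').out('lives_in').in('lives_in')
    #        .where(__.out('likes').has('coffee', 'flavor', flavor))
    devices = [v for v, (lbl, props) in vertices.items()
               if lbl == "device" and props.get("device_id") == device_id]
    people = step_in(devices, "has_device")
    households = step_out(people, "lives_in")
    members = set(step_in(households, "lives_in"))
    return sorted(
        m for m in members
        if any(vertices[c][0] == "coffee" and vertices[c][1].get("flavor") == flavor
               for c in step_out([m], "likes"))
    )


print(household_members_who_like("device-123", "pumpkin"))
```

As in the demo, the result includes the device owner as well as another member of the household, so an application that wants only the other people would drop the starting person from the result.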
George Demarest:
Very cool. So could you take a moment, Ishaan, to tell us what this tool we're looking at, it's beautifully visual, and what it does and what it doesn't do?
Ishaan Biswas:
This is a tool built by one of our partners called G.V(). It's a first-of-its-kind tool. It's an IDE for a graph query language, and it also does visualization. So it's really nice for experimentation, figuring out your graph data model or iterating on your data model, but also iterating on your queries. I spent some time building that query, but it was really easy for me to do that because I could go step by step and analyze how I'm traversing the graph.
George Demarest:
Great. So I'm going to ask you in a moment, Ishaan, to tell people how to get started with the product, but I want to pivot back to Daniel for a moment. Daniel, in terms of where we think this technology is going to land in the AdTech space, in terms of the solutions, is it the supply side? Is it the data management platform? Where do you think we'll be seeing graph databases?
Daniel Landsman:
Yeah, in general, we see all of the value chain using it. So we work with everybody from the agency side all the way through the publisher side, which means DSPs, SSPs, exchanges, DMPs, CDPs, attribution companies and the like. Every single one of those companies has a slightly different role to play, but understanding the end users and the audiences is something that I believe that they all want to do. So I think that the graph solution that we're putting forth has applicability not only in that whole value chain, but also more broadly in the general martech ecosystem too. So I think that there's a lot of opportunity here for companies that are seeking more clarity around their customer journey and getting a better understanding of how to optimize with this signal loss that's happening today. So hopefully that helps clarify, George.
George Demarest:
Yes, and that brings us to, if people want to get started with this ... I actually saw you install this, Ishaan, it looks actually fairly straightforward. So can you go through the steps it would take for a customer to get started with Aerospike Graph?
Ishaan Biswas:
Yeah, we have a free trial and we'll share links out to that to get started with Aerospike Graph. But more broadly, whether you're using Aerospike Graph or any other tool for that matter, what's really important for you as a player in the AdTech ecosystem is to understand what you need to do after that. And it's really a collaboration between your business stakeholders, your data architects, your developers, and across the organization to build a solution that works well for your organization.
From what we've iterated on with our customers, we've figured out there's a good five-step process, not very different from Elon Musk's popular five-step process to build products and services. Essentially, you define the business process for whatever you want to achieve. In the examples that we just walked through, we talked about dynamic creative optimization and household group targeting. You define those business processes and you do that in collaboration between the technical and the business stakeholders in your organization, and you write out the queries that you want to do in plain language; don't worry about the underlying graph query language.
Then you have to understand what data you have access to, who the data brokers are that you're working with, and what kind of data they give you. You need to understand that. So once you have these two things established, the final goal you're trying to achieve with the business process you want to implement and the data you have access to, you can go about creating a draft schema similar to the schema that I showed. But it's going to be different for different organizations, depending on the data they have access to.
Then you create a sample dataset and you load the data into the database and then you start querying the database. So you write these queries based on the description that you've written out in step one, and then you query the database. You have to iterate on this a few times because you're not going to get it right the first time. You have to iterate step one to step five a few times till you get the desired performance goals. And then once you're happy with what you've established, you automate and scale. So you then bring in all of your data that you have, embed these in your applications, these queries in your applications and scale and put it out there. So there are obviously a lot of nuances to each of these, but that's in general a good five step process you can follow to get started.
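A small sketch of how the early steps might look in practice: the business questions written in plain language, a draft schema, and a sample dataset checked against that schema before anything is loaded. Representing the schema as a set of allowed (vertex label, edge label, vertex label) triples is an assumption made for this example, as are all labels and sample rows; it is not a format prescribed by Aerospike Graph.

```python
from __future__ import annotations

# Step 1: business questions in plain language (hypothetical examples).
questions = [
    "Given a device ID, what vehicles does its owner drive?",
    "Given a device ID, who in the same household likes pumpkin-flavored coffee?",
]

# Step 3: draft schema as allowed (out label, edge label, in label) triples.
schema = {
    ("person", "has_device", "device"),
    ("person", "drives", "vehicle"),
    ("person", "lives_in", "household"),
    ("person", "likes", "coffee"),
}

# Step 4: a tiny sample dataset to check before loading and scaling up.
labels = {"p1": "person", "d1": "device", "h1": "household", "v1": "vehicle"}
sample_edges = [
    ("p1", "has_device", "d1"),
    ("p1", "lives_in", "h1"),
    ("d1", "drives", "v1"),  # deliberate mistake: a device cannot drive
]


def schema_violations(edges, labels, schema):
    """Return the sample edges whose endpoint labels the draft schema does not allow."""
    return [
        edge for edge in edges
        if (labels[edge[0]], edge[1], labels[edge[2]]) not in schema
    ]


violations = schema_violations(sample_edges, labels, schema)
print(violations)
```

Catching a mismatch like this on a sample is cheap, which is what makes iterating over the steps a few times practical before automating and scaling.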
George Demarest:
Great. Well, that brings us to the end of our prepared part of this webinar. Before we go into questions and answers, I just want to reiterate, this has been Cracking the Code: Graph Databases in the Age of Signal, from your friends at Aerospike, and it's been Daniel Landsman and Ishaan Biswas. Sorry, slaughtering your name. So that brings me to the questions. So the first is for Daniel. What is the level of urgency you're seeing to figure out this problem and counter signal loss, as you put it?
Daniel Landsman:
Yeah, I think that there's a couple ways to look at this. The industry as a whole has been getting prepared for this for many years, but because this has been delayed from the Google side, there hasn't been as much urgency. I think finally you're starting to see publishers and the greater ecosystem become much more serious. There's been a lot of ID companies that have come out over the last couple of years and I think people are really starting to take this seriously and the level of urgency has absolutely ticked up markedly, I would say, over the last 18 months or so.
In terms of the implementation of a solution, I think it's more urgent than it ever has been before because I've heard that publishers have lost up to 40% of their revenue with some of the signal loss that's already occurred. And I've also heard that it's been a little more or a little less, depending upon where they sit in the ecosystem. So I really think that it is urgent and the time is now to act and make decisions about how to effectively move your business from where you are today to where you want to be over the next couple of years to position yourself [inaudible 00:33:30] in order to maintain your revenue with this signal loss.
George Demarest:
Great. Next one is for Ishaan, and you've already touched on this a little bit, but Ishaan, graph databases have been around for over a decade. You have Neptune and Neo4j and others, and you're a relatively new entrant into this space. So what is different about Aerospike Graph and what new kind of value proposition does it provide?
Ishaan Biswas:
Yeah, great question. Earlier we spoke about what you should look for in a graph database, and databases like Neo4j and Neptune and several others have several of those characteristics. So they have a query language, a model-native query language that you can use. They support the property graph data model and so on. What most of these fail to do is provide predictable low latencies at any data volume with massive query throughput, all three of these, all within an affordable price point. So maybe one of them can do one of those things better, or two of those. But the whole package of getting this really high performance, which means massive throughput and predictable low latency at any data volume at an affordable price point, we haven't seen any of these databases be able to do that.
So make your choice depending on your use case, but what we've seen and what we've experienced is some of these databases are really good at certain things and traditionally, not to name any specific names, some of these graph databases have been better at analytics and business intelligence use cases. But as your data volumes grow and your latencies tend to rise, people steer away from using them for online transaction processing, or OLTP, queries or use cases, where you really have to have latencies that are predictable because you have to adhere to SLAs. And that's the paradigm that Aerospike Graph is introducing: no matter your data volume, you can get these predictable low latencies.
George Demarest:
Actually, Daniel touched several times on the fact that there are many more data sources and obviously much more data volume, so that does play to our strength. The next question, back to Daniel. We heard from Ishaan how the techies should prepare for Aerospike Graph. How should the industry, how should the customers themselves prepare for the changes that you think are going to be coming about?
Daniel Landsman:
Yeah, great question, George. I think that they have to get the data strategy in place first and foremost. You have to understand who you're going to partner with, where that data is coming from, then how you're going to process it and make decisions off of it. So at the end of the day, you have to go back to understanding what first party data do you have or have access to? What second or third party data do you have access to? What probabilistic, attention, and contextual data do you have access to? And then figure out who you want to work with, how are you going to ingest that data? What's it going to look like? And then actually make a model and build apps on top of that for decisioning, attribution, audience building, or what have you.
George Demarest:
Great answer. Thank you. That kind of brings us to the end of the prepared questions. I want to just cover the additional resources you see on the screen. You can try Aerospike Graph for free for 60 days. I believe that does include G.V(), so that's a good way to get started. There's a nice longer demo that we've videoed; Ishaan on Aerospike Graph is a good one. The webpage has a lot of stuff on it. We actually have a big new white paper coming this week where we're going to be publishing the panel discussion on Aerospike Graph from Aerospike Summit. And then some more thoughts from Daniel on identity resolution. Daniel has another blog, so good stuff all around. I would definitely recommend going to the Aerospike product page and checking it out; there's a bunch of other stuff there. So any final thoughts? I'll start with Ishaan. What are your final thoughts? What should people expect?
Ishaan Biswas:
Oh, wow. Okay. I would say a lot of things that we saw on this that we said and that you saw might seem unbelievable to several people, because people just haven't seen the solution out there as much. So my call to action to anyone listening to this or everyone listening to this is try us out. Sign up for the 60-day trial, get access to Aerospike Graph and try it out and see for yourself.
George Demarest:
Great. And same question to you, Daniel. What do you think people in the AdTech space should take away from today's webinar?
Daniel Landsman:
I appreciate you asking. So I would say get the data that you can, do what you can with it, and continue to expand that data footprint as much as possible, as quickly as possible. And obviously I echo Ishaan's sentiments, but it's about being able to get the data and operationalize that data, which I'd argue Aerospike is the best at. So hopefully that helps clarify.
George Demarest:
Yes, very much. Well, I want to thank you both for participating. You guys have been watching Cracking the Code: Graph Databases in the Age of Signal, a webinar from Aerospike, and we're going to end it right there. Thanks very much for your time.
Daniel Landsman:
Thanks everybody.
Ishaan Biswas:
Thanks.
About this webinar
Explore the evolving landscape of identity resolution in advertising and marketing. Join us for an in-depth webinar as we:
Examine the impact of signal loss
Delve into real-world use cases
Unveil the transformative potential of Aerospike Graph in AdTech and MarTech solutions
Don’t miss this opportunity to stay at the forefront of AdTech. This is a panel discussion with Daniel Landsman, Aerospike’s Global Director of AdTech and Gaming, and Ishaan Biswas, its Director of Product Management, with moderation by George Demarest, Aerospike’s Director of Product Management.