November 19, 2013

eBay is the 20th most popular site in the world and, with $175 billion in sales in 2012, one of the largest e-commerce companies in the world. Satish Katiyar, eBay Advertising CTO, and Prakash Chandra, head of Engineering at eBay Advertising, spoke at a recent Scale Warrior of Silicon Valley meetup about how the eBay Audience Platform uses the Aerospike database to bring holiday cheer to retailers throughout the year!


Know when someone will start shopping

According to Satish, eBay presents over 500 million ads per day. Prakash explained that all advertising, on and off eBay, is powered by the eBay Audience Platform. eBay uses the vast amount of its user and purchase intent data, along with third-party data from publisher and advertiser partnerships to deliver the right message to the right person across the Internet. If people look for products on eBay and then leave the site, eBay is able to re-target and bring those shoppers back to the eBay site.

Big Data:

  • 100 million active users who have engaged several times a month

  • 1.5 – 1.8 billion searches per month from people looking to buy stuff, making eBay one of the largest search engines with very valuable purchase intent data

  • 2 Petabytes of behavioral data from eBay, PayPal, StubHub and GSI Commerce

“That’s huge,” said Prakash, “because we can not only see what people are buying today, we can go back in history and look at what these people have been buying year after year, look for the products that they are buying on a yearly basis. For the products which have a longer lifecycle, you can look back and say, ‘These people bought the phone two years ago this month,’ because people buy phones in two-year cycles. For automobiles, this may be a longer cycle. There are very, very few companies who can actually say that we can predict cycles of when people are going to look for their next car. We are one of the companies that can do that. I don’t think that there’s anybody else who can talk about doing some of these things.” According to Prakash, 10 million people buy birthday gifts on eBay each month, and eBay is actually able to predict which 10 million people will be shopping for birthday gifts before they actually start shopping!

Real-time Big Data in Aerospike:

  • 20 TB of first- and third-party data

  • 200 billion behavioral events per day

  • 30 billion Ad requests considered per day

“There are 30 billion plus ad requests per day happening,” said Prakash, “and we want to serve all those requests... when they come, within milliseconds you want to identify that person, understand what they have done in the last ten years... and then you say, ‘What is the best ad to show them at this time?’”

No matter how or where they start surfing

eBay aggregates user behavior on and off eBay, online and on mobile, and has developed an ID Mapping technology powered by Aerospike for a unified view of the user. “eBay is able to tell when someone browsed something on their phone and then later bought it on their iPad. It can tell when people make purchases and how many ads they have to see before they will make a purchase or what time of day they tend to buy.”

“This [ID Mapping] service allows us to map them back and forth. We are talking about 100 million active users who have approximately about two billion cookies, even in the last six to eight months that we have been collecting data because people actually remove these a lot. They are on multiple browsers. They are on multiple devices. At eBay we are able to map them back because they have logged in from each of these devices at least once. The rest of the advertising companies are selling cookies and they have no idea of whether the person on the Firefox is the same as Chrome, even on the same computer. Forget different computers. Forget different devices. We are not only able to find them on the same computer, but on different devices.”

According to Prakash, “[ID Mapping] is one thing that we are very proud of. eBay is in a unique position to solve this problem... we are able to connect [users] through eBay login... PayPal users, StubHub users, we can bring in all of that information and connect it to people.”
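The login-based mapping described above can be pictured with a minimal in-memory sketch. This is only an illustration of the idea, not eBay's implementation or an Aerospike API: the `IdMap` class, its method names, and the cookie and user identifiers are all hypothetical, and a plain Python dictionary stands in for the key-value store.

```python
from collections import defaultdict


class IdMap:
    """Toy in-memory stand-in for a cookie/device-to-user mapping store.

    Hypothetical sketch: in production the two dictionaries below would be
    records in a low-latency key-value store.
    """

    def __init__(self):
        self._user_by_cookie = {}
        self._cookies_by_user = defaultdict(set)

    def record_login(self, cookie_id, user_id):
        """Link a cookie or device ID to the account that logged in with it."""
        previous = self._user_by_cookie.get(cookie_id)
        if previous is not None and previous != user_id:
            # Shared device: the cookie now belongs to the latest login.
            self._cookies_by_user[previous].discard(cookie_id)
        self._user_by_cookie[cookie_id] = user_id
        self._cookies_by_user[user_id].add(cookie_id)

    def resolve(self, cookie_id):
        """Return the user behind a cookie, or None if it was never linked."""
        return self._user_by_cookie.get(cookie_id)

    def cookies_for(self, user_id):
        """Return a copy of every cookie/device ID linked to the user."""
        return set(self._cookies_by_user.get(user_id, set()))


id_map = IdMap()
# One person, three identifiers, each linked by a single login.
id_map.record_login("firefox-laptop-cookie", "user-123")
id_map.record_login("chrome-laptop-cookie", "user-123")
id_map.record_login("ipad-app-id", "user-123")

# Firefox on the laptop and the iPad resolve to the same person.
same_person = id_map.resolve("firefox-laptop-cookie") == id_map.resolve("ipad-app-id")
# A cookie that never saw a login cannot be mapped.
unknown = id_map.resolve("never-logged-in-cookie")
```

The lookup on the ad-serving path is a single key read (`resolve`), which is the operation the talk says has to be fast.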

Sub-millisecond Speed & 400K – 800K TPS

“The two problems that Aerospike helps us solve are speed and scale. We were looking for a technology, and we actually did prototypes with... Redis, Riak, and what we were looking for is low latency, in the sense that you always have sub-millisecond lookups.”

“When we are talking about 30 billion requests a day… you have 50 milliseconds to respond to an ad, but in order to look up an ID you don’t have that much time – you want to build the profile, update your system, segment profiles, and have your algorithms determine what is the right amount of money that I want to spend at this time. So there is a lot more going on than just identification. Identification needs to happen in the sub-millisecond timeframe.”

“30 billion ad requests translates to something like 400K TPS, that’s the average number. During peak time this number may double. That’s when we benchmarked a lot of these technologies and Aerospike kicks butt here.”
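The arithmetic behind these figures is easy to check. The sketch below assumes requests are spread evenly over a 24-hour day, which is a simplification; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_day = 30_000_000_000  # "30 billion ad requests" from the talk

# Average rate if traffic were perfectly even across the day.
average_tps = requests_per_day / SECONDS_PER_DAY
# "During peak time this number may double."
peak_tps = 2 * average_tps

print(f"average: {average_tps:,.0f} requests/s")
print(f"peak:    {peak_tps:,.0f} requests/s")
```

A flat 30 billion works out to roughly 347K requests per second on average and about 694K at double that rate. The quoted "something like 400K" is consistent with a daily volume somewhat above 30 billion (400K per second is about 34.6 billion per day), which matches the "30 billion plus" wording used earlier.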

Scalable Operations

“There are a lot of technologies that can do clusters. But how many of them actually scale horizontally? In order to get your 2X, how many more servers do I have to add? Am I scaling horizontally, or is there an exponential curve? And we have made some of those simple mistakes in the past, in the technology that we were working on. At a certain point if you think, ‘Oh, this is great. It’s horizontally scaling,’ there comes a point where it doesn’t work anymore. So far we have been actually very good with Aerospike.”

“Talking about ease of operations at scale: automatic fail-over. Systems will go down; when you’re talking about big numbers you want something that fails over automatically. If you want to add new servers, you don’t want to bring down everything. At eBay, to bring down everything before we upgrade... that’s just not good enough for us.”

Cross Data Center Replication

“One of the most important things that we have done with so many other technologies is replication across data centers. When you are serving ads with those tight SLAs, you have servers across different time zones, and the same person comes to you from two different ones, then you want to understand if I have shown an ad to this person or not. We work with some advertisers who have very tight frequency counts, so the publisher may be working with us and they may route the request with some other network. One request may go to the East Coast, one request may go to the West Coast. When you are working under tight frequency capping you want to make sure that you have not shown an ad to this person before. Therefore, you want to have this replication across data centers in real time.”
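The East Coast / West Coast scenario can be sketched as follows. This is not a model of Aerospike's cross data center replication; it swaps in a simple per-origin counter merge (each site only increments its own slot, and a merge keeps the larger value per slot) to show why a cap check goes wrong until the other site's impressions arrive. The `Datacenter` class and all identifiers are hypothetical.

```python
from collections import Counter


class Datacenter:
    """Hypothetical ad-serving site that tracks impressions per user and campaign."""

    def __init__(self, name):
        self.name = name
        # impressions[(user_id, campaign_id)][origin_site] = count
        self.impressions = {}

    def record_impression(self, user_id, campaign_id):
        """Count an ad shown by this site (only this site's own slot is incremented)."""
        per_site = self.impressions.setdefault((user_id, campaign_id), Counter())
        per_site[self.name] += 1

    def impressions_seen(self, user_id, campaign_id):
        """Total impressions this site knows about, across all origin sites."""
        per_site = self.impressions.get((user_id, campaign_id), Counter())
        return sum(per_site.values())

    def may_show(self, user_id, campaign_id, cap):
        """Frequency-cap check based on what this site currently knows."""
        return self.impressions_seen(user_id, campaign_id) < cap

    def replicate_to(self, other):
        """Ship counts to another site; the merge keeps the larger count per origin."""
        for key, per_site in self.impressions.items():
            target = other.impressions.setdefault(key, Counter())
            for origin, count in per_site.items():
                target[origin] = max(target[origin], count)


east = Datacenter("east")
west = Datacenter("west")
CAP = 1  # advertiser allows a single impression per user

east.record_impression("user-123", "campaign-A")

# Before replication the West Coast site has no record of the impression.
allowed_before = west.may_show("user-123", "campaign-A", CAP)
east.replicate_to(west)
# After replication the cap is enforced on both coasts.
allowed_after = west.may_show("user-123", "campaign-A", CAP)
```

The gap between `allowed_before` and `allowed_after` is the window in which a capped ad could be shown twice, which is why the replication delay matters.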

“So far Aerospike has been amazing for us. You can talk about strengths, we have measured this over and over again, 90 percent comes in less than two milliseconds. It scales immediately, rebalancing happens automatically. It doesn’t require as much hand-holding for sharding. You can add new nodes and it will automatically balance. You can update versions.”

“Of course there is room to grow and we are talking to Aerospike about that... But more important than anything else... one thing that has always impressed us about Aerospike is the knowledge that the team has on everything that we talk about. I hear this from my team every day. These people know what they’re talking about. They’re really helpful, and that’s the kind of company that we want to work with.”

