Enabling Real-time Personalization via Machine Learning
About Sony Interactive Entertainment
Sony Interactive Entertainment is a multinational video game and digital entertainment company owned by Sony Group Corporation, with headquarters in the U.S. and Japan. PlayStation is Sony's popular video gaming brand, consisting of game consoles, handhelds, a media center, a smartphone, online services, and magazines. The company has sold over 545 million PlayStation consoles globally. Its online service, the PlayStation Network, includes a virtual market for purchasing and downloading games and multimedia, a subscription-based online service called PlayStation Plus, and a social gaming networking service called PlayStation Home.
Personalized decisions for 100s of millions of users
In 2016, after massive success with PlayStation 4, Sony decided to become a data-driven company. With huge amounts of data gathered from 103 million active users, 38.8 million PlayStation Plus subscribers, and 5 million virtual reality headsets, Sony was well positioned to create a machine learning platform that its development teams could use to build models for personalization, enterprise reporting for business decisions, fraud detection, and more.
The challenge, however, was making the data accessible to the teams and data scientists. Sony needed a solution that could keep data on hundreds of millions of users, several terabytes in total, in one accessible location, help teams use that data to better understand their users, and then drive decisions that personalize the customer experience. The platform needed to provide:
High availability, low latencies at scale
Needed to reliably handle hundreds of microservices serving millions of requests and more than 100 billion data events per day, across many database clusters and multiple regions.
Data integration and accessibility
Needed to bring together multiple data islands and formats, and to create a common data dictionary that all teams could use to implement sophisticated use cases, such as machine learning models.
Reasonable total cost of ownership
Wanted to avoid the expenses associated with vertical scaling and to be able to plan for growth while managing costs.
A backend database for runtime decisions around 100M+ active users
Sony created a data ocean with federated data ownership for their internal teams, allowing each team that created or sourced the data to own and store that data as a data lake within their own cloud account. A centralized catalog allowed any team or data scientist to access the data to create machine learning models or reports to drive business decisions.
Benefits included:
Personalization
Rapidly identify and authenticate customers along with their behaviors and preferences, then customize the user experience in a high-performance environment.
Fraud prevention
Avoid fraudulent transactions across platforms and during surges in real time.
In-app advertising
Subsidize free-to-play games with in-app advertisements before converting them to pay-to-play.
Engaging social feeds
Communicate in-game via chat, messaging, and voice. Find friends in-game or through connected social applications. Support follows, comments, and ratings.
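The federated ownership model described above, where each team keeps its data in its own account and a central catalog makes it discoverable, can be illustrated with a small sketch. This is not Sony's implementation: the `DatasetEntry` and `Catalog` names, the `lake://` locations, and the team and dataset names are all hypothetical, chosen only to show that the catalog stores pointers and ownership while the data stays with the team that produced it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog record: who owns a dataset and where it lives."""
    name: str
    owner_team: str   # team that created or sourced the data
    location: str     # hypothetical URI inside the owning team's account
    columns: tuple    # field names taken from the shared data dictionary


class Catalog:
    """Central index of datasets. It holds metadata only, never the data."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Each dataset name may be claimed by exactly one owning team.
        if entry.name in self._entries:
            raise ValueError(f"dataset already registered: {entry.name}")
        self._entries[entry.name] = entry

    def lookup(self, name):
        # Any team or data scientist can resolve a dataset by name.
        return self._entries[name]

    def owned_by(self, team):
        return sorted(e.name for e in self._entries.values() if e.owner_team == team)


catalog = Catalog()
catalog.register(DatasetEntry(
    "purchases", "commerce", "lake://commerce/purchases", ("user_id", "sku", "ts")))
catalog.register(DatasetEntry(
    "sessions", "telemetry", "lake://telemetry/sessions", ("user_id", "title_id", "ts")))

entry = catalog.lookup("purchases")
print(entry.owner_team, entry.location)
```

A consumer only needs the dataset name; ownership and storage location remain the responsibility of the producing team.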
Achieving real-time personalization with a machine learning feature store using Aerospike
With Aerospike, Sony built a lightweight machine learning platform that allowed machine learning engineers to create, deploy and run their models, as well as manage workflows.
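To make the idea of a feature store concrete, here is a minimal sketch of the read path a model might use at decision time. It is an assumption-laden illustration, not Sony's platform and not the Aerospike client API: an in-memory dictionary stands in for the database, and the `FeatureStore` class, the feature names, and the toy `score` function are all hypothetical.

```python
import time


class FeatureStore:
    """Key-value store of per-user features (in-memory stand-in for a database)."""

    def __init__(self):
        # entity_id -> {feature_name: (value, written_at_seconds)}
        self._rows = {}

    def put(self, entity_id, features, now=None):
        ts = time.time() if now is None else now
        row = self._rows.setdefault(entity_id, {})
        for name, value in features.items():
            row[name] = (value, ts)

    def get(self, entity_id, names, max_age_s, now=None, default=None):
        """Return requested features, replacing missing or stale ones with default."""
        ts = time.time() if now is None else now
        row = self._rows.get(entity_id, {})
        result = {}
        for name in names:
            if name in row and ts - row[name][1] <= max_age_s:
                result[name] = row[name][0]
            else:
                result[name] = default
        return result


def score(features):
    """Toy stand-in for a personalization model; the weights are made up."""
    return 0.7 * features["hours_played_7d"] + 0.3 * features["purchases_30d"]


store = FeatureStore()
store.put("user-1", {"hours_played_7d": 10.0, "purchases_30d": 2.0}, now=1000.0)

names = ["hours_played_7d", "purchases_30d"]
fresh = store.get("user-1", names, max_age_s=3600, now=1500.0, default=0.0)
print(score(fresh))
```

The explicit freshness check reflects the requirement that models read recent data; a production system would typically rely on the database's own expiry and replication features instead.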
8 million TPS
Enabled more than eight million executions per second across database environments, which include RDBMS and NoSQL systems.
Automatic sharding
Made sharding operationally less painful.
Low TCO
A small cluster was able to handle several terabytes of data.
Low latency
Under 10 milliseconds.
Slash data load times
12x reduction in re-indexing time while providing reliable access to fresh data.
Large scale
Handled 100B+ data events/day and 5TB+ data storage.
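For a sense of scale, the daily event volume can be converted into an average per-second rate. The short calculation below assumes events are spread evenly over the day, which real traffic is not, so peak rates would be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_events_per_second(events_per_day):
    """Average ingest rate, assuming a perfectly even spread across the day."""
    return events_per_day / SECONDS_PER_DAY


rate = average_events_per_second(100e9)  # 100 billion events per day
print(f"{rate:,.0f} events per second on average")
```

At 100 billion events per day, the average works out to roughly 1.16 million events per second.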