A new data-based approach to fraud prevention

Learn how Aerospike's real-time platform uses graph technology and AI to help federal agencies detect and prevent fraud, protecting taxpayer dollars.

Cuong Nguyen
Vice President Public Sector
October 20, 2024|6 min read

Few imperatives rank higher in importance and apply to as many federal programs as fraud prevention. Yet, in too many instances, agencies remain woefully behind when it comes to the technology investments needed to deal with fraud. Too often, the result is an inability to detect and prevent fraud – and, for that matter, other forms of improper payments – until after the fact.

Consequences of outdated fraud prevention

A case in point: small business loans made under the Paycheck Protection Program, enacted during the opening months of the COVID-19 pandemic. Years later, the program remains on the Government Accountability Office’s list of high-risk programs thanks to hundreds of millions of dollars in fraudulently obtained benefits. Many payments went to overseas-based hackers.

Social Security, Medicare and Medicaid, and unemployment insurance (administered through state-run programs) all remain susceptible to fraud. In fiscal 2023, such programs made an estimated $160 billion worth of improper payments—roughly the combined annual operating budgets of the Health and Human Services and Labor Departments.

Fraud prevention efforts

Agencies’ staff are aware of the improper payment problem and acknowledge it. Nobody wants it. Moreover, many agencies have fraud prevention programs powered by data and data analytics. 

But in the meantime, and for various reasons, the level of federal improper payments remains high year after year. Besides the obvious forms of improper payments, agencies leak money through credit card transactions and industrial subsidies under various programs. Report after report and memo after memo from the Government Accountability Office and the Office of Management and Budget have exhorted agencies to crack down on fraud and improper payments.

To be sure, agencies take action when they discover fraud. The Justice Department regularly issues press releases about discoveries and arrests of perpetrators. The Pandemic Response Accountability Committee (PRAC) – consisting of several agencies’ inspectors general – in its most recent report, said it had launched more than 718 legal actions against parties responsible for some $800 million in fraud.

But discovery, prosecution, and clawback may retrieve only 10% of funds lost to fraud tied to a known perpetrator, which amounts to a single-digit percentage or less of total estimated fraud. How much better, then, to prevent fraud in the first place, or at least to discover more cases to pursue with prosecution and clawback?

The PRAC boasts a data analytics program with a risk-scoring model. The Centers for Medicare and Medicaid Services has extensive fraud prevention programs in place. Its Healthcare Fraud Prevention Partnership reported in its most recent biennial report to Congress that it recovered $11 million in mispayments and $19 in prevention. These are fine, but like the PRAC numbers, they’re a tiny fraction of estimated yearly losses of $100 billion.
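To put "a tiny fraction" in concrete terms, here is a minimal sketch of the arithmetic in Python. It uses only the two figures quoted above ($11 million recovered, $100 billion in estimated yearly losses); the variable names are illustrative.

```python
# Figures quoted in the paragraph above.
recovered = 11_000_000               # $11 million recovered in mispayments
estimated_losses = 100_000_000_000   # $100 billion in estimated yearly losses

# Share of estimated yearly losses that the recovery represents.
share = recovered / estimated_losses
print(f"Recovered share of estimated losses: {share:.4%}")
```

On these numbers, the recovery works out to roughly one hundredth of one percent of the estimated losses.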

Data analytics and artificial intelligence

Therefore, the panoply of federal fraud prevention programs is ripe for modernization. Activity is happening here and there. For example, a proposed bill in the Senate would require the Centers for Medicare and Medicaid Services to embark on a two-year test of artificial intelligence-based fraud detection and of notifying people in whose names fraud is being committed. Both are techniques used by the credit card industry.

In a study earlier this year, researchers at Florida Atlantic University’s College of Engineering and Computer Science pointed out that too often, fraud detection amounts to “a limited number of auditors, or investigators, who are responsible for manually inspecting thousands of claims, but only have enough time to look for very specific patterns indicating suspicious behaviors. Moreover, there are not enough investigators to keep up with the various Medicare fraud schemes.”

The FAU researchers propose highly specific data sampling techniques that yield better fraud detection than random sampling. They used these techniques against program data produced by Medicare parts B and D.
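The study's exact sampling methods are not described here, so the following Python sketch is not the FAU researchers' technique. It only illustrates, as an assumption, one common way sampling can be made more deliberate than a plain random draw when fraud is rare: keep every claim already labeled as fraud and draw only a subset of the remaining claims, so the rare cases are not swamped. The function name, field names, and data are hypothetical.

```python
import random

def undersample_majority(claims, ratio=1.0, seed=0):
    """Keep every fraud-labeled claim plus a random subset of the others.

    ratio is the number of non-fraud claims kept per fraud claim.
    """
    fraud = [c for c in claims if c["is_fraud"]]
    legit = [c for c in claims if not c["is_fraud"]]
    keep = min(len(legit), int(len(fraud) * ratio))
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return fraud + rng.sample(legit, keep)

# Hypothetical data: 1,000 claims, of which 10 (1%) are labeled as fraud.
claims = [{"claim_id": i, "is_fraud": i % 100 == 0} for i in range(1000)]

balanced = undersample_majority(claims, ratio=1.0)
fraud_share = sum(c["is_fraud"] for c in balanced) / len(balanced)
print(len(balanced), fraud_share)
```

In a purely random sample of the same size, most draws would contain few or no fraud cases at a 1% base rate, which is the kind of weakness a more targeted sampling design tries to address.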

Graphing technology for fraud detection

But that’s only one approach to big data analytics. A silo of data connected to a single program is a great resource. But people receiving federal benefits and those who try to defraud federal programs produce other data about themselves, whether in other government programs, such as tax collection, in private-sector activities, or both.

Combining multiple big data sets into a data lake can yield more powerful results in real time, ahead of fraud occurrence. The key lies in using the data to map the network of people and entities the data elements represent, and the relationships between and among them. This is where graphing database technology comes in.

Graphing technology represents data elements as nodes and the relationships between them as edges. By mapping relationships and dependencies, a graphing database can pinpoint phenomena such as whether someone uses another person’s identity, who might be collaborating with whom, or individuals who might be using multiple identities.
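The node-and-edge idea can be sketched in a few lines of plain Python. This is an illustration only, not Aerospike's graph API: the records, identifiers, and function names are hypothetical. Applicants and their attributes (bank account, phone number) are nodes, each "uses" link is an edge, and applicants that end up in the same connected group share at least one attribute, directly or through an intermediary.

```python
from collections import defaultdict

# Hypothetical benefit applications: (applicant, attribute type, attribute value).
RECORDS = [
    ("app-1", "bank_account", "111-222"),
    ("app-1", "phone", "555-0100"),
    ("app-2", "bank_account", "111-222"),
    ("app-2", "phone", "555-0101"),
    ("app-3", "phone", "555-0101"),
    ("app-4", "bank_account", "999-888"),
    ("app-4", "phone", "555-0199"),
]

def build_graph(records):
    """Build an undirected graph: applicants and attributes are nodes."""
    graph = defaultdict(set)
    for applicant, kind, value in records:
        applicant_node = ("applicant", applicant)
        attribute_node = (kind, value)
        graph[applicant_node].add(attribute_node)
        graph[attribute_node].add(applicant_node)
    return graph

def linked_applicant_groups(graph):
    """Return groups of two or more applicants connected via shared attributes."""
    seen = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal of one connected component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        applicants = sorted(name for kind, name in component if kind == "applicant")
        if len(applicants) > 1:
            groups.append(applicants)
    return groups

groups = linked_applicant_groups(build_graph(RECORDS))
print(groups)
```

Here app-1 and app-2 share a bank account and app-2 and app-3 share a phone number, so all three land in one group even though app-1 and app-3 have nothing directly in common. That indirect link is the kind of relationship a graph traversal surfaces and a single-table lookup tends to miss.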

Moreover, graphing technology's design means it can operate over extremely large data sets consisting of both structured and unstructured data. Graphing techniques differ fundamentally from relational techniques, in which queries only confirm that two data elements have been joined.

Aerospike’s role in fraud prevention

Besides graphing, agencies must deal with sheer scale. Undergirding Aerospike’s graphing capabilities is a database technology that’s so powerful, scalable, and fast that it operates essentially in real time. This means agencies can detect and stop fraud as it happens or before, rather than combing through program data after 90% or more of the fraud is beyond reach.

The importance of speed in fraud prevention

The need for speed is crucial. Under standard computing setups, the greater the data, the slower the processing. This is all the more so as agencies apply artificial intelligence algorithms and large language models to their anti-fraud analytics.

The Aerospike data system can ingest large-scale data from multiple sources, all at sub-millisecond latency. This includes streaming data, data drawn from data lakes, social media, and sensor data such as from satellites.

Scalability and flexibility of Aerospike

The highly flexible database engine operates on-premises, in the cloud, or as a managed online service. Architected to run entirely in random-access memory if the user chooses, Aerospike technology operates as much as 300% faster than other available databases and with an uptime of 99.999%. Because Aerospike can scale to petabytes of data without a slowdown in processing, it helps agencies avoid server sprawl or undue cloud computing costs.

Federal use cases have shown that Aerospike reduces fraud exposure to a small fraction of what agencies otherwise face. It has also reduced false positives by enabling ten times more attributes in identity calculations without increasing processing times.

Federal agencies still face the challenge of obtaining data from, or sharing it with, other agencies and commercial entities in ways that respect statutory and regulatory protections on privacy and stewardship. But with anti-fraud efforts relying on Aerospike, limitations on speed and scale disappear, and the potential for real-time fraud detection and prevention rises sharply. Agencies that solve the data-sharing challenge will have a nearly frictionless pathway to their real objectives of program integrity and safeguarding taxpayer dollars.
