Detecting fraud in healthcare invoices
The Client
The client is a worldwide data vendor to the banking industry.
The Challenge
To detect fraud in healthcare invoices.
The Constraints
The project involved a Swiss bank and suspected fraud in healthcare, which made it sensitive in more ways than one.
The bank’s data had to be managed and protected with considerable care because of the sensitive nature of the business. The client granted us access only once the data had been anonymised, obfuscated (all column headings removed) and encrypted.
The outcomes (i.e. which records related to fraud and which didn’t) were completely withheld.
The data was to be verifiably destroyed within seven calendar days.
The model had to compute fraud probability in real time, no slower than one record per second.
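One way to check a candidate model against the one-record-per-second constraint is to time each scoring call individually. The sketch below is illustrative only: `worst_latency_seconds`, `meets_budget` and `dummy_score` are hypothetical names, and `dummy_score` is a stand-in rather than the model used on the project.

```python
import time
from typing import Callable, Iterable, Sequence

Record = Sequence[float]


def worst_latency_seconds(score: Callable[[Record], object],
                          records: Iterable[Record]) -> float:
    """Return the slowest single-record scoring time observed, in seconds."""
    worst = 0.0
    for record in records:
        start = time.perf_counter()
        score(record)
        worst = max(worst, time.perf_counter() - start)
    return worst


def meets_budget(score: Callable[[Record], object],
                 records: Iterable[Record],
                 budget_seconds: float = 1.0) -> bool:
    """True if every record was scored within the per-record time budget."""
    return worst_latency_seconds(score, records) <= budget_seconds


def dummy_score(record: Record) -> float:
    # Hypothetical stand-in for a real scoring function.
    return sum(abs(x) for x in record) / len(record)


# Made-up records, used only to exercise the timing helper.
sample = [[0.1, 2.0, -3.5], [4.0, 0.0, 1.2]]
print(meets_budget(dummy_score, sample))
```

Timing the worst case rather than the average is a deliberately conservative choice, since a real-time service is judged on its slowest responses.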
The Method
The constraints seriously limited the methods available to us for identifying fraud in this case. Without the outcomes, supervised machine learning was impossible, so we turned to unsupervised machine learning, specifically Isolation Forest, amongst others.
Certain algorithms were ruled out simply because training them would have taken too long.
We optimised our chosen algorithms and, to improve efficiency, reduced the dimensionality of the dataset using Principal Component Analysis.
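To illustrate how Isolation Forest ranks records without any labels, here is a minimal sketch using scikit-learn's `IsolationForest`. The data is synthetic and the parameters are assumptions, not the settings used on the project. Note also that the output is an anomaly score, not a calibrated probability.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for anonymised invoice features (not the client's data):
# 2000 ordinary records plus 10 records that sit far from the rest.
normal = rng.normal(loc=0.0, scale=1.0, size=(2000, 8))
outliers = rng.normal(loc=8.0, scale=1.0, size=(10, 8))
X = np.vstack([normal, outliers])

# No labels are passed to fit(): the model is trained unsupervised.
model = IsolationForest(n_estimators=100, random_state=0)
model.fit(X)

# score_samples() is higher for ordinary points, so negate it to get
# a score that is higher for more anomalous records.
anomaly_score = -model.score_samples(X)

# Indices of the ten most anomalous records, most anomalous first.
top10 = np.argsort(anomaly_score)[::-1][:10]
print(top10)
```

The algorithm isolates each record with random splits. Records that are separated after few splits are treated as unusual, which is why no outcome column is needed.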
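The dimensionality reduction step can be sketched with scikit-learn's `PCA`. Everything here is an assumption made for illustration: the synthetic data, the standardisation step and the 95% variance threshold are not taken from the project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic data: 30 observed columns driven by 5 hidden factors plus noise.
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 30))
X = latent @ mixing + 0.05 * rng.normal(size=(1000, 30))

# PCA is sensitive to column scale, so standardise first.
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps just enough components to explain that
# share of the variance (assumed threshold: 95%).
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)
```

Fewer columns mean less work per record in both training and scoring, which matters when time is limited and scoring must run in real time.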
The Outcome
Again, contractual constraints prevent us from revealing too much, but we managed to identify the top 0.07% of fraud candidates in their dataset.
0.07% is the industry standard for fraud in such data, and our client confirmed that our 0.07% contained 100% of their known fraud cases. They subsequently requested we return a larger sample set for further examination by their internal fraud investigation department.
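The arithmetic behind a "top 0.07%" list and the check against known cases can be sketched as follows. The function names, scores and record indices are hypothetical, chosen only to show the calculation.

```python
from typing import Iterable, List, Sequence


def top_candidates(scores: Sequence[float], rate_per_100k: int = 70) -> List[int]:
    """Return indices of the highest-scoring records.

    70 per 100,000 equals 0.07%. Integer arithmetic avoids floating-point
    rounding when computing how many records to keep (rounded up).
    """
    n = len(scores)
    k = max(1, -(-n * rate_per_100k // 100_000))  # ceiling division
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def recall_of_known_cases(flagged: Iterable[int], known: Iterable[int]) -> float:
    """Share of known fraud cases that appear in the flagged set."""
    known_set = set(known)
    if not known_set:
        raise ValueError("at least one known case is required")
    return len(known_set & set(flagged)) / len(known_set)


# Made-up example: 10,000 records with a handful of high anomaly scores.
scores = [0.1] * 10_000
known_fraud = [12, 345, 6789]      # hypothetical confirmed cases
for i in known_fraud:
    scores[i] = 0.95
for i in (5000, 5001):             # other suspicious-looking records
    scores[i] = 0.9

flagged = top_candidates(scores)
print(len(flagged), recall_of_known_cases(flagged, known_fraud))
```

With 10,000 records, 0.07% corresponds to seven candidates. Recall is the share of confirmed cases that fall inside that list, which is the figure the client reported as 100%.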
We are currently working with the client to deploy our model as a real-time fraud prediction service internal to the client.