How can machine learning (ML) assist an overworked Compliance Officer? In February, I wrote a blog post about our plans to utilize machine learning in FA. Now it’s time for a follow-up!
Many of our clients are obligated to perform anti-money laundering (AML) monitoring, which entails identifying suspicious or otherwise noteworthy financial activity and reporting it to the authorities. Exactly how this should be done is open to interpretation. Complying with AML regulations includes things such as:
- Reporting cash transfers that exceed a specific value: individual large transactions, but also multiple smaller transactions that add up to a significant amount within a limited time.
- Recording where funds originated, and comparing a client’s background information with his or her financial activity.
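The first rule in the list above can be sketched in a few lines of plain Python. This is only an illustration, not the Compliance process shipped in AppStore: the limits, the seven-day window and the function name `flag_transactions` are all assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical limits; real values depend on the applicable regulation.
SINGLE_LIMIT = 10_000
WINDOW_LIMIT = 10_000
WINDOW = timedelta(days=7)


def flag_transactions(transactions):
    """Return the indexes of the transactions that should be reported.

    `transactions` is a list of (date, amount) tuples for one client,
    sorted by date.
    """
    flagged = set()
    start = 0      # index of the oldest transaction still inside the window
    running = 0    # sum of the amounts inside the window
    for i, (day, amount) in enumerate(transactions):
        # Rule 1: a single large transaction.
        if amount > SINGLE_LIMIT:
            flagged.add(i)
        running += amount
        # Drop transactions that have fallen out of the time window.
        while day - transactions[start][0] >= WINDOW:
            running -= transactions[start][1]
            start += 1
        # Rule 2: smaller transactions that add up within the window.
        if running > WINDOW_LIMIT:
            flagged.update(range(start, i + 1))
    return sorted(flagged)


example = [
    (date(2024, 1, 1), 4000),
    (date(2024, 1, 3), 4000),
    (date(2024, 1, 5), 3000),
    (date(2024, 2, 1), 500),
    (date(2024, 3, 1), 12000),
]
print(flag_transactions(example))
```

In the example data, the three January transfers are each below the single-transaction limit but exceed the window limit together, so all three are flagged along with the large transfer in March.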
Without good tools, these obligations can eat up plenty of valuable time and resources that would be better spent elsewhere. Luckily, FA Platform includes tools to make these (and other) regulatory obligations easier to manage.
Flagging and reporting transactions that exceed a predefined monetary value is straightforward. We built a convenient tool for doing just that – check out our Compliance processes in AppStore. We did not need machine learning to achieve that. But how can we spot changes in a client’s behaviour? That’s where machine learning comes into play.
Automatically detecting changes in financial activity
Machine learning is not easy. You can’t just throw a bunch of raw data at a machine learning algorithm and expect meaningful results. Creating a useful analytical tool takes lots of tweaking as well as trial and error. For the sake of brevity, I will skip straight to summarizing how our soon-to-be-released tool looks for changes in behaviour.
Indicators: total number of deposits + withdrawals, and total volume of cashflows (value of deposits plus value of withdrawals).
Data pre-processing: to make data comparable, we averaged both indicators to weekly values, and normalized them (scaled to a range of 0-1). The volume of cashflows figure was scaled logarithmically.
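As a rough sketch of the pre-processing step, the snippet below turns per-client totals into two features between 0 and 1. Several details are assumptions rather than a description of our tool: the function names, the input layout, the use of `log1p` (so that a zero volume is handled), and applying the logarithm before the 0-1 scaling.

```python
import math


def min_max(values):
    """Scale a list of numbers linearly to the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(clients, weeks):
    """Build normalized features from raw totals.

    `clients` maps a client id to (number_of_transactions, total_volume)
    observed over `weeks` weeks. Returns a dict mapping each client id to
    (count_feature, volume_feature), both in the range 0-1.
    """
    ids = list(clients)
    # Average both indicators to weekly values.
    weekly_counts = [clients[c][0] / weeks for c in ids]
    # Volume is scaled logarithmically (assumed here: before normalizing).
    weekly_log_volumes = [math.log1p(clients[c][1] / weeks) for c in ids]
    counts = min_max(weekly_counts)
    volumes = min_max(weekly_log_volumes)
    return {c: (counts[i], volumes[i]) for i, c in enumerate(ids)}


features = preprocess({"a": (4, 0.0), "b": (40, 4000.0), "c": (22, 400.0)}, weeks=4)
print(features)
```

The logarithm keeps a handful of very large cashflows from squeezing every other client into a narrow band near zero after scaling.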
Algorithm: predefined categories for this kind of client behaviour usually don’t exist, so we approached the problem as an unsupervised learning task. We cluster clients based on the indicators above to end up with groups that represent varying types of cash transfer activity. The idea is to categorize clients’ activity during different periods of time to see if they jump from one category to another. This kind of jump represents a change in behaviour, which is what we are looking for. Our initial clustering algorithm of choice is K-means, which is simple but powerful. We are also looking into using a Gaussian mixture model as an alternative.
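To make the idea concrete, here is a minimal, dependency-free sketch of K-means (Lloyd's algorithm) followed by the jump check. It is not the code in our tool: the function names, the toy data and the choice to fit the clusters on the points from both periods together are assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))


def k_means(points, k, iterations=100, seed=0):
    """Plain K-means: returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:  # assignments are stable, so stop early
            break
        centroids = updated
    return centroids


def find_jumps(before, after, centroids):
    """Clients whose nearest cluster differs between the two periods."""
    return sorted(
        c for c in before
        if c in after and nearest(before[c], centroids) != nearest(after[c], centroids)
    )


# Toy features (count, volume), already scaled to 0-1, for two periods.
before = {"alice": (0.10, 0.10), "bob": (0.15, 0.10), "carol": (0.90, 0.90)}
after = {"alice": (0.10, 0.20), "bob": (0.90, 0.80), "carol": (0.85, 0.90)}

centroids = k_means(list(before.values()) + list(after.values()), k=2)
print(find_jumps(before, after, centroids))
```

In this toy data only bob moves from the low-activity group to the high-activity group. In practice a library implementation such as scikit-learn's `KMeans` would be the more usual choice than hand-written code.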
Reducing false positives: are we interested in a client who sits close to the boundary between two groups, and whose slight change in behaviour barely pushes them over it? Usually we are not, so we built in an option to minimize these kinds of results.
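One simple way to express such a filter, shown here purely as an assumption and not as the option built into our tool, is to report a jump only when the client has also moved a minimum distance in feature space. The function name `is_clear_jump` and the `min_shift` parameter are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def is_clear_jump(before, after, centroids, min_shift=0.25):
    """True if the client changed cluster AND moved at least `min_shift`.

    `before` and `after` are the client's feature points in two periods.
    `min_shift` is a hypothetical tuning parameter: a higher value
    suppresses more borderline jumps.
    """
    if nearest(before, centroids) == nearest(after, centroids):
        return False  # no change of cluster at all
    return math.dist(before, after) >= min_shift


centroids = [(0.1, 0.1), (0.9, 0.9)]

# Barely crosses the boundary between the two groups: ignored.
print(is_clear_jump((0.48, 0.48), (0.52, 0.52), centroids))

# Moves from deep inside one group to deep inside the other: reported.
print(is_clear_jump((0.1, 0.2), (0.9, 0.8), centroids))
```

Raising `min_shift` trades missed borderline cases for fewer alerts that a Compliance Officer has to review by hand.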
Productization of machine learning
Productization around machine learning often means productizing services. An example of a productized machine learning service is a project in which a data scientist works with a client’s data for a few days, weeks or months. The outcome of this project is the analysed data as well as the results, usually packaged into a pretty report. Others have gone further in their productization efforts and sell various kinds of machine learning tools directly.
We chose the latter approach: our goal was to produce an analytical tool that you can easily use yourself, directly within FA Platform. That is what we have been working on, and that is what you can expect to find in AppStore in the coming weeks.
What’s next?
AML is just one of many areas where users of FA could benefit from machine learning techniques. Now, as the development of our AML process is winding down, our focus turns towards new projects. The next topics we will focus on include adding intelligence to data imports as well as outlier detection – automatically spotting errors in data.