Sequencing transactions with pattern matching, part 1

13 Nov 2019  |  by Chris Roe

How Apteco FastStats® can offer valuable customer insights into customers’ purchasing history.

Finding a common theme

Pattern matching is a technique that can be used to understand sequences in customers’ transaction history. It can help us understand their behaviour by examining what they have previously bought from us and how they have interacted with us. We often know a lot about what products customers buy and the order in which they buy them, which gives us the opportunity to be able to identify particular sequences. These sequences can have interesting meanings and can bring real marketing benefit. 

There are a number of examples of analytics that involve being able to select people by patterns in consecutive transactions. For example:

  • Selecting people who have matched a particular pattern
  • Comparing people who have matched a pattern against those who have not
  • Finding people who have repeatedly bought the same product in a row

There are already some existing techniques that can be used to handle some aspects of pattern matching in Apteco FastStats®.

The transactional analysis wizard currently available in FastStats can be used to find the most frequently occurring patterns of the same length. The limitations in this technique are that it cannot find specific patterns, and cannot find those patterns if they are not the most frequently occurring or if they need to have more specific advanced features (such as wildcards).

The existing Relative aggregations can be used to take information from one transactional record and push it onto another record. In this way, expressions with several ‘AND’ conditions can be used together to create a pattern. This technique is very powerful as it can find specific patterns, and ones with some of the more advanced wildcards. It doesn’t allow for situations where patterns have ‘*’ wildcards in them, or date restrictions across patterns.

Pattern matching

The pattern matching technique introduced here allows the specification of a set of patterns of potentially different lengths, which are defined in a priority order. A pattern is defined as a set of values that represent the transactions in the sequence. The name associated with the first matched one is returned. This aggregation could be used on its own, or combined together with any of the other expression functionality for more sophisticated analysis.

The option is available in the function type dropdown for an on-the-fly aggregation as ‘Pattern Match’ in FastStats. The user chooses which variable is used to sequence the transactions and the variable to be used for the pattern matching. An example of a set of patterns is shown below.

ap_-_patternmatch-_insert_1.jpg

Adding the above pattern definition to a cube gives the following results. There are 109 people who have visited New Zealand directly after Australia. A further 46,143 people have visited the USA three times in a row, and there are no other people who’ve matched the FGFG pattern. The transactional details of a couple of the people from the A-NZ pattern are shown on the right-hand side of the screenshot.

ap-patternmatch-insert-2.jpg

Wildcards

We have three available wildcards in the Q1 2019 version of the software.

The first is familiar to those who have used text selection wildcards and is the ‘?’ wildcard. This can be used where an element of the pattern is unimportant and we don’t mind what the value was. As an example, the pattern ‘Australia ? New Zealand’ would match people who, at some point, have visited Australia, then any destination, then New Zealand. Note that the middle transaction could be Australia as well!

The second is the ‘*’ wildcard and it works in the same manner as the text selection wildcard. It is used to match ‘0 or more transactions’. As an example, the pattern ‘Australia * New Zealand’ would match people who, at some point have visited Australia, then have had 0 or more holidays elsewhere, and then have been to New Zealand. It is an easy way to construct selections of people who have bought X and then sometime later bought Y.

These two wildcards can be used in conjunction with one another. For example ‘Australia ? * New Zealand’ is a way of selecting people who have been to Australia and then had at least one holiday before a holiday to New Zealand.

The final wildcard is the ‘=’ wildcard. This is used to specify that this item in the pattern matches the previous one. This can be a useful shorthand way of writing patterns with fixed values that you might then want to change later (e.g. Australia = = = is Australia followed by three of the same – but it is quicker to edit one box to change to say New Zealand if you wanted to look at different destinations).

However, the more useful case is when it is preceded by a ‘?’ wildcard, or just on its own (= as the first item in a pattern is taken to match any item). So, the example pattern below would enable us to segment people by the longest consecutive sequence of identical destinations that they have visited. In this instance, since I know the maximum number of bookings any person on our test system has had is eight, this is the longest possible sequence. The user interface has a shortcut for defining patterns that follow the construction shown below.

ap-patternmatch-insert-3.jpg

Adding the expression to a cube allows us to see all our customers and their longest sequences. I have then selected those with eight, and then shown the bookings made by those people (which must all necessarily be the same). This shows just three of the destinations have been visited eight times consecutively by any of our customers.

ap-patternmatch-insert-4.jpg

Some more involved examples

In previous blog posts we have undertaken analysis on both international and English league football results. The transactions here are the matches that the teams have been involved in, and there are lots of teams who have taken part in several thousands of matches in the last 130 years.

The pattern match features introduced in this blog can be used to answer a whole range of questions that would have previously been much more time consuming. I’ll list some of the questions below so that you can have a think about them before showing you the techniques we can use to answer them. The actual answers are included at the end of the blog so that you can see how you did!

  1. Which international team has had the most consecutive wins? How many?
  2. Which international team has had the most consecutive losses? Is this more or less than the number of wins?
  3. What about international teams with consecutive draws?
  4. “1-0 to the Arsenal” used to be a common chant from their supporters. But which score-line has been achieved consecutively the most number of times in English league football? How many matches? (Hint – it is not 1-0 to the Arsenal. They have only managed to do that three times in a row!)

Small extra details

There are a couple of other functionalities that the Pattern Match feature allows.

Firstly, the transactions that are analysed can be restricted using a transactional filter selection. For example, we could restrict our holidays to be expensive ones, or restrict the football matches to be just league matches.

Secondly, the user can specify an optional parameter to break a pattern if the gap between two adjacent transactions is too great. When sequencing by a date variable, for example, the user could specify that each pair of holidays has less than 365 days between them.

Finally, there is no restriction on the naming of the patterns so you can give the same name to multiple patterns if you want different patterns to return the same value.

Conclusions

This blog post has introduced functionality that has been introduced in the Q4 2018 release and extended to include the ‘*’ wildcard in the Q1 2019 release. It enables quicker and easier analysis of sequences of transactions. Hopefully you’ve also learnt something new about football statistics along the way!

Keep your eyes peeled for a second blog post on the subject of pattern matching which will be along shortly.

Want to see how marketers are tackling the challenge of customer engagement? Download your copy of the Apteco Data Trends Report 2019 to discover more. 

This is the first part of a series with four parts. Click here to access the next part of this blog series. 

Download Top 10 analytics techniques for data-savvy marketers
Chris Roe

Developer

Chris spends his time developing new analytics features for Apteco FastStats®. You may also meet him during Apteco training sessions. In addition to this, Chris spends his time building FastStats systems from publicly available data, searching for insights and writing for the Apteco Blog series.

Subscribe to our blog and get all the latest data analysis and campaign automation news.