Sequencing transactions with pattern matching, part 1
13 Nov 2019 | by Chris Roe
How Apteco FastStats® can offer valuable customer insights into customers’ purchasing history.
Finding a common theme
Pattern matching is a technique that can be used to understand sequences in customers’ transaction history. It can help us understand their behaviour by examining what they have previously bought from us and how they have interacted with us. We often know a lot about what products customers buy and the order in which they buy them, which gives us the opportunity to be able to identify particular sequences. These sequences can have interesting meanings and can bring real marketing benefit.
There are a number of examples of analytics that involve being able to select people by patterns in consecutive transactions. For example:
- Selecting people who have matched a particular pattern
- Comparing people who have matched a pattern against those who have not
- Finding people who have repeatedly bought the same product in a row
There are already some existing techniques that can be used to handle some aspects of pattern matching in Apteco FastStats®.
The transactional analysis wizard currently available in FastStats can be used to find the most frequently occurring patterns of the same length. The limitations in this technique are that it cannot find specific patterns, and cannot find those patterns if they are not the most frequently occurring or if they need to have more specific advanced features (such as wildcards).
The existing Relative aggregations can be used to take information from one transactional record and push it onto another record. In this way, expressions with several ‘AND’ conditions can be used together to create a pattern. This technique is very powerful as it can find specific patterns, and ones with some of the more advanced wildcards. It doesn’t allow for situations where patterns have ‘*’ wildcards in them, or date restrictions across patterns.
The pattern matching technique introduced here allows the specification of a set of patterns of potentially different lengths, which are defined in a priority order. A pattern is defined as a set of values that represent the transactions in the sequence. The name associated with the first matched one is returned. This aggregation could be used on its own, or combined together with any of the other expression functionality for more sophisticated analysis.
The option is available in the function type dropdown for an on-the-fly aggregation as ‘Pattern Match’ in FastStats. The user chooses which variable is used to sequence the transactions and the variable to be used for the pattern matching. An example of a set of patterns is shown below.
Adding the above pattern definition to a cube gives the following results. There are 109 people who have visited New Zealand directly after Australia. A further 46,143 people have visited the USA three times in a row, and there are no other people who’ve matched the FGFG pattern. The transactional details of a couple of the people from the A-NZ pattern are shown on the right-hand side of the screenshot.
We have three available wildcards in the Q1 2019 version of the software.
The first is familiar to those who have used text selection wildcards and is the ‘?’ wildcard. This can be used where an element of the pattern is unimportant and we don’t mind what the value was. As an example, the pattern ‘Australia ? New Zealand’ would match people who, at some point, have visited Australia, then any destination, then New Zealand. Note that the middle transaction could be Australia as well!
The second is the ‘*’ wildcard and it works in the same manner as the text selection wildcard. It is used to match ‘0 or more transactions’. As an example, the pattern ‘Australia * New Zealand’ would match people who, at some point have visited Australia, then have had 0 or more holidays elsewhere, and then have been to New Zealand. It is an easy way to construct selections of people who have bought X and then sometime later bought Y.
These two wildcards can be used in conjunction with one another. For example ‘Australia ? * New Zealand’ is a way of selecting people who have been to Australia and then had at least one holiday before a holiday to New Zealand.
The final wildcard is the ‘=’ wildcard. This is used to specify that this item in the pattern matches the previous one. This can be a useful shorthand way of writing patterns with fixed values that you might then want to change later (e.g. Australia = = = is Australia followed by three of the same – but it is quicker to edit one box to change to say New Zealand if you wanted to look at different destinations).
However, the more useful case is when it is preceded by a ‘?’ wildcard, or just on its own (= as the first item in a pattern is taken to match any item). So, the example pattern below would enable us to segment people by the longest consecutive sequence of identical destinations that they have visited. In this instance, since I know the maximum number of bookings any person on our test system has had is eight, this is the longest possible sequence. The user interface has a shortcut for defining patterns that follow the construction shown below.
Adding the expression to a cube allows us to see all our customers and their longest sequences. I have then selected those with eight, and then shown the bookings made by those people (which must all necessarily be the same). This shows just three of the destinations have been visited eight times consecutively by any of our customers.
Some more involved examples
In previous blog posts we have undertaken analysis on both international and English league football results. The transactions here are the matches that the teams have been involved in, and there are lots of teams who have taken part in several thousands of matches in the last 130 years.
The pattern match features introduced in this blog can be used to answer a whole range of questions that would have previously been much more time consuming. I’ll list some of the questions below so that you can have a think about them before showing you the techniques we can use to answer them. The actual answers are included at the end of the blog so that you can see how you did!
- Which international team has had the most consecutive wins? How many?
- Which international team has had the most consecutive losses? Is this more or less than the number of wins?
- What about international teams with consecutive draws?
- “1-0 to the Arsenal” used to be a common chant from their supporters. But which score-line has been achieved consecutively the most number of times in English league football? How many matches? (Hint – it is not 1-0 to the Arsenal. They have only managed to do that three times in a row!)
Small extra details
There are a couple of other functionalities that the Pattern Match feature allows.
Firstly, the transactions that are analysed can be restricted using a transactional filter selection. For example, we could restrict our holidays to be expensive ones, or restrict the football matches to be just league matches.
Secondly, the user can specify an optional parameter to break a pattern if the gap between two adjacent transactions is too great. When sequencing by a date variable, for example, the user could specify that each pair of holidays has less than 365 days between them.
Finally, there is no restriction on the naming of the patterns so you can give the same name to multiple patterns if you want different patterns to return the same value.
This blog post has introduced functionality that has been introduced in the Q4 2018 release and extended to include the ‘*’ wildcard in the Q1 2019 release. It enables quicker and easier analysis of sequences of transactions. Hopefully you’ve also learnt something new about football statistics along the way!
Keep your eyes peeled for a second blog post on the subject of pattern matching which will be along shortly.
Want to see how marketers are tackling the challenge of customer engagement? Download your copy of the Apteco Data Trends Report 2019 to discover more.
This is the first part of a series with four parts. Click here to access the next part of this blog series.
Answers to the football questions:
- Spain had fifteen wins in a row from June 26th 2008 to June 20th 2009, but going further back Mauritius had seventeen wins in a row in the 1950s. Admittedly they were all against Madagascar or Reunion! This result can be found by constructing a pattern with 20W, then 19W, all the way down to 1W. Then the longest sequence of W’s will be matched. There is, however, an element of trial and error required to establish how big you need the first pattern to be. As soon as you have a gap in the highest length sequences you know you must have got the longest possible one.
- An easier one this! San Marino had a sixty-one match losing streak starting on September 4th 2004 (following a 1-0 win against Liechtenstein). They broke this sequence with a 0-0 draw against Estonia. This result uses the same technique as question one, but looking for Loss results.
- Gabon had eight draws in a row, starting on October 8th 2016 and ending on June 4th 2017. This result uses the same technique as question one but looking for Draw results.
- Bristol Rovers had six 2-1 league matches in a row, starting on March 17th 1990. QPR had six 1-1 league matches in a row, starting on December 14th 1957. This uses a similar construction technique as above, on match score-lines, but with the added feature that we don’t know what scoreline it will be that we are looking for, so we use the ‘=’ wildcard.