Geographic Expressions in FastStats
07 Aug 2018 | by Chris Roe
Practical examples of the insights made possible by the location expression functions available in FastStats.
In this blog post we’re going to look at some practical situations in which we can make use of the powerful location expression functions available in Apteco FastStats®. We’ll use a mixture of different FastStats systems and different types of location data. Some of this data has come directly from openly available sources, and some we’ve had to scrape programmatically from the web. In addition to demonstrating these expressions, we’ll visualise some of the results using the Mapping capabilities within the software.
Distance and Locality functions using UK Postcodes
The first example we’re going to use is from our standard Holidays training database. In this particular dataset, locations are stored as postcodes. There is a set of expression functions that work directly from UK postcodes, using a standard library of postcode and location information.
In the example below, we have an expression to find the minimum distance between each customer postcode and Heathrow, Birmingham and Manchester airports. This kind of expression can be used to find customers who are near to particular relevant locations.
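For readers who want to see the underlying calculation outside FastStats, here is a minimal Python sketch of the same idea. It is not the FastStats expression itself: it assumes each customer postcode has already been converted to a latitude and longitude, it uses the haversine great-circle formula, and the airport coordinates are approximate values included only for illustration. The names `haversine_miles` and `min_airport_distance` are our own.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Approximate airport coordinates (latitude, longitude), for illustration only.
AIRPORTS = {
    "Heathrow": (51.4700, -0.4543),
    "Birmingham": (52.4539, -1.7480),
    "Manchester": (53.3537, -2.2750),
}


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def min_airport_distance(lat, lon):
    """Smallest distance in miles from a point to any of the three airports."""
    return min(
        haversine_miles(lat, lon, a_lat, a_lon)
        for a_lat, a_lon in AIRPORTS.values()
    )


# A customer in central Birmingham (approximate coordinates).
print(round(min_airport_distance(52.4797, -1.9027), 1))
```

The result is a single number per customer, which is what makes it easy to use in a selection such as "within 20 miles".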
A slight modification of the expression returns the name of the airport that the customer is closest to, as a text value. The expression below could also be turned into a Selector virtual variable using the Expression wizard.
These two examples can be utilised together in a marketing context. In the screenshot below, I’ve selected customers who live within 20 miles of one of the three airports. I’ve then chosen to output a range of personal information and the name of the airport that they’re closest to.
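The combination described above (a distance threshold plus the name of the nearest airport) can be sketched in Python as follows. This is an illustration under stated assumptions rather than what FastStats does internally: the customer identifiers and coordinates are hypothetical, the airport coordinates are approximate, and `nearest_airport` and `select_near_airport` are names we have made up for the example.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate airport coordinates (latitude, longitude), for illustration only.
AIRPORTS = {
    "Heathrow": (51.4700, -0.4543),
    "Birmingham": (52.4539, -1.7480),
    "Manchester": (53.3537, -2.2750),
}

# Hypothetical customers: (identifier, latitude, longitude).
CUSTOMERS = [
    ("C001", 51.5074, -0.1278),  # central London
    ("C002", 53.4808, -2.2426),  # central Manchester
    ("C003", 50.7184, -3.5339),  # Exeter
]


def miles_between(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in miles."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * asin(sqrt(a))


def nearest_airport(lat, lon):
    """Return (name, distance in miles) of the closest airport."""
    distances = {
        name: miles_between(lat, lon, a_lat, a_lon)
        for name, (a_lat, a_lon) in AIRPORTS.items()
    }
    name = min(distances, key=distances.get)
    return name, distances[name]


def select_near_airport(customers, max_miles=20.0):
    """Keep customers within max_miles of their nearest airport."""
    selected = []
    for cust_id, lat, lon in customers:
        name, dist = nearest_airport(lat, lon)
        if dist <= max_miles:
            selected.append((cust_id, name, round(dist, 1)))
    return selected


for row in select_near_airport(CUSTOMERS):
    print(row)
```

Keeping the distance and the name in one helper means the selection and the output column are always based on the same airport.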
How far between?
In this next example, we’re using a FastStats system built out of UK flight delay data. The data provided by the CAA includes origin and destination airports, but does not provide location information. We’ve found and added the latitude and longitude values for over 1,200 airports that are flown to from the UK. The rationale behind doing this was so that we could investigate whether flight distance had an impact on how much flights were delayed.
The GeoDist() function can be used to find the distance between any two locations, and is therefore a much more general distance function than the UK-specific ones described above.
An expression to calculate distance is:
This expression could then be used in our analysis to give a relationship like:
This style of calculation is slightly different to the usual case, in that each data record has two locations associated with it, representing the start and end points of the flight. More typical is the situation where there’s one location associated with a record (usually a customer) and one or more reference points representing locations of interest. The above expression could have been rewritten with fixed numeric values to find the distance between a customer and a set of locations (e.g. our retail stores).
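To make the "two locations per record" case concrete, here is a small Python sketch that computes a distance for each flight record and places it in a distance band, which is one way the delay figures could then be summarised. It does not use GeoDist(); it assumes a haversine calculation, approximate airport coordinates, and band thresholds (1,500 km and 4,000 km) that we have chosen arbitrarily for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate airport coordinates (latitude, longitude), for illustration only.
AIRPORTS = {
    "LHR": (51.4700, -0.4543),
    "CDG": (49.0097, 2.5479),
    "JFK": (40.6413, -73.7781),
}

# Hypothetical flight records: (origin, destination).
FLIGHTS = [("LHR", "CDG"), ("LHR", "JFK")]


def km_between(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def flight_distance_km(origin, destination):
    """Distance between the two airports attached to one flight record."""
    return km_between(*AIRPORTS[origin], *AIRPORTS[destination])


def distance_band(km):
    """Arbitrary illustrative bands for grouping flights by length."""
    if km < 1500:
        return "short haul"
    if km < 4000:
        return "medium haul"
    return "long haul"


for origin, destination in FLIGHTS:
    km = flight_distance_km(origin, destination)
    print(origin, destination, round(km), distance_band(km))
```

For the customer-and-store case, the same `km_between` helper would simply be called with the customer’s coordinates and a fixed list of store coordinates.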
Find my nearest…
In this example, we’re using a FastStats system built from two years of house price sale data in Melbourne. The dataset includes latitude and longitude information to pinpoint the location of each sold property. The challenge was to build a model to predict sold house prices. However, some factors that may turn out to be important were not provided in the source data. For instance, proximity to public transport (e.g. train or tram stops) could have an influence on house prices.
We’ve written some R scripts to scrape the locations of train and tram stops in Melbourne and write them out to a CSV file. With a little further processing, we can bring this data directly into FastStats expressions in the format the location functions need, and so establish how close each house is to its nearest public transport stop. The screenshot below shows the start of the expression that works out the distance to the nearest train station in Melbourne.
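As a rough illustration of that pipeline outside FastStats, the Python sketch below reads a small CSV of stops and finds the distance from a house to its nearest one. The CSV layout (`name,lat,lon`) is an assumption rather than the format our R scripts produced, the station coordinates are approximate, and the house location is hypothetical.

```python
import csv
import io
from math import asin, cos, radians, sin, sqrt

# Assumed CSV layout; coordinates are approximate and for illustration only.
STOPS_CSV = """name,lat,lon
Flinders Street,-37.8183,144.9671
Northcote,-37.7698,144.9953
Richmond,-37.8240,144.9900
"""


def km_between(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def load_stops(text):
    """Parse CSV text into a list of (name, latitude, longitude) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["name"], float(row["lat"]), float(row["lon"])) for row in reader]


def nearest_stop_km(lat, lon, stops):
    """Distance in kilometres from a point to its closest stop."""
    return min(km_between(lat, lon, s_lat, s_lon) for _, s_lat, s_lon in stops)


stops = load_stops(STOPS_CSV)
# Hypothetical house a few streets from Northcote station.
print(round(nearest_stop_km(-37.7710, 144.9990, stops), 2))
```

With the full list of stops the logic is unchanged; only the CSV gets longer.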
For an analytical or modelling purpose it may well be that the numeric distance is the important factor. In many marketing instances, however, we also wish to know which of the locations was nearest (e.g. what is my nearest store, airport or train station?).
In the screenshot below, there’s a variation on the above expression that works out which location is the nearest and then returns the textual value of that station name. There are two copies below showing different parts of the same expression.
We’ve seen this type of analysis requirement in many scenarios. Clients have needed to find the nearest store, the nearest post office, or the nearest cinema. The GeoNth() and GeoDistNth() functions also allow for finding the index of, or the distance to, the Nth closest location. There are many possibilities, and all of them can follow the approaches described above.
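The "Nth closest" idea can be sketched in a few lines of Python. This is only an analogue of what GeoNth() and GeoDistNth() provide, since their exact arguments are not shown in this post: the helper name `nth_nearest`, its zero-based returned index and its N counted from 1 are all our own assumptions, and the station coordinates are approximate.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate station coordinates, for illustration only.
STATION_NAMES = ["Southern Cross", "Northcote", "Richmond"]
STATION_POINTS = [
    (-37.8184, 144.9525),
    (-37.7698, 144.9953),
    (-37.8240, 144.9900),
]


def km_between(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def nth_nearest(lat, lon, locations, n):
    """Return (index, distance_km) of the n-th closest location.

    n is counted from 1; the returned index is the zero-based position
    of that location in the input list.
    """
    if not 1 <= n <= len(locations):
        raise ValueError("n must be between 1 and the number of locations")
    ranked = sorted(
        (km_between(lat, lon, loc_lat, loc_lon), index)
        for index, (loc_lat, loc_lon) in enumerate(locations)
    )
    distance, index = ranked[n - 1]
    return index, distance


# Hypothetical house: which station is second closest, and how far away?
index, distance = nth_nearest(-37.7710, 144.9990, STATION_POINTS, 2)
print(STATION_NAMES[index], round(distance, 2))
```

Returning the index rather than the name keeps the helper generic: the same index can be used to look up a station name, a store code, or any other attribute held in a parallel list.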
We can now use these expressions and visualise the results on a map. I’ve taken a selection of houses where the nearest train station was Northcote. The first (zoomed-out) map has an overlaid layer showing the locations of all the Melbourne metropolitan stations (cyan), together with a red group for the houses nearest to Northcote.
The second, zoomed-in map shows the small number of sold houses nearest to Northcote station. The mouse is hovering over the blue circle of Northcote station, and the supplementary information in the tooltip is information that we scraped about this particular station.
We could have tailored this map further to colour the circles by different house price bands, or house types.
In this blog we’ve described several examples of how the location expressions in FastStats can be used in different analytical scenarios. Your analysis may require the use of locations that are relevant to your business and that you have full knowledge of, such as your store locations.
Alternatively, you may need to go out and obtain the data you need from one of the many open data sources that are available. For example, the UK government has published a full list of public transport stops. It doesn’t take too much work to extract a relevant subset for your particular requirements (e.g. find all UK airports/train stations/underground stations etc). If the location data you need isn’t readily available, then you’ll need to put in more effort to get it.
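Extracting a relevant subset from an open stops file is usually a simple filter. The Python sketch below shows the idea on a tiny inline sample; the column names (`stop_name`, `stop_type`, `latitude`, `longitude`) and the type values are hypothetical and will differ from those in any real published dataset, and the sample rows are invented for illustration.

```python
import csv
import io

# Invented sample rows with hypothetical column names and type values.
SAMPLE_STOPS = """stop_name,stop_type,latitude,longitude
Heathrow Airport,airport,51.4700,-0.4543
Manchester Piccadilly,rail,53.4774,-2.2309
High Street,bus,52.0000,-1.0000
"""


def extract_stops(csv_text, wanted_types):
    """Return (name, latitude, longitude) for rows whose type is wanted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["stop_name"], float(row["latitude"]), float(row["longitude"]))
        for row in reader
        if row["stop_type"] in wanted_types
    ]


# Keep only airports and rail stations as reference locations.
for name, lat, lon in extract_stops(SAMPLE_STOPS, {"airport", "rail"}):
    print(name, lat, lon)
```

The resulting list of names and coordinates is exactly the kind of reference data that the distance and nearest-location expressions need.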