A projection displays information from two Wall Street Journal readers: one a city dweller who has read five articles from across the paper in two weeks, all through social media.
The other lives in a rural area and has read two stories from the business section via search. Who is more likely to subscribe? The WSJ has been doggedly crunching data to answer questions such as these.
In this example it is in fact the latter who is more likely to subscribe. Despite the rural location and the smaller number of articles read, intentionality is the most important factor in this case.
While the former merely stumbled across links surfaced on their social feed, the second reader purposefully sought out the WSJ.
Thanks to the publication's ongoing data analysis and the increasing refinement of its predictive tools, it's likely that the company would have guessed this. Based on more than 60 important dimensions of behaviour, the newspaper assigns each reader a score on a sliding scale from 0 to 100.
"The way that we've assessed this is similar to how a weather channel tries to figure out the probability of rain," said John Wiley, data scientist at WSJ, speaking at the O'Reilly Artificial Intelligence conference recently in London. "We're trying to figure out the probability of a subscription."
The 0s are very unlikely to subscribe; the 100s are almost sure to. This score, unknown to the reader, is paramount to the company because it decides how the dynamic paywall shielding the majority of WSJ content interacts with that particular visitor.
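To make the weather-forecast analogy concrete, here is a minimal sketch of how a probability of subscription could be turned into a 0 to 100 score. It is not the WSJ's model: the four features, their weights and the bias are all invented for illustration, whereas the real system is said to draw on more than 60 dimensions whose details are not public.

```python
import math

# Hypothetical weights, invented for this sketch. The real model reportedly
# uses more than 60 behavioural dimensions, none of which are published.
WEIGHTS = {
    "articles_read": 0.10,
    "arrived_via_search": 1.50,
    "arrived_via_social": -0.50,
    "lives_in_city": 0.40,
}
BIAS = -2.0  # also an assumption; sets the baseline for a reader with no signals


def propensity_score(reader: dict) -> int:
    """Map a reader's behaviour to a score between 0 and 100."""
    # Weighted sum of whichever features the reader has (missing ones count as 0).
    z = BIAS + sum(weight * reader.get(name, 0) for name, weight in WEIGHTS.items())
    # The logistic function squeezes the sum into a probability between 0 and 1.
    probability = 1.0 / (1.0 + math.exp(-z))
    return round(probability * 100)


# The two readers from the opening example.
city_reader = {"articles_read": 5, "arrived_via_social": 1, "lives_in_city": 1}
rural_reader = {"articles_read": 2, "arrived_via_search": 1}

print(propensity_score(city_reader), propensity_score(rural_reader))
```

With these made-up weights the rural reader who arrived via search scores higher than the city reader who arrived via social media, mirroring the example above. A different choice of weights would give a different ranking.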
"We want to get to the point, through these exercises, that we're actually segmenting our audience into these tiny micro segments," said Wiley.
"If you were to partition off different internet traffic, you'd actually be able to say that people with a score of a hundred were generally converting at about three times the rate of someone with a score of one."
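The comparison Wiley describes amounts to partitioning traffic by score and comparing conversion rates between the partitions. The sketch below shows that arithmetic under stated assumptions: the band width of 25, the function name and the eight sample visits are all invented here, and the sample is toy data rather than WSJ figures.

```python
from collections import defaultdict

BAND_WIDTH = 25  # assumed band size; bands start at 0, 25, 50 and 75


def conversion_rate_by_band(visits):
    """Return {band_start: conversion_rate} for (score, subscribed) pairs."""
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for score, subscribed in visits:
        # A score of exactly 100 is folded into the top band (75 and above).
        band = min(score // BAND_WIDTH, 100 // BAND_WIDTH - 1)
        totals[band] += 1
        conversions[band] += int(subscribed)
    return {
        band * BAND_WIDTH: conversions[band] / totals[band]
        for band in sorted(totals)
    }


# Toy data, invented purely to exercise the function.
sample_visits = [
    (3, False), (10, False), (18, True), (22, False),
    (80, True), (91, True), (97, False), (100, True),
]

rates = conversion_rate_by_band(sample_visits)
lift = rates[75] / rates[0]  # how many times better the top band converts
print(rates, lift)
```

In practice the bands would be far narrower and the sample far larger, since subscriptions are rare events in overall traffic.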
As a result of extensive research, some hard and fast rules have emerged. One is that no matter who the visitor is, the supply of free monthly articles should have a hard cap at five. According to Wiley, this is why publications offering free articles tend to converge on this value.
"Once you go beyond that, what you see is diminishing returns," Wiley said. "Especially in a scenario where someone might be just getting unrestricted access to content." Beyond five free articles, conversion rates drop.
Value of data
There are other broad truths that emerge from the data. The earlier example notwithstanding, people located in city centres, in places like New York and San Francisco, are generally more likely to subscribe.
An obvious application of the newspaper's growing understanding of its readers is choosing the optimal moment to surface the paywall in order to maximise the number of subscriptions.
While those with the highest scores may find themselves shut out fairly quickly, driving demand, those who are less convinced may find themselves able to peruse a larger number of articles before hitting the paywall.
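One way to picture this behaviour is as a function from score to free-article allowance. The sketch below is an illustration only: the linear mapping, the floor of one article and the function names are assumptions made here, and only the ceiling of five articles comes from Wiley's remarks.

```python
MAX_FREE_ARTICLES = 5  # the ceiling reported in the article
MIN_FREE_ARTICLES = 1  # assumed floor, not stated by the WSJ


def free_article_allowance(score: int) -> int:
    """Higher scores get fewer free articles; lower scores get more."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Assumed linear interpolation: score 0 -> 5 articles, score 100 -> 1 article.
    span = MAX_FREE_ARTICLES - MIN_FREE_ARTICLES
    allowance = MAX_FREE_ARTICLES - (score / 100) * span
    return max(MIN_FREE_ARTICLES, min(MAX_FREE_ARTICLES, round(allowance)))


def should_show_paywall(score: int, articles_read_this_month: int) -> bool:
    """Surface the paywall once the reader has used up their allowance."""
    return articles_read_this_month >= free_article_allowance(score)


# A likely subscriber meets the paywall sooner than a hesitant one.
print(should_show_paywall(90, 2), should_show_paywall(10, 2))
```

A real system would presumably tune this mapping through testing rather than fix it by hand; the linear rule is simply the easiest shape to read.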
Like any machine learning model, the newspaper's feeds on data. Through this data, the company is able to develop and test models of an event that is relatively rare in terms of internet traffic: someone hitting the 'subscribe' button.
"We can calibrate our systems so we are constantly recalculating across these different engagement dimensions, which ones matter for a subscription," said Wiley.
Isolating these predictive dimensions is, of course, imperative for the newspaper's model. This is why some readers may be asked to submit an email address in return for more free articles, so the paper can gather more data on their browsing behaviour and further refine its predictive machinery.
The results are promising. Now that the publication is able to adjust the subscription offer to appear at a 'relevant' time, the proportion of those actually subscribing has increased. The newspaper celebrated hitting 3 million subscribers earlier this year.
But propensity modelling is just the beginning - getting someone to subscribe is only half the battle.
"Someone choosing to unsubscribe: that becomes a very relevant thing to be able to predict," said Wiley. "To determine who your highest risk cohorts are among your subscribers, you also need to determine what types of engagement interventions impact people's retention."
By this, he means the willingness of readers to remain subscribers if offered an alternative package or discount.
"We have a model running for engagement to predict the churn, or the moment that somebody will cancel," he said. "What are the daily behaviours? Basically, what are the positive experiences on site that lower the risk of somebody churning."
The company may go through dormant periods on the marketing front, where it's not aggressively attempting to convert people into subscribers, but where it is still measuring the overall engagement of readers.
This means that when it next embarks on a subscription drive, it will have plenty of information about those who are poised and ready for a tempting offer to tip them over the edge.
The next goal, according to Wiley, is to incentivise engagement among non-subscribers, so that they interact with that crucial volume of content before they are ready to sign up.
This will largely focus on increasing the personalisation of the product to render it as enticing as possible to prospective subscribers, feeding into the recommendations for content the reader sees, and even how the company messages them.
For example, the paper may email someone with a higher score more often or recommend newsletters in fields in which they've already shown an interest.
As the company becomes more and more advanced in segmenting and interacting with its audience, the hope is it will become increasingly difficult to remain unconverted, whatever your score.