A sideways look at economics

Stating that “the internet has a huge impact on our decisions and actions” is unlikely to be met with much pushback from the general population. Most of us are acutely aware of the extent to which we are influenced by the content that we are exposed to online. However, the results of a study into the impact of Wikipedia pages on choices made by tourists are still surprising. The study found that adding information to the Wikipedia page for a tourist destination could increase overnight visits to that destination by 9% on average. It provides clear evidence of how powerful the internet can be in changing our behaviour, and makes it even more surprising that tourist destinations do not put more effort into tailoring their Wikipedia pages. You can imagine my surprise when reading the Wikipedia page for Suwon in South Korea and finding no mention of the ‘world’s first toilet theme park’. Whilst these results are striking, this blog will examine why we should be careful about how we interpret them.

Studies into relationships like the impact of the internet on our decisions are plagued by a fundamental statistical issue that economists face in their role as passive observers of data: endogeneity of regressors. That is, the impacts of the things you were not able to include in your model, where they are correlated with the things you are trying to measure the impact of, are picked up by the latter. This can lead to biases in estimations that make a causal interpretation difficult. The classic example of this is the question of returns to education. If we want to estimate the impact of an individual’s years of schooling on their earnings potential, we need to account for the possibility that people who decide to engage in more schooling might have earned higher salaries even if they hadn’t. The fundamental issue here is the omission of an individual’s innate ability. This idea is captured by the work of Michael Spence on the role of job market signalling, in which education’s role is less about increasing the productivity of a worker and more about signalling that the worker is more productive than others.[1] His work would suggest that those who have higher ability will educate themselves more because they find education less costly.
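To make the bias concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the 8% true return to a year of schooling, the strength of the link between ability and schooling, and the helper names `ols_slope` and `naive_return_to_schooling`. It regresses log earnings on schooling while leaving ability out of the model.

```python
import random


def ols_slope(x, y):
    """Slope from a one-regressor least-squares fit of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def naive_return_to_schooling(n=20_000, true_return=0.08, seed=0):
    """Estimate the return to schooling with ability omitted from the model."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: people with higher ability choose more schooling.
    schooling = [12 + 2 * a + rng.gauss(0, 1) for a in ability]
    # Assumption: ability also raises earnings directly.
    log_earnings = [
        true_return * s + 0.3 * a + rng.gauss(0, 0.5)
        for s, a in zip(schooling, ability)
    ]
    return ols_slope(schooling, log_earnings)


naive_estimate = naive_return_to_schooling()
print(f"true return: 0.08, naive estimate: {naive_estimate:.3f}")
```

With these made-up numbers the naive slope should settle well above the true 0.08 in a large sample, because schooling is picking up the effect of the ability we left out.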

A large share of the econometrics literature creates elegant tricks to solve this problem, with the most well-known being the use of instrumental variables and natural experiments. However, the gold standard for dealing with this problem is the randomised controlled trial (RCT),[2] which has been deployed extensively in the development economics literature, pioneered by recent Nobel prize winner Esther Duflo. The simple logic behind an RCT is this: if I randomise the treatment (in other words, the thing I want to measure the impact of) then the things I exclude won’t systematically impact the results. This means that if my sample is large enough, the effects of these things will net off in the aggregate. In the case of returns to education, suppose we could randomly allocate years of education to individuals; we could then expect ability to be randomly distributed across different years of education. Whilst the omission of ability would be likely to have an impact on results by increasing the variance of our estimator, it wouldn’t bias our estimates.
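The logic of randomisation can be sketched with a small simulation. The numbers are invented for illustration (an assumed true return of 8% per year of schooling, and an assumed direct effect of ability on earnings), and the function names are hypothetical. Years of schooling are handed out by lottery, so ability is still omitted from the regression but is no longer related to schooling.

```python
import random


def ols_slope(x, y):
    """Slope from a one-regressor least-squares fit of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def randomised_return_to_schooling(n=20_000, true_return=0.08, seed=0):
    """Estimate the return to schooling when schooling is randomly assigned."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # Schooling is drawn by lottery, independently of ability.
    schooling = [rng.randint(10, 18) for _ in range(n)]
    # Assumption: ability still raises earnings, but we never observe it.
    log_earnings = [
        true_return * s + 0.3 * a + rng.gauss(0, 0.5)
        for s, a in zip(schooling, ability)
    ]
    return ols_slope(schooling, log_earnings)


rct_estimate = randomised_return_to_schooling()
print(f"true return: 0.08, randomised estimate: {rct_estimate:.3f}")
```

Under these assumptions the estimate should sit close to the true 0.08. Omitted ability now only adds noise around the estimate rather than pushing it in one direction.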

Wikipedia effect

Whilst RCTs have nice causal interpretations, they’re often practically difficult or prohibitively expensive. However, the authors of the aforementioned paper on Wikipedia find a beautifully simple solution. The amount of content about a given tourist destination varies vastly across the different language versions of its Wikipedia page. The authors intervene by actively and randomly changing tourist destinations’ Wikipedia pages in different languages. Utilising a dataset of overnight stays at Spanish tourist destinations, broken down by nationality of tourist, they observe the impact of these random changes on tourist numbers. Their finding: on average, adding information to the Wikipedia page increases overnight visits by 9%.

So, what can this study tell us about the impact of Wikipedia pages on tourism? It tells us that adding information to different language Wikipedia pages could increase the number of tourists a destination receives. However, it cannot tell us, as was claimed in the popular press, that if everyone did this global tourism revenues could increase by billions. Why? Because of an assumption required for the validity of this type of analysis, called SUTVA.

The stable unit treatment value assumption (SUTVA) assumes that the potential outcome of one observational unit cannot be impacted by the treatment or potential outcomes of another observational unit.[3] For SUTVA to hold, the decision of Fathom’s local, the George and Vulture, to advertise that it is ‘London’s highest pub’ (a feature that is clearly far more important in determining its use as Fathom’s local than its proximity to our office) can’t have an impact on the number of people that decide to visit an alternative pub.

The most commonly used, and currently most relevant, example of this is vaccinations. Suppose we wish to measure the marginal benefit of COVID-19 vaccination. Right now, the marginal benefit for most people would be considerable because, given no-one else is vaccinated, the probability of catching the virus is relatively high. But, as a larger share of the rest of the population becomes vaccinated, my probability of catching the virus reduces and therefore my marginal benefit of a vaccination is lower. The key here is that the estimated marginal benefit is only valid at the level of vaccination in the general population at which it was estimated.
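A toy calculation shows the dependence on coverage. This is a sketch under loud assumptions, not an epidemiological model: the baseline risk of 30%, the linear fall in risk as coverage rises, and the 90% efficacy are all invented for illustration, as are the function names.

```python
def infection_risk_unvaccinated(coverage, baseline_risk=0.30):
    """Risk for an unvaccinated person, given the share of others vaccinated.

    Assumption: risk falls linearly as coverage rises from 0 to 1.
    """
    return baseline_risk * (1 - coverage)


def marginal_benefit(coverage, efficacy=0.9):
    """Reduction in my infection risk from getting vaccinated myself."""
    return efficacy * infection_risk_unvaccinated(coverage)


for coverage in (0.0, 0.5, 0.9):
    benefit = marginal_benefit(coverage)
    print(f"coverage {coverage:.0%}: marginal benefit {benefit:.3f}")
```

The benefit measured when nobody else is vaccinated is much larger than the benefit when most people are, so an estimate taken at one level of coverage cannot simply be carried over to another.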

In economics, the two most common violations of SUTVA come from pure externalities of consumption or production and business stealing effects. In the case of the increase in tourism derived from changing Wikipedia entries, it’s reasonable to imagine that whilst some of the increase in tourism is from people who otherwise would not have travelled anywhere, at least some of the increase is also derived from business stealing from other tourist destinations. In this sense, extrapolating the results to derive implications for the entire world is difficult.
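A small worked example shows why scaling up is risky. The 9% local effect comes from the study, but the share of that gain diverted from rival destinations (`diverted_share`) is purely hypothetical, as are the visit counts, the function names, and the assumption that diversion nets out evenly when every destination is treated.

```python
def treat_one(visits, index, local_effect=0.09, diverted_share=0.5):
    """Improve one destination's page; part of its gain comes from rivals."""
    gain = visits[index] * local_effect
    diverted = gain * diverted_share  # visits taken from other destinations
    others_total = sum(visits) - visits[index]
    result = []
    for i, v in enumerate(visits):
        if i == index:
            result.append(v + gain)
        else:
            # Rivals lose visits in proportion to their size.
            result.append(v - diverted * v / others_total)
    return result


def treat_all(visits, local_effect=0.09, diverted_share=0.5):
    """Improve every page; assumed symmetric diversion cancels out."""
    return [v * (1 + local_effect * (1 - diverted_share)) for v in visits]


visits = [100.0, 100.0, 100.0]
after_one = treat_one(visits, 0)
after_all = treat_all(visits)
naive_total = sum(visits) * 1.09  # extrapolating the local 9% to everyone

print("one treated:", after_one, "total", sum(after_one))
print("all treated total:", sum(after_all), "naive total:", naive_total)
```

In this sketch the treated destination still sees its 9% rise, but the total across all destinations grows by less than the naive extrapolation suggests, because only the genuinely new trips add to the aggregate.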

The main takeaway, and a key criticism of the RCT literature, is that these marginal effects are generally local in nature because SUTVA very rarely holds. The only thing that seems to be certain is that causal interpretations are always difficult, that outcomes of studies must be interpreted with nuance, and that these causal effects are averages that do not necessarily apply universally. Take Southport for example, which is, according to its Wikipedia page, home to “one of the few British lawnmower museums”. Perhaps sometimes, when it comes to adding new information to your Wikipedia page to attract tourists, less is more…



[1] Spence, M. (1973), ‘Job Market Signaling’, The Quarterly Journal of Economics, 87(3), p. 355.

[2] On the huge growth in RCTs, see: Cameron, D.B., Mishra, A. & Brown, A.N. (2016), ‘The growth of impact evaluation for international development: how much have we learned?’, Journal of Development Effectiveness, 8:1, pp. 1–21.

[3] For greater detail on the SUTVA assumption and its place in causal inference see: Angrist, J.D., Imbens, G.W. and Rubin, D.B. (1996), ‘Identification of Causal Effects Using Instrumental Variables’, Journal of the American Statistical Association, pp. 444–455.