06.10.2021 | Blogpost

Machine learning for fighting corruption in public procurement for education services and institutions

Vitezslav Titl
Assistant Professor of Law & Economics at Utrecht University and a principal investigator of a Junior STAR grant at Charles University. His research interests comprise economics of public procurement markets, corruption, law and economics.
Public procurement markets are worth 10-15 percent of global GDP, which constitutes about one-third of general government spending (OECD, 2019). The available data suggest that, in OECD countries, 11.9 percent of procurement spending goes to education services and institutions. This means that about 1.2-1.8 percent of the GDP of OECD countries is spent on education via public procurement. The importance of this spending lies not only in its size but also in its large implications for economic development, democratic institutions and overall wellbeing.

The funding systems of schools differ significantly across countries (see De Witte et al., 2019, for an overview of selected best-performing countries and their funding systems). Depending on the exact funding system, resources can be managed by the school, district, municipality or ministry, or by a combination of these. Whatever the system looks like, almost all must run public tenders, especially for large contracts.

Extensive academic evidence shows that public procurement markets are susceptible to problems such as corruption (Decarolis et al., 2021), political connections (Titl & Geys, 2019; Titl et al., 2021), and collusion (Kawai & Nakabayashi, 2021; Baranek et al., 2021). These problems cause large inefficiencies in the market. Precise estimates of the costs of these inefficiencies differ across countries and settings. Khwaja & Mian (2005) estimate that politically connected firms cost the Pakistani economy up to 1.9 percent of GDP every year. Similarly, a recent study commissioned by the European Commission suggests that inefficiencies in public procurement amount to 18 percent of overall procurement expenditure, of which two thirds can be attributed to corruption.

The policy implications of these studies are not clear. There is a long-standing debate about whether to give procurement officers less or more discretion. On the one hand, discretion can be (mis)used to favour certain firms and thus enable corruption. On the other hand, under some circumstances less stringent rules may bring more efficient outcomes, as recent empirical evidence by Baranek (2021) suggests. Similar findings, suggesting that additional rules are ineffective under some circumstances, can also be found in Bosio et al. (2020).

Given this uncertainty about how to mitigate the negative effects of corruption in public procurement, in education and elsewhere, one solution could be to use machine learning algorithms to identify procurement contracts that have a high chance of being misallocated due to corruption, political connections and/or conflicts of interest. These contracts could then be tagged for further investigation. If such screening works, procurement rules could be made less restrictive, and thus more efficient. In addition, such a system would deter corrupt behaviour among firms and procurement officers, because the chances of being caught would be high.

There is already some evidence that machine learning algorithms are effective in detecting potentially corrupt public procurement contracts, as well as conflicts of interest (firms with political connections). For instance, Decarolis & Giorgiantonio (2020) show that quantitative indicators, in combination with machine learning methods (especially random forests), are very effective in detecting corruption. Their study reports more than 99.9 percent accuracy in predicting corruption in a public procurement contract (fewer than 0.1 percent false positives). Similarly, Mazrekaj, Titl, and Schiltz (2021) show that these algorithms can find conflicts of interest in public procurement with high accuracy (above 80 percent). Unlike random checks (or waiting for a tip), the algorithm can point investigators to the tenders that merit attention. This research suggests that the chances of finding corruption and conflicts of interest via this method are very high. If, for example, 1 percent of procurement contracts are corrupt, an investigator checking contracts at random would need to analyse 100 contracts on average to find one that is corrupt. With machine learning the odds are reversed: if false-positive rates as low as those reported above hold in practice, the large majority of contracts flagged by the algorithm will be corrupt. The investigator will therefore be far more successful in finding corruption.
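
The comparison between random checks and algorithm-guided checks can be worked through numerically. The sketch below is only an illustration, not a result from the cited studies. The 1 percent share of corrupt contracts and the 0.1 percent false-positive rate are taken from the text; the recall value (the share of corrupt contracts the algorithm actually flags) and the function names are assumptions made for the example.

```python
def expected_random_checks(base_rate):
    """Average number of randomly drawn contracts needed to find one corrupt contract."""
    return 1.0 / base_rate


def share_corrupt_among_flagged(base_rate, recall, false_positive_rate):
    """Share of flagged contracts that are truly corrupt (the classifier's precision)."""
    true_positives = base_rate * recall
    false_positives = (1.0 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


base_rate = 0.01             # from the text: 1 percent of contracts are corrupt
false_positive_rate = 0.001  # from the text: fewer than 0.1 percent false positives
recall = 0.999               # assumption: nearly all corrupt contracts are flagged

random_checks = expected_random_checks(base_rate)
precision = share_corrupt_among_flagged(base_rate, recall, false_positive_rate)

print(f"Random checks needed per corrupt contract: {random_checks:.0f}")
print(f"Share of flagged contracts that are corrupt: {precision:.1%}")
```

Under these assumptions, roughly nine in ten flagged contracts would be corrupt, compared with one in a hundred for random checks. The share of false alarms among flagged contracts is not the same as the false-positive rate: it also depends on how rare corruption is and on how many corrupt contracts the algorithm catches.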

There are, however, several conditions that must be satisfied before machine learning can be used effectively to detect irregularities and corruption. Interoperable, open public procurement data must exist in a machine-readable format. It should be easy to merge the procurement data with other public registers, such as company registers and registers of politically exposed persons. Thus, from the policy perspective, it appears vital to make sure that public education institutions provide good-quality, complete, interoperable procurement data. The data can then be used by law enforcement agencies as well as the public (NGOs, activists and others) to identify potentially corrupt procurement tenders. Finally, it is necessary to note that technology cannot prosecute anyone. This last step has to remain in the hands of humans.
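
The merging of procurement data with other registers mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any existing system: all records are invented, and the field names (`supplier_id`, `owner_ids`) and the function `link_registers` are hypothetical. It also shows why shared identifiers matter: a contract whose supplier cannot be matched to the company register cannot be screened at all.

```python
# All records are invented for illustration; field names are hypothetical.
contracts = [
    {"contract_id": "C-001", "supplier_id": "F-10", "buyer": "School A"},
    {"contract_id": "C-002", "supplier_id": "F-20", "buyer": "School B"},
    {"contract_id": "C-003", "supplier_id": "F-99", "buyer": "School A"},
]

# Company register: supplier identifier -> company details, including owners.
company_register = {
    "F-10": {"name": "Alpha Supplies", "owner_ids": ["P-1", "P-2"]},
    "F-20": {"name": "Beta Catering", "owner_ids": ["P-3"]},
}

# Register of politically exposed persons (PEPs), as a set of person identifiers.
pep_register = {"P-2"}


def link_registers(contracts, company_register, pep_register):
    """Return contracts whose supplier has a PEP owner, plus unmatched contract ids."""
    flagged, unmatched = [], []
    for contract in contracts:
        company = company_register.get(contract["supplier_id"])
        if company is None:
            # Supplier not found in the company register: cannot be screened.
            unmatched.append(contract["contract_id"])
            continue
        pep_owners = [o for o in company["owner_ids"] if o in pep_register]
        if pep_owners:
            flagged.append({**contract, "pep_owners": pep_owners})
    return flagged, unmatched


flagged, unmatched = link_registers(contracts, company_register, pep_register)
print("Tagged for review:", [c["contract_id"] for c in flagged])
print("Could not be matched:", unmatched)
```

A tagged contract is only a lead for a human investigator; the match says nothing by itself about whether the award was improper.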

