Meaning, AI, and procurement -- some thoughts

©Ausrine Kuze, Distorted Reality, 2021.

James McKinney and Volodymyr Tarnay of the Open Contracting Partnership have published ‘A gentle introduction to applying AI in procurement’. It is a very accessible and helpful primer on some of the most salient issues to be considered when exploring the possibility of using AI to extract insights from procurement big data.

The OCP introduction to AI in procurement provides helpful pointers in relation to task identification, method, input, and model selection. I would add that an initial exploration of the possibility to deploy AI also (and perhaps first and foremost) requires careful consideration of the level of precision and the type (and size) of errors that can be tolerated in the specific task, and of ways to test and measure them.

One of the crucial and perhaps more difficult to understand issues covered by the introduction is how AI seeks to capture ‘meaning’ in order to extract insights from big data. This is also a controversial issue that keeps coming up in procurement data analysis contexts, and one that triggered some heated debate at the Public Procurement Data Superpowers Conference last week—where, in my view, companies selling procurement insight services were peddling hyped claims (see session on ‘Transparency in public procurement - Data readability’).

In this post, I venture some thoughts on meaning, AI, and public procurement big data. As always, I am very interested in feedback and opportunities for further discussion.

Meaning

Of course, the concept of meaning is complex and open to philosophical, linguistic, and other interpretations. Here I take a relatively pedestrian and pragmatic approach and, following the Cambridge dictionary, consider two ways in which ‘meaning’ is understood in plain English: ‘the meaning of something is what it expresses or represents’, and meaning as ‘importance or value’.

To put it simply, I will argue that AI cannot capture meaning proper. It can carry out complex analysis of ‘content in context’, but we should not equate that with meaning. This will be important later on.

AI, meaning, embeddings, and ‘content in context’

The OCP introduction helpfully addresses this issue in relation to an example of ‘sentence similarity’, where researchers look for phrases in tender notices that are alike to predefined green criteria, and therefore want to use AI to compare sentences and assign them a similarity score. Intuitively, ‘meaning’ would be important to the comparison.

The OCP introduction explains that:

Computers don’t understand human language. They need to operate on numbers. We can represent text and other information as numerical values with vector embeddings. A vector is a list of numbers that, in the context of AI, helps us express the meaning of information and its relationship to other information.

Text can be converted into vectors using a model. [A sentence transformer model] converts a sentence into a vector of 384 numbers. For example, the sentence “don’t panic and always carry a towel” becomes the numbers 0.425…, 0.385…, 0.072…, and so on.

These numbers represent the meaning of the sentence.

Let’s compare this sentence to another: “keep calm and never forget your towel” which has the vector (0.434…, 0.264…, 0.123…, …).

One way to determine their similarity score is to use cosine similarity to calculate the distance between the vectors of the two sentences. Put simply, the closer the vectors are, the more alike the sentences are. The result of this calculation will always be a number from -1 (the sentences have opposite meanings) to 1 (same meaning). You could also calculate this using other trigonometric measures such as Euclidean distance.

For our two sentences above, performing this mathematical operation returns a similarity score of 0.869.

Now let’s consider the sentence “do you like cheese?” which has the vector (-0.167…, -0.557…, 0.066…, …). It returns a similarity score of 0.199. Hooray! The computer is correct!

But, this method is not fool-proof. Let’s try another: “do panic and never bring a towel” (0.589…, 0.255…, 0.0884…, …). The similarity score is 0.857. The score is high, because the words are similar… but the logic is opposite!
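For readers who want to see the arithmetic behind those similarity scores, cosine similarity is simply the normalised dot product of the two embedding vectors (written here for the 384-dimensional vectors of the example):

\[
\text{cos\_sim}(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert} = \frac{\sum_{i=1}^{384} a_i b_i}{\sqrt{\sum_{i=1}^{384} a_i^2}\,\sqrt{\sum_{i=1}^{384} b_i^2}}
\]

A value of 1 indicates vectors pointing in the same direction, and -1 vectors pointing in opposite directions. Strictly speaking, the Euclidean distance mentioned in the excerpt is an alternative measure of closeness between vectors rather than a trigonometric one, but the intuition is the same: the closer the vectors, the more alike the sentences are taken to be.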

I think there are two important observations in relation to how ‘meaning’ is used in the passage above.

First, meaning can hardly be captured where sentences with opposite logic are considered very similar. This is because the method described above (vector embedding) does not capture meaning. It captures content (words) in context (around other words).

Second, it is not possible to fully express in numbers what text expresses or represents, or its importance or value. What the vectors capture is the representation or expression of such meaning, the representation of its value and importance, through the use of those specific words in the particular order in which they are expressed. The string of numbers is thus a second-degree representation of the meaning intended by the words; it is a numerical representation of the word representation, not a numerical representation of the meaning.

Unavoidably, there is plenty of scope for loss, alteration or even inversion of meaning when it goes through multiple imperfect processes of representation. This means that the more open-textured the expression in words and the less contextualised its presentation, the more difficult it is to achieve good results.
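To make the mechanics concrete, here is a minimal Python sketch of the kind of comparison described in the OCP excerpt. It assumes the open-source sentence-transformers library and the all-MiniLM-L6-v2 model, which outputs 384-dimensional vectors like those in the example; the OCP team may well have used a different model, so the exact scores will not match, but the pattern can be expected to persist because the method is the same.

```python
# Minimal sketch, not the OCP implementation: reproducing the kind of
# sentence similarity comparison described in the excerpt above.
# Requires: pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: a sentence embedding model that outputs 384-dimensional vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")

reference = "don't panic and always carry a towel"
candidates = [
    "keep calm and never forget your towel",  # similar wording, similar logic
    "do you like cheese?",                    # unrelated content
    "do panic and never bring a towel",       # similar wording, opposite logic
]

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Normalised dot product of the two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

ref_vec = model.encode(reference)  # a vector of 384 numbers
for sentence in candidates:
    score = cosine_similarity(ref_vec, model.encode(sentence))
    print(f"{score:.3f}  {sentence}")

# Expect the 'opposite logic' sentence to score almost as highly as the
# paraphrase: the vectors capture words in context, not meaning.
```

The point of the sketch is not the exact numbers, which depend on the model, but that nothing in the comparison checks whether the towel is to be carried or left behind.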

It is important to bear in mind that the current techniques based on this or similar methods, such as those based on large language models, clearly fail on crucial aspects such as their factuality—which ultimately requires checking whether something with a given meaning is true or false.

This is a burgeoning area of technical research, but it seems that even the most accurate models tend to hover around 70% accuracy, save in highly contextualised, unambiguous settings (see eg D Quelle and A Bovet, ‘The perils and promises of fact-checking with large language models’ (2024) 7 Front. Artif. Intell., Sec. Natural Language Processing). While this is an impressive feature of these tools, it can hardly be acceptable to extrapolate that they can be deployed for tasks that require precision and factuality.

Procurement big data and ‘content in context’

In some senses, the application of AI to extract insights from procurement big data is well suited to the fact that, by and large, existing procurement data is very precisely contextualised and increasingly concerns structured content. Most of the procurement data that is (increasingly) available is captured in structured notices and tends to have a narrowly defined and highly contextual purpose.

From that perspective, there is potential to look for implementations of advanced comparisons of ‘content in context’. But this will most likely have a hard boundary where ‘meaning’ needs to be interpreted or analysed, as AI cannot perform that task. At most, it can help gather the information, but it cannot analyse it because it cannot ‘understand’ it.

Policy implications

In my view, the above shows that the possibility of using AI to extract insights from procurement big data needs to be approached with caution. For tasks where a ‘broad brush’ approach will do, these can be helpful tools. They can help mitigate the informational deficit that procurement policy and practice tend to encounter. As put in the conference last week, these tools can help get a sense of broad trends or directions, and can thus inform policy and decision-making only in that regard and to that extent. Conversely, AI cannot be used in contexts where precision is important and where errors would affect important rights or interests.

This is important, for example, in relation to the fascination that AI ‘business insights’ seems to be triggering amongst public buyers. One of the issues that kept coming up concerns why contracting authorities cannot benefit from the same advances that are touted as being offered to (private) tenderers. The case at hand was that of identifying ‘business opportunities’.

A number of companies are using AI to support searches for contract notices to highlight potentially interesting tenders to their clients. They offer services such as ‘tender summaries’, whereby the AI creates a one-line summary on the basis of a contract notice or a tender description, and this summary can be automatically translated (eg into English). They also offer search services based on ‘capturing meaning’ from a company’s website and matching it to potentially interesting tender opportunities.

All these services, however, are at bottom a sophisticated comparison of content in context, not of meaning. And they are deployed to go from more to less information (summaries), which can reduce problems with factuality and precision except in extreme cases, and in a setting where getting it wrong has only a marginal cost (ie the company will set aside the non-interesting tender and move on). This is also an area where expectations can be managed and where results well below 100% accuracy can be interesting and have value.
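Mechanically, the ‘opportunity matching’ part of such services can be sketched in a few lines under the same assumptions as before (the sentence-transformers library and an embedding model of my choosing; the company profile and tender descriptions below are invented purely for illustration):

```python
# Hypothetical sketch of embedding-based 'opportunity matching': ranking
# invented tender descriptions by cosine similarity to an invented company
# profile. A real service would use its own model, data and thresholds.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, as before

company_profile = "We design and install LED street lighting for municipalities."
tenders = [
    "Supply and installation of energy-efficient public lighting.",
    "Framework agreement for school catering services.",
    "Maintenance of municipal road surfaces and signage.",
]

profile_vec = model.encode(company_profile)
tender_vecs = model.encode(tenders)  # one vector per tender description

# Cosine similarity of the profile against each tender, highest first.
scores = tender_vecs @ profile_vec / (
    np.linalg.norm(tender_vecs, axis=1) * np.linalg.norm(profile_vec)
)
for score, text in sorted(zip(scores, tenders), reverse=True):
    print(f"{score:.3f}  {text}")

# The ranking reflects overlap of content in context; nothing here 'understands'
# whether the company could actually perform the contract.
```

Whatever a real service layers on top (summarisation, translation, thresholds, user feedback), the matching step in a sketch like this remains a comparison of vectors, which is why the point about content in context rather than meaning carries over.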

The opposite does not apply from the perspective of the public buyer. For example, a summary of a tender is unlikely to have much value as, in all likelihood, the summary will simply confirm that the tender matches the advertised object of the contract (which has no value, unlike a summary suggesting a tender matches the business activities of an economic operator). Moreover, factuality is extremely important, and only 100% accuracy will do in a context where decision-making is subject to good administration guarantees.

Therefore, we need to be very careful about how we think about using AI to extract insights from procurement (big) data and, as the OCP introduction highlights, one of the most important things is to clearly define the task for which AI would be used. In my view, the tasks for which AI is suitable are far more limited than those one could dream up if we let our collective imagination run high on hype.

Public procurement governance as an information-intensive exercise, and the allure of digital technologies

I have just started a 12-month Mid-Career Fellowship funded by the British Academy with the purpose of writing up the monograph Digital Technologies and Public Procurement. Gatekeeping and experimentation in digital public governance (OUP, forthcoming).

In the process of writing up, I will be sharing some draft chapters and other thought pieces. I would warmly welcome feedback that can help me polish the final version. As always, please feel free to reach out: a.sanchez-graells@bristol.ac.uk.

In this first draft chapter (num 6), I explore the technological promise of digital governance and use public procurement as a case study of ‘policy irresistibility’. The main ideas in the chapter are as follows:

This Chapter takes a governance perspective to reflect on the process of horizon scanning and experimentation with digital technologies. The Chapter stresses how aspirations of digital transformation can drive policy agendas and make them vulnerable to technological hype, despite technological immaturity and in the face of evidence of the difficulty of rolling out such transformation programmes—eg regarding the still ongoing wave of transition to e-procurement. Delivering on procurement’s goals of integrity, efficiency and transparency requires facing challenges derived from the information intensity and complexity of procurement governance. Digital technologies promise to bring solutions to such informational burden and thus augment decisionmakers’ ability to deal with that complexity and with related uncertainty. The allure of the potential benefits of deploying digital technologies generates ‘policy irresistibility’ that can capture decision-making by policymakers overly exposed to the promise of technological fixes to recalcitrant governance challenges. This can in turn result in excessive experimentation with digital technologies for procurement governance in the name of transformation. The Chapter largely focuses on the EU policy framework, but the insights derived from this analysis are easily exportable.

Another draft chapter (num 7) will follow soon with more detailed analysis of the feasibility boundary for the adoption of digital technologies for procurement governance purposes. The full details of the draft chapter presented here are as follows: A Sanchez-Graells, ‘The technological promise of digital governance: procurement as a case study of “policy irresistibility”’, to be included in A Sanchez-Graells, Digital Technologies and Public Procurement. Gatekeeping and experimentation in digital public governance (OUP, forthcoming). Available at SSRN: https://ssrn.com/abstract=4216825.

Digital technologies, hype, and public sector capability

© Martin Brandt / Flickr.

By Albert Sanchez-Graells (@How2CrackANut) and Michael Lewis (@OpsProf).*

The public sector’s reaction to digital technologies and the associated regulatory and governance challenges is difficult to map, but there are some general trends that seem worrisome. In this blog post, we reflect on the problematic compound effects of technology hype cycles and diminished public sector digital technology capability, paying particular attention to their impact on public procurement.

Digital technologies, smoke, and mirrors

There is a generalised over-optimism about the potential of digital technologies, as well as their likely impact on economic growth and international competitiveness. There is also a rush to ‘look digitally advanced’ eg through the formulation of ‘AI strategies’ that are unlikely to generate significant practical impacts (more on that below). However, there seems to be a big (and growing?) gap between what countries report (or pretend) to be doing (eg in reports to the OECD AI observatory, or in relation to any other AI readiness ranking) and what they are practically doing. A relatively recent analysis showed that European countries (including the UK) underperform particularly in relation to strategic aspects that require detailed work (see graph). In other words, there are very few countries ready to move past signalling a willingness to jump onto the digital tech bandwagon.

Some of that over-optimism stems from limited public sector capability to understand the technologies themselves (as well as their implications), which leads to naïve or captured approaches to policymaking (on capture, see the eye-watering account emerging from the #Uberfiles). Given the closer alignment (or political meddling?) of policymakers with eg research funding programmes, including but not limited to academic institutions, naïve or captured approaches also affect other areas of ‘support’ for the development of digital technologies. This also trickles down to procurement, as the ‘purchasing’ of digital technologies with public money is seen as a (not very subtle) way of subsidising their development (nb. there are many proponents of that approach, such as Mazzucato, as discussed here). However, this can also generate further space for capture, as the same lack of capability that affects high(er) level policymaking also affects funding organisations and ‘street level’ procurement teams. This results in a situation where procurement best practices such as market engagement leave the ‘art of the possible’ to be determined by private industry. There is rarely co-creation of solutions, but too often a capture of procurement expenditure by entrepreneurs.

Limited capability, difficult assessments, and dependency risk

Perhaps the universalist techno-utopian framing (cost savings and efficiency and economic growth and better health and new service offerings, etc.) means it is increasingly hard to distinguish the specific merits of different digitalisation options – and the commercial interests that actively hype them. It is also increasingly difficult to carry out effective impact assessments where the (overstressed) benefits are relatively narrow and short-termist, while the downsides of technological adoption are diffuse and likely to only emerge after a significant time lag. Ironically, this limited ability to diagnose ‘relative’ risks and rewards is further exacerbated by the diminishing technical capability of the state: a negative mirror to Amazon’s flywheel model for amplifying capability. Indeed, as stressed by Bharosa (2022): “The perceptions of benefits and risks can be blurred by the information asymmetry between the public agencies and GovTech providers. In the case of GovTech solutions using new technologies like AI, Blockchain and IoT, the principal-agent problem can surface”.

As Colington (2021) points out, despite the “innumerable papers in organisation and management studies” on digitalisation, there is much less understanding of how interests of the digital economy might “reconfigure” public sector capacity. In studying Denmark’s policy of public sector digitalisation – which had the explicit intent of stimulating nascent digital technology industries – she observes the loss of the very capabilities necessary “for welfare states to develop competences for adapting and learning”. In the UK, where it might be argued there have been attempts, such as the Government Digital Services (GDS) and NHS Digital, to cultivate some digital skills ‘in-house’, the enduring legacy has been more limited in the face of endless demands for ‘cost saving’. Kattel and Takala (2021) for example studied GDS and noted that, despite early successes, they faced the challenge of continual (re)legitimization and squeezed investment; especially given the persistent cross-subsidised ‘land grab’ of platforms, like Amazon and Google, that offer ‘lower cost and higher quality’ services to governments. The early evidence emerging from the pilot algorithmic transparency standard seems to confirm this trend of (over)reliance on external providers, including Big Tech providers such as Microsoft (see here).

This is reflective of Milward and Provan’s (2003) ‘hollow state’ metaphor, used to describe “the nature of the devolution of power and decentralization of services from central government to subnational government and, by extension, to third parties – nonprofit agencies and private firms – who increasingly manage programs in the name of the state.” Two decades after its formulation, the metaphor is all the more applicable, as the hollowing out of the State is arguably a few orders of magnitude larger due to the techno-centricity of reforms in the race towards a new model of digital public governance. It seems as if the role of the State is currently understood as being limited to that of enabler (and funder) of public governance reforms, not solely implemented, but driven by third parties—and primarily highly concentrated digital tech giants; so that “some GovTech providers can become the next Big Tech providers that could further exploit the limited technical knowledge available at public agencies [and] this dependency risk can become even more significant once modern GovTech solutions replace older government components” (Bharosa, 2022). This is a worrying trend, as once dominance is established, the expected anticompetitive effects of any market can be further multiplied and propagated in a setting of low public sector capability that fuels risk aversion, where the adage “Nobody ever gets fired for buying IBM” has been around since the 70s with limited variation (only the tech platform it is ‘safe’ to engage has changed).

Ultimately, the more the State takes a back seat, the more its ability to steer developments fades away. The rise of a GovTech industry seeking to support governments in their digital transformation generates “concerns that GovTech solutions are a Trojan horse, exploiting the lack of technical knowledge at public agencies and shifting decision-making power from public agencies to market parties, thereby undermining digital sovereignty and public values” (Bharosa, 2022). Therefore, continuing to simply allow experimentation in the GovTech market without a clear strategy on how to rein the industry in—and, relatedly, how to build the public sector capacity needed to do so as a precondition—is a strategy with (exponentially) increasing reversal costs and an unclear tipping point past which meaningful change may simply not be possible.

Public sector and hype cycle

On a more pragmatic note, the widely cited, if impressionistic, “hype cycle model” developed by Gartner Inc. provides additional insights. The model presents a generalised expectations path that new technologies follow over time, suggesting that new industrial technologies progress through different stages up to a peak that is followed by disappointment and, later, a recovery of expectations.

Although intended to describe aggregate technology-level dynamics, it can be useful to consider the hype cycle for public digital technologies. In the early phases of the curve, vendors and potential users are actively looking for ways to create value from new technology and will claim endless potential use cases. If these are subsequently piloted or demonstrated – even if ‘free’ – they are exciting and visible, and vendors are keen to share them, all of which contributes to creating hype. Limited public sector capacity can also underpin excitement for use cases that are so far removed from their likely practical implementation, or so heavily curated, that they do not provide an accurate representation of how the technology would operate at production phase in the generally messy settings of public sector activity and delivery. In phases such as the peak of inflated expectations, only organisations with sufficient digital technology and commercial capabilities can see through sophisticated marketing and sales efforts to separate the hype from the true potential of immature technologies. The emperor is likely to be naked, but who’s to say?

Moreover, as mentioned above, international organisations one step (upwards) removed from the State create additional fuel for the hype through mapping exercises and rankings, which generate a vicious circle of “public sector FOMO”, as entrepreneurial bureaucrats and politicians are unlikely to want to be listed bottom of the table and can thus be particularly receptive to hyped pitches. This can create incentives to support *almost any* sort of tech pilot or implementation just to be seen to do something ‘innovative’, or to rush through high-risk implementations seeking to ‘cash in’ on the political and other rents they can (be spun to) generate.

However, as emerging evidence shows (AI Watch, 2022), there is a big attrition rate between announced and piloted adoptions, and those that are ultimately embedded in the functioning of the public sector in a value-adding manner (ie those that reach the plateau of productivity stage in the cycle). Crucially, the AI literacy and skills in the staff involved in the use of the technology post-pilot are one of the critical challenges to the AI implementation phase in the EU public sector (AI Watch, 2021). Thus, early moves in the hype curve are unlikely to translate into sustainable and expectations-matching deployments in the absence of a significant boost of public sector digital technology capabilities. Without committed long-term investment in that capability, piloting and experimentation will rarely translate into anything but expensive pet projects (and lucrative contracts).

Locking the hype in: IP, data, and acquisitions markets

Relatedly, the lack of public sector capacity is a foundation for eg policy recommendations seeking to avoid the public buyer acquiring (and having to manage) IP rights over the digital technologies it funds through procurement of innovation (see eg the European Commission’s policy approach: “There is also a need to improve the conditions for companies to protect and use IP in public procurement with a view to stimulating innovation and boosting the economy. Member States should consider leaving IP ownership to the contractors where appropriate, unless there are overriding public interests at stake or incompatible open licensing strategies in place” at 10).

This is clear as mud (eg what does ‘overriding public interests’ mean here?) and fails to establish an adequate balance between public funding and public access to the technology, as well as generating (unavoidable?) risks of lock-in and exacerbating issues of lack of capacity in the medium and long term. Not only in terms of re-procuring the technology (see related discussion here), but also in terms of the broader impact this can have if the technology is propagated to the private sector as a result of or in relation to public sector adoption.

Linking this recommendation to the hype curve, such reliance on proprietary tech with all rights reserved to the third-party developer means that first-mover advantages secured by private firms at the early stages of the emergence of a new technology are likely to be very profitable in the long term. This creates further incentives for hype and for investment in being the first to capture decision-makers, which results in an overexposure of policymakers and politicians to tech entrepreneurs pushing hard for (too early) adoption of technologies.

The exact same dynamic emerges in relation to access to data held by public sector entities without which GovTech (and other types of) innovation cannot take place. The value of data is still to be properly understood, as are the mechanisms that can ensure that the public sector obtains and retains the value that data uses can generate. Schemes to eg obtain value options through shares in companies seeking to monetise patient data are not bullet-proof, as some NHS Trusts recently found out (see here, and here paywalled). Contractual regulation of data access, data ownership and data retention rights and obligations pose a significant challenge to institutions with limited digital technology capabilities and can compound IP-related lock-in problems.

A final further complication is that the market for acquisitions of GovTech and other digital technology start-ups and scale-ups is very active and unpredictable. Even with standard levels of due diligence, public sector institutions that had carefully sought to foster a diverse innovation ecosystem and to avoid contracting (solely) with big players may end up in the hands of those big players anyway, once their selected provider leverages its public sector success to deliver an ‘exit strategy’ for its founders and other (venture capital) investors. Change of control clauses clearly have a role to play, but the outside alternatives for public sector institutions engulfed in this process of market consolidation can be limited and difficult to assess, and particularly challenging for organisations with limited digital technology and associated commercial capabilities.

Procurement at the sharp end

Going back to the ongoing difficulty (and unwillingness?) in regulating some digital technologies, there is a (dominant) general narrative that imposes a ‘balanced’ approach between ensuring adequate safeguards and not stifling innovation (with some countries clearly erring much more on the side of caution, such as the UK, than others, such as the EU with the proposed EU AI Act, although the scope of application of its regulatory requirements is narrower than it may seem). This increasingly means that the tall order task of imposing regulatory constraints on the digital technologies and the private sector companies that develop (and own them) is passed on to procurement teams, as the procurement function is seen as a useful regulatory mechanism (see eg Select Committee on Public Standards, Ada Lovelace Institute, Coglianese and Lampmann (2021), Ben Dor and Coglianese (2022), etc but also the approach favoured by the European Commission through the standard clauses for the procurement of AI).

However, this approach completely ignores issues of (lack of) readiness and capability that indicate that the procurement function is being set up to fail in this gatekeeping role (in the absence of massive investment in upskilling). Not only because it lacks the (technical) ability to figure out the relevant checks and balances, and because the levels of required due diligence far exceed standard practices in more mature markets and lower risk procurements, but also because the procurement function can be at the sharp end of the hype cycle and (pragmatically) unable to stop the implementation of technological deployments that are either wasteful or problematic from a governance perspective, as public buyers are rarely in a position of independent decision-making that could enable them to do so. Institutional dynamics can be difficult to navigate even with good insights into problematic decisions, and can be intractable in a context of low capability to understand potential problems and push back against naïve or captured decisions to procure specific technologies and/or from specific providers.

Final thoughts

So, as a generalisation, lack of public sector capability seems to be skewing high-level policy and limiting the development of effective plans to roll it out, filtering through to incentive systems that will have major repercussions on which technologies are developed and procured, with risks of lock-in and centralisation of power (away from the public sector), as well as generating a false comfort in the ability of the public procurement function to provide an effective route to tech regulation. The answer to these problems is at once evident, simple, and politically intractable in view of the permeating hype around new technologies: more investment in capacity building across the public sector.

This regulatory answer is further complicated by the difficulty in implementing it in an employment market where the public sector, its reward schemes and social esteem are dwarfed by the high salaries, flexible work conditions and allure of the (Big) Tech sector and the GovTech start-up scene. Some strategies aimed at alleviating the generalised lack of public sector capability, e.g. through a GovTech platform at the EU level, can generate further risks of reduction of (in-house) public sector capability at State (and regional, local) level as well as bottlenecks in the access of tech to the public sector that could magnify issues of market dominance, lock-in and over-reliance on GovTech providers (as discussed in Hoekstra et al, 2022).

Ultimately, it is imperative to build more digital technology capability in the public sector, and to recognise that there are no quick (or cheap) fixes to do so. Otherwise, much like with climate change, despite the existence of clear interventions that can mitigate the problem, the hollowing out of the State and the increasing overdependency on Big Tech providers will be a self-fulfilling prophecy for which governments will have no one to blame but themselves.

 ___________________________________

* We are grateful to Rob Knott (@Procure4Health) for comments on an earlier draft. Any remaining errors and all opinions are solely ours.