It has been refreshing to receive a detailed response from the lead author of one of the papers I recently discussed in the blog (see here). Big thanks to Manuel García Rodríguez for following up and for his frank and constructive engagement. His full comments are below. I think they will help round out the discussion on the potential, constraints and data-dependency of procurement recommender systems.
Thank you, Prof. Sánchez Graells, for your comments; they made for rewarding reading. Below I set out my point of view, to continue examining the topic in depth.
Regarding the percentage of success of the recommender, a major initial consideration is that the recommender is generic. That is, it is not restricted to a type of contract, CPV code, geographical area, etc. It is a recommender that uses all types of Spanish tenders, from any CPV and over 6 years (see Table 3). This greatly influences the percentage of success because it is the most difficult scenario. An easier scenario would have restricted the search to certain geographic areas or CPVs, for example. In addition, 102,000 tenders were used in this study and, presumably, they are not enough for a search engine that learns business behaviour patterns from historical tenders (more tenders could not be used due to poor data quality).
Regarding the comment that ‘the recommender is an effective tool for society because it enables and increases the bidders participation in tenders with less effort and resources’: with this phrase we mean that the Administration can have an assistant to encourage participation (in tenders conducted as negotiations with or without prior notice) or even one with which civil servants actively search for companies and inform those companies directly. I do not know whether the public contracting laws of the European countries allow such active searching and direct notification, but it would be the most efficient and reasonable approach. On the other hand, a good recommender (one that has a high percentage of accuracy) can be an analytical tool for contracting authorities to evaluate the level of competition. That is, if the tenders of a contracting authority attract very little competition but the recommender finds many potential participating companies, this suggests that the contracting authority could make its tenders more attractive to the market.
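The competition check described above can be illustrated with a minimal sketch. It is not taken from the article: the `TenderStats` fields, the `max_ratio` threshold and the `min_recommended` floor are all hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TenderStats:
    tender_id: str
    actual_bidders: int       # bids actually received
    recommended_bidders: int  # potential bidders found by the recommender


def participation_ratio(t: TenderStats) -> float:
    """Share of recommended companies that actually bid (0.0 if none recommended)."""
    if t.recommended_bidders == 0:
        return 0.0
    return t.actual_bidders / t.recommended_bidders


def low_competition_tenders(tenders, max_ratio=0.1, min_recommended=10):
    """Return ids of tenders with many potential bidders but few actual bids.

    Both thresholds are illustrative assumptions, not values from the article.
    """
    return [
        t.tender_id
        for t in tenders
        if t.recommended_bidders >= min_recommended
        and participation_ratio(t) <= max_ratio
    ]


tenders = [
    TenderStats("A", actual_bidders=1, recommended_bidders=40),
    TenderStats("B", actual_bidders=5, recommended_bidders=12),
    TenderStats("C", actual_bidders=0, recommended_bidders=3),
]
flagged = low_competition_tenders(tenders)
```

A tender flagged in this way would only be a prompt for the contracting authority to look more closely; the signal is only as reliable as the recommender's accuracy.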
Regarding the comment that “It is also notable that the information of the Companies Register is itself not (and probably cannot be, period) checked or validated, despite the fact that most of it is simply based on self-declarations.”: the information in the Spanish Business Register consists of the annual accounts of the companies, audited by an external entity. I do not know the auditing laws of other countries, but for this reason I think that the reliability of the data used in our article is quite high.
Regarding the first problematic aspect that you indicate: “The first one is that the recommender seems by design incapable of comparing the functional capabilities of companies with very different structural characteristics, unless the parameters for the filtering are given such range that the basket of recommendations approaches four digits”. There will always be the difficulty of comparing companies and defining when they are similar. That analysis should be done by economists; engineers can contribute little. There is also the limitation of business data: the information in a Business Register is usually paywalled and limited to certain fields, as is the case with the Spanish Business Register. For these reasons, we recognise in the article that it is a basic approach, and that the filters/rules should be modified in the future: “Creating this profile to search similar companies is a very complex issue, which has been simplified. For this reason, the searching phase (3) has basic filters or rules. Moreover, it is possible to modify or add other filters according to the available company dataset used in the aggregation phase”.
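To make the idea of basic filters or rules concrete, here is a minimal sketch of a range-based search for similar companies. It is not the article's implementation: the `Company` fields (`revenue`, `employees`) and the relative `tolerance` parameter are assumptions chosen only for illustration, and the real filters depend on the fields available in the company dataset.

```python
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    revenue: float  # hypothetical field: annual revenue
    employees: int  # hypothetical field: headcount


def within(value: float, reference: float, tolerance: float) -> bool:
    """True if value lies within +/- tolerance * reference of the reference."""
    return abs(value - reference) <= tolerance * reference


def similar_companies(profile: Company, candidates, tolerance: float = 0.5):
    """Keep candidates whose revenue and headcount are both close to the profile."""
    return [
        c
        for c in candidates
        if within(c.revenue, profile.revenue, tolerance)
        and within(c.employees, profile.employees, tolerance)
    ]


profile = Company("predicted winner", revenue=1_000_000, employees=20)
candidates = [
    Company("a", revenue=1_200_000, employees=25),
    Company("b", revenue=5_000_000, employees=22),
    Company("c", revenue=900_000, employees=200),
]
narrow = [c.name for c in similar_companies(profile, candidates)]
wide = [c.name for c in similar_companies(profile, candidates, tolerance=5.0)]
```

Widening `tolerance` enlarges the basket of recommendations, which is the trade-off raised in the quoted comment; adding or replacing a filter amounts to adding another condition to the list comprehension.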
Regarding the second problematic aspect that you indicate: “The second issue is that a recommender such as this one seems quite vulnerable to the risk of perpetuating and exacerbating incumbency advantages, and/or of consolidating geographical market fragmentation (given the importance of eg distance, which cannot generate the expected impact on eg costs in all industries, and can increasingly be entirely irrelevant in the context of digital/remote delivery).” This will not happen in the medium and long term because the recommender will adapt to market conditions. If there are companies that win bids far away, the algorithm will include that new distance range in its search. It will always be based on the historical winning companies (and on the rest of the bidders, if we have that information). You cannot ask a machine learning algorithm (such as the one used in this article) to make predictions that are not based on previous winners and historical market patterns.
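The adaptation of the distance range can be sketched as follows. This is an illustrative assumption rather than the article's algorithm: the cutoff is taken here as a nearest-rank percentile of the distances between past winners and the tender location, and the function name, the `percentile` parameter and the sample distances are all hypothetical.

```python
import math


def distance_cutoff(winner_distances_km, percentile=90):
    """Nearest-rank percentile of historical winner-to-tender distances (km)."""
    if not winner_distances_km:
        raise ValueError("need at least one historical winner")
    ordered = sorted(winner_distances_km)
    # Nearest-rank method: smallest value with at least `percentile` % of data at or below it.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up history of distances for past winners in one market.
history = [5, 10, 12, 20, 30, 8, 15, 25, 18, 22]
cutoff_before = distance_cutoff(history)

# After a few distant companies win, the same rule yields a wider search range.
cutoff_after = distance_cutoff(history + [400, 450, 500])
```

In this sketch the cutoff only moves once distant companies have actually won, which matches the point that the predictions rest entirely on previous winners and historical market patterns.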
I totally agree with your final comment: “It would in my view be preferable to start by designing the recommender system in a way that makes theoretical sense and then make sure that the required data architecture exists or is created.” Unfortunately, I did not find any articles that discuss this topic. Lawyers, economists and engineers must work together to propose solid architectures. In this article we want to convince stakeholders that it is possible to create software tools such as a bidder recommender, and to show the importance of public procurement data and of company data in the Business Registers for their development.
Thank you for your critical review. Different approaches are needed to make progress on the important topic of public procurement.