Swift Centre for Applied Forecasting

Bridging the Gap

The Swift Centre's 'Bridge the Gap' project seeks to improve AI policymaking by publishing open-source policy advice built on robust forecasts of AI capabilities, risks, and impacts produced by the Swift Centre's world-leading forecasting team.

Review forecasts and policy advice
5 forecasts • 1 policy advice submission

Key Info

Questions Forecasted: 5
Categories Covered: 5
Policy Advice Submissions: 1

How it Works

1. Forecast: The Swift Centre team provides forecasts on AI capabilities, impacts, and risks.

2. Policy: Anyone can submit policy advice using the forecasts and have it published on the dashboard.

3. Review: Policymakers, advisors, researchers, and funders can review the policy advice submitted.

Submissions

By December 31st 2028, will a G7 government or the WHO officially attribute a biosafety breach or an outbreak that kills at least 2 people to the use of a Large Language Model (LLM)?

Forecast: 03/03/2026 • Resolution: 31/12/2028 • 0 advice submissions
Resolution criteria

The forecast resolves as YES if a G7 government (including specialized AI Safety Institutes) or the WHO issues a report identifying an LLM as a "materially contributing factor" to the event.

The Event: Must involve at least **2 human fatalities** resulting from:

Intentional Misuse: Deliberate acts such as intentional food/water poisoning, bioterrorism, or the engineered release of a pathogen.

Accidental Breach: A containment failure or lab-acquired infection where AI-generated protocols were used.

The "Uplift": The report must conclude the AI provided Significant Uplift defined as providing specialized knowledge, technical troubleshooting, or planning that the actor likely could or would not have achieved without the model.

Background

The emergence of generative AI seems bound to further complicate the already formidable task of detecting, tracking, attributing, and anticipating biosafety and bioterrorism risks. These risks receive outsized media attention – fuelled by public anxiety over terrorism, institutional accountability, the COVID pandemic experience, and disruptive technological change – even though biosafety incidents causing substantial casualties have historically been rare (or perhaps precisely because they are so rare).

This forecasting question zeroes in on the intersection of LLM adoption and biosafety management, requiring not only that a breach or outbreak occur but also that the incident be officially attributed to the use of LLMs. The question thus directs attention to the process and motivations behind official attributions in the aftermath of such incidents.

The question’s scope is wide enough to cover both accident scenarios (such as a biolab leak) and bioterrorist attacks. On a strict reading it might be interpreted to include naturally arising pandemics as well, but the lack of any plausible link between LLMs and naturally originating disease outbreaks effectively leaves such pandemics outside the question scope.

Given that the question criteria require fatalities as well as a definitive, official attribution, the pre-AI base rates of qualifying events are undoubtedly low. There has been one universally accepted qualifying event in the last 50 years. Bioterrorist attacks have not had high fatality rates, and biosafety breaches, though probably more common than is generally known, have rarely resulted in fatalities even when detected.
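
To make the base-rate reasoning concrete, here is a minimal sketch of how a historical rate translates into a three-year window probability. The one-event-in-50-years rate and the Poisson assumption are illustrative simplifications drawn loosely from the paragraph above, not the team's stated model:

```python
import math

# Assumed base rate: roughly one universally accepted qualifying event
# in the last 50 years, treated as a Poisson process. Both the rate and
# the model are illustrative assumptions, not the Swift Centre's figures.
events_per_year = 1 / 50
window_years = 3  # approximately the question's window to end of 2028

# P(at least one event in the window) = 1 - exp(-rate * window)
p_at_least_one = 1 - math.exp(-events_per_year * window_years)
print(f"P(>=1 qualifying event in {window_years} years) = {p_at_least_one:.1%}")
# Prints roughly 5.8% -- before conditioning on LLM involvement and
# official attribution, which push the probability lower still.
```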

The Swift Centre team estimated that there was a 4.7% likelihood of an incident satisfying the question criteria occurring before the end of 2028. Individual estimates were almost uniformly under 10%, with one high outlier at 15%.
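
The source does not say how the individual forecasts were combined into the 4.7% headline figure. As a hedged illustration only, the sketch below pools a set of invented forecasts (shaped like "almost uniformly under 10%, with one outlier at 15%") using the geometric mean of odds, one common aggregation method:

```python
import math

# Invented individual forecasts, loosely mimicking "almost uniformly
# under 10%, with one high outlier at 15%". Not the team's actual numbers.
forecasts = [0.03, 0.04, 0.04, 0.05, 0.06, 0.08, 0.15]

# Geometric mean of odds: a common pooling method that damps outliers.
# The Swift Centre's actual aggregation method is not stated in the source.
log_odds = [math.log(p / (1 - p)) for p in forecasts]
mean_log_odds = sum(log_odds) / len(log_odds)
pooled = 1 / (1 + math.exp(-mean_log_odds))
print(f"Pooled estimate = {pooled:.1%}")  # about 5.7% with these inputs
```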

The team found that this scenario is unlikely to materialize largely because of its restrictive criteria – the stipulations of at least two fatalities, official attribution, and meaningful involvement of LLMs, all to occur within a three-year time window. They explicitly assumed that for an incident to be plausibly attributable to LLM use, the role that AI was alleged to play would have to be “significant”, not marginal.
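
One way to see how these conjunctive stipulations compress the probability is to multiply rough conditional estimates for each one. The decomposition below is purely hypothetical; every number is an invented placeholder, not a Swift Centre estimate:

```python
# Hypothetical decomposition of the question's conjunctive criteria.
# Every number below is an invented placeholder, not a team estimate.
p_incident = 0.20              # relevant breach/attack with >=2 deaths by end of 2028
p_llm_significant = 0.15       # ...in which an LLM provided "significant uplift"
p_official_attribution = 0.25  # ...and a G7 government or the WHO publicly says so

p_yes = p_incident * p_llm_significant * p_official_attribution
print(f"P(YES) = {p_yes:.2%}")  # 0.75% under these placeholder inputs
```

Even fairly generous values for each factor tend to leave the product in the low single digits, which is why conjunctive criteria like these drag the estimate down.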

On the bioterrorism side, the forecasters believe that the introduction of LLMs is unlikely to have a major impact, at least within the question’s time frame. The major barrier to most prospective DIY bioterrorists is a lack of access to facilities, materials, and practical expertise. What LLMs are best at is making specialized knowledge more accessible, but the information that one would need to engineer a biothreat – at least in principle – is already fairly easy to find online. Meanwhile, in their current forms, LLMs are unlikely to meaningfully expedite the tasks of procuring restricted materials and gaining access to specialized facilities.

The most plausible kind of bioterrorist attack that would meet the question’s criteria might be the large-scale dissemination of lower-level, less lethal common pathogens or toxins – an attack such as a mass food or water poisoning. The relevant kinds of bioagents (salmonella, for example) would not be very lethal on a small scale, but if disseminated widely enough their impact could clear the two-fatality bar. LLMs might be more helpful in engineering such an attack because the materials and techniques needed are less specialized (an LLM could conceivably give a malicious actor the idea in the first place).

On the biosafety side, the forecasters were hard pressed to imagine scenarios in which LLMs would be substantially implicated in an accident such as a lab leak in the question’s timeframe. While LLMs are likely to be increasingly used in lab research activities, they are much less likely to be integrated into lab safety protocols, and these protocols have themselves improved in recent years.

The question’s stipulation that an incident be announced and officially attributed to LLMs brought the team’s estimate down further. Either sort of incident would have to be detected, investigated, and its causes ascertained before any attribution could even be considered. The perpetual controversy over the COVID lab-leak hypothesis demonstrates how fraught (and prolonged) such investigations can be.

Even assuming that an investigation definitively establishes an incident’s cause, authorities might face conflicting incentives when deciding how, or even whether, to make it public. On the one hand, fear of reputational damage might be a strong incentive for authorities not to publicize a small-scale bioterror attack or lab leak, or to delay any reporting as long as possible. On the other hand, a plausible attribution to LLMs – even a concocted one – might provide political actors with a convenient scapegoat. In the current climate, opportunistic leaders might be very happy to blame LLMs for a biosecurity failure.

[Figure: Swift Centre forecast visual]

Policy advice