The large language models that now sit inside many businesses — Claude, ChatGPT, Gemini, Copilot — feel like infrastructure. Staff draft proposals with them, summarize meetings with them, triage inboxes with them, generate code with them, and answer customers with them. When the model is up, the business moves faster. When the model is down, a lot of people suddenly can't do their jobs.
That is a change in risk profile that most small and mid-sized businesses haven't formally acknowledged. We've written for years about the real cost of downtime when systems go offline in the context of servers, networks, and SaaS. The same logic now applies to the AI services your team has quietly become dependent on — and the honest answer is that the more useful these tools are, the more dangerous an outage becomes.
This article is a plain-English look at why LLM reliability matters for business continuity, what has already gone wrong, and what a non-technical owner can actually do about it.
LLMs Have Already Proven They Can Go Dark
The idea that a major AI provider could become temporarily unreachable is not hypothetical. It is a matter of record on the vendors' own status pages, and it has been a recurring event across every major LLM provider:
- OpenAI ChatGPT has logged dozens of incidents in recent months on its own status page. High-profile events have included a multi-hour global outage in June 2025, a routing-misconfiguration outage in December 2025, and a January 2025 elevated-error event that affected both ChatGPT and the API. An upstream Microsoft Azure data-center issue in late 2024 took ChatGPT offline for roughly nine hours.
- Anthropic Claude maintains a public status page at status.claude.com that logs incidents across Claude.ai, the Claude API, Claude Code, and specific models. Outages affecting logins, file uploads, and individual model families (including Opus-class models) have been recorded there.
- Underlying cloud providers can take down the AI service even when the AI service itself is healthy. The October 2025 AWS US-EAST-1 outage lasted around fifteen hours, affected more than 140 AWS services, and disrupted hundreds of well-known platforms — a reminder that "my AI vendor" and "my AI vendor's hosting provider" are two separate points of failure.
Specific timings, durations, and causes come directly from the vendors' own status pages and post-incident reports; the details vary, but the pattern does not. Every cloud service the business world runs on has gone down at some point, and the newer the service, the thinner the operational history behind it.
It Isn't Just Outages: Accounts Can Disappear Too
A straight outage is only one way an LLM can be pulled out from under your team. The other — and it's the one most business owners are least prepared for — is losing your access to the platform itself.
Public reporting and social media posts have described incidents in which an entire organization's access to Claude was revoked in response to alleged Usage Policy violations, with one widely circulated case cutting off roughly 60 employees and directing the company to appeal via a Google Form. Anthropic's own help center documents the "Safeguards warnings and appeals" process, which routes through a support form and an email address. There is no phone line, no account manager for typical business-tier accounts, and no same-day SLA on appeals. Similar stories exist around other AI vendors, including paid accounts suspended without advance warning.
From a continuity perspective, this is effectively indistinguishable from an outage — except that:
- It can affect your organization while every other customer is unaffected.
- The resolution timeline is measured in days to weeks, not hours.
- Customer service is often an automated or AI-assisted first line, with limited ability to speak to a human quickly.
- In the meantime, your staff are locked out of the workflows they were relying on.
This isn't a criticism of any one vendor. It's a description of how hyperscale consumer-and-business AI services are structured in 2026. The scale that makes them cheap is the same scale that makes their support high-volume and largely automated.
Why This Is a Business-Continuity Problem, Not an IT Problem
Business continuity planning (BCP) has traditionally lived inside IT — what happens when the server dies, when the internet goes out, when ransomware hits. AI dependency is turning it into an operations and leadership problem.
Consider the question a business owner should be able to answer in one sentence: If Claude or ChatGPT were unavailable to my team from 9 a.m. Monday through 5 p.m. Friday, what exactly would not get done?
In most businesses we speak with, the honest answer includes at least a few of the following:
- Client proposals and quotes that a salesperson now drafts with AI and edits, instead of drafting from a template.
- Meeting notes and action items summarized by an AI meeting assistant — the kind we've covered in AI meeting recorder privacy.
- Customer support replies written with AI-assisted suggestions.
- Marketing copy, product descriptions, job ads, internal policies.
- Code changes, fixes, and deployments that developers now run through AI coding agents — see our pieces on Claude Code security considerations and coding agents and security risks.
- Data analysis, spreadsheets, and reporting that used to take a day and now takes an hour.
If that list is long, and the vendor behind it is one company, then that vendor is a single point of failure. The strategic question isn't whether you love the tool. It's whether the business is positioned to absorb a one- to five-day disruption to that tool without losing revenue, clients, or trust.
The Data-Portability Problem Nobody Talks About
Beyond uptime, there's a quieter issue that affects continuity planning: the data your team creates inside these tools is often hard to back up, and in some configurations, hard to export at all.
Typical examples include:
- Chat histories with months of context, saved prompts, and custom instructions — often accessible only through the vendor's own UI.
- Projects, Workspaces, or Custom GPTs / Claude Projects that bundle together files, system prompts, and conversation history. These generally cannot be migrated to a competitor.
- Uploaded files and knowledge bases used to ground AI responses. The originals should live somewhere you control; in practice, many businesses treat the AI provider as the storage system.
- Fine-tuned or customized assistants that encode workflow knowledge built up over months.
If the account is suspended, the vendor is unreachable, or the region you use is temporarily blocked, you may not have easy access to any of this. That's a data-ownership issue that sits alongside the availability issue, and it's one we've touched on in different forms in backup and recovery assumptions that fail and best practices for managing sensitive data.
What a Reasonable AI Continuity Plan Looks Like for an SMB
You don't need an enterprise-grade disaster recovery program to manage this risk. You need a short, honest plan that treats AI like any other critical vendor. Here's a practical starting point.
1. Inventory the AI tools your team actually uses
This sounds obvious, yet almost no one has done it. Many of the tools will have been adopted informally — the shadow AI problem we've written about. For each tool, document:
- Who is the vendor?
- Which team or role depends on it?
- What work would stop if it were unavailable for a day? A week?
- What data lives inside it that doesn't live anywhere else?
2. Identify the single points of failure
If every department in the business is routed through the same AI provider, that provider is load-bearing. Note that, and treat it accordingly. This is the same concept we explore in third-party vendor risk from an SMB perspective — when one vendor supports everything, that vendor's bad day is your bad day.
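The inventory and single-point-of-failure check above don't need special software; even a short script (or a spreadsheet doing the same thing) will surface the load-bearing vendor. Here's a minimal Python sketch. The tools, teams, and threshold below are illustrative placeholders, not a prescribed format:

```python
from collections import defaultdict

# Illustrative inventory: each entry records the vendor, who depends on it,
# what stops without it, and what data lives only inside it.
AI_INVENTORY = [
    {"tool": "Claude", "vendor": "Anthropic",
     "used_by": ["sales", "engineering"],
     "stops_without_it": "proposal drafting, code review",
     "data_only_here": "saved prompts, project files"},
    {"tool": "ChatGPT", "vendor": "OpenAI",
     "used_by": ["marketing"],
     "stops_without_it": "copywriting, job ads",
     "data_only_here": "custom GPT instructions"},
    {"tool": "Claude Code", "vendor": "Anthropic",
     "used_by": ["engineering"],
     "stops_without_it": "AI-assisted fixes and deployments",
     "data_only_here": "none"},
]

def single_points_of_failure(inventory, threshold=2):
    """Flag vendors that `threshold` or more distinct teams route through."""
    teams_per_vendor = defaultdict(set)
    for entry in inventory:
        teams_per_vendor[entry["vendor"]].update(entry["used_by"])
    return {vendor: sorted(teams)
            for vendor, teams in teams_per_vendor.items()
            if len(teams) >= threshold}

for vendor, teams in single_points_of_failure(AI_INVENTORY).items():
    print(f"{vendor} is load-bearing for: {', '.join(teams)}")
```

In this sample data, one vendor sits behind multiple teams and gets flagged; that's the finding you'd carry into the rest of the plan.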
3. Keep source material in systems you control
If an employee uploads contracts, playbooks, customer lists, or design docs into an AI tool, those files should also exist in a system you fully own — Microsoft 365, Google Workspace, an internal file server, or a dedicated document-management system with its own backups. The AI tool is a consumer of your data, not its system of record.
4. Export what you can, on a schedule
Most major AI platforms provide some form of data export — conversation history downloads, project exports, or API-based access. The mechanisms are limited and vary by vendor, and in some cases don't cover everything you'd want (custom system prompts, fine-tunes, workspace configuration). Where export is available, use it on a routine cadence and store the output with your normal business backups. Where it isn't available, that gap is itself a finding to flag internally.
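Where an export mechanism does exist, the "routine cadence" part can be as simple as filing each downloaded export archive into your normal backup location with a date stamp and pruning old copies. A minimal stdlib sketch, with placeholder paths; the export file itself still has to come from whatever download or export mechanism your vendor offers:

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path

def archive_export(export_file: Path, backup_root: Path, keep: int = 12) -> Path:
    """Copy a vendor export archive into a dated folder and prune old snapshots."""
    dest_dir = backup_root / date.today().isoformat()
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / export_file.name
    shutil.copy2(export_file, dest)
    # Keep only the most recent `keep` dated snapshot folders.
    dated = sorted(d for d in backup_root.iterdir() if d.is_dir())
    for stale in dated[:-keep]:
        shutil.rmtree(stale)
    return dest

# Demo with throwaway paths; in practice, point backup_root at the same
# storage your normal business backups already use.
demo_root = Path(tempfile.mkdtemp())
export = demo_root / "ai_export.zip"
export.write_text("placeholder export contents")
saved = archive_export(export, demo_root / "backups")
print("archived to", saved)
```

The design choice worth copying is the dated-folder layout: it makes each export a point-in-time snapshot, so a bad or partial export doesn't overwrite the last good one.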
5. Identify a manual fallback for each critical workflow
For each workflow that now runs on AI, define what the team would do if the AI were gone for a week. Old templates. Previous playbooks. A slower, human process. The point isn't that the fallback is as good — it's that the business isn't paralyzed when the fallback is the only option.
6. Consider more than one AI vendor where it's reasonable
For some workflows, having a second vendor available — even as a backup — is a legitimate option. For others (for example, one that requires a specific model's quality), it isn't. A thoughtful approach treats multi-vendor as a tool to use where it helps, not a mandate. The goal is to avoid betting the whole operation on one login screen.
7. Write down the incident response steps
If Claude or ChatGPT were unreachable this morning, what would a manager tell the team? Who checks the status page? Who communicates to clients whose deliverables are delayed? Who authorizes switching to the backup process? None of this needs to be long. It just needs to exist before the incident, which is the same principle we've written about in incident response planning before something happens.
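Written down, those steps fit on one page. As a sketch, here's the same runbook as a small structure that renders into a numbered checklist a manager can paste into chat; the steps and role names are placeholders to adapt to your own org chart:

```python
# Illustrative AI-outage runbook: (step, owner) pairs in execution order.
AI_OUTAGE_RUNBOOK = [
    ("Confirm the incident on the vendor's status page", "IT lead"),
    ("Tell affected teams which fallback process to use", "Operations manager"),
    ("Notify clients whose deliverables will slip", "Account manager"),
    ("Authorize a switch to a backup vendor or manual workflow", "Owner / GM"),
    ("Log the timeline and impact for post-incident review", "IT lead"),
]

def runbook_text(runbook):
    """Render the runbook as a numbered checklist with a named owner per step."""
    return "\n".join(f"{i}. {step} (owner: {owner})"
                     for i, (step, owner) in enumerate(runbook, 1))

print(runbook_text(AI_OUTAGE_RUNBOOK))
```

The format matters less than the fact that every step has exactly one named owner, decided before the outage rather than during it.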
8. Bake it into your AI usage policy
All of this belongs in the same document that governs how your team can use AI responsibly. If you don't have one yet, we walk through what belongs in it in why your organization needs an AI usage policy — and what it should include.
A Note on "Open-Source" and "Private" AI
One popular response to this risk is the argument that organizations should move to self-hosted or open-source models instead of depending on Big Tech's hosted AI. That approach has real merits — you control the weights, you control the environment, and a vendor can't unilaterally revoke your access. It also has real costs: infrastructure, model quality, ongoing operational overhead, security of the deployment itself, and the responsibility of keeping up with a fast-moving field.
For most SMBs, the realistic answer is not "replace hosted AI with private AI." It's: understand which hosted AI you depend on, reduce that dependency where it's reasonable, and know what you'd do if it was gone. Private or on-prem AI is one lever among several, not a universal cure. We'll keep writing about specific options as the landscape matures.
What to Take Away
LLMs are genuinely useful, and the businesses using them well are getting more work done than the businesses that aren't. That isn't in dispute. What's in dispute is the assumption that these tools will always be available, always unrestricted, and always reachable. They won't be — not because of any particular vendor's failings, but because no cloud service in history has ever achieved that, and AI is not an exception to how the internet works.
The practical position is the same one we've recommended for every other critical vendor: expect the outage, keep your own copy of the data, have a plan you've written down, and make sure no single login screen can take your business offline.
This article is intended for general informational purposes only and does not constitute professional security, legal, compliance, or business-continuity advice. Details about specific incidents, outages, and vendor policies are based on public reporting, vendor status pages, and help-center documentation as of the date of publication and may evolve over time. Organizations should consult qualified professionals and review each vendor's current terms, status history, and export capabilities before making decisions that affect their operations.