Ask ChatGPT, Claude, or Gemini to generate a password and you'll get something that looks strong—a mix of uppercase and lowercase letters, numbers, and special characters. It might look like G7$kL9#mQ2&xP4!w. Randomized. Complex. Secure.

Except it isn't. And if your team has been using AI chatbots to generate credentials, recent research suggests those passwords may be far weaker than they appear.

What the Research Found

In February 2026, AI security firm Irregular published a study titled "Vibe Password Generation: Predictable by Design." The researchers prompted three major large language models—Anthropic's Claude (Opus 4.6), OpenAI's ChatGPT (GPT-5.2), and Google's Gemini (Gemini 3 Flash)—to generate 16-character passwords mixing uppercase and lowercase letters, numbers, and special characters.

The results were striking.

Claude Opus 4.6: Out of 50 independent prompts (each in a separate conversation), only 30 unique passwords were generated. One specific password appeared 18 times—a 36 percent collision rate for a single string. The majority of outputs started and ended with the same characters.

ChatGPT (GPT-5.2): Nearly every password began with the letter "v," and almost half used "Q" as the second character. The patterns were consistent enough that an attacker could meaningfully narrow their search space.

Gemini 3 Flash: Most passwords started with either uppercase or lowercase "K," followed predictably by characters like "#," "P," or "9."

Perhaps most telling: across the 50 passwords generated by each model, not a single password contained a repeated character. While that might look more "random" at a glance, Irregular noted that a run of truly random passwords would almost certainly include some repeats—their complete absence is itself a pattern.

Why This Happens

The explanation is rooted in how large language models work. LLMs generate text by predicting the most likely next token based on patterns learned from training data. That's the opposite of what secure password generation requires: uniform, unpredictable randomness.

When you ask an LLM to "generate a random password," it doesn't call a cryptographically secure random number generator. It produces a sequence of characters that looks random based on what it has learned passwords typically look like. The result mimics the appearance of randomness without actually achieving it.

Irregular's entropy analysis quantified the gap. A truly random 16-character password using the full printable ASCII set should carry approximately 98 bits of entropy. Claude's generated passwords measured roughly 27 bits. GPT-5.2's 20-character passwords were even lower at approximately 20 bits—some individual character positions showed as little as 0.004 bits of entropy, meaning those positions were roughly 99.97 percent predictable.

In practical terms, a 27-bit password space contains only about 134 million candidates—a search that modest hardware can exhaust in hours at most.
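These entropy figures follow directly from alphabet size and password length. As a rough sanity check—assuming the study's ~98-bit baseline corresponds to a 70-character alphabet (26 lowercase, 26 uppercase, 10 digits, 8 symbols; the exact alphabet is our assumption, not stated in the study):

```python
import math

def password_entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)

# A truly random 16-character password over a 70-character alphabet:
ideal = password_entropy_bits(16, 70)
print(f"ideal entropy:    {ideal:.1f} bits")        # ~98 bits

# The measured ~27 bits collapses the effective search space:
print(f"ideal guesses:    2^98 ≈ {2.0**98:.2e}")
print(f"measured guesses: 2^27 = {2**27:,}")        # 134,217,728
```

The gap is not incremental: dropping from 98 bits to 27 bits shrinks the search space by a factor of roughly 2^71, which is what turns an infeasible attack into a routine one.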

Temperature Settings Don't Fix It

A natural assumption is that adjusting the LLM's "temperature" parameter—which controls how much randomness the model introduces into its outputs—would solve the problem. Irregular tested this directly.

Running Claude at its maximum temperature of 1.0 still produced the same repeated patterns. Reducing it to 0.0 caused the identical password to appear on every single run. The researchers concluded that this weakness is "unfixable by prompting or temperature adjustments." It's a fundamental limitation of how these models generate output.
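Irregular's collision finding is straightforward to reproduce against any model you can query: collect outputs from independent prompts and count duplicates. A minimal sketch of the bookkeeping—the sample list below is invented for illustration, not real model output:

```python
from collections import Counter

def collision_report(passwords: list[str]) -> None:
    """Summarize how often identical passwords recur across independent runs."""
    counts = Counter(passwords)
    total = len(passwords)
    top_pw, top_n = counts.most_common(1)[0]
    print(f"{total} samples, {len(counts)} unique")
    print(f"most common appeared {top_n} times ({100 * top_n / total:.0f}%)")

# Hypothetical outputs gathered from six separate prompts:
samples = [
    "K7#mP9xQ2vL4!wR8", "K7#mP9xQ2vL4!wR8", "K7#mP9xQ2vL4!wR8",
    "K3#nR8yT5wM6!qZ1", "K7#mP9xQ2vL4!wR8", "K9#pL2zV7xN3!sD5",
]
collision_report(samples)
```

For a truly random generator over a 70-character alphabet, even one duplicate in 50 independent 16-character samples would be astronomically unlikely; any repeat at all is a red flag.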

It's Not Just Chatbots—Coding Agents Are Affected Too

The risk extends beyond employees asking a chatbot for a password. AI coding agents—tools like Claude Code, Codex, and Gemini CLI that generate and deploy code—sometimes produce LLM-based passwords during development tasks without the developer explicitly requesting them.

In vibe coding environments, where code is built and deployed with minimal manual review, these weak credentials can slip directly into production. Irregular found LLM-generated password patterns—characteristic substrings like K7#mP9 and k9#vL—appearing in public GitHub repositories, docker-compose files, .env files, and setup scripts.
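Defenders can turn the same observation around and scan their own repositories for these signatures. A rough sketch—the patterns below are illustrative, built only from the two substrings the study published; a real scan would use a fuller pattern set and run across all config and source files:

```python
import re

# Characteristic fragments reported by Irregular (illustrative, not exhaustive)
LLM_PASSWORD_HINTS = [re.compile(r"K7#mP9"), re.compile(r"k9#vL")]

def flag_suspect_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs containing a known LLM-password fragment."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in LLM_PASSWORD_HINTS):
            hits.append((lineno, line.strip()))
    return hits

# Example .env content with a suspicious credential on line 2:
env_file = "DB_HOST=localhost\nDB_PASSWORD=K7#mP9xQ2vL4!wR8\nDEBUG=false"
print(flag_suspect_lines(env_file))
```

Any hit should be treated the same way as a leaked credential: rotate it and regenerate with a proper CSPRNG, since matching a known LLM output pattern means the value sits in a search space attackers can enumerate.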

An attacker who suspects a service was built with a specific AI coding tool could attempt to enumerate that model's password patterns to gain access—creating an attack surface that didn't exist before AI-assisted development became widespread. We explored related risks in our article on shadow AI and what business leaders should know.

Kaspersky's Corroborating Research

Irregular's findings don't exist in isolation. In May 2025, Alexey Antonov, Kaspersky's Data Science Team Lead, independently tested 1,000 passwords generated by ChatGPT, Meta's Llama, and DeepSeek. The results reinforced the same conclusions.

Antonov found that certain characters appeared far more frequently than others across LLM-generated passwords—ChatGPT favored "x" and "p," Llama preferred "#" and "p," while DeepSeek leaned toward "t" and "w." A truly random generator would show no such preferences.

More concerning: 26 percent of ChatGPT's passwords, 32 percent of Llama's, and 29 percent of DeepSeek's omitted special characters or digits entirely—despite being asked to include them. Some models generated recognizable dictionary words with predictable substitutions ("S@d0w12," "M@n@go3") or even variations of the word "password" itself ("P@ssw0rd!23").

When Antonov ran these passwords through Kaspersky's machine-learning-based cracking tool, 88 percent of DeepSeek's and 87 percent of Llama's outputs were not strong enough to withstand a sophisticated attack. ChatGPT performed somewhat better, but still had 33 percent of its passwords flagged as inadequate.

"The problem is LLMs don't create true randomness," Antonov noted. "Instead, they mimic patterns from existing data, making their outputs predictable to attackers who understand how these models work."

What This Means for Businesses

For organizations, the practical implications are straightforward but worth stating clearly.

Don't Use AI Chatbots to Generate Passwords

If employees are asking ChatGPT or Claude to create passwords for work accounts, those credentials may be significantly weaker than anyone realizes. The passwords look complex, which creates a false sense of security—but the underlying entropy is a fraction of what a dedicated password generator provides.

This is especially relevant as AI tools become embedded in daily workflows. Employees who wouldn't think to use "password123" might readily accept whatever a chatbot generates, assuming it's been properly randomized. As we noted in our piece on what businesses still get wrong about password security, the human tendency to trust convenient shortcuts remains one of the biggest credential risks.

Use Dedicated Password Managers

Password managers like 1Password, Bitwarden, and Dashlane use cryptographically secure random number generators (CSPRNGs) to produce passwords. These generators draw from system-level entropy sources that produce genuinely unpredictable output—fundamentally different from an LLM's token prediction.
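The difference is visible at the code level. Python's standard library exposes the same OS-level CSPRNG that password managers build on; a minimal generator looks like this (a sketch for illustration, not a replacement for a vetted password manager):

```python
import secrets
import string

def generate_password(length: int = 16) -> str:
    """Draw each character independently from the OS CSPRNG (via os.urandom)."""
    alphabet = string.ascii_letters + string.digits + "!#$%&*@^"
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(generate_password())  # a fresh, unpredictable password on every call
```

With this 70-character alphabet, a 16-character password carries roughly 98 bits of entropy—the ideal baseline Irregular measured LLM output against. The key point is that `secrets` draws from kernel entropy rather than predicting likely next characters, so no amount of observed output helps an attacker guess the next one.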

We recently covered new research on password manager encryption architectures, which is worth reading alongside this article. While no tool is perfect, the gap between a CSPRNG-generated password and an LLM-generated one is enormous.

Review AI-Generated Code for Hardcoded Credentials

Development teams using AI coding assistants should treat any password, secret, or API key produced by an LLM as untrusted. This means auditing AI-generated code for hardcoded credentials, rotating any that are found, and standardizing on secure generation methods like openssl rand or equivalent system utilities.

Establishing clear policies about which tools can generate security-critical values—and which cannot—is part of a broader AI usage policy that organizations increasingly need.

Consider Passwordless Alternatives

This research adds another argument for moving toward passwordless authentication where possible. Passkeys, hardware security keys, and biometric authentication eliminate the password generation problem entirely. While the transition is gradual, organizations can start by enabling passkey support for critical systems and services that already offer it.

The broader shift away from password-centric security is already underway. The latest NIST password guidelines emphasize length over complexity and breach detection over forced rotation—a recognition that the traditional approach to passwords has fundamental limitations.

Layer Your Defenses

Regardless of how passwords are generated, multi-factor authentication remains essential. Even a strong password can be phished or leaked in a breach. MFA ensures that a compromised credential alone isn't enough to gain access. We covered implementation considerations in our article on common MFA mistakes in the workplace.

The Bigger Picture

This research raises a question that extends well beyond passwords: if LLMs can't produce reliable randomness for something as straightforward as a credential, what about other security-adjacent tasks users are delegating to AI? Developers increasingly ask AI assistants to generate API keys, cryptographic salts, session tokens, and other values where unpredictability is essential.

The lesson isn't that AI tools are useless—they're transforming how work gets done. But understanding what these models are and aren't designed to do is critical. LLMs are pattern-matching engines trained to produce probable outputs. Security credentials require the opposite: improbable outputs that no pattern can predict.

Using the right tool for the right job has always been a core principle of good security practice. In a world where AI can do more every month, knowing where its capabilities end is just as important as knowing where they begin.


This article is intended for informational purposes only and does not constitute professional security, legal, or compliance advice. Organizations should consult with qualified cybersecurity professionals to assess their specific credential management needs and develop appropriate security policies.