May 24, 2026

AI and Software Development: Why Plausible Code Is the Most Dangerous Code

AI does not remove the need for understanding

Artificial intelligence is already changing software development. Developers now use generative tools for autocompletion, refactoring, documentation, test generation, debugging and increasingly for agentic workflows where an AI system can inspect a repository, modify files and propose a pull request. This can be genuinely useful. It can reduce routine effort, help developers navigate unfamiliar codebases and accelerate repetitive tasks. But it also creates a new kind of risk: code that is wrong in ways that look right.

In traditional software development, there is a built-in control gate. When a developer works on a non-trivial system, they must first understand enough of the code to change it. If they do not understand the data flow, the architecture, the dependencies or the failure modes, they usually cannot make meaningful progress. The difficulty of the task forces comprehension. With AI assistance, that gate can disappear. A model can produce a clean patch, with good naming, familiar patterns and passing continuous integration checks, even when the underlying reasoning is flawed.

This means that reviewing AI-generated code requires more expertise, not less. The reviewer must understand the code and must also assess whether they truly understand it. That second layer is metacognition: the ability to monitor the limits of one’s own knowledge. In an AI-assisted workflow, the key question is not only “does this code look correct?” It is “can I explain why this code is correct, what assumptions it makes, what it fails to cover and how it behaves under stress?”

The cognitive traps of AI-generated code

The first trap is overconfidence. Less experienced developers may be especially exposed because the skill needed to detect a defect is often the same skill needed to know that a defect might be present. A junior developer may accept an AI-generated authentication change without noticing that authorization checks are incomplete. They may approve database code that appears efficient but introduces injection risk. They may accept concurrency logic that works in a simple test but fails under a real workload.
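The injection case is easy to miss because both versions of the code look tidy. The following is a minimal sketch using Python's standard sqlite3 module; the table, the function names and the sample data are invented for illustration and do not come from any real system.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.org"), ("bob", "bob@example.org")],
    )
    return conn


def find_emails_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Looks compact and readable, but user input becomes part of the SQL text.
    query = f"SELECT email FROM users WHERE name = '{name}' ORDER BY email"
    return [row[0] for row in conn.execute(query)]


def find_emails_safe(conn: sqlite3.Connection, name: str) -> list:
    # The placeholder keeps the input as data; it is never parsed as SQL.
    query = "SELECT email FROM users WHERE name = ? ORDER BY email"
    return [row[0] for row in conn.execute(query, (name,))]


if __name__ == "__main__":
    conn = build_demo_db()
    payload = "nobody' OR '1'='1"
    print("unsafe:", find_emails_unsafe(conn, payload))  # every row is returned
    print("safe:  ", find_emails_safe(conn, payload))    # no row matches
```

A test that only looks up "alice" passes for both functions, which is why a reviewer has to reason about hostile input rather than rely on a green build.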

The second trap is the illusion of explanatory depth. People often believe they understand a mechanism until they are asked to explain it step by step. Software is full of such mechanisms: protocols, caches, permissions, distributed state, retry logic, data validation, serialization, cryptography, logging and error handling. Recognizing the shape of code is not the same as understanding its consequences. A reviewer must be able to trace where data enters, how it is transformed, who is allowed to access it, where it is persisted, what happens when a dependency fails and how the system recovers.

The third trap is automation bias. When a system gives a fluent answer, humans tend to treat it as more reliable than it is. In code review, this can lead to a dangerous reduction in independent verification. The patch looks polished. The tests pass. The assistant explains the change confidently. The reviewer relaxes. This is exactly when the risk is highest.

Concrete examples

Imagine a public service application where an AI assistant proposes a caching layer to improve performance. The code passes latency tests, but the cache key does not include the user’s role or tenant. The system becomes faster, but it may leak one citizen’s data to another.
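The caching scenario can be sketched in a few lines. This is an illustration under stated assumptions, not a recommended cache design: RequestContext, leaky_key, scoped_key and get_cached are hypothetical names, the cache is a plain dictionary, and including the user identifier in the key is an added assumption that fits per-citizen records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical description of who is making the request."""
    tenant_id: str
    user_id: str
    role: str


def leaky_key(resource: str, ctx: RequestContext) -> Hashable:
    # Ignores the caller entirely: every user shares one cache entry.
    return ("records", resource)


def scoped_key(resource: str, ctx: RequestContext) -> Hashable:
    # Tenant, role and user are part of the key, so entries cannot cross users.
    return ("records", ctx.tenant_id, ctx.role, ctx.user_id, resource)


def get_cached(cache: Dict[Hashable, str], key: Hashable, load: Callable[[], str]) -> str:
    """Return the cached value, loading it only on a cache miss."""
    if key not in cache:
        cache[key] = load()
    return cache[key]


def what_second_user_sees(key_fn: Callable[[str, RequestContext], Hashable]) -> str:
    """Simulate two citizens requesting the same resource name in sequence."""
    alice = RequestContext(tenant_id="city-a", user_id="alice", role="citizen")
    bob = RequestContext(tenant_id="city-b", user_id="bob", role="citizen")
    cache: Dict[Hashable, str] = {}
    get_cached(cache, key_fn("tax-file", alice), lambda: "alice's file")
    return get_cached(cache, key_fn("tax-file", bob), lambda: "bob's file")


if __name__ == "__main__":
    print(what_second_user_sees(leaky_key))   # the second user gets the first user's data
    print(what_second_user_sees(scoped_key))  # the second user gets their own data
```

Both key functions produce the same latency improvement, so a performance test alone cannot tell them apart. Only a test with two distinct users exposes the leak.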

Consider a payment workflow where an AI tool simplifies exception handling. The new code is shorter and easier to read, but it catches errors that should have aborted the transaction. The system looks more stable because fewer errors are visible, while critical failures are silently ignored.
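A minimal sketch of that pattern follows. The exception classes, the two settle functions and the split between a recoverable and a critical error are assumptions made for illustration; a real payment system would define its own error taxonomy.

```python
import logging
from typing import Callable

logger = logging.getLogger("payments")


class CardDeclined(Exception):
    """Expected, recoverable outcome: the customer can simply try again."""


class LedgerUnavailable(Exception):
    """Critical failure: money may have moved without being recorded."""


def settle_simplified(charge_card: Callable[[int], None],
                      write_ledger: Callable[[int], None],
                      amount_cents: int) -> str:
    # Shorter, but the blanket handler hides every failure, including
    # the ones that should have aborted the transaction.
    try:
        charge_card(amount_cents)
        write_ledger(amount_cents)
    except Exception:
        pass
    return "ok"


def settle_strict(charge_card: Callable[[int], None],
                  write_ledger: Callable[[int], None],
                  amount_cents: int) -> str:
    # Only the expected, recoverable error is handled here.
    try:
        charge_card(amount_cents)
    except CardDeclined:
        logger.info("card declined for %d cents", amount_cents)
        return "declined"
    # Anything raised by the ledger write propagates to the caller,
    # which can then abort or compensate the transaction.
    write_ledger(amount_cents)
    return "ok"
```

With a ledger that raises LedgerUnavailable, settle_simplified still reports "ok" while settle_strict surfaces the failure. The simplified version produces fewer visible errors, which is exactly why it can look like an improvement in a dashboard.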

Or take a licensing portal where the model generates tests together with the implementation. The tests confirm what the generated code does, not what the legal specification requires. In that case, testing becomes circular. It validates the model’s assumption rather than the system’s duty.

There are also supply-chain risks. A model may suggest a package that is outdated, poorly maintained or close in name to a malicious dependency. It may generate a configuration that disables verification to “make the build pass.” It may use a cryptographic function incorrectly because the pattern appeared frequently in old code. These are not science-fiction risks. They are ordinary software risks amplified by speed and fluency.
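One inexpensive guardrail against look-alike package names is to compare every newly suggested dependency with a list the team already trusts. The sketch below uses difflib from the Python standard library; the allowlist contents, the 0.8 similarity cutoff and the function name are assumptions chosen for illustration, and a check like this complements dependency scanning rather than replacing it.

```python
import difflib
from typing import Iterable

# Hypothetical allowlist; a real team would maintain its own.
KNOWN_PACKAGES = frozenset({"requests", "numpy", "cryptography", "urllib3"})


def review_dependency(name: str,
                      allowlist: Iterable[str] = KNOWN_PACKAGES,
                      cutoff: float = 0.8) -> str:
    """Classify a suggested package name as approved, suspicious or unknown."""
    known = sorted(allowlist)
    normalized = name.strip().lower()
    if normalized in known:
        return "approved"
    # A name that is close to, but not equal to, a trusted package is a
    # typical typosquatting signal and should block the change for review.
    lookalikes = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    if lookalikes:
        return f"suspicious: resembles '{lookalikes[0]}'"
    return "unknown: needs manual review"


if __name__ == "__main__":
    for candidate in ("requests", "reqeusts", "totally-new-lib"):
        print(candidate, "->", review_dependency(candidate))
```

The important design choice is that an unrecognized name never passes silently: it is either flagged as a look-alike or routed to a human.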

Good practices for responsible AI-assisted programming

The first rule is simple: AI can assist, but it cannot be accountable. A human developer must take full responsibility for every accepted change. If the developer cannot explain the patch, they should not merge it. “The model suggested it” is not an engineering argument.

Second, keep AI-generated changes small. Large patches are hard to review even when written by humans. They become much harder when they mix generated code, generated tests, generated documentation and dependency changes. A safe workflow prefers small, single-purpose pull requests with clear intent.

Third, require an explanation with every AI-assisted change. The pull request should state what problem is being solved, what assumptions were made, what files were affected, what risks were considered and what tests were added. This explanation should be written or verified by the human developer, not copied uncritically from the model.

Fourth, use independent verification. Unit tests, integration tests, static analysis, dependency scanning, secret scanning, fuzzing, code coverage, software bills of materials and threat modeling are not optional decorations. They are the guardrails that prevent fluent code from becoming unsafe code. For security-sensitive components, AI-generated code should receive stricter review, not lighter review.

Fifth, separate generated tests from generated implementation. If the same model writes the code and the tests in one step, the tests may reproduce the same misunderstanding. A better practice is to define the expected behavior first, ideally from specifications, user stories, security requirements or examples, and only then ask the model to propose an implementation.
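A minimal sketch of this ordering: the expected behavior is written down first as a table of cases, and any candidate implementation, whether human-written or generated, is checked against that table. The licensing rule used here (a licence remains valid on its expiry date) is a hypothetical requirement invented for the example.

```python
from datetime import date
from typing import Callable, List, Tuple

# Step 1: expected behaviour, taken from the (hypothetical) specification
# before any implementation exists. Each case is (expiry, today, expected).
SPEC_CASES: List[Tuple[date, date, bool]] = [
    (date(2026, 6, 30), date(2026, 6, 29), True),   # before expiry
    (date(2026, 6, 30), date(2026, 6, 30), True),   # the expiry day itself counts
    (date(2026, 6, 30), date(2026, 7, 1), False),   # after expiry
]


def check_against_spec(impl: Callable[[date, date], bool]) -> List[str]:
    """Return a description of every specification case the candidate fails."""
    failures = []
    for expiry, today, expected in SPEC_CASES:
        actual = impl(expiry, today)
        if actual != expected:
            failures.append(
                f"expiry={expiry} today={today}: expected {expected}, got {actual}"
            )
    return failures


# Step 2: candidate implementations, proposed only after the cases are fixed.
def is_licence_valid(expiry: date, today: date) -> bool:
    return today <= expiry


def is_licence_valid_flawed(expiry: date, today: date) -> bool:
    # Plausible-looking variant that silently excludes the expiry day.
    return today < expiry


if __name__ == "__main__":
    print(check_against_spec(is_licence_valid))         # no failures
    print(check_against_spec(is_licence_valid_flawed))  # the boundary case fails
```

Tests generated alongside the flawed variant would most likely assert that the expiry day is invalid, because that is what the code does. The specification-first table catches the discrepancy because it was written without looking at the implementation.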

Sixth, never paste secrets, credentials, personal data or sensitive proprietary code into external AI tools without a clear legal, security and governance framework. In public sector systems, health systems, educational platforms and critical infrastructure, local or publicly controlled AI infrastructure is often the safer default. Open-source and auditable tools can also make it easier to inspect, reproduce and govern the development process.

Finally, disclose AI assistance in the development workflow. This is not about stigma. It is about traceability. Teams should know when a model was used, for what type of change, under which review standard and with which human sign-off. Mature open-source communities are already moving in this direction: AI may be accepted as an assistant, but legal and technical responsibility remains human.

AI can make good developers faster. It can help teams understand legacy code, draft test cases, improve documentation and explore alternatives. But it should not be treated as an authority. The more fluent AI-generated code becomes, the more disciplined human review must be. The future of software development should not be “vibe coding” at institutional scale. It should be accountable, testable, transparent engineering, assisted by AI but governed by humans.

Article sources:

Diomidis Spinellis, “Why reviewing AI-generated code is devilishly hard”: The article explains why AI-generated code requires heightened attention, linking the problem to the illusion of understanding, overconfidence, and automation bias: https://www.spinellis.gr/blog/20260523/index.html

NIST, “Secure Software Development Framework, SP 800-218”: The NIST framework brings together fundamental secure software development practices and is useful as a basis for processes of review, verification, and secure code integration, regardless of whether the code was written by a human or with AI assistance: https://nvlpubs.nist.gov/nistpubs/specialpublications/nist.sp.800-218.pdf

OWASP, “Top 10 for Large Language Model Applications 2025”: OWASP documents critical risks for applications based on large language models, such as prompt injection, insecure output handling, and supply-chain vulnerabilities, which must also be taken into account in AI-assisted programming tools: https://owasp.org/www-project-top-10-for-large-language-model-applications/

GitHub Docs, “Responsible use of GitHub Copilot Chat in GitHub”: GitHub’s documentation stresses that users should be especially careful when using Copilot Chat to generate code for applications with security requirements, and should thoroughly review and test the generated code: https://docs.github.com/en/copilot/responsible-use/chat-in-github

Pearce, Ahmad, Tan, Dolan-Gavitt, Karri, “Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions”: This empirical study examines cases in which GitHub Copilot may suggest insecure code in high-risk scenarios, reinforcing the need for independent security review: https://arxiv.org/abs/2108.09293

Linux Kernel Documentation, “AI Coding Assistants”: The Linux kernel guidelines set out a mature governance principle: AI tools may assist, but only humans can take responsibility, review the code, ensure license compliance, and sign off on the contribution: https://docs.kernel.org/process/coding-assistants.html