AI becomes useful not after a “magic prompt,” but after three practical decisions: choose the right model for the job, give it usable context, and stop the conversation when the context turns into noise. Most “ChatGPT lies” complaints are not proof that AI is useless; they are usually symptoms of a poorly framed task.
Why models produce nonsense
I hear the same story all the time: someone opens a chat, types “make me a strategy,” gets generic fluff, and closes it. A month later they try again. Same result. After the third attempt the conclusion becomes: “AI does not work.”
But a language model is not an employee who automatically understands your business, constraints, preferred format, and which facts must not be invented. It is a scalpel. In a surgeon’s hands, it is precise. Without preparation, it creates problems.
Most failures come from one of three places:
- the wrong model was chosen;
- the task was described too vaguely;
- the chat is already overloaded with old messages, contradictions, and lost instructions.
This part is about models and context. The second part is about prompts as proper task briefs.
Models are not the same
I do not divide models into “good” and “bad” in the abstract. I divide them by the work they can handle.
Claude is my main working model for code, long documents, architecture reviews, and complex instructions. It usually holds structure well on long tasks.
Gemini is strong for large inputs: long documents, meeting recordings, video, audio, and large reports. Google opened access to a 2 million token context window for Gemini 1.5 Pro developers, which changes the class of possible tasks: you can analyze a full body of material, not just a fragment.
“Today, we’re opening up access to the 2 million token context window on Gemini 1.5 Pro for all developers.” — Google Developers Blog
In plain English: a large context window does not automatically make a model smarter, but it lets the model see more source material at once. That matters for meetings, documents, audio, and codebases — if the input is structured.
Perplexity is a practical search replacement when I need fast research with links. I use it not as an oracle, but as a way to build a map of sources faster.
Qwen / local models are useful when data should not leave your environment: internal documents, drafts, personal context, operational routines.
DeepSeek API and other low-cost API models are useful for high-volume iterations: classification, rough data processing, bulk hypothesis checks. Not one perfect answer, but many affordable passes.
ChatGPT works well for many people, but for a long time it was not my main tool for complex instruction-following work. This is not a religion. You test the model on the task.
Context window: memory, but not exactly
A context window is the amount of information the model can consider in the current request: your messages, system instructions, files, document fragments, search results. It behaves like short-term memory, with one important caveat: the model does not “remember” the conversation like a person. It receives the input again and generates the next answer from what fits into context.
Google described the scale of long context in its Gemini 1.5 announcement:
“This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words.” — Google Blog
The practical translation: long context is not for keeping one endless chat alive for weeks. It is for loading a relevant body of material and asking a specific question about it.
Why long chats degrade
Thirty or forty messages in one chat can already turn a task into mud. You get old solution versions, cancelled requirements, random clarifications, emotional comments, and fragments that are no longer relevant. The model sees all of that as input.
When the context is full, products may trim or summarize the history. You usually do not see what was lost. It may be the first important instruction, a formatting constraint, or the explanation of why a certain source must not be used.
My working rule is simple: one chat, one task. If the conversation starts drifting, I ask the model for a concise decision summary, review it myself, open a new chat, and continue with clean context.
Business use cases
Meeting preparation. Load client context: message history, past decisions, open questions, documents. Get a summary, risks, and a list of questions to ask.
Documents. Contracts, proposals, partner emails, technical briefs — the model can draft quickly if you provide the template, inputs, constraints, and desired format.
Analytics. Long reports, tables, market research, customer feedback. AI speeds up initial structuring, but factual conclusions still need source checks.
Prototyping. Describe a product idea and get an MVP structure, risks, user scenarios, and the questions that must be answered before development.
How to work reliably
My minimum:
- choose the model for the input type: code, documents, search, local data;
- provide source material instead of asking the model to guess;
- state quality criteria and output format explicitly;
- do not keep every task in one chat;
- verify facts and request links when discussing the external world;
- move repeatable knowledge into RAG, a knowledge base, or project memory.
Anthropic’s prompt engineering docs make an important point: not every failure should be solved with a better prompt. Sometimes the right answer is another model, different latency/cost tradeoffs, or proper evaluation.
“Not every success criteria or failing eval is best solved by prompt engineering.” — Anthropic Claude Docs
That matters. If a model keeps failing on a task, do not keep casting prompt spells. You may need a different tool, different context, or a real data integration.
The short version
AI is not supposed to work from “make it nice.” It starts working when you treat it as a professional tool: choose the model, provide context, constrain the task, and control the result. Then it stops being a toy and becomes part of the team’s operating system.
