October, 2025 ∙ 6 minute read
This was originally an internal discussions post, surfacing on my blog because it is interesting to see how far we’ve come since October 2025 and how much my thinking has changed. For context, Claude Opus 4.5 was released November 24, 2025. Part 1 of 2.
Last week, I had an insight about LLMs and where they are a good tool for the job and where they are not. I think this captures some of the frustration I feel in using Copilot and GitHub’s various LLM-driven features. As an engineer, my experience so far is this: there are a limited number of places where the LLM is genuinely helpful, but in the vast majority of my interactions, LLMs fall somewhere on a scale from counterproductive and frustrating to downright harmful.
Some places they are useful:
Some places I find them incredibly frustrating and useless:
My insight is that this all comes down to the fact that we are using language models, and as such they are incredibly good at modeling and predicting human language (including programming languages), but they have no actual representation of the real world or how the real world works. This is very important to understand. An LLM “knows” (to a surprising degree) how humans talk about how reality works, but it has zero ability to predict, reason, or learn about reality itself. I think this might be a fundamental limitation. Language does not equal intelligence; the arrow goes the other way (I know that people might fervently disagree with this).
Armed with this insight, you can make better decisions about how and when to use LLMs and what to expect from their future capabilities. Oh, and I should be careful to say: we don’t want to conflate LLMs with AI or ML. The broader set of machine learning technologies and techniques (outside of large language models) may hold the keys to more sophisticated code generation and software engineering. I’m speaking here about the LLMs that form the foundation of products like Copilot in 2025.
So the heuristic then becomes simple. Ask yourself: what am I trying to accomplish right now, and can that be done well (or accelerated) by a sophisticated model of how humans use language? Remember, the LLM doesn’t understand your problem in any way, but it can simulate how humans generally talk about many, many things. Here are a few examples; I may add more as they come up:
Should I use an LLM to review my code? Maybe. With care. They are very useful for reviewing spelling, grammar, and sentence structure (especially in docs and comments), but don’t forget that they don’t understand what you’re trying to do, and they don’t understand, e.g., how the Rust compiler works. They are OK at surface-level review of code written in programming languages in their training sets, but they struggle with anything nuanced or detailed because they don’t understand what your program is trying to do, why, or what the best way to model that particular problem in code might be. Today, you’re much better off grabbing another human.
Should I respond to each and every CCR comment? No. Definitely not. Ignore and dismiss them. Do not waste your words unless giving the CCR team feedback is more important to you than accomplishing your main day job.
Should I use an LLM agent to write this code? No, unless the code is throwaway (prototypes, demos) or you’re committed to iterating on and owning the output. Today I find that it takes 2x to 3x the time and effort to write good code with an agent for a small real-world task vs just doing it yourself. If you do use an agent, your name should be on the blame for all future bugs, security vulnerabilities, and availability issues related to that code. Do not submit code that you do not fully understand and have not signed off on.
Should I use Copilot completions and/or NES in my editor? Yes, but toggle them on and off liberally and don’t accept suggestions blindly. It’s a good skill to be able to work with and without the assistance of LLM completions, and you should learn to discern rapidly whether a suggestion is valid. I recommend limiting multi-line completions unless you’re going slowly enough to read and digest everything that’s being written. Be aware of the tendency of LLMs to be verbose and to generate slop (hallucinations). Again, remember that this code is YOUR responsibility.
Should I rely on this LLM summary? Maybe. Depends on the context. Company all-hands? Probably fine. Technical requirements document? Maybe use it to get oriented, but if you need to load that context up in your mental model the summary is unlikely to be sufficient. Use summaries as a shortcut for knowing whether or not you need to dive in and invest or move on.
Should I use an LLM to port this library to Rust? Probably not, but it depends on the source and destination languages and the task. LLMs are heavily trained on Python and JavaScript, and translation is actually an ideal task for a language model, but I would only recommend using one for a port if you don’t care too much about the clients of the library or the quality of the code. I would say this is maybe one quality step down from outsourcing that work to a team in another part of the world that has a similarly limited amount of context. I expect that LLMs will eventually be much better at this.
Continued in Part 2: Models of reality vs. models of what people say.
This post was handwritten, but in porting it to my public blog, I did use a model (Claude Opus 4.7) to copy edit; the changes were minor spelling and grammar fixes. See my AI attribution page for more.