Too Many LLM Anecdotes Without Illustration


Lately, there has been an overwhelming number of articles discussing AI tools for developers. These articles often follow a familiar pattern: they speak in broad terms about AI's impact, either positively or negatively, but rarely provide concrete examples of how these tools are being utilized. Key details such as the specific tools being used, the costs involved, the problems being addressed, and the methods of evaluation are frequently omitted.

The comments on these articles typically fall into three categories: those who found AI coding tools ineffective, those who consider them revolutionary, and those who believe they are beneficial for specific tasks when applied correctly. However, these comments rarely specify the programming tasks being tackled, the platforms or services employed, the associated costs, or how the results were evaluated.

The insights from individuals experimenting with AI tools are incredibly valuable. Yet, without detailed information, it's challenging to extract more than a general understanding that there is a wide range of tasks, workflows, and tools that, when combined, produce varied outcomes.

It's unlikely that those who praise AI coding assistants do so merely because they are poor programmers. Similarly, it's improbable that those who dismiss AI tools simply don't know how to use them effectively. The notion that selective use of AI tools only addresses a specific class of programming problems also seems far-fetched.

Apples and Oranges

From the articles and comments, it's easy to assume people are disagreeing about AI as a programming support tool. But in reality, they are often only arguing for or against the particular tool they have used. Some people use full-featured suites such as Cursor, while others copy and paste their files straight into Claude, with everything in between.

There are studies that examine the efficacy of AI tools, which are intriguing but often lead to the same types of comments mentioned earlier. Additionally, many articles from AI companies lack thorough research and tend to be more about marketing than providing an open comparison of tools.

What Would I Like to See?

What I'm really interested in is understanding what people are actually doing with these tools. What does their setup look like? What types of code are suitable for delegation to an LLM chat interface or agent? Are discussions about the performance of different models or modes useful, or is it the tooling that makes the significant difference?

Moving Toward More Detailed Discourse

The rapid growth of AI programming tools and models indicates that we are still in the early stages of development. I hope we can eventually reach a level of discourse similar to that of platforms, frameworks, and languages. For instance, no one claims Rust is superior to Go simply because it feels better, nor do they promote a new framework just because it seems more nuanced and responsive compared to the current best-in-class.

There’s no real criticism intended here. I didn’t intend to write one of those “Everyone’s doing X wrong, and here’s why” articles. The whole point of this blog is to think and reason publicly and consciously add to the conversation.

I've been experimenting with Qwen Code and could easily share my opinions on various forums without thinking of them as data points for others trying to work out how best to evaluate and use the new array of tools at our disposal. Instead, I intend to document my experiences and findings in this blog. Perhaps others are seeking the same understanding I am, and my insights can contribute by providing semi-reproducible claims.