
LLM Productivity Experiment
This article describes an experiment that sought to determine if no-cost LLM-based code generation tools can improve developer productivity.
This TensorBlue analysis is based on reporting and source material from InfoQ (https://www.infoq.com/articles/llm-productivity-experiment/).
What Happened
Experimenting with LLMs for Developer Productivity
To understand whether current LLM tools can make programmers more productive, an experiment was conducted using improved unit test code coverage as an objective measure.
No-cost LLMs were chosen to participate in this experiment: ChatGPT, CodeWhisperer, codellama:34b, codellama:70b, and Gemini. These are all free offerings, which is why GitHub Copilot is not on this list.
The experiment was designed to test each selected LLM's ability to generate unit tests for an already coded, nontrivial web service. Each LLM was given the same problem and the same prompting. Its output was then combined with the existing open source project, the project was compiled, and the unit tests were run. A record was kept of all the corrections needed to get the build to pass again.
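The bookkeeping described above (coverage before and after, plus a log of manual corrections) can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the original experiment: the `TrialRecord` class, its field names, and the `summarize` helper are all assumptions introduced here, and the tool names and numbers in the usage note are placeholders rather than reported results.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TrialRecord:
    """One tool's attempt at generating unit tests (hypothetical structure)."""
    tool: str
    baseline_covered: int   # lines covered before adding the generated tests
    final_covered: int      # lines covered once the build passes again
    total_lines: int        # coverable lines in the project
    corrections: List[str] = field(default_factory=list)  # manual fixes applied

    def coverage_gain(self) -> float:
        """Percentage-point increase in line coverage for this trial."""
        if self.total_lines <= 0:
            raise ValueError("total_lines must be positive")
        return 100.0 * (self.final_covered - self.baseline_covered) / self.total_lines


def summarize(trials: List[TrialRecord]) -> List[Tuple[str, float, int]]:
    """Return (tool, coverage gain, correction count) rows.

    Rows are sorted by largest gain first, then by fewest corrections.
    """
    rows = [(t.tool, round(t.coverage_gain(), 1), len(t.corrections)) for t in trials]
    return sorted(rows, key=lambda row: (-row[1], row[2]))
```

For example, with placeholder values, `summarize([TrialRecord("tool-a", 400, 550, 1000, ["fixed import"])])` yields one row pairing the coverage gain with the number of corrections. Keeping both figures side by side matters because a large coverage gain that required many manual fixes may not represent a real productivity improvement.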
None of the LLMs could perform the task successfully without human supervision and intervention, but many were able to accelerate the unit test coding process to some degree.
It hasn't even been two years since OpenAI announced ChatGPT, the first mainstream large language model built on a generative pre-trained transformer to be made available to the public in a way that is very easy to use.
This release triggered lots of excitement and activity from Wall
This topic matters because it signals where AI product delivery, engineering execution, and technical strategy are moving next.
Implications for Product and Engineering Teams
For TensorBlue readers, the useful question is not just what happened, but how this changes product architecture, engineering priorities, AI delivery, observability, team workflows, or executive decision-making.
- Review whether this changes your AI roadmap, platform architecture, or engineering operating model.
- Identify the specific workflow, reliability, governance, or developer-productivity lesson that applies to your organization.
- Convert the lesson into a small production experiment with measurable quality, latency, cost, adoption, or risk metrics.
- Document source assumptions clearly so teams do not overgeneralize from incomplete public information.
TensorBlue Takeaway
The practical opportunity is to turn this signal into a concrete implementation decision: better AI systems, stronger product instrumentation, more reliable automation, and clearer technical governance. Teams that connect public technology shifts to their own delivery systems will move faster without adding unnecessary complexity.
TensorBlue AI Desk
AI systems, software engineering, and product strategy