My curiosity got the best of me: can AI really hold its own as a coding partner? Can it perform at the level of a senior developer—making architectural decisions, interpreting requirements, and writing production-grade code—or is it only suited to junior-level work (if that)? I wanted answers. I had no idea how deep the rabbit hole would go.
I chose Cursor.com, the new darling of AI-powered IDEs. At the time, Cursor’s operating model was unique: mix and match large language models to inspect code and documentation, then either suggest changes (“Ask” mode) or execute them (“Agent” mode) across the entire codebase. The potential was enormous, especially for full-stack projects, where developers constantly switch between languages and frameworks. So I decided to throw it a real challenge: a brand-new full-stack web and mobile app project.
My approach was simple: let AI drive. I would act as the product owner and technical interviewer, not the hands-on developer. After decades in software, I found this harder than it sounds. My instinct was to intervene, but I would try to resist, guiding with prompts and requirements while letting the AI make the technical calls.
OMG.
The plan was to see whether AI could handle everything from requirements gathering to code generation across multiple languages. The project would include:
- A Slim PHP backend API
- A MySQL data store
- A React frontend
- A Swift-based iOS app
- A high-performance AWS Lambda endpoint for generating pre-signed S3 URLs
That Lambda function was simple enough to make a great testbed. The AI would implement it in Rust, Go, and Node.js, letting me compare both the AI’s competence in each language and the runtime performance of each implementation. We would also integrate services from AWS and Cloudflare, testing whether the models could interpret live documentation or whether they’d fall back on stale pre-training knowledge.
After nearly 1,500 hours collaborating with various AI agents in Cursor (and elsewhere), I can say this: the most surprising discoveries weren’t about the AI.
They were about me.
Stay tuned — the next post digs into how the experiment unfolded, what worked, what failed spectacularly, and how I adjusted (or didn’t).