When vibe-coding went wrong

Recently, I've been coding more in my spare time. It's becoming dangerously close to what you might describe as a hobby, and it's one of those things where the more stupid side projects I pick up, the more ideas for stupid side projects I have. GitHub Copilot has been both a blessing and a curse for this; it's enabled me to mess around with things absent-mindedly, chucking things together with barely any forethought. Without it, I'm confident I'd barely do any coding in my spare time.

Do you just not enjoy coding?

I love coding, but it's very "big brain", and after a day of braining I want to explore. Burnout is real, so if I wanted to keep doing this I needed to set myself some ground rules:

1. Am I enjoying myself?
2. Am I making progress?

If the answer to either of those ever became "no", play-time's over. Laptop shut. We go again another time.

I'd been wanting to build on my FPL (Fantasy Premier League) tool for ages (the first iteration of which I wrote about in my maiden blog post here). To give a brief bit of background, I use Google OR-Tools to pick a team that maximizes the points the players are predicted to score. I love this side project because it combines my love of maths and football, and I enjoy doing the odd tech talk on the subject and meeting fellow FPL nerds before/after. The problem is that the FPL API's expected points (XP) model performed poorly for me last season, stupid robots! I thought I'd give making my own models a go.
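To make the idea concrete, here's a toy sketch of that kind of selection problem. It is not the actual tool: it uses plain brute force from the standard library rather than OR-Tools, and the players, prices, predicted points, mini-formation and budget are all made up for illustration.

```python
from itertools import combinations

# Hypothetical data: (name, position, price in £m, predicted points)
PLAYERS = [
    ("Keeper A", "GK", 5.0, 4.1),
    ("Keeper B", "GK", 4.5, 3.8),
    ("Defender A", "DEF", 6.0, 5.2),
    ("Defender B", "DEF", 4.5, 3.9),
    ("Defender C", "DEF", 5.0, 4.4),
    ("Midfielder A", "MID", 12.5, 8.0),
    ("Midfielder B", "MID", 7.0, 5.5),
    ("Midfielder C", "MID", 5.5, 4.0),
    ("Forward A", "FWD", 11.0, 7.2),
    ("Forward B", "FWD", 6.5, 4.8),
]

# Assumed toy rules: a six-player team and a small budget
QUOTA = {"GK": 1, "DEF": 2, "MID": 2, "FWD": 1}
BUDGET = 36.0


def pick_team(players, quota, budget):
    """Try every possible team and keep the one with the most predicted points."""
    team_size = sum(quota.values())
    best_team, best_points = None, float("-inf")
    for team in combinations(players, team_size):
        # Count how many players the candidate team has in each position
        counts = {}
        for _, position, _, _ in team:
            counts[position] = counts.get(position, 0) + 1
        if counts != quota:
            continue  # wrong formation
        if sum(price for _, _, price, _ in team) > budget:
            continue  # too expensive
        points = sum(xp for _, _, _, xp in team)
        if points > best_points:
            best_team, best_points = team, points
    return best_team, best_points


team, points = pick_team(PLAYERS, QUOTA, BUDGET)
for name, position, price, xp in team:
    print(f"{position:<4}{name:<14}£{price}m  {xp} pts")
print(f"Predicted total: {points:.1f}")
```

Brute force only works here because the player pool is tiny. With a full player list the number of combinations explodes, which is exactly why a proper solver is the better fit.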

Now, I'm not a data scientist by any stretch. In my tech talks I joke that I still think Python is a species of snake, which gets one or two polite chuckles at best. There was absolutely no way I'd be able to learn how to develop my own machine learning models in my spare time, so I thought I'd try a bit of vibe-coding to see what I could come up with.

Initial progress was amazing. I very quickly "vibed" (hate myself) with Claude models; maybe the excessive emojis gave me the serotonin hit that kept me going. Their hyper-positivity made me think I'd stumbled upon an absolute gold mine. In a single evening, I developed a high-performing suite of ML models that could relatively accurately calculate a player's XP for a given gameweek. I used almost my entire Copilot premium request quota for the month in a single weekend. Ground rules 1 & 2 very much being respected so far.

From the title of the post, you know that 💩's gonna hit the 🪭, and that's exactly what happened. Fast-forward to the present, and I've not touched this side project in months, and can't see myself picking it back up - though I'm tempted to start from scratch ("don't do it, Owen!"). Now, let it be known that I'm definitely pro-AI, so keep that in mind as I rip it to shreds for the remainder of the post. The aim of this post is to be a cautionary tale, and hopefully articulate some lessons learned.

If I were to describe vibe-coding to someone who's never tried it before, I'd say it was like pair-coding with a super-eager junior developer with an astronomical velocity. The only difference would be that its skillset far surpassed mine (especially in Python), so it was like a senior developer's know-how with perhaps a less-experienced developer's, well, experience. I would tell it "the sky is pink" and it would respond with "You're absolutely right!"; it rarely challenged me, which is dangerous when you haven't got a clue what you're doing.

Over-correcting quickly became a massive issue for me; I would tell it I liked README files so I could use that documentation later. Before long it felt like I had more *.md files than *.py files in the repo, not exactly the time-saver I was looking for. It coded extremely defensively, and the code was littered with feature switches and fallbacks as if I were supporting thousands of users who required backwards compatibility. As I fixed the issues, I found more and more hidden false positives that I'd need to take note of to come back to later. As the code became more and more branched it became impossible to debug, and if it was hard for me you can assume it also became hard for the model.

The worse things got, the more trigger-happy I got with the enter key. This quickly snowballed and my velocity plummeted. I'd heard Copilot performs better if you're mean to it; this sadly wasn't the case for me. I'd gone from building multiple ML models in a single evening to spending an entire weekend (well, when the two-year-old was in bed) trying to retrain the models based on what I thought would be a small change. Technical debt is real; Copilot had made it so the entire lifespan of a project, life and death, could be condensed into a much smaller time period.

At the very least this has been therapeutic for me, but how can I avoid stuff like this in future? I want to keep coding in my spare time, but ideally I'd have something to show for it. If I were to summarize my lessons learned, I'd probably say:

Start small: get an MVP working end-to-end as fast as you possibly can. Maybe don't start with 6 pristine ML models all at once. Copilot tends to recommend waterfall-style approaches, so be wary of the plans it generates and ask yourself what's going to bring value earliest.
Micro-manage: the less your hands are on the wheel, the harder it's going to get. It's very easy to just go "yeah yeah whatever" as you get less interested. In future, I'd sooner stop than start spamming 'Continue'.

I hope this hasn't dissuaded you. I've vibe-coded since with more success, and I'm sure I will do in future. The side-project in question is here. Take a look for yourself, warts and all - it's an absolute state.

Let me know your vibe-coding pitfalls - I'd love to hear them!