Hiring Horror Stories [1/n]: The Day 200 Out of 400 Candidates Scored Perfect

Rishit Chaturvedi

The Setup

A recruiter at a well-known Indian quick commerce company told me this story with the kind of tired laugh that only comes from living through something truly stupid.

The company was scaling fast. Hundreds of engineering roles open. The kind of hiring sprint where everyone is moving so quickly that nobody stops to ask whether the process actually works. They needed backend engineers, frontend engineers, mobile engineers, data engineers. All of them. Yesterday.

So they did what every high-growth company does. They set up an online coding assessment as the first filter. HackerRank. Standard stuff. DSA problems. Automated scoring. Clean pass/fail cutoff. The idea was simple: screen out the noise at the top of the funnel so the engineering team only spends interview time on candidates who can actually code.

Sensible. Logical. Efficient.

They sent the assessment to their first major batch. 400 candidates. All from IITs.


The Punchline

200 of them scored perfect.

Not "scored well." Not "passed comfortably." Perfect. Full marks. Every problem solved. Every test case passed. Every edge case handled.

Half the batch. Flawless.

Now, if you've ever written a HackerRank problem, or taken one honestly, you know that a perfect score across a multi-problem assessment is rare. A handful out of 400? Sure. Maybe 10 to 15 genuinely brilliant engineers who happened to have seen similar problems before.

But 200? Out of 400?

The recruiter paused when he told me this, letting the number sit in the air for a second.

"We knew something was off. But what do you do? You can't just throw out 200 perfect scores. You can't accuse IIT graduates of cheating without proof. And the hiring managers were thrilled. They thought we'd found a goldmine."


The Unraveling

They moved the top scorers into live interviews. Technical rounds with actual engineers on the team. Whiteboard problems. System design discussions. The real stuff.

It fell apart immediately.

Candidates who had "solved" complex algorithmic problems in the assessment couldn't explain basic data structures in conversation. People who had written flawless code on HackerRank stumbled through elementary debugging scenarios. One after another, the perfect scorers revealed themselves to be deeply, thoroughly average. Some were below average.

The pattern was obvious in hindsight. Shared answer keys circulating through WhatsApp groups and Telegram channels. Screenshots of solutions passed around hostel rooms. A small cottage industry of "assessment solvers" who would complete tests on behalf of candidates for a fee.

The entire first filter, the thing designed to save engineering time, had done the opposite. It had wasted more time by flooding the interview pipeline with candidates who never should have made it past screening.


The Ripple Effect

This story isn't unique. It's not even unusual.

A senior recruiter at a major enterprise software company told me a nearly identical version. They were running online coding assessments for lateral engineering hires. Standard setup. Proctored. Timed. Supposedly secure.

Then the evaluators started noticing something strange. Assessments designed to take 6 hours were being completed in 1 to 2 hours. Consistently. By candidates who then struggled in interviews.

They ran the numbers. Their estimate: 6 out of every 10 candidates were cheating. Using AI tools, secondary screens, or outright proxies.

Their response was radical. They scrapped online assessments entirely and moved to in-person, physical coding challenges. Candidates come to the office. Sit in a room. Build something real over 5 to 6 hours while being observed.

Completion rates went up (no more chasing candidates to finish tests). Candidate quality went up (no more proxy solvers). Time-to-fill improved too, because they stopped wasting interview slots on people who couldn't actually code.

But it cost them. The logistics of hosting physical assessments. The office space. The evaluator time. The candidate travel coordination. A massive operational lift, all because the "efficient" digital filter had been completely gamed.


The AI Chapter

And then it got worse for everyone.

All of that happened before the current generation of AI tools went mainstream. Before a candidate could paste a problem into ChatGPT and get a working solution in 30 seconds. Before Claude could debug code in real time. Before entire assessment sessions could be run with an AI copilot on a second screen whispering answers.

A recruiter at a fast-growing fintech company told me they tried an AI-powered video screening tool. Asynchronous video interviews where candidates answered role-specific questions on camera. Seemed elegant.

Candidates cheated on those too. Second screens with notes. Someone off-camera feeding answers. Scripts taped to the monitor just out of frame. The team ended up manually reviewing every single video submission to check for authenticity, which defeated the entire point of automation.

They abandoned the tool.

An engineering leader at a well-funded startup put it to me simply: "A traditional take-home assignment is hackable in 10 minutes with the right prompts. That entire category of assessment is dead."


The Uncomfortable Question

Here's where the story stops being funny.

Every company I spoke with has experienced some version of this. The specifics vary. The scale varies. But the core failure is the same: the filter designed to save time became the thing that wasted the most time.

And the response across the industry has been remarkably consistent. Not to fix the assessment. Not to build a better filter. But to add more interview rounds on top of the broken filter.

The assessment doesn't work, so we add a live coding round. The live coding round is too narrow, so we add a system design round. The system design round doesn't catch cultural fit, so we add a hiring manager round. The hiring manager round doesn't catch collaboration skills, so we add a team fit round.

The candidate now has 6 rounds. The company still isn't confident in the outcome.

A recruiter at a major enterprise search company described the endpoint of this escalation: candidates go through a recruiter screen, three coding rounds, and a culture round. Then a hiring committee of two senior directors reviews all the feedback anonymously and can still reject after 5 rounds. Even with all that, appeals are possible, and additional interviews get scheduled.

This is a company with the resources to build the most rigorous process on the planet. And the process still ends with a committee reading transcripts and making a judgment call.


The Actual Learning

The learning from the 200-perfect-scores story isn't "candidates cheat." Everyone knows that. The learning is about what breaks when your screening process is disconnected from your evaluation process.

Assessments test performance in isolation. Interviews test performance in context. When these two things aren't linked, you get exactly what happened: a filter that optimizes for the wrong signal, followed by interviews that start from zero because they can't trust anything that came before.

The companies that have figured this out share a few traits:

They make the assessment a conversation starter, not a gate. The best processes I encountered use assessment results as input to the interview, not as a pass/fail. The interviewer walks in knowing what the candidate built, how they approached it, where they struggled, and asks questions rooted in that context. The assessment becomes evidence to discuss, not a score to threshold.

They accept that any unsupervised evaluation will be gamed. The faster companies internalized this, the faster they stopped being surprised by it. The question isn't "how do we prevent cheating" but "how do we design evaluation that's useful even if the candidate had help." Can they explain their approach? Can they extend the solution? Can they debug it under pressure? These are signals that survive even when the initial code was AI-assisted.

They reduce the gap between assessment and interview. The worst processes have a week-long gap between a take-home and the first interview. By then, the candidate has forgotten half of what they did, the interviewer hasn't reviewed the submission, and the conversation starts cold. The best processes collapse that gap. Same-day assessment and interview. Or better yet, live assessment inside the interview where the candidate builds something with the evaluator watching.


The Epilogue

The quick commerce company eventually moved away from HackerRank-first screening. They restructured their loop. Changed what came first. Changed what they measured. The recruiter who told me the story said the single biggest improvement wasn't a tool change. It was getting the engineering team to agree that a perfect assessment score means nothing if the candidate can't hold a technical conversation.

"We spent three months interviewing people we already knew couldn't do the job. Three months. Because a number on a dashboard said they were qualified."

He laughed again when he said it. But it wasn't the funny kind.


This is [1/n] in a series about the strange, broken, and occasionally enlightening things that happen inside hiring processes. Based on 60+ conversations with recruiters, hiring managers, and engineering leaders across FAANG, hypergrowth startups, and Fortune 500 companies.