Overcoming vision deep learning challenges: why bigger models aren’t always better

February 19, 2026 · 6 minute read

We're launching a new Engineering Blog series, where Helin engineers share practical insights from building real-world AI systems. Modern vision AI may look like plug-and-play, but high-performing models still depend on one critical factor: data quality. In this article, Jason, Tech Lead at Helin, explores the impact of “dirty” data, the limits of automated labeling, and how targeted dataset clean-up can turn a good model into a great one.


Integration vs. innovation in modern vision deep learning

Most applications of computer vision and machine learning today are more an issue of integration than innovation. Tried and tested architectures and pre-trained models dominate the scene, and most applications follow a familiar flow: identify the objects to be detected or classified, find an existing pre-trained model, and integrate it into the system.

This is becoming even more true in 2026. With tech giants pumping out foundation models such as Meta’s Segment Anything Model (SAM), it’s only a matter of time until training your own vision model becomes a relic of the past… right?

Well, part of me is waiting for the day that happens.

Each time I read news of a new method or model being released, I rush to set up a quick trial. Each time, I’m left wanting more, as the more difficult examples I have on hand continue to slip through the “powerful” detection capabilities of these models. It seems, at least for now, that application-specific models are here to stay, along with all the challenges they bring.

Why data quality still defines model performance

Anyone who has spent time in the trenches knows that one of the most important aspects of developing an accurate model is none other than the word much of the modern world revolves around: data.

There is certainly a mental shift when moving from academia to industry in that regard. If you have ever wondered why the same datasets show up again and again in published experiments, the reason is simple: creating a dataset worth experimenting on is tedious work. You must be confident that the data you have created is of high enough quality that it will not distort your results when evaluating how well an architecture learns.

The general consensus seems to be that the more data, the better.

Think of it in human terms. The more experience you have with a certain task, the better you become at performing it. If you are shown the difference between an orange and a mandarin enough times, you learn how to distinguish the two.

We often equate the machine learning process with natural human learning. This analogy works to an extent, but it breaks down if we overlook the fact that our own learning rests on a foundation of years of experience and interaction with countless objects, sensations, and teachings.

In a sense, the data we consume as humans is always “perfect” because we interact directly with reality.

If someone holds up two red balls and claims they are different colors, I can refute that. I know from the data my eyes provide that they are the same.

Neural networks, on the other hand, cannot refute information during training.

If you feed a model the same image twice and label one “red ball” and the other “green ball,” it will be no closer to discovering what is real and what is not.
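Here is a minimal sketch of that failure mode, using PyTorch. The linear layer and constant tensor are toy stand-ins for a real classifier and the duplicated image; the point is only that gradient descent has no mechanism for rejecting one of two contradictory labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)          # toy stand-in for a classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.ones(2, 4)             # the "same image", fed twice
y = torch.tensor([0, 1])         # labeled "red ball" once, "green ball" once

for _ in range(500):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# Both rows end up near 50/50: the network has no way to tell
# which of the two contradictory labels reflects reality.
print(torch.softmax(model(x), dim=1))
```

Cross-entropy is minimized by splitting probability evenly across the conflicting labels, so the contradiction is simply baked into the model.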

The hidden cost of dirty data

Now extend that example to thousands, or hundreds of thousands, of images. A few mislabeled examples surely would not matter too much, right? After all, there are many other correct examples.

Just let the model learn.

Well, learn it does.

You may very well end up with a good model.

But removing those few dirty data points can turn a good model into a great one.
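If you want to measure the effect yourself, a toy harness like the one below makes the experiment concrete. This is a sketch using scikit-learn on synthetic data; real detection data with structured, correlated noise tends to degrade far more than this benign random-flip setup suggests.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for noise in (0.00, 0.02, 0.06):
    rng = np.random.default_rng(0)
    flip = rng.random(len(y_tr)) < noise        # pick a small fraction...
    y_noisy = np.where(flip, 1 - y_tr, y_tr)    # ...and mislabel it
    clf = LogisticRegression(max_iter=1_000).fit(X_tr, y_noisy)
    print(f"label noise {noise:.0%}: test accuracy {clf.score(X_te, y_te):.3f}")
```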

The labeling challenge at scale

So we want perfect data, and we want a lot of it.

How do we get there?

People often assume it is infeasible to label hundreds of thousands of frames by hand, especially when the task is not just classification but object detection within complex scenes. The truth is that most shortcuts and automated approaches to labeling do not lead to a better model.

If you use a model to automatically label your dataset, any model trained on that data will struggle to outperform the model that generated the labels. New labels must be introduced by humans to eliminate bias and add genuinely new information. There is a very good reason entire industries exist solely to produce labels for machine learning.
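In pseudo-labeling terms, the trap looks something like this. The sketch below is illustrative only: `teacher.predict` and the scored-box objects are hypothetical stand-ins for whatever detector produces the automatic labels.

```python
def auto_label(teacher, unlabeled_images, threshold=0.8):
    """Keep only detections the teacher model is confident about.

    `teacher.predict` and the scored-box objects are hypothetical;
    substitute whatever detector produces your automatic labels.
    """
    pseudo_labeled = []
    for image in unlabeled_images:
        boxes = [b for b in teacher.predict(image) if b.score >= threshold]
        pseudo_labeled.append((image, boxes))
    return pseudo_labeled

# Everything the teacher misses is now implicitly labeled "background"
# in the student's training data, so the student learns the teacher's
# blind spots as if they were ground truth.
```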

That said, using a model to enhance already human-labeled data can be extremely effective. A model trained on high-quality human annotations can detect objects that humans occasionally miss, which in turn can further improve overall dataset quality.
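One practical shape this takes is a label audit: run the trained model over the human-labeled set and flag confident detections that no annotated box overlaps. The sketch below assumes a hypothetical `model.predict` returning scored boxes in (x1, y1, x2, y2) format; every flagged frame still goes to a human for the final call.

```python
def iou(a, b):
    """Intersection-over-union for (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def flag_missed_labels(model, dataset, min_score=0.9, iou_thresh=0.5):
    """Yield confident detections that no human-drawn box overlaps."""
    for image, human_boxes in dataset:
        for pred in model.predict(image):      # hypothetical detector API
            if pred.score < min_score:
                continue
            if all(iou(pred.box, hb) < iou_thresh for hb in human_boxes):
                yield image, pred              # candidate missing label
```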

Using models to improve dataset quality

Our friends at 3LC recognized this and understood how significant the problem of dirty data is in real-world machine learning applications. One of the tools they provide lets us iterate on datasets that have grown too large to manage manually: trained models flag potential flaws, and humans review the flags to improve overall data quality.

We wanted to find out exactly how dirty our data was and how much cleaning it up would affect our model performance. After running several experiments, we were somewhat surprised to discover that nearly 6% of our frame data contained missing labels or incorrect bounding boxes. That realization changed our perspective on data and reinforced how important it is to ensure labels are sourced and validated correctly.

After some tool-assisted but still tedious spring cleaning, early results show that a new model trained on a dataset an order of magnitude smaller than our previous ones performs just as well as, if not slightly better than, our previous best model.

How do we know it is better? That is a conversation for another day. For now, this has opened a new path in our pursuit of continuous improvement, so watch this space.

In the meantime, if you are walking a similar path, think twice about your data and how good it really is.

Until then, keep learning and keep building cool things!

Jason
