AI on the Go: Damage Detection on the Edge


Written by

Agnieszka Michalik, Erdem Bulgurr, Dmytro Dehtyarov, Marek Krysiuk

Published on

May 15, 2025


How we moved sophisticated AI models from cloud servers to mobile devices.

Every unnoticed scratch, dent, or crack isn't just cosmetic; it represents potentially thousands of euros in repair costs, liability risk, and crippling, unexpected vehicle downtime that throws schedules into chaos. In the razor-thin margins of road logistics, ignoring details can be costly. Based on customer feedback, annual repair costs for a single trailer range from €1,600 to €4,500.

For the past two years, Checkturio has been leveraging machine learning and AI to help automate processes around truck management. While we use several ML models to support fleet managers and drivers, our most complex is the visual damage recognition model, running in the cloud on powerful GPUs. Drivers snap photos, the app uploads them, and our models analyze the images for issues. While we were satisfied with the models' performance, this cloud-centric approach was not without problems in real-world conditions.

The Constraints of a Remote Brain

The limitations became increasingly apparent. Drivers faced frustrating latency, as the round trip – uploading a photo, waiting for cloud processing, downloading the result – often stretched beyond our responsiveness goals and hurt the user experience. Latency was also a particular pain point in remote depots or parking lots with poor network coverage – precisely where many of these activities take place.

Furthermore, even compressed images consume mobile data – a particular issue when drivers are required to use their own phones to submit reports.

Perhaps the most significant hurdle was the offline obstacle. Inspections are non-negotiable; they must happen regardless of connectivity. While our app had an offline fallback, we were required to optimize the UX for both scenarios, and despite our efforts, the offline experience was subpar compared to the AI-assisted one.

Finally, as we grew, the costs associated with processing hundreds of thousands of images – computation, data transfer – began to add up. These combined limitations made it clear: for truly real-time, reliable, efficient, and cost-effective damage reporting, we needed to shift the intelligence closer to the user, onto the mobile device itself.

The Pull Towards On-Device Intelligence

Moving the damage detection model directly onto the driver's mobile device delivered several benefits. The most immediate advantage was instant gratification: inference happens locally, delivering near-real-time feedback directly within the app interface and hitting our UX responsiveness targets. This also unlocks true offline capability, allowing AI assistance to work reliably anywhere, anytime, completely untethered from network connectivity.

Figure 1: Data flow overview

Beyond the user experience, running AI on the edge leads to reduced cloud costs by minimizing data transfer and remote computation. Crucially, on-device AI opens the door to entirely new workflow possibilities, enabling more interactive and streamlined features within the mobile app – like guiding the user on how to better capture data – that were simply impractical with cloud-based latency.

Challenges on the Road to Edge

While the potential was clear, the path to the edge was not straightforward. We faced a few technical challenges. While YOLOv8 offers lightweight 'nano' and 'small' variants theoretically suited for mobile, reliably identifying the often subtle visual cues critical for damage assessment – faint scratches, incipient corrosion, minor structural alterations – frequently benefits from the richer feature extraction capabilities of the larger, more computationally intensive versions of the model. Our central challenge therefore became condensing the model for these nuanced detections into a package that could run within the constraints of mobile devices.

Another problem was the diverse hardware landscape of mobile devices. Our solution needed to perform reliably across a wide spectrum of devices, each with different processors, memory capacities, and specialized chips. We also had to keep battery drain in check.

Inevitably, we encountered the classic accuracy vs. performance trade-off. Making models smaller and faster often requires sacrificing some accuracy. Finding the sweet spot – where the model remained reliably accurate (we closely monitored metrics like mAP50-95) while staying performant enough for the edge – was critical. Finally, React Native added another layer of complexity. Our Checkturio app uses this cross-platform framework, meaning we had to efficiently integrate and manage native AI code execution within its architecture, mindful that processing complex model outputs directly in JavaScript can introduce performance bottlenecks.

Our Optimization and Integration Strategy

Tackling these hurdles required a systematic, multi-stage approach. Our first move was putting the model on a diet through quantization. Specifically, we used Post-Training Quantization to reduce the numerical precision of the model's parameters from 32-bit floating-point numbers (FP32) down to 8-bit integers (INT8). This significantly shrank the model's footprint and enabled it to leverage the faster, more power-efficient integer math capabilities of mobile chips. While this involved a slight, acceptable drop in accuracy, the performance gains were substantial.

Figure 2: Model Quantization
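
Conceptually, post-training INT8 quantization maps each floating-point tensor onto 8-bit integers through a per-tensor scale and zero-point. The TypeScript sketch below is purely illustrative – the actual conversion happens offline in the standard TFLite tooling, not in app code – but it shows the arithmetic behind that mapping and the small rounding error it introduces.

```typescript
// Illustrative only: affine (scale/zero-point) quantization, the scheme that
// post-training INT8 quantization applies per tensor. The real conversion is
// performed offline by the TFLite converter, not inside the app.

/** Compute scale and zero-point so that [min, max] maps onto [-128, 127]. */
function quantizationParams(min: number, max: number) {
  const qmin = -128;
  const qmax = 127;
  const scale = (max - min) / (qmax - qmin);
  const zeroPoint = Math.round(qmin - min / scale);
  return { scale, zeroPoint };
}

/** FP32 -> INT8: quantize a single value, clamped to the INT8 range. */
function quantize(x: number, scale: number, zeroPoint: number): number {
  const q = Math.round(x / scale) + zeroPoint;
  return Math.max(-128, Math.min(127, q));
}

/** INT8 -> FP32: recover an approximation of the original value. */
function dequantize(q: number, scale: number, zeroPoint: number): number {
  return (q - zeroPoint) * scale;
}

// Example: a weight of 0.42 in a tensor whose values span roughly [-1, 1].
const { scale, zeroPoint } = quantizationParams(-1.0, 1.0);
const q = quantize(0.42, scale, zeroPoint);      // small integer, e.g. 54
const approx = dequantize(q, scale, zeroPoint);  // ~0.42, with a tiny rounding error
console.log({ q, approx });
```

Storing each parameter in one byte instead of four is also where most of the footprint reduction comes from.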


Next, we needed the right toolkit for mobile deployment. We converted the quantized model into TensorFlow Lite (TFLite) format, Google's specialized framework designed for efficiency on mobile and embedded devices. TFLite automatically applies several key optimizations during this conversion, such as fusing multiple mathematical operations into single steps (Operator Fusion), simplifying the model's computational graph, and folding normalization layers into adjacent ones, all contributing to a leaner, faster model. We used tools like LiteRT to validate TFLite's performance benefits early on.

Integrating this TFLite model into our React Native application demanded careful consideration. We selected the react-native-fast-tflite library due to its direct low-level access to the TFLite C++ API and its support for hardware acceleration. The implementation involved building a pipeline within React Native: handling image preprocessing to format the input correctly for the model, executing the inference using the TFLite runtime via the wrapper, and managing the postprocessing step, which included decoding the model's complex output tensor and applying algorithms like Non-Maximum Suppression (NMS) to refine the detected damage areas.
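
A simplified sketch of that pipeline is shown below. It assumes react-native-fast-tflite's loadTensorflowModel/run API; the decoding of the raw output tensor into candidate boxes is left as a model-specific callback, and the box coordinates, thresholds, and file names are illustrative rather than our production values.

```typescript
import { loadTensorflowModel } from 'react-native-fast-tflite';

// One candidate damage detection, produced by decoding the raw output tensor.
interface Detection {
  x1: number; y1: number; x2: number; y2: number; // box corners in pixels
  score: number;                                   // confidence
  classId: number;                                 // damage type index
}

/** Intersection-over-Union of two boxes. */
function iou(a: Detection, b: Detection): number {
  const ix = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const iy = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const inter = ix * iy;
  const areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  const areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  return inter / (areaA + areaB - inter);
}

/** Greedy Non-Maximum Suppression: keep the highest-scoring box, drop
 *  overlapping boxes above the IoU threshold, repeat. */
function nms(candidates: Detection[], iouThreshold = 0.45): Detection[] {
  const sorted = [...candidates].sort((a, b) => b.score - a.score);
  const kept: Detection[] = [];
  for (const det of sorted) {
    if (kept.every((k) => iou(k, det) < iouThreshold)) kept.push(det);
  }
  return kept;
}

// End-to-end sketch: preprocessed image in, refined detections out.
// `decodeOutput` stands in for the model-specific step that turns the raw
// output tensor into candidate boxes; the score threshold is illustrative.
async function detectDamage(
  preprocessedImage: Float32Array,
  decodeOutput: (rawOutputs: unknown) => Detection[],
): Promise<Detection[]> {
  const model = await loadTensorflowModel(require('./assets/damage-int8.tflite'));
  const rawOutputs = await model.run([preprocessedImage]);
  const candidates = decodeOutput(rawOutputs).filter((d) => d.score > 0.25);
  return nms(candidates);
}
```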

To truly unlock speed, we leveraged TFLite delegates to offload computation to specialized hardware accelerators present on the phones. We primarily focused on the CPU delegate (XNNPACK), which provides highly optimized, multi-threaded execution on the main processor, and the GPU delegates (OpenGL for Android, Metal for iOS), which utilize the graphics processor for massive parallel computation, ideal for image-based tasks. While we explored Android's NNAPI and iOS's CoreML, we ultimately excluded them from our core cross-platform strategy due to NNAPI's device compatibility challenges and CoreML's tendency to favor platform-specific workflows over our universal TFLite approach.
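
In practice, the delegate is chosen when the model is loaded. The sketch below picks a GPU delegate per platform and falls back to the CPU path if GPU initialization fails; the delegate identifiers ('metal', 'android-gpu') are assumptions about what the installed version of react-native-fast-tflite accepts and should be verified against its documentation.

```typescript
import { Platform } from 'react-native';
import { loadTensorflowModel } from 'react-native-fast-tflite';

// Sketch: prefer the GPU delegate (Metal on iOS, OpenGL-backed on Android),
// fall back to the default CPU path (XNNPACK) if the GPU delegate cannot be
// initialized. Delegate names are assumptions; check the library's docs.
async function loadDamageModelWithDelegate() {
  const gpuDelegate = Platform.OS === 'ios' ? 'metal' : 'android-gpu';
  try {
    return await loadTensorflowModel(require('./assets/damage-int8.tflite'), gpuDelegate);
  } catch (error) {
    console.warn('GPU delegate unavailable, falling back to CPU:', error);
    return loadTensorflowModel(require('./assets/damage-int8.tflite'));
  }
}
```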

Measuring Success Across the Mobile Maze

Rigorous benchmarking was essential to validate our approach. Using AWS Device Farm and automated Appium tests, we evaluated the INT8 and FP32 models across our target range of top Android and iOS devices.

Quantization proved its worth, with the INT8 model consistently outperforming its FP32 counterpart on the CPU, sometimes by nearly 2x. The GPU delegate was undeniably the speed leader, accelerating inference compared to the CPU – often delivering speedups of 1.5x to over 3x. Interestingly, on the GPU itself the performance gap between INT8 and FP32 often shrank, as modern GPUs handle both data types efficiently.

We observed that multi-threading on the CPU offered benefits, but these gains typically plateaued once the number of threads exceeded the available high-performance cores. Critically, we also measured the "React Native tax" – the inherent overhead of running native code within the cross-platform framework. Compared to pure native benchmarks, our React Native implementation showed slowdown factors ranging roughly from 1.3x to over 4x, largely influenced by the processing of model outputs within the JavaScript layer.


Figure 3: Production deployment architecture
Figure 4: Mobile benchmark architecture

Navigating the Rollout: A Strategic and Hybrid Approach

Deploying edge AI demands a careful strategy. Our phased rollout prioritizes users with devices demonstrating proven stability and performance gains in our tests. To ensure reliability, our cloud-based inference acts as a backup, seamlessly taking over if edge processing encounters issues in specific conditions.

We continuously monitor real-world performance, closely comparing edge model efficiency and accuracy against cloud counterparts to validate optimizations. This iterative process is accelerated by our ability to update edge models independently from app store releases. This decoupling enables faster model improvements, targeted deployments, and keeps the core application lean.
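
A minimal sketch of what this decoupling can look like, assuming the library can load a model from a URL source (an assumption to verify against its documentation) and using a hypothetical endpoint that serves the currently released model file:

```typescript
import { loadTensorflowModel } from 'react-native-fast-tflite';

// Hypothetical model registry endpoint; not a real Checkturio URL.
const LATEST_MODEL_URL = 'https://models.example.com/damage-detection/latest.tflite';

// Sketch: try the latest released model over the network, independent of the
// app store release cycle, and fall back to the model bundled with the app.
async function loadCurrentDamageModel() {
  try {
    return await loadTensorflowModel({ url: LATEST_MODEL_URL });
  } catch (error) {
    console.warn('Remote model unavailable, using bundled model:', error);
    return loadTensorflowModel(require('./assets/damage-int8.tflite'));
  }
}
```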

What's Next?

A key area for further optimization within React Native involves addressing the bottlenecks currently present in pre- and post-processing. Our plan is to implement these computationally demanding tasks, particularly the processing of the model's output data, using dedicated native plugins.

Edge AI: Ready for Prime Time

Our journey bringing damage detection AI to the edge yielded valuable insights. We were able to confirm that even mobile devices several years old can effectively run optimized models, exceeding initial hardware expectations. Quantization delivered the needed performance boost without significantly compromising damage detection quality.

While we are still innovating rapidly and updating models frequently, and a cloud-heavy approach remains our primary strategy for now, we will gradually transition more users to application variants where AI inference runs locally.

Need a Smarter Way to Manage Your Fleet?

Discover how automation and digitization can optimize your operations, maximize vehicle uptime, reduce costs, and boost revenue. Contact us today—we’re here to help!