Every pixel classified. 10 terrain classes. Powered by the DINOv2 vision transformer.
Compare segmentation outputs across training rounds. Drag the slider to see the difference.
Prediction images will appear here once training is complete.
Model performance across training rounds, showing iterative improvements.
Understanding where and why the model struggles — key to iterative improvement.
Pixel-level understanding of desert terrain using deep learning.
We use DINOv2 ViT-S/14 as a frozen feature extractor — a pretrained vision transformer from Meta that understands visual features without any domain-specific training. On top of it, we train a lightweight convolutional segmentation head that maps these features into 10 desert terrain classes.
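A minimal sketch of what such a head can look like (this is illustrative, not the project's actual code): DINOv2 ViT-S/14 emits 384-dimensional patch tokens on a 14-pixel grid, so the head reshapes tokens into a feature map, applies a small convolutional stack, and upsamples the logits back to image resolution. The layer sizes and the 952×532 crop (the nearest multiple-of-14 size to 960×540) are assumptions.

```python
# Illustrative sketch of a lightweight segmentation head on frozen
# DINOv2 ViT-S/14 patch features. Layer widths and crop size are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10   # terrain classes in the dataset
EMBED_DIM = 384    # ViT-S/14 token dimension
PATCH = 14         # ViT-S/14 patch size

class SegHead(nn.Module):
    """Maps frozen patch tokens to per-pixel class logits."""
    def __init__(self, embed_dim=EMBED_DIM, num_classes=NUM_CLASSES):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(embed_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, num_classes, kernel_size=1),
        )

    def forward(self, tokens, grid_hw):
        # tokens: (B, N, C) patch tokens; grid_hw: (H/14, W/14) patch grid
        B, N, C = tokens.shape
        h, w = grid_hw
        x = tokens.transpose(1, 2).reshape(B, C, h, w)
        logits = self.head(x)
        # Upsample coarse patch-level logits back to full image resolution.
        return F.interpolate(logits, scale_factor=PATCH,
                             mode="bilinear", align_corners=False)

# Shape check with random stand-in features for a 952x532 crop:
h, w = 532 // PATCH, 952 // PATCH          # 38 x 68 patch grid
tokens = torch.randn(1, h * w, EMBED_DIM)  # stand-in for DINOv2 output
out = SegHead()(tokens, (h, w))
print(tuple(out.shape))  # (1, 10, 532, 952)
```

Because the backbone stays frozen, only the head's few hundred thousand parameters are trained, which keeps each training round cheap.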
The Off-road Segmentation Dataset contains 960×540 desert terrain images with pixel-level annotations for 10 classes — from sky and trees to rocks and ground clutter.
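Pixel-level annotations like these are commonly stored as color-coded masks that must be converted to integer class indices before training. The sketch below shows one way to do that; the RGB palette here is an assumed example, not the dataset's actual color coding.

```python
# Hypothetical sketch: converting a color-coded annotation mask to an
# (H, W) integer class-index map. The palette is illustrative only.
import numpy as np

PALETTE = {  # RGB -> class index (assumed mapping, not the dataset's)
    (135, 206, 235): 0,  # sky
    (34, 139, 34): 1,    # trees
    (128, 128, 128): 2,  # rocks
    (160, 120, 90): 3,   # ground clutter
    # ...remaining classes would follow the same pattern
}

def mask_to_indices(mask_rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB annotation into an (H, W) index map."""
    out = np.full(mask_rgb.shape[:2], 255, dtype=np.uint8)  # 255 = unlabeled
    for color, idx in PALETTE.items():
        out[np.all(mask_rgb == np.array(color), axis=-1)] = idx
    return out

# Tiny synthetic check: a 2x2 mask containing sky and rocks.
demo = np.array([[[135, 206, 235], [128, 128, 128]],
                 [[128, 128, 128], [135, 206, 235]]], dtype=np.uint8)
indices = mask_to_indices(demo)
print(indices)  # [[0 2] [2 0]]
```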