We want our robots to sense the world at varying levels of abstraction. From raw sensor value, to terrain type, to habitat type. ROST[1] is a technique for semantic modeling of high bandwidth streaming sensor data, such as audio, video. ROST works in realtime, and is suitable for use onboard a robot. Given a sensor data stream, ROST computes a stream of low dimensional descriptors that are semantically relevant, and can be used for high level mission planning tasks.

This video shows how ROST (Realtime Online Spatiotemporal Topics) can be used to automatically learn about different visual objects in a scene. On the right part of the video we see visual features being extracted and colored according to their topic label. Over time we see these labels slowly converging to different objects in the scene. On the left we see a summary that is representative of what has been observed thus far. This summary is calculated by identifying images which have most diversity in their topic labels.

You can find more details about ROST here.


  1. [1] Y. Girdhar, P. Giguere, and G. Dudek, “Autonomous adaptive exploration using realtime online spatiotemporal topic modeling,” The International Journal of Robotics Research, vol. 33, no. 4, pp. 645–657, Nov. 2013. ↩︎