Spatial World Model for Object Tracking
- Maintainer status: maintained
- Maintainer: Russell Toris <rctoris AT wpi DOT edu>
- Author: Russell Toris <rctoris AT wpi DOT edu>
- License: BSD
- Bug / feature tracker: https://github.com/GT-RAIL/interactive_world/issues
- Source: git https://github.com/GT-RAIL/interactive_world.git (branch: master)
The spatial_world_model contains libraries, database configuration scripts, and ROS wrapper nodes to communicate with the Spatial World Model. The Spatial World Model is a persistent, spatial representation of the world and a robot's working memory. This includes tracking and storing relationships between physical objects, maps, robots, and boundaries to name a few. A PostgreSQL database is used in the back-end to maintain this information.
How is the Spatial World Model Different from Other World Model Approaches?
The Spatial World Model provides a general representation, persistent storage, and querying of entities (objects) and actions (affordances) organized in 3D semantic maps. It is intended to be used both by autonomous mechanisms (for object recognition, map building, affordance learning, etc.) and through direct annotation by human users. The Spatial World Model differs from existing approaches by considering only the spatial and physical properties of objects, forgoing broader conceptual and ontological knowledge. As such, it aims to keep a modular separation between the spatial representation of the world and the specific inference mechanisms that could be used for AI and decision making. What the World Wide Web 1.0 and HTML did for 2D documents, the Spatial World Model aims to do for 3D objects in the physical world.
Example -- Map Annotation
As a basic example for the types of end-user interfaces that can be created with the Spatial World Model, we look at the Map annotation interface. Information on this interface can be found on the worldtoolsjs GitHub page. Below is a video demonstrating its capabilities:
Installation of the Spatial World Model requires two steps:
- Setting up the PostgreSQL database (either locally or on a remote server)
- Setting up the ROS nodes to communicate with the Spatial World Model
Installing the Spatial World Model Database
The following steps are written for Ubuntu 12.04 but apply to most Linux systems.
To begin, we must install PostgreSQL and the Python libraries that will talk to it. To do so, run the following command:
sudo apt-get install git postgresql python-psycopg2
Next, we will need to create the actual database. To do so, execute the following:
sudo -u postgres createdb world_model
It is never a good idea to use the default (root) user as the main user for the database. Therefore, we will create a new user that will be used solely with the new world model database. To do so, run the following:
sudo -u postgres createuser -D -A -P <username>
You will then be prompted for a password.
Next, we will have to grant this new user permission to our database:
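The exact command used for this step is not reproduced on this page. As a hedged sketch only, the standard PostgreSQL way to give a role full access to a database is a GRANT ALL PRIVILEGES ON DATABASE statement. The helper below (the function names and the placeholder username "rail" are assumptions made for illustration, not part of the package) builds that statement with both identifiers quoted:

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def grant_statement(database, username):
    """Build a GRANT statement giving `username` all privileges on `database`."""
    return "GRANT ALL PRIVILEGES ON DATABASE {} TO {};".format(
        quote_ident(database), quote_ident(username))


# "rail" is a placeholder for the <username> created in the previous step.
statement = grant_statement("world_model", "rail")
print(statement)
```

The printed statement would then be executed by a database superuser, for example from a psql session started as the postgres user. Note that quoted identifiers are case-sensitive in PostgreSQL.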
Finally, we are able to install the database schema. This is provided in a script found in the worldlib package. This script can be used to both install a new database and to update an existing one.
Allowing Remote Connections (Optional)
In many cases, you will be installing the Spatial World Model database on a central server so that multiple clients (and robots) can talk to it. As of now, the ROS nodes communicate with the database via SQL; however, due to security risks this will eventually be changed. To allow remote connections, we must modify the PostgreSQL configuration files on the system. Using your choice of editor, modify /etc/postgresql/9.1/main/pg_hba.conf with root privileges and add the following line:
host world_model <username> 0.0.0.0/0 md5
Next, modify /etc/postgresql/9.1/main/postgresql.conf with root privileges and add the following line:
listen_addresses = '*'
Finally, restart the server:
sudo service postgresql restart
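The address 0.0.0.0/0 in the pg_hba.conf line above matches every IPv4 address. If all robots and clients sit on a known subnet, a narrower range can be used instead. The sketch below is only an illustration: the helper name, the username "rail", and the subnet 192.168.1.0/24 are assumptions. It validates a CIDR range with Python's ipaddress module before formatting the record:

```python
import ipaddress


def pg_hba_host_line(database, username, cidr, method="md5"):
    """Format a pg_hba.conf 'host' record after validating the CIDR range."""
    # strict=True rejects ranges with host bits set, e.g. 192.168.1.5/24.
    network = ipaddress.ip_network(cidr, strict=True)
    return "host {} {} {} {}".format(database, username, network, method)


# Hypothetical lab subnet; replace with the range your robots actually use.
restricted = pg_hba_host_line("world_model", "rail", "192.168.1.0/24")
print(restricted)
```

A malformed range raises ValueError before anything is written to the configuration file.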
Installing the ROS Software
--- coming soon ---
Implementation Goals and Design Decisions
The Spatial World Model project is currently in its infancy and under active development. Parts of the API are considered to be highly unstable. The following sections describe both the long term implementation goals and design decisions associated with the project.
At its core, the Spatial World Model is designed to be a persistent, multi-robot model that keeps track of both the robot's working memory and the properties, affordances, and activities that can be associated with each object. To manage persistence, the world model is stored in a PostgreSQL database. At a high level, the Spatial World Model currently allows for two kinds of entities: a WorldObjectInstance and a WorldObjectDescription.
A WorldObjectInstance, defined in the WorldObjectInstance message, can be thought of as a robot's working memory. At a basic level, such an entity contains a relative pose in the world with associated tags and timestamps. These entities describe a particular, specific instance of an object in the world (e.g., the cup sitting on the desk in the conference room). Each instance is linked to a single WorldObjectDescription, which contains a set of spatial descriptors for the object (e.g., mesh, bounding box, point-cloud cluster, etc.). Below is a detailed explanation of the fields in the WorldObjectInstance. Note that some fields will be blank depending on the type of object or what you know about the world.
instance_id - a unique identifier for this instance. This will be an integer assigned by the world model upon creation. This ID can then be used as a frame ID when adding new instances.
name - a human-readable name for this instance.
creation - creation time of the instance. When using the correct API interfaces, this will be automatically assigned during creation.
update - last time this instance was updated. When using the correct API interfaces, this will be automatically assigned during an update.
expected_ttl - an estimate of how long this instance's information will be valid. For example, if the robot finds a coffee cup on the living room table, it is likely it will not be there the next day (depending on how dirty you keep your apartment!). On the other hand, if you find a refrigerator in the kitchen, it is likely that information will be valid for months, even years.
perceived_end - the actual time this instance was removed from the robot's working memory. This timestamp can be updated when the robot reexamines an area and notices the object is no longer there. This prevents the entity from being immediately deleted from the database so that you can look back in time on where things may have been in the past.
source - source information for the origin of this instance. Was it the robot? A remote human annotator?
origin - hostname or IP of the source origin. This should be where the information came from.
creator - the creator (e.g., username or node name).
pose - position information with a belief state. The frame_id field in the pose can be associated with any instance_id in the world model.
description_id - spatial descriptions of the object itself. This is a foreign key for a WorldObjectDescription.
properties - object properties and relationships. Things like on(45) can be used to talk about spatial relationships. This array is a placeholder for a future implementation that will make use of a graph database. Note that the key word in this project is Spatial. As a result, these relationships should only deal with spatial relationships. Things like "belong to" should be avoided.
tags - high level tags
It should also be noted that things stored as object instances need not be physical objects in the traditional sense. It is appropriate and sometimes necessary to include things like maps, rooms, and robots in the world model. To learn about how some of these things are stored, refer to the following section describing listeners.
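To make the field list above more concrete, here is a minimal plain-Python stand-in for an instance. It is not the actual ROS message: the class name, the Python types, and the is_stale rule (time-to-live counted from the last update) are assumptions for illustration, and the pose field is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class InstanceSketch:
    """Plain-Python stand-in for the fields listed above (types are guesses)."""
    instance_id: int
    name: str
    creation: datetime
    update: datetime
    expected_ttl: Optional[timedelta] = None   # None: no estimate available
    perceived_end: Optional[datetime] = None   # None: still believed present
    source: str = ""
    origin: str = ""
    creator: str = ""
    description_id: Optional[int] = None
    properties: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def is_stale(self, now):
        """True if the object was seen to be gone, or its TTL has elapsed."""
        if self.perceived_end is not None and self.perceived_end <= now:
            return True
        if self.expected_ttl is None:
            return False
        return now > self.update + self.expected_ttl


seen = datetime(2014, 1, 1, 9, 0)
cup = InstanceSketch(1, "coffee cup", seen, seen,
                     expected_ttl=timedelta(hours=12), tags=["cup"])
fridge = InstanceSketch(2, "refrigerator", seen, seen, tags=["appliance"])
next_day = datetime(2014, 1, 2, 9, 0)
print(cup.is_stale(next_day), fridge.is_stale(next_day))
```

The coffee cup's information is treated as stale a day later, while the refrigerator, which has no TTL estimate and no perceived_end, is still trusted.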
The second implemented entity is the WorldObjectDescription. This entity, defined in the WorldObjectDescription message, contains spatial descriptors of objects in the world. These are shared models that are common to all instances of such an object (e.g., a 3D mesh of the object itself). Each description contains a set of tags and an array of actual Descriptors. The generic nature of the descriptor model allows models to come from a variety of sources (point cloud segmentation, 3D model warehouses and databases) with few to no restrictions. The main idea is to associate an appropriate type and ref field with each descriptor to determine how the data should be treated. In a sense, the type field can be thought of as a non-standard MIME type (PNG, Collada, but also nav_msgs/OccupancyGrid as a type). A future goal of the project is to create a standard set of accepted type fields. Below is a detailed description of the fields associated with a WorldObjectDescription.
description_id - unique identifier for this world object description. This will be an integer assigned by the world model upon creation.
name - a human-readable name for this world object description (e.g., Starbucks Mug).
descriptors - list of all descriptors of this object. Each descriptor has the following fields:
    type - type of data (e.g., nav_msgs/OccupancyGrid, URDF, Collada, PNG, etc.)
    data - raw message data (e.g., XML (URDF), JSON, base64 encoded data, etc.).
    ref - JSON representation of source reference. Future goals aim to standardize this.
    tags - high level tags (e.g., kinematics, shape, etc.).
tags - high level tags
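The description fields can be sketched in the same way. Again, this is not the actual ROS message: the class names, types, and the of_type helper are assumptions for illustration. The example stores a base64-encoded payload in data and a JSON string in ref, as the field list suggests.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class DescriptorSketch:
    """One spatial descriptor: a typed payload plus where it came from."""
    type: str
    data: str
    ref: str = "{}"
    tags: List[str] = field(default_factory=list)


@dataclass
class DescriptionSketch:
    """A shared description that many instances may link to."""
    description_id: int
    name: str
    descriptors: List[DescriptorSketch] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def of_type(self, type_name):
        """Return the descriptors whose type field equals type_name."""
        return [d for d in self.descriptors if d.type == type_name]


# Only the PNG file signature, standing in for a real image.
png_bytes = b"\x89PNG\r\n\x1a\n"
mug = DescriptionSketch(
    description_id=1,
    name="Starbucks Mug",
    descriptors=[
        DescriptorSketch(
            type="PNG",
            data=base64.b64encode(png_bytes).decode("ascii"),
            ref=json.dumps({"source": "hand annotation"}),
            tags=["shape"],
        )
    ],
    tags=["mug"],
)
print([d.type for d in mug.of_type("PNG")])
```

A client that knows how to handle a given type can pick out just those descriptors and ignore the rest, which is the role the type field plays in the text above.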
The first design decision was to use a PostgreSQL database for storage. Given the highly relational components associated with the world model (e.g., WorldObjectInstance and WorldObjectDescription), it made sense to use a relational database over other types of databases. For efficiency in searching and storage, the database schema itself is broken into finer grains than the APIs allow for. It is intended that developers make use of these higher-level APIs when dealing with the Spatial World Model as opposed to making raw SQL queries.
Higher Level APIs
The current implementation includes several layers of APIs. As mentioned previously, it is not intended for a developer to use the world model by directly making SQL queries. At the lowest level, the worldlib Python API should be used. This level of the API is responsible for talking SQL to the world model database and is able to make basic insertion and search queries while maintaining the correct structure. This level of the API allows for non-ROS processes to make use of the world model (another future goal of the project). By using an SQL connection between this library and the database, remote connections can be made and a central database can be used (such as one hosted in the cloud). This, of course, requires your server to allow remote SQL connections which is not ideal. Therefore, future plans hope to create a server-side API to allow for remote queries (think REST as an example but this would require polling). With such an API in place, the interface between the robot or client and the database could be made with this new service.
The second level of the API is the actual ROS node itself. The world_model node makes use of the Python API to communicate with the database. This node then offers a series of action servers to allow ROS nodes to search and add to the world model. Conversion between database entities and ROS messages is made here. It is intended that within ROS, a listener framework is used as described below.
The Listener Idea
Within ROS, the intended use of the APIs into the world model was to create a series of what are being called listener nodes. Such nodes listen to a set of defined topics, make the appropriate inferences on the information, and update the world model accordingly. Below are three examples included in world_listeners.
map_listener - The map listener listens to /map and stores the occupancy grid in a description. An instance is created in the world model tagged with map and can be used as a reference frame for other entities. A check is also made to see if a descriptor already exists with the given occupancy grid. If this is the case, the instance is linked to this description instead of creating a new one.
robot_pose_listener - The robot_pose_listener listens to /robot_pose (from the robot_pose_publisher) and updates an instance in the world model for the robot. This effectively saves its pose. Upon startup of this node, a check is made to see if that robot's pose already exists. If so, a call is made to /initpose to re-localize the robot based on its last known location.
The above are just examples of the types of listeners that can be created. An additional example could be a segmented object listener. Such a node could listen for any segmented objects found by the robot and update the world model accordingly.
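The map_listener's check for an existing descriptor can be illustrated with a content hash. How the real node performs the comparison is not described here, so the following is only one possible approach: the function and class names are hypothetical, and an in-memory dictionary stands in for the database. It relies on occupancy grid cells being small integers (-1 for unknown, 0 to 100 otherwise).

```python
import hashlib


def grid_fingerprint(width, height, resolution, cells):
    """Hash the values that define an occupancy grid's content."""
    digest = hashlib.sha256()
    digest.update("{}x{}@{!r}:".format(width, height, resolution).encode("ascii"))
    # Map each cell (-1..100) onto one unsigned byte; -1 becomes 255.
    digest.update(bytes(c % 256 for c in cells))
    return digest.hexdigest()


class DescriptionIndexSketch:
    """In-memory stand-in for 'find a matching description or create one'."""

    def __init__(self):
        self._ids = {}        # fingerprint -> description_id
        self._next_id = 1

    def find_or_create(self, width, height, resolution, cells):
        """Return (description_id, created) for the given grid."""
        key = grid_fingerprint(width, height, resolution, cells)
        if key in self._ids:
            return self._ids[key], False
        description_id = self._next_id
        self._next_id += 1
        self._ids[key] = description_id
        return description_id, True


index = DescriptionIndexSketch()
first = index.find_or_create(2, 2, 0.05, [0, 100, -1, 0])
again = index.find_or_create(2, 2, 0.05, [0, 100, -1, 0])
other = index.find_or_create(2, 2, 0.05, [0, 0, -1, 0])
print(first, again, other)
```

Publishing the same map twice links the new instance to the existing description, while a changed grid produces a new one.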
To allow for multiple robots, a notion of namespacing must be kept. To support this feature early on, this information is currently held inside the tags of the instance. It is up to the developer to maintain this namespace. For example, the above listeners take an optional argument to define the namespace. If no namespace is given, it will default to the hostname of the machine the node is running on. In most cases, this is good enough since the hostname of the robot is usually a good namespace. Then, when searching for things like a particular robot, we can do a tag search for ["robot", "myRobotName"]. Future improvements should be made to make this clearer and enforce unique namespacing.
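The tag-based namespacing described above can be sketched as follows. The helper names and the dictionary layout are assumptions for illustration, not the worldlib API; the hostname default uses Python's socket.gethostname().

```python
import socket


def namespace_tags(kind, namespace=None):
    """Tags for an instance: its kind plus a namespace (hostname by default)."""
    return [kind, namespace or socket.gethostname()]


def tag_search(instances, wanted):
    """Return the instances whose tags include every tag in `wanted`."""
    wanted = set(wanted)
    return [inst for inst in instances if wanted <= set(inst["tags"])]


world = [
    {"name": "robot", "tags": namespace_tags("robot", "myRobotName")},
    {"name": "robot", "tags": namespace_tags("robot", "otherRobot")},
    {"name": "map", "tags": namespace_tags("map", "myRobotName")},
]
matches = tag_search(world, ["robot", "myRobotName"])
print(matches)
```

Because nothing enforces uniqueness, two machines with the same hostname would silently share a namespace, which is the weakness the paragraph above points out.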
Future Implementation Goals
Object Instance Properties Database
One improvement to the current system would be to separate the properties array into its own separate database. The idea behind properties is to define relationships such as on or in between entities in the world model. Current thoughts are to point to entries within a graph database. By doing so, powerful search queries such as "give me all the objects inside the bedroom" or "is the book on my bookshelf?" can be written in an efficient way.
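As a rough illustration of the kind of query a graph representation would enable, the sketch below parses property strings of the form relation(id), such as on(45), and walks the resulting edges. Everything here is hypothetical: the function names, the instance IDs, and the choice to treat on and in as transitive containment are assumptions, and a plain dictionary stands in for a graph database.

```python
import re
from collections import defaultdict

_PROPERTY = re.compile(r"^(\w+)\((\d+)\)$")


def parse_property(text):
    """Split a property such as 'on(45)' into ('on', 45)."""
    match = _PROPERTY.match(text.strip())
    if match is None:
        raise ValueError("not a relation(id) property: {!r}".format(text))
    return match.group(1), int(match.group(2))


def build_edges(properties_by_instance):
    """Map each target id to the (relation, source id) pairs pointing at it."""
    edges = defaultdict(set)
    for source_id, properties in properties_by_instance.items():
        for text in properties:
            relation, target_id = parse_property(text)
            edges[target_id].add((relation, source_id))
    return edges


def objects_within(edges, container_id, relations=("in", "on")):
    """All ids that are in/on the container, directly or transitively."""
    found, frontier = set(), [container_id]
    while frontier:
        current = frontier.pop()
        for relation, source_id in edges.get(current, ()):
            if relation in relations and source_id not in found:
                found.add(source_id)
                frontier.append(source_id)
    return found


# Hypothetical ids: 1 bedroom, 2 bookshelf, 3 book, 4 cup, 5 table, 6 kitchen.
world = {2: ["in(1)"], 3: ["on(2)"], 4: ["on(5)"], 5: ["in(6)"]}
edges = build_edges(world)
in_bedroom = objects_within(edges, 1)
book_on_shelf = ("on", 3) in edges[2]
print(sorted(in_bedroom), book_on_shelf)
```

The first query returns the bookshelf and the book for the bedroom; the second answers whether the book is on the bookshelf by checking for a single edge.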
Affordances and Activities
One large piece of the world model that is missing is the notion of affordances. The goal of the Spatial World Model is to not only keep track of particular instances of objects, but to also manage what types of actions can be taken on certain objects. For example, a door can be opened, a cup can be grasped, and a robot can grasp (assuming it has a gripper, of course). Furthermore, pre-conditions should also be stored here. For example, the cup must be on the table to be picked up (or any number of other conditions). This would rely on the implementation of the graph database described above. These types of attributes should be stored in a separate table in the database and linked to a particular WorldObjectDescription.
In addition to the affordances, a notion of activities must be stored as well. Such a structure would be used to figure out how to perform a given action on a given object. For example, if you wanted to use a pickup action on a coffee cup, the associated activity would be some action call to a grasping pipeline. Each activity can be thought of as a node with some kind of transition model incorporated to provide feedback and belief states. An updated diagram of the Spatial World Model would be the following:
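Since affordances are not yet implemented, the following is only a sketch of what such a record might look like. The class name, the fields, and the idea of expressing pre-conditions as property strings like on(12) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AffordanceSketch:
    """An action that can be taken on objects sharing one description."""
    description_id: int
    action: str
    preconditions: List[str] = field(default_factory=list)

    def can_apply(self, instance_properties):
        """True if every pre-condition appears in the instance's properties."""
        return all(p in instance_properties for p in self.preconditions)


def actions_for(affordances, description_id):
    """Names of the actions linked to the given description."""
    return [a.action for a in affordances if a.description_id == description_id]


# Hypothetical ids: description 7 is a cup, 8 is a door, instance 12 is a table.
affordances = [
    AffordanceSketch(7, "pickup", preconditions=["on(12)"]),
    AffordanceSketch(8, "open"),
]
pickup = affordances[0]
print(actions_for(affordances, 8), pickup.can_apply(["on(12)"]))
```

Linking the record to a description rather than an instance means every cup that shares the description inherits the pickup action, while the pre-condition is checked against each individual instance.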
World Object Instance Improvements
In addition to abstracting out the properties as defined above, several improvements are needed with respect to the instances. For one, belief states should be associated with most attributes. While the current pose does allow for this, beliefs about things such as timestamps are just as important.
A second improvement needed is the enforcement of namespaces. Checks should be made to make sure things are linked to a proper namespace entity (e.g., a robot should be linked to a map within its own namespace).
Thirdly, efforts should be made to standardize the tag set. While the listeners can help enforce this, care should be taken to make sure duplicate names for the same tag do not appear. Standardization helps with this.
World Object Description Improvements
As with the instances, the descriptions also need several improvements. One important feature is to standardize things such as the types, source JSON strings, and tags. A list containing all officially recognized types should be made and kept up to date.
A second major component is a cleanser process for the database. Currently, descriptions can be linked to multiple instances. This is the main idea behind the descriptions themselves. Additionally, these descriptions can potentially contain massive amounts of data (Collada models, for example). If there are no longer any instances linked to a given description, it should be removed not only from the database itself, but from the disk as well (since the large data portions are kept in PostgreSQL Large Objects). Care should be taken to ensure thread safety in the removal.
Perhaps the largest piece needed in the project is a more robust, efficient, and flexible server-side API for the world model. Currently, the worldlib Python API is used by the main ROS node and speaks SQL to the database. For many reasons, security being one, this is not ideal. Efforts should be made to create a server-side API that allows for multiple remote connections to interact with the world model. Not only would this still allow the robots to communicate with the world model, but clients could now directly connect to the world model instead of using rosbridge_server as a "proxy". While at first glance it may seem appropriate, this API should not be response-based such as a REST API. A more robust socket-level connection should be made to allow for bi-directional communication, which would let clients subscribe to changes in the world model without the need for polling. By standardizing a server-side interface, we can also create a more powerful query system. The protocol between clients and the server could include things like searching descriptions or descriptors without having to return the data associated with them. A diagram of the updated API levels is shown below.
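The cleanser idea can be sketched with an in-memory store. This is not how the database is organized: the class and method names are hypothetical, dictionaries stand in for the tables, and a single lock stands in for whatever transaction handling a real implementation would use.

```python
import threading


class DescriptionStoreSketch:
    """In-memory stand-in for the description and instance tables."""

    def __init__(self):
        self._lock = threading.Lock()
        self.descriptions = {}   # description_id -> payload
        self.instances = {}      # instance_id -> description_id

    def remove_instance(self, instance_id):
        """Drop an instance; its description is left for the sweep."""
        with self._lock:
            self.instances.pop(instance_id, None)

    def sweep(self):
        """Delete descriptions no instance links to; return the removed ids."""
        with self._lock:
            in_use = set(self.instances.values())
            orphaned = sorted(d for d in self.descriptions if d not in in_use)
            for description_id in orphaned:
                del self.descriptions[description_id]
            return orphaned


store = DescriptionStoreSketch()
store.descriptions = {1: "map grid", 2: "mug mesh"}
store.instances = {10: 1, 11: 2}
store.remove_instance(11)
removed = store.sweep()
print(removed, sorted(store.descriptions))
```

Holding the lock for the whole sweep keeps a concurrent remove_instance from changing the set of linked descriptions halfway through. In PostgreSQL itself, deleting a row does not delete a large object it refers to; the large object has to be removed separately, for example with lo_unlink.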
Discussions and Contributions
Discussions and contributions are welcome! To get involved, check out the GitHub Issue Tracker for current feature requests and discussions.
Please send bug reports to the GitHub Issue Tracker. Feel free to contact me at any point with questions and comments.