Recode: Goal Babbling

This notebook proposes a general introduction to goal babbling, and how it differs from motor babbling, in the context of robotics. Goal babbling is a way for robots to discover their body and environment on their own. While representations of those could be pre-programmed, there are many reasons not to do so: environments change, robotic bodies are becoming more complex, and flexible limbs, for instance, are difficult and expensive to simulate. By allowing robots to discover the world by themselves, we use the world itself—the best physics engine we know—to let robots conduct their own experiments, and observe and learn the consequences of their actions, much like infants do on their way to becoming adults.

This notebook requires no previous knowledge beyond some elementary trigonometry and a basic grasp of the Python language. The spirit behind this notebook is to show all the code of the algorithms in a simple manner, without relying on any library beyond numpy (and even then, only a very little of it). Only the plotting routines, using the bokeh library, have been abstracted away in the graphs.py file.

The algorithms and results presented here originally appeared in chapters 0 and 3 of my Ph.D. thesis, where they were implemented using the explorers library. Here, as explained above, we don't rely on that library, and the code is shown in its simplest form. It is entirely available under the Open Science License. A citable version of this notebook is available on figshare. You can contact me with questions or remarks at fabien.benureau@gmail.com.

A Bit of Context

The core idea behind goal babbling is to explore an unknown environment by deciding which goals to try to pursue in it, rather than deciding which actions to try. For robots, it means directing the exploration by choosing which effects to produce in the environment, rather than choosing directly which motor commands to execute. Of course, once a goal has been chosen, the robot must find a motor command to reach it. But the decision about what to explore is made in terms of which effect to try to produce, not in the motor space, and we will see that this makes an important difference.

The idea of goal babbling was proposed by Oudeyer and Kaplan (2007) (p. 8). Computational implementations were then proposed by Baranes and Oudeyer (2010), Rolf et al. (2011), and Jamone et al. (2011). Formal frameworks were described by Baranes and Oudeyer (2013) and Moulin-Frier and Oudeyer (2013); the work we present here can be understood as an implementation of SAGG-Random.

In [1]:
import random
import numpy as np
import graphs

SEED = 0
random.seed(SEED)                                                                              # reproducible results.

Robotic Arms

We consider four different robotic arms, with 2, 7, 20, and 100 joints respectively. All arms measure one meter in length, and all the segments between the joints of the same arm are of equal length. The 7-joint arm, therefore, has segments of 1/7th of a meter, as shown in the figure below. Each joint can move between -150° and +150°. The arm receives a motor command composed of 2, 7, 20, or 100 angle values, and returns an effect: the position of the tip of the arm, once each joint has been set to the required angle, as a Cartesian coordinate $x, y$.

In [2]:
class RoboticArm(object):
    """A simple robotic arm.

    Receives an array of angles as a motor command.
    Returns the position of the tip of the arm as a 2-tuple.
    """

    def __init__(self, dim=7, limit=150):
        """`dim` is the number of joints, which are able to move in (-limit, +limit)"""
        self.dim, self.limit = dim, limit
        self.posture = [(0.0, 0.0)]*dim                # holds the last executed posture (x, y positions of all joints).

    def execute(self, angles):
        """Return the position of the end effector. Accepts values in degrees."""
        u, v, sum_a, length = 0, 0, 0, 1.0/len(angles)
        self.posture = [(u, v)]
        for a in angles:
            sum_a += np.radians(a)
            u, v = u + length*np.sin(sum_a), v + length*np.cos(sum_a)           # at zero pose, the tip is at x=0,y=1.
            self.posture.append((u, v))
        return u, v
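
As a quick sanity check (the `test_arm` instance below is just a throwaway example), at the zero posture all the angles are 0°, and, as noted in the comment of `execute`, the tip should end up at $x=0, y=1$:

test_arm = RoboticArm(dim=7, limit=150)
print(test_arm.execute([0.0]*7))                               # approximately (0.0, 1.0), up to floating-point error.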

Let's create our four arms.

In [3]:
arm2   = RoboticArm(dim=2,   limit=150)   # you can create new arms here, or modify the parameters of an existing one.
arm7   = RoboticArm(dim=7,   limit=150)   # for instance, the `limit` parameter can have important consequences.
arm20  = RoboticArm(dim=20,  limit=150)
arm100 = RoboticArm(dim=100, limit=150)

This is what the 7-joint arm looks like in a few handpicked postures. The grey circle represents the approximate limits of the reachable space, i.e. the space the tip of the arm can reach. [1]

In [4]:
fig = graphs.postures(arm7, [[  0,   0,   0,   0,   0,   0,   0],
                             [ 80, -70,  60, -50,  40, -30,  20],
                             [150, -10, -10, -10, -10, -10, -10]], disk=True)

We want to explore the possibilities offered by the different arms, that is, to understand where the tip of the arm (the end-effector) can be placed.

And we want to do that from an agnostic standpoint: the arm is a black box, receiving inputs (the angles of the joints) and producing outputs (the position of the tip of the arm). We can't assume any knowledge about the arm, nor rely on the order of the inputs (which can be shuffled). In particular, zero can't be used as a special value. This constraint of ignorance allows the strategies developed on the arm to be reused (in theory) on any other black box.

We add a second constraint: time. We can only execute motor commands a limited number of times on the arm. Here, we limit ourselves to 5000 executions. This may seem like a lot, but it is little compared to the size of the motor space: if we wanted to sample, on the 7-joint arm, all the motor commands whose angle values are multiples of 10° (-150°, -140°, ..., 0°, ..., 150° for each joint), we would need $31^7 = 27,512,614,111$ executions, more than 5 million times the budget we have.
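
As a back-of-the-envelope check, these numbers can be recomputed in two lines (straightforward arithmetic, only restating the previous paragraph):

grid_size = 31**7                            # 31 angle values (multiples of 10° in [-150°, 150°]) for each of 7 joints.
print(grid_size, grid_size / 5000.0)         # 27512614111 commands, i.e. about 5.5 million times our budget of 5000.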

So, given these two constraints, what are possible strategies to explore the arm?

In [5]:
N = 5000                                                                                       # the execution budget.

Motor Babbling

The simplest strategy is to try random motor commands. We call this strategy random motor babbling (RMB). 'Babbling', here, means that we execute those commands because we don't know—and we want to know—what they will produce.

In [6]:
def motor_babbling(arm):
    """Return a random (and legal) motor command."""
    return [random.uniform(-arm.limit, arm.limit) for _ in range(arm.dim)]    
In [7]:
def explore_rmb(arm, n):
    """Explore the arm using random motor babbling (RMB) during n steps."""
    history = []

    for t in range(n):
        m_command = motor_babbling(arm)                        # choosing a motor command; does not depend on history.
        effect = arm.execute(m_command)                        # executing the motor command and observing its effect.
        history.append((m_command, effect))                                                        # updating history.

    return history

This is a simple strategy. By using it for 5000 motor commands, we obtain the following distribution of effects.

In [8]:
figures = []

for arm in [arm2, arm7, arm20, arm100]:
    history = explore_rmb(arm, N)                                  # running the RMB exploration strategy for N steps.

    figures.append(graphs.effects(history, show=False,
                                  title='motor babbling, {}-joint arm'.format(arm.dim)))

graphs.show([[figures[0], figures[1]],                                                           # displaying figures.
             [figures[2], figures[3]]])

Each blue dot is the position reached by the tip of the arm during the execution of a motor command.

The random motor babbling strategy does rather well on the 2-joint arm. But as the number of joints increases, the distribution of effects covers less and less of the approximate reachable area represented by the grey disk. Even after numerous samples, we don't have a good empirical estimate of the reachable space of the 7-, 20-, and 100-joint arms.
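
One way to put numbers on this shrinking coverage is to count how many cells of a regular grid over $[-1, 1]\times[-1, 1]$ contain at least one effect. This is only an illustrative measure (the grid size of 50 is an arbitrary choice, and it is not the coverage measure used in the thesis), but it is enough to compare strategies:

def grid_coverage(history, n_cells=50):
    """Fraction of the cells of an n_cells x n_cells grid over [-1, 1]^2 that contain at least one effect."""
    filled = set()
    for _, (x, y) in history:
        i = min(int((x + 1) / 2 * n_cells), n_cells - 1)               # cell index along x, clamped inside the grid.
        j = min(int((y + 1) / 2 * n_cells), n_cells - 1)               # cell index along y.
        filled.add((i, j))
    return len(filled) / float(n_cells * n_cells)

Running, for instance, `grid_coverage(explore_rmb(arm2, N))` and `grid_coverage(explore_rmb(arm100, N))` should yield a markedly larger value for the first than for the second (the exact figures depend on the random seed).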

Goal Babbling

Motor babbling explores the possibilities offered by the arm by exploring the motor space: the choice of which motor command to execute is done by choosing a point in the motor space.

Another strategy is to explore not the motor space but the sensory space, i.e., the space of effects. Goal babbling does exactly this: choose a point in the effect space, consider it as a goal to be reached, and try to find a motor command to reach that goal. We still produce a motor command to execute, but the choice of that motor command is done in the sensory space.

Implementing the goal babbling strategy raises two problems:

  1. How to find a motor command to reach a specific goal?
  2. How to choose goals?

Inverse Model

The answer to the first question is an inverse model. An inverse model is a process that transforms a goal, i.e., a point in the sensory space, into a motor command, i.e., a point in the motor space. The better the inverse model, the smaller the distance between the goal and the effect obtained when executing the motor command it produces.

There are many ways to create an inverse model. Some rely on the arm's schematics. Here, we can't do this, because of our agnostic constraint of not relying on specific knowledge of the arm. Another way is to learn the inverse model during the exploration: each execution of a motor command brings a motor command/effect pair; this data can be exploited to improve our ability to turn effects into motor commands. There are quite sophisticated ways to do this; here, we choose one of the simplest.

When given a goal, our inverse model looks at our history of observations and selects the one that produced the effect closest to the goal. The motor command that produced this effect is retrieved, and a small, random perturbation is added to it. The resulting motor command is the one that will be executed.

In [9]:
D = 0.05                         # with a range of ±150°, creates a perturbation of ±15° (5% of 300° in each direction).
In [10]:
def dist(a, b):
    """Return the squared Euclidean distance between a and b (the square root is not needed for comparisons)."""
    return sum((a_i-b_i)**2 for a_i, b_i in zip(a, b))

def nearest_neighbor(goal, history):
    """Return the motor command of the nearest neighbor of the goal"""
    nn_command, nn_dist = None, float('inf')                                          # naive nearest neighbor search.
    for m_command, effect in history:
        d = dist(effect, goal)                                             # (squared) distance of the effect to the goal.
        if d < nn_dist:
            nn_command, nn_dist = m_command, d
    return nn_command

def inverse(arm, goal, history):
    """Transform a goal into a motor command"""
    nn_command = nearest_neighbor(goal, history)                              # find the nearest neighbor of the goal.

    new_command = []
    for m_i in nn_command:
        max_i = min( arm.limit, m_i + 2*D*arm.limit)
        min_i = max(-arm.limit, m_i - 2*D*arm.limit)
        new_command.append(random.uniform(min_i, max_i))       # create a random perturbation inside the legal values.

    return new_command
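
To get a feel for the inverse model, we can seed a short history with random motor babbling, ask for a motor command for an arbitrary goal (the values below are only examples), and compare the goal with the effect we actually obtain. Note that running this draws from the shared random stream, so the seeded results of later cells will differ slightly:

history_demo = explore_rmb(arm7, 100)                          # a short motor babbling run to get a non-empty history.
goal_demo    = [0.5, 0.5]                                      # an arbitrary goal in the effect space.
effect_demo  = arm7.execute(inverse(arm7, goal_demo, history_demo))
print(effect_demo, dist(goal_demo, effect_demo))               # the obtained effect, and its squared distance to the goal.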

Optional: Fast Nearest-Neighbors

Now, the nearest neighbor implementation we have is fine, but it is too slow for some of the experiments we will do here. We replace it with a fast implementation from the learners library. If you want to keep the slow but simple implementation, skip the next three code cells.

In [11]:
try: # if learners is not present, the change is not made.
    import learners

    _nn_set = learners.NNSet()
    def nearest_neighbor_fast(goal, history):
        """Same as nearest_neighbors, using the `learners` library."""
        global _nn_set
        if len(history) < len(_nn_set): # HACK
            _nn_set = learners.NNSet()
        for m_command, effect in history[len(_nn_set):]:
            _nn_set.add(m_command, y=effect)
        idx = _nn_set.nn_y(goal)[1][0]
        return history[idx][0]

    print('Using the fast implementation.')

except ImportError:
    nearest_neighbor_fast = nearest_neighbor
    print('Using the slow implementation.')
Using the fast implementation.

Let's verify that the two nearest neighbor implementations do the same thing.

In [12]:
history_test = []
for i in range(1000):                                                 # comparing the results over 1000 random queries.
    m_command = [random.random() for _ in range(7)]
    effect    = [random.random() for _ in range(2)]
    history_test.append((m_command, effect))

    goal = [random.random() for _ in range(2)]
    nn_a = nearest_neighbor(goal, history_test)
    nn_b = nearest_neighbor_fast(goal, history_test)
    assert nn_a == nn_b                                                              # the results should be the same.

Okay, that checks out. Let's override the slow implementation.

In [13]:
nearest_neighbor = nearest_neighbor_fast

Exploration Strategy

Once we have the inverse model, the remaining question is how to choose goals. There are many ways to cleverly select goals, and this is mostly explored in the context of intrinsic motivations (see Oudeyer and Kaplan (2007) for instance). Here, we choose the simplest option: random choice. For each sample, we choose as a goal a random point in the square $[-1, 1]\times[-1,1]$.

In [14]:
def goal_babbling(arm, history):
    """Goal babbling strategy"""
    goal = [random.uniform(-1, 1), random.uniform(-1, 1)]                   # goal as random point in [-1, 1]x[-1, 1].
    return inverse(arm, goal, history)

One detail is that the inverse model needs a non-empty history, because it works by perturbing existing motor commands. To solve this problem, we begin with 10 steps of random motor babbling, and then switch to random goal babbling for the remaining 4990 steps of our 5000-step exploration.

In [15]:
def explore_rgb(arm, n):
    """Explore the arm using random goal babbling (RGB) during n steps."""
    history = []

    for t in range(n):
        if t < 10:                                                     # random motor babbling for the first 10 steps.
            m_command = motor_babbling(arm)
        else:                                                                             # then random goal babbling.
            m_command = goal_babbling(arm, history)                                # goal babbling depends on history.

        effect = arm.execute(m_command)                        # executing the motor command and observing its effect.
        history.append((m_command, effect))                                                        # updating history.

    return history

Let's look at how the random goal babbling strategy covers the reachable space for our four arms.

In [16]:
figures, histories_gb = [], []

for arm in [arm2, arm7, arm20, arm100]:
    history = explore_rgb(arm, N)
    histories_gb.append(history)                                   # we keep the histories for further analysis below.

    figures.append(graphs.effects(history, show=False,
                                  title='goal babbling, {}-joint arm'.format(arm.dim)))

graphs.show([[figures[0], figures[1]],
             [figures[2], figures[3]]])