Machine Learning: adapting to the new

2 October 2018


Machine Learning is essential to the development of artificial intelligence and to the emergence of autonomous systems and technologies. This is exactly what makes our robots stand out: they act autonomously, even in dynamic, unknown environments. To do this successfully, they need to learn constantly by drawing conclusions from a set of data and adapting their actions accordingly.

Of course, all our robots have excellent manners: we provide them with a lot of useful information even before they start working. This includes, for example, basic action and movement sequences for gripping and setting down safely, correct speeds in various situations, and default reactions to obstacles and errors. But the warehouse environment is complex and constantly changing. With people and objects moving around it, the robot cannot react successfully to changes with its initial knowledge alone, because these movement patterns are complex. An example: the robot wants to pick a box from the shelf rack, but a human colleague has not placed it correctly. The robot now faces a major challenge: where exactly should it reach? This is where Machine Learning comes into play.

The different approaches of Machine Learning

We can train the robot to solve problems in different ways. On the one hand, similar to human learning, there is “supervised learning”: the system receives a large number of examples and images for comparison, together with the corresponding results, as input. However, since our robots can perceive on their own, constantly collecting data through their cameras and sensors, they can also create a supervised learning problem for themselves – at Magazino we call this type of Machine Learning “self-supervised learning”. Using intelligent algorithms, the robot can abstract empirical values or data and draw conclusions from them: if, for example, it compares the planned with the observed effect of its action, or the world before and after its action, it can learn a model of that action. This model can then predict which effects to expect given certain initial conditions and action parameters, for instance during object segmentation or grasping. In this way, the delta between “before” and “after” yields a model of the object, and with it a method that can also distinguish other objects in the sensor data.
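The before/after comparison can be sketched in a few lines. This is a minimal illustration with hypothetical depth images and a hand-picked threshold, not Magazino's actual pipeline: the region where the depth image changes after a pick marks the removed object and can serve as a self-generated training label.

```python
import numpy as np

def delta_segmentation(depth_before, depth_after, threshold=0.01):
    """Segment the picked object as the region where the depth image
    changed between the 'before' and 'after' observations."""
    delta = np.abs(depth_after - depth_before)
    return delta > threshold  # True where the scene changed

# Hypothetical example: two 4x4 depth maps, one box removed from a corner.
before = np.ones((4, 4))
after = before.copy()
after[:2, :2] = 1.5  # the shelf behind the removed box is further away
label = delta_segmentation(before, after)
# 'label' marks the object's former footprint and could be used as a
# training target for a supervised segmentation model.
```

The point of the sketch is only the labeling trick: the robot's own action, not a human annotator, produces the ground truth.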

A third type of Machine Learning is “reinforcement learning”, which combines both of the previous variants. In addition, the system is given incentives via a reward and penalty scheme. A potential area of application at Magazino is order distribution between robots, i.e. within a robot fleet: penalty points are given for every meter traveled and every second elapsed, while reward points are awarded for successfully picking or putting down objects.
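The reward scheme described above can be written down directly; the weights below are purely illustrative stand-ins, since the article does not specify the actual point values.

```python
def episode_reward(meters_traveled, seconds_elapsed, picks, puts,
                   meter_penalty=0.1, second_penalty=0.01,
                   pick_reward=10.0, put_reward=10.0):
    """Reward signal as sketched in the text: penalties per meter and
    per second, rewards per successful pick or put.
    All weight values here are illustrative assumptions."""
    return (pick_reward * picks + put_reward * puts
            - meter_penalty * meters_traveled
            - second_penalty * seconds_elapsed)

# A robot that drove 100 m, took 60 s, and completed 2 picks and 2 puts:
r = episode_reward(100, 60, 2, 2)
```

A reinforcement learner for fleet-wide order distribution would then choose assignments that maximize the expected sum of such rewards over time.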

But Machine Learning is complex, especially for mobile robots with a variety of sensors.
One of the challenges is that the sensor data obtained are extremely heterogeneous – not comparable to two-dimensional digital images processed by simple image recognition software. The robots must be able to filter meaningful image sections: that is, they must decide which data from the cameras is actually relevant to them at that moment.
You can imagine it like this: when people take photographs, they usually focus on a certain object, which then stands out prominently in the picture. When a robot captures images or perceives its surroundings, this initially happens without any special focus.
The next step is therefore to find out where an object begins, where it ends, and how far away it is from the camera or sensor. This becomes relevant, for example, when the robots stand in front of a tightly packed shelf full of shoe boxes: they have to identify the correct boxes among them and plan the corresponding gripping movement precisely, so that the other boxes are not disturbed.
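One simple cue for where one box ends and the next begins is a jump in measured distance. The sketch below, with made-up numbers and a hand-picked threshold, shows the idea on a single horizontal depth scan; real perception on the robot is of course far more involved.

```python
import numpy as np

def find_box_edges(depth_row, jump=0.02):
    """Locate candidate box boundaries along a horizontal depth scan:
    a large jump in depth suggests a gap between neighbouring boxes."""
    diffs = np.abs(np.diff(depth_row))
    return np.where(diffs > jump)[0] + 1  # indices where a new surface starts

# Illustrative scan across three boxes sitting at slightly
# different depths (values in meters).
scan = np.array([0.50, 0.50, 0.50, 0.58, 0.58, 0.49, 0.49, 0.49])
edges = find_box_edges(scan)  # boundaries at indices 3 and 5
```

The depth discontinuities give the segmentation a starting point that a flat 2-D image alone would not provide.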

Where Machine Learning is particularly useful

Our robots learn through feedback cycles, which we call “self-supervised learning”: perception, reaction, feedback, insight. This is very clear in the application case of identifying optimal gripping points on shoe boxes. Here, the gripping motions are optimized for the identified gripping points:

The blue areas promise a high probability of success when picking the boxes.


The vacuum gripper, with its six suction cups, prepares to pick a target box.


This is how learning works

  • With our mobile picking robot TORU, data from numerous “picks” of shoe boxes are recorded during live operation.
  • A special focus is on which of the six suction cups of the vacuum gripper form seals with the box during a pick and can therefore generate sufficient suction force.
  • At the same time, before each pick, a photo is taken with the camera in the gripper and linked to the results of the subsequent pick.
  • Even though almost all picks are successful, about half of the individual suction cup feedback is negative.
  • The data obtained is fed into a neural network. Based on its output, a heat map is created showing in which areas of the box the suction cups tend to succeed, and where they tend not to.
  • The system is then shown the image of a completely new shoe box. On the basis of the previously gained knowledge, it can identify the areas on the box where the six suction cups are most likely to successfully create a vacuum. The robot has thus learned successfully and can continue to apply its knowledge in the future.
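The steps above can be sketched end to end with synthetic data. This is a deliberately tiny stand-in: a logistic regression on (x, y) cup positions replaces the neural network on gripper images, and the labels are invented, so only the pipeline shape (logged feedback → trained model → heat map for a new box) mirrors the description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for logged pick data: each sample is a normalized
# (x, y) position of one suction cup on the box face, labelled 1 if the
# cup sealed, 0 otherwise. Real data would come from TORU's gripper
# camera and per-cup suction feedback.
X = rng.uniform(0, 1, size=(500, 2))
y = (X[:, 0] < 0.6).astype(float)  # assumption: cups seal on the left part

# Minimal logistic regression trained by full-batch gradient descent
# (a stand-in for the neural network mentioned in the text).
Xb = np.hstack([X, np.ones((500, 1))])  # append bias feature
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-Xb @ w))
    w -= 0.1 * Xb.T @ (p - y) / len(y)

# Evaluate on a grid of candidate positions to obtain the success
# heat map for a "new" box.
gx, gy = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
grid = np.stack([gx.ravel(), gy.ravel(), np.ones(400)], axis=1)
heatmap = (1 / (1 + np.exp(-grid @ w))).reshape(20, 20)
```

Once such a map exists, planning a grip reduces to placing the six suction cups over the highest-probability region.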

The effective use of Machine Learning constantly improves the performance of our robots and makes them adaptable. With their intelligent algorithms, they can easily react to novel situations, such as successfully picking new or shifted boxes, or finding new routes through the warehouse. And since our robots are networked locally and globally via a cloud, a robot learns not just for itself but for the entire fleet: if a TORU in the warehouse gains a new insight about a blocked passage, it shares it with all the others in real time. In the future, the deployment process will also be shortened significantly, converging toward full automation. Our long-term goal: after the initial setup, the robots quickly and efficiently find their way around their new environment, providing active support and added value from day one.