[ HOME | Bi ]

In [2]:
%matplotlib inline

import sys
sys.path.append('../lib')

We spoke about the basic operations like binding and bundling, but the sub-symbolic level allows us to do other operations which are "composite" and have no equivalent in non VSA architecture, that is because we can use SDP Algebra as I mentioned.

Analogy mapping

Humans use analogy all the time, it is integral part of intelligence. What if I told you we can mimic a proto form of analogy on syb-symbolic level via vector math. Nooo waaay ! .... way!

Here is the idea :

$ analogy\_map = relation * reverse\_relation $

if we do the algebraic manipulation :

$ analogy\_map * relation = reverse\_relation $

because the bind operation is two way street.

Now we can use the first equation for a training operation and the second as predicting/testing operation. The important part is that we can do this operation on a whole structure too, not just on single term i.e. holistically. Plus this is one-shot learning, something you can't do with Neuro-networks.

Lets concretize the idea, the relations we will pick are "above" and correspondingly the reverse "below". We will train "analogical map" that will virtually "swap" the position of two relations i.e. make one relation transform to the other. Here is what we have :

-- circle above a square --

$ cas = above + a\_role1 * circle + a\_role2 * square $

Let see the reverse :

-- square below a circle --

$ sbc = below + b\_role1 * square + b\_role2 * circle $

Now we learn the mapping (one-shot learning) :

$ map = cas * sbc $

and then we can apply it to unknown (not trained with) objects. Lets define them, so that we can do the comparison.

$ sat = above + a\_role1 * star + a\_role2 * triangle $

$ tbs = below + b\_role1 * triangle + b\_role2 * star $

So if we want to transform the "triangle" above the "star' =to=> 'star' below 'triangle', we will bind it with the learned map. Of course it is approximate match. That is vectors for you.

tbs <=~=> map * sat

sat <=~=> map * tbs

The operation works both ways.

If you want it to work only in one direction you should use permutation-bind when defining (have not tested it).

These kind of operations are also called holistic transformations where one compositional structure is mapped onto another compositional structure without having to first decompose the source representation into its components

Ok now that we know the theory, let's try it.

In [3]:
from bi import *
from lexicon import *

x = lex()
x.add_items(['above', 'below', 'a1', 'a2', 'b1', 'b2', 'circle', 'square', 'star', 'triangle'])

We should use sdp.bundle(), instead of +, because of the even-oddity I mentioned earlier /adds too much noise and it may not work/.

In [4]:
#training data
cas = sdp.bundle([ x['above'], x['a1'] * x['circle'],   x['a2'] * x['square'] ])
sbc = sdp.bundle([ x['below'], x['b1'] * x['square'],   x['b2'] * x['circle'] ])

#novel/testing data
sat = sdp.bundle([ x['above'], x['a1'] * x['star'],     x['a2'] * x['triangle'] ])
tbs = sdp.bundle([ x['below'], x['b1'] * x['triangle'], x['b2'] * x['star'] ])

#partially closer to the training data : 'square' is used in both at the same position
sas = sdp.bundle([ x['above'], x['a1'] * x['star'],     x['a2'] * x['square'] ])
sbs = sdp.bundle([ x['below'], x['b1'] * x['square'],   x['b2'] * x['star'] ])

#misplased 'square' i.e. novel data
sas2 = sdp.bundle([ x['above'], x['a1'] * x['square'],     x['a2'] * x['star'] ])
sbs2 = sdp.bundle([ x['below'], x['b1'] * x['star'],   x['b2'] * x['square'] ])

Lets learn the map (one-shot learning) :

In [5]:
M = cas * sbc

now test it against the training example (measuring distance, btw). Seems OK.

In [6]:
(M * cas) % sbc
Out[6]:
0

Now lets try against the test structure, which we did not use in training :

In [7]:
(M * sat) % tbs
Out[7]:
3779

In our case as we mentioned the mapping is bi-directional, so :

In [8]:
(M * tbs) % sat
Out[8]:
3779

The same using sdp.dist()

In [9]:
sdp.dist((M * sat), tbs)
Out[9]:
3779

The distance is ~38%, but we said that a distance below ~42% means that the symbols are similar (sdp.true_thresh).

And the same this time the-boolean way :

In [10]:
sdp.sim((M * sat), tbs)
Out[10]:
True

Lets also test it with data that partially match with the training data :

In [11]:
sdp.dist((M * sas), sbs)
Out[11]:
2498

The distance as expected is smaller ~25%.

Once more but this time 'square' is placed in the 'wrong' position i.e. it will represent again in a sense novell data, ergo distance again becomes ~37%.

In [12]:
sdp.dist((M * sas2), sbs2)
Out[12]:
3852

And finally lets test against random SDP, to see if they are orthogonal.

In [13]:
sdp.dist((M * sat), sdp.rand())
Out[13]:
5009

That ends our exploration of analogy mapping, but I suppose you can guess that there is big unexplored territory in the sub-symbolic space for other ways of building composite-operations on structures, if you follow the ideas described in this article.

You can see more detailed test at 'test/analogy_mapping.py'.


Categorization

Another common human trait is Categorization or said it otherwise Concept formation. Here again sub-symbolic operations can help us. Lets try very rudimentary categorization.

Categorization is complex process and can be divided to at least two sub-phases.

  • pattern recognition / grounding : were we associate the external pattern or behavior to a symbol.
  • symbol/s adjustments : were we adjust the meaning of the symbol making it similar or dissimilar to other symbols i.e. clustering.

In the current tutorial we are interested of the simplified second phase. Also the examples below happen to follow the two complementary theories for categorization, namely :

  • Prototype theory : emphasizes that we construct summary representations. In our case that is ad-hoc summary-symbol for the category e.g. 'chair'.
  • Exemplar model : emphasize that categories are grounded in our experience and that we store individual and concrete items. We have multiple instances of a category which we adjust to more closely resemble it.

For the purpose of this example we have several instances of chairs and we want to crystallize from them the concept of 'chairs'. (BTW we can ground those symbols on NN-Labels as we did with the 'summing' example).

In [14]:
x.add_items(['wooden_chair', 'metal_chair', 'special_chair', 'wooden_table', 'chair'])

As a prerequisite we know those chairs have to be 'closer', but by default all the symbols are orthogonal on instantiation. We need a way to make them more similar, for this reason we will use an anchor that serves the purpose of the category of chairs.

In [15]:
#initially chairs are orthogonal to the concept
print x['wooden_chair'] % x['chair']
print x['metal_chair'] % x['chair']

#the instances themselves are also dissimilar by default, the goal: make them similar
print x['wooden_chair'] % x['metal_chair']
print sdp.sim(x['wooden_chair'], x['metal_chair'])
4929
4946
5025
False

We know that the bundle operation aside from bundling also result in a vector closer to its operands (by hamming distance).

So if we bundle every type of chair with the 'chair-category'-SDP we will move the vectors towards it, making them at the same time more similar themselves. (Of course if we use lexicon as we do here we have to update them using the .set() method).

In [16]:
new_wc = x['wooden_chair'] + x['chair']
x.set('wooden_chair', new_wc)

new_mc = x['metal_chair'] + x['chair']
x.set('metal_chair', new_mc)

Lets compare again the updated chairs :

In [17]:
print x['wooden_chair'] % x['metal_chair']
print sdp.sim(x['wooden_chair'], x['metal_chair'])
3776
True

Now the distance shrunk, both towards the other chairs and the chair-concept.

In [18]:
print x['wooden_chair'] % x['chair']
print x['metal_chair'] % x['chair']
2498
2508

Also the modified chairs are still orthogonal to the rest of the symbols.

In [19]:
print x['wooden_chair'] % x['wooden_table']
print x['metal_chair'] % x['wooden_table']
5044
4994

As I said this is rudimentary categorization, just to give you yet another idea of how to use the sub-symbolic algebra. As I'm experimenting, I'm also contemplating a ways to integrate the SDP-algebra into the higher levels, such as the Prolog syntax, this way the symbols themselves can morph to more closely resemble the problem we are trying to solve.

Round two

Lets try it differently.

This time we would use SDP structures to represent different types of chairs i.e. we will build them from more basic atoms, rather than being atomic in the first place.

Remember SDPs less that 42% apart are counted similar. I normally use dollar-symbols as a slot-name when building structures (or role if we think in role-filler terms), unless there is better way to describe the bind-pair.

In [21]:
x.erase() #clear all the symbols from the lexer
#... and add the new basic atoms
x.add_items(['wood', 'metal', 'table', 'chair', '$1', '$2', 'isa', 'furniture'])
In [22]:
# create wooden-chair symbol-structure
wc = x['$1'] * x['wood'] + x['isa'] * x['chair']
# create metal-chair symbol-structure
mc = x['$1'] * x['metal'] + x['isa'] * x['chair']

both chairs should be similar because they both contain the 'isa * chair' bind category.

In [23]:
wc % mc
Out[23]:
3732

but they will still be orthogonal to 'chair' itself, because it was included as bind-pair in the structure not as standalone item.

In [24]:
wc % x['chair']
Out[24]:
5001

Here is the correct way to compare it :

In [30]:
print wc % ( x['isa'] * x['chair'] )
print mc % ( x['isa'] * x['chair'] )
# or you can use best-match to get the symbol name
print x.bm(x['isa'] * wc)
2473
2463
chair

Now lets apply the first categorisation rule on the structured-chairs, namely make them super-category 'furniture'.

In [28]:
#furniture-wooden-chair
fwc = wc + x['furniture']
#furniture-metal-chair
fmc = mc + x['furniture']

As we expected 'furniture-wooden-chair' is close to 'furniture-metal-chair'.

In [32]:
fwc % fmc
Out[32]:
3439

and 'furniture-wooden-chair' is close to 'wooden-chair' too.

In [34]:
fwc % wc
Out[34]:
2469

But the important question to ask ourselves is did we ruined the initial structure i.e. can we still extract the roles and fillers.

In [31]:
# does 'furniture-wooden-chair' still contain 'wood'
print (x['$1'] * fwc) % x['wood']
print x.bm(x['$1'] * fwc)

#same for 'metal' and 'chair'
print (x['$1'] * fmc) % x['metal']
print (x['isa'] * fmc) % x['chair']
3715
wood
3809
3805

Yes the structure seems to be still intact and we can query it. Of course the more 'categorizations' we apply the less probable is that we will succeed extracting information, but that is in sync with our expectations.

It is important that we can peek even one level deep with single operation. The fuzziness and at the same time discreetness is one of the qualities we are looking for to bridge the gap between Connectionism and Symbolism.


... compared to Python solution

To more clearly see the difference, lets compare how would we eventually dos similar thing in normal programming language. For illustration lets use Python and dictionaries. First we will replicate the lexer :

In [2]:
z = { 'wood' : 'wood', 'metal' : 'metal', 'chair' : 'chair', '$1' : '$1',
      '$2' : '$2', 'isa' : 'isa', 'furniture' : 'furniture'}

Next we create the chairs :

In [3]:
wc = { z['$1'] : z['wood'], z['isa'] : z['chair'] }
mc = { z['$1'] : z['metal'], z['isa'] : z['chair'] }

So far look may be even easier than SDP, but the problem starts when we have to find similarity. What does similarity in the first place means in this case ? The easiest solution seems to be to count the key-value pairs that match and divide by total number of keys, much more cumbersome than simple XOR-count.

Lets try it :

In [4]:
def dist(a,b):
   i = 0
   #in case structs have diff keys
   keys = set(a.keys() + b.keys())
   for k in keys : #if key in both and values equal
     if k in a and k in b and a[k] == b[k] : i += 1
   return i/float(len(keys))
In [5]:
dist(wc,mc)
Out[5]:
0.5

Here the closer the number to 1 the better. Lets see if we add 'furniture' category.

In [6]:
fwc = wc.copy()
fwc['$2'] = z['furniture']
fwc
Out[6]:
{'$1': 'wood', '$2': 'furniture', 'isa': 'chair'}
In [7]:
dist(fwc,wc)
Out[7]:
0.6666666666666666
In [8]:
dist(fwc,mc)
Out[8]:
0.3333333333333333

Keep in mind we made our life easier here by choosing flat dictionary structure. If we have to replicate SDP more truthfully then we have to use tuple for binding and a list for bundling, which would have complicated the matter by much.

In [ ]: