Saturday, August 12, 2017

A Trie


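This is a trie that marks the end of a word with a sentinel child node (keyed by the character (char)0) instead of flagging every node with whether a word ends there. The idea is that the sentinel only exists where a word actually ends, while a flag costs a field in every node. Whether this really saves space depends on the node layout: here each sentinel is a full Trie object with its own (empty) HashMap, while a boolean field may cost little or nothing after object padding. To quickly count the words that start with a given prefix, each node also stores the number of words added through it, so a prefix query is just a walk down the prefix plus one field read.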

import java.util.HashMap;
import java.util.Map;

class Trie {
    // A child keyed by this character marks the end of a word.
    static final char SENTINEL = (char) 0;

    char ch;
    // Number of words added through this node (used for prefix counts).
    int count = 0;
    Map<Character, Trie> list = new HashMap<Character, Trie>();

    // Create the root with any placeholder character, e.g. new Trie(Trie.SENTINEL).
    public Trie(char ch) {
        this.ch = ch;
    }

    public Trie add(char ch) {
        Trie node = this.list.get(ch);
        if (node == null) {
            node = new Trie(ch);
            this.list.put(ch, node);
        }

        //adding the count to the current node is preferable
        //to adding to the node that matches the character.
        //This way, we won't add to the sentinel node
        //and we add only in one place.
        this.count++;
        return node;
    }

    public int size() {
        return this.count;
    }

    private Trie findChar(char ch) {
        return this.list.get(ch);
    }

    public boolean findWord(String word) {
        Trie node = this;
        for (char ch : word.toCharArray()) {
            node = node.findChar(ch);
            if (node == null) {
                return false;
            }
        }
        //we may have found only a prefix, make sure it is a word:
        //if it's a word, its children must include the sentinel.
        return node.findChar(SENTINEL) != null;
    }

    public int findPartial(String prefix) {
        Trie node = this;
        for (char ch : prefix.toCharArray()) {
            node = node.findChar(ch);
            if (node == null) {
                return 0;
            }
        }
        return node.size();
    }

    public void add(String s) {
        Trie node = this;
        for (char ch : s.toCharArray()) {
            node = node.add(ch);
        }
        //add the sentinel to mark the end of the word.
        node.add(SENTINEL);
    }
}
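To experiment with the design without a JVM, here is a minimal Python port of the same idea: an end-of-word child keyed by "\0" and a per-node count incremented on the parent, so the sentinel itself always stays at 0. The class and method names (SentinelTrie, find_word, find_partial) are hypothetical and not part of the original post; this is a sketch, not a drop-in replacement for the Java class.

```python
class SentinelTrie:
    """Hypothetical Python port of the Java Trie: end of word is marked by a
    child keyed by SENTINEL rather than by a boolean flag on the node."""

    SENTINEL = "\0"

    def __init__(self):
        self.count = 0       # number of words added through this node
        self.children = {}   # char -> SentinelTrie

    def _add_char(self, ch):
        node = self.children.get(ch)
        if node is None:
            node = self.children[ch] = SentinelTrie()
        # Count on the current (parent) node, so the sentinel is never counted.
        self.count += 1
        return node

    def add(self, word):
        node = self
        for ch in word:
            node = node._add_char(ch)
        node._add_char(self.SENTINEL)  # mark the end of the word

    def _walk(self, s):
        node = self
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def find_word(self, word):
        node = self._walk(word)
        # A bare prefix is only a word if the sentinel child exists.
        return node is not None and self.SENTINEL in node.children

    def find_partial(self, prefix):
        node = self._walk(prefix)
        return 0 if node is None else node.count
```

For example, after adding "car", "cart", "cat" and "dog", find_partial("ca") is 3 and find_partial("car") is 2, while find_word("ca") is False because no sentinel hangs off the "a" node.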

Folding with Python

Solving this problem the functional way:

Find if a sorted list of positive numbers has duplicates.


>>> from functools import reduce   # required on Python 3, harmless on Python 2.6+
>>> def has_dups(nums):
...     return reduce(lambda x, y: (x[0] or (y == x[1]), y), nums, (False, 0))[0]
... 
>>> has_dups([1])
False
>>> has_dups([1,1])
True
>>> has_dups([1,1,1])
True
>>> has_dups([1,2,4])
False
>>> has_dups([1,2,2])
True
>>> has_dups([1,2,2,3,4])
True


From the Python documentation for reduce:

reduce(function, iterable[, initializer])
Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the iterable. If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. If initializer is not given and iterable contains only one item, the first item is returned.
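The documentation follows this with a rough pure-Python equivalent. A sketch along the same lines is below; the name fold_left is hypothetical, chosen so it does not shadow the real reduce, and the private _MISSING sentinel is an assumption used to tell "no initializer" apart from an initializer of None.

```python
_MISSING = object()  # marks "no initializer given" (distinct from None)


def fold_left(function, iterable, initializer=_MISSING):
    """Roughly what functools.reduce does: fold items from left to right."""
    it = iter(iterable)
    if initializer is _MISSING:
        try:
            value = next(it)  # the first item becomes the starting value
        except StopIteration:
            raise TypeError("fold_left() of empty iterable with no initial value")
    else:
        value = initializer
    for element in it:
        # x is the accumulated value, y is the next item from the iterable
        value = function(value, element)
    return value
```

With this, fold_left(lambda x, y: x + y, [1, 2, 3, 4, 5]) computes ((((1+2)+3)+4)+5) = 15, and a one-item list with no initializer simply returns that item.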

To solve the problem, we need to check whether any two adjacent items have the same value; because the list is sorted, any duplicates must sit next to each other. To do this the functional way, we walk through the list with reduce. At each step, reduce applies the user-supplied function to the previous output (x) and the current item (y) of the list.

We need to remember whether we have seen two equal adjacent items, and pass that along as the output. But we also have to pass along the current item, so that at the next step the function has something to compare the next item with. So the function returns a tuple:

(whether we have seen two equal adjacent items so far, current item)

We also need to pass an initial value for the tuple (the initializer). The truth value starts out as False, and for the "previous item" we pass 0: since all elements of the list are positive, 0 can never equal the first element. Hence the initializer is (False, 0).

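Within the lambda, we check whether the current item equals the previous one (y == x[1]), but if we had already found a duplicate earlier (x[0]), we must carry that forward, hence x[0] or (y == x[1]). The answer is the first element of the final tuple, which is why has_dups ends with [0].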
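reduce only hands back the final tuple, so the intermediate states are hidden. One way to watch them, swapping in itertools.accumulate instead of reduce, is sketched below; it assumes Python 3.8 or later, where accumulate gained the initial keyword. The helper name step is hypothetical and does the same thing as the lambda in has_dups.

```python
from itertools import accumulate


def step(acc, y):
    # acc is (seen_duplicate_so_far, previous_item); y is the current item
    return (acc[0] or y == acc[1], y)


# accumulate yields every intermediate accumulator, starting with the initializer
states = list(accumulate([1, 2, 2, 3], step, initial=(False, 0)))
print(states)
```

This prints [(False, 0), (False, 1), (False, 2), (True, 2), (True, 3)]: the flag flips to True at the second 2 and stays True for the rest of the walk, and the last tuple is exactly what reduce would return.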

One drawback of the fold is that there is no quick way to stop traversing the list once we find a duplicate: reduce always walks the complete list. It is possible to force an early exit by raising an exception from the folding function (a lambda cannot contain a raise statement, so this needs a named helper function), but I don't know of a clean way to terminate the walk.
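If stopping early matters more than staying with a fold, a different technique (not a fold at all) does short-circuit: pair each item with its successor and let any() stop at the first match. A minimal sketch, assuming nums is a sorted list; has_dups_fast is a hypothetical name.

```python
from itertools import islice


def has_dups_fast(nums):
    # zip pairs each item with the next one; any() stops at the first True,
    # so the walk ends as soon as two equal neighbours are found.
    return any(a == b for a, b in zip(nums, islice(nums, 1, None)))
```

Unlike the fold, this also needs no (False, 0) initializer, so it does not depend on the elements being positive.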

Monday, August 07, 2017

Python 2.7 scoping bug

Here is a piece of code that does not work on Python 2.7:

#!/usr/bin/python
def img_type(s):
    return str(s)
print (img_type(50))
a = [img_dir for (img_dir, img_type) in [("a",1),("b",2)]]
print (img_type(20))

On the second print, it raises a TypeError:

TypeError: 'int' object is not callable

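In Python 2, a list comprehension does not get its own scope: its loop variables (img_dir and img_type) are bound in the enclosing scope, which here is the module's global scope. So the loop variable img_type does not merely shadow the function of the same name; it overwrites it, leaving img_type bound to the last value it took, the integer 2.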

We can see what is happening by printing globals().items() and locals().items(). In Python 2 each is a list of tuples, where each tuple contains a variable name and its currently assigned value. Here is a modified program that prints the variables before and after the list comprehension runs:

#!/usr/bin/python

def img_type(s):
    return str(s)

print (img_type(50))

print ("BEFORE")
print (globals().items())
print (locals().items())


a = [img_dir for (img_dir, img_type) in [("a",1),("b",2)]]

print ("AFTER")
print (globals().items())
print (locals().items())

print (img_type(20))

This outputs:

50
BEFORE
[('img_type', <function img_type at 0x7f740ad5eb18>), ('__builtins__', <module '__builtin__' (built-in)>), ('__file__', './proof1.py'), ('__package__', None), ('__name__', '__main__'), ('__doc__', None)]
[('img_type', <function img_type at 0x7f740ad5eb18>), ('__builtins__', <module '__builtin__' (built-in)>), ('__file__', './proof1.py'), ('__package__', None), ('__name__', '__main__'), ('__doc__', None)]
AFTER
[('img_type', 2), ('a', ['a', 'b']), ('__builtins__', <module '__builtin__' (built-in)>), ('img_dir', 'b'), ('__file__', './proof1.py'), ('__package__', None), ('__name__', '__main__'), ('__doc__', None)]
[('img_type', 2), ('a', ['a', 'b']), ('__builtins__', <module '__builtin__' (built-in)>), ('img_dir', 'b'), ('__file__', './proof1.py'), ('__package__', None), ('__name__', '__main__'), ('__doc__', None)]
Traceback (most recent call last):
  File "./proof1.py", line 19, in <module>
    print (img_type(20))
TypeError: 'int' object is not callable

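Notice how, after the list comprehension, the img_type() function has been clobbered by the comprehension's loop variable of the same name, and img_dir has leaked into the module namespace as well. (At module level, globals() and locals() are the same dictionary, which is why each pair of listings is identical.)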

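This no longer happens in Python 3: since Python 3.0, list comprehensions run in their own scope, so their loop variables do not leak into the enclosing one. Here is the output of running the second version of the program under Python 3: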

50
BEFORE
dict_items([('__name__', '__main__'), ('__doc__', None), ('__loader__', <_frozen_importlib_external.SourceFileLoader object at 0x7ff93f4b7780>), ('__file__', './proof1.py'), ('__builtins__', <module 'builtins' (built-in)>), ('__spec__', None), ('img_type', <function img_type at 0x7ff93f41c2f0>), ('__package__', None), ('__cached__', None)])
dict_items([('__name__', '__main__'), ('__doc__', None), ('__loader__', <_frozen_importlib_external.SourceFileLoader object at 0x7ff93f4b7780>), ('__file__', './proof1.py'), ('__builtins__', <module 'builtins' (built-in)>), ('__spec__', None), ('img_type', <function img_type at 0x7ff93f41c2f0>), ('__package__', None), ('__cached__', None)])
AFTER
dict_items([('__name__', '__main__'), ('__doc__', None), ('a', ['a', 'b']), ('__loader__', <_frozen_importlib_external.SourceFileLoader object at 0x7ff93f4b7780>), ('__file__', './proof1.py'), ('__builtins__', <module 'builtins' (built-in)>), ('__spec__', None), ('img_type', <function img_type at 0x7ff93f41c2f0>), ('__package__', None), ('__cached__', None)])
dict_items([('__name__', '__main__'), ('__doc__', None), ('a', ['a', 'b']), ('__loader__', <_frozen_importlib_external.SourceFileLoader object at 0x7ff93f4b7780>), ('__file__', './proof1.py'), ('__builtins__', <module 'builtins' (built-in)>), ('__spec__', None), ('img_type', <function img_type at 0x7ff93f41c2f0>), ('__package__', None), ('__cached__', None)])
20
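If the code still has to run on Python 2.7, here is a sketch of two workarounds, assuming renaming is acceptable (pairs and img_kind are hypothetical names). A generator expression has had its own scope since generator expressions were introduced in Python 2.4, so wrapping it in list() avoids the leak; alternatively, just pick a loop variable that cannot collide.

```python
def img_type(s):
    return str(s)


pairs = [("a", 1), ("b", 2)]

# Option 1: a generator expression runs in its own scope, even on Python 2,
# so its loop variables cannot rebind module-level names like img_type.
a = list(img_dir for (img_dir, img_type) in pairs)

# Option 2: use a loop-variable name that doesn't collide with anything.
# (On Python 2, img_dir and img_kind still leak, but they clobber nothing.)
b = [img_dir for (img_dir, img_kind) in pairs]

print(img_type(20))  # img_type is still the function
```

Either way a and b are ['a', 'b'] and the final print shows 20 on both Python 2.7 and Python 3.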

Monday, March 13, 2017

Docker: delete all tags of an image

Handy one-liner to delete all tags of a particular Docker image (on Linux):



docker images | grep rabbitmq | tr -s ' ' | cut -d ' ' -f 2 | xargs -I {} docker rmi docker-staging-local.artifactory.corp.alleninstitute.org/rabbitmq:{}  

Here are the image tags that this one-liner removes:

thusharaw@denali ois-rabbitmq (thushara/rabbitv1)*$ docker images | grep rabbitmq
docker-staging-local.artifactory.corp.alleninstitute.org/rabbitmq                   2017031300          6fb1ed865ae6        3 hours ago         179 MB
docker-staging-local.artifactory.corp.alleninstitute.org/rabbitmq                   2017031307          6fb1ed865ae6        3 hours ago         179 MB
docker-staging-local.artifactory.corp.alleninstitute.org/rabbitmq                   2017031007          21b84ae0586e        2 days ago          179 MB
docker-staging-local.artifactory.corp.alleninstitute.org/rabbitmq                   201703100           43d51d243631        2 days ago          179 MB
docker-staging-local.artifactory.corp.alleninstitute.org/rabbitmq                   201703107           08927efd735e        2 days ago          179 MB
docker-staging-local.artifactory.corp.alleninstitute.org/rabbitmq                   201703109           646cb7852d8e        2 days ago          179 MB

Let's break down the one-liner:


1) docker images

list all local images

2) grep rabbitmq

keep only the lines for the repository you care about

3) tr -s ' '

squeeze every run of repeated spaces into a single space, so that we can easily index to a specific column, the tag in this case

4) cut -d ' ' -f 2

pick the second space-separated field, which is the TAG column

5) xargs -I {} docker rmi docker-staging-local.artifactory.corp.alleninstitute.org/rabbitmq:{}

pass each recovered tag to `docker rmi`. `docker rmi` takes image names as command-line arguments rather than reading them from stdin, so xargs turns each line of the previous command's stdout into an argument, substituting it for {} and running the command once per tag
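The same parsing can be sketched in Python, which makes it easy to do a dry run and review the list before deleting anything. This is a hypothetical helper, not part of Docker; it takes the text printed by `docker images` and returns the `docker rmi` argument lists that the xargs step would run, without executing them. One deliberate difference: it matches the REPOSITORY column exactly instead of grepping anywhere in the line, so a different repository whose name merely contains "rabbitmq" is not picked up.

```python
def rmi_commands(listing, repo):
    """Return ["docker", "rmi", "<repo>:<tag>"] for each image of repo found in
    `docker images` output. Nothing is executed."""
    commands = []
    for line in listing.splitlines():
        # Like `tr -s ' '`: split() treats any run of whitespace as one separator.
        fields = line.split()
        # Stricter than `grep`: require an exact match on the REPOSITORY column.
        # This also skips the header line, whose first field is "REPOSITORY".
        if len(fields) < 2 or fields[0] != repo:
            continue
        tag = fields[1]  # like `cut -d ' ' -f 2`: the TAG column
        commands.append(["docker", "rmi", repo + ":" + tag])
    return commands
```

Each returned list could then be handed to subprocess.run(cmd, check=True) to actually remove that image.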