Tuesday, July 13, 2010

Why am I writing lower_bound() in Java again?

The second and the third time I find myself looking for a lower_bound() function according to generally accepted STL standard, I know that I'm not a Java-type developer. Java developers must have no need for these functions that I find very useful. A search for "java lower bound" or "java upper bound" does not yield the predictable result.

The Apache commons has some good collections utilities but not the lower_bound(), upper_bound() pair.

It is still a mystery why the Java Lib folks decided to ignore this function. The use of it is quite common for look-up tables. Today, I wanted to use these functions to look up for some limit values for input ranges from a table. The easiest way to explain this is that different input ranges require a look-up into a number determined by the range. The ranges are consecutive, say 0-25, 25-50, 50-100,100-500. Each range would map to a number, say 100, 212, 320, 400. The look-up function needs to return the correct number given an input number within a range.

For ex:

10 : 100
25 : 212
30 : 212
200 : 400

To satisfy the lower_bound, upper_bound standard specs, a number below the smallest range (say -10) would map to 100, and a number above the largest range will map to nothing.

lower_bound() should return an index to the first element in the sorted container that is equal to or above the number being looked up, and -1 if there is no such element.

upper_bound() should return an index to the first element in the sorted container that is above the number being looked up, and -1 if there is no such element.

Here are the functions I ended up writing:

    public static int lower_bound(Comparable[] arr, Comparable key) {
        int len = arr.length;
        int lo = 0;
        int hi = len-1;
        int mid = (lo + hi)/2;
        while (true) {
            int cmp = arr[mid].compareTo(key);
            if (cmp == 0 || cmp > 0) {
                hi = mid-1;
                if (hi < lo)
                    return mid;
            } else {
                lo = mid+1;
                if (hi < lo)
                    return mid<len-1?mid+1:-1;
            }
            mid = (lo + hi)/2;
        }
    }

    public static int upper_bound(Comparable[] arr, Comparable key) {
        int len = arr.length;
        int lo = 0;
        int hi = len-1;
        int mid = (lo + hi)/2;
        while (true) {
            int cmp = arr[mid].compareTo(key);
            if (cmp == 0 || cmp < 0) {
                lo = mid+1;
                if (hi < lo)
                    return mid<len-1?mid+1:-1;
            } else {
                hi = mid-1;
                if (hi < lo)
                    return mid;
            }
            mid = (lo + hi)/2;
        }
    }

Here are the tests:

    public void test_lower_bound() {
        final int arrSize = 500;
        final int maxNum = 1000;
        Integer[] bArr = new Integer[arrSize];
        Random r = new Random();
        for (int k=0; k<arrSize; k++) {
            bArr[k]=r.nextInt(maxNum);
        }
        Arrays.sort(bArr);

        final int numIter = 2000;
        for (int k=0; k<numIter; k++) {
            int n = r.nextInt(maxNum);
            int j = Utils.lower_bound(bArr, n);
            assertTrue((bArr[arrSize-1]<n && j==-1) || (bArr[j]>=n && (j==0 || bArr[j-1]<n)));
        }
    }

    public void test_upper_bound() {
        final int arrSize = 500;
        final int maxNum = 1000;
        Integer[] bArr = new Integer[arrSize];
        Random r = new Random();
        for (int k=0; k<arrSize; k++) {
            bArr[k]=r.nextInt(maxNum);
        }
        Arrays.sort(bArr);

        final int numIter = 2000;
        for (int k=0; k<numIter; k++) {
            int n = r.nextInt(maxNum);
            int j = Utils.upper_bound(bArr, n);
            assertTrue((bArr[arrSize-1]<=n && j==-1) || (bArr[j]>n && (j==0 || bArr[j-1]<=n)));
        }

    }

7 comments:

Anonymous said...

Oh yeah.
new TreeSet(yourMap.keySet()).first()

Anonymous said...

// This is the real solution...

static Map.Entry lower_bound(TreeMap mpMap, K key)
{ return mpMap.headMap(key, true).lastEntry(); }

static Map.Entry upper_bound(TreeMap mpMap, K key)
{ return mpMap.tailMap(key, true).firstEntry(); }

Anonymous said...

You're missing the point...

These functions should be available for sorted arrays. Binary search is, for instance. If Java has binary search available for sorted arrays, they really should have a lower bound and upper bound searches as well...

Antar said...

As it happens to be, I also am a C++ developer who finds upper_bound / lower_bound extremely useful. I was surprised to find that these aren't available in java when like you I wanted to use it in a lookup table quite similar to yours.

Anyway, the reason I am posting this is because there's a bug in your implementation.

at mid = (lo + hi)/2 , lo + hi can overflow causing array out of bounds exception. It's better to use (lo + hi) >>> 1

thushara said...

Hi Antar. If you are concerned of overflow, you could do (hi-lo)/2+lo

Anonymous said...

Arrays#binarySearch does the trick

Anonymous said...

The person who refers to TreeMap wrapper, you have no idea what C++ upper_bound/lower_bound does.

A sorted array that might have duplicated entries cannot be a TreeMap's keyset.