tech: Why am I writing lower_bound() in Java again?

Tuesday, July 13, 2010

Why am I writing lower_bound() in Java again?

The second and the third time I find myself looking for a lower_bound() function according to generally accepted STL standard, I know that I'm not a Java-type developer. Java developers must have no need for these functions that I find very useful. A search for "java lower bound" or "java upper bound" does not yield the predictable result.

The Apache commons has some good collections utilities but not the lower_bound(), upper_bound() pair.

It is still a mystery why the Java Lib folks decided to ignore this function. The use of it is quite common for look-up tables. Today, I wanted to use these functions to look up for some limit values for input ranges from a table. The easiest way to explain this is that different input ranges require a look-up into a number determined by the range. The ranges are consecutive, say 0-25, 25-50, 50-100,100-500. Each range would map to a number, say 100, 212, 320, 400. The look-up function needs to return the correct number given an input number within a range.

For ex:

10 : 100
25 : 212
30 : 212
200 : 400

To satisfy the lower_bound, upper_bound standard specs, a number below the smallest range (say -10) would map to 100, and a number above the largest range will map to nothing.

lower_bound() should return an index to the first element in the sorted container that is equal to or above the number being looked up, and -1 if there is no such element.

upper_bound() should return an index to the first element in the sorted container that is above the number being looked up, and -1 if there is no such element.

Here are the functions I ended up writing:

    public static int lower_bound(Comparable[] arr, Comparable key) {
        int len = arr.length;
        int lo = 0;
        int hi = len-1;
        int mid = (lo + hi)/2;
        while (true) {
            int cmp = arr[mid].compareTo(key);
            if (cmp == 0 || cmp > 0) {
                hi = mid-1;
                if (hi < lo)
                    return mid;
            } else {
                lo = mid+1;
                if (hi < lo)
                    return mid<len-1?mid+1:-1;
            }
            mid = (lo + hi)/2;
        }
    }

    public static int upper_bound(Comparable[] arr, Comparable key) {
        int len = arr.length;
        int lo = 0;
        int hi = len-1;
        int mid = (lo + hi)/2;
        while (true) {
            int cmp = arr[mid].compareTo(key);
            if (cmp == 0 || cmp < 0) {
                lo = mid+1;
                if (hi < lo)
                    return mid<len-1?mid+1:-1;
            } else {
                hi = mid-1;
                if (hi < lo)
                    return mid;
            }
            mid = (lo + hi)/2;
        }
    }

Here are the tests:

    public void test_lower_bound() {
        final int arrSize = 500;
        final int maxNum = 1000;
        Integer[] bArr = new Integer[arrSize];
        Random r = new Random();
        for (int k=0; k<arrSize; k++) {
            bArr[k]=r.nextInt(maxNum);
        }
        Arrays.sort(bArr);

        final int numIter = 2000;
        for (int k=0; k<numIter; k++) {
            int n = r.nextInt(maxNum);
            int j = Utils.lower_bound(bArr, n);
            assertTrue((bArr[arrSize-1]<n && j==-1) || (bArr[j]>=n && (j==0 || bArr[j-1]<n)));
        }
    }

    public void test_upper_bound() {
        final int arrSize = 500;
        final int maxNum = 1000;
        Integer[] bArr = new Integer[arrSize];
        Random r = new Random();
        for (int k=0; k<arrSize; k++) {
            bArr[k]=r.nextInt(maxNum);
        }
        Arrays.sort(bArr);

        final int numIter = 2000;
        for (int k=0; k<numIter; k++) {
            int n = r.nextInt(maxNum);
            int j = Utils.upper_bound(bArr, n);
            assertTrue((bArr[arrSize-1]<=n && j==-1) || (bArr[j]>n && (j==0 || bArr[j-1]<=n)));
        }

    }

7 comments:

Anonymous said...: Oh yeah.
new TreeSet(yourMap.keySet()).first(); 12:11 PM
Anonymous said...: // This is the real solution...

static Map.Entry lower_bound(TreeMap mpMap, K key)
{ return mpMap.headMap(key, true).lastEntry(); }

static Map.Entry upper_bound(TreeMap mpMap, K key)
{ return mpMap.tailMap(key, true).firstEntry(); }; 8:18 AM
Anonymous said...: You're missing the point...

These functions should be available for sorted arrays. Binary search is, for instance. If Java has binary search available for sorted arrays, they really should have a lower bound and upper bound searches as well...; 7:16 PM
Antar said...: As it happens to be, I also am a C++ developer who finds upper_bound / lower_bound extremely useful. I was surprised to find that these aren't available in java when like you I wanted to use it in a lookup table quite similar to yours.

Anyway, the reason I am posting this is because there's a bug in your implementation.

at mid = (lo + hi)/2 , lo + hi can overflow causing array out of bounds exception. It's better to use (lo + hi) >>> 1; 10:28 AM
thushara said...: Hi Antar. If you are concerned of overflow, you could do (hi-lo)/2+lo; 6:55 PM
Anonymous said...: Arrays#binarySearch does the trick; 3:00 PM
Anonymous said...: The person who refers to TreeMap wrapper, you have no idea what C++ upper_bound/lower_bound does.

A sorted array that might have duplicated entries cannot be a TreeMap's keyset.; 10:41 PM