Tuesday, December 16, 2008

Java has no unsigned types



I came across this today as I needed to create a hash that maps a tuple to an object. The tuple consists of two keys, each of which is really an unsigned 32-bit integer. I was hoping to construct a key for the hash by shifting one key left by 32 bits and ORing the two together to make a 64-bit key.

A good article on this can be found here.

The problem with the language not supporting unsigned types is that it invites subtle bugs.

Here's an example:

int x = 0x865a672b;
long y = (0x12345678<<32) | x;

1) You can shift an int (a 32-bit value) by a maximum of 31 bits. In fact, if you were to do:

val << shift, where val is an int, Java only uses the low five bits of shift, i.e. it computes val << (shift & 31).
Thus in this case the shift distance is 32 & 31 = 0, and 0x12345678 is not shifted at all.

2) Both operands of the shift are ints, so 0x12345678 never gets promoted to a 64-bit long. We expected 0x1234567800000000, but we have 0x12345678.

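3) 0x865a672b is a negative number (as it is an int). Its 64-bit representation would be 0xffffffff865a672b.

4) So we're in fact ORing 0x12345678 with 0x865a672b in 32-bit arithmetic, which gives 0x967e777b, also a negative int. Assigning that to the long y sign-extends it, so we end up with 0xffffffff967e777b.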
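The steps above can be checked with a few lines of Java. This is only a minimal sketch; the class name IntShiftDemo and the method name packWithIntArithmetic are made up for illustration.

```java
final class IntShiftDemo {
    // The first attempt: every operand on the right-hand side is an int,
    // so the whole expression is evaluated in 32-bit arithmetic and only
    // widened to long when it is returned.
    static long packWithIntArithmetic() {
        int x = 0x865a672b;
        return (0x12345678 << 32) | x; // shift distance is 32 & 31 == 0
    }

    public static void main(String[] args) {
        // The shift is a no-op: prints 12345678
        System.out.println(Integer.toHexString(0x12345678 << 32));
        // The negative int is sign-extended: prints ffffffff865a672b
        System.out.println(Long.toHexString((long) 0x865a672b));
        // The final value of y: prints ffffffff967e777b
        System.out.println(Long.toHexString(packWithIntArithmetic()));
    }
}
```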

So how about this? (We're using longs now)

long x = 0x865a672b;
long y = (0x12345678L<<32) | x;

This gives us 0xffffffff865a672b, which is closer but still not what we want. Even though we declared x to be a long, the literal 0x865a672b is an int. Java converts that negative int to a long and keeps the sign (-ve) in the process, so x is really 0xffffffff865a672b. When we OR this with 0x1234567800000000, the high-order 32 bits are overwritten by the ones that came from the sign extension.
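To see the sign extension on its own, here is a small sketch of the second attempt. The class and method names are hypothetical, chosen just for this example.

```java
final class SignExtensionDemo {
    // The second attempt: the shift is now done on a long, but x is
    // initialised from an int literal, which is sign-extended.
    static long secondAttempt() {
        long x = 0x865a672b; // int literal, negative, widened to long
        return (0x12345678L << 32) | x;
    }

    public static void main(String[] args) {
        long x = 0x865a672b;
        // prints ffffffff865a672b
        System.out.println(Long.toHexString(x));
        // prints 1234567800000000
        System.out.println(Long.toHexString(0x12345678L << 32));
        // The upper half is swamped by the sign bits: prints ffffffff865a672b
        System.out.println(Long.toHexString(secondAttempt()));
    }
}
```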

Ok, last try:

long x = 0x865a672bL;
long y = (0x12345678L<<32) | x;

Notice we're making 0x865a672bL a long literal (the L at the end), so it is never sign-extended. This works, giving us 0x12345678865a672b. Enough already!

So if you want to work with unsigned 32-bit values in Java, always work in the long data type and make every literal a long by putting an L at the end.
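The L suffix only helps with literals. If the two keys arrive as int variables, one common way to get the same effect is to mask each with 0xFFFFFFFFL before combining them. The following is a sketch under that assumption; the class UnsignedKeyPair and its methods pack, high and low are invented for this example and are not from any library. (On Java 8 or later, Integer.toUnsignedLong does the same widening as the mask.)

```java
final class UnsignedKeyPair {
    // Keeps only the low 32 bits after an int has been widened to long.
    private static final long MASK_32 = 0xFFFFFFFFL;

    // Treats both ints as unsigned 32-bit values and packs them into one long.
    static long pack(int high, int low) {
        return ((high & MASK_32) << 32) | (low & MASK_32);
    }

    // Recovers the upper 32 bits (unsigned shift, so no sign bits leak in).
    static int high(long key) {
        return (int) (key >>> 32);
    }

    // Recovers the lower 32 bits.
    static int low(long key) {
        return (int) key;
    }

    public static void main(String[] args) {
        long key = pack(0x12345678, 0x865a672b);
        // prints 12345678865a672b
        System.out.println(Long.toHexString(key));
        // prints 12345678 and 865a672b
        System.out.println(Integer.toHexString(high(key)));
        System.out.println(Integer.toHexString(low(key)));
    }
}
```

The resulting long can then be used as the key of the hash, for example boxed as a Long in a HashMap.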
