Saturday, October 23, 2010

GZIPInputStream throws spurious exceptions when it reaches the end of the file

The Sun bug database has reports (here and here) of spurious errors thrown by GZIPInputStream.

Unfortunately, there are no fixes. These spurious exceptions are also not limited to large files: I have seen them for files as small as 5K. Here is a set of decompressed files, sorted by size, that I managed to generate by ignoring these spurious errors:

x02:~$ cat /tmp/z | perl -ne 'if (/based on (.*)$/) {system("ls -ltr `echo -n $1|md5`.url");}' | sort -n -t ' ' -k 5
-rw-r--r-- 1 mpire mpire 5501 2010-10-23 14:03 409ff1c2b7ce2887db8a5c98d395b543.url
-rw-r--r-- 1 mpire mpire 38681 2010-10-23 14:03 8bc8f64132dd2e4bdbccff555cfa6966.url
-rw-r--r-- 1 mpire mpire 44554 2010-10-23 14:01 ba1c53b23f747efb3aa3a7531da80fb1.url
-rw-r--r-- 1 mpire mpire 45415 2010-10-23 14:03 073cd89a26dd69bac8f8a734bcaec7f1.url
-rw-r--r-- 1 mpire mpire 46058 2010-10-23 14:00 f0ae11a51e1975838c26c428ce14308a.url
-rw-r--r-- 1 mpire mpire 46192 2010-10-23 14:03 73972b286ec326e91404008b4c125e5a.url
-rw-r--r-- 1 mpire mpire 46414 2010-10-23 14:00 c6389b488bf912ddece9075884ff7c80.url
-rw-r--r-- 1 mpire mpire 47030 2010-10-23 14:00 07d9fe64764458b55626d9eb047d5d4b.url
-rw-r--r-- 1 mpire mpire 47565 2010-10-23 14:03 67a9869337a777c7c6a8411fd55e1b39.url
-rw-r--r-- 1 mpire mpire 49034 2010-10-23 14:01 0792cd32f0ef59cbe3592a5c2a7b5744.url
-rw-r--r-- 1 mpire mpire 58397 2010-10-23 14:03 3c973b623e9d27cd36234222f6542788.url
-rw-r--r-- 1 mpire mpire 58981 2010-10-23 14:03 780f0be38cbe0e73de788597cb482af4.url
-rw-r--r-- 1 mpire mpire 59177 2010-10-23 14:01 15dd65994702db5e360704144874f3f8.url
-rw-r--r-- 1 mpire mpire 60043 2010-10-23 14:03 c355cd84d17be2cf405b04fb3663d181.url
-rw-r--r-- 1 mpire mpire 63189 2010-10-23 14:03 43ab8fabf72b5564bc0e8b1ff3fcebe7.url
-rw-r--r-- 1 mpire mpire 70235 2010-10-23 14:01 3a24135855df237d829960a15cd8b170.url
-rw-r--r-- 1 mpire mpire 71536 2010-10-23 14:01 48f80b3fb1c51ea52313fe76b55a3849.url
-rw-r--r-- 1 mpire mpire 76932 2010-10-23 14:03 8c789913e5666e793c38d02001486532.url
-rw-r--r-- 1 mpire mpire 78825 2010-10-23 14:01 49728436874d94d8b1ab0ef17d8b4736.url
-rw-r--r-- 1 mpire mpire 80459 2010-10-23 14:05 c52a68d48d2c23ef846950dec999f084.url
-rw-r--r-- 1 mpire mpire 80459 2010-10-23 14:05 c52a68d48d2c23ef846950dec999f084.url
-rw-r--r-- 1 mpire mpire 83001 2010-10-23 14:00 a963a7357bf480fe24040ecf05e8927d.url
-rw-r--r-- 1 mpire mpire 105473 2010-10-23 14:01 db76aed3163beb6ad49670866045a9d8.url
-rw-r--r-- 1 mpire mpire 109405 2010-10-23 14:01 70da57c0544c122766a9ad6772757f2b.url
-rw-r--r-- 1 mpire mpire 110921 2010-10-23 14:01 579e0f08a036befe374ff8c70126a2bb.url
-rw-r--r-- 1 mpire mpire 111880 2010-10-23 14:00 3091fb91ae4a92b06392acead7170a57.url
-rw-r--r-- 1 mpire mpire 116796 2010-10-23 14:05 a326a52188abc263e2f8804444b48c8c.url
-rw-r--r-- 1 mpire mpire 154209 2010-10-23 14:06 e90211e8f799a88dc70ff83d1aba0748.url
-rw-r--r-- 1 mpire mpire 159089 2010-10-23 14:01 8ef49b62542f8283acba4e573e28ea59.url
-rw-r--r-- 1 mpire mpire 168786 2010-10-23 14:01 487d9c54fbd3e4b760c91dbf5f754d32.url
-rw-r--r-- 1 mpire mpire 212561 2010-10-23 14:03 b3cb42c946716cfe7e31096bd458e9f2.url
-rw-r--r-- 1 mpire mpire 222257 2010-10-23 14:03 f803ac8246dd2ab7dd16f7d28d5b4594.url


The decompressed files are good. I ran into this issue while reading gzipped responses from web servers that compress content and send the compressed data in chunks using Transfer-Encoding: chunked.

The error does not seem to be related to the Java libraries. I saved the gzip content and tried to decompress it with gunzip, which failed as well:

x02:/tmp$ gunzip 0792cd32f0ef59cbe3592a5c2a7b5744.gz 

gzip: 0792cd32f0ef59cbe3592a5c2a7b5744.gz: unexpected end of file
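
Because gunzip also reports an unexpected end of file, the saved stream itself looks truncated. Here is a minimal sketch of "ignoring" the error in Java: it keeps whatever was decompressed before the stream ran out. It assumes the exception in question is java.io.EOFException, which is what GZIPInputStream throws on a premature end of input. The names LenientGunzip and Result are made up for this illustration and are not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: decompress a gzip stream, tolerating a premature end of input.
final class LenientGunzip {

    // Decompressed bytes plus a flag saying whether the input ended early.
    static final class Result {
        final byte[] data;
        final boolean truncated;

        Result(byte[] data, boolean truncated) {
            this.data = data;
            this.truncated = truncated;
        }
    }

    static Result decompress(InputStream raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean truncated = false;
        try (GZIPInputStream gz = new GZIPInputStream(raw)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (EOFException e) {
            // Assumption: the "spurious" error is a premature end of stream.
            // Keep what was decompressed so far and report it to the caller.
            truncated = true;
        }
        // Other failures (for example a ZipException on corrupt data) still propagate.
        return new Result(out.toByteArray(), truncated);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LenientGunzip <file.gz>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Result r = decompress(in);
            System.out.println(r.data.length + " bytes, truncated=" + r.truncated);
        }
    }
}
```

Returning the truncated flag instead of swallowing the exception silently lets the caller decide whether a partial result is acceptable. When only the 8-byte gzip trailer is missing, the data is complete but the CRC check has not run, so the output should still be checked by some other means.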
