This caused problems in the production process. I had set a limit of a few megabytes on all fetches and had assumed that no single fetch could amount to more than a few megabytes. This was the first time I had seen such a high decompression ratio, and the oversized decompressed content caused a subsequent file mapping to fail for lack of memory.
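One way to guard against this is to cap the decompressed size while inflating, not just the size of the download itself. Here is a minimal Python sketch of that idea, assuming the response body arrives as gzip- or zlib-compressed chunks; the 4 MB cap and the function name are illustrative, not what my fetcher actually uses.

    import zlib

    MAX_DECOMPRESSED = 4 * 1024 * 1024  # illustrative cap on the decompressed size

    def safe_inflate(compressed_chunks, limit=MAX_DECOMPRESSED):
        # wbits=47 lets zlib auto-detect gzip or zlib framing
        d = zlib.decompressobj(47)
        out = bytearray()
        for chunk in compressed_chunks:
            # max_length bounds how many bytes this call may produce,
            # so a highly compressible payload is caught before it fills memory
            out += d.decompress(chunk, limit - len(out) + 1)
            if len(out) > limit:
                raise ValueError("decompressed size exceeds limit; refusing to continue")
        return bytes(out)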
The downloaded content suggested why it compressed so well. The URL was http://www.jeltel.com.au/news.php. Part of this page appears to be dynamically generated; if you examine its source, you will see a marker like this:
<!-- JELTEL_CONTENT_BEGIN -->
The content after that marker appears to be dynamically generated, with markup like this:
<h2></h2> - <br/><h4>... <a href="">read more</a></h4>
In this particular instance, an unusually large amount of fake content had been generated. The downloaded file had just 33 lines, but the last, long line was a huge repeating pattern of:
<a href="">read more</a></h4><br/><br/><h2></h2> - <br/><h4>...
This would of course compress well.
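To see the effect for yourself, a few lines of Python are enough (my own illustration, using zlib directly rather than whatever the server used); the repeat count below is arbitrary and just chosen to produce a few megabytes of input:

    import zlib

    # the repeated unit, copied from the snippet above
    pattern = b'<a href="">read more</a></h4><br/><br/><h2></h2> - <br/><h4>... '
    page = pattern * 50000                      # roughly 3 MB of the repeating markup
    packed = zlib.compress(page, 9)
    print(len(page), len(packed), round(len(page) / len(packed)))

Printing the original and compressed sizes side by side makes the compression ratio of such repetitive markup obvious.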