Tuesday, June 30, 2009

Performance Improvement in org.apache.hadoop.io.Text class


I wrote earlier about a performance improvement I made to Hadoop. After discussion with the Hadoop developers, notably Chris Douglas, the change was made to the core org.apache.hadoop.io.Text class itself. This has the added benefit of improving a text-handling class that is used throughout Hadoop, and it avoids the extra memory footprint of an additional OutputStream instance.

This improvement will be available in Hadoop 0.21.0:

Note the difference in YourKit profiling data with the new Text class:
