I wrote earlier about a performance improvement I made to Hadoop. After discussion with Hadoop developers, notably Chris Douglas, this change was made to the core org.apache.hadoop.io.Text class. This has the added benefit of improving a core text-handling class that is used widely throughout Hadoop, and it avoids the extra memory footprint of creating an additional OutputStream instance.
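The post does not show the patch itself. As a hedged illustration only, assuming the change concerns how a Text-like class grows its internal byte array when bytes are appended, here is a minimal sketch of appending directly into a growable buffer, with no separate OutputStream used to accumulate the data. The class `GrowableText` and its methods are hypothetical and are not Hadoop's `Text` API. The sketch grows the array to at least double its current length, which keeps the total copying for n appended bytes linear rather than quadratic.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified stand-in for a Text-like byte buffer.
 * This is NOT org.apache.hadoop.io.Text; it is illustrative only.
 */
class GrowableText {
    private byte[] bytes = new byte[0];
    private int length = 0;

    /** Appends len bytes from src, starting at offset start, growing the buffer if needed. */
    void append(byte[] src, int start, int len) {
        ensureCapacity(length + len);
        System.arraycopy(src, start, bytes, length, len);
        length += len;
    }

    /**
     * Grows the backing array to at least 'required' bytes. It grows to
     * double the current length when that is larger, so a long run of
     * small appends copies each byte only a constant number of times
     * on average (amortized), instead of re-copying the whole buffer
     * on every append.
     */
    private void ensureCapacity(int required) {
        if (bytes.length < required) {
            bytes = Arrays.copyOf(bytes, Math.max(required, length << 1));
        }
    }

    int getLength() { return length; }

    int getCapacity() { return bytes.length; }

    /** Returns a copy of only the valid bytes. */
    byte[] copyBytes() { return Arrays.copyOf(bytes, length); }

    public static void main(String[] args) {
        GrowableText t = new GrowableText();
        byte[] chunk = {'a', 'b', 'c', 'd'};
        for (int i = 0; i < 1000; i++) {
            t.append(chunk, 0, chunk.length);
        }
        System.out.println(t.getLength() + " bytes, capacity " + t.getCapacity());
    }
}
```

Because the bytes land directly in the object's own array, no intermediate stream object or second buffer has to be allocated and then copied from.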
This improvement will be available in Hadoop 0.21.0.
Note the difference in the YourKit profiling data with the new Text class:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinBLfck2UUHjmSoV1TAhr4mImdd9m9io29LvEZDE8o_l_SIRDrdSQJCUu9isevBsVnV-x7UCaoZkL47hpaFsOCNieF_aHSEIevo4MdPHIbneesFMSJ3txGvPpN5drGeKHJNc8iow/s400/text-perf-imp.jpg)