Tuesday, February 22, 2011

Java code to tail -N a text file

This code lets you iterate over the last N lines of a specified file. It also has a "head" method, which iterates over the file from the beginning.
import java.io.*;
import java.nio.channels.FileChannel;
import java.nio.CharBuffer;
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class MMapFile {

    public class MMapIterator implements Iterator<String> {
        private int offset;

        public MMapIterator(int offset) {
            this.offset = offset;
        }
        
        public boolean hasNext() {
            return offset < cb.limit();
        }

        public String next() {
            ByteArrayOutputStream sb = new ByteArrayOutputStream();
            if (offset >= cb.limit())
                throw new NoSuchElementException();
            for (; offset < cb.limit(); offset++) {
                byte c = (cb.get(offset));
                if (c == '\n') {
                    offset++;
                    break;
                }
                if (c != '\r') {
                    sb.write(c);
                }

            }
            try {
                return sb.toString("UTF-8");
            } catch (UnsupportedEncodingException e) {
                // UTF-8 is supported by every JVM, so this should never happen
                throw new IllegalStateException(e);
            }
        }

        public void remove() {
            // the mapping is read-only, so lines cannot be removed
            throw new UnsupportedOperationException();
        }
    }


    private ByteBuffer cb;
    long size;
    private long numLines = -1;
    public MMapFile(String file) throws FileNotFoundException, IOException {
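        // Closing the channel does not invalidate the mapping.
        // map() rejects files larger than Integer.MAX_VALUE bytes.
        try (FileChannel fc = new FileInputStream(new File(file)).getChannel()) {
            size = fc.size();
            cb = fc.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }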
    }

    public long getNumLines() {
        if (numLines != -1) return numLines;  //cache number of lines
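        long cnt = 0;
        for (int i=0; i <size; i++) {
            if (cb.get(i) == '\n')
                cnt++;
        }
        // count a final line that has no trailing newline
        if (size > 0 && cb.get((int)size-1) != '\n')
            cnt++;
        numLines = cnt;
        return cnt;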
    }

    public Iterator<String> tail(long lines) {
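        if (lines <= 0)
            return new MMapIterator((int)size);
        long cnt=0;
        long i=0;
        long end=size;
        // skip one trailing newline so files with and without it behave the same
        if (end > 0 && cb.get((int)end-1) == '\n')
            end--;
        for (i=end-1; i>=0; i--) {
            if (cb.get((int)i) == '\n') {
                cnt++;
                if (cnt == lines)
                    break;
            }
        }
        // i is -1 when the file has fewer lines than requested
        return new MMapIterator((int)i+1);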
    }

    public Iterator<String> head() {
        return new MMapIterator(0);
    }

    static public void main(String[] args) {
        try {
            Iterator<String> it = new MMapFile("/test.txt").head();
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        System.out.println();

        try {
            Iterator<String> it = new MMapFile("/test.txt").tail(2);
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        System.out.println();

        try {
            System.out.println("lines: "+new MMapFile("/test.txt").getNumLines());
        } catch (Exception e) {
            e.printStackTrace();
        }

    }

}

The technique is to map the file into memory with java.nio.channels.FileChannel.map from the Java NIO library and then read the file's bytes by index, as if they were an in-memory array.
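
To show the mapping step on its own, here is a minimal sketch. The class and method names (MapSketch, mapReadOnly, countNewlines) are made up for illustration and are not part of the listing above. It assumes the file fits in a single mapping, which is limited to Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapSketch {
    // Maps the whole file read-only. The returned buffer stays valid
    // after the channel is closed by the try-with-resources block.
    static MappedByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel fc = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = fc.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file too large for one mapping: " + size);
            }
            return fc.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }

    // Reads the mapped bytes by absolute index and counts '\n' bytes.
    static long countNewlines(Path path) throws IOException {
        MappedByteBuffer buf = mapReadOnly(path);
        long count = 0;
        for (int i = 0; i < buf.limit(); i++) {
            if (buf.get(i) == '\n') {
                count++;
            }
        }
        return count;
    }
}
```

Because the operating system pages the file in on demand, nothing is copied into the Java heap until a byte is actually read.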

For the "tail" function, we walk back the mapped bytes counting newlines. The MMapIterator class conveniently provides a way to iterate over lines once we find the starting line.
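
The backward walk can be sketched separately from the file handling. The helper below is hypothetical (TailOffsetSketch and tailOffset are names invented here) and works on any ByteBuffer. It assumes lines end in '\n' and ignores one trailing newline, so a file that ends with a newline and one that does not give the same answer.

```java
import java.nio.ByteBuffer;

public class TailOffsetSketch {
    // Returns the index of the first byte of the last `lines` lines.
    // Returns 0 when the buffer holds fewer lines than requested.
    static int tailOffset(ByteBuffer buf, long lines) {
        if (lines <= 0) {
            return buf.limit(); // nothing requested: start at the end
        }
        int end = buf.limit();
        // Ignore one trailing newline; it terminates the last line
        // rather than starting a new one.
        if (end > 0 && buf.get(end - 1) == '\n') {
            end--;
        }
        long seen = 0;
        for (int i = end - 1; i >= 0; i--) {
            if (buf.get(i) == '\n' && ++seen == lines) {
                return i + 1; // first byte after the newline
            }
        }
        return 0;
    }
}
```

An iterator that starts at the returned index and reads forward to the end of the buffer then yields exactly the requested lines.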

Care must be taken in the MMapIterator.next() implementation to decode the bytes with the correct character encoding. We use "UTF-8", but if the input file uses a different encoding, this should be changed.
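
One way to make the encoding easy to change is to pass it in as a parameter. This is only a sketch: DecodeSketch and lineAt are names invented for this example. Scanning for the single byte '\n' assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1; it would not work for UTF-16.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class DecodeSketch {
    // Collects the bytes of the line that starts at `offset`, dropping
    // carriage returns, and decodes them with the given charset.
    static String lineAt(ByteBuffer buf, int offset, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = offset; i < buf.limit(); i++) {
            byte b = buf.get(i);
            if (b == '\n') {
                break; // end of line
            }
            if (b != '\r') {
                bytes.write(b);
            }
        }
        // Decode once per line so multi-byte characters are never split.
        return new String(bytes.toByteArray(), charset);
    }
}
```

A caller would pass StandardCharsets.UTF_8 for the behavior of the listing above, or for example StandardCharsets.ISO_8859_1 for a Latin-1 file.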

12 comments:

Anonymous said...

Hi,

Thanks for this blog. I was looking to write a program for doing these features.

Can you please help me execute this program? Should I create a project in Eclipse? Where can I create a directory to keep the file to read?

Regards
Guru

thushara said...

You don't need a project. Save this file and use javac to compile it.

Anonymous said...

Hi Thushara,

Thank you so much for your response.

I will try out that. I want to read the last newly appended lines. Assume I am going to read a log file. I need to read only the latest logs.

Please help. Share your thoughts.

Regards
Guru

thushara said...

Do you want to keep reading lines as the file is being written?

Anonymous said...

Ya. But while reading again, I should only get the newly written lines.

thushara said...

Does your log file periodically get rolled, so that a new log file by the same name is created?

Ilango said...

Hi
I would like to read files as they are being written. For example, I have a read program that will run every 10 minutes. This program will have to "remember" where it left off last time on the log file that it read previously. It has to pick up from where it left off and read the lines newly added to the file that just grew bigger.
Does your program do that?

thushara said...

@Ilango - Is it possible to convert your program into a server? Meaning, it doesn't start/stop but always runs? Then it can keep the file offset in memory, and every 10 minutes read and see if the pointer has advanced.

stlguy said...

Hi Thushara
I have this strange requirement that my program cannot be a server. It cannot be running continuously.
Does this make sense?

thushara said...

In that case, you would have to save the offset of the last line read to disk. Every time your program runs, you could use a memory-mapping technique to read from that offset.
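
A rough sketch of that idea, under several assumptions: the offset is kept as plain text in a small side file, the log is UTF-8, and only complete lines (those ending in '\n') are returned. ResumeSketch, readNewLines and both paths are hypothetical names. A log that has shrunk is treated as rolled and read from the start; a rolled log that has already grown past the saved offset is not detected.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class ResumeSketch {
    // Returns the complete lines appended to `log` since the offset saved
    // in `state`, then saves the new offset for the next run.
    static List<String> readNewLines(Path log, Path state) throws IOException {
        long start = 0;
        if (Files.exists(state)) {
            String saved = new String(Files.readAllBytes(state), StandardCharsets.UTF_8);
            start = Long.parseLong(saved.trim());
        }
        List<String> lines = new ArrayList<>();
        try (FileChannel fc = FileChannel.open(log, StandardOpenOption.READ)) {
            long size = fc.size();
            if (size < start) {
                start = 0; // file shrank: assume it was rolled or truncated
            }
            if (size - start > Integer.MAX_VALUE) {
                throw new IOException("too much new data for one mapping");
            }
            if (size > start) {
                // Map only the bytes added since the last run.
                MappedByteBuffer buf =
                    fc.map(FileChannel.MapMode.READ_ONLY, start, size - start);
                int lineStart = 0;
                for (int i = 0; i < buf.limit(); i++) {
                    if (buf.get(i) == '\n') {
                        byte[] raw = new byte[i - lineStart];
                        for (int j = 0; j < raw.length; j++) {
                            raw[j] = buf.get(lineStart + j);
                        }
                        // Drop carriage returns, as the iterator in the post does.
                        lines.add(new String(raw, StandardCharsets.UTF_8).replace("\r", ""));
                        lineStart = i + 1;
                    }
                }
                start += lineStart; // advance past complete lines only
            }
        }
        Files.write(state, Long.toString(start).getBytes(StandardCharsets.UTF_8));
        return lines;
    }
}
```

A line that is still being written when the program runs is left alone and picked up on the next run, once its newline has arrived.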
