import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class MMapFile {

    // Iterates over the lines of the mapped file, starting at a byte offset.
    public class MMapIterator implements Iterator<String> {
        private int offset;

        public MMapIterator(int offset) {
            this.offset = offset;
        }

        public boolean hasNext() {
            return offset < cb.limit();
        }

        public String next() {
            if (offset >= cb.limit())
                throw new NoSuchElementException();
            ByteArrayOutputStream sb = new ByteArrayOutputStream();
            for (; offset < cb.limit(); offset++) {
                byte c = cb.get(offset);
                if (c == '\n') {
                    offset++; // step past the newline
                    break;
                }
                if (c != '\r') { // drop carriage returns from CRLF line endings
                    sb.write(c);
                }
            }
            // Decode the whole line at once so multi-byte characters stay intact.
            return new String(sb.toByteArray(), StandardCharsets.UTF_8);
        }

        public void remove() {
            // The mapping is read-only, so removal is not supported.
            throw new UnsupportedOperationException();
        }
    }

    private ByteBuffer cb;
    long size;
    private long numLines = -1;

    public MMapFile(String file) throws IOException {
        // The mapping stays valid after the channel is closed.
        // A single mapping is limited to Integer.MAX_VALUE bytes (about 2 GB).
        try (FileInputStream in = new FileInputStream(new File(file));
             FileChannel fc = in.getChannel()) {
            size = fc.size();
            cb = fc.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }

    public long getNumLines() {
        if (numLines != -1) return numLines; // cached from an earlier call
        long cnt = 0;
        for (int i = 0; i < size; i++) {
            if (cb.get(i) == '\n')
                cnt++;
        }
        // Count a final line that has no trailing newline.
        if (size > 0 && cb.get((int) size - 1) != '\n')
            cnt++;
        numLines = cnt;
        return cnt;
    }

    public Iterator<String> tail(long lines) {
        if (lines <= 0 || size == 0)
            return new MMapIterator((int) size); // nothing to return
        long cnt = 0;
        long i = size - 1;
        // A trailing newline ends the last line; it does not start a new one.
        if (cb.get((int) i) == '\n')
            i--;
        // Walk backwards until we pass the newline that precedes the first wanted line.
        for (; i >= 0; i--) {
            if (cb.get((int) i) == '\n') {
                cnt++;
                if (cnt == lines)
                    break;
            }
        }
        return new MMapIterator((int) i + 1);
    }

    public Iterator<String> head() {
        return new MMapIterator(0);
    }

    static public void main(String[] args) {
        try {
            MMapFile f = new MMapFile("/test.txt");
            // Print the whole file.
            Iterator<String> it = f.head();
            while (it.hasNext()) {
                System.out.println(it.next());
            }
            System.out.println();
            // Print the last two lines.
            it = f.tail(2);
            while (it.hasNext()) {
                System.out.println(it.next());
            }
            System.out.println();
            System.out.println("lines: " + f.getNumLines());
        } catch (IOException e) {
            System.err.println("Could not read /test.txt: " + e.getMessage());
        }
    }
}
The technique is to map the file into memory with java.nio.channels.FileChannel.map from the Java NIO library and then read the file's bytes through the resulting buffer, as if the whole file were an in-memory array. A single mapping cannot exceed Integer.MAX_VALUE bytes, so this approach handles files up to about 2 GB.
For the "tail" function, we walk backwards through the mapped bytes, counting newlines, until we reach the start of the first line to return. The MMapIterator class then provides a convenient way to iterate over the lines from that starting offset.
Care must be taken in one place: MMapIterator.next() has to convert the bytes of each line to a string using the right encoding. We decode as UTF-8; if your input file uses a different encoding, change it there.
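To make the encoding point concrete, here is a minimal sketch that is separate from the MMapFile class. The class name LineDecoder and the method lineAt are made up for illustration. It gathers the bytes of one line first and decodes them in a single step with a charset chosen by the caller, so a multi-byte character is never split. It assumes an encoding in which a newline is the single byte 0x0A (UTF-8 and ISO-8859-1 qualify, UTF-16 does not).

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of MMapFile: reads one line from a buffer
// and decodes it with the charset the caller asks for.
class LineDecoder {
    // Returns the line starting at 'offset', without its line terminator.
    static String lineAt(ByteBuffer buf, int offset, Charset charset) {
        int end = offset;
        while (end < buf.limit() && buf.get(end) != '\n') {
            end++;
        }
        int len = end - offset;
        // Drop a trailing '\r' so CRLF files decode the same as LF files.
        if (len > 0 && buf.get(end - 1) == '\r') {
            len--;
        }
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[i] = buf.get(offset + i);
        }
        // Decode all bytes of the line together.
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Sample data encoded as ISO-8859-1 rather than UTF-8.
        byte[] data = "caf\u00e9\r\nna\u00efve\n".getBytes(StandardCharsets.ISO_8859_1);
        ByteBuffer buf = ByteBuffer.wrap(data);
        System.out.println(lineAt(buf, 0, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the Charset as a parameter keeps the decoding decision in one visible place, which is the part the paragraph above says needs changing for non-UTF-8 input.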
12 comments:
Hi,
Thanks for this blog. I was looking to write a program with these features.
Can you please help me with how to execute this program? Should I create a project in Eclipse? Where can I create a directory to keep the file to read?
Regards
Guru
You don't need a project. Save this file and use javac to compile it.
Hi Thushara,
Thank you so much for your response.
I will try that out. I want to read only the newly appended lines. Assume I am going to read a log file. I need to read only the latest logs.
Please help. Share your thoughts.
Regards
Guru
Do you want to keep reading lines as the file is being written?
Yes. But when reading again, I should only get the newly written lines.
Does your log file periodically get rolled, so that a new log file by the same name is created?
Hi
I would like to read files as they are being written. For example, I have a reader program that will run every 10 minutes. This program will have to "remember" where it left off the last time it read the log file. It has to pick up from where it left off and read the lines newly added to the file, which has grown since.
Does your program do that?
@llango - Is it possible to convert your program into a server? Meaning, it doesn't start/stop but always runs? Then it can keep the file offset in memory, and every 10 minutes read and see if the pointer has advanced.
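A rough sketch of that long-running idea, under stated assumptions: the class name LogFollower and its poll method are hypothetical, the log is UTF-8, and the file only grows (log rolling is not handled here). The read position lives in a field, and each poll maps only the bytes that were added since the previous poll.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a long-running follower that keeps its read
// position in memory between polls.
class LogFollower {
    private final Path log;
    private long offset = 0; // bytes of the file already handed out

    LogFollower(Path log) {
        this.log = log;
    }

    // Returns the complete lines appended since the previous call ("" if none).
    synchronized String poll() throws IOException {
        try (FileChannel fc = FileChannel.open(log, StandardOpenOption.READ)) {
            long size = fc.size();
            if (size <= offset) {
                return ""; // nothing new (a rolled file is not detected here)
            }
            // Map only the unread region; one mapping is limited to about 2 GB.
            ByteBuffer buf = fc.map(FileChannel.MapMode.READ_ONLY, offset, size - offset);
            // Stop at the last newline so a half-written line waits for the next poll.
            int end = buf.limit();
            while (end > 0 && buf.get(end - 1) != '\n') {
                end--;
            }
            byte[] bytes = new byte[end];
            buf.get(bytes);
            offset += end;
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        LogFollower follower = new LogFollower(Paths.get(args[0]));
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Check the file every 10 minutes, as in the scenario described above.
        timer.scheduleAtFixedRate(() -> {
            try {
                System.out.print(follower.poll());
            } catch (IOException e) {
                System.err.println("poll failed: " + e.getMessage());
            }
        }, 0, 10, TimeUnit.MINUTES);
    }
}
```

Run it with the log path as the only argument. The process stays alive because the scheduler thread is not a daemon thread.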
Hi Thushara
I have this strange requirement that my program cannot be a server. It cannot be running continuously.
Does this make sense?
In that case, you would have to save the offset of the last line read to disk. Every time your program runs, you could use a memory mapping technique to start reading from that offset.
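A minimal sketch of that approach, with several assumptions labeled: the class ResumableReader, its method, and the side file that stores the offset are all hypothetical names; the log is UTF-8; and a log that has become smaller than the saved offset is assumed to have been rolled, so reading restarts from the beginning. Each run maps only the unread part of the file and then writes the new offset back to disk.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: returns the lines appended to a log since the last run,
// remembering the position in a small side file between runs.
class ResumableReader {
    static List<String> readNewLines(Path log, Path offsetFile) throws IOException {
        long start = 0;
        if (Files.exists(offsetFile)) {
            String saved = new String(Files.readAllBytes(offsetFile), StandardCharsets.UTF_8);
            start = Long.parseLong(saved.trim());
        }
        List<String> lines = new ArrayList<>();
        try (FileChannel fc = FileChannel.open(log, StandardOpenOption.READ)) {
            long size = fc.size();
            if (size < start) {
                start = 0; // assumption: a smaller file means the log was rolled
            }
            long consumed = 0;
            if (size > start) {
                // Map only the unread tail; one mapping is limited to about 2 GB.
                MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_ONLY, start, size - start);
                byte[] bytes = new byte[buf.remaining()];
                buf.get(bytes);
                int lineStart = 0;
                for (int i = 0; i < bytes.length; i++) {
                    if (bytes[i] == '\n') {
                        int len = i - lineStart;
                        if (len > 0 && bytes[i - 1] == '\r') {
                            len--; // drop the '\r' of a CRLF line ending
                        }
                        lines.add(new String(bytes, lineStart, len, StandardCharsets.UTF_8));
                        lineStart = i + 1;
                    }
                }
                consumed = lineStart; // a partial last line is left for the next run
            }
            // Persist the position of the first unread byte for the next run.
            Files.write(offsetFile, Long.toString(start + consumed).getBytes(StandardCharsets.UTF_8));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the log file, args[1] is the file that stores the offset.
        for (String line : readNewLines(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println(line);
        }
    }
}
```

Only complete lines advance the saved offset, so a line that is still being written when the program runs is returned whole on a later run instead of being split in two.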