Wednesday, February 04, 2009
Java HTMLParser - handling EncodingChangeException
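HTMLParser starts reading a web page using the encoding given in the HTTP header, or a default encoding if none is given. If it then finds a META tag that declares a different charset, and the characters it has already scanned would decode differently under that charset, it throws an EncodingChangeException.
The correct way to handle this exception is to reset the parser and parse again. The parser keeps the newly detected encoding, so the second pass reads the page from the beginning with the right charset and completes without throwing the exception. Note that reset() re-reads the content already fetched rather than downloading the page again.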
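Example:
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.TitleTag;
import org.htmlparser.util.EncodingChangeException;
import org.htmlparser.util.NodeList;

// 'parser' is an existing org.htmlparser.Parser; the enclosing method must declare ParserException.
NodeList nodes = null;
try {
    nodes = parser.extractAllNodesThatMatch(new NodeClassFilter(TitleTag.class));
} catch (EncodingChangeException ex) {
    // The parser has switched to the new encoding: rewind and parse again.
    parser.reset();
    nodes = parser.extractAllNodesThatMatch(new NodeClassFilter(TitleTag.class));
}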