Here is a basic HTML parser that I've used in several projects recently. The code depends on libxml2 and is basically a thin wrapper that provides a more convenient interface for parsing HTML with Objective-C. It has only been tested on iPhone OS 3.1.3 and 3.2; if you're using OS X, you're probably better off investigating WebKit to manipulate your DOM.
The code below is provided under an MIT license, but if you do make any updates it would be great if you could send them back.
Usage:
//Example to download Google's source and print out the URLs of all the images
NSError * error = nil;
HTMLParser * parser = [[HTMLParser alloc] initWithContentsOfURL:[NSURL URLWithString:@"http://www.google.com"] error:&error];
if (error) {
    NSLog(@"Error: %@", error);
    return;
}
HTMLNode * bodyNode = [parser body]; //Find the body tag
NSArray * imageNodes = [bodyNode findChildTags:@"img"]; //Get all the <img> tags
for (HTMLNode * imageNode in imageNodes) { //Loop through all the tags
    NSLog(@"Found image with src: %@", [imageNode getAttributeNamed:@"src"]); //Echo the src=""
}
[parser release];
You can grab a copy of the code here on GitHub:
http://github.com/zootreeves/Objective-C-HMTL-Parser
git@github.com:zootreeves/Objective-C-HMTL-Parser.git
Note:
-(NSString*)rawContents does not work. To dump the entire HTML contents of a node, use NSString * rawContentsOfNode(xmlNode * node, htmlDocPtr doc) instead.
Hi Ben,
On GitHub there seems to be a problem with your latest check-in: HTMLNode.h has not been updated to match the code change in HTMLNode.m.
The mod required to HTMLNode.h is:
//Returns the contents including html tags
-(NSString*)rawContents;
// JA
//NSString * rawContentsOfNode(xmlNode * node, htmlDocPtr doc);
NSString * rawContentsOfNode(xmlNode * node);
There is also a mistake in the latest HTMLNode.m: there is no function called hxmlNodeDumpOutput.
The mod required to HTMLNode.m is:
// JA
//hxmlNodeDumpOutput(buf, node->doc, node, 3, 0, NULL);
xmlNodeDumpOutput(buf, node->doc, node, 3, 0, NULL);
Hope this helps.
Cheers
James