Often in iPhone projects I’ve needed to parse XML documents from the internet. I usually have two requirements:
- I want the data to be processed and shown to the user as quickly as possible.
- If the file is large I don’t want to have to store the entire contents in memory.
Using NSXMLParser’s initWithContentsOfURL: method, the XML file is downloaded synchronously with NSURLConnection before parsing starts. If NSXMLParser were able to begin as soon as the stream buffer began to fill, the overall parsing time wouldn’t be reduced significantly, but if the updates are displayed immediately, the time perceived by the user decreases significantly.
Using the excellent Objective-C Expat wrapper library by Robbie Hanson as a base, iPhoneExpat uses CFNetwork and CFHTTPMessage to create an HTTP stream and feed it gradually into Expat. If the server supports gzip compression, zlib is used to decompress the stream on the fly.
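To make the idea of feeding a stream gradually a bit more concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C source file). It is not Expat and not the iPhoneExpat code: `PushScanner` and `scanner_feed` are hypothetical names, and the scanner only counts element start tags, ignoring comments and CDATA. What it shows is the push model, where parser state survives between chunks so results are available before the download finishes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical push-style scanner: state is kept between chunks,
   so a tag split across two network reads is still handled. */
typedef struct {
    int after_lt;   /* 1 if the previous byte was '<' */
    int elements;   /* element start tags seen so far */
} PushScanner;

/* Feed one chunk of bytes, exactly as it arrived from the stream. */
static void scanner_feed(PushScanner *s, const char *chunk, size_t len)
{
    assert(s != NULL && chunk != NULL);
    for (size_t i = 0; i < len; i++) {
        char c = chunk[i];
        if (s->after_lt) {
            /* '</', '<?' and '<!' do not start an element. */
            if (c != '/' && c != '?' && c != '!')
                s->elements++;
            s->after_lt = 0;
        } else if (c == '<') {
            s->after_lt = 1;
        }
    }
}
```

Calling `scanner_feed` once per buffer read means the first elements are counted after the first chunk, long before the last one arrives. A real parser such as Expat follows the same pattern, only with a full XML state machine behind the feed call.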
iPhoneExpat offers three methods for initialization:
- (id)initWithContentsOfURL:(NSURL *)url;
- (id)initWithContentsOfFile:(NSString *)path;
- (id)initWithData:(NSData *)data;
The delegate messages are also a drop-in replacement for those of NSXMLParser:
@protocol ExpatXMLParserDelegate
@optional
- (void)parserDidStartDocument:(ExpatXMLParser*)parser;
- (void)parserDidEndDocument:(ExpatXMLParser*)parser;
- (void)parser:(ExpatXMLParser*)parser didStartMappingPrefix:(NSString *)prefix toURI:(NSString *)namespaceURI;
- (void)parser:(ExpatXMLParser*)parser didEndMappingPrefix:(NSString *)prefix;
- (void)parser:(ExpatXMLParser*)parser foundComment:(NSString *)comment;
- (void)parser:(ExpatXMLParser*)parser foundProcessingInstructionWithTarget:(NSString *)target data:(NSString *)data;
- (void)parser:(ExpatXMLParser*)parser parseErrorOccurred:(NSError *)parseError;
- (BOOL)parser:(ExpatXMLParser*)parser shouldProcessAttributesForElement:(NSString *)elementName;
@required
- (void)parser:(ExpatXMLParser*)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict;
- (void)parser:(ExpatXMLParser*)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName;
- (void)parser:(ExpatXMLParser *)parser foundCharacters:(NSString *)string;
@end
However,
- (void)parser:(ExpatXMLParser *)parser foundCharacters:(NSString *)string;
will report the entire contents of a tag, so you don’t need an NSMutableString to buffer fragments in your delegate, e.g.
- (void)parser:(ExpatXMLParser *)parser foundCharacters:(NSString *)string
{
    // Retain the new value before releasing the old one,
    // in case both point to the same object.
    [string retain];
    [buffer release];
    buffer = string;
}
This should be good enough in 99% of cases to get the entire contents of a tag.
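Expat's character data handler may deliver a single run of text in several calls, so a wrapper that wants to report the whole contents of a tag has to collect the fragments itself. The following is only a sketch of one way to do that in plain C, not the actual iPhoneExpat implementation; `TextBuffer`, `text_append` and `text_take` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer for the text of the current element. */
typedef struct {
    char  *data;  /* NUL-terminated text collected so far, or NULL */
    size_t len;   /* bytes used, not counting the terminator */
    size_t cap;   /* bytes allocated */
} TextBuffer;

/* Append one fragment. Returns 0 on success, -1 if allocation fails. */
static int text_append(TextBuffer *b, const char *frag, size_t n)
{
    assert(b != NULL);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n + 1)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    if (n > 0)
        memcpy(b->data + b->len, frag, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* At element end: hand over the complete text and reset the buffer.
   The caller owns the returned pointer and must free() it.
   Returns NULL if no text was collected. */
static char *text_take(TextBuffer *b)
{
    assert(b != NULL);
    char *out = b->data;
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
    return out;
}
```

In this sketch the wrapper would call `text_append` from the character data callback and `text_take` when the element ends, then pass the result to the delegate once as a single string.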
The project can be found on GitHub, where you can also check out the code using git:
http://github.com/zootreeves/iPhoneExpat
Benchmark test output:
I’ve included a test application which sequentially downloads and parses the following feeds:
@"http://ax.itunes.apple.com/WebObjects/MZStoreServices.woa/ws/RSS/topalbums/sf=143441/limit=300/explicit=true/xml",
@"http://ax.itunes.apple.com/WebObjects/MZStoreServices.woa/ws/RSS/topalbums/sf=143441/limit=300/explicit=true/xml",
@"http://feeds.feedburner.com/DilbertDailyStrip",
@"http://designsponge.blogspot.com/atom.xml",
@"http://www.slate.com/rss/",
@"http://rssfeeds.usatoday.com/UsatodaycomBooks-TopStories",
@"http://googleblog.blogspot.com/atom.xml",
@"http://api.flickr.com/services/feeds/groups_pool.gne?id=61057342@N00&lang=en-us&format=rss_200",
@"http://phobos.apple.com/WebObjects/MZStore.woa/wpa/MRSS/topsongs/limit=25/rss.xml",
@"http://www.readwriteweb.com/rss.xml",
@"http://rssfeeds.usatoday.com/UsatodaycomNation-TopStories",
@"http://dictionary.reference.com/wordoftheday/wotd.rss",
@"http://www.quotationspage.com/data/qotd.rss",
@"http://sports.espn.go.com/espn/rss/news"
Time to reach the first element Expat: 8.611217 — NSXMLParser: 15.404100
Total time for Expat: 10.902272 — NSXMLParser: 17.108309
Memory usage is also reduced by a factor of about 4 (see screenshots).
Further Considerations
- It would be great if HTTP requests were persistent. However, after setting kCFStreamPropertyHTTPAttemptPersistentConnection to true, I was still able to intercept “connection close” messages between requests to the same server using HTTPScoop. I think I must be doing something wrong here.
- It would be preferable if as many objects as possible were released manually rather than through an autorelease pool. For example, in the delegate methods that pass an elementName, it is common for the client to use this string to determine the current tag, but rare that they actually want to retain it. The code could be altered so these strings are not placed in a pool and are released immediately after the delegate message is sent. Not only would the overhead of collection be avoided, but CFStringCreateWithCharactersNoCopy could be used as well.
- With UTF-8, creating strings with [NSString stringWithUTF8String:] was no problem. However, when I configured Expat to use UTF-16 characters internally, I found that CFStringCreateWithCharacters would often return garbage. The problem was that I was passing in UINT16_MAX as the buffer length, hoping that the CFString would be terminated at the null character. It doesn’t work that way, and I had to change everything to calculate the buffer length using UniCharStrlen(buffer) beforehand.
- FTP (CFFTPStream) and file streams could easily be supported.
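For the UTF-16 point above, the length calculation could look roughly like the following plain C sketch. The name `utf16_strlen` is hypothetical and merely stands in for the UniCharStrlen helper mentioned in the list; it assumes the buffer really is terminated by a 0 code unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the UTF-16 code units before the terminating 0,
   the way strlen counts bytes. A surrogate pair counts as 2. */
static size_t utf16_strlen(const uint16_t *s)
{
    assert(s != NULL);
    const uint16_t *p = s;
    while (*p != 0)
        p++;
    return (size_t)(p - s);
}
```

The result is a count of code units, which is what CFStringCreateWithCharacters expects as its numChars argument. That function reads exactly that many units and does not stop at a null, which is why passing UINT16_MAX pulls in whatever memory follows the string.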

