d2r diego's weblog
php's simplexml

One of my favorite features of PHP 5 is SimpleXML. In brief, it maps the structure of an XML document to variables within PHP, with subnodes as variables within the root object (recursively) and the text value of a node mapped (as you'd expect) to the value of the variable. For example, the code in PHP 5 to parse RSS 2.0 and Atom 0.3 feeds (maintaining a relatively clean structure, and just parsing for the basics) is super straightforward. Observe!
<?php
// each individual entry in the feed
class FeedEntry {
    public $title;
    public $summary;
    public $content;
    public $created;
    public $modified;
    public $issued;
    public $id;
    public $link;

    final function __construct() {
    }
}
// the feed (including SimpleXML parsing)
class Feed {
    public $xmlContent;
    // the resulting parsed entries
    public $entries = array();
    // the document information
    public $title;
    public $link;
    public $description;
    public $author;
    public $created;
    public $modified;
    public $issued;

    final function __construct($xmlcontent) {
        $this->xmlContent = $xmlcontent;
    }
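    function parse()
    {
        $xml = simplexml_load_string($this->xmlContent);
        if ($xml === false) {
            // not well-formed XML: leave the feed empty
            return;
        }
        // the version attribute of the root element tells the two formats apart
        $version = (string) $xml['version'];
        if ($version == '2.0') { // rss 2.0
            $this->parseRSSFeed($xml);
        }
        else if ($version == '0.3') { // atom 0.3
            $this->parseAtomFeed($xml);
        }
    }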
    function parseRSSFeed($xml) {
        // lastBuildDate is used for all three date fields;
        // the casts store plain strings rather than SimpleXMLElement objects
        $this->created = (string) $xml->channel->lastBuildDate;
        $this->issued = (string) $xml->channel->lastBuildDate;
        $this->modified = (string) $xml->channel->lastBuildDate;
        $this->title = (string) $xml->channel->title;
        $this->description = (string) $xml->channel->description;
        $this->link = (string) $xml->channel->link;
        foreach ($xml->channel->item as $item) {
            $this->parseRSSEntry($item);
        }
    }
    function parseRSSEntry($entryToParse) {
        // cast first: a strict comparison (!==) between a SimpleXMLElement
        // and a string is always true, whatever the node contains
        $description = (string) $entryToParse->description;
        $title = (string) $entryToParse->title;
        if ($description == '' && $title == '') {
            // nothing usable in this item
            return;
        }
        if ($title == '') {
            // no title: fall back to the start of the description
            $title = substr(strip_tags($description), 0, 50) . '...';
        }
        $entry = new FeedEntry();
        $entry->title = $title;
        $entry->summary = $description;
        $entry->content = $description;
        // pubDate is used for all three date fields
        $entry->created = (string) $entryToParse->pubDate;
        $entry->issued = (string) $entryToParse->pubDate;
        $entry->modified = (string) $entryToParse->pubDate;
        $entry->link = (string) $entryToParse->link;
        $entry->id = (string) $entryToParse->guid;
        $this->entries[] = $entry;
    }
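    function parseAtomFeed($xml) {
        $this->created = (string) $xml->created;
        $this->issued = (string) $xml->issued;
        $this->modified = (string) $xml->modified;
        $this->title = (string) $xml->title;
        // in Atom 0.3 the URL lives in the href attribute of the link
        // element, not in its text; this reads the first link element
        $this->link = (string) $xml->link['href'];
        $this->description = (string) $xml->tagline;
        foreach ($xml->entry as $entry) {
            $this->parseAtomEntry($entry);
        }
    }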
    function parseAtomEntry($entryToParse) {
        $entry = new FeedEntry();
        $entry->title = (string) $entryToParse->title;
        $entry->summary = (string) $entryToParse->summary;
        $entry->content = (string) $entryToParse->content;
        $entry->created = (string) $entryToParse->created;
        $entry->issued = (string) $entryToParse->issued;
        $entry->modified = (string) $entryToParse->modified;
        // as with the feed, the entry URL is in the href attribute
        $entry->link = (string) $entryToParse->link['href'];
        $entry->id = (string) $entryToParse->id;
        $this->entries[] = $entry;
    }
}
?>
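For anyone who hasn't played with SimpleXML yet, here is the mapping the class above leans on, shown in isolation. This is just a minimal sketch: the sample document and the variable names ($sample, $version, $feedTitle, $itemTitles) are made up for illustration and don't come from any real feed.

```php
<?php
// Hypothetical sample document, made up for this sketch.
$sample = <<<XML
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($sample);
if ($xml === false) {
    exit("Could not parse the sample XML\n");
}

// Attributes of a node are read with array syntax.
$version = (string) $xml['version'];

// Child nodes are read as properties; casting gives the node's text value.
$feedTitle = (string) $xml->channel->title;

// Repeated child nodes can be walked with foreach.
$itemTitles = array();
foreach ($xml->channel->item as $item) {
    $itemTitles[] = (string) $item->title;
}

echo $version, "\n";
echo $feedTitle, "\n";
echo implode(', ', $itemTitles), "\n";
```

The Feed class does nothing more exotic than this: attribute access to pick the format, property access to pull out the fields, and foreach over the repeated nodes.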
This code can then be used as follows:
$feed = new Feed($xmlcontent);
$feed->parse();
// print the title of each feed entry, one per line
foreach ($feed->entries as $entry) {
    echo $entry->title, "\n";
}
Where $xmlcontent is obtained through, say, a curl call:
$ch = curl_init();
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_URL, "http://www.dynamicobjects.com/d2r/index.xml");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$xmlcontent = curl_exec($ch);
curl_close($ch);
All of which amounts to quite a lot of interesting functionality in a small set of easy-to-understand code. Does it have limitations? Sure. Is it enough for many XML parsing jobs? Yes. This is why, btw, PHP is so appropriate for building web stuff where there are so many user-driven elements and "point solutions" (taken to the extreme, one per page in a website). Who needs reuse when (re)coding is so easy? :)

PS: I would like to read the following statement which I am making of my own free will and without being coerced in any (ouch!) way: Reuse is good, particularly with Java, and no one should ever say that it's not, lest they find themselves in the depths of hell, or possibly a Standards Body terminology & rules meeting. In summary: Java, good. PHP, bad. (Ruby, great.)

PS2: Those that can quote something else from the Simpsons episode which the previous PS is spoofing get extra credit.

PS3: Those who react to PS1 above without knowing the context of the Simpsons episode should refrain from commenting. There's a fine line between good and bad references, a fine line indeed, and while it's never hard to know when it's been crossed, I'd rather not be made aware of the distance traveled since.

Categories: soft.dev
Posted by diego on August 15 2005 at 11:40 PM
Copyright © Diego Doval 2002-2007.