Now blogging at diego's weblog. See you over there!

back to PST!

I just realized that the weblog was still "on" GMT (where I have spent most of the last 4 years) but now it's time to switch that, too. And done!

Sure, that was easy. But now I wonder. MT seemed intent on rebuilding all entries based on the new target time, but that was just appearances: it offered a rebuild but did nothing of the sort. No changes. All the dates that were wrong are still wrong. I wonder if there's a button to push somewhere. Has the timezone info been lost?

And what's the right etiquette? Plus, the timezone in itself contains some interesting information (for me at least), but only if it's accurate. Accuracy there would be hard to guarantee, though: the browser can obtain some of that information from the OS, but even that isn't always accurate... Hm. Now if a GPS were permanently connected and seamlessly feeding data into the system...

Categories: technology
Posted by diego on August 15, 2005 at 11:42 PM

php's simplexml

One of my favorite features of PHP 5 is SimpleXML. In brief, it maps the structure of an XML document to variables within PHP: subnodes become variables within the root object (recursively), and the text value of a node becomes (as you'd expect) the value of the variable. For example, the code in PHP 5 to parse RSS 2.0 and Atom 0.3 feeds (maintaining a relatively clean structure, and just parsing for the basics) is super straightforward.
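Before the full parser, here is a minimal sketch of that mapping. The sample document is made up for illustration and is not a real feed; the point is only that child elements are read as properties, attributes with array syntax, and a string cast yields the text value.

```php
<?php
// Hypothetical sample document, invented for this sketch.
$sample = '<feed version="0.3">'
        . '<title>Example</title>'
        . '<entry><title>First</title></entry>'
        . '<entry><title>Second</title></entry>'
        . '</feed>';

$xml = simplexml_load_string($sample);

// Attributes of a node are read with array syntax.
$version = (string) $xml['version'];

// Child elements are read as properties; cast to string to get the text.
$feedTitle = (string) $xml->title;

// Repeated children can be iterated directly.
$entryTitles = array();
foreach ($xml->entry as $entry) {
    $entryTitles[] = (string) $entry->title;
}
```

Without the cast, each value is still a SimpleXMLElement object rather than a plain string, which matters once the values are stored or compared strictly.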

Observe!

<?php

  //each individual entry in the feed
  class FeedEntry {
    public $title;
    public $summary;
    public $content;
    public $created;
    public $modified;
    public $issued;
    public $id;
    public $link;
  	
    final function __construct() {
    }
  }

  //the feed (including simpleXML parsing)
  class Feed {
    public $xmlContent;
    //the resulting parsed entries
    public $entries = array();
	
    //the document information
    public $title;
    public $link;
    public $description;
    public $author;
    public $created;
    public $modified;
    public $issued;
		 
    final function __construct($xmlcontent) {
      $this->xmlContent = $xmlcontent;
    }
		 
    function parse()
    {
      $xml = simplexml_load_string($this->xmlContent);
      if ($xml === false) { //not well-formed XML, nothing to parse
        return;
      }
      if ($xml['version'] == '2.0') { //rss 2.0
        $this->parseRSSFeed($xml);
      }
      else if ($xml['version'] == '0.3') { //atom 0.3
        $this->parseAtomFeed($xml);
      }
    }
		 
    function parseRSSFeed($xml) {
      $this->created = $xml->channel->lastBuildDate;
      $this->issued = $xml->channel->lastBuildDate;
      $this->modified = $xml->channel->lastBuildDate;
      $this->title = $xml->channel->title;
      $this->description = $xml->channel->description;
      $this->link = $xml->channel->link;
      foreach ($xml->channel->item as $item) {
       $this->parseRSSEntry($item);
      }
    }
		 
    function parseRSSEntry($entryToParse) {
      $entry = new FeedEntry();
      if ($entryToParse->description == '' &&
                  $entryToParse->title == '') {
        return;
      }
      if ($entryToParse->description != '' &&
            $entryToParse->title == '') {
        //no title: fall back to the first 50 characters of the description
        $title = substr(strip_tags((string) $entryToParse->description), 0, 50) . '...';
      }
      else { 
        $title = $entryToParse->title;
      }

      $entry->title = $title;
      $entry->summary = $entryToParse->description;
      $entry->content = $entryToParse->description;
	      
      $entry->created = $entryToParse->pubDate;
      $entry->issued = $entryToParse->pubDate;
      $entry->modified = $entryToParse->pubDate;
	      
      $entry->link =  $entryToParse->link;
      $entry->id = $entryToParse->guid;
		 		
      array_push($this->entries, $entry);
    }
		 
    function parseAtomFeed($xml) {
      $this->created = $xml->created;
      $this->issued = $xml->issued;
      $this->modified = $xml->modified;
      $this->title = $xml->title;
      //atom 0.3 keeps the URL in the href attribute of <link>, not in its text
      $this->link = $xml->link['href'];
      $this->description = $xml->tagline;
		 		
      foreach ($xml->entry as $entry) {
        $this->parseAtomEntry($entry);
      }
    }


    function parseAtomEntry($entryToParse) {
      $entry = new FeedEntry();

      $entry->title = $entryToParse->title;
      $entry->summary = $entryToParse->summary;
      $entry->content = $entryToParse->content;
	      
      $entry->created = $entryToParse->created;
      $entry->issued = $entryToParse->issued;
      $entry->modified = $entryToParse->modified;
	      
      //as with the feed itself, the URL is in the href attribute of <link>
      $entry->link = $entryToParse->link['href'];
      $entry->id = $entryToParse->id;
		 		
      array_push($this->entries, $entry);
    }
  }

?>

This code can then be used as follows:

  $feed = new Feed($xmlcontent);
  $feed->parse();

  //print the title of each feed entry
  foreach ($feed->entries as $entry) {
    echo $entry->title;
  }

Where $xmlcontent is obtained through, say, a curl call:

  $ch = curl_init();

  curl_setopt($ch, CURLOPT_HTTPGET, 1);
  curl_setopt($ch, CURLOPT_URL, "http://www.dynamicobjects.com/d2r/index.xml");
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $xmlcontent = curl_exec($ch);

  curl_close($ch);

All of which amounts to quite a lot of interesting functionality in a small amount of easy-to-understand code.

Does it have limitations? Sure. Is it enough for many XML parsing jobs? Yes.
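One limitation worth a sketch is namespaced elements: plain property access does not see prefixed children such as dc:creator, so they have to be reached through children() with the namespace URI. The sample item below is hypothetical, and this is an illustration of the general behavior rather than something the parser above handles.

```php
<?php
// Hypothetical RSS item carrying a Dublin Core creator element.
$sample = '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">'
        . '<channel><item>'
        . '<title>Hello</title>'
        . '<dc:creator>diego</dc:creator>'
        . '</item></channel></rss>';

$xml  = simplexml_load_string($sample);
$item = $xml->channel->item;

// Property access only sees un-prefixed children, so this comes back empty.
$viaProperty = (string) $item->creator;

// children() with the namespace URI exposes the prefixed elements.
$dc = $item->children('http://purl.org/dc/elements/1.1/');
$viaNamespace = (string) $dc->creator;
```

Here $viaProperty ends up as an empty string while $viaNamespace holds the creator's name, so a parser that wants such fields needs one extra call per namespace.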

This is why, btw, PHP is so appropriate for building web stuff where there are so many user-driven elements and "point solutions" (taken to the extreme, one per page in a website).

Who needs reuse when (re)coding is so easy? :)

PS: I would like to read the following statement which I am making of my own free will and without being coerced in any (ouch!) way: Reuse is good, particularly with Java, and no one should ever say that it's not, lest they find themselves in the depths of hell, or possibly a Standards Body terminology & rules meeting. In summary: Java, good. PHP, bad. (Ruby, great.)

PS2: Those that can quote something else from the Simpsons episode which the previous PS is spoofing get extra credit.

PS3: Those who react to PS1 above without knowing the context of the Simpsons episode should refrain from commenting. There's a fine line between good and bad references, a fine line indeed, and while it's never hard to know when it's been crossed, I'd rather not be made aware of the distance traveled since.
Categories: soft.dev
Posted by diego on August 15, 2005 at 11:40 PM

Copyright © Diego Doval 2002-2011.