TWW - PHP/XML question

quagmire02
All American
44225 Posts

okay, i'm new to parsing/evaluating XML using PHP and this is serving as one of my tests for myself...basically, it's a small site map, outlined in an XML file called sitemap.xml:

<?xml version="1.0" encoding="utf-8"?>
<toc>
<links>
    <link section="about products clients"><a href="/about/">About</a></link>
    <link section="about contact products clients">Products
    <subsection>
        <link section="about contact products clients"><a href="/prod1/">Product 1</a></link>
        <link section="about contact products clients"><a href="/prod2/">Product 2</a></link>
        <link section="about contact products clients"><a href="/prod3/">Product 3</a></link>
    </subsection></link>
    <link section="products clients"><a href="/clients/">Client Login</a></link>
</links>
</toc>

that's not the real site map (because the real one's much longer and has nothing to do with products or clients or anything), but you get the idea...anyway, it's loaded in index.php:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Site Map Test</title>
</head>
<body>

<!-- begin navigation -->
<ul>
    <li><a href="index.php?p=about">About Us</a></li>
    <li><a href="index.php?p=clients">Information for Clients</a></li>
    <li><a href="index.php?p=contact">Contact Information</a></li>
    <li><a href="index.php?p=products">Product Information</a></li>
    <li><a href="index.php">Home</a></li>
</ul>
<!-- end navigation -->

<!-- begin content -->
<?php

// initializes new XML DOM document
$xmldoc = new DOMDocument();

// if the XML fails to load, displays an error
if(!$xmldoc->load("sitemap.xml")) {
    die("Failed to load XML file.");
}

// sets different section display criteria
$PAGE = isset($_GET['p']) ? $_GET['p'] : ""; // "" falls through to the full site map
switch ($PAGE) {
    case "about":
        $title = "About Us";
        $section = "about";
        break;
    case "clients":
        $title = "Information for Clients";
        $section = "clients";
        break;
    case "contact":
        $title = "Contact Information";
        $section = "contact";
        break;
    case "products":
        $title = "Product Information";
        $section = "products";
        break;
    default:
        $title = "Full Site Content";
        $section = "";
        break;
}

// if the section isn't blank (all but full contents)
// searches XML document for all occurrences of section attribute
if ($section != "") {
    foreach($xmldoc->getElementsByTagName('link') as $element) {
        if(!$element->hasAttribute('section') || strpos($element->getAttribute('section'),$section) == false) {
            $element->parentNode->removeChild($element);
        }
    }
}

// converts the filtered XML into a string
$filtered = $xmldoc->saveXML();

// replaces XML tags with HTML tags for display purposes
$filtered = str_replace("links","ul class=\"toc\"",$filtered);
$filtered = str_replace("link","li",$filtered);
$filtered = str_replace("subsection","ul",$filtered);
$filtered = str_replace("/link","/li",$filtered);
$filtered = str_replace("/subsection","/ul",$filtered);
$filtered = preg_replace("/ section=\"[^\"]*\"/","",$filtered);

// displays the title of and the filtered results
echo "<h1>".$title."</h1>".$filtered;

?>
<!-- end content -->

</body>
</html>

i don't get any errors when i open the index.php page...in fact, it shows the full site content just fine...but if i start selecting the links, i still don't get any errors, but i don't get the correct displays either...for example, if i select "about us", everything EXCEPT the "client login" link should show up (because they're all tagged that way), but that's not what happens...i get "products," "product 2," and "client login"

if you copy and paste those two snippets of code into their own files (sitemap.xml and index.php) and put them in their own folder, you can try it out for yourself...i'm sure i'm missing something crucial in my XML parsing, and it may be something VERY stupid on my part, but i don't see it

[Edited on January 28, 2009 at 9:47 AM. Reason : formatting]

1/28/2009 9:46:35 AM

evan
All American
27701 Posts

first of all instead of using strpos i would explode the section attribute by whitespace, then use in_array() in your logic gate

i'm looking at the rest now.
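A minimal sketch of that suggestion, assuming the section attribute is always a space-separated list; inSection() is a hypothetical helper name. Besides avoiding the strpos() return-value trap, a token match won't count "product" as present just because it appears inside "products".

```php
<?php
// Token test for a space-separated attribute such as section="about products clients".
// preg_split with PREG_SPLIT_NO_EMPTY tolerates doubled or leading/trailing spaces;
// explode(' ', ...) does the same job if the attribute always uses single spaces.
function inSection($sectionAttr, $section)
{
    $tokens = preg_split('/\s+/', trim($sectionAttr), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($section, $tokens, true); // strict: compare strings as strings
}

var_dump(inSection('about products clients', 'about'));   // bool(true)
var_dump(inSection('products clients', 'about'));         // bool(false)
var_dump(inSection('about products clients', 'product')); // bool(false): a substring, not a token
```

In the original loop, the test would then read `if (!inSection($element->getAttribute('section'), $section))`.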

1/28/2009 10:22:23 AM

quagmire02
All American
44225 Posts

^ that's a good point...i probably should explode it, instead

but no, it's still not working...where you posted it, if you click on "about us", you get:

About Us
  - Products
      - Product 2
  - Client Login

but you SHOULD get (based on the tagging):

About Us
  - About
  - Products
      - Product 1
      - Product 2
      - Product 3

"client login" isn't tagged as "about", so it shouldn't show up...conversely, "about," "product 1," and "product 3" all SHOULD show up, but they don't

it might be because i'm using strpos(), though...i'm thinking that's the issue, since it seems to skip, but i don't immediately understand WHY...hmmm...
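The skipping pattern can be reproduced outside the page. DOMDocument::getElementsByTagName() returns a live DOMNodeList, and foreach walks it by position, so removing the current <link> slides the next one into the slot that was just visited, and that node never gets tested. The sketch runs the same filter on the sitemap from the first post in two ways (remainingLinks() and its flags are hypothetical, for illustration only): with the original `== false` test on the live list it should leave exactly Products, Product 2 and Client Login, the result described above; with a strict test over a copied array it leaves the five links tagged "about".

```php
<?php
// The sitemap from the first post, inline so the sketch runs on its own.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<toc>
<links>
    <link section="about products clients"><a href="/about/">About</a></link>
    <link section="about contact products clients">Products
    <subsection>
        <link section="about contact products clients"><a href="/prod1/">Product 1</a></link>
        <link section="about contact products clients"><a href="/prod2/">Product 2</a></link>
        <link section="about contact products clients"><a href="/prod3/">Product 3</a></link>
    </subsection></link>
    <link section="products clients"><a href="/clients/">Client Login</a></link>
</links>
</toc>
XML;

// Returns the labels of the <link> elements left after filtering.
// $strict:   true uses "=== false", false keeps the original "== false".
// $snapshot: true copies the live DOMNodeList into a plain array before removing nodes.
function remainingLinks($xml, $section, $strict, $snapshot)
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $live = $doc->getElementsByTagName('link');
    $nodes = $snapshot ? iterator_to_array($live) : $live;
    foreach ($nodes as $element) {
        $pos = strpos($element->getAttribute('section'), $section);
        $missing = $strict ? ($pos === false) : ($pos == false);
        if ($missing) {
            $element->parentNode->removeChild($element);
        }
    }
    $labels = array();
    foreach ($doc->getElementsByTagName('link') as $element) {
        // the first child is either the <a> element or the "Products" text node
        $labels[] = trim($element->firstChild->textContent);
    }
    return $labels;
}

echo implode(', ', remainingLinks($xml, 'about', false, false)), "\n"; // Products, Product 2, Client Login
echo implode(', ', remainingLinks($xml, 'about', true, true)), "\n";   // About, Products, Product 1, Product 2, Product 3
```

Walking the live list backwards (from `$list->length - 1` down to 0 with `item($i)`) is the other common way to remove nodes safely.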

[Edited on January 28, 2009 at 10:29 AM. Reason : ah well, i'll keep this up just for reference]

1/28/2009 10:27:47 AM

evan
All American
27701 Posts

yeah

i think i see it now. the DOM parser parses EVERY tag it sees, not just your XML. it's picking up your <a> tags inside the link tags and removing them (via the removeChild in the if/then because they don't pass the !$element->hasAttribute('section') test).

also, sanitize your superglobals before you use them, son. exec() can do fun things.
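In index.php the switch on $PAGE already behaves like a whitelist: only the four known values change $section, and the raw $_GET['p'] never reaches the output. A minimal sketch that makes the whitelist explicit, assuming that mapping is all the page needs; pageSettings() is a hypothetical name, not the asker's sanitizing function.

```php
<?php
// Hypothetical helper: map the raw ?p= value onto a fixed whitelist, so the
// request value itself is never echoed or passed anywhere else.
function pageSettings($raw)
{
    $pages = array(
        'about'    => 'About Us',
        'clients'  => 'Information for Clients',
        'contact'  => 'Contact Information',
        'products' => 'Product Information',
    );
    // ?p[]=x arrives as an array, so only strings are looked up
    if (is_string($raw) && isset($pages[$raw])) {
        return array($pages[$raw], $raw);
    }
    return array('Full Site Content', ''); // missing or unknown: full site map
}

list($title, $section) = pageSettings(isset($_GET['p']) ? $_GET['p'] : null);
```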

1/28/2009 10:38:48 AM

quagmire02
All American
44225 Posts

Quote :
"it's picking up your <a> tags inside the link tags and removing them (via the removeChild in the if/then because they don't pass the !$element->hasAttribute('section') test)."

i don't think so...the output is showing the <a> tags intact (because $element is set as 'link' only), yes?
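That reading is easy to check: getElementsByTagName('link') matches element names exactly, so a nested <a> never shows up as $element. A tiny sketch with a one-link document:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<links><link section="about"><a href="/about/">About</a></link></links>');

$links = $doc->getElementsByTagName('link');
echo $links->length, "\n";                          // 1: the nested <a> is not in this list
echo $links->item(0)->nodeName, "\n";               // link
echo $doc->getElementsByTagName('a')->length, "\n"; // 1: the <a> is still in the tree
```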

Quote :
"also, sanitize your superglobals before you use them, son. exec() can do fun things."

QFT...i have a sanitizing function that's run on the superglobal inputs...i just didn't include it in here

[Edited on January 28, 2009 at 10:55 AM. Reason : .]

1/28/2009 10:53:40 AM

scud
All American
10804 Posts

you have markup inside of markup and that's just a no-no. If you use a CDATA block you can tell the parser not to treat what's inside as parsed data

Instead of:
<link section="about contact products clients"><a href="/prod1/">Product 1</a></link>

Consider:
<link section="about contact products clients"><[CDATA[<a href="/prod1/">Product 1</a>]]></link>
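As posted, the CDATA opener is missing its "!" (`<[CDATA[` rather than `<![CDATA[`). A "<" followed by "[" can't start an element name, which is consistent with the "StartTag: invalid element name" warning reported later in the thread. A minimal check with the corrected opener:

```php
<?php
// Same link, with the CDATA opener written as "<![CDATA[" and closed with "]]>".
$xml = '<links><link section="about contact products clients">'
     . '<![CDATA[<a href="/prod1/">Product 1</a>]]>'
     . '</link></links>';

$doc = new DOMDocument();
var_dump($doc->loadXML($xml));                      // bool(true)

$link = $doc->getElementsByTagName('link')->item(0);
echo $link->textContent, "\n";                      // <a href="/prod1/">Product 1</a>
echo $doc->getElementsByTagName('a')->length, "\n"; // 0: the anchor is text now, not an element
```

One consequence for index.php: saveXML() writes a CDATA section back out with its `<![CDATA[ ... ]]>` wrapper, so the link HTML would have to be echoed from textContent rather than passed through the str_replace() chain.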

1/28/2009 12:04:04 PM

BigMan157
no u
103381 Posts

for

strpos($element->getAttribute('section'),$section) == false

use === instead of ==

you're getting a return position of 0, which the double equal doesn't differentiate from false but the triple equal does

this should get you partway there

// if the section isn't blank (all but full contents)
// searches XML document for all occurrences of section attribute
if ($section != "") {
    foreach($xmldoc->getElementsByTagName('link') as $element) {
        if(strpos($element->getAttribute('section'),$section)===false) {
            $element->parentNode->removeChild($element);
        }
    }
}

[Edited on January 28, 2009 at 12:21 PM. Reason : might as well just post it]

[Edited on January 28, 2009 at 12:21 PM. Reason : damn user tags]

1/28/2009 12:19:41 PM

quagmire02
All American
44225 Posts

^^ doing that gives me errors:

Quote :
"Warning: DOMDocument::load() [domdocument.load]: StartTag: invalid element name"

^ why do you say "partway"? that seems to do the trick

1/28/2009 12:25:22 PM

BigMan157
no u
103381 Posts

leftover from various edits

1/28/2009 12:36:17 PM

quagmire02
All American
44225 Posts

ah, in that case...many thanks to everyone's help...it's working splendidly, now...i don't think i'd ever have caught the necessity of the third =

1/28/2009 12:37:18 PM

evan
All American
27701 Posts

oh, hah, yeah, that's why i hate strpos.

=== is strict (identical) comparison, so 0 !== false. it matches on both value and type, so a bool can never equal an int.
== is just plain equal, so 0 == false. it doesn't give a crap about types.

strpos returns false if the string wasn't found anywhere, but 0 if it's the first character, which is the case for your "about" stuff.
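The difference is easy to see with the attribute from the About link. (On PHP 8.0 and later, str_contains() returns a plain bool and sidesteps this trap, though it still matches substrings rather than whole words.)

```php
<?php
$attr = 'about products clients';

var_dump(strpos($attr, 'about'));                        // int(0): found at the very start
var_dump(strpos($attr, 'about') == false);               // bool(true): 0 loosely equals false
var_dump(strpos($attr, 'about') === false);              // bool(false): int 0 is not bool false
var_dump(strpos('products clients', 'about') === false); // bool(true): genuinely not found
```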

1/28/2009 12:58:54 PM

Noen
All American
31346 Posts

I know you are just learning this, but this is an incredibly bad way to be parsing XML.

I would very very highly recommend learning to use the XML parser built into PHP5+ http://www.php.net/xml

It's a royal pain in the ass to learn and set up for small parsing jobs like the one you're doing here, but if you plan on doing anything real with XML, it will quickly save you a ton of time and headaches in the long run.
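For comparison, a rough sketch of the event-based API at that link (xml_parser_create() and friends): instead of building a tree and pruning it, handlers fire on each start and end tag. linksForSection() is a hypothetical name, and it only collects href values; building the nested list would need more of the same stack bookkeeping.

```php
<?php
// Collects the href of every <a> whose nearest enclosing <link> lists $section.
function linksForSection($xml, $section)
{
    $stack = array(); // one entry per open <link>: does it list $section?
    $hrefs = array();

    $parser = xml_parser_create('UTF-8');
    // element names are upper-cased by default; keep them as written
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack, &$hrefs, $section) {
            if ($name === 'link') {
                $attr = isset($attrs['section']) ? $attrs['section'] : '';
                $tokens = preg_split('/\s+/', trim($attr), -1, PREG_SPLIT_NO_EMPTY);
                $stack[] = in_array($section, $tokens, true);
            } elseif ($name === 'a' && $stack && end($stack) && isset($attrs['href'])) {
                $hrefs[] = $attrs['href'];
            }
        },
        function ($p, $name) use (&$stack) {
            if ($name === 'link') {
                array_pop($stack);
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        return false; // xml_error_string(xml_get_error_code($parser)) describes the problem
    }
    return $hrefs;
}

$xml = '<links>'
     . '<link section="about products clients"><a href="/about/">About</a></link>'
     . '<link section="products clients"><a href="/clients/">Client Login</a></link>'
     . '</links>';
echo implode(', ', linksForSection($xml, 'about')), "\n"; // /about/
```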

1/28/2009 2:00:03 PM

quagmire02
All American
44225 Posts

^ i don't mind suggestions as to better ways to do things...i'm just curious as to the reasons behind the suggestion...what's bad about the way i'm parsing it? a lot of overhead? messy?

is simplexml just a subset of the xml parser, or are they separate?

i don't plan on doing much with xml (at least, i don't have much cause to, right now)...really, i was bored at work and thought that a flat xml file would serve the purpose of a basic sitemap pretty easily and so i figured i'd screw around

[Edited on January 28, 2009 at 3:48 PM. Reason : .]

1/28/2009 3:46:36 PM

evan
All American
27701 Posts

traversing the DOM tree gets ugly in a hurry when you have even a moderately complex document. it's not fun at all.

simplexml is just another extension, like libxml or the xml parser or any of the other XML extensions.
http://us3.php.net/manual/en/refs.xml.php
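A small sketch of the SimpleXML side, using SimpleXMLElement::xpath() to pick out matching links instead of removing the others from the tree (sectionHrefs() is a hypothetical name). It selects each <link> on its own, so a matching child under a non-matching parent would still be returned; building the nested list needs a tree walk either way.

```php
<?php
// concat(" ", normalize-space(@section), " ") compared against " about " matches whole tokens only.
function sectionHrefs($xml, $section)
{
    $sx = simplexml_load_string($xml);
    if ($sx === false) {
        return false;
    }
    // $section is assumed to be whitelisted already, since it is pasted into the query
    $query = '//link[contains(concat(" ", normalize-space(@section), " "), " '
           . $section . ' ")]/a/@href';
    $nodes = $sx->xpath($query);
    if (!is_array($nodes)) {
        return false;
    }
    $hrefs = array();
    foreach ($nodes as $href) {
        $hrefs[] = (string) $href; // attribute node to its string value
    }
    return $hrefs;
}

$xml = '<toc><links>'
     . '<link section="about products clients"><a href="/about/">About</a></link>'
     . '<link section="about contact products clients">Products<subsection>'
     . '<link section="about contact products clients"><a href="/prod1/">Product 1</a></link>'
     . '</subsection></link>'
     . '<link section="products clients"><a href="/clients/">Client Login</a></link>'
     . '</links></toc>';
echo implode(', ', sectionHrefs($xml, 'contact')), "\n"; // /prod1/
```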

1/28/2009 4:17:33 PM

Noen
All American
31346 Posts

^hit the nail on the head. Handling simple structures is pretty easy to code-your-own, but it gets very unpleasant quickly when you start flexing your xml muscles.

And like so many things in PHP, it's worth learning how the parser works to understand the basics, and then going to find an extension library that abstracts the calls away and makes life easy on you. I learned this lesson the hard way back when php5 first hit, trying to write my own full parser implementation. The deeper I got into it, the more I had to refactor the code just to expand its functionality.

I ended up using libxml + a few modifications and it made life a lot more fun

1/28/2009 5:52:31 PM

quagmire02
All American
44225 Posts

so, in the collective opinion of those who know more than me...simplexml or xml parser?

also, the suggested code for this:

Quote :
"you have markup inside of markup and that's just a no-no. If you use a CDATA block you can tell the parser not to treat what's inside as parsed data"

didn't work...anyone have suggestions as to what i should put there? i mean, i wasn't aware that you couldn't have markup within markup, but if that's the case, how should i structure it?

i'm not denying that i should learn the xml parser, for future reference, but are there any suggestions as to how i should restructure/recode the xml document itself? this is purely for my own edification...if i'm not following the standards (in that, technically, my coding/structure is wrong or breaks the rules), please let me know what i should change

thxu, all

[Edited on January 29, 2009 at 9:13 AM. Reason : .]

1/29/2009 8:59:31 AM

evan
All American
27701 Posts

he was mainly talking about how you have the 'a' tags within 'link' tags without explicitly stating that those are not part of your xml markup but are content.

and also how you have text data and xml markup within the same tag (the 'link' tag with Products at the end of it, immediately followed by the 'subsection' tag)

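neither of those is clean xml (it's still well-formed, which is why DOMDocument loads it without complaint, but each tag ends up holding two kinds of data). basically, only put one type of data within a tag: if it's more xml markup, fine; if it's html, stick it in a cdata block.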

and personally i like the xml parser, it's more flexible
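CDATA is one fix; another option, not proposed in the thread, is to drop the HTML from the XML entirely and keep the label and URL as attributes (title and href are made-up attribute names). A sketch of that layout under those assumptions, with a small recursive renderer standing in for the str_replace() chain in index.php:

```php
<?php
// Hypothetical restructuring: no <link> mixes text with child elements,
// and no HTML is embedded in the XML.
$xml = <<<'XML'
<toc>
  <links>
    <link section="about products clients" title="About" href="/about/"/>
    <link section="about contact products clients" title="Products">
      <subsection>
        <link section="about contact products clients" title="Product 1" href="/prod1/"/>
        <link section="about contact products clients" title="Product 2" href="/prod2/"/>
      </subsection>
    </link>
    <link section="products clients" title="Client Login" href="/clients/"/>
  </links>
</toc>
XML;

// Builds a nested HTML list from a <links> or <subsection> element, keeping only
// links whose section tokens include $section ('' keeps everything).
function renderList(DOMElement $list, $section)
{
    $html = '';
    foreach ($list->childNodes as $node) {
        if (!($node instanceof DOMElement) || $node->nodeName !== 'link') {
            continue; // skips the whitespace text nodes between elements
        }
        $tokens = preg_split('/\s+/', trim($node->getAttribute('section')), -1, PREG_SPLIT_NO_EMPTY);
        if ($section !== '' && !in_array($section, $tokens, true)) {
            continue;
        }
        // htmlspecialchars() escapes the attribute values on their way into HTML
        $label = htmlspecialchars($node->getAttribute('title'));
        if ($node->hasAttribute('href')) {
            $label = '<a href="' . htmlspecialchars($node->getAttribute('href')) . '">' . $label . '</a>';
        }
        $html .= '<li>' . $label;
        foreach ($node->childNodes as $child) {
            if ($child instanceof DOMElement && $child->nodeName === 'subsection') {
                $html .= renderList($child, $section);
            }
        }
        $html .= '</li>';
    }
    return $html === '' ? '' : '<ul>' . $html . '</ul>';
}

$doc = new DOMDocument();
$doc->loadXML($xml);
echo renderList($doc->getElementsByTagName('links')->item(0), 'about'), "\n";
```

Because the renderer only reads the tree and never calls removeChild(), there is no live-list bookkeeping, and escaping is handled explicitly instead of by string replacement.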

1/29/2009 9:22:56 AM

qntmfred
retired
42449 Posts

Quote :
"xml parser"

1/29/2009 9:56:10 AM

A Tanzarian
drip drip boom
10996 Posts

When you guys (who do this professionally) are outlining an XML document, what do you take into consideration when deciding if information should be included as an attribute or as element content?

1/31/2009 1:11:19 PM

Message Boards » Tech Talk » PHP/XML question
