Web Services Librarian

Parsing the LibGuides Blog XML

The Problem:

We’re in the middle of a web migration from Drupal to LibGuides CMS. I have been able to switch over to Zurb Foundation for a mobile-first framework and really like the format my university uses to display blog posts. I was able to easily mock up how I want the information to be displayed.

However, the web things did not want to cooperate. First off, Springshare’s blog widget clearly does not follow the same design philosophy our team has envisioned for the new site. And customization options were scant.

After some deliberation and digging, I decided to go all in on retrieving and parsing XML data from the blog’s RSS feed and formatting the output using JavaScript. Since my Google searches didn’t turn up a clear, step-by-step guide on how to do this, I figured this little post might be useful to someone.

Or not; feel free to comment on how I should have done this or that. Note that while this project is in a usable place, the final code is subject to further refinement. Still, this should be enough to get started. Also, remember that while this post strives to present a nice, clean, organized way to complete the project in a single afternoon with your favorite cup o’ joe in hand, the learning process is always a whole lot messier…

The Process:

  1. Start with your strength. Mine is HTML/CSS. So I created the widget exactly as I wanted the final product to look. Insert all your classes, set up your styling, enter dummy (or real!) text, links, headings, buttons,… everything! And, of course, verify that it looks right in your final environment.
  2. Minify your HTML content. You’re going to use JavaScript to append the content, and a quoted JavaScript string cannot span multiple lines, so the line breaks and indentation we use to make HTML easier to read will break it. I like to use HTML Minifier to complete this process.
  3. Replace double quotes (") with single quotes ('). The whole HTML string will be wrapped in double quotes in the JavaScript, so any double quote inside it would end the string early. You should be able to do this pretty quickly in your text editor. I use Adobe Brackets and I can easily do a quick Find and Replace.
  4. Set up your XMLHttpRequest. It’s actually pretty standard and plain:
    
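    // Request the blog's RSS feed; replace "yourURL" with the feed address.
    var xhttp = new XMLHttpRequest();
    xhttp.onreadystatechange = function() {
      // readyState 4 = request finished, status 200 = OK
      if (this.readyState == 4 && this.status == 200) {
        myFunction(this);
      }
    };
    xhttp.open("GET", "yourURL", true);
    xhttp.send();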
  5. Set up your function so that all it does is append your HTML to your <div>. This is a template that will help keep you organized, and a time saver for people like me who make TONS of typing mistakes. Test the page and make sure you get the right result. It should be the exact same output, since all you’re doing is appending the exact same HTML to the <div>. No dynamic data is being fed to the page yet.
    document.getElementById("demo").innerHTML = "<some html>";
  6. Parse the XML data and save results in variables. Here are the tags I used to populate my variables:
    1. Title: The <title> tag in the LibGuides RSS XML begins with the title of your blog. You may choose to either feed this dynamically or hard code it into the HTML template. I decided to take the second route since I do not anticipate a need to dynamically feed this information. However, this does mean that the index begins at [1] for your latest post instead of the standard [0].
      var latesttitle = xmlDoc.getElementsByTagName("title")[1].childNodes[0].nodeValue;
      
    2. Summary: The <summary> tag is a 255-character summary of your post content. It is perfect for a snippet that leads to a “read more” call to action. Note that if you go for this option, you will need to fill out this field for every post.
      var latestsum = xmlDoc.getElementsByTagName("summary")[0].childNodes[0].nodeValue;
      
    3. Date: The <updated> tag provides a timestamp for the post. You can use the substringData() method to pick the part of this timestamp you would like to display. I selected the portion that displays the “MM-DD” part of the posted date.
      var latestdate = xmlDoc.getElementsByTagName("updated")[0].childNodes[0].substringData(5, 5);
      
    4. URL: The <link> tag contains an href attribute that provides the post URL. In order to access this data, you must declare which attribute you plan to access (even though there is only one). Again, since the XML feed has already used this tag to share the blog’s main URL, the index must be moved up one to [1].
      var latesturl = xmlDoc.getElementsByTagName("link")[1].getAttribute('href');
      
    5. IMG: This is the tricky one. The LibGuides Blog RSS XML does not provide a nice, clean tag for this piece of the puzzle. Instead, it includes a catch-all <content> tag that holds the post’s HTML inside a <![CDATA[ … ]]> section, which the XML parser treats as plain text rather than as elements. Essentially, I had to retrieve this tag and use the substringData() method to pick out the img src. The offset and length below fit my image URLs; yours will differ.
      
      var latestimg = xmlDoc.getElementsByTagName("content")[0].childNodes[0].substringData(20, 56);
      
  7. Check to make sure the variables are working. I set up a quick alert function to verify that everything was being saved right. This will not be part of the final code:
    alert(latesttitle + "\n" + latestsum + "\n" + latestdate + "\n" + latesturl + "\n" + latestimg);
  8. Now we can set up the HTML to receive this variable data and thus update with every new post. Anywhere you need dynamic information, break off the HTML with a double quote and insert your variable. For instance:
    ...href='" + latesturl + "' target='_blank'>...
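
The steps above never show where xmlDoc comes from or what myFunction looks like as a whole, so here is a minimal sketch of how the pieces might fit together. It assumes xmlDoc is the responseXML of the request from step 4 and that the feed uses the tag positions described in step 6. The function names readLatestPost and renderLatestPost, the class names, and the markup are placeholders of mine, not the final code linked further down.

```javascript
// Sketch only: collect the latest post's fields using the tags and indexes
// described in step 6.
function readLatestPost(xmlDoc) {
  // Helper: text of the n-th <tag> element in the feed.
  function textOf(tag, n) {
    return xmlDoc.getElementsByTagName(tag)[n].childNodes[0].nodeValue;
  }
  return {
    title: textOf("title", 1),                    // [0] is the blog's own title
    summary: textOf("summary", 0),
    date: textOf("updated", 0).substring(5, 10),  // same characters as substringData(5, 5)
    url: xmlDoc.getElementsByTagName("link")[1].getAttribute("href") // [0] is the blog's main URL
  };
}

// Hypothetical markup: swap in your own minified, single-quoted HTML.
function renderLatestPost(post) {
  return "<div class='post'>" +
    "<span class='post-date'>" + post.date + "</span>" +
    "<h3><a href='" + post.url + "' target='_blank'>" + post.title + "</a></h3>" +
    "<p>" + post.summary + "</p>" +
    "</div>";
}

// The callback named in step 4: runs once the feed has loaded.
function myFunction(xml) {
  var xmlDoc = xml.responseXML; // the parsed XML document from the XMLHttpRequest
  document.getElementById("demo").innerHTML = renderLatestPost(readLatestPost(xmlDoc));
}
```

Keeping the reading and the rendering in separate functions means you can check the values (step 7) before touching the HTML (step 8). If responseXML comes back null, the feed was probably not served as XML.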

A Solution:

And now we have a working widget! Your HTML is just a <div> with the ID of your choosing. Your CSS is taking care of styling. And here is the JS code in its entirety. I have replaced my blog URL with “YOUR” and made sure the classes in the HTML were generic. You can drop all of this into a LibGuides box. Play around as you please!

https://pastebin.com/embed_iframe/17B6WtsU

More Development

Could this code be cleaner? You bet it could. As I mentioned earlier, my strength lies elsewhere and the process of creating this JS solution has been akin to taking on a tennis ball shooter with my temples. Here are a couple of things I’d like to see:

  1. Date formatting. I’m not entirely pleased with the way the date is formatted. I’d rather have the “MMM” abbreviation followed by the “DD” day. I’m sure the data can be taken and converted using more JS. I just needed to get this into production.
  2. Some type of loop might have made this code shorter. The latest post is featured differently than the next two posts, so I figured it wasn’t too much work to just write it out. But I can see if your instance had more standardized formatting, you might see some value in spending time looping through the data to automate the project.
  3. New post writer’s guide. This is not about the code itself, but because the code picks data out of fixed positions in the feed, new posts will need to follow a specific format to display right:
    1. Every post must have a summary
    2. Every post must begin with an image
    3. That image must follow the naming convention so the substring captures the SRC correctly
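
For what it’s worth, here is a rough sketch of how items 1 and 3 might be tackled. It is untested against a live feed and assumes two things: that the <updated> text starts with “YYYY-MM-DD” (which the “MM-DD” substring above implies), and that the <content> text contains an ordinary <img> tag with a quoted src. The names formatPostDate and firstImageSrc are made up for this example.

```javascript
var MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Item 1: turn a feed timestamp such as "2018-03-14T09:30:00-05:00" into "Mar 14".
// Month and day are read straight from the text, so the result does not
// shift with the visitor's time zone.
function formatPostDate(timestamp) {
  var month = parseInt(timestamp.substring(5, 7), 10); // "03" -> 3
  var day = timestamp.substring(8, 10);                // keeps the two-digit "DD"
  return MONTHS[month - 1] + " " + day;
}

// Item 3: look for the first img src by pattern instead of by fixed position,
// so the image file names would no longer need to be a set length.
// Returns null when the HTML has no <img> with a quoted src.
function firstImageSrc(contentHtml) {
  var match = /<img[^>]*\ssrc\s*=\s*["']([^"']+)["']/i.exec(contentHtml);
  return match ? match[1] : null;
}
```

To try these, pass in the full text instead of a substring, for example formatPostDate(xmlDoc.getElementsByTagName("updated")[0].childNodes[0].nodeValue). A pattern match like this is still a shortcut rather than real HTML parsing, so posts with unusual image markup could slip past it.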

 
