Node.js Programming
June 15, 2022
Create an XML Sitemap with NodeJS and Express
by: Underminer

XML marks the spot


What is a sitemap?

In simple terms, a sitemap is a master index of the pages on your website. More specifically, when referring to an XML sitemap, it's an index of the pages you want search engines to be aware of and crawl.

 

Why do I need a sitemap?

Having an XML sitemap ensures that search engines like Google know about all the pages you want visitors to be able to find. You can also specify modification dates and how often a page is likely to be updated, so that search engines and other automated tools have up-to-date information about your site.

 

Cool. How do we get started?

If you're mostly concerned with search engine indexing, then at its core a sitemap is a relatively simple XML document that you could throw together by hand without much trouble. If your main goal is just to get something up and functional, though, I recommend using the 'sitemap' package from npm, and that's what we'll be using for this walkthrough. If you'd rather roll your own solution, adjust as needed.
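For reference, a bare-bones sitemap covering a single page looks something like this (the URL and date below are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2022-06-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
</urlset>

The 'sitemap' package builds exactly this kind of document for you from a list of link objects, which is what the rest of this walkthrough does.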

Install with ‘npm install sitemap’

Next we create a module file to generate a fresh sitemap whenever we need it. Here’s an example.

- db/sitemaps.js

require('dotenv').config();
const pgdb = require('./postgres');
const sitetld = process.env.sitetld
const sitename = process.env.sitename
const { SitemapStream, streamToPromise } = require( 'sitemap' )
const { Readable } = require( 'stream' )

 

async function preppostlinks(){
    console.log("Prepping links")
    const fplinks = []
    const fpposts = await pgdb.readquery("SELECT * FROM frontpageposts WHERE active = true ORDER BY published_at DESC");
    for (const post of fpposts){
        // Use the update timestamp if the post has been edited, otherwise the publish date
        const lastmodified = post.updated_at ? post.updated_at : post.published_at
        fplinks.push({
            url: post.articlelink,
            title: post.title,
            publication_date: post.published_at,
            lastmod: lastmodified,
            changefreq: 'weekly',
            priority: 0.6
        })
    }
    return fplinks
}


async function createsitemap(){
    // Placeholder last-modified dates for the static pages below; swap in real
    // values, or look them up from wherever you track edits to those pages.
    const aboutmod = '2022-06-01'
    const privmod = '2022-06-01'

    const links = []
    const postlinks = await preppostlinks()
    links.push({
        url: '/',
        title: sitename + " Front Page",
        changefreq: 'daily',
        priority: 0.7
    })
    links.push({
        url: '/articles',
        title: sitename + " Article Categories",
        changefreq: 'weekly',
        priority: 0.2
    })
    links.push({
        url: '/about',
        title: "About " + sitename,
        lastmod: aboutmod,
        changefreq: 'monthly',
        priority: 0.5
    })
    links.push({
        url: '/privacy',
        title: sitename + " Privacy Policy",
        lastmod: privmod,
        changefreq: 'monthly',
        priority: 0.5
    })
    if (postlinks.length){
        for (const plink of postlinks){
            links.push(plink)
        }
    }

    const stream = new SitemapStream( { hostname: sitetld } )

    // Return your data as a promise result
    return streamToPromise(Readable.from(links).pipe(stream)).then((data) =>
        data.toString()
    )
}

module.exports = {
    createsitemap: () => createsitemap()
}

 

If you have more or different links, your list generation procedure will necessarily differ. The important components of each entry, from a search engine's perspective, are the url (where to crawl), the lastmod timestamp (so the robot immediately knows whether something has changed), the changefreq (how often the page is typically expected to change), and the priority of the page on a scale of 0 to 1.
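Stripped down to just those essentials, a single entry is nothing more than a plain object; the URL and date here are placeholders:

{
    url: '/articles/example-post',  // where to crawl (relative to the hostname)
    lastmod: '2022-06-01',          // when the page last changed
    changefreq: 'weekly',           // how often it is expected to change
    priority: 0.6                   // relative importance on a 0 to 1 scale
}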

Now that we have the module created, we're going to create a separate Node file that we can call as a timed event to write out the XML file and to create or update a robots.txt file that tells robots where to find our sitemap.

 

- timedevents/sitemapper.js

// dotenv is already loaded inside db/sitemaps, but load it here too since this
// script is meant to run on its own (e.g. from cron)
require('dotenv').config();
const sitemaps = require('../db/sitemaps');
const fs = require('fs');
const { resolve } = require('path');
const sitetld = process.env.sitetld

async function makerobotstxt(){
    // Keep the User-agent line and its rules in the same group (no blank line between them)
    let robottxt = "User-agent: *"
    robottxt += "\n"
    robottxt += "Allow: /"
    robottxt += "\n\n"
    robottxt += "Sitemap: " + sitetld + "/sitemap.xml"
    robottxt += "\n"
    try{
        console.log("Writing robots.txt file")
        fs.writeFileSync(resolve('./public/robots.txt'), robottxt);
        console.log("robots.txt file created successfully")
    }catch(error){
        console.log("Error writing file \n" + error)
    }
}


async function writemaptofile(sitemapxml){    
    try{
        console.log("Writing to file")
        fs.writeFileSync(resolve('./public/sitemap.xml'), sitemapxml);
        console.log("XML file created successfully")
    }catch(error){
        console.log("Error writing file \n" + error)
    }
    
}

async function writeall(){
    const sitemapxml = await sitemaps.createsitemap();
    await writemaptofile(sitemapxml);
    await makerobotstxt();
}

writeall()
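Both files land in ./public, so they just need to be served as static assets at the site root. If your Express app doesn't already do that, a minimal sketch looks like this; the port and standalone app setup here are assumptions, so fold the static middleware into your existing server:

// app.js (sketch): serve ./public so /sitemap.xml and /robots.txt are reachable
const express = require('express');
const app = express();

// Anything written to ./public (including sitemap.xml and robots.txt)
// is exposed at the root of the site
app.use(express.static('public'));

app.listen(3000, () => console.log('Server listening on port 3000'));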

 

Once the script is in place, call it from your crontab (either via a wrapper script or by calling node on the file directly) at a frequency in line with how often you expect the pages on your site to change; an example crontab line is below. Then test that you can retrieve the files from your server, and either wait for indexing or submit the sitemap URL through Google's Search Console.
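For example, a crontab entry that regenerates both files nightly might look like the line below. The install path and node location are placeholders for your own setup, and the cd matters because the script writes to paths relative to the working directory:

# Rebuild sitemap.xml and robots.txt every night at 3:15 AM
15 3 * * * cd /path/to/yourapp && /usr/bin/node timedevents/sitemapper.js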

Congratulations! You have a working, machine-readable sitemap. Search engines should now have a better picture of your site's structure and where to find everything. Now go back to creating content worth finding!