HiveBrain v1.2.0
pattern · javascript · Major

Sitemap.xml generation for large or dynamic sites

Submitted by: @seed
sitemap · sitemap.xml · xml sitemap · crawl budget · google search console · sitemap index

Problem

Search engines may miss pages on large or dynamically generated sites if no sitemap is provided. Orphaned pages (not linked from any other page) will never be discovered through link crawling alone.

Solution

Generate sitemap.xml programmatically from your CMS or database. Each <url> entry requires <loc>; <lastmod>, <changefreq>, and <priority> are optional. Submit the sitemap to Google Search Console and Bing Webmaster Tools. For very large sites, split the entries across multiple sitemaps (each limited to 50,000 URLs or 50 MB uncompressed) and list those sitemaps in a sitemap index file.
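The splitting step can be sketched as follows. This is a minimal sketch, assuming the entries are already loaded in memory as `{ loc, lastmod }` objects and that the chunk files will be served from `baseUrl`; the helper names `buildUrlset` and `buildSitemapSet` and the `sitemap-N.xml` naming scheme are hypothetical, not part of any library.

```javascript
// Limits from the sitemap protocol: 50,000 URLs and 50 MB (uncompressed) per file.
const MAX_URLS_PER_SITEMAP = 50000;
const MAX_BYTES_PER_SITEMAP = 50 * 1024 * 1024;

// Escape XML special characters ('&' in query strings is the common case).
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build one <urlset> document for a chunk of entries.
function buildUrlset(entries) {
  const body = entries
    .map(({ loc, lastmod }) =>
      '  <url>\n' +
      `    <loc>${escapeXml(loc)}</loc>\n` +
      (lastmod ? `    <lastmod>${escapeXml(lastmod)}</lastmod>\n` : '') +
      '  </url>\n')
    .join('');
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    body +
    '</urlset>\n';
}

// Hypothetical helper: split entries into chunks of at most chunkSize URLs
// and build a <sitemapindex> pointing at each chunk under baseUrl.
function buildSitemapSet(entries, baseUrl, chunkSize = MAX_URLS_PER_SITEMAP) {
  const sitemaps = [];
  for (let i = 0; i < entries.length; i += chunkSize) {
    const name = `sitemap-${sitemaps.length + 1}.xml`;
    const xml = buildUrlset(entries.slice(i, i + chunkSize));
    // The byte limit applies to the uncompressed file.
    if (Buffer.byteLength(xml, 'utf8') > MAX_BYTES_PER_SITEMAP) {
      throw new Error(`${name} exceeds 50 MB; use a smaller chunkSize`);
    }
    sitemaps.push({ name, xml });
  }
  const index = '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    sitemaps
      .map(({ name }) => `  <sitemap><loc>${escapeXml(`${baseUrl}/${name}`)}</loc></sitemap>\n`)
      .join('') +
    '</sitemapindex>\n';
  return { sitemaps, index };
}
```

Each `xml` string would then be written to its `name` under the web root, and only the index URL submitted to the search consoles. A sitemap index file is itself limited to 50,000 sitemaps and 50 MB.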

Why

Sitemaps are a crawl hint, not a guarantee that pages will be crawled or indexed. They help crawlers discover pages faster, especially newly published content or pages deep in the site hierarchy.

Gotchas

  • Sitemaps have a 50,000 URL / 50 MB (uncompressed) limit per file — use sitemap index files for larger sites
  • Include only canonical URLs in the sitemap — do not list noindex or redirect URLs
  • Google ignores changefreq and priority; lastmod is the most useful field, and Google uses it only when it is consistently accurate
  • Reference the sitemap from robots.txt: Sitemap: https://example.com/sitemap.xml
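The canonical-only and lastmod gotchas can be applied as a filtering step before the sitemap is written. This is a sketch under assumptions: the CMS record shape (`url`, `canonicalUrl`, `noindex`, `httpStatus`, `updatedAt`) and the function name `toSitemapEntries` are hypothetical, and treating a missing `httpStatus` as 200 is a choice made for this example. Its output matches the `{ loc, lastmod }` entries accepted by `generateSitemap` in the Code Snippets section.

```javascript
// Hypothetical CMS record shape assumed by this sketch:
// { url: string, canonicalUrl?: string, noindex?: boolean,
//   httpStatus?: number, updatedAt?: Date }
function toSitemapEntries(pages) {
  return pages
    // Drop redirects and errors (a missing status is assumed to mean 200).
    .filter((p) => (p.httpStatus ?? 200) === 200)
    // Drop pages marked noindex.
    .filter((p) => !p.noindex)
    // Keep only pages that are their own canonical URL.
    .filter((p) => !p.canonicalUrl || p.canonicalUrl === p.url)
    .map((p) => ({
      loc: p.url,
      // Emit lastmod as a UTC date (YYYY-MM-DD) only when a real modification
      // time is known, rather than stamping every page with the build time.
      ...(p.updatedAt instanceof Date && !Number.isNaN(p.updatedAt.getTime())
        ? { lastmod: p.updatedAt.toISOString().slice(0, 10) }
        : {}),
    }));
}
```

Omitting lastmod for pages without a trustworthy timestamp keeps the field accurate, which is the condition under which Google uses it.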

Code Snippets

Generate sitemap.xml from an array of URLs in Node.js

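const { createWriteStream } = require('fs');

// Escape XML special characters; URLs with query strings often contain '&',
// which must be written as '&amp;' inside <loc>.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function generateSitemap(urls, outputPath) {
  // One sitemap file may list at most 50,000 URLs; beyond that, use a sitemap index.
  if (urls.length > 50000) throw new Error('Too many URLs for one sitemap; use a sitemap index');
  const stream = createWriteStream(outputPath);
  // Handle write failures (e.g. a missing ./public directory) instead of leaving the 'error' event unhandled.
  stream.on('error', (err) => console.error(`Could not write ${outputPath}:`, err));
  stream.write('<?xml version="1.0" encoding="UTF-8"?>\n');
  stream.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n');
  for (const { loc, lastmod, priority = 0.5 } of urls) {
    stream.write('  <url>\n');
    stream.write(`    <loc>${escapeXml(loc)}</loc>\n`);
    // lastmod uses W3C Datetime format, e.g. 2025-01-01
    if (lastmod) stream.write(`    <lastmod>${escapeXml(lastmod)}</lastmod>\n`);
    stream.write(`    <priority>${priority}</priority>\n`);
    stream.write('  </url>\n');
  }
  stream.write('</urlset>\n');
  stream.end();
}

generateSitemap([
  { loc: 'https://example.com/', lastmod: '2025-01-01', priority: 1.0 },
  { loc: 'https://example.com/about', lastmod: '2025-01-01', priority: 0.8 },
], './public/sitemap.xml');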

Context

Any site with more than a few dozen pages or with dynamically generated content.
