pattern | javascript | Major
Sitemap.xml generation for large or dynamic sites
Tags: sitemap, sitemap.xml, xml sitemap, crawl budget, google search console, sitemap index
Problem
Search engines may miss pages on large or dynamically generated sites if no sitemap is provided. Orphaned pages (not linked from any other page) will never be discovered through link crawling alone.
Solution
Generate sitemap.xml programmatically from your CMS or database. Each <url> entry requires <loc>; <lastmod>, <changefreq>, and <priority> are optional. Submit the sitemap to Google Search Console and Bing Webmaster Tools. For very large sites, split the entries across multiple sitemap files (max 50,000 URLs or 50 MB uncompressed per file) and list those files in a sitemap index file.
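The splitting step can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names (`buildSitemaps`, `buildUrlset`, `buildSitemapIndex`), the `sitemap-N.xml` file naming, and the `baseUrl` parameter are all assumptions made for the example. It only enforces the URL-count limit; checking the 50 MB size limit is left out.

```javascript
// Sketch: split entries into sitemap files of at most 50,000 URLs each
// and build a sitemap index that lists them. All names are illustrative.
const MAX_URLS_PER_SITEMAP = 50000;
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Escape the five XML special characters (URLs often contain '&').
function escapeXml(value) {
  const entities = { '<': '&lt;', '>': '&gt;', '&': '&amp;', "'": '&apos;', '"': '&quot;' };
  return String(value).replace(/[<>&'"]/g, (ch) => entities[ch]);
}

// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build the XML for one <urlset> file from { loc, lastmod? } entries.
function buildUrlset(entries) {
  const body = entries.map(({ loc, lastmod }) => {
    const lastmodTag = lastmod ? `\n    <lastmod>${escapeXml(lastmod)}</lastmod>` : '';
    return `  <url>\n    <loc>${escapeXml(loc)}</loc>${lastmodTag}\n  </url>`;
  }).join('\n');
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="${SITEMAP_NS}">\n${body}\n</urlset>\n`;
}

// Build the XML for the sitemap index from absolute sitemap URLs.
function buildSitemapIndex(sitemapUrls) {
  const body = sitemapUrls
    .map((url) => `  <sitemap>\n    <loc>${escapeXml(url)}</loc>\n  </sitemap>`)
    .join('\n');
  return `<?xml version="1.0" encoding="UTF-8"?>\n<sitemapindex xmlns="${SITEMAP_NS}">\n${body}\n</sitemapindex>\n`;
}

// Returns the individual sitemap files plus the index that references them.
function buildSitemaps(entries, baseUrl, maxPerFile = MAX_URLS_PER_SITEMAP) {
  const files = chunk(entries, maxPerFile).map((part, i) => ({
    name: `sitemap-${i + 1}.xml`,
    xml: buildUrlset(part),
  }));
  const indexXml = buildSitemapIndex(files.map((f) => `${baseUrl}/${f.name}`));
  return { files, indexXml };
}
```

The functions return strings rather than writing files, so the caller decides where the output goes (disk, object storage, or an HTTP response). In this arrangement the index file, not the individual sitemaps, is what would be submitted to the search engines.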
Why
Sitemaps are a crawl hint, not a guarantee. They help crawlers discover pages faster, especially newly published content or pages deep in the site hierarchy.
Gotchas
- Each sitemap file is limited to 50,000 URLs or 50 MB uncompressed; use a sitemap index file for larger sites
- Include only canonical URLs in the sitemap; do not list noindex or redirect URLs
- Google ignores changefreq and priority; lastmod is the most useful field, provided it reflects the page's real last-modified date
- Reference the sitemap from robots.txt: Sitemap: https://example.com/sitemap.xml
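The canonical-only rule above can be applied as a filtering pass before the sitemap is written. The sketch below assumes a hypothetical page record shape (`url`, `canonicalUrl`, `noindex`, `statusCode`, `updatedAt`); a real CMS or database will expose these facts under different names, so treat the fields as placeholders.

```javascript
// Sketch: keep only indexable, canonical, non-redirect pages.
// The page fields used here are hypothetical, not from any specific CMS.
function toSitemapEntries(pages) {
  const seen = new Set();
  const entries = [];
  for (const page of pages) {
    if (page.noindex) continue;                 // excluded from indexing
    if (page.statusCode !== 200) continue;      // redirects and errors
    if (page.canonicalUrl && page.canonicalUrl !== page.url) continue; // non-canonical duplicate
    if (seen.has(page.url)) continue;           // already listed
    seen.add(page.url);

    const entry = { loc: page.url };
    if (page.updatedAt) {
      const date = new Date(page.updatedAt);
      // Emit lastmod as a W3C date (YYYY-MM-DD, UTC); skip unparseable values.
      if (!Number.isNaN(date.getTime())) {
        entry.lastmod = date.toISOString().slice(0, 10);
      }
    }
    entries.push(entry);
  }
  return entries;
}
```

The resulting `{ loc, lastmod }` objects have the same shape that the generator in the Code Snippets section expects, so the filter can sit between the database query and the writer.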
Code Snippets
Generate sitemap.xml from an array of URLs in Node.js
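const { createWriteStream } = require('fs');

// Escape XML special characters; URLs containing '&' are invalid XML otherwise.
const XML_ENTITIES = { '<': '&lt;', '>': '&gt;', '&': '&amp;', "'": '&apos;', '"': '&quot;' };
const escapeXml = (value) => String(value).replace(/[<>&'"]/g, (ch) => XML_ENTITIES[ch]);

// Write a sitemap for an array of { loc, lastmod?, priority? } entries.
function generateSitemap(urls, outputPath) {
  const stream = createWriteStream(outputPath);
  stream.write('<?xml version="1.0" encoding="UTF-8"?>\n');
  stream.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n');
  for (const { loc, lastmod, priority = 0.5 } of urls) {
    stream.write('  <url>\n');
    stream.write(`    <loc>${escapeXml(loc)}</loc>\n`);
    if (lastmod) stream.write(`    <lastmod>${escapeXml(lastmod)}</lastmod>\n`);
    // Format as one decimal place (e.g. 1.0) instead of JavaScript's default "1".
    stream.write(`    <priority>${Number(priority).toFixed(1)}</priority>\n`);
    stream.write('  </url>\n');
  }
  stream.write('</urlset>\n');
  stream.end();
}

generateSitemap([
  { loc: 'https://example.com/', lastmod: '2025-01-01', priority: 1.0 },
  { loc: 'https://example.com/about', lastmod: '2025-01-01', priority: 0.8 },
], './public/sitemap.xml');
Context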
Any site with more than a few dozen pages or with dynamically generated content