792

July 8th, 2024 · #SEO #Sitemaps #Web Development

Perfect Sitemaps for SEO

Wes and Scott discuss why you need a sitemap, what should be in it, and how to generate and submit it properly for SEO.

Topic 0 00:00

Transcript

Wes Bos

shout-out to CJ for filling in there. But it feels good to be back on the horse.

Topic 1 00:34

Wes is back from paternity leave

Wes Bos

I love that because you don't have to fight with the time zones either. You just look at your phone: where do I need to be right now? Although when I was there, I literally missed the boat.

Topic 2 03:21

Scott wonders why we need sitemaps and what should be in them nowadays

Wes Bos

Yeah, that's good. And you built the initial sitemap for the Syntax website as well, and that was really nice, because over probably the last 6 months I've been watching Google Webmaster Tools, trying to get our content indexed. Since we made a pretty major shift in terms of additional pages, there was a lot more content on the website than on the old one. It was kind of interesting to see how you tell Google:

Wes Bos

There's now, what, an extra 1500 pages on this website.

Topic 3 05:30

Sitemaps don't help ranking but help crawlers find relevant content

Wes Bos

and understand what the general structure of your application is without having to guess. Right? Yeah. You can't use it to trick Google into indexing pages that are not linked from anywhere. Google still has to be able to find the page: this is a page you're telling me about, but where is it linked from? Is it being linked from another website? Is it being linked internally from another page? For us, it was the transcript page. It was a brand-new page, and I wanted all of it indexed because it's a lot of good information, and that's very good for SEO when someone is searching for a specific topic. In fact, when I Google a specific Syntax episode, the transcript page will often come up before the actual show notes page, because the transcript page has literally every word we've spoken in it. But initially I had a hard time getting those pages indexed by Google. It was a mix of how often the page is updated, whether Google should be crawling it, and all the stuff we'll talk about today.

Topic 4 06:55

Plain text sitemap is just URLs, one per line

Wes Bos

I might be getting ahead of ourselves right now, but does a sitemap have to be named sitemap.xml? Or is there, like, a meta tag that you can put in? That's a good question.
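The topic heading above describes the simplest format: a plain-text sitemap is nothing more than absolute URLs, one per line, saved as a UTF-8 .txt file. Here is a minimal TypeScript sketch; the buildTextSitemap helper and the example.com URLs are hypothetical, not taken from the Syntax codebase.

```typescript
// Hypothetical helper: renders a plain-text sitemap (one absolute URL per line).
function buildTextSitemap(urls: string[]): string {
  // Trim whitespace, drop empty lines, and de-duplicate while keeping order.
  const cleaned = urls.map((u) => u.trim()).filter((u) => u.length > 0);
  const unique = Array.from(new Set(cleaned));
  // End with a newline so the file terminates cleanly; empty input gives an empty file.
  return unique.length > 0 ? unique.join("\n") + "\n" : "";
}

// Usage with made-up URLs:
const textSitemap = buildTextSitemap([
  "https://example.com/",
  "https://example.com/shows",
  "https://example.com/shows", // duplicate, dropped
]);
console.log(textSitemap);
```

The trade-off is that a text sitemap carries no metadata at all, so there is nowhere to put a lastmod date.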

Topic 5 07:23

Sitemap can be named anything, not just sitemap.xml

Wes Bos

That's really handy if, for whatever reason, you don't have control over top-level routes because your application doesn't allow it. I would probably still try my darnedest to make it sitemap.xml because, like robots.txt, it's a standard.
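Because the file name is not fixed, crawlers need to be told where a non-standard sitemap lives. One widely supported way is a Sitemap line in robots.txt, which takes an absolute URL (you can also submit the URL directly in Google Search Console). A sketch, assuming a hypothetical buildRobotsTxt helper and example URLs:

```typescript
// Hypothetical helper: renders a robots.txt that points crawlers at sitemaps
// with arbitrary names via the Sitemap directive.
function buildRobotsTxt(sitemapUrls: string[], disallowPaths: string[] = []): string {
  const lines: string[] = ["User-agent: *"];
  if (disallowPaths.length === 0) {
    lines.push("Disallow:"); // an empty Disallow blocks nothing
  } else {
    for (const path of disallowPaths) lines.push(`Disallow: ${path}`);
  }
  lines.push(""); // blank line between the user-agent group and the sitemap lines
  for (const url of sitemapUrls) {
    // The Sitemap directive expects a fully qualified URL, not a relative path.
    if (!/^https?:\/\//.test(url)) {
      throw new Error(`Sitemap URL must be absolute: ${url}`);
    }
    lines.push(`Sitemap: ${url}`);
  }
  return lines.join("\n") + "\n";
}

console.log(buildRobotsTxt(["https://example.com/feeds/all-pages.xml"]));
```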

Topic 6 08:54

XML sitemap is most flexible and allows more metadata

Wes Bos

Yeah. If you go and peruse sitemaps... I do this a little bit myself to find unlisted URLs on websites.

Wes Bos

My wife was really excited about a dress that was coming out once, so I wrote a little scraper that would download the sitemap every so often. The sitemap often lists all the images that were uploaded to the website and all of the pages on the website. Often, those pages are public but not linked anywhere just yet, so it's kind of security by obscurity. You can download the sitemap, see all of the pages of the website, and peruse through it looking for unlisted pages.
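The scraper trick described above boils down to two steps: pull every loc URL out of the sitemap, then compare the result with the URLs seen last time. Below is a rough TypeScript sketch; the function names are hypothetical, and the regex-based extraction is a deliberate shortcut that only suits simple, well-formed sitemaps (a proper XML parser handles CDATA, comments, and namespaces more reliably). Downloading the sitemap on a schedule is left out.

```typescript
// Decode the predefined XML entities that can appear inside a <loc> value.
function unescapeXml(value: string): string {
  return value
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" decodes to "&lt;" rather than "<"
}

// Naive extraction of page URLs: matches <loc>...</loc> but not <image:loc>.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const re = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(xml)) !== null) {
    locs.push(unescapeXml(match[1]));
  }
  return locs;
}

// URLs present in the current sitemap that were not in the previous snapshot.
function findNewUrls(previous: Set<string>, currentXml: string): string[] {
  return extractLocs(currentXml).filter((url) => !previous.has(url));
}
```

Each run would store the extracted URLs as the next "previous" set, so only genuinely new pages get reported.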

Wes Bos

Often, especially with Shopify websites, you'll find a sort of index sitemap, and then it links off to a tags sitemap, a product sitemap, and a blog post sitemap. Each one has its own sitemap.
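The index-plus-children layout described above is the sitemap index format from the sitemaps.org protocol: a sitemapindex element whose sitemap entries each point to a child sitemap. It is also how large sites stay under the protocol's per-file limit of 50,000 URLs or 50 MB uncompressed. A minimal sketch with hypothetical child sitemap names:

```typescript
// One entry per child sitemap (e.g. products, tags, blog posts).
interface ChildSitemap {
  loc: string; // absolute URL of the child sitemap
  lastmod?: Date; // optional: when that child sitemap last changed
}

// Escape the five XML special characters so URLs containing & or quotes stay valid.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemapIndex(children: ChildSitemap[]): string {
  const entries = children.map((child) => {
    const lastmod = child.lastmod ? `<lastmod>${child.lastmod.toISOString()}</lastmod>` : "";
    return `  <sitemap><loc>${escapeXml(child.loc)}</loc>${lastmod}</sitemap>`;
  });
  return ['<?xml version="1.0" encoding="UTF-8"?>', '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    .concat(entries, ["</sitemapindex>"])
    .join("\n");
}

console.log(
  buildSitemapIndex([
    { loc: "https://example.com/sitemap-products.xml", lastmod: new Date("2024-07-08T00:00:00Z") },
    { loc: "https://example.com/sitemap-blog.xml" },
  ]),
);
```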

Topic 7 11:06

Last mod date is the only sitemap field search engines use now

Wes Bos

Priority, change frequency, and last modified.

Wes Bos

I would say priority doesn't matter, because the days of telling Google what's important are over. They can figure that out themselves.

Wes Bos

I'm gonna say frequency is important, because you might have a page that is frequently updated and needs to be reindexed every hour, or something like a blog post that you'll never update again. The answer is that change frequency…

Topic 8 12:45

We should update the Syntax sitemap fields

Wes Bos

are on the Syntax one. And you're telling us we only need lastmod?
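In line with the topic headings above, Google's own sitemap documentation says it ignores priority and changefreq and uses lastmod when the value is consistently accurate. So a lean urlset entry only needs loc plus an honest lastmod. A sketch under those assumptions; the SitemapEntry shape and the example URL are illustrative, not the Syntax site's actual code:

```typescript
interface SitemapEntry {
  loc: string; // absolute page URL
  lastmod: Date; // when the page content last meaningfully changed
}

// Escape XML special characters in URLs.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Emits only <loc> and <lastmod>; <priority> and <changefreq> are omitted on purpose.
function buildUrlset(entries: SitemapEntry[]): string {
  const urls = entries.map(
    (e) => `  <url><loc>${escapeXml(e.loc)}</loc><lastmod>${e.lastmod.toISOString()}</lastmod></url>`,
  );
  return ['<?xml version="1.0" encoding="UTF-8"?>', '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    .concat(urls, ["</urlset>"])
    .join("\n");
}
```

The lastmod value only helps if it tracks real content edits; stamping every page with the build time tends to teach crawlers to ignore it.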

Topic 9 13:25

Getting all Syntax content indexed recently got much easier

Wes Bos

the Syntax website, and it's crazy looking at the webmaster tools over the last 6 months to a year: getting all of the pages indexed and finally reaching a point where Google knows about every single page. Even when we migrated, there was a point where you couldn't find specific episodes on Google at all, so we had to really work at that. But Google changed their algorithm recently, and I posted a tweet about this. We mentioned it on the last episode as well.

Wes Bos

The amount that we're showing up in search results just went right up with that algorithm change. So you see,

Wes Bos

we're not even following best practice here, and Google is obviously showing our stuff a lot more frequently.

Wes Bos

URLs on the Syntax website is that we have /shows, and that needs to be indexed.

Topic 10 14:56

Parameters and future/unpublished pages should not be in the sitemap

Wes Bos

And then we also have /shows with type=hasty, tasty, or supper. Those need to be indexed too.

Wes Bos

Then there are the pages of every single one of them, like page 1, page 2, and so on. Those don't need to be indexed, because... well, no. The pages do need to be indexed, but some of the search filters do not. I remember I had to write a fairly complex function to figure out what the canonical URL was, because there are unlimited combinations of query params: the page, how many per page, and a couple of other filters. And if you go into Google Webmaster Tools, it says something like 6,000 pages are not being indexed. Yeah, because you told us not to.

Wes Bos

And I was like, good. We don't want you to index page 4 at 15 per page.
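The canonical-URL problem above can be handled with an allowlist: keep only the query parameters that produce genuinely different content and drop everything else (per-page size, sort order, other filters). This is a hedged sketch, not the Syntax site's actual implementation; the parameter names type and page, the treatment of page 1 as the default, and the example.com URLs are assumptions for illustration. The result could feed both a rel="canonical" link and the decision about what goes into the sitemap.

```typescript
// Hypothetical allowlist: only these params change what the page shows.
const INDEXABLE_PARAMS = ["type", "page"];
const VALID_TYPES = new Set(["hasty", "tasty", "supper"]);

function canonicalUrl(input: string): string {
  const url = new URL(input);
  // Start from origin + path, dropping every query param and the fragment.
  const canonical = new URL(url.origin + url.pathname);
  for (const name of INDEXABLE_PARAMS) {
    const value = url.searchParams.get(name);
    if (value === null) continue;
    // Unknown show types are filter noise, not distinct pages.
    if (name === "type" && !VALID_TYPES.has(value)) continue;
    // Page 1 is the default listing; malformed page numbers are ignored.
    if (name === "page" && (value === "1" || !/^[1-9]\d*$/.test(value))) continue;
    canonical.searchParams.set(name, value);
  }
  // Params are re-added in allowlist order, so equivalent URLs serialize identically.
  return canonical.toString();
}
```

One simplification worth noting: page 4 at 15 per page canonicalizes to page 4 at the default page size, which is a judgment call rather than an exact content match.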

Wes Bos

things that you have being blocked in your robots.txt. I've got one more here, and this is a problem we had with these shows.

Topic 11 16:17

Only published, non-redirect pages should be included

Wes Bos

Basically, the way we create our sitemap is we query the database for all the shows and all the guests. Anything that's a page, we just query it and use a function to generate the URL for it. Right? But in that case, we forgot to filter out future shows, and the sitemap was telling Google, "Hey, there's a page here."

Wes Bos

And then Google would go to that URL and find a "coming soon" page.

Wes Bos

That was a bit of an issue because once the show was published, Google would not know about the content until it eventually crawled that page again. So we had to filter those out.
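The fix described above is a filter applied before URLs are generated: only published shows, and only URLs that actually serve content rather than redirecting, go into the sitemap. A sketch with a hypothetical Show shape and URL pattern (the real Syntax schema and routes may differ):

```typescript
interface Show {
  slug: string;
  publishedAt: Date; // a date in the future means the page is still "coming soon"
  redirectTo?: string; // set when this URL now redirects somewhere else
}

// Hypothetical URL builder; swap in whatever route your app really uses.
function showUrl(baseUrl: string, show: Show): string {
  return `${baseUrl}/shows/${encodeURIComponent(show.slug)}`;
}

function sitemapUrlsForShows(shows: Show[], baseUrl: string, now: Date = new Date()): string[] {
  return shows
    .filter((show) => show.publishedAt.getTime() <= now.getTime()) // drop future/unpublished shows
    .filter((show) => !show.redirectTo) // drop URLs that only redirect
    .map((show) => showUrl(baseUrl, show));
}
```

Passing now in explicitly keeps the function deterministic, which makes it easy to test around a release date.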

Topic 12 18:49

Hand-writing a large sitemap takes too much effort

Wes Bos

Yeah, they have this concept of pages. If it's totally from scratch, like your personal website or the Syntax website, where there's no built-in concept of a page, then, like Scott said, you can just concatenate a string and throw it out the door. I would probably keep an array of pages, store them as objects, grab some sort of JSON-to-XML package off npm, and then convert it on the way out the door.

Topic 13 19:37

Store sitemap pages as data objects first before outputting as XML

Wes Bos

Sitemaps are pretty simple, so I don't know whether that's overkill versus just concatenating a string. But when it comes to "Oh, did I already add this one? You know, does this URL already exist?", you can just search for it in the array.

Wes Bos

In that case, it's sometimes nicer to deal with an actual data object first and then, right before you kick it out the door, convert it to XML, because working with XML sucks.
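The keep-it-as-data approach above might look like this: pages live in a Map keyed by URL, which answers "did I already add this one?" without scanning an array, and XML is produced only at the very end. The SitemapBuilder class is a hypothetical sketch; it hand-rolls escaping instead of pulling a JSON-to-XML package from npm, so that it has no dependencies.

```typescript
interface Page {
  loc: string;
  lastmod: Date;
}

class SitemapBuilder {
  // Keyed by URL so duplicate checks are a lookup instead of an array search.
  private readonly pages = new Map<string, Page>();

  has(loc: string): boolean {
    return this.pages.has(loc);
  }

  // Adding the same URL twice keeps whichever entry has the newer lastmod.
  add(page: Page): this {
    const existing = this.pages.get(page.loc);
    if (!existing || page.lastmod.getTime() > existing.lastmod.getTime()) {
      this.pages.set(page.loc, page);
    }
    return this;
  }

  // Convert to XML only at the end, right before sending the response.
  toXml(): string {
    const escape = (s: string): string =>
      s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;").replace(/'/g, "&apos;");
    const urls = Array.from(this.pages.values()).map(
      (p) => `  <url><loc>${escape(p.loc)}</loc><lastmod>${p.lastmod.toISOString()}</lastmod></url>`,
    );
    return ['<?xml version="1.0" encoding="UTF-8"?>', '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
      .concat(urls, ["</urlset>"])
      .join("\n");
  }
}
```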

Topic 14 20:44

Submit sitemap to Bing and Google webmaster tools

Wes Bos

I was just looking at our Search Console, and it says "discovered videos."

Wes Bos

That's probably worth doing. I often wonder about that: when you go to the video tab of Google Search, how do you get your video to show up on that tab? I thought it was a mix of the proper XML or, what's that, LD JSON? Yeah, JSON-LD, which stands for JSON for Linking Data. It's used sort of like a meta tag, but instead of being a tag with attributes, it's a block of JSON inside a script tag, which can go in the head or the body of the document, and Google will pick it up there. But it looks like there are also specific video tags for sitemap.xml, which tell Google about videos, which is neat.
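As a rough illustration of the JSON-LD route, here is a sketch that renders a schema.org VideoObject into a script tag of type application/ld+json. Google's video structured-data docs list name, thumbnailUrl, and uploadDate as the required properties; the helper name, the VideoInfo shape, and the example values are hypothetical. The video sitemap tags mentioned above are an alternative way to send similar information.

```typescript
interface VideoInfo {
  name: string;
  description: string;
  thumbnailUrl: string;
  uploadDate: Date;
  embedUrl?: string; // optional player URL
}

function videoJsonLdScript(video: VideoInfo): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    name: video.name,
    description: video.description,
    thumbnailUrl: video.thumbnailUrl,
    uploadDate: video.uploadDate.toISOString(),
    ...(video.embedUrl ? { embedUrl: video.embedUrl } : {}),
  };
  // Escape "<" so a title containing "</script>" cannot close the tag early;
  // \u003c is still valid JSON and parses back to "<".
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```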

Wes Bos

Cool. Yeah.

Wes Bos

One more tip I have here: cache them. Your sitemap can be one of the largest files accessible on your website, and if it's generated on demand, that can be very taxing on your database. Yes. In a lot of cases it's literally querying every single record in your database, or at least every page, and looping over it. The file itself is also fairly large because it's all text. Right? So it's possibly an attack vector against your bill, both your database bill and your bandwidth bill. If you're using something like Render or Vercel to generate the sitemap.xml and you don't have the proper caching headers on it, somebody could just continually hit it, and that will cause a very large bandwidth bill on your end. So adding caching headers and putting a CDN in front of it is probably a good idea.
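Both suggestions above can be sketched together: memoize the generated XML in memory for a while so the database is queried at most once per interval, and send a Cache-Control header so a CDN can absorb repeated hits. This is framework-agnostic and hypothetical; the createSitemapHandler name, the response shape, and the TTL are assumptions. On a serverless host the in-memory cache only lives as long as the instance, so the CDN header is what does most of the work there.

```typescript
interface SitemapResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Wraps an expensive generator (e.g. one that queries every show) with an in-memory TTL cache.
function createSitemapHandler(
  generate: () => Promise<string>,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock, handy for tests
): () => Promise<SitemapResponse> {
  let cached: { body: string; expiresAt: number } | null = null;
  let inFlight: Promise<string> | null = null;

  // Clears inFlight once generation settles, whether it succeeded or failed.
  async function regenerate(): Promise<string> {
    try {
      return await generate();
    } finally {
      inFlight = null;
    }
  }

  const maxAgeSeconds = Math.floor(ttlMs / 1000);

  return async () => {
    let entry = cached;
    if (entry === null || now() >= entry.expiresAt) {
      // Concurrent requests share a single regeneration instead of each hitting the DB.
      const pending: Promise<string> = inFlight !== null ? inFlight : (inFlight = regenerate());
      const body = await pending;
      entry = { body, expiresAt: now() + ttlMs };
      cached = entry;
    }
    return {
      status: 200,
      headers: {
        "Content-Type": "application/xml; charset=utf-8",
        // Lets browsers (max-age) and shared caches such as CDNs (s-maxage) reuse the response.
        "Cache-Control": `public, max-age=${maxAgeSeconds}, s-maxage=${maxAgeSeconds}`,
      },
      body: entry.body,
    };
  };
}
```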

Topic 15 23:10

Cache sitemaps to avoid heavy DB and bandwidth loads

Wes Bos

Grab a T-shirt at sentry.shop.

Wes Bos

Peace.
