Sunday 6 September 2015

Sitecore Sitemap Part 1




In this post I will talk about sitemaps in Sitecore and guide you through creating both a Sitecore XML sitemap for SEO and an HTML sitemap to use on your website.


In this part I will clarify the following topics as an introduction to the implementation that follows:
  • Sitemap?
  • Robots.txt file
  • Sitecore and XML file security
  • Sitemap.xml structure and tags clarification
  • What a good Sitecore sitemap module should have


Sitemap?


To start with, search engines such as Google and Bing use the XML sitemap file to crawl your website more effectively, and better crawling and indexing make your pages easier for users to find. A single XML sitemap file can contain at most 50,000 URLs and is limited to 10 MB in size (the limit at the time of writing; the sitemaps.org protocol has since raised it to 50 MB uncompressed). What if your XML sitemap exceeds those limits? Then you need to split it into more than one XML file and create an XML index file that ties these sub-files together.
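The splitting step can be sketched in a few lines of Python. This is only an illustration of the idea, not the Sitecore implementation: the function names, the `sitemap_N.xml` file naming and the `www.example.com` host are assumptions made for the example.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit mentioned above


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(base_url, file_count):
    """Return index XML that lists sitemap_1.xml .. sitemap_N.xml (hypothetical names)."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, file_count + 1):
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/sitemap_{n}.xml"
    return ET.tostring(root, encoding="unicode")


# Usage: 120,000 made-up URLs need three sitemap files.
urls = [f"http://www.example.com/page-{i}" for i in range(120_000)]
chunks = chunk_urls(urls)
print([len(c) for c in chunks])  # [50000, 50000, 20000]
print(build_sitemap_index("http://www.example.com", len(chunks)))
```

Each chunk would then be written to its own sitemap file, and only the index file needs to be announced to search engines.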
 

Robots.txt:



Google defines a robots.txt file as "a file at the root of your site that indicates those parts of your site you don’t want accessed by search engine crawlers. The file uses the Robots Exclusion Standard, which is a protocol with a small set of commands that can be used to indicate access to your site by section and by specific kinds of web crawlers (such as mobile crawlers vs desktop crawlers)."



In our case we will use this file to tell search engines where to find the sitemap.xml file.

This can be done using the following syntax:

Sitemap: {{ site.url }}/sitemap.xml
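To check that a robots.txt file really exposes the sitemap location, the Sitemap line can be read back with Python's standard `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later). This is a minimal sketch: the robots.txt content, the `/private/` rule and the `www.example.com` host are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the host and the Disallow rule are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
"""


def sitemap_urls(robots_txt):
    """Return the URLs declared with Sitemap: lines, or an empty list if none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap line.
    return parser.site_maps() or []


print(sitemap_urls(ROBOTS_TXT))  # ['http://www.example.com/sitemap.xml']
```

Note that the value after `Sitemap:` should be the full absolute URL of the sitemap file.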



You can find more information about this file here.



Sitecore .xml files security



As you probably know, Sitecore blocks access to XML files for security reasons (license.xml is one example), so you need extra configuration to make the sitemap files reachable. This can be done by adding the following handler, which allows access to any XML file whose name matches sitemap_*.xml:



<add verb="GET" path="sitemap_*.xml" type="System.Web.StaticFileHandler" name="allow xml sitemap" />

Sitemap.xml structure and tags clarification



Now let's talk about the sitemap.xml file structure, including the required and optional tags. Let's look at the following sample of a single entry in a sitemap.xml file:





<url>
        <loc>http://www.MySite.com/AboutUs</loc>
        <lastmod>2015-08-26T07:53:49+03:00</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.5</priority>
</url>



As you can see from the sample above, the following tags are provided:

  1. loc: the absolute URL of the page (required tag).
  2. lastmod: the date the page was last modified (optional tag).
  3. changefreq: tells the search engine crawler how frequently the page is likely to change; crawlers treat it as a hint about how often to revisit the page, not as a command (optional tag).
  4. priority: tells the search engine how important this page is relative to the other pages on your website, from 0.0 to 1.0 with a default of 0.5 (optional tag).
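The tag rules above can be captured in a small Python sketch that builds one entry and wraps it in the `urlset` root element that a complete sitemap.xml file needs. This only illustrates the file format and is not how a Sitecore module would generate it; the function names are assumptions made for the example.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Values the sitemap protocol accepts for <changefreq>.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Build a <url> element; only `loc` is required, the rest are optional."""
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        raise ValueError(f"invalid changefreq: {changefreq!r}")
    if priority is not None and not 0.0 <= priority <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0: {priority!r}")
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        ET.SubElement(url, "lastmod").text = lastmod
    if changefreq is not None:
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return url


def build_sitemap(entries):
    """Wrap <url> elements in the <urlset> root and return the XML as a string."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    root.extend(entries)
    return ET.tostring(root, encoding="unicode")


entry = build_url_entry(
    "http://www.MySite.com/AboutUs",
    lastmod="2015-08-26T07:53:49+03:00",
    changefreq="daily",
    priority=0.5,
)
print(build_sitemap([entry]))
```

Validating `changefreq` and `priority` before writing the file avoids publishing a sitemap that search engines may reject or partly ignore.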



What a good sitecore sitemap module should have



The Sitecore Marketplace has many sitemap modules that help you set up this feature with a few simple configuration steps, but I didn't find a module that covers all of the following:

  • Support for multi-site.
  • Support for multilingual sites.
  • Support for all the optional tags mentioned above.
  • Support for an HTML sitemap component.
  • Support for allowing/disallowing specific items from appearing in sitemap.xml or in the component.
  • Support for submitting the sitemap to common search engines.



In part 2 of this blog I will provide detailed steps for implementing the above features for full sitemap functionality.
 
