sitemap.xml and robots.txt
sitemap.xml and robots.txt examples and urls configuration.
urls.py:
from django.views.generic.simple import direct_to_template urlpatterns += patterns('', (r'^robots\.txt$', direct_to_template, {'template': 'robots.txt', 'mimetype': 'text/plain'}), (r'^sitemap\.xml$', direct_to_template, {'template': 'sitemap.txt', 'mimetype': 'text/xml'}), )
Or use TemplateView for Django version above 1.4:
from django.views.generic import TemplateView urlpatterns += patterns('', (r'^robots\.txt$', TemplateView.as_view( template_name='robots.txt', content_type='text/plain')), (r'^sitemap\.xml$', TemplateView.as_view( template_name='sitemap.xml', content_type='text/xml')), )
templates/sitemap.xml:
<?xml version='1.0' encoding='UTF-8'?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"> <url> <loc>http://mysite.com/somepage/</loc> <lastmod>2013-01-01</lastmod> <changefreq>weekly</changefreq> <priority>1.00</priority> </url> </urlset>
templates/robots.txt:
User-agent: Yandex Disallow: /admin Disallow: /static Disallow: /media Host: mysite.com User-agent: Goolebot Disallow: /admin Disallow: /static Disallow: /media User-agent: * Crawl-delay: 30 Disallow: /admin Disallow: /static Disallow: /media
Note, You should extend robots.txt by urls You don't wan't to be indexed by search crawlers.
Opposite to robots.txt, sitemap.xml should contains urls of pages You want search engines knows about.
Links:
- http://fredericiana.com/2010/06/09/three-ways-to-add-a-robots-txt-to-your-django-project/
- http://www.wordsinarow.com/xml-sitemaps.html
Licensed under CC BY-SA 3.0