sitemap.xml and robots.txt

Example robots.txt and sitemap.xml templates, and the URL configuration to serve them.

urls.py:

from django.views.generic.simple import direct_to_template


urlpatterns += patterns('',
    (r'^robots\.txt$', direct_to_template,
        {'template': 'robots.txt', 'mimetype': 'text/plain'}),
    (r'^sitemap\.xml$', direct_to_template,
        {'template': 'sitemap.xml', 'mimetype': 'text/xml'}),
)

Or use TemplateView on Django 1.3 and later (direct_to_template was removed in Django 1.5):

from django.views.generic import TemplateView


urlpatterns += patterns('',
    (r'^robots\.txt$', TemplateView.as_view(
        template_name='robots.txt', content_type='text/plain')),
    (r'^sitemap\.xml$', TemplateView.as_view(
        template_name='sitemap.xml', content_type='text/xml')),
)
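On Django 2.0 and later, patterns() is gone (removed in 1.10) and path() is the idiomatic way to wire these views. A minimal sketch, assuming the two templates sit in a directory listed in the TEMPLATES setting:

```python
# urls.py for Django 2.0+; path() replaces the old patterns() helper.
from django.urls import path
from django.views.generic import TemplateView

urlpatterns = [
    # Serve both files from project-level templates.
    path('robots.txt', TemplateView.as_view(
        template_name='robots.txt', content_type='text/plain')),
    path('sitemap.xml', TemplateView.as_view(
        template_name='sitemap.xml', content_type='text/xml')),
]
```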

templates/sitemap.xml:

<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
    <url>
        <loc>http://mysite.com/somepage/</loc>
        <lastmod>2013-01-01</lastmod>
        <changefreq>weekly</changefreq>
        <priority>1.00</priority>
    </url>
</urlset>
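A static sitemap template works for a handful of pages, but for model-backed content Django ships django.contrib.sitemaps, which renders the XML for you. A sketch under assumed names (the blog app, BlogPost model, and its updated field are hypothetical):

```python
# sitemaps.py: a sketch using Django's bundled sitemap framework.
# BlogPost is a hypothetical model with get_absolute_url() defined.
from django.contrib.sitemaps import Sitemap
from blog.models import BlogPost  # hypothetical app/model

class BlogSitemap(Sitemap):
    changefreq = 'weekly'
    priority = 1.0

    def items(self):
        # Querystring of objects to list in the sitemap.
        return BlogPost.objects.all()

    def lastmod(self, obj):
        return obj.updated  # hypothetical "last modified" field
```

Wire it up with the sitemap view from django.contrib.sitemaps.views instead of the TemplateView entry, passing {'sitemaps': {'blog': BlogSitemap}} in the URL conf.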

templates/robots.txt:

User-agent: Yandex
Disallow: /admin
Disallow: /static
Disallow: /media
Host: mysite.com

User-agent: Googlebot
Disallow: /admin
Disallow: /static
Disallow: /media

User-agent: *
Crawl-delay: 30
Disallow: /admin
Disallow: /static
Disallow: /media

Note: extend robots.txt with the URLs you don't want search crawlers to index. Conversely, sitemap.xml should contain the URLs of pages you do want search engines to know about.
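The rules above can be sanity-checked with the standard library's urllib.robotparser before deploying; here the rules are fed in directly rather than fetched over HTTP:

```python
import urllib.robotparser

# The catch-all block from the robots.txt above.
rules = """\
User-agent: *
Crawl-delay: 30
Disallow: /admin
Disallow: /static
Disallow: /media
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch('*', '/admin/login/'))  # False: under a Disallow prefix
print(rp.can_fetch('*', '/somepage/'))     # True: not disallowed
print(rp.crawl_delay('*'))                 # 30
```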

Licensed under CC BY-SA 3.0