I use PrestaSitemapBundle on some of my Symfony websites to automatically generate their sitemap.xml files. I recently ran into an issue: these files were showing up in Google search results when they should not.

I found that one should add an X-Robots-Tag header with a noindex value to each sitemap's response headers so that Google knows this content should not be indexed. (Robots meta tag and X-Robots-Tag HTTP header specifications)

The problem here is that we cannot use a <Location> (or <LocationMatch>) rule for non-static resources (URLs that are handled by Symfony and not served directly from the filesystem), because Apache processes such a rule after the <Directory> section in which the URL rewriting magic happens: by that point, all URLs look like /app.php.

The solution is to generate the sitemap files using PrestaSitemapBundle's dumper command, so that they become static files that Apache serves directly:

  1. Let Symfony know the URL of the production environment by providing its host and scheme in parameters.yml:

    # parameters.yml
    parameters:
        router.request_context.host: www.my-website.fr
        router.request_context.scheme: https
    
  2. Add a .gitignore rule for the generated files (this one matches both sitemap.xml and sitemap.default.xml):

    # .gitignore
    web/sitemap*.xml
    
  3. Be sure to regenerate the sitemaps on deployment. I'm using Simple-Deployment-Script so I just have to add a custom command:

    # deploy.json
    {
        # ...
        "commands": [
            "php bin/console presta:sitemaps:dump"
        ]
    }
    

Now, all we have to do is update Apache's vhost and add this section (the Header directive requires mod_headers to be enabled):

<LocationMatch "/sitemap.*.xml">
    Header always set X-Robots-Tag "noindex"
</LocationMatch>

/sitemap.*.xml will match sitemap.xml, sitemap.default.xml, sitemap-authors.xml, ...
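
To see what the pattern really matches, here is a small sketch that approximates it with grep -E (Apache uses PCRE, but for a pattern this simple the two should behave the same). The stricter pattern and the /sitemapsxml/page path are my own illustrations, not part of the setup above: because the dots are unescaped and the pattern is not anchored, the original expression is a bit more permissive than it looks.

```shell
# Loose pattern taken from the vhost above.
loose='/sitemap.*.xml'
# Stricter variant (an assumption, not from the original setup):
# anchored, with the final dot escaped.
strict='^/sitemap.*\.xml$'

matches() {
    # $1 = pattern, $2 = URL path; prints "match" or "no match"
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

# /sitemapsxml/page is a made-up path that shows the over-matching.
for path in /sitemap.xml /sitemap.default.xml /sitemap-authors.xml /sitemapsxml/page; do
    printf '%-24s loose: %-9s strict: %s\n' "$path" \
        "$(matches "$loose" "$path")" "$(matches "$strict" "$path")"
done
```

In practice the loose pattern is harmless on most sites, but if you have other URLs starting with /sitemap, the anchored variant may be worth considering.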

Now let's reload Apache and test this: X-Robots-Tag: noindex appears in the Response Headers list in the Network tab of Chrome's DevTools. Good job!
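
If you prefer checking from the command line (for example at the end of a deployment), something like the following sketch could work. The has_noindex helper is a name I made up for illustration; it only inspects the raw response headers it receives on standard input, and the host is the example one from parameters.yml.

```shell
# Reads raw HTTP response headers on stdin and succeeds only if an
# X-Robots-Tag header containing "noindex" is present.
# (has_noindex is a hypothetical helper, not part of any tool.)
has_noindex() {
    # Strip carriage returns, then match the header name case-insensitively
    # (HTTP/2 responses print header names in lowercase).
    tr -d '\r' | grep -iEq '^x-robots-tag:.*noindex'
}

# Usage against a live site:
#   curl -sI https://www.my-website.fr/sitemap.xml | has_noindex \
#       && echo "noindex is set" || echo "noindex is missing"
```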