06 Jul 2022 starting note.
Sitemap
Create a sitemap (guidelines: https://www.sitemaps.org/protocol.html):
- free Sitemap generator: https://www.xml-sitemaps.com/
- or Python (see NN: Sitemap Generator with Python)
- or with Pelican, this plug-in is available: https://github.com/pelican-plugins/sitemap (to test)
Insert the following line anywhere in robots.txt
, specifying the path to your sitemap:
Sitemap: https://example.com/sitemap_location.xml
robots.txt
20 Sep 2022
Hiding pages
Disallow:
to list strings in URLs NOT to crawl.
On pages to hide from search engines, eg "Contact Us", add to HTML page:
<meta name="robots" content="noindex">
Adding to search engines
For each, create account, upload file with authentication key to root of server for verification, then submit /sitemap.xml
URL:
https://www.google.com/webmasters/tools/sitemap-list
Bing
https://www.bing/toolbox/webmaster
Submit changes via API with Python, to avoid relying on inconsistent organic crawling: https://brianli.com/submitting-changed-urls-to-bing-webmaster-tools-with-python/
Guidelines for webmasters:
Yandex
Russian search engine
Baidu
Chinese search engine