How to reference a site in search engines

my ongoing notes on referencing my sites to search engines

06 Jul 2022 starting note.

Sitemap

Create a sitemap (guidelines: https://www.sitemaps.org/protocol.html):

Insert the following line anywhere in robots.txt, specifying the path to your sitemap:

Sitemap: https://example.com/sitemap_location.xml

robots.txt

20 Sep 2022

Hiding pages

Disallow: to list strings in URLs NOT to crawl.

On pages to hide from search engines, eg "Contact Us", add to HTML page:

<meta name="robots" content="noindex">

Adding to search engines

For each, create account, upload file with authentication key to root of server for verification, then submit /sitemap.xml URL:

Google

https://www.google.com/webmasters/tools/sitemap-list

Bing

https://www.bing/toolbox/webmaster

Submit changes via API with Python, to avoid relying on inconsistent organic crawling: https://brianli.com/submitting-changed-urls-to-bing-webmaster-tools-with-python/

Guidelines for webmasters:

Yandex

Russian search engine

https://webmaster.yandex.com/

Baidu

Chinese search engine

https://zhanzhang.baidu.com/

links

social