proxy.golang.org : github.com/itmayziii/robotstxt
Package robotstxt implements the Robots Exclusion Protocol (https://en.wikipedia.org/wiki/Robots_exclusion_standard) with a simple API. A large portion of how this package handles the specification comes from https://developers.google.com/search/reference/robots_txt. In fact, this package tests against all of the examples listed at https://developers.google.com/search/reference/robots_txt#url-matching-based-on-path-values, plus many more.

1. User agents are case insensitive, so "googlebot" and "Googlebot" are the same thing.
2. "Allow" and "Disallow" directive values are case sensitive, so "/pricing" and "/Pricing" are not the same thing.
3. The entire file must be valid UTF-8; this package returns an error if it is not.
4. The most specific user agent wins.
5. Between allow and disallow directives, the more specific one likewise wins; in the event of a tie, the allow directive wins.
6. Directives listed in a robots.txt file apply only to a single host, protocol, and port number (https://developers.google.com/search/reference/robots_txt#file-location--range-of-validity). This package validates the host, protocol, and port number every time it is asked whether a robot "CanCrawl" a path and the path contains them (see the sketch below).
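The description names a "CanCrawl" check but not the rest of the API, so the following is a minimal usage sketch. The constructor name New and both signatures are assumptions inferred from the description, not confirmed against the package; the documentation link below is authoritative.

package main

import (
	"fmt"
	"log"

	"github.com/itmayziii/robotstxt"
)

func main() {
	// A small robots.txt exercising rules 1 and 5 above.
	contents := `
User-agent: googlebot
Disallow: /private
Allow: /private/public
`
	// Assumed constructor taking the site's origin and the file contents;
	// check the package documentation for the real signature.
	robots, err := robotstxt.New("https://example.com", contents)
	if err != nil {
		log.Fatal(err) // e.g. the contents were not valid UTF-8 (rule 3)
	}

	// "Googlebot" matches the "googlebot" group because user agents
	// are case insensitive (rule 1).
	canCrawl, err := robots.CanCrawl("Googlebot", "/private/public")
	if err != nil {
		log.Fatal(err)
	}
	// Allow "/private/public" is more specific than Disallow "/private",
	// so the allow directive wins (rule 5) and this prints true.
	fmt.Println(canCrawl)
}

Per rule 6, passing a full URL instead of a bare path would additionally have its host, protocol, and port checked against the origin given to the constructor; the bare path above sidesteps that check.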
purl: pkg:golang/github.com/itmayziii/robotstxt
License: MIT
Latest release: about 6 years ago
First release: about 6 years ago
Namespace: github.com/itmayziii
Stars: 2 on GitHub
Forks: 0 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 25 days ago