Here’s a rundown of the 27 signals we use:

  1. Low number of pages found: Our crawlers discovered only a small number of pages on this domain. This is not inherently a problem, but many spam sites have few pages, hence the correlation.
  2. TLD correlated with spam domains: This domain's top-level domain extension (e.g. .info, .cc, .pl) is one that many spam domains use.
  3. Domain name length: The combined length of the subdomain and root domain is similar to that of domains used by spam sites.
  4. Domain name contains numerals: Like many spam sites, this domain name contains numeric characters.
  5. Google Font API Present: This domain does not use special fonts (e.g. Google Font API). Lacking this feature was common among the spam sites we found.
  6. Google Tag Manager: Google Tag Manager is almost never present on spam sites.
  7. DoubleClick Present: The DoubleClick ad tag is almost never present on spam sites.
  8. Phone Number Present: Spam sites rarely have real phone numbers on their pages.
  9. Links to LinkedIn: Almost no spam sites have an associated LinkedIn page, hence lacking this feature is correlated with spam.
  10. Email Address Present: Email addresses are almost never present on spam sites.
  11. Defaults to HTTPS: Few spam sites invest in SSL certificates; HTTPS is often a good trust signal.
  12. Use of Meta Keywords: Pages that use the meta keywords tag are more likely to be spam than those that don't.
  13. Visit Rank: Websites with very few visits in clickstream panels were more often spam than those with high numbers of visits.
  14. Rel Canonical: Using a non-local rel=canonical tag is often associated with spam.
  15. Length of Title Element: Pages with very long or very short titles are correlated with spam sites.
  16. Length of Meta Description: Pages with very long or very short meta description tags are correlated with spam sites.
  17. Length of Meta Keywords: Pages with very long meta keywords tags are often found on spam sites.
  18. Browser Icon: Spam sites rarely use a favicon; non-spam sites often do.
  19. Facebook Pixel: The Facebook tracking pixel is almost never present on spam sites.
  20. Number of External Outlinks: Spam sites are more likely to have an abnormally high or low number of external outlinks.
  21. Number of Domains Linked-To: Spam sites are more likely to link to an abnormally high or low number of unique domains.
  22. Ratio of External Links to Content: Spam sites are more likely to have abnormal ratios of links to content.
  23. Vowels/Consonants in Domain Name: Spam sites often have many sequential vowels or consonants in their domain name.
  24. Hyphens in Domain Name: Spam sites are more likely to use multiple hyphens in their domain name.
  25. URL Length: Spam pages often have abnormally short or long URL path lengths.
  26. Presence of Poison Words: Spam sites often employ specific words that are associated with webspam topics like pharmaceuticals, adult content, gaming, and others.
  27. Uses High CPC Anchor Text: Spam sites often employ specific words in the anchor text of outlinks that are associated with webspam topics like pharmaceuticals, adult content, gaming, and others.
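
Several of the signals above (TLD, domain name length, numerals, sequential vowels or consonants, hyphens, URL path length) can be read straight off a URL. The Python sketch below shows one way such raw features might be extracted. It is an illustration only, not the scoring code behind these signals: the function names, the `SPAM_CORRELATED_TLDS` set (seeded with just the three example TLDs mentioned in the list), and the choice to treat the last dot-separated label as the TLD are all assumptions. It produces raw measurements and deliberately leaves out any thresholds, since the list does not state them.

```python
from urllib.parse import urlparse

# Hypothetical set for illustration: only the example TLDs named in the list.
SPAM_CORRELATED_TLDS = {"info", "cc", "pl"}
VOWELS = set("aeiou")


def longest_run(text, in_group):
    """Return the length of the longest consecutive run of matching characters."""
    best = current = 0
    for ch in text:
        if in_group(ch):
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def domain_features(url):
    """Extract raw domain-name and URL features from a full URL.

    The URL must include a scheme (e.g. https://) so the hostname parses.
    Assumption: the TLD is the last dot-separated label, so multi-part
    suffixes such as .co.uk are not handled.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    labels = host.split(".")
    tld = labels[-1] if len(labels) > 1 else ""
    # Subdomain plus root domain, without the TLD.
    name = ".".join(labels[:-1]) if len(labels) > 1 else host
    return {
        "tld_correlated_with_spam": tld in SPAM_CORRELATED_TLDS,
        "domain_name_length": len(name),
        "contains_numerals": any(ch.isdigit() for ch in name),
        "hyphen_count": name.count("-"),
        "longest_vowel_run": longest_run(name, lambda c: c in VOWELS),
        "longest_consonant_run": longest_run(
            name, lambda c: c.isalpha() and c not in VOWELS
        ),
        "url_path_length": len(parsed.path),
    }


if __name__ == "__main__":
    # Made-up example URL, not taken from any real dataset.
    for key, value in domain_features("https://cheap-pills-4u.info/buy/now").items():
        print(f"{key}: {value}")
```

Each value here is only an input to a signal. Deciding what counts as an "abnormal" length or run would require comparing against a labeled set of spam and non-spam domains.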