Nonprofit Common Crawl Offers a Database of the Entire Web, For Free, and Could Open Up Google to New Competition | MIT Technology Review

Common Crawl supplies a database of over five billion Web pages in the hope that it will inspire new research or online services.

Google famously started out as little more than a more efficient algorithm for ranking Web pages. But the company also built its success on crawling the Web—using software that visits every page in order to build up a vast index of online content.

A nonprofit called Common Crawl is now running its own Web crawler to build a giant copy of the Web that anyone can access. The organization offers over five billion Web pages for free, so that researchers and entrepreneurs can try things otherwise possible only for those with access to resources on the scale of Google’s.