Most IT and business decision makers are familiar with the practice of email archiving: the indexing, storage and retrieval of email content for purposes of eDiscovery, regulatory compliance, internal investigations, maintenance of business records, etc. Roughly one-half of mid-sized and large organizations retain their email in a true archiving system, and fewer retain files in SharePoint or other repositories, but very few retain Web sites or Web pages.
I believe that’s a practice that needs to change. Retaining Web pages and entire Web sites should be a best practice for any organization because of the importance of demonstrating whether content was (or was not) available on the Web and when it was (or was not) available.
The Wayback Machine has been a pioneer in the archival of Web content, archiving Web sites on a somewhat regular schedule for years. Reviewing old Web content in the Wayback Machine yields some fascinating results. Beyond curiosity, however, there are a number of good business reasons to archive Web content. For example:
- FINRA Rule 2210(b)(2) requires that “Members must maintain all advertisements, sales literature, and independently prepared reprints in a separate file for a period of three years from the date of last use.” Under Rule 2210, “a firm that co-brands any part of a third-party site, such as by placing the firm’s logo prominently on the site, is responsible for the content of the entire site.”
- The Government of Canada has published Guidance on Implementing the Standard on Web Accessibility, which requires Canadian government entities to “define default timelines for review, retention and disposal of Web pages (including Web applications).”
- FRCP Rule 26 requires that expert witnesses whose testimony is introduced during legal proceedings offer “the witness’s qualifications, including a list of all publications authored in the previous 10 years.” Since a large proportion of many experts’ publications are blog posts or other Web-based content, it is increasingly important for an archive of these works to be available to all parties during a legal proceeding.
- Web archiving can be useful in capturing content from websites and then searching that content for potential copyright or trademark infringement.
- A Web archiving solution can be useful when researching various types of competitive messages as part of promotional campaigns. For example, a hotel chain might wish to archive the content of its three leading competitors’ websites to determine when specific messages were posted to the Web and when they were taken down. This data can then be correlated with sales information, marketing reports and other research to determine which messages were more or less effective.
- Website archiving can be helpful in demonstrating competitors’ registration of a domain name for nefarious purposes. For example, Innervision Web Solutions used the domain name “DellComputersSuck.com”. Dell contended that Innervision had used this domain name in order to redirect visitors to the Innervision Web site for commercial gain. Using Web content that it had archived, Dell was able to prove this contention and show that Innervision had registered the domain in bad faith. As a result, Dell was able to have the domain transferred to its ownership.
These are just a few of the use cases for Web archiving. A handful of Web archiving solutions are currently available, offered by Smarsh, Reed Technology, Iterasi, PageFreezer, Hanzo and some other vendors, but we anticipate that the market for these solutions, and the number of vendors offering them, will grow substantially in the near future.
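To make the competitive-messaging example above a little more concrete, here is a minimal sketch of how archived snapshots could be searched to find when a message first and last appeared. It is not how any of the vendors named here work; the function name `message_window`, the hotel text and the capture dates are all hypothetical, and it assumes each snapshot has already been reduced to a capture date plus plain page text.

```python
from datetime import date


def message_window(snapshots, phrase):
    """Return (first_seen, last_seen) capture dates on which `phrase`
    appears in the archived text, or None if it never appears.

    `snapshots` is an iterable of (capture_date, page_text) pairs.
    Matching is a simple case-insensitive substring test.
    """
    needle = phrase.lower()
    hits = sorted(d for d, text in snapshots if needle in text.lower())
    if not hits:
        return None
    return hits[0], hits[-1]


# Hypothetical monthly captures of a competitor's home page.
snapshots = [
    (date(2013, 1, 5), "Welcome to Hotel A. Book direct and save."),
    (date(2013, 2, 5), "Welcome to Hotel A. Free Wi-Fi in every room!"),
    (date(2013, 3, 5), "Welcome to Hotel A. Free Wi-Fi in every room!"),
    (date(2013, 4, 5), "Welcome to Hotel A. Kids stay free this summer."),
]

window = message_window(snapshots, "free wi-fi")
if window is not None:
    first_seen, last_seen = window
    print(f"Message seen from {first_seen} to {last_seen}")
```

Note that the result is only as precise as the capture schedule: a message that appears in the February and March captures could have gone live any time after the January capture and come down any time before the April one. More frequent captures narrow that window, which is one reason a regular archiving schedule matters.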