Missing and broken links happen in our courses more often than we would like. They are tough to discover, tough to fix, and frustrating for students in the middle of a course. Seriously, they are pretty bad.

To paraphrase the 1946 film It’s a Wonderful Life (and paraphrasing that film is our ideal standard mode of discourse), a missing webpage leaves an awfully big hole.

Fortunately, there is something to be done: with a tool called the WayBack Machine over at the Internet Archive, you can find cached (saved) versions of many webpages, as long as you know the URL of what you are trying to find.

Here is what to do:

  1. Visit archive.org/web
  2. Paste or type in the URL of the page you are looking for
  3. Select from the available versions of the page presented to you
  4. Spend hours and hours exploring the Internet Archive (optional)
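
If you have a whole list of dead links to chase down, the same lookup can be scripted. What follows is a minimal sketch rather than anything official. It assumes the Internet Archive's public availability endpoint at archive.org/wayback/available and the JSON shape it returns (an "archived_snapshots" object holding a "closest" entry). The function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's public availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the lookup URL for a page (steps 1 and 2). No network access."""
    params = {"url": page_url}
    if timestamp:
        # Optional date such as "20150101"; the API returns the closest capture.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Pick the archived URL out of a decoded API response, or return None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def calendar_url(page_url):
    """Browser page that lists every saved version of the URL (step 3)."""
    return "https://web.archive.org/web/*/" + page_url


def find_archived_copy(page_url, timestamp=None):
    """Ask the API over the network; returns a snapshot URL or None."""
    with urlopen(availability_query(page_url, timestamp), timeout=10) as reply:
        return closest_snapshot(json.load(reply))
```

Calling `find_archived_copy("example.com/some-page")` should hand back the address of the nearest saved copy, or `None` if the page was never captured. In that case `calendar_url` at least gives you a page to open and check by hand.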

That is all there is to it. Appropriately, given our film tribute above, here is a video that lays it all out in moving pictures:

4 thoughts on “Filling in the blanks with the WayBack Machine”

  1. I’ve recommended the WayBack Machine many times in answer to queries here about broken links. Although not always a perfect solution, it works well enough most of the time.

    Saylor have inadvertently made the use of this resource more difficult by the practice of removing known bad links (and replacing them with a message that the resource was not available). As I suggested in the forums and via email a long time ago, it would have been more helpful to leave the old, broken, link in place with a note that it was currently unavailable–thus giving a better starting point for those with the initiative to search out the missing resources.

    1. I have been tripped up myself, in the past, by the missing URLs. The main reason is to prevent additional reports (people are remarkably capable of skipping past warning text) and also, to a lesser extent, SEO sorts of concerns.

      In practice, having unlinked text plus a warning message still generates new broken link reports. Changing the automatic text and perhaps styling the existing text in a “deprecated” fashion could help. Maybe we could alter the URL as well to de-link it, or otherwise add rel="nofollow" or that sort of thing.

      We are also likely to experiment soon with keeping canonical course versions on GitHub with automatic syncing to production courses. I think we are ready to talk about that quite soon. That would enable people to recommend changes directly and allow us to easily accept them, pushing them straight to production. GH hosting also makes it easier for people to fork courses and/or for us to refer inquiries about adapting our materials to GH. (GH is not exactly the most layperson-friendly thing there is, but I think there could also be a layer in which anyone can log problems with a simple web form, which would also enable some very simple, public issue-tracking.)

      I’ll leave off this comment before I go too far off-topic, but in general we are driving toward more transparency, more access, semi-formal volunteer opportunities, and less sweat for everyone. We’re also going to schedule a public office hours in a couple weeks that should address Moodle, eportfolio, forums, courses, our website…everything.

      1. What about listing known bad links under the missing resources ‘We still need…’ section? At least a bad link in that section should not attract new reports! By all means remove the link and list it purely as plain text if that keeps everybody happier, and you can still keep the existing text message in the main page (if your underlying data structure allows).

        As we have discussed before, there needs to be a rather slicker process to submit revised links, especially where it is merely a case of a link change without any content variation or a move to an archived copy where the original resource is no longer available. Assuming that Saylor keep a copy of the original content it should be a purely administrative task to review these simple changes (by ‘simple’ I mean that there is no academic judgement of content relevance or quality). I’ll be interested to see where the GH idea leads.
