UPDATE: HERITRIX was conceived in January 2004
It is not the first time this has happened.
I started reading “A Memory of Webs Past” by Ariel Bleicher (IEEE Spectrum, pages 26-37), and thought: “hey, I’ve done that before”.
The text describes how HERITRIX, an archival crawler, works. Although I never took the work any further, my undergraduate thesis was right there in those lines. And hey, I wrote it back in 2003, 6 years after the first archivers were developed and without any knowledge of their existence.
Could it be a sign that I should restart my old project?
My research (back in 2003) didn’t target the archiving of web content, but a novel method for indexing content. With some minor adjustments, it could be very well suited to the archival task.
I was very pleased to see that the basics of such a project are so similar to what I had imagined.
Maybe it is time to go back to the drawing board.