[Isis-users] Old CDS-ISIS archives

De Smet Egbert egbert.desmet at ua.ac.be
Mon Apr 11 22:21:05 CEST 2011


Luciano,

I have in fact already done so. I run a Perl script over the texts, which writes the typical mail headers (e.g. Reply-To:) as 'tagged text' together with the values of the fields. Then I run a text-to-ISIS converter over the result to convert it to ISO 2709. It worked rather well on a sample set of the ISIS archives, after I found a way to prevent the quoted e-mails in the body of a message (which carry the same tags as the real ones) from being treated as new e-mails. This was months ago however, so I will now (if I find some time) refresh my memory about it and send you some sample data.
Then the thing is to get the whole bunch of files over here (there are many, as there is one per month for each year since at least 1994) and to run the conversion on them.
If the structure of the messages (I suppose that is what you mean by RFC-2822) is sufficiently constant, we could go for the database approach (i.e. ABCD); if not, we could go for your approach of keeping it at a flat-text level with full-text indexing like Google Desktop's.
The advantage of the database solution is of course that one could search on 'From' fields (to do a statistic on who the most active senders are ;-) ) and on 'Subject' fields, on top of the full text of the messages themselves. I remember that this worked in my first tests.
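
To make the idea concrete, here is a minimal sketch in Python (not the Perl script mentioned above, whose details are not given here). It assumes, purely for illustration, that each archive file is mbox-like: every real message starts with an unquoted "From " separator line, and its headers end at the first blank line. Anything after that blank line is kept as body text, so header-like lines inside quoted e-mails never start a new record. The names split_messages, tag_message and top_senders are hypothetical.

```python
import re
from collections import Counter

# Assumption: real messages begin with an mbox-style "From <address> ..." line.
# "From:" header lines do not match, because the pattern needs a space after "From".
SEPARATOR = re.compile(r"^From \S+.*$")
# Headers to keep as tagged fields (an illustrative selection).
HEADER = re.compile(r"^(From|To|Subject|Date|Reply-To):\s*(.*)$")


def split_messages(text):
    """Split archive text into lists of lines, one list per message."""
    messages, current = [], None
    for line in text.splitlines():
        if SEPARATOR.match(line):
            if current is not None:
                messages.append(current)
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append(current)
    return messages


def tag_message(lines):
    """Return (fields, body); headers are read only up to the first blank line."""
    fields, body, in_headers = {}, [], True
    for line in lines:
        if in_headers:
            if line.strip() == "":
                in_headers = False  # everything after this is body text
                continue
            match = HEADER.match(line)
            if match:
                fields[match.group(1)] = match.group(2)
        else:
            body.append(line)  # quoted "From:" / "Subject:" lines end up here
    return fields, "\n".join(body)


def top_senders(messages, n=10):
    """Count messages per 'From' value, most active senders first."""
    counts = Counter(tag_message(m)[0].get("From", "(unknown)") for m in messages)
    return counts.most_common(n)
```

The dictionary returned by tag_message could then be written out in whatever tagged-text layout the text-to-ISIS converter expects; that layout is not specified here, so it is left out of the sketch.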

Egbert de Smet
Univ. of Antwerp

________________________________________
From: isis-users-bounces at iccisis.org [isis-users-bounces at iccisis.org] on behalf of Luciano Ramalho [luciano.ramalho at bireme.org]
Sent: Monday, April 11, 2011 10:06 PM
To: isis-users at iccisis.org
Subject: Re: [Isis-users] Old CDS-ISIS archives

In order to convert the mailing list archives to ISIS we need to map
the e-mail headers, as defined by RFC-2822, to ISIS fields. Also,
multipart messages, such as those containing HTML or attachments, need
to have their bodies mapped to several fields.
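
As a rough illustration of such a mapping, the sketch below uses Python's standard email package to turn one message into a list of (tag, value) pairs. The tag numbers in FIELD_MAP and BODY_TAG are made up for the example and are not an existing ISIS field definition; the function name to_record is hypothetical as well. Each text part of a multipart message becomes one occurrence of a repeatable body field, and attachments are simply skipped here.

```python
from email import message_from_string
from email.policy import default

# Hypothetical ISIS tag numbers, chosen only for this example.
FIELD_MAP = {"From": 10, "To": 20, "Subject": 30, "Date": 40}
BODY_TAG = 90  # assumed repeatable: one occurrence per text part


def to_record(raw):
    """Map one raw message to a list of (tag, value) pairs."""
    msg = message_from_string(raw, policy=default)
    record = []
    # Headers: every occurrence of a mapped header becomes one field.
    for header, tag in FIELD_MAP.items():
        for value in msg.get_all(header, []):
            record.append((tag, str(value)))
    # Bodies: walk all parts, keep the text ones (plain and HTML).
    for part in msg.walk():
        if part.is_multipart():
            continue  # container part, no content of its own
        if part.get_content_maintype() == "text":
            record.append((BODY_TAG, part.get_content()))
    return record
```

Whether HTML and plain-text parts should share one repeatable field or go to separate fields is a design decision this sketch leaves open.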

I am new to ISIS, so I don't know about previous experiences
converting mailing list archives to ISIS. Does anyone know of a
pre-existing mapping between RFC-2822 messages and ISIS records, or
should we develop one especially for this task?

Best regards,

Luciano



2011/4/9 Luciano Ramalho <luciano.ramalho at bireme.org>:
> 2011/4/8 Henk Rutten <hlrutten at xs4all.nl>:
>> It’s more a matter of lack of time. It should be possible to convert the old
>> archives, not to Mailman, but for instance to an ABCD database. The only
>> thing is, that we didn’t have time to do it yet. I’m very sorry!
>
> Hello, Henk,
>
> I am willing to help preserve and re-publish that archive.
>
> My proposal would be to convert each e-mail to an HTML page, with
> links to previous and next message by date and tables of contents by
> month. Then it would be just a matter of uploading the HTML to a
> public site and integrate a Google search box.
>
> I'd also investigate organizing the messages by thread, but since that
> is more complicated I'd initially focus on publishing everything
> chronologically, and then, after that is online and searchable,
> evaluate a way to organize by threads as well.
>
> Henk, if you can give me access to the archive files I'd immediately
> put them in a public Web site for anyone to download them in bulk, and
> start designing the conversion process, keeping the present mailing
> list informed of the progress. The resulting tools, datasets and files
> would be shared with all, so that anyone may reuse or republish them.
>
> Cheers,
>
> Luciano Ramalho
>
> PS. This is a personal project that I'd do in my spare time, as a
> programmer and librarian interested in helping preserve this resource
> for the history of computers in libraries.
>
> 2011/4/8 Henk Rutten <hlrutten at xs4all.nl>:
>> Dear Renate,
>>
>> It’s more a matter of lack of time. It should be possible to convert the old
>> archives, not to Mailman, but for instance to an ABCD database. The only
>> thing is, that we didn’t have time to do it yet. I’m very sorry!
>>
>> Have a nice day.
>>
>> Henk Rutten
>>
>> From: Renate Morgenstern [mailto:rmorgenstern at iway.na]
>> Sent: Friday, April 08, 2011 7:39 AM
>> To: hlrutten at xs4all.nl
>> Subject: Old CDS-ISIS archives
>>
>> Good day,
>> I am/was subscribed to the CDS-ISIS list for many years. I sometimes used
>> the archives when I was looking for help to solve a problem. Would it be possible
>> to get the old archives converted to the Mailman archives?
>> Thanks and regards
>> Renate
>>
>>
>> --
>>
>> Renate Morgenstern
>>
>> P O Box 30664, Windhoek, Namibia
>>
>> Tel/Fax: 242124
>>
>> Fax to Email: 088637518
>>
>> Email: rmorgenstern at iway.na
>>
>> _______________________________________________
>> isis-users mailing list
>> isis-users at iccisis.org
>> To manage your own subscription options go to:
>> http://lists.iccisis.org/listinfo/isis-users
>> Or contact Henk Rutten: hlrutten at xs4all.nl
>>
>>
>
>
>
> --
> Luciano Ramalho
> supervisor de desenvolvimento || software development lead
> BIREME/OPAS/OMS || BIREME/PAHO/WHO
>



--
Luciano Ramalho
supervisor de desenvolvimento || software development lead
BIREME/OPAS/OMS || BIREME/PAHO/WHO

