[Isis-users] MARC file import

Ernesto Spinak spinaker at adinet.com.uy
Tue Apr 2 08:52:42 CEST 2013


Dear Renate


A single-byte ANSI/ASCII encoding has only 256 positions, but Unicode, 
which UTF-8 encodes, has more than 65536 in order to cover all alphabets.

You can map the characters of any ANSI/ASCII MS-DOS code page into a 
UTF-8 sequence using a gizmo, but you cannot do the reverse, i.e. map 
every UTF-8 character onto 256 positions, because they do not fit, and 
you cannot solve that problem with a gizmo.
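
The asymmetry can be illustrated outside of CISIS with a minimal Python 
sketch. It assumes the DOS code page is cp850; the function names are 
only illustrative and are not part of CISIS or ABCD.

```python
def to_utf8(raw: bytes, code_page: str = "cp850") -> bytes:
    """Map bytes from a single-byte code page into a UTF-8 sequence."""
    return raw.decode(code_page).encode("utf-8")


def fits_code_page(text: str, code_page: str = "cp850") -> bool:
    """Report whether every character of text has a position in the code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# In cp850 the byte 0xA4 is the letter n with tilde.
dos_bytes = b"Espa\xa4a"
utf8_bytes = to_utf8(dos_bytes)

# The forward direction always works; the reverse fails for Chinese.
spanish_fits = fits_code_page("Espa\u00f1a")
chinese_fits = fits_code_page("\u4e2d\u6587")
```

Any of the 256 positions has a UTF-8 equivalent, but the two Chinese 
characters in the last line have no position in cp850 at all.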

There are ISO standards for the transliteration of characters, so it 
might be possible to construct appropriate gizmos for many alphabets, 
but that does not solve all cases, such as Chinese or Japanese.

In short, if you have MARC records in Chinese or Japanese, you cannot 
convert them to ANSI. That is not a problem of MARC or ABCD. You should 
work in Unicode. This is a project led by Egbert; it is very advanced 
and is part of ABCD v.2.
We have CISIS in Unicode working as a beta test.

Regards
Ernesto Spinak


Hi Ernesto,

Thanks for sending the instructions for MARC import. The Project 
Gutenberg catalogue data contains books in Chinese, Arabic, Russian, 
etc. The format is UTF-8. After conversion with cisis, these characters 
are not displayed correctly.  What gizmo should be used and how is it 
entered?

Regards and thanks

Renate
> Renate
>
> There are three features to note when importing/exporting MARC files
> with CISIS or Winisis, etc., using the ISO 2709 standard:
>
>     1. Length of the line, which in Isis is 80 chars (default), but you
>        can modify it using 0 or marc. This includes the end-of-line
>        control: Isis uses two chars (CR+LF) by default, or only one
>        char in .mrc files.
>     2. Subfield delimiter, which is ^ in Isis, but in MARC it is a
>        non-representable ASCII char (below 32): ASCII 31 (hex 1F)
>        in .mrc.
>     3. Leader bytes, transferred in the first 20 bytes of the ISO file,
>        which are accessible neither in Winisis nor in PFT.
>
>
> Items 1 and 2 are implemented in Winisis, but item 3 is a feature
> only of CISIS.
>
>
> Therefore, when you want to get all three items, you have to do it
> using the MX command.
> I'd recommend using the 1660 version of MX, because it gets all the
> leader bytes.
>
>
> Let's say you want to import a .mrc file from the Library of Congress
> or Project Gutenberg:
>
>
> mx iso=marc=source.mrc isotag1=3000 gizmo=gizdelim create=myisisdb now -all tell=100
>
>
>     iso=marc=source.mrc   is the source data; standard MARC files
>                           usually come with the extension .mrc
>     isotag1=3000          you need this parameter to move the leader
>                           bytes into conventional fields v30xx
>     gizmo=gizdelim        this is a gizmo to convert the delimiter
>                           ASCII 31 (hex 1F) into our beloved ^ delimiter
>
> If you don't have this gizmo, you can create it with the following
> instruction:
>
> mx null count=1 "proc='a1/1F/a11/hex/a2/^/'" create=gizdelim
>
> Regards
> Ernesto Spinak
>
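
For readers who want to see what the delimiter substitution amounts to, 
here is a minimal Python sketch of the same replacement that the 
gizdelim gizmo is described as doing: byte hex 1F becomes ^. The sample 
field content and the function name are made up for illustration, and 
this is not how MX implements gizmos internally.

```python
# MARC subfield delimiter: ASCII 31 (hex 1F), not printable.
MARC_SUBFIELD_DELIMITER = b"\x1f"
# Isis subfield delimiter.
ISIS_SUBFIELD_DELIMITER = b"^"


def to_isis_delimiters(field: bytes) -> bytes:
    """Replace every MARC subfield delimiter in a field with the Isis one."""
    return field.replace(MARC_SUBFIELD_DELIMITER, ISIS_SUBFIELD_DELIMITER)


# Hypothetical title field: two indicators, then subfields a and c.
marc_field = b"10\x1faPride and prejudice /\x1fcJane Austen."
isis_field = to_isis_delimiters(marc_field)
```

After the call, isis_field holds 10^aPride and prejudice /^cJane Austen. 
so the subfields can be addressed in the usual Isis way.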


