[Isis-users] MARC file import
Ernesto Spinak
spinaker at adinet.com.uy
Tue Apr 2 08:52:42 CEST 2013
Dear Renate
An 8-bit ANSI/ASCII code page has only 256 positions, whereas Unicode
(here encoded as UTF-8) has 65,536 positions in its basic plane alone,
so that it can cover all alphabets.
You can map the characters of any ANSI/ASCII code page of MS-DOS
into a UTF-8 sequence using a gizmo, but you cannot do the reverse,
i.e. map every UTF-8 character onto 256 positions, because they do not
fit, and you cannot solve that problem with a gizmo.
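
To illustrate the asymmetry, here is a minimal Python sketch. It is not
CISIS code and does not show how a gizmo works internally; cp850 is
chosen only as an example of an MS-DOS code page, and fits_in_code_page
is a hypothetical helper name.

```python
# Direction 1: any byte string in an 8-bit code page can be turned into UTF-8.
dos_bytes = bytes([0x82, 0xA4])        # the letters e-acute and n-tilde in cp850
text = dos_bytes.decode("cp850")       # 8-bit code page -> Unicode text
utf8_bytes = text.encode("utf-8")      # Unicode text -> UTF-8 byte sequence


def fits_in_code_page(text, code_page="cp850"):
    """Return True if every character of `text` exists in the 8-bit code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        # Direction 2 fails: the character has no slot among the 256 positions.
        return False


if __name__ == "__main__":
    print(utf8_bytes)
    print(fits_in_code_page(text))                  # accented Latin letters fit
    print(fits_in_code_page("\u4e2d\u6587"))        # two Chinese characters do not
```

The first direction always succeeds because each of the 256 positions
has a Unicode equivalent; the second succeeds only for the small subset
of characters that the chosen code page happens to contain.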
There are ISO standards for the transliteration of characters, so it
might be possible to construct appropriate gizmos for many alphabets,
but that does not solve all cases, such as Chinese or Japanese.
In short, if you have MARC records in Chinese or Japanese, you cannot
convert them to ANSI. That is not a problem of MARC or ABCD. You should
work in Unicode. This is a project led by Egbert and is very advanced;
it is part of ABCD v.2.
We have CISIS in Unicode working as a beta test.
Regards
Ernesto Spinak
Hi Ernesto,
Thanks for sending the instructions for MARC import. The Project
Gutenberg catalogue data contains books in Chinese, Arabic, Russian,
etc. The format is UTF-8. After conversion with cisis, these characters
are not displayed correctly. What gizmo should be used and how is it
entered?
Regards and thanks
Renate
> Renate
>
> There are three features to note when importing/exporting MARC files
> with CISIS or Winisis, etc., using the ISO 2709 standard:
>
> 1. Length of the line, which in Isis is 80 chars (default) but you
>    can modify it using 0 or marc, including the end-of-line control:
>    Isis uses two chars (CR+LF) by default, or only one char in .mrc
>    files.
> 2. Subfield delimiter, which is ^ in Isis, but in MARC it is a
>    non-representable ASCII char (below 32): ASCII 31 (hex 1F) in .mrc.
> 3. Leader bytes, transferred in the first 24 bytes of the ISO file,
>    which are not accessible in either Winisis or Pft.
>
>
> Items 1 and 2 are implemented in Winisis, but item 3 is a feature
> only of CISIS.
>
>
> Therefore, when you want all three items you have to use the MX
> command. I'd recommend using the 1660 version of MX, because it gets
> all the leader bytes.
>
>
> Let's say you want to import a .mrc file from the Library of Congress
> or Project Gutenberg:
>
>
> mx iso=marc=source.mrc isotag1=3000 gizmo=gizdelim create=myisisdb now -all tell=100
>
>
> iso=marc=source.mrc is the source data; standard MARC files usually
> come with the extension .mrc
> isotag1=3000 you need this parameter to move the leader bytes into
> conventional fields v30xx
> gizmo=gizdelim this is a gizmo to convert the delimiter ASCII 31
> (hex 1F) into our beloved ^ as delimiter
>
> If you don't have this gizmo you can create it with the following
> instruction:
>
> mx null count=1 "proc='a1/1F/a11/hex/a2/^/'" create=gizdelim
>
> Regards
> Ernesto Spinak
>
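
For readers who want to see what the leader and the subfield delimiter
look like in a raw .mrc record, here is a minimal Python sketch. It is
not what mx does internally and does not reproduce the isotag1 field
mapping; it only assumes the generic ISO 2709 layout (24-byte leader,
12-byte directory entries, control characters 1F/1E/1D). The sample
record and the names build_sample and split_record are hypothetical.

```python
SUBFIELD_DELIM = b"\x1f"   # ASCII 31, shown as ^ in Isis
FIELD_TERM = b"\x1e"       # ASCII 30, ends the directory and each field
RECORD_TERM = b"\x1d"      # ASCII 29, ends the record
LEADER_LEN = 24


def build_sample():
    """Build a tiny one-field record so the example is self-contained."""
    field = b"10" + SUBFIELD_DELIM + b"aTitle" + SUBFIELD_DELIM + b"cAuthor" + FIELD_TERM
    directory = b"245" + b"%04d" % len(field) + b"00000" + FIELD_TERM
    base = LEADER_LEN + len(directory)          # base address of data
    total = base + len(field) + len(RECORD_TERM)
    leader = b"%05d" % total + b"nam a22" + b"%05d" % base + b"   4500"
    return leader + directory + field + RECORD_TERM


def split_record(raw):
    """Return (leader, [(tag, text)]) with the 1F delimiter replaced by ^."""
    leader = raw[:LEADER_LEN]
    base = int(leader[12:17])                   # leader positions 12-16
    directory = raw[LEADER_LEN:base - 1]        # drop the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop field terminator
        fields.append((tag, data.replace(SUBFIELD_DELIM, b"^").decode("utf-8")))
    return leader.decode("ascii"), fields


if __name__ == "__main__":
    leader, fields = split_record(build_sample())
    print(leader)
    print(fields)
```

The replace call plays the same role as the gizdelim gizmo above, and
the leader slice is the part of the record that only CISIS exposes.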