[Isis-users] J-ISIS Release Candidate 1.2

De Smet Egbert egbert.desmet at uantwerpen.be
Tue Jun 20 09:39:47 CEST 2017


Hello,

I agree fully with Wenke's advice, except maybe for one thing: it is not necessary to 'prepare' a database with fields 1, 2, 3 etc., because mx creates these fields automatically anyway: column 1 becomes v1, column 2 becomes v2, etc. An 'FDT' (field definition table) is only necessary or useful if you actually want to start using such a database with worksheets etc.

In such conversions I always do a quick check with mx to see whether the records (shown on the screen) all have the same number of fields (as they should!) and whether the total number of records matches the number I know to be there. If not, something is indeed wrong with your incoming CSV.
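
The same sanity check can also be sketched outside mx. The following is only an illustration in Python, not part of the CISIS tools: it counts the records in a CSV and tallies how many fields each record has, assuming a semicolon separator and quoted values. The function name `field_count_report` and the sample data are made up for the example.

```python
import csv
import io
from collections import Counter


def field_count_report(csv_text, delimiter=";"):
    """Return (number of records, Counter mapping field count -> records)."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    # Skip completely empty lines; count the fields of every other record.
    counts = Counter(len(row) for row in reader if row)
    return sum(counts.values()), counts


# Hypothetical sample: the third record is one field short.
sample = 'a;b;c\n"x; y";z;w\nbroken;row\n'
total, counts = field_count_report(sample)
print(total, dict(counts))
```

If the tally shows more than one field count, some record has too few or too many fields and the CSV needs fixing before conversion.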

Once you have your data in ISIS format you can put them into an ISO2709 file; in fact mx can do that in one step from a 'sequential' (CSV) input file:
mx seq=myCSV.csv,; iso=myCSV.iso now -all
In this example I added ',;' after the input name to tell mx that the separator between the fields is the semicolon (;) and not the default pipe (|) that mx expects. So you can convert any CSV using any separator (provided the field values are quoted, so that separators inside the quotes are ignored).
If you want, you can re-order or re-number the fields into the structure you want in the end, but then you have to convert to an MST-XRF ISIS database:
mx seq=myCSV.csv,; create=myCSV now -all
For such re-ordering you need to use 'proc' scripts (CISIS does not use CDS/ISIS 'reformatting FSTs'), e.g. create a text file 'convert.prc' with the contents:
'd1',
'<100>',v1,'</100>'
This means: delete the incoming v1 in the output record and put the contents of that v1 (as they were before deletion) into v100. So the XML-like syntax is easy to understand. Between the two tags you can put any PFT, e.g.
'<100>',v1^a,'--',(v2|; |),'</100>' to concatenate subfield a of v1 with all occurrences of v2, separated by semicolons.

Then with your ISO2709 file you can run the import into J-ISIS, which has ISO2709 import as a standard feature (now much faster than the initial version in this respect), and from there on you have all the power of the ISIS PFT to manipulate and sort your data.
You could also skip the previously mentioned reformatting step with 'proc' and do it inside J-ISIS as it has the reformatting FST feature of WinISIS in the export function.

One warning: if your input data already contain Unicode characters, be careful with ISO-2709 files, because they rely entirely on the 'number of characters in the record', which actually means 'number of bytes'. Unicode characters can be multi-byte (e.g. 2 bytes for one character), but mx treats them simply as sequences of bytes and counts them as such in the ISO2709 header (where the total length of the record is stored). But if you e.g. open that ISO file as a text file in an editor and convert it to another character set, 2 bytes might become just one and the number of bytes no longer matches. So don't try such manipulations; just pass the ISO2709 data on to J-ISIS.
But I think you are more of an expert on Unicode than (most of) us, so please let us know the results.
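
To make the byte-versus-character point concrete, here is a small Python illustration (not mx or J-ISIS code). It compares the number of characters in a string with the number of bytes it occupies in a given encoding; the helper name `char_and_byte_lengths` is made up for the example, and UTF-8 is assumed as the Unicode encoding of the data.

```python
def char_and_byte_lengths(text, encoding="utf-8"):
    """Return (number of characters, number of bytes in the given encoding)."""
    return len(text), len(text.encode(encoding))


# "cafe" with an accented e: one character needs 2 bytes in UTF-8,
# but only 1 byte after conversion to Latin-1.
word = "caf\u00e9"
print(char_and_byte_lengths(word))             # UTF-8
print(char_and_byte_lengths(word, "latin-1"))  # after a character-set conversion

# The word "Malayalam" written in Malayalam script (6 code points).
malayalam = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"
print(char_and_byte_lengths(malayalam))
```

A record length written into the header before such a character-set conversion would therefore no longer agree with the byte count of the converted file.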

Summarizing : you need only 2 steps :
1. convert your CSV to ISO2709 with the mx-command
2. convert the resulting ISO2709 file into a J-ISIS database using the J-ISIS import feature.
(3. do all manipulation you need in J-ISIS, e.g. reformatting, indexing, sorting...)

Good luck with it !

Egbert de Smet
Universiteit Antwerpen
________________________________
From: isis-users [isis-users-bounces+egbert.desmet=ua.ac.be at iccisis.org] on behalf of Wenke Adam [wenkeadam at gmail.com]
Sent: Tuesday, June 20, 2017 8:04 AM
To: Hussain KH
Cc: <isis-users at iccisis.org>
Subject: Re: [Isis-users] J-ISIS Release Candidate 1.2

Dear Hussain,
I have converted a large number of databases from dozens of applications over the years, and the method I have found best so far, as it gives you control over every step, is the one described below.

Fangorn is still useful for small datasets, but breaks easily when it encounters unexpected characters and doesn't tell you what the problem is.

Converting from csv to isis is easy with the cisis tools. The mx program has commands for that. (I don't have the exact commands here at the moment; I can send them tomorrow if you need them.)

Preparation for a clean csv is the most crucial task:

  1.  In a copy of the original spreadsheet, edit out any hard line breaks in fields (cells). Repeatable entries should be separated only by a semicolon and a space: "item1; item2; item3..."
  2.  Also check for possible #, @, and | (pipe) signs in the text that mx could interpret as Isis commands.
  3.  When the editing is done, use OpenOffice to convert to csv, with pipes | as field separators. (I don't see this option working in Excel.)
  4.  Next, prepare a new Isis database containing field tags 1,2,3,4,5,6... etc., as many fields as there are columns in your spreadsheet. Don't bother with field names, just call them 1,2,3,4,5,6...
  5.  Convert the csv to iso, import it into your newbase, and check how everything looks.
  6.  When satisfied with the results, use a conversion fst to convert newbase to the structure of your definitive database and import there.
  7.  If the records from the csv are cut off in newbase, or the last fields of one record seem to end up at the beginning of the next, some leftover Excel/OpenOffice editing character could be causing the problem. This happens e.g. with hard line breaks when contents have been copy-pasted from Word or another word processor. Or sometimes you have to create your newbase with one more field than there are columns in the spreadsheet.
  8.  If you find errors like this, don't fall for the temptation to edit them in newbase! Go back to your editing spreadsheet, correct them there, and reimport.
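
The checks in steps 1 and 2 can also be automated before exporting. The following Python lines are only a rough sketch under assumptions: the helper names `clean_cell` and `suspect_chars` and the sample row are made up, and the set of characters to flag is simply the one listed in step 2.

```python
import re

SUSPECT_CHARS = "#@|"  # signs from step 2 that mx might interpret


def clean_cell(value):
    """Collapse hard line breaks (and the whitespace around them) into one space."""
    return re.sub(r"\s*[\r\n]+\s*", " ", value).strip()


def suspect_chars(value):
    """Return the sorted list of suspect characters found in a cell."""
    return sorted(set(value) & set(SUSPECT_CHARS))


# Hypothetical spreadsheet row with one hard line break and two suspect signs.
row = ["Title with\nhard break", "item1; item2", "price # 5 | net"]
cleaned = [clean_cell(cell) for cell in row]
flags = [suspect_chars(cell) for cell in cleaned]
print(cleaned)
print(flags)
```

Cells that come back with a non-empty flag list are the ones to inspect by hand in the editing spreadsheet.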

Good luck!

Regards

Wenke




2017-06-19 12:24 GMT-04:00 Hussain KH <hussain.rachana at gmail.com>:
Thanks Jean.

I have already contacted Dr. Prakash. Indeed, he has done marvellous work in Tamil.

J-ISIS can do a lot of things in the Indian context.

I have data in MS Access in Malayalam script and am correcting the data now. Apart from MarcEdit, is there any other way (like the old Fangorn) to convert delimited text to an isis db? Or have you tried a method to import directly from a csv?

Thanking


With love
- Hu _______________________
If you optimize everything,
you will always be unhappy.
— DONALD KNUTH

On Mon, Jun 19, 2017 at 1:12 AM, Jean-Claude Dauphin <jc.dauphin at gmail.com> wrote:
Dear Hussain,

Thank you for taking the time to report your findings. I think that J-ISIS will help you to do the job; don't hesitate to ask for help if needed.
I recently tested the VIAF data set, which contains a good deal of non-European content. I needed to install the MS Arial Unicode font, which Microsoft no longer provides with Windows 10. I can prepare step-by-step instructions on how to download a free MS Arial Unicode font and how to install it on Windows 10.
I think it could be useful to contact Dr Prakash Ira, who successfully produced a Digital Library for Tamil Imprints using J-ISIS: ira.prakash at gmail.com

Best wishes
Jean-Claude


On Sun, Jun 18, 2017 at 7:35 AM, Hussain KH <hussain.rachana at gmail.com> wrote:
Dear Jean-Claude and ISIS Friends

I've just unzipped jisis_suite.11.June.2017 for Windows and opened it. I haven't started working with it yet.

My immediate project is to prepare a print-ready copy of a bibliography of Malayalam books published during 2001-2005. It comes to more than 6,000 books in 62 subjects.
I've been searching for the most suitable package for the purpose. Since I have been an expert in CDS/ISIS since 1986 and have prepared many printed bibliographies (the Bamboo Bibliography in 1990 was an acclaimed one), I know that isis is capable of doing the job. Alas, it is only in English!

For the past three days I was madly searching for a package, reading about and going through many, like EndNote, Zotero, Mendeley, DB/TextWorks, JabRef, databases for LaTeX, etc., without finding the excellent sorting and formatting facilities provided by isis.
Very disappointed, I googled "cds isis unicode 2017", and to my great excitement and luck I came upon the latest J-ISIS, thanks to the untiring pursuit of Jean-Claude.

Now I'm starting my work on the 9th volume of 'Grandhasoochi', a unique bibliography in all Indian languages, with J-ISIS. Though I haven't explored it yet, I'm confident that I can accomplish it.

I haven't used isis for the last three years, since my retirement. Now I'm recollecting all that I did in 25 years with the great isis. I'll communicate all my new findings in using Unicode Malayalam. Since I'm a member of the effort to develop Unicode language technology in Malayalam and have designed Unicode fonts based on the traditional script (Rachana, Meera, Keraleeya, Uroob, Tamil Meera - Meera Inimai), I hope I can bring to light many things related to Indic scripts.

May your efforts find outstanding results in unknown countries and languages.

Loving and Thanking


With love
- Hu _______________________
If you optimize everything,
you will always be unhappy.
— DONALD KNUTH

On Tue, Jun 13, 2017 at 5:31 PM, Ernesto Spinak <ernesto_luis_96 at hotmail.com> wrote:
Jean Claude
Good news, thanks for your effort
Ernesto Spinak

________________________________________
From: isis-users [isis-users-bounces+ernesto_luis_96=hotmail.com at iccisis.org] on behalf of Jean-Claude Dauphin [jc.dauphin at gmail.com]
Sent: Sunday, June 11, 2017 16:55
To: <isis-users at iccisis.org>; Jean-Claude Dauphin
Subject: [Isis-users] J-ISIS Release Candidate 1.2

Dear ISIS Users,

Please find for your consideration the 11 June 2017 Release Candidate of J-ISIS. A Release Candidate (RC) is a beta version with the potential to be a final product, ready for release unless significant bugs<https://en.wikipedia.org/wiki/Computer_bug> emerge.
J-ISIS 11 June 2017<https://github.com/J-ISIS/J-ISIS/releases/download/v1.2/jisis_suite.11.June.2017.zip>
The Release Note describes the main improvements and bug fixes of the J-ISIS 11 June 2017 Release Candidate:
J-ISIS 11 June 2017 Release Note<https://github.com/J-ISIS/J-ISIS/blob/master/J-ISIS%20release%201-2.pdf>

You will find below a summary of the major bug fixes and improvements, but please read the release note, as it contains more details and screenshots.

As usual, I would be very grateful if you could take the time to try J-ISIS. All your comments, suggestions, improvement requests and bug descriptions are welcome.

Best wishes,
Jean-Claude

J-ISIS Release Candidate 1.2


I.               Fixes to the J-ISIS Print Format



1)     Repeatable literals were not working as expected with field dummy selectors (D or N)
|Hello|d270 was producing an empty string even if field 270 was present


2)     Conditional literals with subfield dummy selectors (D or N)
"Hello"d270^d was always producing Hello as output even if no subfield ^d was present
Same for "Hello"n270^d

3)      MFN command was raising an error in REF function expressions like:

ref(mfn,
if p(v19) and v19^x<='0'then", "d963^i,
(if v19^x<='0'then|<b>|v19^a*2|</b>|,| |v19^b fi)
fi,
)

4)     Extracting a fragment of a Subfield specifying only the offset (*offset) was not working

V270^a*2 for example

5)     String function F(expr-1,expr-2,expr-3) default width value

6)     String functions S, SS, and CISIS functions LEFT, MID, REPLACE, and RIGHT were not working in repeatable groups.

For example

(if s(v270^d) <> '1966' then '****' else '1966' fi/)

7)     New Print Format Command for Unconditional Literals <text> …</text>

Plain text or, most probably, HTML formatting can now be embedded between the <text> and </text> tagging commands; it works like an unconditional literal.

II. Print Format for Repeatable Subfields

Subfield occurrences

It is possible to access individual occurrences of a repeatable subfield by specifying the occurrence number or range, enclosed in square brackets, immediately following the field selector or the field selector followed by an occurrence selector. For example:

V270[1]^a[2],v270[1]^a[2]

It is possible to display a specific occurrence of a repeatable subfield, narrowing the output to one occurrence or a range of occurrences, by specifying the occurrence number or range, enclosed in square brackets, immediately following the field selector.

For example: v10^a[1]

It is coded as follows:

[<index> [..<upper index>]]

<index> and <upper index> refer to the first (or only) and last occurrences, respectively. If the specified <index> is greater than the actual number of occurrences, no output is generated. The same happens if the data subfield is not repeatable and <index> is set to a number equal to or greater than 2. However, if <index> is set to 1 and it is used with a non-repeatable subfield, the content is output normally. This component must be used outside a repeatable group; otherwise, <upper index> is ignored. If the double dot (..) is used and <upper index> is missing, LAST is assumed. The LAST keyword holds the total number of occurrences of the data subfield.
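
The selection rules above can be mimicked with a few lines of Python. This is only a sketch of the described behaviour, not J-ISIS code: the function name `select_occurrences` and its parameters are invented for the illustration, and clamping an <upper index> that exceeds the number of occurrences is an assumption, since the text does not say what happens in that case.

```python
def select_occurrences(occurrences, index, upper=None, open_ended=False):
    """Mimic the [<index> [..<upper index>]] selector on a list of occurrences.

    [index]         -> select_occurrences(occ, index)
    [index..upper]  -> select_occurrences(occ, index, upper)
    [index..]       -> select_occurrences(occ, index, open_ended=True)
    """
    last = len(occurrences)      # value of the LAST keyword
    if index < 1 or index > last:
        return []                # index beyond the occurrences: no output
    if open_ended:
        upper = last             # ".." without upper index means LAST
    elif upper is None:
        upper = index            # a single occurrence was requested
    upper = min(upper, last)     # assumption: clamp an oversized upper index
    return occurrences[index - 1:upper]


occ = ["a1", "a2", "a3"]
print(select_occurrences(occ, 2))
print(select_occurrences(occ, 2, 3))
print(select_occurrences(occ, 2, open_ended=True))
print(select_occurrences(occ, 4))
print(select_occurrences(["only"], 1))
```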

III.         Print Format Global Variables
Global variables are stored in a virtual ISIS record, which is a collection of fields. Fields may be repeatable and have occurrences, and fields or occurrences may have subfields. The record, field and subfield concepts are identical to those of ISIS.

Global variables are referenced by the letter G followed by the tag of the field. The G (a mnemonic code for Global variable) followed by the virtual record tag is the command telling J-ISIS that you want to assign or extract a field. It may be entered in either upper or lower case.
Global variables can be assigned data through the Print Format commands:
g100:=((v25/)),(g100^a/)
g10 := (v10^a)

You may assign or change the value of a global variable as follows:

Gn:=(format) (for example: G5:=(v10)).

Note that the parentheses around format are required.
Global variables can be extracted for output like V variables, just by replacing the V with G, which means that the data will be extracted from the virtual record. Repeatable groups are supported as well.

Please note that this is a first attempt to implement global variables and that specific functions could also be implemented to manipulate them further. Please let me know if it is worth continuing to work in this direction.

IV.         New Paging feature in the DB Browser and Terms Dictionary
Databases can be huge. If a database has millions of records and all of them are loaded into memory, it will consume a huge amount of memory and will of course be very slow. In fact, the user will probably only look at 10 or maybe 20 records at a time, depending on the viewport size, so there is no need to load all the records locally. That's the reason why the paging feature was introduced into the DB Browser and Terms Dictionary Browser modules.
To make the paging feature easy to use, a page navigation toolbar provides the interface for navigation.
10 000 records are loaded per page, and the user can scroll easily and quickly through the records of a page. For example, the VIAF database has more than 31 million records (31 305 939 records exactly).
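
As a back-of-the-envelope illustration of what this page size means for the VIAF example, the arithmetic can be written down in Python. This is not J-ISIS code: the function names are made up, and the bounds are simply 1-based positions in browse order, which is an assumption about how pages are cut.

```python
PAGE_SIZE = 10_000  # records loaded per page, as stated above


def page_count(total_records, page_size=PAGE_SIZE):
    """Number of pages needed to cover all records (integer ceiling division)."""
    return (total_records + page_size - 1) // page_size


def page_bounds(page, total_records, page_size=PAGE_SIZE):
    """First and last record position (1-based) shown on a 1-based page."""
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total_records)
    return first, last


viaf_records = 31_305_939
pages = page_count(viaf_records)
print(pages)
print(page_bounds(1, viaf_records))
print(page_bounds(pages, viaf_records))
```

So only one page of at most 10 000 records has to be held in memory at a time, out of roughly three thousand pages.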


V.             Export features to select search results and using a hit file to drive output are now implemented

You can now export records retrieved from search as well as export records following the order defined by a hit file produced by the PrintSort module

Note: A hit file manager will be developed in the future to better manage search hit files and hit sort files


VI.         The Number of Terms in the index is now stored in an external file to avoid the time consuming task of counting them.

The /indexes directory contains a subdirectory called master that contains the main index files generated by the Lucene open-source search software<http://lucene.apache.org/>. A new file named "termscount.properties" is now generated by J-ISIS in the /indexes/master folder to keep the number of terms in the index together with a time stamp. The number of terms is recomputed only when the index has changed, and it is then stored in the external file with the new time stamp.


For databases with more than 2 million records, this considerably reduces the time needed to get the database information.
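
The caching idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the J-ISIS implementation: the real layout of termscount.properties is not described here, so the key names `timestamp` and `termsCount`, the in-memory dictionary standing in for the file, and the function names are all hypothetical.

```python
def cached_term_count(cache, index_stamp, count_terms):
    """Return the term count, recounting only when the index time stamp changed.

    cache       -- dict standing in for the properties file (hypothetical keys)
    index_stamp -- time stamp of the current index
    count_terms -- callable doing the expensive count
    """
    if cache.get("timestamp") != index_stamp:
        cache["termsCount"] = count_terms()  # expensive: walks the whole index
        cache["timestamp"] = index_stamp     # remember which index was counted
    return cache["termsCount"]


calls = []


def slow_count():
    """Stand-in for counting every term in the index."""
    calls.append(1)
    return 123456


cache = {}
first = cached_term_count(cache, "2017-06-11T16:55", slow_count)
second = cached_term_count(cache, "2017-06-11T16:55", slow_count)  # reused
third = cached_term_count(cache, "2017-06-12T09:00", slow_count)   # index changed
print(first, second, third, len(calls))
```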


--
Jean-Claude Dauphin

jc.dauphin at gmail.com

https://github.com/J-ISIS<http://kenai.com/projects/j-isis/>

http://www.unesco.org/isis/
http://www.unesco.org/idams/
http://www.greenstone.org
_______________________________________________
isis-users mailing list
isis-users at iccisis.org<mailto:isis-users at iccisis.org>
To manage your own subscription options go to: http://lists.iccisis.org/listinfo/isis-users
Or contact Henk Rutten: hlrutten at xs4all.nl<mailto:hlrutten at xs4all.nl>








--
Wenke Adam
Asesora Sistemas de Doc & Inf
Santiago
Chile
Cel: +56-9-890 21 630