[Isis-users] J-ISIS Release Candidate 1.2

Wenke Adam wenkeadam at gmail.com
Tue Jun 20 15:35:20 CEST 2017


Thank you, Egbert, for the extensive reply to Hussain.

My point in suggesting the creation of an actual ISIS database as an
intermediate step was that, in my experience, data ported from spreadsheets to
ISIS so often contain hidden editing marks and other "surprises" that you
really need full visual control of the results in order to detect
them. I even build a full inverted file for this.

A raw listing of the data displayed by mx in a black command window is not
the ideal setting for effectively proofreading the results of a conversion,
unless, of course, the number of records is very small and you are sure
the original CSV is flawless. Only then can you go directly from CSV to the
final database with mx, provided the CSV columns are in exactly the same
order as the ISIS fields.

Regards to all

wenke


2017-06-20 3:39 GMT-04:00 De Smet Egbert <egbert.desmet at uantwerpen.be>:

> Hello,
>
> I fully agree with Wenke's advice, except perhaps for one thing: it is
> not necessary to 'prepare' a database with fields 1, 2, 3, etc. mx creates
> these fields automatically anyway: column 1 becomes v1, column 2
> becomes v2, and so on. An FDT (field definition table) is only necessary or
> useful if you actually want to start using such a database with worksheets,
> etc.
>
> In such conversions I always do a quick check with mx to see whether
> the records (shown on the screen) all have the same number of fields
> (as they should!) and whether the total number of records matches the
> number I know should be there. If not, something is indeed wrong with your
> incoming CSV.
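The same sanity check can also be run on the CSV before it reaches mx. The Python sketch below is only an illustration, not part of mx: it assumes one record per physical line, a single-character separator, and double-quote quoting as handled by Python's `csv` module, which may differ in detail from mx's own parsing. It reports the record count and every line whose field count differs from the most common one.

```python
import csv
from collections import Counter


def check_records(text: str, sep: str = "|"):
    """Return (record count, most common field count, deviating lines).

    Assumption: one record per physical line; blank lines are ignored.
    Deviating lines are reported as (line number, field count) pairs.
    """
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = next(csv.reader([line], delimiter=sep, quotechar='"'))
        records.append((lineno, len(fields)))
    if not records:
        return 0, 0, []
    # The most frequent field count is taken as the expected one.
    expected = Counter(n for _, n in records).most_common(1)[0][0]
    odd = [(lineno, n) for lineno, n in records if n != expected]
    return len(records), expected, odd


if __name__ == "__main__":
    sample = "1|Title A|2001\n2|Title B\n3|Title C|2003\n"
    print(check_records(sample))  # (3, 3, [(2, 2)])
```

With this line-per-record reading, a cell containing a hard line break splits one record into two short lines, and both get flagged.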
>
> Once you have your data in ISIS format, you can put them into an ISO2709
> file; in fact, mx can do that in one step from a 'sequential' (CSV) input
> file:
> mx seq=myCSV.csv,; iso=myCSV.iso now -all
> In this example I added ',;' after the input file name to tell mx that the
> separator between the fields is the semicolon (;) and not the pipe (|)
> that mx expects by default. So you can convert any CSV using any separator
> (provided the field values are quoted, so that separators inside the
> quotes are ignored).
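To make the column-to-field mapping concrete (column 1 becomes v1, column 2 becomes v2, and so on), here is a hypothetical Python sketch of how one quoted, semicolon-separated line maps to numbered fields. It uses Python's `csv` module rather than mx, and skipping empty cells is an assumption of this sketch, not documented mx behavior.

```python
import csv


def line_to_fields(line: str, sep: str = ";") -> dict[int, str]:
    """Map column n of one CSV line to field tag n (v1, v2, ...).

    Quoted values may contain the separator. Assumption: empty cells
    produce no field, but the remaining columns keep their position.
    """
    columns = next(csv.reader([line], delimiter=sep, quotechar='"'))
    return {tag: value for tag, value in enumerate(columns, start=1) if value}


if __name__ == "__main__":
    print(line_to_fields('Smith, J.;"Alpha; Beta";;2003'))
    # {1: 'Smith, J.', 2: 'Alpha; Beta', 4: '2003'}
```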
> If you want, you can reorder or renumber the fields into the structure you
> want in the end, but then you have to convert to an MST/XRF ISIS database:
> mx seq=myCSV.csv,; create=myCSV now -all
> For such reordering you need to use 'proc' scripts (CISIS does not use
> CDS/ISIS reformatting FSTs); e.g. create a text file 'convert.prc' with
> these contents:
> 'd1',
> '<100>',v1,'</100>'
> This means: delete the incoming v1 from the output record and put the
> contents of that v1 (before deleting it) into v100. The XML-like syntax is
> easy to understand. Between the two tags you can put any PFT, e.g.
> '<100>',v1^a,'--',(v2+|; |),'</100>' to concatenate subfield a of v1 with
> all occurrences of v2 separated by a semicolon.
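As a plain illustration of what such a proc does to a record, the Python sketch below models a record as a dictionary from tag to a list of occurrences. It is not CISIS code and the helper names are hypothetical; it assumes v1 is non-repeatable and recognizes only explicit ^x subfield delimiters. It builds v100 from the original v1 and v2 (so v1 is read before it is deleted), joins the occurrences of v2 with "; " between them, and then drops v1.

```python
Record = dict[int, list[str]]  # tag -> list of occurrences


def subfield(value: str, code: str) -> str:
    """Content of the first ^<code> subfield (explicit delimiters only, an assumption)."""
    for part in value.split("^")[1:]:
        if part[:1].lower() == code.lower():
            return part[1:]
    return ""


def build_v100(record: Record) -> Record:
    """Mimic 'd1' plus <100>v1^a--v2; v2; ...</100> on a copy of the record."""
    v1 = (record.get(1) or [""])[0]  # assumption: v1 is non-repeatable; read it before deleting
    v2 = record.get(2, [])
    out = {tag: occ[:] for tag, occ in record.items() if tag != 1}  # 'd1'
    out.setdefault(100, []).append(subfield(v1, "a") + "--" + "; ".join(v2))
    return out


if __name__ == "__main__":
    rec = {1: ["^aSmith^bJohn"], 2: ["alpha", "beta"], 3: ["2003"]}
    print(build_v100(rec))
    # {2: ['alpha', 'beta'], 3: ['2003'], 100: ['Smith--alpha; beta']}
```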
>
> Then you can import your ISO2709 file into J-ISIS, which has
> ISO2709 import as a standard feature (now much faster than in the initial
> version), and from then on you have all the power of the
> ISIS PFT to manipulate and sort your data.
> You could also skip the 'proc' reformatting step mentioned above
> and do it inside J-ISIS, since its export function offers the reformatting
> FST feature of WinISIS.
>
> One warning: if your input data already contain Unicode characters, be
> careful with ISO2709 files, because they are entirely based on the
> 'number of characters in the record', which actually means the 'number of
> bytes'. Unicode characters can be multi-byte (e.g. 2 bytes for one
> character), but mx treats them simply as sequences of bytes and counts them
> that way in the ISO2709 header (where the total record length is stored).
> If, for example, you open that ISO file in an editor as a text file and
> convert it to another character set, 2 bytes might become just one and the
> byte count no longer matches. So don't attempt such manipulations; just
> pass the ISO2709 data on to J-ISIS.
> But I think you are more of an expert on Unicode than (most of) us, so
> please let us know the results.
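The byte-versus-character problem is easy to see in Python. The sketch below is a toy, not a real ISO2709 writer: it only mimics the first five leader positions, which hold the total record length, followed by the data, and leaves out the rest of the leader and the directory. The function names are hypothetical. Note that each Malayalam character takes 3 bytes in UTF-8.

```python
def toy_record(payload: str, encoding: str = "utf-8") -> bytes:
    """Toy record: a 5-digit byte length (as in an ISO2709 leader) plus the data.

    Assumption: no rest-of-leader, directory or terminators, to keep it short.
    """
    body = payload.encode(encoding)
    return f"{len(body) + 5:05d}".encode("ascii") + body


def length_matches(record: bytes) -> bool:
    """Does the declared length equal the actual number of bytes?"""
    return int(record[:5].decode("ascii")) == len(record)


if __name__ == "__main__":
    malayalam = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"  # the word "Malayalam" in Malayalam script
    print(len(malayalam), len(malayalam.encode("utf-8")))  # 6 18

    rec = toy_record("\u00d1and\u00fa")  # "Ñandú": 5 characters, 7 bytes in UTF-8
    print(length_matches(rec))           # True
    # An editor "converting" the file to Latin-1 shrinks the data but not the header.
    converted = rec.decode("utf-8").encode("latin-1")
    print(length_matches(converted))     # False (declared 12, actual 10)
```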
>
> In summary, you need only 2 steps:
> 1. convert your CSV to ISO2709 with the mx command;
> 2. convert the resulting ISO2709 file into a J-ISIS database using the
> J-ISIS import feature.
> (3. do all the manipulation you need in J-ISIS, e.g. reformatting, indexing,
> sorting...)
>
> Good luck with it !
>
> Egbert de Smet
> Universiteit Antwerpen
> ------------------------------
> *From:* isis-users [isis-users-bounces+egbert.desmet=ua.ac.be at iccisis.org]
> on behalf of Wenke Adam [wenkeadam at gmail.com]
> *Sent:* Tuesday, June 20, 2017 8:04 AM
> *To:* Hussain KH
> *Cc:* <isis-users at iccisis.org>
> *Subject:* Re: [Isis-users] J-ISIS Release Candidate 1.2
>
> Dear Hussain,
> I have converted a large number of databases from dozens of applications
> over the years, and the method I have found best so far, because it gives
> you control over every step, is the one described below.
>
> Fangorn is still useful for small datasets, but it breaks easily when it
> encounters unexpected characters and doesn't tell you what the problem is.
>
> *Converting from CSV to ISIS is easy with the CISIS tools*. The mx
> program has commands for that. (I don't have the exact commands at hand
> right now, but I can send them tomorrow if you need them.)
>
> *Preparing a clean CSV is the most crucial task:*
>
>    1. In a copy of the original spreadsheet, edit out any hard line breaks
>    within fields (cells). Repeatable entries should be separated only by a
>    semicolon and a space: "item1; item2; item3..."
>    2. Also check for any #, @ and | (pipe) characters in the text that
>    mx could interpret as ISIS commands.
>    3. When the editing is done, use OpenOffice to convert to CSV, with
>    pipes (|) as field separators. (This option doesn't seem to work in Excel.)
>    4. Next, prepare a new ISIS database containing field tags
>    1, 2, 3, 4, 5, 6, etc., with as many fields as there are columns in your
>    spreadsheet. Don't bother with field names; just call them 1, 2, 3, 4, 5, 6...
>    5. Convert the CSV to ISO, import it into your newbase, and check how
>    everything looks.
>    6. When you are satisfied with the results, use a conversion FST to convert
>    newbase to the structure of your definitive database and import it there.
>    7. If the records from the CSV are cut off in newbase, or the last
>    fields of one record seem to appear at the beginning of the next, some
>    other original Excel/OpenOffice editing mark may be causing the problem.
>    This happens, for example, with hard line breaks when contents have been
>    copied and pasted from Word or another word processor. Or sometimes you
>    have to create your newbase with one more field than there are columns
>    in the spreadsheet.
>    8. If you find errors like this, don't give in to the temptation to edit
>    them in newbase! Go back to your working spreadsheet, correct them there,
>    and re-import.
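Steps 1 and 2 of the list above can be partly automated. The following Python sketch is an illustration, not a tool mentioned in the thread: it replaces hard line breaks inside each cell with a single space and flags cells that still contain #, @ or |, so they can be reviewed by hand in the spreadsheet. Treating vertical tabs and Unicode line/paragraph separators as line breaks is an assumption about what copy-pasted text may contain.

```python
import re

# Assumption: CR, LF, vertical tab and Unicode line/paragraph separators all count as hard breaks.
LINE_BREAKS = re.compile(r"[\r\n\x0b\u2028\u2029]+")
SUSPECT = frozenset("#@|")  # characters to check, per step 2


def clean_cell(cell: str) -> str:
    """Replace runs of hard line breaks with one space and tidy the spacing."""
    return re.sub(r" {2,}", " ", LINE_BREAKS.sub(" ", cell)).strip()


def clean_row(row: list[str]) -> tuple[list[str], list[tuple[int, set[str]]]]:
    """Return the cleaned cells and (column number, suspect characters) pairs."""
    cleaned = [clean_cell(cell) for cell in row]
    flags = [(col, set(cell) & SUSPECT)
             for col, cell in enumerate(cleaned, start=1)
             if set(cell) & SUSPECT]
    return cleaned, flags


if __name__ == "__main__":
    row = ["Title with\r\nline break", "item1; item2", "C# basics"]
    print(clean_row(row))
    # (['Title with line break', 'item1; item2', 'C# basics'], [(3, {'#'})])
```

Flagged cells are only reported, not changed, in keeping with the advice to correct errors in the spreadsheet itself.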
>
> Good luck!
>
> Regards
>
> Wenke
>
>
>
>
> 2017-06-19 12:24 GMT-04:00 Hussain KH <hussain.rachana at gmail.com>:
>
>> Thanks Jean.
>>
>> I have already contacted Dr. Prakash. Indeed, he has done marvellous
>> work in Tamil.
>>
>> J-ISIS can do a lot of things in the Indian context.
>>
>> I have data in MS Access in Malayalam script and am doing data correction
>> now. Apart from MarcEdit, is there any other way (like the old Fangorn) to
>> convert delimited text to an ISIS database? Or have you tried a method
>> to import directly from a CSV?
>>
>> Thanking
>>
>>
>>
>> *With love,*
>> *- ഹു _______________________*
>> *If you optimize everything,*
>> *you will always be unhappy. *
>> *— DONALD KNUTH*
>>
>> On Mon, Jun 19, 2017 at 1:12 AM, Jean-Claude Dauphin <
>> jc.dauphin at gmail.com> wrote:
>>
>>> Dear Hussain,
>>>
>>> Thank you for taking the time to report your findings. I think that
>>> J-ISIS will help you do the job; don't hesitate to ask
>>> for help if needed.
>>> I recently tested the VIAF data set, which contains a variety of
>>> non-European content. It was necessary to install the MS Arial Unicode
>>> font, which Microsoft no longer provides with Windows 10. I can prepare
>>> step-by-step instructions on how to download a free MS Arial Unicode font
>>> and how to install it on Windows 10.
>>> I think it could be useful to contact Dr Prakash Ira, who successfully
>>> produced a Digital Library for Tamil Imprints using J-ISIS.
>>> ira.prakash at gmail.com
>>>
>>> Best wishes
>>> Jean-Claude
>>>
>>>
>>> On Sun, Jun 18, 2017 at 7:35 AM, Hussain KH <hussain.rachana at gmail.com>
>>> wrote:
>>>
>>>> Dear Jean-Claude and ISIS Friends
>>>>
>>>> I've just unzipped jisis_suite.11.June.2017 for Windows and opened it.
>>>> I haven't started working with it yet.
>>>>
>>>> My immediate project is to prepare a print-ready copy of a bibliography
>>>> of Malayalam books published during 2001-2005. It covers more than
>>>> 6,000 entries in 62 subjects.
>>>> I've been searching for the most suitable package for the purpose. Since
>>>> I have been a CDS/ISIS expert since 1986 and have prepared many printed
>>>> bibliographies (the Bamboo Bibliography of 1990 was an acclaimed one), I
>>>> know ISIS is capable of doing the job. Alas, it is only in English.
>>>>
>>>> For the past three days I was frantically searching for a package,
>>>> reading about and trying many, like EndNote, Zotero, Mendeley,
>>>> DB/TextWorks, JabRef, databases for LaTeX, etc., without finding the
>>>> excellent sorting and formatting facilities provided by ISIS.
>>>> Very disappointed, I googled "cds isis unicode 2017" and, to my great
>>>> excitement and luck, found the latest J-ISIS, thanks to the untiring
>>>> pursuit of Jean-Claude.
>>>>
>>>> Now I'm starting work with J-ISIS on the 9th volume of 'Grandhasoochi',
>>>> a bibliography unique among all Indian languages. Though I haven't
>>>> explored it yet, I'm confident that I can accomplish it.
>>>>
>>>> I haven't used ISIS in the three years since my retirement. Now
>>>> I'm recalling everything I did over 25 years with the great ISIS. I'll
>>>> share all my new findings on using Unicode Malayalam. Since I'm
>>>> involved in developing Unicode language technology for Malayalam and
>>>> have designed Unicode fonts based on the traditional script (Rachana,
>>>> Meera, Keraleeya, Uroob, Tamil Meera - Meera Inimai), I hope I can shed
>>>> light on many things related to Indic scripts.
>>>>
>>>> May your efforts find outstanding results in unknown countries and
>>>> languages.
>>>>
>>>> Loving and Thanking
>>>>
>>>>
>>>>
>>>> *With love,*
>>>> *- ഹു _______________________*
>>>> *If you optimize everything,*
>>>> *you will always be unhappy. *
>>>> *— DONALD KNUTH*
>>>>
>>>> On Tue, Jun 13, 2017 at 5:31 PM, Ernesto Spinak <
>>>> ernesto_luis_96 at hotmail.com> wrote:
>>>>
>>>>> Jean Claude
>>>>> Good news, thanks for your effort
>>>>> Ernesto Spinak
>>>>>
>>>>> ________________________________________
>>>>> From: isis-users [isis-users-bounces+ernesto_luis_96=hotmail.com at iccisis.org]
>>>>> on behalf of Jean-Claude Dauphin [jc.dauphin at gmail.com]
>>>>> Sent: Sunday, 11 June 2017 16:55
>>>>> To: <isis-users at iccisis.org>; Jean-Claude Dauphin
>>>>> Subject: [Isis-users] J-ISIS Release Candidate 1.2
>>>>>
>>>>> Dear ISIS Users,
>>>>>
>>>>> Please find for your consideration the 11 June 2017 Release Candidate
>>>>> of J-ISIS. The Release Candidate (RC) is a beta version with the potential
>>>>> to be a final product, ready for release unless significant bugs emerge.
>>>>> J-ISIS 11 June 2017: <https://github.com/J-ISIS/J-ISIS/releases/download/v1.2/jisis_suite.11.June.2017.zip>
>>>>> The Release Note describes the main improvements and bug fixes of the
>>>>> J-ISIS 11 June 2017 Release Candidate:
>>>>> J-ISIS 11 June 2017 Release Note: <https://github.com/J-ISIS/J-ISIS/blob/master/J-ISIS%20release%201-2.pdf>
>>>>>
>>>>> You will find below a summary of the major bug fixes and improvements,
>>>>> but please read the release note, as it contains more details and
>>>>> screenshots.
>>>>>
>>>>> As usual, I would be very grateful if you could take the time to try
>>>>> J-ISIS. All your comments, suggestions, improvement requests and bug
>>>>> descriptions are welcome.
>>>>>
>>>>> Best wishes,
>>>>> Jean-Claude
>>>>>
>>>>> J-ISIS Release Candidate 1.2
>>>>>
>>>>>
>>>>> I. Fixes to the J-ISIS Print Format
>>>>>
>>>>>
>>>>>
>>>>> 1) Repeatable literals were not working as expected with field
>>>>> dummy selectors (D or N):
>>>>> |Hello|d270 was producing an empty string even when field 270 was present.
>>>>>
>>>>>
>>>>> 2) Conditional literals with subfield dummy selectors (D or N):
>>>>> "Hello"d270^d was always producing Hello as output even when no subfield
>>>>> ^d was present.
>>>>> The same applied to "Hello"n270^d.
>>>>>
>>>>> 3) The MFN command was raising an error in REF function expressions
>>>>> like:
>>>>>
>>>>> ref(mfn,
>>>>>   if p(v19) and v19^x<='0' then ", "d963^i,
>>>>>     (if v19^x<='0' then |<b>|v19^a*2|</b>|,| |v19^b fi)
>>>>>   fi,
>>>>> )
>>>>>
>>>>> 4) Extracting a fragment of a subfield by specifying only the offset
>>>>> (*offset) was not working, for example:
>>>>>
>>>>> V270^a*2
>>>>>
>>>>> 5) String function F(expr-1,expr-2,expr-3): default width value
>>>>>
>>>>> 6) String functions S and SS, and the CISIS functions LEFT, MID, REPLACE
>>>>> and RIGHT, were not working in repeatable groups.
>>>>>
>>>>> For example:
>>>>>
>>>>> (if s(v270^d) <> '1966' then '****' else '1966' fi/)
>>>>>
>>>>> 7) New Print Format command for unconditional literals: <text>…</text>
>>>>>
>>>>> Plain text or, more likely, HTML formatting can now be embedded
>>>>> between the <text> and </text> tagging commands; it works like
>>>>> unconditional literals.
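For readers less familiar with dummy selectors: D outputs the attached literal only when the field (or subfield) is present, and N only when it is absent. The Python sketch below models the intended behavior behind fixes 1) and 2) above; it is a hypothetical illustration, not J-ISIS code, and the record model (tag mapped to a list of occurrences, explicit ^x delimiters, case-insensitive subfield codes) is an assumption.

```python
from typing import Optional

Record = dict[int, list[str]]  # assumption: tag -> list of occurrences


def present(record: Record, tag: int, code: Optional[str] = None) -> bool:
    """Is field `tag` (or its subfield ^code) present in any occurrence?"""
    occurrences = record.get(tag, [])
    if code is None:
        return any(occurrences)  # empty occurrences count as absent
    marker = "^" + code.lower()
    return any(marker in occ.lower() for occ in occurrences)


def dummy(literal: str, record: Record, tag: int,
          code: Optional[str] = None, mode: str = "d") -> str:
    """Output `literal` for d<tag>[^code] if present, for n<tag>[^code] if absent."""
    found = present(record, tag, code)
    return literal if (found if mode == "d" else not found) else ""


if __name__ == "__main__":
    rec = {270: ["^aParis^b1966"]}
    print(repr(dummy("Hello", rec, 270)))            # 'Hello'  like "Hello"d270
    print(repr(dummy("Hello", rec, 270, "d")))       # ''       like "Hello"d270^d
    print(repr(dummy("Hello", rec, 270, "d", "n")))  # 'Hello'  like "Hello"n270^d
```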
>>>>>
>>>>> II. Print Format for Repeatable Subfields
>>>>>
>>>>> Subfield occurrences
>>>>>
>>>>> It is possible to access individual occurrences of a repeatable
>>>>> subfield by specifying the occurrence number or range, enclosed in square
>>>>> brackets, immediately after the subfield selector, whether it follows a
>>>>> plain field selector or a field selector with an occurrence selector.
>>>>> For example:
>>>>>
>>>>> V270[1]^a[2],v270[1]^a[2]
>>>>>
>>>>> You can thus display specific occurrences of a repeatable subfield,
>>>>> narrowing the output to one occurrence or a range of occurrences, for
>>>>> example:
>>>>>
>>>>> v10^a[1]
>>>>>
>>>>> It is coded as follows:
>>>>>
>>>>> [<index> [..<upper index>]]
>>>>>
>>>>>  <index> and <upper index> refer to the first (or only) and last
>>>>> occurrences, respectively. If the specified <index> is greater than the
>>>>> actual number of occurrences, no output is generated. The same happens if
>>>>> the subfield is not repeatable and <index> is 2 or greater. However, if
>>>>> <index> is set to 1 for a non-repeatable subfield, its content is output
>>>>> normally. This component must be used outside a repeatable group;
>>>>> otherwise, <upper index> is ignored. If the double dot (..) is used and
>>>>> <upper index> is missing, LAST is assumed. The LAST keyword is set to the
>>>>> total number of occurrences of the subfield.
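The indexing rules above can be summarized in a short Python sketch. It is an illustrative model, not J-ISIS code: the function names are hypothetical, only explicit ^x delimiters are recognized, an <index> below 1 is assumed to produce no output, and an <upper index> larger than LAST is assumed to be clamped to LAST, since the note does not say. The in-group behavior (where <upper index> is ignored) is not modeled.

```python
from typing import Optional


def subfield_occurrences(field: str, code: str) -> list[str]:
    """All occurrences of subfield ^code in one field occurrence (explicit delimiters only)."""
    return [part[1:] for part in field.split("^")[1:]
            if part[:1].lower() == code.lower()]


def select(occurrences: list[str], index: int,
           upper: Optional[int] = None, dotted: bool = False) -> list[str]:
    """Mimic [index], [index..upper] and [index..] outside a repeatable group."""
    last = len(occurrences)            # value of the LAST keyword
    if index < 1 or index > last:      # index < 1: assumption; index > LAST: per the note
        return []                      # no output
    if not dotted:
        return occurrences[index - 1:index]
    top = last if upper is None else min(upper, last)  # clamping to LAST is an assumption
    return occurrences[index - 1:top]


if __name__ == "__main__":
    subs = subfield_occurrences("^aOne^bX^aTwo^aThree", "a")
    print(select(subs, 2))                  # ['Two']           like ^a[2]
    print(select(subs, 2, dotted=True))     # ['Two', 'Three']  like ^a[2..]
    print(select(subs, 1, 2, dotted=True))  # ['One', 'Two']    like ^a[1..2]
    print(select(["only"], 2))              # []                non-repeatable, index 2
```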
>>>>>
>>>>> III. Print Format Global Variables
>>>>> Global variables are stored in a virtual ISIS record, which is a
>>>>> collection of fields; fields may be repeatable and have occurrences, and
>>>>> fields or occurrences may have subfields. The record, field and subfield
>>>>> concepts are identical to those of ISIS.
>>>>>
>>>>> Global variables are referenced by the letter G followed by the tag of
>>>>> the field. G (a mnemonic for Global variable) followed by the
>>>>> virtual record tag is the command telling J-ISIS that you want to assign or
>>>>> extract a field. It may be entered in either upper or lower case.
>>>>> Global variables can be assigned data through Print Format
>>>>> commands, for example:
>>>>> g100:=((v25/)),(g100^a/)
>>>>> g10 := (v10^a)
>>>>>
>>>>> You may assign or change the value of a global variable as follows:
>>>>>
>>>>> Gn:=(format) (for example: G5:=(v10)).
>>>>>
>>>>> Note that the parentheses around format are required.
>>>>> Global variables can be extracted for output like V fields, simply by
>>>>> replacing the V with G; the data are then extracted from the
>>>>> virtual record. Repeatable groups are supported as well.
>>>>>
>>>>> Please note that this is a first attempt to implement Global variables
>>>>> and that specific functions could also be implemented to further
>>>>> manipulate them. Please let me know if it is worth continuing to work in
>>>>> this direction.
>>>>>
>>>>> IV. New Paging Feature in the DB Browser and Terms Dictionary
>>>>> Databases can be huge. If a database has millions of records and all
>>>>> records are loaded into memory, it will consume a huge amount of memory and
>>>>> will of course be very slow. As a matter of fact, the user will probably
>>>>> only look at 10 or maybe 20 records, depending on the viewport size, so
>>>>> there is no need to load all the records locally. That is why the paging
>>>>> feature was introduced into the DB Browser and Terms Dictionary Browser
>>>>> modules.
>>>>> To make the paging feature easy to use, a page navigation toolbar
>>>>> provides the navigation interface.
>>>>> 10,000 records are loaded per page, and the user can scroll easily and
>>>>> quickly through the records of a page. For example, the VIAF database has
>>>>> nearly 32 million records (31,305,939 records exactly).
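The arithmetic behind the paging is simple. In the Python sketch below (hypothetical function names; records are numbered by their position in browsing order starting at 1, rather than by MFN, which may have gaps), 10,000 records per page put the 31,305,939-record VIAF database on 3,131 pages, the last one holding 5,939 records. In this model only one page's worth of records needs to be in memory at a time.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")
PAGE_SIZE = 10_000  # records per page, as stated in the release note


def page_count(total: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed for `total` records (integer ceiling division)."""
    return (total + page_size - 1) // page_size


def page_bounds(page: int, total: int, page_size: int = PAGE_SIZE) -> tuple[int, int]:
    """1-based positions of the first and last record on `page`."""
    first = (page - 1) * page_size + 1
    return first, min(page * page_size, total)


def load_page(records: Sequence[T], page: int, page_size: int = PAGE_SIZE) -> Sequence[T]:
    """Slice out just one page instead of materializing every record."""
    first, last = page_bounds(page, len(records), page_size)
    return records[first - 1:last]


if __name__ == "__main__":
    total = 31_305_939
    print(page_count(total))               # 3131
    print(page_bounds(page_count(total), total))  # (31300001, 31305939)
```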
>>>>>
>>>>>
>>>>> V. Export features for selecting search results and for using a
>>>>> hit file to drive output are now implemented
>>>>>
>>>>> You can now export records retrieved by a search, as well as export
>>>>> records in the order defined by a hit file produced by the PrintSort
>>>>> module.
>>>>>
>>>>> Note: A hit file manager will be developed in the future to better
>>>>> manage search hit files and hit sort files.
>>>>>
>>>>>
>>>>> VI. The number of terms in the index is now stored in an
>>>>> external file to avoid the time-consuming task of counting them.
>>>>>
>>>>> The /indexes directory contains a subdirectory called master that
>>>>> contains the main index files generated by the Lucene open-source search
>>>>> software (http://lucene.apache.org/). A new file named
>>>>> "termscount.properties" is now generated by J-ISIS to keep the number of
>>>>> terms in the index as well as a time stamp, and is stored in the
>>>>> /indexes/master folder. The number of terms in the index is only
>>>>> recomputed when the index has changed, and it is then stored in the
>>>>> external file together with the new time stamp.
>>>>>
>>>>> For databases with more than 2 million records, this considerably
>>>>> reduces the time spent retrieving the database information.
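The caching idea, recounting only when the index has changed, can be sketched in Python as follows. This is a hypothetical illustration, not the J-ISIS implementation (which is in Java): only the file name termscount.properties comes from the note above; the key names `timestamp` and `count`, the use of the newest file modification time as the index time stamp, and the function names are all assumptions.

```python
import os
import tempfile
from typing import Callable

PROPS = "termscount.properties"  # file name from the release note; key names below are assumptions


def index_stamp(index_dir: str) -> int:
    """Newest modification time (ns) among the index files, ignoring the cache file."""
    stamps = [
        os.stat(os.path.join(index_dir, name)).st_mtime_ns
        for name in os.listdir(index_dir)
        if name != PROPS and os.path.isfile(os.path.join(index_dir, name))
    ]
    return max(stamps, default=0)


def read_props(path: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props: dict[str, str] = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, value = line.split("=", 1)
                    props[key.strip()] = value.strip()
    return props


def terms_count(index_dir: str, count_terms: Callable[[], int]) -> int:
    """Return the cached term count, recounting only when the index has changed."""
    path = os.path.join(index_dir, PROPS)
    stamp = index_stamp(index_dir)
    props = read_props(path)
    if props.get("timestamp") == str(stamp) and "count" in props:
        return int(props["count"])
    count = count_terms()  # the expensive step the cache avoids
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"timestamp={stamp}\ncount={count}\n")
    return count


if __name__ == "__main__":
    calls = []

    def slow_count() -> int:
        calls.append(1)
        return 42

    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "segments_1"), "w") as f:
            f.write("index data")
        print(terms_count(d, slow_count), terms_count(d, slow_count), len(calls))  # 42 42 1
```

The cache file itself is excluded from the time stamp, since it lives in the same folder as the index files and writing it would otherwise look like an index change.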
>>>>>
>>>>>
>>>>> --
>>>>> Jean-Claude Dauphin
>>>>>
>>>>> jc.dauphin at gmail.com
>>>>>
>>>>> https://github.com/J-ISIS<http://kenai.com/projects/j-isis/>
>>>>>
>>>>> http://www.unesco.org/isis/
>>>>> http://www.unesco.org/idams/
>>>>> http://www.greenstone.org
>>>>> _______________________________________________
>>>>> isis-users mailing list
>>>>> isis-users at iccisis.org
>>>>> To manage your own subscription options go to:
>>>>> http://lists.iccisis.org/listinfo/isis-users
>>>>> Or contact Henk Rutten: hlrutten at xs4all.nl
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Jean-Claude Dauphin
>>>
>>> jc.dauphin at gmail.com
>>>
>>> https://github.com/J-ISIS <http://kenai.com/projects/j-isis/>
>>>
>>> http://www.unesco.org/isis/
>>> http://www.unesco.org/idams/
>>> http://www.greenstone.org
>>>
>>
>>
>>
>>
>
>
> --
> Wenke Adam
> Asesora Sistemas de Doc & Inf
> Santiago
> Chile
> Cel: +56-9-890 21 630
>



-- 
Wenke Adam
Asesora Sistemas de Doc & Inf
Santiago
Chile
Cel: +56-9-890 21 630