[Isis-users] Former command "lr"

De Smet Egbert egbert.desmet at uantwerpen.be
Fri Oct 19 15:28:47 CEST 2018


Hi,


Two remarks:

- In ABCD 2.0 you can also use the 'bigisis' variety of ISIS databases, which allows records of up to 1 MB.

- If that is still not enough, you should indeed go for other software. We can recommend J-ISIS which, like ABCD 2.0, uses the Tika library for text extraction and, unlike ABCD 2.0, uses Lucene for indexing text documents.


We have successfully used ABCD 2.0 for indexing full-text records, e.g. in the 'DubCore' demo database. There, a script (in the 'utilities/extra' menu) uses Tika to extract the full text and saves it in a text (or HTML) file. That file is then indexed, as you did, with a 'cat' command that serves the text file as input for indexing technique 8 (but indexing with the 'm' parameter of fullinv/m). See the more detailed instructions in the updated ABC-of-ABCD manual for v2.0.

All this works fine and (very) fast; what is missing is relevance ranking. Hence the suggestion to use J-ISIS (see its 'Digital Library' example), as it uses Lucene, which has relevance ranking. In J-ISIS there are also no longer any limits on record size.

ABCD v3.0 (currently being developed and tested) uses J-ISIS and will therefore also offer full-text indexing and relevance ranking.


Egbert de Smet
Universiteit Antwerpen


________________________________
From: Leandro Vicente <leandro_biblioteca at hotmail.com>
Sent: Friday, October 19, 2018 3:05 PM
To: spinaker; De Smet Egbert; isis-users at iccisis.org
Subject: Re: [Isis-users] Former command "lr"

Hi Egbert, Ernesto and group


Thanks for the comments, I really appreciate them. Ernesto, I was not aware of the $$REF trick; it was good to learn about it. For some reason my web search didn't turn up that content from Gilda's wiki. Nice. Egbert, I decided to work another way, but 'join' would be fine as well. I think 'join' would give me the same results under the same limits.

Anyway, it's only fair to share with you the way I handled the requirements.

Some institutions want to search within tables of contents. So they OCR the pages and paste the text into an MFN (via either the traditional ABCD worksheet or a PHP form over IsisScript). The point is that there is a 32 KB limitation, both documented and experienced.

I'm working with tables of contents stored in text files. I use the append method over 1300 table-of-contents files (the batch is created automatically by PHP), then retag, mxcp, and so on. I have a PHP script that reads each table-of-contents file and splits it if its size exceeds 32 KB. So I may end up with something like ID1132A.txt, ID1132B.txt, ID1132C etc., each one under 30 KB (there are tables of contents with 25 pages or even more, especially in law literature).
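
The splitting step could be sketched roughly as follows. This is a minimal Python illustration, not the PHP script mentioned above; the function names, the 30 KB threshold, UTF-8 encoding and the choice to split on line boundaries are all assumptions made for the example.

```python
MAX_BYTES = 30 * 1024  # assumed threshold, kept below the 32 KB record limit


def split_text(text, max_bytes=MAX_BYTES, encoding="utf-8"):
    """Split text on line boundaries so that each chunk encodes to <= max_bytes."""
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        n = len(line.encode(encoding))
        if n > max_bytes:
            # This sketch does not split inside a line.
            raise ValueError("a single line exceeds the chunk limit")
        if current and size + n > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


def chunk_names(record_id, count):
    """Build names like ID1132A.txt, ID1132B.txt, ... (at most 26 chunks)."""
    if count > 26:
        raise ValueError("more than 26 chunks would need a different suffix scheme")
    return [f"{record_id}{chr(ord('A') + i)}.txt" for i in range(count)]
```

Each chunk returned by split_text would then be written to the file name at the same position in chunk_names(record_id, len(chunks)).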

Whether we work on a separate database or not, the question is how to index all the tables of contents into the dictionary given this 32 KB limitation. We are stuck with the formatting language interpreted by mx because an FST is always needed. And unfortunately we can't use $$REF in it. So, again, we miss an 'LR' implementation here.

In the FST, one possible way is (working with an auxiliary database):

505 4 ref->sum(l->sum(|IDA|v8),(v505+| |)),
505 4 ref->sum(l->sum(|IDB|v8),(v505+| |)),
505 4 ref->sum(l->sum(|IDC|v8),(v505+| |)),
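
Since these lines differ only in the chunk suffix (A, B, C), they could be generated instead of typed by hand. The sketch below is a small Python helper that only produces the text of such lines; the function name and its parameters are hypothetical, and it assumes that "sum" is the name of the auxiliary database and that one line is wanted per suffix.

```python
import string


def fst_lines(chunks, tag=505, technique=4, db="sum", key_field="v8", text_field="v505"):
    """Return one FST line per chunk suffix (A, B, C, ...), in the pattern shown above."""
    if not 1 <= chunks <= 26:
        raise ValueError("chunks must be between 1 and 26")
    lines = []
    for suffix in string.ascii_uppercase[:chunks]:
        lines.append(
            f"{tag} {technique} ref->{db}(l->{db}(|ID{suffix}|{key_field}),({text_field}+| |)),"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(fst_lines(3)))
```

The number of lines has to cover the largest number of chunks any single table of contents was split into.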

Another way may be (working right from the text files):

505 4 ref->sum(l->sum(|ID|v8t),(s(cat('C:\ABCD\www\bases\book\pfts\pt\IDA',v8,'.txt')))),
505 4 ref->sum(l->sum(|ID|v8t),(s(cat('C:\ABCD\www\bases\book\pfts\pt\IDB',v8,'.txt')))),
505 4 ref->sum(l->sum(|ID|v8t),(s(cat('C:\ABCD\www\bases\book\pfts\pt\IDC',v8,'.txt')))),

In another development, I also split each table-of-contents paragraph into a new MFN (with PHP and IsisScript). It worked fine and retrieved fine, but for the indexing process in a bibliographic database 'LR' is needed again. I mean, I handled a huge mass of text information with ISIS technology this way, but only for retrieval and display purposes. We can't 'REF' it to another database without 'LR'.

To reach that goal I left the ISIS solution behind at this point and started working with something more elastic.

So, thanks again and, if someone is aware of an alternative to 'LR' within FST, please let us know.



Leandro

________________________________
From: spinaker <spinaker at adinet.com.uy>
Sent: Thursday, October 18, 2018 4:45 AM
To: De Smet Egbert; Leandro Vicente; isis-users at iccisis.org
Subject: Re: [Isis-users] Former command "lr"

Dear Victor

You are right (in part) ...
Because of this limitation (there is no "LR" code), Guilda implemented a solution.
Please take a look at
http://abcdwiki.net/wiki/es/index.php?title=Formatos_de_salida_que_integran_la_informaci%C3%B3n_de_bases_de_datos_relacionadas
Formatos de salida que integran la información de bases de datos relacionadas ("Output formats that integrate information from related databases")

" ... Now, according to the theory, the L function retrieves only the MFN of the first record located through the supplied search expression, and in the relationship between the bibliographic database and its copies, one bibliographic record can be related to several records in the copies database.

 Given this situation, ABCD created its own way of accessing relationships of this type, by including the following command in the PFT as an unconditional pre-literal:
         /'$$REF:Database, Format, Search expression'/

etc

Regards
Ernesto Spinak



On 18/10/2018 at 4:32, De Smet Egbert wrote:

Hi,


You are right: CISIS doesn't have that LR() function, and this has created lots of problems for me too. We asked some ex-Bireme experts who know the CISIS code well for a quotation to implement it, but their price was exorbitantly high, so the idea was dropped.

However, I solved most of my issues with a detour via the 'join=' parameter of mx, in this case operating on the same database (meaning: joining the database with itself...). That parameter will indeed add all occurrences pointed to by the postings of the search key to the resulting joined record. With some extra processing (with PFT or proc), the fields added with tags 3200x can be used as normal fields.


Egbert de Smet
Universiteit Antwerpen


________________________________
From: isis-users <isis-users-bounces+egbert.desmet=ua.ac.be at iccisis.org> on behalf of Leandro Vicente <leandro_biblioteca at hotmail.com>
Sent: Thursday, October 18, 2018 4:20 AM
To: isis-users at iccisis.org
Subject: [Isis-users] Former command "lr"

Hi all,


I apologize in advance if this topic has already been addressed on the list, but I didn't find anything.

In Winisis we used to have the commands "l" and "lr". Apparently the command "lr" was not implemented in CISIS. As we know, "l" returns the first MFN for a given key, and "lr" used to return all MFNs indexed under that key.

In FST I have
100 4 ref->author(l->author(|ID|v8),(v100+| |)),

which looks up authors in the db "author". The point is, "l" returns only the first MFN with ID=X, and we may have many more.
Does anyone have a tip on how we could retrieve all MFNs, given that "lr" is not implemented?
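
To make the difference concrete, here is a minimal Python model of the two lookups. It is not ISIS code; the record layout, the function names and the use of 0 for "key not found" are assumptions made only for this illustration.

```python
from collections import defaultdict


def build_index(records, tag):
    """Toy inverted file: map each key found in field `tag` to a list of MFNs."""
    index = defaultdict(list)
    for mfn in sorted(records):
        for value in records[mfn].get(tag, []):
            index[value.upper()].append(mfn)
    return index


def lookup_first(index, key):
    """Models "l": only the first MFN posted under the key (0 if there is none)."""
    postings = index.get(key.upper(), [])
    return postings[0] if postings else 0


def lookup_all(index, key):
    """Models "lr": every MFN posted under the key."""
    return list(index.get(key.upper(), []))


if __name__ == "__main__":
    # Three toy records; MFN 1 and MFN 2 share the same key.
    records = {1: {"id": ["ID7"]}, 2: {"id": ["ID7"]}, 3: {"id": ["ID9"]}}
    index = build_index(records, "id")
    print(lookup_first(index, "ID7"))
    print(lookup_all(index, "ID7"))
```

With "l" only MFN 1 is reachable from the FST line above; an "lr" equivalent would also reach MFN 2.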


Thanks,


Leandro


