[Isis-users] Fw: Larger database in ABCD

Egbert De Smet egbert.desmet at uantwerpen.be
Sat Apr 18 12:36:07 CEST 2020



Eustache,


having had a look at your database, I think the best solution is to move the contents of your v10 (I don't know their real meaning, so I just named them 'links to partners') into a separate database and to retrieve them with REF(L()).

Here are the steps I used :

  1.  create a 'primary key' in v99 of gasp with a proc that copies the MFN into v99 (relying on the MFN itself for linking is not safe, as it can change through export/import actions) :
mx gasp "proc='<99>',mfn(1),'</99>'" copy=gasp now -all
  2.  create a PFT 'v10.pft' which prints the contents of v10 to a text file in the CISIS id2i format :
'!ID 'mfn(1)/, '!v099!',v99/,('!v001!',v10/,), #
  3.  dump of the v10-contents in a tagged text file :
mx gasp pft=@v10.pft now -all > partners.id
  4.  create the database partners with id2i :
id2i partners.id create=partners
  5.  create an FST 'partners.fst' for the new database, making the primary key immediately searchable (note the use of the prefix PA_, which is in fact not necessary) :
1 5 '/PA_/', v99
  6.  index that new database with that FST :
mx partners fst=@ fullinv/ansi=partners
  7.  delete the original v10 in gasp, which considerably reduces the record size and so avoids the indexing problems in gasp :
mx gasp "proc='d10'" copy=gasp now -all
  8.  to test how to include the links (the partners), use a PFT like this one (I named it 'link2p.pft') :
'MFN=',v99/,ref->partners(l->partners('PA_'v99),('link='v1/))
and test it with :
mx gasp pft=@link2p.pft
which should give you results like :

MFN=1
link=^aRedeemers Univ^cMowe^dNigeria^zuniv
..
MFN=2
link=^aNasarawa State Univ^bFac Agr^cLafia^dNigeria^zuniv
link=^aKogi State Univ^bFac Agr^cAnyigba^dNigeria^zuniv

So from now on you can include the PFT-statement
                 ref->partners(l->partners('PA_'v99),('link='v1/))
to any PFT to show your original v10 without storing it inside the database, which avoids the indexing problem caused by the volume of the v10 contents. I tested the indexing in Linux, where it worked o.k. (I used the BigISIS mx, but you could even try the standard mx of ABCD to preserve incremental indexing).
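
To make the lookup logic of that statement easier to follow, here is a small Python sketch. It is not CISIS code, only a toy model under assumptions: the dictionaries stand in for the 'partners' master file and its inverted file, the function names are made up for illustration, and the sample values are the ones from the test output above.

```python
# Toy model of ref->partners(l->partners('PA_'v99),('link='v1/)).
# Plain dictionaries stand in for the master file and the inverted file.

# 'partners' master file: MFN -> record (field tag -> list of occurrences)
partners = {
    1: {99: ["1"], 1: ["^aRedeemers Univ^cMowe^dNigeria^zuniv"]},
    2: {99: ["2"], 1: ["^aNasarawa State Univ^bFac Agr^cLafia^dNigeria^zuniv",
                       "^aKogi State Univ^bFac Agr^cAnyigba^dNigeria^zuniv"]},
}


def build_index(db, prefix="PA_", tag=99):
    """Mimic the FST line: one key per record, prefix + contents of v99."""
    return {prefix + rec[tag][0]: mfn for mfn, rec in db.items()}


def lookup(index, key):
    """Mimic L(): MFN posted under the key, or 0 if the key is absent."""
    return index.get(key, 0)


def ref_links(db, index, v99):
    """Mimic REF(): format every v1 occurrence of the record that was found."""
    mfn = lookup(index, "PA_" + v99)
    if mfn == 0:
        return []
    return ["link=" + occ for occ in db[mfn].get(1, [])]


index = build_index(partners)
for key in ("1", "2"):
    print("MFN=" + key)
    for line in ref_links(partners, index, key):
        print(line)
```

The point of the design is that gasp only keeps the short key in v99, while the bulky repeated data sits in a record of the other database and is fetched at display time.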

So with this procedure you can avoid the indexing problem, even using the default CISIS versions (mx and wxis). The only requirement is that from now on the information you used to enter in v10 goes into v1 of a separate database.
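
For new partner data, the tagged text that id2i reads could also be produced by a script instead of a PFT. The following Python sketch only illustrates the layout used above ('!ID' line, then one '!vNNN!' line per occurrence); the function name and the sample record are made up, and how the resulting text is loaded into an existing database is left open here.

```python
def to_id_records(records):
    """Render records as tagged text: '!ID n', then '!vNNN!value' per occurrence."""
    lines = []
    for number, fields in enumerate(records, start=1):
        lines.append(f"!ID {number}")
        for tag in sorted(fields):
            for occurrence in fields[tag]:
                lines.append(f"!v{tag:03d}!{occurrence}")
    return "\n".join(lines) + "\n"


# Hypothetical record: v99 holds the key, v1 the former v10 occurrences.
new_partner_records = [
    {99: ["3"], 1: ["^aExample Univ^cExample City^dNigeria^zuniv"]},
]
print(to_id_records(new_partner_records), end="")
```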

I am attaching the files I created except the bigger ones with the databases since you are using Windows anyway.

I am hoping this 'model' could also serve others in the ABCD-community having similar problems : make ABCD a bit relational.


Egbert de Smet
Universiteit Antwerpen


________________________________
From: Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>
Sent: Saturday, April 18, 2020 8:22 AM
To: Egbert De Smet
Subject: TR: [Isis-users] Larger database in ABCD










====================================================================

Eustache Mêgnigbêto

Tel. (+229) 95910242 – (+229) 21147935

09 BP 477 Saint Michel, Cotonou (République du Bénin)

Google Scholar : https://scholar.google.com/citations?user=xQk_UhwAAAAJ&hl=fr

Personal website : http://eustachem.ilemi.net

Review activities : https://publons.com/researcher/503109/eustache-megnigbeto/peer-review/







From: Eustache Mêgnigbêto [mailto:eustache.megnigbeto at outlook.com]
Sent: Saturday, April 18, 2020 7:13 AM
To: isis-users at iccisis.org
Subject: RE: [Isis-users] Larger database in ABCD





Dear Egbert,



Records with more than 250 occurrences are stored correctly; they can be edited and saved in ABCD.

I attach the ISO file, the FDT and the FST.



PS: Meanwhile, I changed the FST so that an occurrence is indexed only if its number is less than or equal to 250; the IF is then created and updated.



Best,












From: Egbert De Smet [mailto:egbert.desmet at uantwerpen.be]
Sent: Friday, April 17, 2020 11:23 AM
To: Eustache Mêgnigbêto; isis-users at iccisis.org
Subject: Re: [Isis-users] Larger database in ABCD



Eustache,



as explained in earlier messages on this list, it is not the number of occurrences itself that is limited, but the total size of these occurrences, which fills up the maximum record size of CISIS. That is the main limiting factor, and also the main reason why I wanted ABCD 2.x (soon 2.2) to work with other varieties of CISIS, mainly BigISIS, since that one does support incremental indexing, as opposed to FFI. FFI is aimed more at 'static' databases with larger records. The maximum number of records is in my humble opinion not the crucial factor : it is still '2 to the power of 24' (due to the setup of the XRF), meaning more than 16 million, which is enough for most applications.
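
As a quick check of that figure, the arithmetic can be written out in Python. The constant name is made up for illustration, and the 50,000 records are simply the database size mentioned elsewhere in this thread.

```python
XRF_MFN_BITS = 24  # '2 to the power of 24', as stated above

max_records = 2 ** XRF_MFN_BITS
print(max_records)  # 16777216, i.e. more than 16 million

# Share of that limit used by a database of 50,000 records
current_records = 50_000
print(f"{current_records / max_records:.2%} of the maximum")
```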

Now, back to your concrete problem : I see 2 options, i.e.

  1.  try to avoid such a high number of occurrences (repeats of a field) by moving them to another database and using REF(L()), the semi-relational feature of ISIS. You could also consider splitting the record over more than one record in the same database and using REF(L()) on the database itself ('internal REF') rather than on another one.
  2.  test the records with BigISIS. However, we currently have it only in Linux; we might need to try to re-compile CISIS for BigISIS now that 64-bit Windows is totally 'normal' (it wasn't at the time). BigISIS means : records up to 1 MB (which is possibly still not enough) and a total database size of up to 512 GB.
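
To judge which option applies, it can help to estimate how much space the occurrences of one field take. The Python sketch below is only a rough model under assumptions: it counts the bytes of the field contents and ignores the record's directory overhead and the temporary space needed while indexing; the function names are made up, and '1 MB' is read here as 2**20 bytes.

```python
def occurrences_size(occurrences, encoding="latin-1"):
    """Total number of bytes taken by all occurrences of one repeatable field."""
    return sum(len(text.encode(encoding)) for text in occurrences)


def fits_in_record(occurrences, max_record_bytes):
    """True if the field contents alone stay within the given record limit."""
    return occurrences_size(occurrences) <= max_record_bytes


# Assumption: 'records up to 1 MB' interpreted as 2**20 bytes
BIGISIS_MAX_RECORD = 1024 * 1024

# 250 short occurrences versus 250 long ones (made-up sample data)
short_occurrences = ["^aKogi State Univ^cAnyigba^dNigeria"] * 250
long_occurrences = ["x" * 5000] * 250

print(fits_in_record(short_occurrences, BIGISIS_MAX_RECORD))
print(fits_in_record(long_occurrences, BIGISIS_MAX_RECORD))
```

The same count of 250 occurrences can fit or not fit, depending only on how long each occurrence is.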

Can I ask you : do you see such an MFN with a high number of occurrences (250 or more ?) stored correctly in the MST, and only failing to get indexed ? That case, i.e. a record that is not too large to be stored but too large to be indexed, is possible, since indexing needs some extra temporary space to store keys etc. in the 'virtual' ISIS record that CISIS always uses for internal manipulation. Then the problem is indeed not the storage but the indexing. By the way, in ABCD 2.x we use external (and larger) text files which are still indexed (for full-text indexing of repositories) : they are not inside the record but are referred to at indexing time (with the parameter 'gload='). I doubt that this would be a good solution here, because the virtual record is still used while indexing, so I am not sure it would solve the problem any better than having the occurrences stored within the record, but it could be worth a try.

But : in the new version, ABCD 2.2, text files that are too large are automatically split, with each part referring by a key to the same 'mother record ID'. Dumping your field with 250+ occurrences into an external text file, with your basic fields (title, author...) stored as Dublin Core meta-tags (which are preserved and stored automatically in each split record), would therefore also be possible.

If you send me a couple of such records (in ISO-format with accompanying FDT and FST), I could give it a try as a third option.





Egbert de Smet
Universiteit Antwerpen



________________________________

From: isis-users <isis-users-bounces+egbert.desmet=ua.ac.be at iccisis.org> on behalf of Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>
Sent: Friday, April 17, 2020 11:10 AM
To: isis-users at iccisis.org
Subject: Re: [Isis-users] Larger database in ABCD



Dear Egbert,



I’m using the Windows version of CISIS FFI to process data I’ve downloaded from the web. Because of the problem with the inverted file key you drew my attention to, I limited the data imported into the ISIS database. So, in fact, the database is no longer as "large" as it should be.

I have 50,000 records and I continue adding new ones. Recently, I added about 12,000 new records, but while updating the inverted file I received the error message fatal : fullinv/ifload. I don’t understand the meaning of this error, but I checked using mx with the dict= parameter and found that the IF was empty. I then suspected the number of occurrences in one repeatable field. After checking, I noticed that some of the newly added records have more than 250 occurrences. Is such a number of occurrences the cause of the problem ?



Many thanks for your response












From: Eustache Mêgnigbêto [mailto:eustache.megnigbeto at outlook.com]
Sent: Wednesday, July 10, 2019 9:52 AM
To: Egbert De Smet
Subject: RE: Larger database in ABCD



Dear Egbert,



I tried in Windows with the FFI version, and it works. I will try under Linux this afternoon with the BigISIS version.

Thank you very much.



Eustache M.



From: Egbert De Smet [mailto:egbert.desmet at uantwerpen.be]
Sent: Wednesday, July 10, 2019 8:45 AM
To: Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>
Subject: Re: Larger database in ABCD



Well, obviously the same way :

CISIS_VERSION = bigisis



Egbert de Smet
Universiteit Antwerpen



________________________________

From: Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>
Sent: Wednesday, July 10, 2019 9:23 AM
To: Egbert De Smet
Subject: RE: Larger database in ABCD



Dear Egbert,



Many thanks,



And next, how do I activate BigISIS under Linux ?



From: Egbert De Smet [mailto:egbert.desmet at uantwerpen.be]
Sent: Wednesday, July 10, 2019 8:09 AM
To: Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>; isis-users at iccisis.org
Subject: Re: Larger database in ABCD



In the file dr_path.def of the database concerned (in its 'base'  folder) put the line :

CISIS_VERSION=ffi

Note that with ffi neither 'incremental indexing' (record by record) nor word-proximity searching is possible. It is better to use bigisis, but that only works in Linux at this time.



Egbert de Smet
Universiteit Antwerpen



________________________________

From: isis-users <isis-users-bounces+egbert.desmet=ua.ac.be at iccisis.org> on behalf of Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>
Sent: Wednesday, July 10, 2019 8:56 AM
To: isis-users at iccisis.org
Subject: [Isis-users] Larger database in ABCD



Dear Egbert,



I downloaded some data from the web and converted them to a delimited text format, then I used the id2i utility to convert them to an ISIS database. However, I noticed that the data were too large to be handled with the standard CISIS utilities. So I used the FFI version of id2i to convert them, mx to read them, etc.

Now, I would like to know how to manage such a database with ABCD, since the standard ABCD cannot handle it, while in ABCD version 2.0f the FFI subfolder of the cgi-bin folder contains the necessary files. In other words, what changes should I make in the cgi-bin folder to be able to operate the database with ABCD ?



Many thanks in advance.








-------------- next part --------------
A non-text attachment was scrubbed...
Name: addv99.prc
Type: application/x-mobipocket-ebook
Size: 22 bytes
Desc: addv99.prc
URL: <http://lists.iccisis.org/pipermail/isis-users/attachments/20200418/f774b341/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: link2p.pft
Type: application/octet-stream
Size: 63 bytes
Desc: link2p.pft
URL: <http://lists.iccisis.org/pipermail/isis-users/attachments/20200418/f774b341/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: partners.fst
Type: application/octet-stream
Size: 17 bytes
Desc: partners.fst
URL: <http://lists.iccisis.org/pipermail/isis-users/attachments/20200418/f774b341/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v10.pft
Type: application/octet-stream
Size: 52 bytes
Desc: v10.pft
URL: <http://lists.iccisis.org/pipermail/isis-users/attachments/20200418/f774b341/attachment-0002.obj>

