[Isis-users] Fw: Larger database in ABCD
Eustache Mêgnigbêto
eustache.megnigbeto at outlook.com
Sun Apr 19 21:45:27 CEST 2020
Dear Egbert,
Many thanks.
I followed the steps you gave below and the results are exactly what I
wanted.
Thanks once again.
Regards,
***
Eustache Mêgnigbêto
From: isis-users
[mailto:isis-users-bounces+eustache.megnigbeto=outlook.com at iccisis.org] On
behalf of Egbert De Smet
Sent: Saturday, April 18, 2020 11:36
To: isis-users
Subject: [Isis-users] Fw: Larger database in ABCD
Eustache,
having had a look at your database, I think moving the contents of your v10
(I don't know their real meaning so I just named them 'links to partners')
and using REF(L)) is the best solution.
Here are the steps I used:
1. create a 'primary key' in v99 of gasp, with a proc that puts the MFN
into v99:
mx gasp "proc='<99>',mfn(1),'</99>'" copy=gasp now -all
(using the MFN itself as the key is not safe, as it can change through
export/import actions)
2. create a PFT 'v10.pft' which prints the contents of v10 into a text file
in the CISIS id2i format:
'!ID 'mfn(1)/, '!v099!',v99/,('!v001!',v10/,), #
3. dump the v10 contents into a tagged text file:
mx gasp pft=@v10.pft now -all > partners.id
4. create the database partners with id2i :
id2i partners.id create=partners
5. create an FST 'partners.fst' for the new database, making the
primary key immediately searchable: 1 5 '/PA_/', v99 (note the use of the
prefix PA_, which is in fact not necessary)
6. index that new database with that FST :
mx partners fst=@ fullinv/ansi=partners
7. delete the original v10 in gasp to avoid indexing problems in gasp,
by considerably reducing the record size:
mx gasp "proc='d10'" copy=gasp now -all
8. to test how to include the links (the partners), use a PFT like this
one (I named it 'link2p.pft'):
'MFN=',v99/,ref->partners(l->partners('PA_'v99),('link='v1/))
and test it:
mx gasp pft=@link2p.pft
which should give you results like:
MFN=1
link=^aRedeemers Univ^cMowe^dNigeria^zuniv
..
MFN=2
link=^aNasarawa State Univ^bFac Agr^cLafia^dNigeria^zuniv
link=^aKogi State Univ^bFac Agr^cAnyigba^dNigeria^zuniv
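To make the data flow of steps 1 to 3 and 7 easier to follow, here is a small Python sketch of what happens to the records. It is only an illustration: a record is modelled as a dict from tag number to a list of occurrences (not a CISIS structure), the function names are hypothetical, and the id2i lines use the usual `!ID` and `!vNNN!` markers. The real work is done by mx and id2i as shown above.

```python
from typing import Dict, List

# Illustrative model only: {MFN: {tag: [occurrence, ...]}}
Record = Dict[int, List[str]]
Database = Dict[int, Record]


def add_primary_key(db: Database) -> None:
    """Step 1: copy each MFN into v99 so the key no longer depends on the MFN."""
    for mfn, rec in db.items():
        rec[99] = [str(mfn)]


def dump_v10_as_id2i(db: Database) -> str:
    """Steps 2-3: one id2i record per gasp record, v99 kept, v10 -> v001."""
    lines: List[str] = []
    for mfn in sorted(db):
        rec = db[mfn]
        lines.append(f"!ID {mfn}")
        lines.append(f"!v099!{rec[99][0]}")
        for occ in rec.get(10, []):  # like the repeatable group over v10
            lines.append(f"!v001!{occ}")
    return "\n".join(lines) + "\n"


def drop_v10(db: Database) -> None:
    """Step 7: remove v10 from gasp, shrinking every record."""
    for rec in db.values():
        rec.pop(10, None)


gasp: Database = {
    1: {10: ["^aRedeemers Univ^cMowe^dNigeria^zuniv"]},
    2: {10: ["^aNasarawa State Univ^bFac Agr^cLafia^dNigeria^zuniv",
             "^aKogi State Univ^bFac Agr^cAnyigba^dNigeria^zuniv"]},
}
add_primary_key(gasp)
partners_id = dump_v10_as_id2i(gasp)
drop_v10(gasp)
print(partners_id, end="")
```

After this, each gasp record keeps only its key in v99, while the former v10 occurrences live as v001 occurrences in the text that id2i loads into partners.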
So from now on you can include the PFT statement
ref->partners(l->partners('PA_'v99),('link='v1/))
in any PFT to show your original v10 without having it inside the database,
thereby avoiding the indexing problem caused by the volume of the v10
contents. I tested indexing with the BigISIS mx in Linux, but you could also
try the standard mx of ABCD, which preserves incremental indexing; in Linux
it worked fine.
So with this procedure you can avoid the indexing problem even with the
default CISIS versions (mx and wxis). The only change from now on is that the
information you used to enter in v10 must be entered into v1 of a separate
database (with v99 holding the key of the corresponding gasp record).
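The REF/L statement above is what makes this 'semi-relational': L() looks a key up in the partners inverted file and returns an MFN, and REF() formats that partners record. A Python sketch of that behaviour, under stated assumptions: the dicts stand in for the databases, `build_index` imitates partners.fst, and L() is assumed to return the lowest matching MFN (or 0 when nothing is found). None of these names are CISIS functions.

```python
from typing import Dict, List

Record = Dict[int, List[str]]
Database = Dict[int, Record]


def build_index(db: Database, prefix: str = "PA_") -> Dict[str, int]:
    """Like partners.fst (1 5 '/PA_/', v99): map prefix + v99 to an MFN.

    When a key occurs more than once, the lowest MFN is kept, which is
    what this sketch assumes L() returns.
    """
    index: Dict[str, int] = {}
    for mfn in sorted(db):
        for key in db[mfn].get(99, []):
            index.setdefault(prefix + key, mfn)
    return index


def lookup(index: Dict[str, int], key: str) -> int:
    """L(): MFN for a key, or 0 when the key is absent."""
    return index.get(key, 0)


def ref_links(partners: Database, mfn: int) -> List[str]:
    """REF(mfn, ('link='v1/)): format every v1 occurrence of that record."""
    if mfn == 0:
        return []
    return ["link=" + occ for occ in partners.get(mfn, {}).get(1, [])]


partners: Database = {
    1: {99: ["1"], 1: ["^aRedeemers Univ^cMowe^dNigeria^zuniv"]},
    2: {99: ["2"], 1: ["^aNasarawa State Univ^bFac Agr^cLafia^dNigeria^zuniv",
                       "^aKogi State Univ^bFac Agr^cAnyigba^dNigeria^zuniv"]},
}
index = build_index(partners)
gasp_record: Record = {99: ["2"]}  # v10 already removed from gasp
print("MFN=" + gasp_record[99][0])
for line in ref_links(partners, lookup(index, "PA_" + gasp_record[99][0])):
    print(line)
```

The key point of the design is that the large repeatable data never sits in the gasp record itself, so indexing gasp only has to handle small records.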
I am attaching the files I created, except the larger database files, since
you are using Windows anyway.
I hope this 'model' can also serve others in the ABCD community who have
similar problems: make ABCD a bit relational.
Egbert de Smet
Universiteit Antwerpen
_____
From: Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>
Sent: Saturday, April 18, 2020 8:22 AM
To: Egbert De Smet
Subject: Fw: [Isis-users] Larger database in ABCD
====================================================================
Eustache Mêgnigbêto
Tel. (+229) 95910242 (+229) 21147935
09 BP 477 Saint Michel, Cotonou (République du Bénin)
Google Scholar :
https://scholar.google.com/citations?user=xQk_UhwAAAAJ&hl=fr
Personal website:
http://eustachem.ilemi.net
Review activities :
https://publons.com/researcher/503109/eustache-megnigbeto/peer-review/
From: Eustache Mêgnigbêto [mailto:eustache.megnigbeto at outlook.com]
Sent: Saturday, April 18, 2020 07:13
To: isis-users at iccisis.org
Subject: RE: [Isis-users] Larger database in ABCD
Dear Egbert,
Records with more than 250 occurrences are stored correctly; they can be
edited and saved in ABCD.
Attached are the ISO file, the FDT and the FST.
PS: Meanwhile, I changed the FST so that an occurrence is indexed only if
its occurrence number is less than or equal to 250; the IF is then created
and updated.
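The effect of that FST change can be sketched in Python. This is only an illustration of the idea, with a list of strings standing in for the occurrences of the repeatable field; the actual FST used is not shown in the message, and `keys_for_field` is a hypothetical name.

```python
from typing import Iterable, List

MAX_INDEXED_OCC = 250  # cut-off mentioned in the message


def keys_for_field(occurrences: Iterable[str],
                   limit: int = MAX_INDEXED_OCC) -> List[str]:
    """Return one key per occurrence, but only for occurrences 1..limit.

    Occurrences beyond the limit stay in the record; they are just not
    searchable through the inverted file.
    """
    keys: List[str] = []
    for number, occ in enumerate(occurrences, start=1):
        if number > limit:
            break
        keys.append(occ)
    return keys


record_v10 = [f"partner {i}" for i in range(1, 301)]  # 300 occurrences
keys = keys_for_field(record_v10)
print(len(keys), keys[-1])  # 250 partner 250
```

The trade-off is visible here: indexing succeeds, but occurrences 251 and beyond can no longer be found by searching.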
Best,
Eustache Mêgnigbêto
From: Egbert De Smet [mailto:egbert.desmet at uantwerpen.be]
Sent: Friday, April 17, 2020 11:23
To: Eustache Mêgnigbêto; isis-users at iccisis.org
Subject: Re: [Isis-users] Larger database in ABCD
Eustache,
as explained earlier in other messages on this list, it is not the number of
occurrences as such that is limited, but their total size, which must fit
within the maximum record size of CISIS. That is the main limiting factor and
also the main reason why I wanted ABCD 2.x (soon 2.2) to work with other
varieties of CISIS, mainly BigISIS, as that one does support incremental
indexing, as opposed to FFI. FFI is aimed more at 'static' databases with
larger records. The maximum number of records is, in my humble opinion, not
the crucial factor: it is still 2 to the power of 24 (due to the setup of the
XRF), i.e. 16,777,216 records, which is enough for most applications.
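Two points of this paragraph can be made concrete: the record-count ceiling of 2^24, and the fact that the same number of occurrences may or may not fit depending on their total size. In this sketch the size limit and the per-occurrence overhead are made-up numbers chosen for illustration; they are not the actual CISIS values.

```python
from typing import Dict, List

MAX_RECORDS = 2 ** 24  # maximum number of records, per the message (XRF layout)

# Hypothetical numbers for illustration only; they are NOT the real CISIS values.
HYPOTHETICAL_MAX_RECORD_SIZE = 32_000  # bytes
HYPOTHETICAL_OVERHEAD_PER_OCC = 6      # bytes of directory data per occurrence


def estimated_size(record: Dict[int, List[str]]) -> int:
    """Rough size of a record: data bytes plus a fixed overhead per occurrence."""
    return sum(len(occ.encode("utf-8")) + HYPOTHETICAL_OVERHEAD_PER_OCC
               for occs in record.values() for occ in occs)


short = {10: ["x" * 10] * 300}   # 300 short occurrences
long_ = {10: ["x" * 200] * 300}  # 300 long occurrences

print(f"{MAX_RECORDS:,}")                                     # 16,777,216
print(estimated_size(short) <= HYPOTHETICAL_MAX_RECORD_SIZE)  # True (4800 bytes)
print(estimated_size(long_) <= HYPOTHETICAL_MAX_RECORD_SIZE)  # False (61800 bytes)
```

Both records have exactly 300 occurrences; only their total size decides whether they fit.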
Now, back to your concrete problem: I see two options:
1. avoid such a high number of occurrences (repeats of a field) by moving
them to another database and using REF(L(...)), the semi-relational feature
of ISIS. You could also consider splitting the record over more than one
record in the same database and using REF(L(...)) on the database itself
('internal REF') rather than on another one.
2. test the records with BigISIS. However, we currently only have it for
Linux, although we might try to recompile CISIS for BigISIS now that 64-bit
Windows is totally 'normal' (it wasn't at the time). That means records of
up to 1 MB (possibly still not enough) and a total database size of up to
512 GB.
Can I ask you: is such an MFN with a high number of occurrences (250 or
more) stored correctly in the MST, and is it only impossible to index it?
That situation, the record itself not being too large to store but too large
to index, is possible, since indexing needs some extra temporary space to
store keys etc. in the 'virtual' ISIS record that CISIS always uses for
internal manipulation. In that case the problem is not storage but indexing.
By the way, in ABCD 2.x we use external (and larger) text files which are
still indexed (for full-text indexing of repositories): they are not stored
within the record but referred to when indexing (with the parameter
'gload='). I doubt this would be a good solution, because the virtual record
is still used while indexing, so it is not certain that the problem would be
solved this way rather than by keeping the occurrences within the record,
but it could be worth a try.
But: in the new version, ABCD 2.2, text files that are too large will be
split automatically, each part referring with a key to the same 'mother
record ID'. Dumping your field with 250+ occurrences into an external text
file, with your basic fields (title, author...) stored as Dublin Core
meta-tags (preserved and stored automatically in each split record), would
therefore also be possible.
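Both splitting ideas mentioned here (several records in the same database, or the ABCD 2.2 split text files) come down to chunking one long list of occurrences into parts that all point back to the same mother record. A minimal Python sketch; the names `mother_id`, `part` and `meta`, and the part size of 250, are hypothetical and do not describe ABCD's actual implementation.

```python
from typing import Dict, List


def split_into_parts(mother_id: str, occurrences: List[str],
                     meta: Dict[str, str], part_size: int) -> List[dict]:
    """Split one oversized list of occurrences into several smaller records.

    Every part carries the same mother_id and a copy of the descriptive
    metadata, so each part can be found and displayed on its own.
    """
    if part_size < 1:
        raise ValueError("part_size must be at least 1")
    parts = []
    for start in range(0, len(occurrences), part_size):
        parts.append({
            "mother_id": mother_id,
            "part": start // part_size + 1,
            "meta": dict(meta),
            "occurrences": occurrences[start:start + part_size],
        })
    return parts


def reassemble(parts: List[dict]) -> List[str]:
    """Join the parts back in order, e.g. to display the full list."""
    ordered = sorted(parts, key=lambda p: p["part"])
    return [occ for p in ordered for occ in p["occurrences"]]


occs = [f"partner {i}" for i in range(1, 621)]  # 620 occurrences
parts = split_into_parts("rec-0001", occs, {"dc.title": "Example title"}, 250)
print([len(p["occurrences"]) for p in parts])  # [250, 250, 120]
```

Copying the metadata into every part costs some space, but it means each part stays meaningful in search results without a second lookup.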
If you send me a couple of such records (in ISO format, with the accompanying
FDT and FST), I could give it a try as a third option.
Egbert de Smet
Universiteit Antwerpen
_____
From: isis-users <isis-users-bounces+egbert.desmet=ua.ac.be at iccisis.org> on
behalf of Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>
Sent: Friday, April 17, 2020 11:10 AM
To: isis-users at iccisis.org
Subject: Re: [Isis-users] Larger database in ABCD
Dear Egbert,
I'm using the Windows version of CISIS FFI to process data I've downloaded
from the web. Because of the problem with the inverted file key that you
drew my attention to, I limited the data imported into the ISIS database.
So, in fact, the database is no longer as 'large' as it should be.
I have 50,000 records and I keep adding new ones. Recently I added about
12,000 new records, but while updating the inverted file I received the
error message 'fatal: fullinv/ifload'. I don't understand the meaning of
this error, but I checked, using mx with the dict= parameter, and found that
the IF was empty. Then I suspected the number of occurrences in one
repeatable field. After checking, I noticed that among the newly added
records, some have more than 250 occurrences. Is such a number of
occurrences the cause of the problem?
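A check like the one described (finding records in which a repeatable field exceeds 250 occurrences) can be scripted before loading. A sketch assuming the source data is an id2i text file with `!ID` and `!vNNN!` lines, one occurrence per line; multi-line field values are not handled, and the function names are hypothetical. As the reply explains, total size matters more than the count, so this is only a first indicator.

```python
import re
from typing import Dict

TAG_LINE = re.compile(r"^!v(\d+)!")


def occurrence_counts(id2i_text: str, tag: int) -> Dict[str, int]:
    """Count the occurrences of `tag` in every record of an id2i text."""
    counts: Dict[str, int] = {}
    current = None
    for line in id2i_text.splitlines():
        if line.startswith("!ID "):
            current = line[4:].strip()
            counts[current] = 0
            continue
        match = TAG_LINE.match(line)
        if match and current is not None and int(match.group(1)) == tag:
            counts[current] += 1
    return counts


def heavy_records(id2i_text: str, tag: int, threshold: int = 250) -> Dict[str, int]:
    """Records whose `tag` repeats more than `threshold` times."""
    return {rid: n for rid, n in occurrence_counts(id2i_text, tag).items()
            if n > threshold}


sample = "!ID 1\n!v010!a\n!v010!b\n!ID 2\n" + "!v010!x\n" * 260
print(heavy_records(sample, 10))  # {'2': 260}
```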
Many thanks for your response
Eustache Mêgnigbêto
From: Eustache Mêgnigbêto [mailto:eustache.megnigbeto at outlook.com]
Sent: Wednesday, July 10, 2019 09:52
To: Egbert De Smet
Subject: RE: Larger database in ABCD
Dear Egbert,
I tried it in Windows with the FFI version, and it works. I will try it
under Linux this afternoon with the BigISIS version.
Thank you very much.
Eustache M.
From: Egbert De Smet [mailto:egbert.desmet at uantwerpen.be]
Sent: Wednesday, July 10, 2019 08:45
To: Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>
Subject: Re: Larger database in ABCD
Well, obviously the same way:
CISIS_VERSION=bigisis
Egbert de Smet
Universiteit Antwerpen
_____
From: Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>
Sent: Wednesday, July 10, 2019 9:23 AM
To: Egbert De Smet
Subject: RE: Larger database in ABCD
Dear Egbert,
Many thanks.
And next, how do I activate BigISIS under Linux?
From: Egbert De Smet [mailto:egbert.desmet at uantwerpen.be]
Sent: Wednesday, July 10, 2019 08:09
To: Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>;
isis-users at iccisis.org
Subject: Re: Larger database in ABCD
In the file dr_path.def of the database concerned (in its 'base' folder),
put the line:
CISIS_VERSION=ffi
Note that with FFI neither 'incremental indexing' (record by record) nor
word-proximity searching will be possible. It is better to use BigISIS, but
that only works in Linux at this time.
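To check which CISIS variant a database is set to use, dr_path.def can be read programmatically. This sketch assumes the file consists of simple KEY=VALUE lines, as in the examples in this thread; `read_dr_path` and `review` are hypothetical helpers, not part of ABCD, and the parser tolerates spaces around '=', which ABCD itself may or may not do.

```python
import platform
from typing import Dict, List

# What this thread says about each CISIS variant.
NOTES = {
    "ffi": "no incremental indexing and no word-proximity search",
    "bigisis": "supports incremental indexing, but only available for Linux",
}


def read_dr_path(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and lines without '='."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def review(settings: Dict[str, str], system: str) -> List[str]:
    """Return remarks about the configured CISIS_VERSION on this system."""
    version = settings.get("CISIS_VERSION", "").lower()
    remarks: List[str] = []
    if version in NOTES:
        remarks.append(f"{version}: {NOTES[version]}")
    if version == "bigisis" and system != "Linux":
        remarks.append("warning: BigISIS was reported to work only in Linux")
    return remarks


conf = read_dr_path("CISIS_VERSION=ffi\n")
print(review(conf, platform.system()))
```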
Egbert de Smet
Universiteit Antwerpen
_____
From: isis-users <isis-users-bounces+egbert.desmet=ua.ac.be at iccisis.org> on
behalf of Eustache Mêgnigbêto <eustache.megnigbeto at outlook.com>
Sent: Wednesday, July 10, 2019 8:56 AM
To: isis-users at iccisis.org
Subject: [Isis-users] Larger database in ABCD
Dear Egbert,
I downloaded some data from the web and converted it to a delimited text
format, then I used the id2i utility to convert it to an ISIS database.
However, I noticed that the data were too large to be handled with the
standard CISIS utilities, so I used the FFI versions of id2i to convert and
of mx to read, etc.
Now I would like to know how to manage such a database with ABCD, since the
standard ABCD cannot handle it, while in ABCD 2.0f the FFI subfolder of the
cgi-bin folder contains the necessary files. In other words, what changes
should I make in the cgi-bin folder to be able to operate the database with
ABCD?
Many thanks in advance.
Eustache Mêgnigbêto