Example - InterWATER thesaurus - selecting concepts
Updated - Monday 23 October 2006
Examples
Selecting concepts / elements from a document
Abstract: “Domestic sewage (BOD5 = 288 mg/l) was allowed to settle for 5-6 hours and the supernatant was used for rearing the juveniles of the fish Cyprinus carpio in two dilutions for four months. The LC50 and BOD5 of the settled sewage were determined as 34% and 78.2 mg/l respectively. Control was run in freshwater fertilized with commercial manures and artificial feed. Growth length-weight relationship, condition factor and gastrosomatic index of the fishes grown in 10% settled sewage were comparable with those of control fishes. Ecological characteristics of different waters with special reference to productivity and the importance of low-cost settled sewage in pisciculture are discussed.”
Reading these parts of the document, we first of all list the concepts reflecting the author’s intentions, such as:
- Sewage
- LC50
- Pisciculture
- BOD5
- Cyprinus carpio
- Growth
- Ecological characteristics
- Freshwater
- Productivity
In the document itself, we would find information that enables us to be more specific about some of the concepts. ‘Sewage’ is actually ‘raw domestic sewage’; ‘LC50’ is considered in terms of the toxicity of the sewage to the fish and is measured by a bioassay technique; and the place where the work was carried out is Nagpur, India. Thus, our list of concepts could now be:
- Raw domestic sewage
- Pisciculture
- Bioassay
- Toxicity
- BOD5
- Cyprinus carpio
- Growth
- Ecological characteristics
- Freshwater
- Productivity
- Nagpur, India
Refining - concepts necessary for indexing
We should next decide whether all these concepts are necessary for indexing into the system. ‘Bioassay’ is perhaps doubtful: the technique was indeed used, but in the text we find the words ‘bioassay... following standard methods’. Would the document be useful to a user of our system wanting information on bioassay? The answer is probably ‘No’, so we delete ‘Bioassay’. ‘BOD5’ would appear to be a similar case, but we find in the text that BOD limitations are an important aspect of the study, so we retain that concept. ‘Growth’, meaning growth of the fish Cyprinus carpio, is rejected because it really refers to ‘Productivity’, which is also in our list.
‘Nagpur, India’ is slightly doubtful. Locations should be indexed when knowledge of the location would be helpful, for example when a user wants information about a particular country or about a country with similar conditions. Location is not useful when it refers to laboratory work that could be carried out in almost any country. Here, the title and abstract give no indication of location, and although the text does say that the work was done at Nagpur, it is not clear how closely the work relates to naturally existing conditions there. In case of doubt, it is probably useful to keep the concept.
Thus, our list of concepts has been reduced by the deletion of ‘Bioassay’ and ‘Growth’. We take a final look at the document to see whether we have followed the rule of indexing exhaustively. It occurs to us that, as the subject is the rearing of fish in sewage, we should know what those fish are eating to be productive. The answer, algae, is indeed in the text, although it is not mentioned in the abstract. We select ‘Algae’ in place of ‘Ecological characteristics’, because we should be as specific as possible. This account will have taken several minutes to read, but in practice one probably arrives at the final list of concepts without writing anything down during the intermediate stages, which are in reality thought processes.
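The two deletions described above can be sketched as a simple filter. This is only an illustration in Python, not part of the InterWATER system; the variable names are hypothetical and the reasons are paraphrased from the discussion.

```python
# Candidate concepts after reading the full document.
candidates = [
    "Raw domestic sewage", "Pisciculture", "Bioassay", "Toxicity", "BOD5",
    "Cyprinus carpio", "Growth", "Ecological characteristics",
    "Freshwater", "Productivity", "Nagpur, India",
]

# Concepts rejected during refinement, each with the indexer's reason.
rejected = {
    "Bioassay": "standard method only; unlikely to help a user searching on bioassay",
    "Growth": "already covered by 'Productivity'",
}

# Keep every candidate that was not rejected, preserving the original order.
refined = [concept for concept in candidates if concept not in rejected]
```

Recording the reason next to each rejected concept is a design choice of this sketch: it makes the judgement visible if the decision is reviewed later.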
Using the most specific appropriate preferred term
Now it is time to look into the thesaurus, to ‘translate’ the language used in the document into the indexing language of the descriptors. Let us take each of them in turn, and in so doing learn something about the characteristics of the InterWATER Thesaurus.
‘Raw domestic sewage’
Preferred terms are most readily located by way of their nouns rather than their adjectives. Thus, we look first for SEWAGE in the alphabetical presentation, and we read its scope note: ‘This descriptor (= preferred term) should be used only for domestic sewage; otherwise see WASTEWATER’. That takes care of the ‘domestic’ in our concept. ‘Raw’ we cannot express, because SEWAGE has no NT (narrower term) such as RAW SEWAGE. We look in the thesaurus to see whether RAW occurs, and find only the entries for RAW SLUDGE and RAW WATER. As neither of these is appropriate, we decide to use SEWAGE alone to represent the concept of ‘Raw domestic sewage’.
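The noun-first lookup just described can be sketched as follows. The term list is a hypothetical extract containing only the entries mentioned above, and the function names are our own; the real thesaurus is far larger.

```python
# Hypothetical extract of preferred terms mentioned in the text.
PREFERRED_TERMS = ["RAW SLUDGE", "RAW WATER", "SEWAGE", "WASTEWATER"]

def terms_containing(word, terms=PREFERRED_TERMS):
    """Return the preferred terms that contain `word` as a whole word."""
    word = word.upper()
    return [term for term in terms if word in term.split()]

def choose_descriptor(noun, qualifier, terms=PREFERRED_TERMS):
    """Use 'QUALIFIER NOUN' if the thesaurus has it, else fall back to the noun."""
    specific = f"{qualifier} {noun}".upper()
    if specific in terms:
        return specific
    return noun.upper()

raw_entries = terms_containing("raw")            # entries in which RAW occurs
descriptor = choose_descriptor("sewage", "raw")  # no RAW SEWAGE, so SEWAGE
```

The code only finds the candidates; deciding that RAW SLUDGE and RAW WATER are inappropriate remains the indexer's judgement.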
‘Pisciculture’
We look for this in the alphabetical presentation, and find it as a non-preferred term:
pisciculture
USE fish culture
We do not stop there: we look up FISH CULTURE itself, and see that it has an NT (narrower term), FISH FARMS, which is not applicable to our document, and three RTs (related terms): FISH, FISH FEED and FISHPONDS. FISH is not useful, because we have the name of a fish in our list of concepts (i.e., we can be more specific). FISHPONDS is not applicable. But FISH FEED strikes one as being useful, because that is what the document is considering. Consequently, we select FISH CULTURE and FISH FEED as indexing descriptors.
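As a rough sketch, a word-block can be modelled as a dictionary, with a small function that follows USE references to the preferred term. The structure and names below are assumptions made for illustration; only the entries quoted in the text are included.

```python
# Hypothetical extract; only the entries quoted in the text are included.
THESAURUS = {
    "PISCICULTURE": {"USE": "FISH CULTURE"},
    "FISH CULTURE": {
        "NT": ["FISH FARMS"],
        "RT": ["FISH", "FISH FEED", "FISHPONDS"],
    },
}

def resolve(term):
    """Follow USE references until a preferred term is reached."""
    key = term.upper()
    seen = set()
    while "USE" in THESAURUS.get(key, {}):
        if key in seen:  # guard against a circular USE chain
            raise ValueError(f"circular USE reference at {key}")
        seen.add(key)
        key = THESAURUS[key]["USE"]
    return key

preferred = resolve("pisciculture")
block = THESAURUS[preferred]
# Neighbouring terms the indexer should review before settling on descriptors.
terms_to_review = block.get("NT", []) + block.get("RT", [])
```

Listing the NTs and RTs after resolving the term mirrors the advice not to stop at the USE reference: here it is what brings FISH FEED to our attention.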
‘Toxicity’
Upon looking for this, we find it at once as a preferred term. Its BT1 (broader term) is CHEMICAL QUALITY and its BT2 is WATER QUALITY, so we know that this is the correct descriptor to use for the concept.
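The broader-term levels can be pictured as a chain. The sketch below assumes that BT2 is the broader term of BT1, which is the usual reading of numbered BT levels; the data structure and function name are hypothetical.

```python
# Hypothetical extract: each term points to its immediate broader term.
BROADER = {
    "TOXICITY": "CHEMICAL QUALITY",
    "CHEMICAL QUALITY": "WATER QUALITY",
}

def broader_chain(term):
    """Return [BT1, BT2, ...] for `term`, nearest level first."""
    chain = []
    # Stop at the top of the hierarchy, or if a term would repeat (a cycle).
    while term in BROADER and BROADER[term] not in chain:
        term = BROADER[term]
        chain.append(term)
    return chain

toxicity_context = broader_chain("TOXICITY")
```

Reading the chain confirms the sense in which the descriptor is used: here, toxicity as an aspect of water quality.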
‘BOD5’
After first looking up BOD, we refer to BIOCHEMICAL OXYGEN DEMAND and discover no complications, so BIOCHEMICAL OXYGEN DEMAND is the preferred term for ‘BOD5’. Incidentally, BOD is not defined in the document; but if we had expressed the concept as ‘Biological oxygen demand’ instead of ‘BOD5’, and had then looked for that in the thesaurus, we would have found the entry:
biological oxygen demand
USE biochemical oxygen demand
‘Cyprinus carpio’
In the alphabetical presentation, one will find the entry:
cyprinus
USE CARP
CARP will thus be the preferred term for this concept, because it has no NTs. (Most organisms in the thesaurus can be indexed only at this type of level or to a generic name, although some important pathogens can be indexed to species).
‘Algae’
Once again, this concept can be directly related to the preferred term ALGAE. This has several NTs; but, as the information in the document does not permit us to be more specific, we have to use ALGAE itself as the selected preferred term.
‘Freshwater’
This we find in the thesaurus as:
fresh water
USE RAW WATER
Upon looking at the word-block for RAW WATER, we find that it is the preferred term we need, because a scope note tells us that RAW WATER is ‘untreated fresh water’.
‘Productivity’
This has a very terse word-block.
PRODUCTIVITY
rt biomass
For indexing the concept, either PRODUCTIVITY or BIOMASS would be equally appropriate, but we select the former as reflecting the actual word used for the concept. (This does, however, draw attention to the probable need to ask for both descriptors at retrieval.)
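The point about retrieval can be illustrated by expanding a query with related terms. This is a sketch only: the word-block above shows BIOMASS as an RT of PRODUCTIVITY, and the reverse link is assumed here on the grounds that related-term links are normally reciprocal.

```python
# Related-term links; the BIOMASS -> PRODUCTIVITY direction is an assumption.
RELATED = {
    "PRODUCTIVITY": ["BIOMASS"],
    "BIOMASS": ["PRODUCTIVITY"],
}

def expand_query(descriptors):
    """Return the query descriptors together with their related terms."""
    expanded = set(descriptors)
    for descriptor in descriptors:
        expanded.update(RELATED.get(descriptor, []))
    return expanded

query = expand_query(["PRODUCTIVITY"])
```

A search on the expanded set would find documents indexed under either descriptor, whichever one the indexer happened to choose.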
‘Nagpur, India’
Geographical names are spelled as they occur in the latest edition of the Times Atlas of the World. INDIA, NAGPUR will thus be the preferred term for this concept.
Having completed this introductory research in the thesaurus, we have derived the following preferred terms to represent the concepts identified in the document:
- SEWAGE for ‘Raw domestic sewage’
- FISH CULTURE for ‘Pisciculture’
- FISH FEED for a concept not initially identified
- TOXICITY identical with ‘Toxicity’
- BIOCHEMICAL OXYGEN DEMAND for ‘BOD5’
- CARP for ‘Cyprinus carpio’
- ALGAE identical with ‘Algae’
- RAW WATER for ‘Freshwater’
- PRODUCTIVITY identical with ‘Productivity’
- INDIA, NAGPUR for ‘Nagpur, India’
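The list above can be held as a simple index record. The sketch below is illustrative only; the layout of the record and the matching function are assumptions, not a description of how the InterWATER system stores or searches its records.

```python
# (descriptor, concept it represents); None marks a concept added later.
ASSIGNED = [
    ("SEWAGE", "Raw domestic sewage"),
    ("FISH CULTURE", "Pisciculture"),
    ("FISH FEED", None),  # concept not initially identified
    ("TOXICITY", "Toxicity"),
    ("BIOCHEMICAL OXYGEN DEMAND", "BOD5"),
    ("CARP", "Cyprinus carpio"),
    ("ALGAE", "Algae"),
    ("RAW WATER", "Freshwater"),
    ("PRODUCTIVITY", "Productivity"),
    ("INDIA, NAGPUR", "Nagpur, India"),
]

descriptors = {descriptor for descriptor, _ in ASSIGNED}

def matches(query):
    """True if every descriptor in the query was assigned to the document."""
    return set(query) <= descriptors

found = matches(["CARP", "SEWAGE"])
missed = matches(["CARP", "BIOMASS"])  # BIOMASS was not assigned
```

The second query fails because the document was indexed under PRODUCTIVITY rather than BIOMASS, which is why a searcher may need to ask for both.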
The above is a somewhat elementary example of the way to set about indexing a document with the aid of this thesaurus.
Apart from the injunctions to index exhaustively (i.e., to cover all concepts that are finally decided upon as covering an author’s intentions) and specifically (i.e., at the most detailed level that the thesaurus allows, and only at that level), there is finally the question of consistency. It is unlikely that any two indexers will produce identical results, but one should at least attempt to produce consistent results. This is difficult to achieve, but it is helped by the requirement to be specific. If one uses the most specific appropriate descriptors each time one indexes, then the chances are better that the indexing will be at a consistent level, and that retrieval will be more positive in its results.
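One simple way to put a number on consistency is to divide the descriptors two indexers share by all the descriptors either of them used. The measure and the second indexer's choices below are hypothetical, chosen only to show the effect of indexing at different levels of specificity.

```python
def consistency(first, second):
    """Shared descriptors divided by all descriptors used by either indexer."""
    first, second = set(first), set(second)
    if not first and not second:
        return 1.0  # two empty sets of descriptors agree trivially
    return len(first & second) / len(first | second)

indexer_1 = ["SEWAGE", "FISH CULTURE", "CARP"]
indexer_2 = ["SEWAGE", "FISH CULTURE", "FISH"]  # indexed at a broader level

score = consistency(indexer_1, indexer_2)  # 2 shared out of 4 used
```

Had both indexers chosen the most specific appropriate descriptor, CARP, the two sets would have been identical and the score would have been 1.0.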

