Fieldnotes and Text

Last modified by Mike on 2021/12/30 08:21

Chapter 4
Fieldnote and Textual Data

4.1    Resources for fieldnotes and other textual information

Fieldnotes are a fundamental tool for ethnographic research. The application of computers to the production and use of fieldnotes is a basic requirement if computers are to greatly benefit ethnographers.

The range of information which might be included within fieldnotes is quite diverse, ranging from notes on observations, interviews, house inventories and genealogical data to diagrams of house plans and the layout of agricultural plots (Ellen 1984:282). This chapter focuses on the use of computers to support the production and analysis of notes, interview transcripts and other textual documents - the textual components of fieldnotes. I have attempted to sidestep aspects of theory, focusing more on operations than the value of these operations, which is more or less dependent on the individual researcher. For a good overview of computer-based textual methods for qualitative ethnographic research, you should consult Pfaffenberger (1988). For a broader range of views Fielding and Lee (1991) is recommended.

Fieldnotes are, by their very nature, cumbersome to use. They are a record of observations, narratives, and new insights, primarily organised by chronology rather than topic. Indeed, a part of their value is in this order, recording the ethnographer's development of different ideas and lines of inquiry. Although difficult to use, the fieldnotes are usually the most significant source of information both in the field and out: constantly referred to, updated, and cursed for incompleteness (cf. Jackson 1990).

A conventional method of making fieldnotes more accessible is to produce indexes based upon classification codes and keywords chosen by the ethnographer. Although indexing fieldnotes is a favourite activity for those days when you just cannot bear to face yet another difficult day, the notes are rarely indexed to satisfaction, since the indexing process itself is somewhat dependent on the stage of the investigation (Ellen 1984; Seidel 1991).

Computer programs can enhance the use and analysis of textual documents using methods such as automatic indexing, concordances and high speed search. The main drawback is that these documents must be available to the computer and its programs in 'machine-readable' form. This basically means someone must type the manuscript into a computer. This is most easily done, of course, if a computer is available at the time of consolidating the notes or transcribing the interviews: in the field. However, paper notebooks are relatively cheap, portable, rarely malfunction and require little power. Computers are relatively expensive, generally less portable, occasionally malfunction and require rather more power. To justify these disadvantages, computers must yield significant benefits.

For many anthropologists, a small computer might be justified in the field for no other reason than to enter notes using a word processor. Writing or typing fieldnotes into triplicate notebooks, as Ellen suggests (1984:282), is neither easy nor fun. Once notes are entered into a word processor, or perhaps a specialised fieldnote application, as many copies as required can be printed, merged into other documents based on subject headings, or copied onto diskettes for safe deposit or mailing.

Most anthropologists who have used computers in the field take most of their rough notes on paper, entering and consolidating these on the computer in the evening. There are still a few conditions that are not ideal for using a computer capable of reliably storing fieldnotes: room temperature in the Antarctic, for example (although at least one anthropologist has used a computer there). There may be special hazards for computers in the field (dust, heat, humidity, insects and mildew, among others), but if you can use a typewriter of any description in your field conditions, then you should be able to use a computer there as well. A computer is more than a typewriter replacement. A better description is 'a typewriter, copier, filing cabinet and infinitely large wastebasket'.7

The simplest way to enter notes is to type them with a word processor exactly as you would have written or typed them on a piece of paper (§1.3.1.1). This is the model which underlies word processors, so they are fairly well adapted to this particular approach. Even the most basic word processor is useful for note entry. Unfortunately, there is little you can do with a word processor alone, other than enter, edit, copy and review your notes. Pfaffenberger suggests that one of the problems with word processors is that, for the most part, they were originally developed to 'de-skill' typing, not to support writing (1988:18). Word processors were certainly not developed to support ethnographic research. Other programs will be required for comprehensive search, access and analytic support.

There are alternatives to using a word processing program which may be advantageous. Text-oriented database programs (§4.6.3) usually have quite reasonable facilities for the entry and editing of text, and collect a number of operations 8 to assist in the organisation of material such as fieldnotes, as well as incorporating many text-oriented classifying, searching and collation operations (§4.6).

The most basic, and common, use of a 'computerised notebook' is to locate sections of the notes relevant to some topic of interest, replacing the more traditional indexing and eyes-over-pages methods. There is a range of partial computing solutions to these problems, some simple and some quite sophisticated. All are partial solutions, because no existing computer-based method can refer to the meaning of the notes in any useful way (even two anthropologists can have difficulty in agreeing on this). Computer programs are restricted to referencing literal structural features of the notes (Pfaffenberger 1988:41). If you type in the notes as you might with a typewriter, these structural features are limited to a model implicit in the technology of writing on paper: letters, words, lines, paragraphs, sections, pages and chapters. With such 'raw' texts most contemporary computer software is incapable of identifying structures more complex than paragraphs, or even lines, unless user-defined boundaries are indicated.

The content of a text can only be referenced by most programs in terms of explicit lexical features (ibid). In an ordinary text, most programs are limited to indexing, locating or counting specific words or phrases which appear in the notes. If you impose more discrete structure, then more useful work can be performed by the computer with programs designed to exploit that structure. Additional structure can be as simple as including distinctive classification codes and keywords as you consolidate your notes (§4.5), as is suggested for conventional fieldnotes (Ellen 1984; Sanjek 1990b; Seidel 1991), or as complex as imposing direct links between entries that you judge to be related (§4.6.5) or including formalised statements of content (§8.2; §8.3).
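To make the lexical limitation concrete, the sketch below builds the kind of word index such programs produce: each word is mapped to the lines on which it occurs. It is a minimal illustration in Python, not the method of any particular package; the function name build_word_index is hypothetical, and it assumes the notes are plain text in which a 'word' is simply a run of letters or apostrophes.

```python
import re
from collections import defaultdict

def build_word_index(text):
    """Map each lower-cased word to the (1-based) line numbers where it occurs."""
    index = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z']+", line):
            key = word.lower()
            # Record each line only once per word.
            if line_no not in index[key]:
                index[key].append(line_no)
    return dict(index)

notes = """Went to the market with Ali.
Ali spoke about the wedding.
The wedding is next month."""

index = build_word_index(notes)
print(index["wedding"])  # [2, 3]
print(index["ali"])      # [1, 2]
```

The index knows nothing of notes, topics or meaning; it can only report that a given string of letters appears on a given line.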

Computer operations for the analytic reduction of notes are obviously much more specialised: a list of the frequencies of words in a text does not suit everyone's analytic needs, although for others it is quite useful. Literary scholars have long used word frequencies to gain clues about authorship. Some have carried this even further, considering the frequencies of letters in the document. The problem with most analytic procedures for anthropological analysis of fieldnotes is that, despite considerable noise to the contrary, fieldnotes are not a text in the same sense as a literary text 9, and the specific words and patterns of words are consequently of less interest. Unfortunately, this is all the computer can respond to, unaided by additional references to structure. Pfaffenberger notes, 'In qualitative data, the significant patterns are not principally encoded in any form the computer can detect, namely in instances or absences of lexical items.' (1988:41). Most anthropological uses of analytic methods will depend on you imposing specific structure for this purpose (§4.6).
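A word-frequency list of the kind literary scholars use can be sketched in a few lines of Python. This is a minimal illustration under the same assumption of plain text; word_frequencies is a hypothetical name, and lower-casing every word is a simplifying choice.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each lower-cased word occurs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

notes = "The bride arrived. The groom's family waited. The bride's mother wept."
freq = word_frequencies(notes)
print(freq["the"])      # 3
print(freq["bride"])    # 1
print(freq["bride's"])  # 1
```

The output also shows the limitation discussed above: 'bride' and 'bride's' are counted as different words, and nothing in the counts reflects what the notes are about.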

Fieldnotes serve another important purpose in anthropological analysis: to contextualise other material collected in the field, and the writing we do based on our analysis of the fieldnotes. Other types of information are greatly enhanced if the note references relevant to them can be easily retrieved (§4.4, Chapters 3 and 2). Computers are capable of representing almost all of the conventional forms of records that we make of the field; besides written records (and their structure) these include photographic material, videotape, audiotape and maps (§3.3; Chapter 5). High-performance computer environments have operations which support all of these data types, not only for display, but as objects which can be interactively annotated and interlinked (§4.6.5). For example, with available hardware and software you can write notes, include audio samples or annotations in the text, and include synchronised, full-colour photographs or video sequences in the document (§3.3.1). If you have an inexhaustible supply of money, you can have a staggering array of data acquisition devices attached to your computer, with these inputs integrated into a single document.

4.2     Cautions and Encouragement

Many ethnographers are reluctant to trust their fieldnotes to a computer in the field, fearing 'computer wipeout' (Sanjek 1990c:38), although this view is countered by Trotter (1992:55-8). In some circumstances, especially in the past, this attitude was not altogether unjustified. For example, in my first fieldwork in Pakistan in 1982/83, due to unique circumstances I had an inadequate supply of computer diskettes (of very low capacity) with no way to get more. This, together with having a mains-powered computer in an area that only sometimes had power, and having no printer, led me to decide against entering my fieldnotes into my computer, though I did keep summaries and indexes of the notes on computer files. The notes might have fit, but given other more pressing needs for diskettes, I decided I could not ensure that I could securely store my notes and have guaranteed access to them at all times.

Now, with the advent of very small, reliable computers and printers, this kind of situation should occur less often. Even if the computer fails while in the field, if you have a printer the worst outcome is that you are left with paper copies of your fieldnotes. Computer disk loss or failure is a more likely hazard. If you make daily copies of your notes onto diskettes with the same care as your paper notes, there is little chance of losing more than a few hours' work. In some ways the data is more secure. I generally carry an extra copy of my notes on a single diskette with me at all times in the field. There are no longer only one or two copies to be stolen, destroyed in a fire or lost.

Having said this, it is only fair to mention that everyone I have known has lost work due to carelessness at least once. The loss does seem to have a curative effect. Diskettes should be treated with a bit of care; data loss on diskettes is almost always the result of careless treatment. This is not to say there are no real hazards, especially in many field sites and conditions 10. Diskettes should always be kept in sealed airtight boxes, exposed only when necessary. Disk drive slots should be covered with plastic tape when not in use, to discourage dust and insects. Despite having a habit that computer disks are alleged to dislike, I have lost data on only three diskettes over 15 years of use, never in the field. One was left on the back seat of a car at 40°C, another was left next to a phone which rang (erasing much of it), and the third was probably damaged by an airport x-ray machine (the x-rays are not really a problem; the conveyor belt and motor can be). In the third case I was able to retrieve the data using a disk utility for that purpose (§1.3.1.1). More likely hazards are losing a diskette or having some accident occur to it, such as a fire or a dousing in water. If you ensure that your notes are on more than one diskette, and that the diskettes are not all in one physical location, then you are exceedingly unlikely to lose the computer copy of your notes. The paper copy faces the same hazards as always.

Computers require power, but there are many full-featured computers which can be maintained by charging batteries with solar electricity. More specialised low-power models can operate for up to one year of daily use on perhaps 20 to 30 AA (LR6) alkaline batteries. Solid-state disks record information with security at least as good as paper. Solid-state flash memory, for example, safely stores the contents of up to 20 notebooks (350,000 words) on a chip which weighs about 20 grams and measures 1.5 × 2.3 cm, for ten years with no power. A low-power computer useful for note taking and assistance in analysis of notes (e.g. with a full-sized keyboard, two to four million characters of storage capacity and an adequate LCD (liquid crystal display)) weighed as little as 1-2 kg in 1993, and this can be expected to drop further. Even smaller computers which accept hand printing directly are available 11, weighing as little as 300 grams.

4.3 Computing resources for notes

Although there is a wide range of computer-based operations which can assist you in entering, maintaining, accessing and analysing your notes, some will be more appropriate than others. Which ones depends on your particular approach to notes, what you use them for, and how much structure and information you have added to your notes, since some computer operations require specific conventions to do anything very useful.

You must, of course, make your own judgements on the applicability of any particular operation to your research. This is more than a truism. In both quantitative and qualitative approaches to analysis there has been a long history of abuse by graduate students and senior staff alike. Methods are only valid when they relate to specific analytic models; they are a part of an overall research process. Researchers constantly misuse quantitative methods such as statistics by applying inappropriate methods to inappropriate samples and making inappropriate interpretations of the results. The introduction of computers only compounds this problem by distancing researchers from the analytic processes (Johnson and Johnson 1990:175-6). It became possible to apply a statistical method 'because it is there', rather than for a theoretical purpose.

4.3.1 Defining contents and properties of fieldnotes

Seidel, as the author of a popular program for qualitative content analysis 12, expresses his concern about computer-assisted qualitative analysis:

My concern is that qualitative data analysis might get reduced to this [data reduction], and that qualitative researchers might start working in this manner, not because it is the best or most appropriate way to proceed, but because the technology makes it easy for them to work in this way.    (1991:115)

The greatest trap is to proceed with research and analysis using a single formula, regardless of the situation to which the formula is applied. It is doubly important to evaluate a computer tool for its appropriateness. Moreover, computer programs are generally only a part of the process, and none is a total solution, regardless of the claims in promotional literature. You must know the assumptions built into the program (these are often very simple, but crucial), and match these to the results you wish to achieve.

To make productive use of computers at the core of your research, you must select computer methods that are compatible with that research. Poor results will come from the use of inappropriate techniques. To specify your requirements for computing resources to assist in access and analysis of fieldnotes or other textual materials, you must first identify the properties of fieldnotes and how these properties fit into your goals. In short, you must specify the information in your fieldnotes, the purposes you currently use this information for, the operations you currently undertake to accomplish these purposes, and the benefits you want to achieve by use of a computer.

Although Jackson (1990) describes considerable variation in how anthropologists practice the processes, identified by Clifford (1990), of 'inscribing', 'transcribing' and 'describing' the components of fieldnotes, most anthropologists still seem to aspire more or less to the standards described by Ellen (1984) and Sanjek (1990b). The principal variation seems to focus on the word fieldnote, which Ellen avoids by referring to 'written records' and 'conventional notebooks' (1984:282). For our purposes, the exact placement of textual documents into categories is not strictly necessary. Notes, diaries, transcripts and other texts share many properties and requirements in the context of fieldwork. I shall refer collectively to these as notes or fieldnotes, except where distinctions are important.

4.3.2 Contents of fieldnotes

Ellen lists the following range of information which might appear in general notes:

(a)    Verbatim texts: words and phrases taken from informants. [with additional information to contextualize them]
(b)    Translations of verbatim statements.
(c)    Generalizations about behaviour in particular situations.
(d)    Crude summary translations from informants and actors statements; these are usually highly selective.
(e)    Descriptions of actions (or activities) perceived (for example, rituals). These are affected by our own cultural construction of what we regard to be "events".
(f)    Interpretation. Without it concise and full description is impossible.
(g)    Diagrams and illustrations in support of (a) to (e).
(h)    Quantified statements (e.g. lists of numbers), relating to (e).
            (1984:285)

This information, while varied, has one important structuring principle common to all field materials. Fieldnotes are the result of a process. They are produced over time, and have a chronological structure which is of considerable importance. The process of research has an impact on the content and use of notes. Earlier notes will often be more general in their interests, more closely tied to the original proposal, and developmental. As the research progresses and the body of information increases, notes become focused on a smaller range of observations; more detail is taken for granted, and general observations are noted more briefly, literally ticked off as having occurred. There is often a shift where parallel lines of reasoning converge, where fewer and fewer categories account for more and more cases. The chronology must be preserved, not only because it helps document the development of the particular models used to select information, but also to preserve the transition from mostly etic to mostly emic representation which will transpire in most ethnographic research.

4.3.3 Conceptual operations on fieldnotes

Some basic processes associated with the production of fieldnotes (derived from Ellen 1984 and Sanjek 1990b) are:

1.    Rough notes are taken in the course of fieldwork.
2.    The rough notes are transcribed and expanded into notes in the evening, filling in additional information from memory, tape recordings or other devices.
3.    The notes are classified, and keywords inserted.
4.    The notes are consolidated with other notes and cross-referenced.
5.    The notes are consolidated with other field data such as tapes, photographs, survey data and interview transcripts, with possibly some cross-reference.
6.    The notes may be copied and the copies stored into rough categories.
7.    The notes are indexed according to the classification in 3.

Most of these production processes are intended to facilitate the use of notes during the field research and afterward in 'writing up' (or down). Uses of notes include:

1.    Looking up matters of detail. This is time consuming, depending to some extent on keywords, and a large extent on memory and muscle.
2.    Finding fieldnotes relating to areas of classification. This is important in the formation of hypotheses or, for those who do not have hypotheses, the formation of ideas. These are located in part using the indexes for the notes, and in part using similar processes to those in 1.
3.    Finding fieldnotes related by person, place or episode. Since much of what is observed are processes and many of these processes extend over weeks and months, it is often necessary to assemble these into an account of the overall process or episode. If there are cross-references, these can be activated. These notes are also located in the manner of 1.
4.    Identifying written context for interviews, surveys, genealogical records, photographs, video etc. This depends on the presence of a separate log of such materials, which can locate primary references. Secondary references are located as in 1.
5.    General browsing of notes with no specific objective in mind other than to reflect on what has been written, drawn or otherwise represented.

From these two lists it is clear that, content aside, most of the structural operations in the production of fieldnotes accommodate the operations associated with the use of fieldnotes. That is, the use of fieldnotes dominates the production of fieldnotes, at least in terms of our design. Added structure included in the production of fieldnotes represents an attempt to solve problems of use which emerge because of the complex contents of fieldnotes and shifts in emphasis which take place over the course of fieldwork (§4.3.2).

Even the content of fieldnotes is generally selected to be useful at some future time. Fieldnotes are not so much a conversation with ourselves as a conversion of experience into a tale with an unknown plot and no conclusion. Fieldnotes, and the writers of fieldnotes, do not create meaning, but rather, as might offer, 'defer meaning' until 'the silent tomb speaks' in the context assembled by many units which are 'the same but not identical'.

Most of the work of using fieldnotes is assembling this context. Although browsing through the notes can be a continually edifying experience, to do the work we call ethnography we must develop a set of samenesses within which we can identify differences. Thus the most common operation in the use of fieldnotes is 'Find'.

Finding relevant notes using a computer is not a trivial exercise, and requires no less structure than the manual method of use. The three basic methods which we manually use with fieldnotes are a) searching through the notes for relevant notes, b) looking up notes from an index we have been updating, and c) remembering relevant notes and using a combination of a) and b) to find them. We have the distinct advantage of being able to read the notes and to comprehend some form of overall meaning associated with this reading. At present this is not an operation available for general use on computers, but to a limited extent may become possible in the future.

Computers do possess a number of properties which make it very easy to locate instances of literal text in the notes. This is the basic operation used, in some form, by virtually all present computer programs which support access to and analysis of documents. This suggests that most computer programs of this sort are likely to be used as a support tool, where the computer finds the literal text, and people decide how to exploit this capability, either manually or by using other supporting computer operations incorporated in the program to store or otherwise manage the results.

Relying on the basic 'Find' operation has a number of limitations. The most fundamental is that although there is a correlation between a word and a set of meanings, fieldnotes are not mere collections of words structured by writing, but are heavily contextualised, which makes 'meaning' a difficult proposition. Looking up words with a computer is a statistical exercise, where some of the notes retrieved will be 'hits' and others 'misses'. Still, with the right program a computer can find and record all instances in a matter of seconds, leaving you to work out which are useful.

The number of hits relative to misses can be increased. The two main methods are similar to those anthropologists use with manual notes: classification codes and reference to context. Classification codes are the same as those we use in fieldnotes. A classification attempts to denote a broad dimension of meaning which can be applied to a unit of text in the form of a specific code word, such as 'MARRIAGE' or 'DEATH'. If you do a literal search for 'DEATH' you will pick up all notes which incorporate the code 'DEATH' within the note text. If the find operation in use is case sensitive 13, then capitalising the code terms is adequate to distinguish 'DEATH' from 'death' and 'Death'. However, when looking for literal text, 'death' and 'Death' are then also considered different. Most find operations in programs make case sensitivity optional, because most of the time it is a nuisance. A simple way to ensure you can distinguish an instance of the code from an instance of the word is to prepend or append a rarely used punctuation character to the code term, e.g. '%DEATH' 14.
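The effect of the code convention can be shown with a small Python sketch of a literal find operation with optional case sensitivity. The function find_literal and the sample notes are hypothetical; '%' is used as the rarely used punctuation character, as in the text.

```python
def find_literal(notes, term, case_sensitive=True):
    """Return the indices of notes containing term as a literal substring."""
    if case_sensitive:
        return [i for i, note in enumerate(notes) if term in note]
    term = term.lower()
    return [i for i, note in enumerate(notes) if term in note.lower()]

notes = [
    "%DEATH Funeral of the headman's brother; death announced at dawn.",
    "Talked about Death as a theme in songs.",
    "Sign on the shop read DEATH TO THIEVES.",
    "%MARRIAGE %DEATH Widow's remarriage discussed.",
]

print(find_literal(notes, "death", case_sensitive=False))  # [0, 1, 2, 3]
print(find_literal(notes, "DEATH"))                        # [0, 2, 3]
print(find_literal(notes, "%DEATH"))                       # [0, 3]
```

A case-sensitive search for 'DEATH' also catches the capitalised word on the shop sign, while '%DEATH' finds only the coded notes.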

Classifiers capture only one dimension of meaning, and as research progresses you are likely to need greater specificity. You might, for example, want to examine notes which are classified with respect to both death and marriage. Conceptually, one solution is to find notes which contain the code '%DEATH', and then to search these notes to see if '%MARRIAGE' also appears. The implementation is a bit more complex.

The first problem is that we have to define the boundaries of a note in terms which are easy for most computer programs to use: a literal marker. If you type the notes as a normal text there is no purely lexical means of detecting when one note ends and the next begins. With some of the conventions (and non-conventions) that anthropologists use, such as noting place only where it changes and using half a dozen ways to represent dates, it is nearly impossible to write even a complex program which can always find note boundaries. Without note boundaries there is no way to determine that a given note contains two keywords. We can partly ignore this problem when searching for a single term, because the word is in the note, and if the computer shows you the word embedded in the note this is adequate in many cases 15. The simplest method of indicating notes is to mark note boundaries using a specific lexical convention. This can be something brief, such as '{' (if we are careful not to use this bracket elsewhere in the notes), or more explicit to human eyes, e.g. '{BEGIN' 16.
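Once notes are delimited, the two-code search described above becomes straightforward. The following Python sketch assumes '{BEGIN' marks the start of every note and that codes are prefixed with '%'; split_notes and notes_with_all are hypothetical names, and real programs will differ in detail.

```python
def split_notes(text, marker="{BEGIN"):
    """Split a file of notes into separate notes at each boundary marker."""
    parts = text.split(marker)
    # Text before the first marker is not part of any note; drop empty pieces.
    return [p.strip() for p in parts[1:] if p.strip()]

def notes_with_all(notes, codes):
    """Return the notes that contain every one of the codes (a Boolean AND)."""
    return [n for n in notes if all(code in n for code in codes)]

fieldnotes = """{BEGIN 12/3/93 %DEATH Funeral of the headman's brother.
{BEGIN 13/3/93 %MARRIAGE Negotiations for a bride.
{BEGIN 14/3/93 %MARRIAGE %DEATH The widow may remarry after a year.
"""

notes = split_notes(fieldnotes)
print(len(notes))  # 3
print(notes_with_all(notes, ["%DEATH", "%MARRIAGE"]))
# ['14/3/93 %MARRIAGE %DEATH The widow may remarry after a year.']
```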

A second problem is related. A given meaning can be expressed by a large number of different words, so any search with a single word will miss relevant references in which another word is used. Thus there is always a 'missed' category of notes.

A third problem relates to the morphology of written language: words take different forms, sometimes irregular, in different grammatical contexts. This can further increase the number of missed notes.
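Programs that address these two problems typically expand a search term, either with a list of related words supplied by the researcher or by reducing words to a common stem. The Python sketch below does both in the crudest possible way. The RELATED table is hypothetical, and crude_stem is a deliberately naive suffix-stripper standing in for a proper stemming algorithm (such as Porter's); it will make mistakes, and irregular forms such as 'died' still have to be listed explicitly.

```python
import re

# Hypothetical list of related words, compiled by the researcher.
RELATED = {"death": {"died", "funeral", "burial"}}

# Checked in order, so longer suffixes are tried before shorter ones.
SUFFIXES = ("ings", "ing", "es", "s", "ed")

def crude_stem(word):
    """Strip one common English suffix, keeping at least three letters.
    A naive stand-in for a real stemming algorithm."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def expanded_search(notes, term):
    """Return indices of notes containing the term, a related word,
    or a form of either that reduces to the same crude stem."""
    wanted = {crude_stem(w) for w in {term} | RELATED.get(term, set())}
    hits = []
    for i, note in enumerate(notes):
        stems = {crude_stem(w) for w in re.findall(r"[a-z]+", note.lower())}
        if stems & wanted:
            hits.append(i)
    return hits

notes = [
    "Two funerals this week.",
    "The old man died yesterday.",
    "Harvest began early.",
    "Deaths were reported from the next valley.",
]
print(expanded_search(notes, "death"))  # [0, 1, 3]
```

No note contains the exact word 'death', so a plain word match would find nothing; the expanded search finds three notes. Expansion moves notes out of the 'missed' category, but it cannot guarantee to empty it.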

4.4 Software requirements for fieldnote entry

There are advantages to entering notes with the word processor of your choice, taking account of the recommendations in §1.3.1.1. The reason for this is not so much the virtue often given by designers of programs for note analysis, 'Most researchers already have word processors and would prefer not to learn to use another one.' (Davies 1991:62), but more an issue of flexibility. Good word processors are just not that hard to learn. Many good access and analysis programs have no (or poor) facilities for text entry, especially those written by academics, and these programs might not exist if the programmers had to replicate word processing operations in each program. Many researchers would probably not use these programs if they had to endure the kind of word processor an academic author might write.

There are alternatives to using a word processing program which can be advantageous. Text-oriented database programs (§4.6.3) usually have quite reasonable facilities for the entry and editing of text, and collect a number of operations to assist in the organisation of material such as fieldnotes, as well as providing many of the text-oriented classifying, searching and collation operations in §4.6. Most of these programs have idiosyncratic file formats which are incompatible with most other programs. If you choose this alternative, make sure that the program is capable of writing the text out into more conventional files, so that other programs can be used to access and analyse your notes.

Many support programs expect input text organised into lines or paragraphs, with no special features, such as diagrams, fonts, diacritics, typestyles or special tab settings, so many of the 'advanced' features of a word processor cannot be used by these programs. These features may still be useful, since you can review and print the notes with the word processor, but if you use these features, the word processor or entry program should be able to create a file which consists of just ordinary text with no special characters or formatting.

In no case should you be worse off than if you wrote the notes into notebooks. For practical reasons, some practices, such as writing in the margins, may need substitutes, especially since the text in computer form will be easy to edit. Unlike a notebook or a typed document, a computer-based document is never 'finished', despite the concern of Clifford (1990). If you are insistent about marginal notes, there are word processors which support these. Diagrams should be supported, but high quality diagrams are still not particularly easy to draw without special hardware (§5.5).

4.5 Conventions for note entry and consolidation

Your intention to use computers should not determine or 'control' the content of your fieldnotes in any significant way. However, most anthropologists impose considerable structure on their notes, spending considerable time classifying, coding and creating indexes for them (Ellen 1984:285-8).

There are several ways you can adapt your note entry and consolidation process to a computer. Most of the programs you might use to support your access to and analysis of notes will expect some combination of the following six kinds of structure and/or information to be imposed on the note. These are presented in an order that tends to be roughly cumulative, but there are programs which expect these in just about any combination. You must figure out what you need, and try to find (or design) programs or combinations of programs which fill these needs.

The first kind of structure is more or less a 'conventional' note. You enter what you would have entered had you been following your usual method. This might be similar to Figure 4.1a. The advantage is that entry requirements are simple, and there are a lot of generic program types that support this kind of structure: index-making programs, concordances, keyword in context (KWIC), word frequency and simple text matching search programs, among others. The disadvantage is that the kind of information that can be reported is limited with respect to context. Because there is no computer-discernible structural information to divide the notes into units or the file into notes, the text is addressed in terms of words, lines, paragraphs and pages, which may not be adequate for your purposes (Pfaffenberger 1988:39-43).
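A keyword-in-context listing, one of the generic program types mentioned above, can be sketched in Python as follows. The function kwic is hypothetical; it shows each whole-word occurrence of the keyword, ignoring case, with a fixed number of characters on either side, which is the conventional KWIC layout.

```python
import re

def kwic(text, keyword, width=20):
    """Return one line per whole-word occurrence of keyword (case ignored),
    with `width` characters of context on each side."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad so the keyword lines up in a column, as in a printed concordance.
        lines.append(f"{left.rjust(width)}[{m.group(0)}]{right.ljust(width)}")
    return lines

notes = ("The bride price was paid in goats. "
         "Later the bride left for her husband's village.")
for line in kwic(notes, "bride"):
    print(line)
```

Because the context is measured in characters, not in notes, this is exactly the 'words, lines, paragraphs and pages' view of the text described above.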

This situation can be improved by indicating where the note begins and where it ends (Figure 4.1b). Although we can usually deduce this simple bit of information, most computer programs expect you to state it explicitly, either in a manner determined by the programmer or, more ideally, determined by yourself, designating a character or set of characters as a boundary marker, such as typing '%END' after each fieldnote entry or, as in Figure 4.1b, '@N' for the beginning of a note. The same set of programs is used as for the conventional case. Some search programs (§4.6.3) can take advantage of this structure to allow more complex conditions for the search (two or more terms), and report in terms of more meaningful units. Some programs may require both a beginning and an ending delimiter. Programs which need to know where note boundaries are will have a means of letting you designate the special boundary marker you employed.

Figure 4.1 (a) A conventional note (b) with lexical boundary marker
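The effect of a boundary marker can be sketched in a few lines of Python. This is a minimal illustration, not any particular program: it assumes that every note begins with '@N' at the start of a line, as in Figure 4.1b, splits a file into notes at each marker, and then reports which notes contain all of the search terms, so that the note rather than the line or page becomes the unit of co-occurrence. The function names are hypothetical.

```python
import re

NOTE_MARKER = "@N"  # assumed boundary marker, as in Figure 4.1b


def split_notes(text: str, marker: str = NOTE_MARKER) -> list[str]:
    """Split a fieldnote file into notes wherever a line starts with the marker."""
    pattern = re.compile(rf"^{re.escape(marker)}", re.MULTILINE)
    parts = pattern.split(text)
    # Anything before the first marker is not a note; drop empty fragments.
    return [p.strip() for p in parts[1:] if p.strip()]


def find_all_terms(notes: list[str], *terms: str) -> list[int]:
    """Indexes of the notes containing every term (case-insensitive)."""
    hits = []
    for i, note in enumerate(notes):
        lowered = note.lower()
        if all(t.lower() in lowered for t in terms):
            hits.append(i)
    return hits


sample = """@N 21/2/93 Market.
Talked with Rubina about her sister's marriage.
@N 22/2/93 Village.
Abdul described a divorce, and the marriage that followed.
"""
notes = split_notes(sample)
print(len(notes))                                    # 2
print(find_all_terms(notes, "marriage", "divorce"))  # [1]
```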

Since programs can only index and find lexical terms, you can improve the resolution of your search by explicitly marking the codes and classifiers you have chosen for your note sections. Again, we can usually distinguish between a code and the content of the note, but a program cannot. Programs usually expect you to conform to a convention which lets them identify a code, or allow you to designate what the conventional indicator will be. This can be as simple as placing a character or two on either side of the code, e.g. '@Ethnic@' (Figure 4.2a), or using an introductory keyword, e.g. 'C: Ethnic' (Figure 4.2b). Again, the same set of programs apply, but with greater resolution (you get more of what you want, and less of what you do not want) (cf. Pfaffenberger 1988:39). Applying this level of structure lets you and the program agree on what is a code and what is not, and gives you the ability to identify sections of notes with code classifiers. An extension of identifying codes is to identify the scope of the code: the segment of the note to which the code applies. If you always classify whole notes, then this is equivalent to indicating note boundaries. If you associate classifiers with smaller sections of notes, then you need a device similar to that used for note boundaries, that is, another set of conventions or special characters marking where each coded section begins and ends.

Keywords are more specific and idiosyncratic than codes, and usually designate the content of relatively small sections of a note (Sajek 1990). These are treated in the same manner as codes by most programs, although many programs permit you to selectively search for keywords or codes respectively if you use different lexical conventions to indicate keywords (Figure 4.2a), or the program provides an explicit convention for keywords (Figure 4.2b).
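As a sketch of how a program might keep codes and keywords apart, the Python below recognises both conventions. The specific markers are assumptions for illustration only: codes wrapped in '@' and keywords wrapped in '#' for the lexical style of Figure 4.2a, and lines introduced by 'C:' for codes and 'K:' for keywords for the explicit style of Figure 4.2b. The hypothetical function `notes_with` then searches for a term as a code or as a keyword, but not both.

```python
import re

# Assumed conventions (hypothetical, for illustration only):
#   lexical style (Figure 4.2a): '@Ethnic@' marks a code, '#bridewealth#' a keyword
#   explicit style (Figure 4.2b): lines beginning 'C:' (codes) or 'K:' (keywords)
LEXICAL_CODE = re.compile(r"@(\w[\w -]*)@")
LEXICAL_KEYWORD = re.compile(r"#(\w[\w -]*)#")
EXPLICIT_LINE = re.compile(r"^([CK]):\s*(.+)$", re.MULTILINE)


def classifiers(note: str) -> dict[str, set[str]]:
    """Collect the codes and keywords in one note, keeping the two kinds apart."""
    found: dict[str, set[str]] = {"codes": set(), "keywords": set()}
    found["codes"].update(c.strip() for c in LEXICAL_CODE.findall(note))
    found["keywords"].update(k.strip() for k in LEXICAL_KEYWORD.findall(note))
    for kind, values in EXPLICIT_LINE.findall(note):
        target = found["codes"] if kind == "C" else found["keywords"]
        # Several classifiers may share one line, separated by commas.
        target.update(v.strip() for v in values.split(",") if v.strip())
    return found


def notes_with(notes: list[str], term: str, kind: str = "codes") -> list[int]:
    """Indexes of notes carrying `term` as the given kind of classifier."""
    return [i for i, note in enumerate(notes) if term in classifiers(note)[kind]]


notes = [
    "@Ethnic@ Traders from the coast argued about #bridewealth#.",
    "C: Ethnic, Conflict\nK: bridewealth\nArgument at the well.",
    "Ethnic food at the market.",
]
print(notes_with(notes, "Ethnic"))                   # [0, 1]
print(notes_with(notes, "bridewealth", "keywords"))  # [0, 1]
```

The third note, which merely uses the word 'Ethnic' without marking it, is not returned; this is the gain in resolution described above.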

Some programs make use of explicit fields (Figure 4.3). Fields are structured slots which identify and introduce information of a particular sort, and which programs (and programmers) can refer to and report. To take advantage of this level of structure requires a special kind of program such as a text-oriented database management system (TDBMS, §4.6.3). In a TDBMS fieldnote record you might include a 'Date' field in your notes (e.g. 'Date: 21/2/93'). At a later time you can limit searches to notes that occur on a date, before a date, after a date or between two dates. Although it may be obvious to you that this is a date, without an identifying label it will certainly not be obvious to a computer program. This sort of identifier is crucial to get the best use out of many programs, since it is the only clue to some of the rough semantic content of the note. Some programs have special fields and meanings for things like date, speaker, place and id number, and have a built-in notion that things identified as a date can be interpreted as a number with special meaning. Other programs permit you to designate all the fields, and you must give them a type so that any special interpretation required can be made. Most programs of this sort will include a date type.
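A minimal Python sketch of field-based retrieval follows. It assumes, for illustration, that each note is a set of 'Label: value' lines like the 'Date: 21/2/93' example, and that dates are written day/month/two-digit year; the function names are hypothetical. Once the Date field is interpreted as a date rather than as text, 'between two dates' becomes a simple comparison.

```python
import re
from datetime import date, datetime

# A labelled line such as 'Date: 21/2/93' or 'Place: Market' (assumed layout).
FIELD_LINE = re.compile(r"^([A-Z][\w ]*):[ \t]*(.*)$", re.MULTILINE)


def parse_fields(note: str) -> dict[str, str]:
    """Map each field label in a note to its (text) value."""
    return {label.strip(): value.strip() for label, value in FIELD_LINE.findall(note)}


def note_date(note: str) -> date:
    """Interpret the Date field as day/month/two-digit year."""
    return datetime.strptime(parse_fields(note)["Date"], "%d/%m/%y").date()


def between(notes: list[str], start: date, end: date) -> list[str]:
    """Notes dated on or after `start` and on or before `end`."""
    return [n for n in notes if start <= note_date(n) <= end]


notes = [
    "Date: 21/2/93\nPlace: Market\nNote: Talked with Rubina.",
    "Date: 3/3/93\nPlace: Well\nNote: Argument over water.",
    "Date: 15/4/93\nPlace: Market\nNote: Prices have risen.",
]
march = between(notes, date(1993, 3, 1), date(1993, 3, 31))
print([parse_fields(n)["Place"] for n in march])  # ['Well']
```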

Another kind of structure is found only in experimental programs at the moment, such as hypertext and knowledge-based representations (Benfer et al. 1991; Fischer and Finkelstein 1991). This is best characterised as explicitly encoding some aspects of the structure of the content of the notes, if not in full then at least to a significant degree. There are two main methods of doing this with material based on texts.

Hypertext (§4.6.5) is currently the more practical of the two, in the sense that good commercial software is available, more will become available, and a large amount of experimental academic software exists. Hypertext more or less allows you to record your understanding of how the different sections of the text fit together, so that this structure can be explicitly reproduced, instead of statistically reproduced as in the conventional index and search operations.

Knowledge-based representations involve some way of translating the natural-language text into a more formal structure which can be interpreted to the point of letting you address issues of content, entailment and association. This is intended to provide an unambiguous statement by the ethnographer of what they are attempting to say. This approach is discussed in §8.2.


Figure 4.2 (a) Note with lexically coded classifiers and keywords (b) Note with explicitly coded classifiers and keywords


Figure 4.3 Structured note with fields

4.6 Tools for access to notes

4.6.1    Wordprocessors

If you use a wordprocessor to enter your notes you will soon have a rather large collection of document files. In a year's fieldwork an ethnographer can easily write 1,500-3,000 pages of fieldnotes and diary entries. Most wordprocessors do not work well with thousands of pages in a single document file. With a 100 page limit per document file, 2,000 pages of fieldnotes will occupy between twenty and fifty different primary files (not all will be 'full'). If you put copies of notes into topic oriented files, another 60-150 files may be generated. Also there should be at least two independent backup copies on different disks in different locations at any one time. If you intend to use additional computer tools to access and retrieve notes, there will also be additional text-only (ASCII) files.

Obviously, you must plan how to organise and maintain these files. Naming files is important. Do not name files 'new notes' or 'latest notes'. Develop a system of naming primary files and copies of primary files which is clear and consistent. Use of dedicated note handling programs (Section 4.6.4), or hypertext editors (Section 4.6.5) will eliminate some of these organisational problems.
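One possible naming scheme, sketched in Python purely as an illustration (the pattern kind-site-year-month-volume is an assumption, not a standard), builds every name from the same parts so that names are consistent and sort into chronological order:

```python
from datetime import date


def note_filename(kind: str, site: str, month: date, volume: int) -> str:
    """Build a name such as 'notes-kaw-1993-02-v03.txt' (hypothetical scheme).

    kind:   'notes', 'diary', or a topic copy such as 'topic-kinship'
    site:   a short code for the field site ('kaw' below is invented)
    month:  any date in the month the file covers
    volume: sequence number within that month, for when a file fills up
    """
    return f"{kind}-{site}-{month:%Y-%m}-v{volume:02d}.txt"


names = [
    note_filename("notes", "kaw", date(1993, 3, 1), 1),
    note_filename("notes", "kaw", date(1993, 2, 1), 2),
    note_filename("notes", "kaw", date(1993, 2, 1), 1),
]
print(sorted(names))
# ['notes-kaw-1993-02-v01.txt', 'notes-kaw-1993-02-v02.txt', 'notes-kaw-1993-03-v01.txt']
```

Because the year precedes the month and volume numbers are zero-padded, an ordinary alphabetical directory listing is also a chronological one.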

4.6.2    Basic search programs

Basic search programs are a class of programs which have existed for at least forty years. They are not specifically intended for access to notes, but more generally for locating literal text in files. They accept search terms (queries), including some range of Boolean queries (§4.1), and search through a designated set of files. Most programs support either a large number of files or very large files, often only limited in size by the general computer system.

To find all instances where the term 'marriage' is contextualised by the term 'divorce', you would start the program and type something like the following in response to the prompt:

FIND marriage AND divorce

Which might be read, 'find all units which contain both the terms marriage and divorce'. What will you get from this query? This depends on what the unit of co-occurrence is.
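The dependence on the unit can be seen in a small Python sketch (the sample text and function name are invented for illustration). The same query finds nothing when the unit is the line, because 'marriage' and 'divorce' fall on different lines, but finds a match when the unit is the paragraph:

```python
def find_and(units: list[str], *terms: str) -> list[str]:
    """Units (lines, paragraphs, notes...) that contain every term."""
    return [u for u in units if all(t in u.lower() for t in terms)]


text = (
    "Abdul spoke about his marriage.\n"
    "Later he mentioned the divorce.\n"
    "\n"
    "Rubina described the market.\n"
)
lines = [line for line in text.splitlines() if line.strip()]
paragraphs = [p for p in text.split("\n\n") if p.strip()]

print(len(find_and(lines, "marriage", "divorce")))       # 0
print(len(find_and(paragraphs, "marriage", "divorce")))  # 1
```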

There is considerable variation among programs in this respect. Like a word processor, many use either lines or paragraphs as the unit, which does not really correspond to any usable unit for most research purposes (Pfaffenberger 1988:41-43). Worse, unlike a wordprocessor, the only context which is commonly reported is the unit itself. More advanced programs have facilities to let you designate, to some extent, both the searching unit and the reporting unit. There are two approaches to this. The more common approach is to let you designate for searching purposes how close the words in the search term must be to each other in words or lines. You might specify that divorce must be within twenty words of marriage. The reporting unit in this case might be designated as ten lines prior to and following the search terms. This adds some flexibility, but is still not really adequate.
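The two settings described above, a proximity condition counted in words and a reporting window counted in lines, might be sketched in Python as follows. The defaults (twenty words, ten lines) follow the example in the text; the function names are hypothetical.

```python
import re


def within_words(text: str, a: str, b: str, distance: int = 20) -> bool:
    """True if some occurrence of word `a` lies within `distance` words of `b`."""
    words = re.findall(r"[a-z']+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


def context(text: str, term: str, before: int = 10, after: int = 10) -> list[str]:
    """For each line containing `term`, report it with the surrounding lines."""
    lines = text.splitlines()
    blocks = []
    for i, line in enumerate(lines):
        if term in line.lower():
            lo, hi = max(0, i - before), min(len(lines), i + after + 1)
            blocks.append("\n".join(lines[lo:hi]))
    return blocks


sentence = "His marriage ended in divorce last year."
print(within_words(sentence, "marriage", "divorce"))     # True
print(within_words(sentence, "marriage", "divorce", 2))  # False
```

The reporting window is still measured in lines, which is the inadequacy the text points to: it bears no relation to where a note begins or ends.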

Another, more useful, approach is to let you designate specific text which will delimit a search and reporting unit. Unless you want to use paragraphs, this usually means you must have inserted these delimiters in the files while entering them (§4.5), or used a specialised editing program which inserts the delimiters for you.
Although this requires you to impose structure literally on the original note text, it has the benefit that you can retrieve meaningful units (assuming notes are meaningful units). This approach still lacks resolution for many problems of accessing notes. Although it is far better to retrieve notes as a unit rather than some arbitrary unit defined in terms of the structure of text, notes always contain an array of different sorts of information. If we want to limit searching or reporting to one of these sub-units of a note, for instance to a segment which consists of interview material, most search programs cannot accommodate this because, although a delimiter may be designated, very few allow both a beginning and an ending delimiter, which would be necessary in this case. That is, when writing notes each type of entry in the note would require its own beginning and ending delimiter.
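Paired delimiters make it possible to search only inside one kind of sub-unit. The Python sketch below assumes hypothetical markers '%INT' and '%/INT' around interview material within a note; a term that appears elsewhere in the note is ignored. The function names are likewise hypothetical.

```python
import re

# Hypothetical paired delimiters around interview material inside a note.
BEGIN, END = "%INT", "%/INT"
SEGMENT = re.compile(re.escape(BEGIN) + r"(.*?)" + re.escape(END), re.DOTALL)


def interview_segments(note: str) -> list[str]:
    """Text between each beginning delimiter and the next ending delimiter."""
    return [s.strip() for s in SEGMENT.findall(note)]


def search_interviews(notes: list[str], *terms: str) -> list[tuple[int, str]]:
    """(note index, segment) for interview segments containing every term."""
    hits = []
    for n, note in enumerate(notes):
        for seg in interview_segments(note):
            if all(t.lower() in seg.lower() for t in terms):
                hits.append((n, seg))
    return hits


notes = [
    "Watched the wedding; gossip about a divorce.\n"
    "%INT Rubina: my marriage was arranged by my uncle. %/INT\n"
    "Tea afterwards.",
]
print(search_interviews(notes, "divorce"))  # []
print(search_interviews(notes, "marriage"))
```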

Another problem with most search programs is that they are both over- and under-inclusive. They find search terms, usually with little reference to context, and consequently locate entries which are of little interest (Pfaffenberger 1988:39). They look explicitly for the search terms, missing entries which are relevant but do not correspond to the specific terms (ibid:40-41). This problem can be limited somewhat by adding explicit classificatory terms, usually with a special keyword identifier consisting of one or two unusual characters, e.g. '%%Kinship' (§4.5).

Classifiers still suffer from the problems of internal note organisation. A single entry might have a number of classifiers which correspond to different sections of the note: some may share the same scope, some may have a scope enclosed within that of other keywords, and some may overlap the scope of other keywords. Ideally a keyword could be identified with a researcher-specified section of the notes, regardless of its inclusion within another marked section, or indeed regardless of whether it overlaps another existing segment. Most specialised note access programs satisfy this requirement (§4.6.4 and §4.6.5).
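One way to meet this requirement is to record each coded segment separately from the text, as a code with a start and end position. The Python sketch below assumes character offsets (an illustrative choice; a real program might count lines or other units) and hypothetical names, and shows that segments recorded this way can nest or overlap freely:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    code: str
    start: int  # character offset where the coded section begins (inclusive)
    end: int    # character offset where it ends (exclusive)


def text_for(note: str, segments: list[Segment], code: str) -> list[str]:
    """The sections of a note carrying a given code."""
    return [note[s.start:s.end] for s in segments if s.code == code]


def overlaps(segments: list[Segment], code_a: str, code_b: str) -> list[tuple[Segment, Segment]]:
    """Pairs of segments with the two codes whose scopes overlap or nest."""
    return [
        (a, b)
        for a in segments if a.code == code_a
        for b in segments if b.code == code_b
        if a.start < b.end and b.start < a.end
    ]


note = "Abdul argued with the trader about the price of rice."
segments = [
    Segment("Conflict", 0, note.index(" about")),
    Segment("Trade", note.index("the trader"), len(note)),
    Segment("Ethnic", note.index("the trader"), note.index(" about")),  # nested
]
print(text_for(note, segments, "Conflict"))          # ['Abdul argued with the trader']
print(len(overlaps(segments, "Conflict", "Trade")))  # 1
```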

Some search programs search pre-computed indexes of a set of files, rather than the files themselves. Such programs are not necessarily the same as index-building programs, though some perform this operation as well. This variety of search program is not usually difficult to identify, because the distributors normally make quite a noise about its indexing capabilities. They are intended for people who produce lots of files and have trouble managing them. Most accept search terms, including Boolean queries (§4.1), and search through a predetermined set of files (from the index). All support a large number of files, usually between fifty and two hundred and fifty, and some can work with thousands. Some of these programs can be configured to automatically index any new files you may create in specific directories (which can include the entire disk).

In general, these programs find the text very quickly, often with apparently immediate response. The speed is derived from a previously constructed index of all the words in a user-specified list of files. You can usually specify a list of words not to include in the index, and common words (e.g. a, an, the) are excluded automatically. This index is not usually intended for human use, and serves only to support the search function of the program. The results of a search can usually be displayed on the screen, or you can specify a file to put results into. Because the indexes are based on words, it is not easy to search for literal fragments of text which include more than one word, although this can be simulated using a Boolean search expression and a minimum search unit. Most have a provision for 'fuzzy' matches, where you can match a pattern against words, e.g. 'marri*' for all words starting with 'marri', or '*marri*' for all words which include 'marri', such as married, marriage, remarried and unmarried.
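The essentials of an index-based search program can be sketched in Python: an index from each word to the files containing it, a stop list (here just 'a', 'an' and 'the', following the example above), and '*' wildcards matched against the indexed words rather than against the files. The file names and texts are invented for illustration; the standard-library `fnmatch` module supplies the wildcard matching.

```python
import fnmatch
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "the"}


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map every non-stop word to the names of the files containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in files.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(name)
    return index


def lookup(index: dict[str, set[str]], pattern: str) -> set[str]:
    """Files containing any indexed word that matches a '*' wildcard pattern."""
    pattern = pattern.lower()
    result: set[str] = set()
    for word in index:
        if fnmatch.fnmatchcase(word, pattern):
            result |= index[word]
    return result


files = {
    "feb.txt": "The marriage was arranged.",
    "mar.txt": "They remarried in the spring.",
    "apr.txt": "An unmarried man spoke.",
}
index = build_index(files)
print(sorted(lookup(index, "marri*")))   # ['feb.txt']
print(sorted(lookup(index, "*marri*")))  # ['apr.txt', 'feb.txt', 'mar.txt']
```

A phrase such as 'bride price' cannot be looked up directly in an index like this; as the text notes, it has to be approximated by requiring both words within the same unit.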

Besides the problems for search programs in general, index-based search programs have another serious drawback. Indexing a large amount of data in a large number of files takes a lot of time. Some programs may take several hours, or even overnight, to index a body of data as large as a year's fieldnotes. The program cannot be used until the index is complete, since the index contains information necessary to find information in the files. After the fieldwork period this may be acceptable, since the notes should not be modified much. In the field the problem is more serious. Some programs can accommodate changes to one file, and are able to re-index it individually. Other programs must re-index everything each time a change is made to any file in the index.
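A program that re-indexes one file at a time needs to remember which words came from which file, so that a changed file's old entries can be withdrawn before the new version is indexed. A minimal Python sketch of that bookkeeping follows; the class and method names are hypothetical.

```python
import re
from collections import defaultdict


class IncrementalIndex:
    """Word -> file names, where one changed file can be re-indexed alone."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.words_in: dict[str, set[str]] = {}  # file name -> its indexed words

    def index_file(self, name: str, text: str) -> None:
        self.remove_file(name)  # withdraw any entries from the old version
        words = set(re.findall(r"[a-z']+", text.lower()))
        self.words_in[name] = words
        for word in words:
            self.postings[word].add(name)

    def remove_file(self, name: str) -> None:
        for word in self.words_in.pop(name, set()):
            self.postings[word].discard(name)
            if not self.postings[word]:
                del self.postings[word]

    def find(self, word: str) -> set[str]:
        return set(self.postings.get(word.lower(), set()))


index = IncrementalIndex()
index.index_file("feb.txt", "A marriage dispute.")
index.index_file("mar.txt", "The harvest began.")
index.index_file("feb.txt", "A harvest festival.")  # feb.txt edited: only it is re-indexed
print(sorted(index.find("harvest")))  # ['feb.txt', 'mar.txt']
print(index.find("marriage"))         # set()
```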

The other main approach for general search programs is to scan through a designated set of files without the benefit of an index. Their properties, advantages and faults are very similar to those of index-based search programs. The main advantages of a scanning search program are that the text to be located can often be specified more flexibly, not limited to individual words, and that, because an index is not used, no time is spent building an index after changes have been made to the source documents. The main disadvantage is that scanning search programs are slower in retrieving text from the source documents. The speed of retrieval can range from very slow, perhaps a minute or two to scan a megabyte (2^20, or about one million characters) of text, to as little as eight seconds per megabyte. The average time for existing programs is about thirty seconds to one minute per megabyte.
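By contrast, a scanning search needs no index and can look for any literal text. The Python sketch below (function names hypothetical) turns a multi-word phrase into a regular expression that tolerates any whitespace, including line breaks, between the words, and reads each file afresh on every search:

```python
import re
from pathlib import Path


def scan_texts(named_texts, phrase):
    """Scan (name, text) pairs for a phrase; return (name, line number, match)."""
    words = phrase.split()
    pattern = re.compile(r"\s+".join(re.escape(w) for w in words), re.IGNORECASE)
    hits = []
    for name, text in named_texts:
        for m in pattern.finditer(text):
            line_no = text.count("\n", 0, m.start()) + 1
            hits.append((name, line_no, m.group(0)))
    return hits


def scan_files(paths, phrase):
    """Read every file on every search; nothing is pre-computed."""
    texts = ((str(p), Path(p).read_text(encoding="utf-8", errors="replace")) for p in paths)
    return scan_texts(texts, phrase)


notes = [("feb.txt", "We discussed the bride\nprice for Abdul's daughter.")]
print(scan_texts(notes, "bride price"))  # [('feb.txt', 1, 'bride\nprice')]
```

Because every search re-reads every file, search time grows with the volume of notes: at the average rates quoted above, a season's notes of about four million characters would take very roughly two to four minutes per query.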

4.6.3 Database management systems and fieldnotes

Database management systems (DBMS) (§2.3) are a class of programs for managing information which is relatively highly structured, compared with word processors or search programs (§4.6.2). Most conventional DBMS programs (§2.3) are too structured for use with textual materials such as notes, diaries and interview texts. However, the models underlying these programs have been applied to a form specialised for working with text.

A program of this sort is usually called a textual DBMS, a free-text DBMS or a text-oriented DBMS. The basic concept behind these programs is to combine the structure associated with a database management program with the flexibility of a search program. Unlike using a word processor, a plethora of files and a search program, many text-oriented DBMS offer a 'total solution', which basically means they include a word processor-like text entry program, operations to design entry forms, facilities for managing whatever files are created on disk, support for locating, sorting and grouping information, and facilities for producing reports.

To begin you must design an entry form. With most recent DBMS this is a very simple operation. First you define some labels (sometimes called prompts) for the different categories of information, called fields, which will appear in a record (a complete entry such as a fieldnote). An entry form for fieldnotes might look something like Figure 4.4. You will probably have to specify the sort of information each field represents, e.g. Place is literal text, Age may be a number if appropriate, or literal text where other categories might be used, Date is a date (usually a specific type of data for such programs), and Time is a time (also usually a type of data). Note, Codes, and Keywords will be designated as variable text or long text, a special type which designates relatively long textual content. The usual maximum of most DBMS for long text fields is about 30,000 characters (about 12-15 pages of notes), but some are limited only by the amount of storage on the disk. Contrast this with conventional DBMS, which typically allow between 64 and 256 characters as a maximum.

Having defined the form, you can begin entering information. The DBMS will put the form on the screen, and you fill in the blanks with the appropriate information.

Table 4.1. Possible entry form for fieldnotes. (adapted from Ellen 1984:286)

Note ID:    Place:
Date:    Time:
Informant Name:    Sex:    Age:    Status:
Codes:
Keywords:
Note:

Features to look out for in text-oriented DBMS are:

1.    You should be able to enter information in any field in any order, and edit or re-enter a field if you wish. Moving between fields should be easy, using either arrow keys or a pointing device such as a mouse; you should not have to go through a complex 'conversation' or set of menus every time you want to change something.
2.    Support for text entry should be as good as a simple word processor.
3.    You should be able to examine other completed records while entering a record (not common).
4.    You should be able to change the format of a record. Otherwise you can find yourself with a lot of wasted effort. Make sure you are satisfied on this aspect before going to the field.
5.    You should be able to import data from other sources relatively easily.
6.    You should be able to export data for use by other programs.
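The entry form of Table 4.1 amounts to a record type with typed fields. As a sketch only, it might be expressed in Python as a dataclass, with export to and import from comma-separated values (CSV) standing in for features 5 and 6 above. CSV is an assumed exchange format here, and the field names simply follow the table.

```python
import csv
import datetime
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Fieldnote:
    note_id: str
    place: str
    date: datetime.date  # a date type, so date comparisons are possible
    time: str            # kept as text, e.g. '14:30' (an assumption)
    informant_name: str
    sex: str
    age: str             # text, so that categories such as 'elder' remain possible
    status: str
    codes: str
    keywords: str
    note: str            # the long-text field


def export_csv(records: list[Fieldnote]) -> str:
    """Write records as CSV text for use by other programs."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Fieldnote)])
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["date"] = record.date.isoformat()
        writer.writerow(row)
    return buffer.getvalue()


def import_csv(text: str) -> list[Fieldnote]:
    """Read records back from CSV text, restoring the date type."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        row["date"] = datetime.date.fromisoformat(row["date"])
        records.append(Fieldnote(**row))
    return records


record = Fieldnote("N001", "Market", datetime.date(1993, 2, 21), "14:30",
                   "Rubina", "F", "34", "trader", "Trade, Ethnic",
                   "bridewealth", "Talked about prices, and her sister's marriage.")
print(import_csv(export_csv([record])) == [record])  # True
```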

Like the search programs (§4.6.2), text-oriented DBMS support searches using literal text, using a Boolean search expression if necessary. In addition, searches can be made by referencing the different fields in the record format. For example, the following query:

SELECT fieldnote.*
WHERE fieldnote.sex = 'male'
AND fieldnote.age < 27
AND fieldnote.note CONTAINS 'marriage'
AND fieldnote.note CONTAINS 'divorce'

which can be read,

report all the fields of record type fieldnote when field sex of fieldnote is 'male' and field age of fieldnote is less than 27 and field note of fieldnote contains 'marriage' and field note of fieldnote contains 'divorce'.

We have to specify the note field twice since the expression parsers of most programs are not smart enough to read an expression like 'fieldnote.note CONTAINS marriage AND divorce' in the way that we might intend. If you wanted, you could report the note number only, or any combination of fields, by using a different version of the 'Select' statement. Also, some TDBMS are relational (§2.3.3), allowing queries and reports to refer to several relations or files.
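For comparison, the same query can be put to a conventional relational engine. The Python sketch below uses the standard-library `sqlite3` module with an in-memory table whose columns are a cut-down, assumed version of the entry form; SQLite's `LIKE` with '%' wildcards stands in for CONTAINS, each term gets its own condition as in the query above, and only note numbers are reported. The function name `find_notes` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fieldnote (note_id TEXT PRIMARY KEY, sex TEXT, age INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO fieldnote VALUES (?, ?, ?, ?)",
    [
        ("N001", "male", 24, "Spoke of his marriage and a later divorce."),
        ("N002", "male", 31, "Marriage and divorce among the traders."),
        ("N003", "female", 22, "Her marriage was arranged."),
    ],
)


def find_notes(conn, sex, younger_than, *terms):
    """Report note numbers only, with one CONTAINS-style condition per term."""
    sql = "SELECT note_id FROM fieldnote WHERE sex = ? AND age < ?"
    sql += " AND note LIKE '%' || ? || '%'" * len(terms)
    sql += " ORDER BY note_id"
    rows = conn.execute(sql, (sex, younger_than, *terms))
    return [note_id for (note_id,) in rows]


print(find_notes(conn, "male", 27, "marriage", "divorce"))  # ['N001']
```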

In summary, text-oriented DBMS offer an integrated set of operations which are useful for many of the tasks associated with entry, access to and analysis of fieldnote material, and other textual sources. The basic search and report unit is always a record, which should consist of a single note or related text in conjunction with basic structured information such as time, place, consultant information and classificatory information about the note. Retrieval can be based on structured information such as consultant name, age or sex, unstructured information (in a computing sense), such as note fields, or a combination of the two. Retrieval can be either a single record, a group of records or a sorted group of records.

There are disadvantages, of course. Many of these programs work with a single file, which can become very large. A season's fieldnotes can easily reach over four million characters of storage. Since many of these programs make little use of indexes, they can be relatively slow with large amounts of notes.

Also, databases of this size are much too large for current floppy disks (although this limit may increase soon). This has two consequences. Firstly, these files are difficult to copy onto floppy disks (or small solid-state disks) for security; you will require a special program to do so. Secondly, you will almost certainly require either a hard disk drive or a large-capacity solid-state disk, as it is useful to have access to all of your notes at one time. Hard disks are less suitable for some field circumstances because of their greater power needs, and large-capacity solid-state disk drives are expensive.
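The 'special program' needed is simple in outline: cut the file into pieces no larger than a disk, and check the reassembled copy against the original. A Python sketch follows, assuming 3.5-inch '1.44 MB' disks (1,474,560 bytes) and using a SHA-256 checksum as the check; the function names are hypothetical.

```python
import hashlib
from pathlib import Path

FLOPPY_BYTES = 1_474_560  # a 3.5-inch '1.44 MB' disk holds 1440 x 1024 bytes


def split_bytes(data: bytes, size: int = FLOPPY_BYTES) -> list[bytes]:
    """Cut data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def join_checked(pieces: list[bytes], expected_sha256: str) -> bytes:
    """Reassemble pieces, refusing a copy that differs from the original."""
    data = b"".join(pieces)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("reassembled copy does not match the original")
    return data


def split_file(path: str, out_dir: str) -> str:
    """Write numbered pieces (name.000, name.001, ...) and return the checksum."""
    data = Path(path).read_bytes()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for n, piece in enumerate(split_bytes(data)):
        (out / f"{Path(path).name}.{n:03d}").write_bytes(piece)
    return hashlib.sha256(data).hexdigest()
```

On these assumptions, a database of about four million characters would need three disks.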

4.6.4 Programs specifically written to support access and analysis of fieldnotes

Programs explicitly designed to support access to and analysis of fieldnotes and other ethnographic textual sources are perhaps the only generic application of disciplinary relevance to social anthropology to have emerged. These programs are basically text-oriented DBMS with additional operations for ethnographic research, although ethnographic research is fairly narrowly defined in this context. Although anthropologists have been involved in the process (Sailer 1985; Werner 1982; Agar 1983, 1986; Pfaffenberger 1988), most programs which focus specifically on the analysis of ethnographic texts have been developed to support some form of 'strip' analysis (Agar 1986) or sociologically inspired models such as 'grounded theory' (Glaser and Strauss 1967; Richards and Richards 1991). When people refer to using computers for qualitative analysis, it is usually this species of program they mean. Although these programs can be useful for anthropologists, the particular formulation of ethnographic analysis they promote represents only a portion of what most anthropologists consider to be ethnographic analysis based on fieldnotes.

Pfaffenberger identifies three general processes to be addressed by computer methods for qualitative research: rewriting, coding and comparison (1988:26). Ellen's conventional approach is more field-oriented: consolidation, classifying, coding and indexing (1984:282-288), with a focus more on retrieval or access than on comparison or analysis. In part these differences arise from a difference of naming and heading, but there is a more fundamental difference.

Ethnographic analysis based on grounded theory focuses on the discovery of codes and the creation of typologies of codes to describe and relate ethnographic segments or 'strips' (Agar 1986; Pfaffenberger 1988:27), from which emerges a framework of theory (Pfaffenberger 1988:28).

This process has some resonance with, and use for, social anthropology; many of its proponents are anthropologists. However, the process is not adequate to define completely the kinds of access to fieldnotes required by most anthropologists in the field or afterwards. Anthropologists will use the capability to locate ethnographic instances coded by labels such as 'ethnic interaction', 'conflict' and 'conflicting perspectives' (Pfaffenberger 1988:34). But anthropologists also require the ability to locate, to be vulgar, facts. By this I mean we must find the answers to specific questions relating to specific events and people, questions such as 'Just who is Abdul's third wife?', 'Who was that guy with Rubina?', 'Who else talked about theft?' or 'How many people live in that household?'.

4.6.5     Hypertext

These are points to consider when entering notes, but is this the best way to adapt to a new medium? Probably not, but it is probably the best way for an anthropologist to begin. As we shall see in other sections, the medium of a computer opens up a huge range of new ways to consider what a document is. Although we may not be aware of it, the use of pen and paper, typewriter and paper, or even a word processor imposes a hegemony of past technology on the manner in which we organise information. If you type, the transition to a word processor is a fairly friendly one, because word processors are intended to model typing on pieces of paper. Information is entered and displayed in a linear fashion. However, word processors are for the most part an imperfect imitation. Much more than with paper, the view of the document is linear: top to bottom, page 2 to page 3. It is difficult to find operations that correspond to flipping pages. Most computers have screens that show less than half a page. It is difficult to remove a page for later examination. Some word processors come close to this level of access, but most do not. Although it may be possible to model a typewriter and paper very accurately, this is not necessarily the best manner in which to proceed with a computer. You may start with this method, but it is not the best place to end.

The preceding programs have basically been a medium for replicating a part of what we already do with textual field material: enter and consolidate; classify and code; index; group and compare note entries based on a code or conjunction of codes; scan through for instances of specific detail. These are all ultimately based on the model of pen and paper, and the various devices we have developed to aid us in accessing such technology.

Hypertext systems attempt to break away from this particular model by redefining the characteristics of a text. Instead of structure superimposed exclusively by different units of information sharing the coincidence of literal bits of text, hypertext is based on a conceptual model where units of information are explicitly linked to other units of information. Where the conventional model is more or less a passive statistical approach to search and access (some of the items will fit your needs, others will not), depending on regularity of form, hypertext extends this with active mechanical links for access, and can deal with quite idiosyncratic forms. Hypertext attempts to provide a medium where information can be stored embedded within a structure which is intrinsically non-linear: a document which consists of units of information which can be accessed in many explicit orders, depending on the particular 'thread' the reader is interested in. Units of information are called nodes, and are defined very broadly. They can include text, tables, drawings, photographs, video and sound, programs, as well as complexes of different node types (§2.3.4; §3.3).

On the surface this sounds ideal for material such as fieldnotes. One of the basic objects of fieldnotes as a means of storing information is that when we come to analyse this material each piece of information will fit into many different structures; we do not want to tie the information to any specific form or model. As we view and review our material we begin to build different frameworks which order the material in the notes. Hypertext provides a means to record structural knowledge so that not only can we record these different views, but we can record the derivation of more abstract views (or analysis) as well.

Little about hypertext is 'automatic'. Hypertexts are 'authored', and the authoring process must be done by someone who is familiar with the material included in the hypertext. In the context of fieldnotes, authoring will consist of typing an entry (called a node) using a text editor provided by the hypertext authoring environment. This node will probably be placed in the last position of a chronological thread, so that there is a thread which includes the notes in chronological order. Indexing and coding can be done by linking the node to a node which heads the thread for each index and code term. If a new thread for classification is required, you can begin one. If you need to merge existing threads into a super-category, then a new head node can be created to contain these, while retaining the individual threads. Individual words and phrases can also be linked to nodes. See Figure 4.4 for an example of a hypertext-based fieldnote editor.
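The authoring steps just described can be sketched as a small data structure in Python. This is an illustration under simple assumptions (nodes hold plain text, links are one-way, and the class and method names are hypothetical), not the design of any particular hypertext system: each new note is appended to the chronological thread and linked from the head node of each of its codes, and merging creates a new head that links to existing heads without disturbing them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, since links may form cycles
class Node:
    title: str
    text: str = ""
    links: list["Node"] = field(default_factory=list)


class FieldnoteHypertext:
    def __init__(self) -> None:
        self.chronology = Node("Chronological thread")
        self.heads: dict[str, Node] = {}

    def head(self, term: str) -> Node:
        """The head node of the thread for an index or code term."""
        if term not in self.heads:
            self.heads[term] = Node(term)
        return self.heads[term]

    def add_note(self, title: str, text: str, codes: tuple[str, ...] = ()) -> Node:
        node = Node(title, text)
        self.chronology.links.append(node)  # last position in the chronological thread
        for code in codes:
            self.head(code).links.append(node)
        return node

    def merge(self, super_term: str, *terms: str) -> Node:
        """A super-category head linking to existing heads, which are kept."""
        parent = self.head(super_term)
        parent.links.extend(self.head(t) for t in terms)
        return parent


def notes_under(node: Node, seen: Optional[set[int]] = None) -> list[str]:
    """Titles of note nodes reachable from `node`, following links depth-first."""
    seen = set() if seen is None else seen
    titles = []
    for child in node.links:
        if id(child) in seen:
            continue
        seen.add(id(child))
        if child.text:  # head nodes carry no text of their own in this sketch
            titles.append(child.title)
        titles.extend(notes_under(child, seen))
    return titles


ht = FieldnoteHypertext()
ht.add_note("21/2/93 Market", "Prices have risen.", codes=("Trade",))
ht.add_note("22/2/93 Well", "Argument over water.", codes=("Conflict",))
ht.merge("Village economy", "Trade", "Conflict")
print(notes_under(ht.head("Village economy")))  # ['21/2/93 Market', '22/2/93 Well']
```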

[Figure 4.5 about here. Fieldnote scan program]
