by Steve Mizrach

The Computer: an Anthropologist's Friend?

Most anthropologists do not regard the computer as a research tool. Apart from using it to word-process documents (such as journal articles) or to send electronic mail to colleagues, anthropologists generally do not treat the computer as a fundamental instrument in their research unless they are doing quantitative work. This is because most of them think of the computer as a number-crunching device (which is the literal meaning of "computation"). It is used, after the researcher comes 'home,' to statistically analyze the measurements garnered from surveys and observations in the field. What I intend to argue in this paper is that, because of its versatility, the computer can be used just as effectively for qualitative research.

The increasing recognition of qualitative research as a systematic activity, and the development of software specifically designed for qualitative researchers (such as the Ethnograph), have drawn more academic attention to computers as an adjunct to qualitative methods. The kinds of qualitative data that researchers are interested in (music, video, text, speech, and other kinds of symbolic communication) have increasingly become digitizable. Further, as computers have become more lightweight, portable, and rugged (laptops and PDAs), and as their communications capabilities have increased, they have become useful for more than crunching data after researchers come back from the field. They are changing the very way qualitative research is done.

Indeed, the computer may help resolve some of the key dilemmas in qualitative methodology, and it can be used in all phases of a research design, from data discovery to presentation. While some qualitative researchers have begun discussing its use in the analysis phase, the other phases have been neglected, and I discuss them here. What makes the computer a unique tool for the researcher is its reprogrammability: the only limits are in the hardware, and even that can be improved. Other functions of the computer are still being developed, and this paper will also explore ways in which the computer may, in the future, become less of a clerk or accountant and more of an active research assistant. Someday, some anthropologists may even come to think of it as a friend.

Data Discovery: Online Databases, Online Ethnography

While many researchers have come around to using the computer as a tool for qualitative data analysis and management, fewer are using it as a source of data. This is largely because, until recently, there were no adequate means for locating and transporting data. Before the existence of wide-area networks, if you wanted data you had to order it on bulky tapes and other forms of storage that were difficult to index, search, and sort through rapidly. With the growth of the Internet, the number of networked quantitative databases (census data, agricultural information, climate measurements, business and marketing data, etc.) has grown rapidly, as has the accessibility of online library catalogues. Unfortunately for qualitative researchers, there is a lack of qualitative databases. Text archives are multiplying (such as the Project Gutenberg online text archive), but they tend to be simply online scans of classic literature, 'great books,' and key documents (such as the U.S. Constitution), not transcribed interviews, speeches, or other informal texts. (Hudson and Atkinson 1990.)

Fortunately, there is at least one text archive which anthropologists interested in cross-cultural comparison are now making use of: the Human Relations Area Files, or HRAF. It is a collection of ethnographies on most of the human societies on Earth, coded with a standardized coding system that assures inter-coder reliability. HRAF used to exist mostly on paper and microfiche, but since 1987 it has been digitized, and at least a small sample of cultures (sixty) is now available on CD-ROM. It is hoped that in the near future the maintainers of the HRAF will make the entire archive available on the Internet. The HRAF makes it possible to test cross-cultural hypotheses about the co-occurrence of the cultural phenomena embodied in its codes. But it is not truly a source of data discovery per se, because it consists of OPE (other people's ethnographies), which are not continually updated, and it contains no textual data in the "raw." (Werner 1987.)

The main problem for qualitative research is not a lack of text, though. With the explosion of text on Usenet, bulletin boards, mailing lists, and the World Wide Web, a researcher interested in collecting text unobtrusively on almost any topic would have little difficulty. (Pfaffenberger 1988.) New search interfaces make it possible, for example, to find every posting on Usenet from the last few days on "computer hackers" or "computer underground" and "phreaking," but not "hardware." The problem, rather, is that so much of this textual data is ephemeral. Not all of it is archived, so a person can gather everything written about a topic during the period they are paying attention to a mailing list, but they might not be able to see how attitudes toward that topic have changed over several years. People routinely put up sites with text on the Web and then take those texts down when the site moves or is shut down. Data storage for the plenitude of material posted to the Net (billions of words appear on the 5000 Usenet newsgroups each day), while inexpensive, isn't free, and the cost of archiving it all could be astronomical. (Fielding and Lee 1991.)
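
To make the kind of query described above concrete, here is a minimal sketch in Python of a Boolean filter over a local collection of postings. It assumes one reading of the example query, ("computer hackers" OR "computer underground") AND "phreaking" AND NOT "hardware", and a hypothetical `Posting` record holding a newsgroup name, a date, and the text; it is an illustration, not the interface of any actual Usenet search service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Posting:
    """A hypothetical record for one archived Usenet posting."""
    newsgroup: str
    posted: datetime
    text: str


def matches(text: str) -> bool:
    """("computer hackers" OR "computer underground") AND "phreaking" AND NOT "hardware"."""
    t = text.lower()
    return (
        ("computer hackers" in t or "computer underground" in t)
        and "phreaking" in t
        and "hardware" not in t
    )


def recent_matches(postings, now: datetime, days: int = 3):
    """Return matching postings from the last `days` days, oldest first."""
    cutoff = now - timedelta(days=days)
    hits = [p for p in postings if p.posted >= cutoff and matches(p.text)]
    return sorted(hits, key=lambda p: p.posted)
```

A substring test like this finds only literal wording: a posting that discusses phreaking without using the word slips through, which is where coding, rather than string search, becomes necessary.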

And while there may be a surfeit of (current) text and quantitative data, databases of other types of networked qualitative data are quite small. Image and photo archives (of interest to visual anthropologists) are small and only starting to appear; unfortunately, many consist of commercial photographs aimed at desktop publishers and multimedia designers, not researchers. They are composed of "staged" creative photos, not documentary photos of "life in the raw." There are no truly comprehensive audio archives (say, ethnomusicological collections of various cultures' mortuary songs) or video archives (say, all fictional and documentary films dealing with homosexuality in the 1950s) available through the Internet for downloading and searching. In part, this is because the technology for digitizing such material is quite new, and because these types of data require much more storage (even small video segments can take up more than a single floppy disk). Beyond that, the hardware requirements for downloading (and decompressing and decoding) and viewing or hearing such material can be more demanding.

Worst of all, there is no nonlinear search method for such media, other than some new programs which allow a person to code video segments and then retrieve them for playback. There is no coding-and-search technique for music at this time, although one way to do this might be to allow a researcher to code segments of the audio stream, just as is done with video. With spoken-word audio transcripts, this may be quite a bit easier, and it might capture some elements of intonation and expression lost in transcribing speech into text. Hypermedia is helping to deal conceptually with this problem, allowing for linking between different kinds of media and for nonlinear playback. Still, the researcher must do the looking and listening manually, followed by coding. No program will retrieve all music samples that contain notes from a particular instrument, or respond to a request to "locate all video that documents or depicts a Zoroastrian fire ceremony." (Jones 1995.)
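
The segment-coding approach described above for video, and proposed for audio, amounts to attaching codes to time intervals rather than to lines of text. Below is a minimal sketch, assuming a hypothetical `Segment` record with start and end times in seconds. It indexes the researcher's own coding only; it does not analyze the audio or video signal, which is exactly the manual step the paragraph notes cannot yet be automated.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A hypothetical coded stretch of a recording (times in seconds)."""
    source: str
    start: float
    end: float
    codes: set = field(default_factory=set)


def retrieve(segments, code):
    """Return (source, start, end) for every segment tagged `code`, in playback order."""
    hits = [s for s in segments if code in s.codes]
    hits.sort(key=lambda s: (s.source, s.start))
    return [(s.source, s.start, s.end) for s in hits]


def coded_seconds(segments, code):
    """Total duration of material tagged with `code`."""
    return sum(s.end - s.start for s in segments if code in s.codes)
```

Playback software would then seek to each returned interval; the retrieval logic is the same one that code-and-retrieve programs apply to text.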

For truly original data discovery, computers are going to have to be seen as a tool for doing ethnography, and not just for uncovering the data gathered by others or unobtrusively collecting posted material. This requires some reconception of many anthropological concepts, especially what it means to "gain entrance" to and "leave" "the field." (Kelle 1995.) Many researchers have clumsily attempted to use mailing lists and newsgroups to distribute poorly designed surveys and questionnaires. As a result, few people respond, and fewer still respond appropriately. Others have attempted to use personal email for conducting interviews, but since email is an asynchronous mode of communication, it loses the interactivity of an unstructured interview: the researcher cannot readily adjust the questions in response to the person's answers as the conversation unfolds. "Chat" software allows for real-time interactive conversation, but its main advantages for researchers over a phone call may be that in most cases it is free, that it allows groups of respondents to answer simultaneously without drowning each other out, and that what is said is already in text form and can be captured along the way. (Anderson and Brent 1990.)

Teleconferencing might someday restore some of what is lost in an IRC or email interview, such as the expressions and appearance of the participants. But the researcher will still lack the hundreds of subtle paralinguistic cues that come from in-person interviewing: how the person sits, gestures, and so on. (Chesebro and Bonsall 1989.) Also, what does it mean to do "participant observation" on the Net? Clearly, various rituals, jargons, and quirks are emerging among net subcultures, and for some researchers, participating in these is enough. For other researchers, it means that one has to engage in some degree of IRL/F2F (in real life, face to face) contact with one's informants, especially if such contact is part of their subculture; but this can be expensive and impractical, since the informants are unlikely to all be in the same place (unless the conferencing system is local). Online ethnography can be a viable data discovery technique for qualitative researchers, but they need to start thinking about these questions.

Data Analysis and Management

Data can be gathered through the computer, through the use of online databases or online ethnography, but in most cases it is gathered through "low-tech" methods of tape recording, shorthand transcription, and typing. This requires the intermediate step of entering all this data into the computer before the analysis can begin. Fortunately, there have been some advances in text recognition (OCR, or Optical Character Recognition) which may let scanners take some of the drudgery out of this work. What many qualitative researchers hope for next is voice recognition and audio transcription: you play interviews into a microphone, and the words appear on screen. This task, unfortunately, is deceptively hard, and may not be solved by programmers for another decade or more. But qualitative researchers looking to save time await it eagerly.

Assuming one's data is in electronic form, the next step is to sift through it, reduce it, sort it, manage it, analyze it, and reveal patterns. It is here that qualitative researchers have, over the last decade, already begun developing a body of knowledge and software. Existing software can already do some of the critical tasks of qualitative research. Garden-variety word processors can count words, use "macros" for tagging and retrieving, and cut and paste related text segments together. (Bernard 1994.) Spreadsheets can be used for ethnosemantic categorizing; flowcharts and diagrams can be created in drawing and drafting programs; database systems aimed at business users, like FoxPro, still do a fine job of managing ethnographic information. Still, qualitative researchers have specialized needs, and fortunately, even though they are a small market, a series of programs designed specifically with those needs and functions in mind has begun to appear. (Brown 1990.)

These programs are still in an early phase of development, and because the qualitative research community is far smaller than the business community, programs like Ethnograph and Kwalitan have far less support, less documentation, fewer channels for user feedback, poorer interfaces, and longer update periods than programs such as, say, Lotus 1-2-3. They tend to be written by lone researchers, not teams of professional programmers, and so tend to be plagued by incompatibilities, bugs, and glitches. Most are not released cross-platform (for Macintosh OS, for DOS, for all versions of Windows, or for all varieties of UNIX) and thus may run on only one kind of microcomputer; worse, they lack facilities for sharing data with any other kind of computer. Some are not even commercial software, and are instead distributed as shareware or released free into the public domain. The update schedule and support for such programs can be minimal or nonexistent (especially if they are orphans), and some have not changed since they were released.

These limitations aside, such programs can be an invaluable asset to researchers. Weitzmann and Miles divide qualitative software (bearing in mind that not all of these programs are designed solely by or for the qualitative research community) into five essential categories: text retrieval programs; textbase managers; code-and-retrieve programs; code-based theory-builders; and conceptual network-builders. (Miles and Weitzmann 1995.) These program types are not mutually exclusive, and some "all-in-one" programs have elements of each, although none of them does all of these things well and comprehensively. Some researchers, like H.R. Bernard, would like to identify a sixth, separate category: programs which generate quantitative output (i.e., matrices subject to statistical analysis) from qualitative data. (Bernard 1994.) However, the programs that do this so far were never designed to do it from the outset, and as with many "kludges," the result doesn't work quite right.
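
Bernard's proposed sixth category can be illustrated with a small sketch: turning a list of (case, code) codings, of the sort a code-and-retrieve program stores, into a case-by-code matrix of counts that a statistics package could take as input. The data layout and names here are illustrative assumptions, not the export format of any particular program.

```python
from collections import Counter


def case_by_code_matrix(codings):
    """Build a case-by-code count matrix from (case_id, code) pairs.

    Illustrative layout only. Returns (cases, codes, rows), where
    rows[i][j] is how many times code codes[j] was applied to
    material from case cases[i].
    """
    counts = Counter(codings)
    cases = sorted({case for case, _ in counts})
    codes = sorted({code for _, code in counts})
    rows = [[counts[(case, code)] for code in codes] for case in cases]
    return cases, codes, rows
```

Each row can then be treated as a case and each column as a variable; whether raw counts or simple presence/absence is the right measure remains a judgment the researcher has to make.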

Text retrieval programs count words, gather texts, and search for words or "strings" of characters. They do little that word processors cannot, except that they do it more efficiently and perhaps a little better. Textbase managers keep text together in a systematic, ordered manner (in records and fields), and are good at sorting text into subsets for comparison and contrast or for search and retrieval. They improve on garden-variety database programs because they are more oriented toward text and its properties. Code-and-retrieve programs allow a person to mark text segments with keywords according to their significance, and then later retrieve all the "chunks" filed under a given code. Codes allow the researcher to identify similar phenomena in a text even when they are not described in exactly the same words; hence these programs allow more flexibility than textbase managers. (McCarty and Podolefsky 1983.)
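
The difference between string search and coding can be shown in a few lines. The sketch below assumes a transcript held as a list of lines and codes attached to zero-based, inclusive line ranges (both illustrative choices, not any program's actual data model). It pairs a word counter of the kind text retrieval programs offer with a minimal code-and-retrieve index; a researcher could, for example, tag a passage about trading fish and one about salt with the code "exchange" and retrieve them together, though neither uses that word.

```python
import re
from collections import Counter, defaultdict


def word_counts(text):
    """Text retrieval: frequency of each word (lower-cased, letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class CodedText:
    """Code-and-retrieve over a transcript stored as a list of lines (illustrative)."""

    def __init__(self, lines):
        self.lines = lines
        self.index = defaultdict(list)  # code -> list of (first_line, last_line)

    def code(self, code, first, last):
        """Attach `code` to lines first..last (zero-based, inclusive)."""
        self.index[code].append((first, last))

    def retrieve(self, code):
        """Return every chunk filed under `code`, in the order it was coded."""
        return [" ".join(self.lines[a:b + 1]) for a, b in self.index.get(code, [])]
```

The word counter can only report what literally appears in the text; the index reports what the researcher has judged to be the same phenomenon.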

Grounded theory-builders go beyond simple code-and-retrieve programs, allowing researchers to conceptually organize and categorize their codes, annotate their data with memos, examine conceptual relations, extend their coding schemes, and even test hypotheses about their data. These programs have much more power than simple code-and-retrieve programs, but their organizational and conceptual schemes for theory-building may also impose more preexisting constraints. The last category of software, conceptual network-builders, almost falls into the realm of data presentation, except that these programs create provisional network output for researchers to "play with" and manipulate themselves, rather than output prepared for display to others. Conceptual network-builders allow people to build and test theory through semantic networks of nodes and links. The best of these programs allow the person to truly work with concepts qualitatively: moving elements around in the network to see how the rest of the network changes, representing semantic difference through spatial separation, and allowing links of variable strength and directionality. (Carley 1992.)
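
A conceptual network of the kind just described can be sketched as directed, labelled links of variable strength. In this minimal sketch the concepts, relations, and strengths are purely hypothetical, and spatial layout is left out. Removing a concept, or raising the strength threshold, changes what is reachable from a starting concept: a crude stand-in for moving elements around to see how the rest of the network changes.

```python
from collections import defaultdict, deque


class ConceptNet:
    """Concepts joined by directed, labelled links of variable strength (0 to 1)."""

    def __init__(self):
        self.links = defaultdict(dict)  # source -> {target: (relation, strength)}

    def link(self, src, dst, relation, strength=1.0):
        self.links[src][dst] = (relation, strength)

    def remove(self, concept):
        """Take a concept out of the network, along with every link into or out of it."""
        self.links.pop(concept, None)
        for targets in self.links.values():
            targets.pop(concept, None)

    def reachable(self, start, min_strength=0.0):
        """Concepts reachable from `start` along links at least `min_strength` strong."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt, (_, strength) in self.links.get(node, {}).items():
                if strength >= min_strength and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen - {start}


# Hypothetical concepts and strengths, for illustration only.
net = ConceptNet()
net.link("drought", "migration", "causes", 0.8)
net.link("migration", "wage labor", "leads to", 0.6)
net.link("wage labor", "remittances", "produces", 0.9)
```

With every link counted, "drought" reaches "remittances"; once only strong links (0.7 and above) count, or once "migration" is removed, the chain breaks.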

All of these programs are oriented toward the qualitative analysis of qualitative data. They keep the person "close" to their texts and materials. Bernard finds them unsatisfactory, however, because while they perform some rudimentary numerical functions (such as word counting), most do not do thorough quantitative data crunching, or even provide output suitable for it. The theory-builders facilitate systematic thinking about qualitative data, but don't isolate variables or identify how much variation each contributes to outcomes. Concepts can be identified, but not measured and contrasted according to quantitative criteria. For these reasons, some critics claim that qualitative software does not do "real" data analysis. These complaints should be filed in the same waste basket as others that claim qualitative research isn't "good" without a quantitative component. Qualitative analysis software offers the same protections that quantitative programs do against bias and misjudgment, so these programs deserve equal recognition.

Data Presentation: Visualization, Simulation

So the computer can be a tool for gathering and analyzing qualitative data. What about presenting it? Once again, most researchers have only used the capabilities of their computers to present their data in limited ways. They have thought of the computer as only part of the analysis phase. They write up word-processed documents with small snippets of spreadsheets or matrices or tables. Maybe they include some diagrams, charts, or graphs generated by other programs. If they are quantitatively oriented, they may include the statistical tests they used to eliminate variables. If interested in network analysis, they may include networks. Raw data may be included, perhaps in the form of field notes or informants' transcripts, line-numbered and coded. Photos and illustrations may appear in the margins. Researchers interested in multimedia may take key conclusions and observations and make neat quasi-interactive PowerPoint presentations out of them, instead of distributing them in print. But even they are not taking advantage of the full potential of the computer to present data in new and "paradigm-challenging" ways.

The capacity of the computer to produce scientific visualization is creating a "new paradigm" in the physical sciences, some authors suggest. (Glazier and Powell 1992.) Because of the movement from static tables to dynamic visualization (fractals, bifurcation trees, virtual reality, etc.) in the sciences, they point out, researchers are beginning to gain a greater appreciation of phenomena as complex, heterarchical, indeterminate, "holographic," mutually causal, perspective-dependent, and continuously evolving. What the two lament is that while computer visualization is pushing the sciences in this direction, qualitative researchers, who rarely use it, are left emulating the "old paradigm" (positivism) of the physical sciences. Visualization is not just an aesthetically pleasing way of presenting data, they suggest; it can lead to important conceptual and paradigmatic breakthroughs.

The problem for most qualitative researchers is that visualization as these authors describe it is not simply a chart or a diagram that sits on a page. It has to be viewed electronically rather than transferred to text. It is a representation that a person can interact with through their own terminal, perhaps by zooming in, adjusting which variables are highlighted, or even immersively "moving" through it. This requires using the computer to present data in electronic form, not on static "hard copy," and most qualitative researchers are used to communicating with each other through words and texts. A good scientific visualization at a minimum allows viewers to find their own visual patterns in phenomena, rather than requiring the original researcher to point them out, thus changing the collaborative nature of research. Beyond that, it might even allow the viewer to alter the parameters of the representation itself and see what happens. What if a variable is removed from the flow diagram altogether...?

Visualization has a number of applications for qualitative researchers. The use of semantic conceptual networks has already been discussed. Other possibilities include decision trees down which viewers themselves can test movement; flowcharts in which they can test behavioral flow; and the ever-famous kinship diagrams, which used to be built out of Tinkertoys and Lego. All these modes of computer-based visualization do more than visually demonstrate what the researcher is arguing verbally; they let viewers test the hypothesis for themselves by interactively searching for the patterns the researcher claims to describe. One researcher's visualizations might even be examined and analyzed by another's program. Visualizations tend to bring the context and relationships of data to the foreground. It's not hard to see how this might also help produce a "new paradigm" in the social sciences, and in qualitative research in anthropology as well. Strangely, this "new paradigm" looks a lot like the old holistic one, and the new reflexive one, that we've argued for all along.
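
An interactive decision tree of the kind mentioned above can be sketched as nested nodes that a viewer walks by supplying answers. The tree below is a hypothetical planting decision invented purely for illustration, not data from any study; the point is only that a viewer can change one answer and immediately see where the path leads.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A decision point (with branches) or, if it has no branches, an outcome."""
    question: str = ""
    outcome: str = ""
    branches: dict = field(default_factory=dict)  # answer -> Node


def walk(node, answers):
    """Follow the viewer's answers down the tree; return (path taken, outcome)."""
    path = []
    while node.branches:
        answer = answers[node.question]
        path.append((node.question, answer))
        node = node.branches[answer]
    return path, node.outcome


# Hypothetical tree for illustration only.
tree = Node(
    question="Cash for fertilizer?",
    branches={
        "yes": Node(
            question="Field irrigated?",
            branches={"yes": Node(outcome="plant rice"), "no": Node(outcome="plant maize")},
        ),
        "no": Node(outcome="plant beans"),
    },
)
```

Because the returned path records each question and answer, a viewer who disagrees with an outcome can see exactly which branch produced it.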

Another computer data presentation technique that could accompany visualization, but appears in so little anthropological work, is simulation. Used properly, it can also be a great boon to qualitative methods. Almost all anthropologists have used computer simulations, whether they've played video games (such as Civilization or Lunar Lander), used the program SimEarth (or SimCity, SimLife, SimIsland, etc.), or fooled around with the addictive program Life. But they have probably never looked upon such simulations as research tools. That's unfortunate, and it is probably because the assumptions (lunar gravity, etc.) that govern the behavior of these simulations are there for entertainment value, perhaps to challenge the player in some way. For a few innovative qualitative researchers, simulation is a new way to present and validate things discovered through other techniques. (Kelle 1995.)

Ecologists run "artificial life" simulations of biological evolution, and meteorologists run simulations of climate, to test hypotheses and present arguments about how these phenomena work. Anthropologists hesitate to do this, but there's no intrinsic reason why we can't generate simulations of human societies and then use them to present and test our assumptions. The assumption has always been that this is a heavily quantitative effort. However, a good simulation program usually already contains its quantitative parameters as givens (the constants of nature). The real value of simulation is, again, the ability it gives the researcher to present things qualitatively: visually, in space, perhaps in a way that can be interactively moved around. Suppose we have a model of what happens when overcrowding occurs in human societies. Built into our simulation are certain laws of interaction between persons, based on our observations. In our simulation, each house represents x families. What happens when we move the houses too close together? This is the power of simulation.
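
The overcrowding thought experiment above can be sketched in a few lines. Every rule and number here is a hypothetical placeholder: houses sit on a square grid, each house holds a fixed number of families, and each pair of families in different houses closer than a chosen radius counts as one potential point of friction. Shrinking the spacing then shows how the count grows; a real model would replace this friction rule with interaction laws drawn from fieldwork. (`math.dist` requires Python 3.8 or later.)

```python
import itertools
import math


def village(n_side, spacing):
    """Houses on an n_side x n_side grid, `spacing` meters apart (hypothetical layout)."""
    return [(i * spacing, j * spacing) for i in range(n_side) for j in range(n_side)]


def friction(houses, families_per_house, radius):
    """Count potential friction points between families in different houses.

    Hypothetical rule: every pair of families whose houses are closer than
    `radius` meters counts once. Families sharing a house are not counted.
    """
    total = 0
    for a, b in itertools.combinations(houses, 2):
        if math.dist(a, b) < radius:
            total += families_per_house * families_per_house
    return total


# Move the houses closer together and watch the count change.
for spacing in (50, 30, 15, 10):
    print(spacing, friction(village(5, spacing), families_per_house=2, radius=20))
```

Under this rule the count stays at zero until neighbors fall inside the radius and then jumps, a threshold effect that is easier to see by moving the houses than by reading a table.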

Simulation can help present a wide variety of qualitative relationships. Physicists often illustrate their physical laws through billiard ball collisions. If we believe certain things occur through human interactions under certain conditions (which is not positivism, just pragmatism), why not run a simulation demonstrating what happens when two people "collide" under those circumstances? We should not be surprised, however, that social simulation, like physical modelling, normally requires a good deal of simplification of behavior. Fortunately, the computer's interactivity and its ability to represent data in multilayered or multileveled ways (as with GIS) restore a good deal of complexity to our representations. Qualitative researchers should not be afraid of simulation: like visualization, it may validate some of the things we have been saying about human phenomena all along, and present our data in new and challenging ways that help our colleagues test our propositions.

Answering Problems in Computer Methodology

Using the computer may answer many of our methodological questions in qualitative research. The computer is certainly a reliable instrument, since it will always produce the same results given the same inputs. The validity of its output, however, depends on its programming. Some qualitative researchers see the computer as essential for adding some objectivity to research, since it cannot engage in self-deception or have a subjective bias. But its "objectivity" is questionable. If programmed to discard extreme data points as unnecessary "anomalies," it will do so. If it is told to reduce qualitative material by eliminating "extraneous" data, it will follow its programming and do so. It will manifest the biases of its software, which are the biases of its programmers. At present the computer is incapable of making its own decisions or judgments based on imprecise criteria and rules of thumb (as human researchers do), but this might change in the near future.

Some qualitative researchers feel that the computer biases research toward positivism, quantification, and Western ethnocentrism. After all, the computer processes everything through precise, logical, binary/digital operations, whereas human thinking and judgment are well known to be based often on "fuzzy logic," analog thinking, and association. It seems too "left brain" and abstract in its operations, ignoring the "right brain" contextual holism that seems to be an important part of cultural and local knowledge. But these objections are based on how computers operate today. Parallel processing, neural networks, analog circuits, optical memory storage, perceptual-learning systems, and "biochips" (electronic logic gates involving living tissue) could produce computers that not only simulate human thinking, which is simultaneously quantitative and qualitative, but even physically emulate it. The word "computer" is inadequate to capture the evolving nature of these devices, and it prevents us from seeing how they can be used qualitatively.

The biases of the first generation of computer designers, who saw computers primarily as counting, calculating, and arithmetic machines, should not blind us to their potential qualitative uses. We are learning that many qualitative representations (art, music, and poetry in particular) have quantitative bases in their structure, and can be "appreciated" or even generated by the computer. This does not mean that they are "reducible" to numbers, or that looking at phenomena this way denies creativity or aesthetics. The equivalence of qualitative and quantitative phenomena only tells us that they are commensurable, like matter and energy: one can be transformed into the other. It would be just as meaningful to declare that all numbers are qualitative, which was certainly the view of Western mathematics at its very beginning. The Pythagoreans had a deep sense of the qualities of various numbers, associating them with colors, sounds, human activities, shapes, and passions.

Even literary critics in the humanities are coming to realize that the computer can assist them in their hermeneutic enterprise, answering questions of authorship, genre, and so forth. The computer does not "dehumanize" our subjects, and even if it can someday emulate human thought, it does not reduce our own humanity or uniqueness one whit. It does not necessarily strip qualitative data of its richness or context, as some researchers have claimed. It can add another level of description to thick description. Cognitive science models of the self derived from the computer (such as Minsky's Society of Mind theory of the modularity of the brain) can help us decipher human consciousness, and thus add to our understanding of human social action. (Carley 1992.) Models of human behavior need not be mechanistic and static simply because they are generated by a computer.

Therefore, we should look askance at suggestions that the computer is antithetical to humanistic, interpretive, engaged qualitative research. Many postmodernists (such as Baudrillard and Lyotard) have acknowledged that their insights about the undecidability of phenomena come from paradigmatic shifts in the sciences such as chaos theory, which in turn grew out of computer modelling. Paradoxically, the computer is revealing truths about the world that contradict the assumptions of its designers. Linear calculating machines are now telling us that the majority of phenomena in the universe are governed by nonlinear equations, whose future states cannot be calculated far in advance because of their sensitivity to initial conditions. Complexity theory, derived from work with computers, also supports anthropological thinking about the rapid mutability of social systems and human ecology. (Colby, et al., 1991.) When it comes to rich qualitative study, the computer is not necessarily our enemy; it could be our friend.
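
Sensitivity to initial conditions is easy to demonstrate with the logistic map, x(n+1) = r * x(n) * (1 - x(n)), a standard textbook example of a simple nonlinear equation (chosen here for illustration; the paragraph does not name a specific system). Two starting values that differ by one part in ten billion track each other for a while at r = 4 and then diverge completely, while at r = 2.5 the same tiny difference dies out.

```python
def logistic_orbit(x0, r, steps):
    """Iterate the logistic map x -> r * x * (1 - x), returning every state."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


def max_divergence(x0, delta, r, steps=100):
    """Largest gap between two orbits whose starting points differ by `delta`."""
    a = logistic_orbit(x0, r, steps)
    b = logistic_orbit(x0 + delta, r, steps)
    return max(abs(p - q) for p, q in zip(a, b))


print(max_divergence(0.2, 1e-10, r=4.0))  # chaotic regime: gap grows to order 1
print(max_divergence(0.2, 1e-10, r=2.5))  # stable regime: gap stays tiny
```

The equation is fully deterministic in both cases; what changes with r is whether small measurement errors stay small or come to dominate the prediction.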

The Future: Computer as Active Assistant -- AI Knowledge Systems, Agents, Natural Language Comprehension

If the computer is truly to be our "friend" and research assistant, it will need to be able to do more. Most of the technological capabilities I discuss here remain achievements for the future. However, all of these innovations are under development by computer scientists; they are not simply "vaporware." And when they do appear, they are likely to revolutionize qualitative methods, being of use in all phases of research: discovery, analysis, and presentation. They will transform the computer from a passive slave into an active assistant, capable of aiding us in our research, coming up with new ideas, making suggestions, and raising new questions instead of simply answering them. (Tesch 1990.) Some anthropologists may consider this a threat to their autonomy, or even to their own value to the profession (could they be replaced by a robot?). Most will see these developments as making their lives easier, and perhaps even as opening new lines of inquiry.

"Agent" technology involves the development of "daemons" (autonomous processes) that can move throughout computer networks and gather data or invoke remote processes of a certain type on command. One of the imagined uses of an "agent" frequently discussed by software developers is an intelligent "aide" in your computer that could do the following: search for airline tickets at the lowest possible fare for your upcoming trip, create a customized newspaper for you by locating articles on a particular topic from newswire services, or automatically update your Web page every time someone creates a link to it from somewhere else. Agents could do a lot of grunt work for qualitative research: gathering texts, locating informants, locating funding, finding field sites, distributing surveys only to recipients who meet certain special criteria, finding journal articles, and finding databases. However, if they're going to do "brain work" for us, they will need to be a lot smarter.

This will require some breakthroughs in AI knowledge representation. While there has been considerable development of expert systems that represent knowledge in medicine and geology, efforts to create knowledge systems dealing with human cognition and social behavior have been less successful. Existing knowledge-based AI programs for qualitative analysis come in several varieties. One type is the frame-based representation program (the term "frame" deriving from the AI work of Minsky and the behavioral studies of Goffman), such as ETHNO and ERVING, which place information in slots, fillers, and pointers to other frames. (Miles and Huberman 1994.) These programs divide knowledge into procedural and declarative frames, and from this data they can make limited "deductions" about actors in certain settings, generating inferences and hypotheses about interactions between persons.
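
The slot-and-filler idea behind frame-based programs can be sketched briefly. This is not how ETHNO or ERVING were implemented; it is a minimal Python illustration, assuming that each frame holds named slots and an optional pointer to a more general frame from which it inherits default fillers. The frames and fillers are invented for the example. The "inference" is simply that an actor placed under a role frame is expected to act as that role's filler says, unless a more specific frame overrides it.

```python
class Frame:
    """A frame: named slots with fillers, plus an is_a pointer to a more general frame."""

    def __init__(self, name, is_a=None, **slots):
        self.name = name
        self.is_a = is_a    # more general frame to inherit from, or None
        self.slots = slots  # slot name -> filler (a value or another Frame)

    def get(self, slot):
        """Return the filler for `slot`, inheriting a default if this frame leaves it unset."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.is_a
        return None


# Hypothetical frames for illustration only.
participant = Frame("participant", expected_act="greet others")
vendor = Frame("vendor", is_a=participant, expected_act="bargain over price")
market = Frame("market", typical_role=vendor)
ana = Frame("Ana", is_a=vendor, setting=market)


def expected_act(actor):
    """Infer what an actor is expected to do from the role frames it inherits from."""
    return actor.get("expected_act")
```

Here the actor "Ana" inherits "bargain over price" from the vendor frame, while a new frame placed directly under "participant" falls back to the more general default, "greet others."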

Other AI programs follow the line of Schank and Abelson's work: they analyze human behavior in terms of scripts, plans, and goals, reducing actions to semantic "primitives." (Schank and Abelson 1977.) Given the scripts for two actors in a social setting, these programs can also make inferences about their behavior. While frame-based programs analyze human actors in terms of what they know, script-based programs assume that humans are governed by rules which they follow implicitly but may not explicitly "know." Both kinds of program attempt to make inferences about human social behavior from data supplied by their user (the qualitative researcher.) Current research in neural networks, "bottom-up" and parallel processing, and dual analog-digital representations may allow computers to emulate more closely the human brain and the way it makes inferences about social behavior. Such programs might eventually be able to "learn" on their own and improve their performance in understanding and predicting behavior.
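A script-based inference can be sketched in the same spirit. The example below is loosely modeled on the restaurant script that Schank and Abelson used as their standard illustration, with each step tagged by a conceptual-dependency primitive (PTRANS for physical movement, MTRANS for transfer of information, ATRANS for transfer of possession, INGEST for eating). The function infer_unstated is a hypothetical helper, and its rule (every script step between the first and last observed step is presumed to have happened) is a deliberate simplification of what the actual programs did.

```python
# A script as an ordered sequence of (actor_role, primitive, description) steps.
# The contents are a simplified illustration, not Schank and Abelson's full script.
RESTAURANT = [
    ("customer", "PTRANS", "enter restaurant"),
    ("customer", "MTRANS", "order food"),
    ("server", "PTRANS", "bring food"),
    ("customer", "INGEST", "eat food"),
    ("customer", "ATRANS", "pay bill"),
    ("customer", "PTRANS", "leave restaurant"),
]


def infer_unstated(script, observed):
    """Return script steps implied, but not observed, between the first and last observed step."""
    descriptions = [step[2] for step in script]
    indices = [descriptions.index(o) for o in observed if o in descriptions]
    if not indices:
        return []  # nothing observed matches this script
    first, last = min(indices), max(indices)
    return [step for step in script[first:last + 1] if step[2] not in observed]


# Hypothetical field notes record only that the informant entered and later paid.
print(infer_unstated(RESTAURANT, ["enter restaurant", "pay bill"]))
# [('customer', 'MTRANS', 'order food'), ('server', 'PTRANS', 'bring food'), ('customer', 'INGEST', 'eat food')]
```

The filled-in steps (ordering, being served, eating) are exactly the implicit rules the actors followed but that the observation never made explicit, and the server's role shows how a single script coordinates two actors.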

The third breakthrough occurring in the computer field concerns the human-computer interface. Researchers studying human-computer interaction have noted that most forms of computer input require rather unnatural skills of users, whether that involves entering binary machine code, using punched cards, typing in abstruse programming languages, or selecting data through convoluted graphical user interfaces (GUIs.) A much more "natural" way might be to interact with the computer in the same way that we interact with other people. Researchers in this area are examining natural-language comprehension and studying ways in which people might communicate with the computer through typed-in language (with all its irregularities and idiomatic expressions) or even through spoken commands and gesture. (Jones 1995.) Linguists are interested in natural-language comprehension research because it is the key not only to computers understanding the commands of their users in a 'natural setting', but also to their drawing conclusions about textual and qualitative data.

As with other forms of AI development, natural-language research has run into certain bottlenecks. Language comprehension turns out to be an extremely complex affair, and despite half a century of research in linguistics, we are not entirely certain how the human brain accomplishes it, let alone how to emulate it in the computer. Most natural-language systems, such as Winograd's SHRDLU, can only 'understand' extremely simple imperative sentences dealing with a very small 'universe' of objects (such as "PUT THE RED BLOCK NEXT TO THE BLUE SPHERE.") Programs such as Eliza do not in fact "comprehend" language at all; they simply "know" how to answer in ways (employing grammatical conventions) that make it seem as if they do. Speech comprehension is further complicated by the fact that no two speakers, even of the same dialect, pronounce the same sentence exactly alike. Human listeners can cope with slurring, dropped consonants, stretched vowels, or a changing tone of voice; most speech-recognition systems cannot.
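The difference between comprehension and mere pattern matching is easy to see in code. The sketch below imitates the general approach of Weizenbaum's Eliza (keyword patterns paired with response templates, plus pronoun swapping), but the rules and the function names reflect and respond are illustrative assumptions, not Eliza's actual script.

```python
import re

# Each rule pairs a regular expression with a response template. There is no
# model of meaning here: only surface pattern matching and pronoun swapping.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}
RULES = [
    (re.compile(r"i feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"my (.*)", re.IGNORECASE), "Tell me more about your {0}."),
    (re.compile(r".*"), "Please go on."),  # catch-all when no keyword matches
]


def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones ("my" -> "your")."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def respond(sentence: str) -> str:
    """Return the template of the first matching rule, filled with reflected fragments."""
    text = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.fullmatch(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."


print(respond("I feel ignored by my kin."))    # Why do you feel ignored by your kin?
print(respond("My village elders disagree."))  # Tell me more about your village elders disagree.
```

The second reply is exactly the kind of slip that exposes the absence of any understanding: the program matched the keyword "my" and echoed the rest of the sentence without parsing it.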

Japanese researchers are working heavily on natural-language comprehension systems as part of their "Fifth Generation" computing initiative. Progress in this area has required a shift from the usual top-down, rule-based model of AI to the bottom-up, learning-based model derived from neural networks: computers now "train" themselves in the rules of speech, rather than being "fed" them in abstract form by programmers. With the combined capabilities of agent technology, knowledge systems, and natural-language comprehension, the computer could become our own personal research assistant. It could go and do online ethnography for us, locating informants, interviewing them, and classifying and categorizing their knowledge. Given some mobility (through a robot chassis), it could even do this in "the field." Data collection, analysis, and presentation could in principle all be accomplished with little human intervention. Far-fetched? Perhaps, but it could be only fifteen years away, given the rapid pace of computer innovation. In any case, the computer will never replace the anthropologist, but it could act as a very clever adjunct, constantly suggesting theoretical refinements and research improvements along the way.

