Poetics 41 (2013) 366–383
Rapid ethnographic assessment for cultural mapping
Tracy Van Holt a,*, Jeffrey C. Johnson b, Kathleen M. Carley c,
James Brinkley d, Jana Diesner e
a Institute for Coastal Science and Policy, Geography Department, East Carolina University, Greenville, NC 27858, USA
b Institute for Coastal Science and Policy, Sociology Department, East Carolina University, Greenville, NC 27858, USA
c Institute for Software Research, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
d Institute for Coastal Science and Policy, East Carolina University, Greenville, NC 27858, USA
e The iSchool (Graduate School of Library and Information Science), University of Illinois at Urbana-Champaign, Champaign, IL 61820-6211, USA
Abstract
Today researchers need an efficient and valid approach to mine and analyze the large amount of textual
information that is available. Automated coding approaches offer promise but a major concern is the
accuracy of such codes in capturing the meaning and intent of the original texts. We compare the
recall (number of codes identified) and precision (accuracy of the codes) across bodies of texts
coded (1) manually by humans based on the Outline of Cultural Materials (OCM) code book, (2)
semi-automatically by computers using a human-generated content dictionary containing Rapid
Ethnographic Retrieval (RER) codes, and (3) automatically by computers using an automated
version of the OCM content dictionary (AOCM). We applied network visualization and statistics
to quantify the relative importance of codes. The semi-automated coding approach had the best
balance of recall and precision. Network visualization and metrics identified relationships among
concepts and framed codes within a context. Semi-automated approaches can code far more data
in less time than humans can, and researchers can more easily refine content dictionaries and
analyses to address errors, which makes semi-automated coding a promising method for analyzing
the ever-expanding amount of textual information available today.
© 2013 Elsevier B.V. All rights reserved.
Keywords: Content analysis; Accuracy; Network analysis; Data mining
* Corresponding author.
0304-422X/$ – see front matter © 2013 Elsevier B.V. All rights reserved.
https://dx.doi.org/10.1016/j.poetic.2013.05.004
1. Introduction
There has been an explosion in the number of available digitized textual sources such as
newspapers, journals, blogs, the (participatory) web, etc. Given the sheer magnitude of textual
data, it is not possible for human coders to keep up with the flow of text-based information. This
article is a first step toward understanding the strengths and weaknesses of semi-automated
coding of texts by assessing the accuracy of these coding methods, particularly dictionaries. We
seek a coding approach that has the highest balance of recall (number of concepts coded) and
precision (coding the concept correctly). In addition, we visualize and analyze word
co-occurrences using a network approach, which provides opportunities to quantify the relationships
among codes. If semi-automated coding of textual data and network analysis can mimic human
coders’ ability to code for ethnographic concepts, then the door opens to new types of
ethnological, comparative, or cross-cultural studies of relationships at multiple spatial and
temporal scales previously unheard of.
This matters because textual data available online via newspapers, Twitter, blogs, etc. provide
rich sources of data by which to model various aspects of human behaviors—all the way from
individuals to societies (NRC, 2008). For example, Internet applications are moving toward
technologies that facilitate interaction and participation with the end user, which offer insights
into human behavior (O’Reilly, 2005). If people’s interactions, such as a comment on Twitter, are
geospatially tagged, or if locations in news articles are geospatially coded, then the patterns of
human opinion and behavior can be mapped across space and time (Rinner et al., 2008; Van Holt
et al., 2012). Texts provide fine-resolution data for events that are rare, such as
conflicts (Mack, 2007), and therefore difficult to forecast because of a lack of sufficient data
(Schneider et al., 2011). Data used in forecasting models are often at the macro level and, as a
consequence, researchers may not pick up on finer resolution indicators that may help improve
accuracy (Schneider et al., 2011). However, researchers such as Brandt et al. (2011) have
advocated using online data resources to forecast conflict in real time. By rapidly and
systematically analyzing text-based data, we can analyze human perception and behavior
through time and move toward forecasting events and the likelihood of scenarios. Of course, the
coding of opinion patterns, the mapping of human behavior, and the forecasting of events require
a useful approach.
1.1. Rapid ethnographic assessment
We are developing a computational system that will help to advance rapid ethnographic
assessment. We make a distinction between ethnographic and other kinds of assessments in that
ethnographic assessments follow in a tradition of the coding and analysis of ethnographic texts
using abstract codes that capture important theoretical constructs (e.g., kinship lineage systems)
and that attempt to maximize the contextual properties of the original texts. Rapidly analyzing a
culture, the socio-economic and environmental drivers of culture, and how these processes
change over time all require a systematic and robust way to extract and analyze data. This system
(see Fig. 1) will operate end-to-end so as to enable the researcher to rapidly collect and relate data
across a broad range of situations; to understand and identify fundamental dynamic processes;
and to feed ethnographic data into both qualitative and quantitative models. One such example is
from Van Holt et al. (2012), who modeled ethnic conflict and peace in Sudan and South Sudan
using newspaper articles. They concluded that ethnic conflict was associated with livestock,
environmental resources, and the structure of multi-ethnic group ties to these resources: ethnic
groups associated with peace had more ties to the coded concept ‘‘biomes’’ (forest, rivers, etc.).

Fig. 1. Overview of system.
We believe such a system will fundamentally alter the way in which ethnographers, ethnologists
and modelers work together and support improved data collection, model building, accuracy
assessments, and model validation. However, going from text data to various kinds of models
(e.g., network models) involves multiple points for introducing and propagating errors (Diesner,
2012). Therefore, key components in this process are to make sure that we understand the errors
in coding and how these errors may affect our ability to understand and construct valid models of
human behavior.
This article is a first step toward understanding the strengths and weaknesses of semi-automated coding of text-based material; it does so by assessing the accuracy of the coding
process, especially of coding dictionaries, which are a key component of the presented approach.
Our approach is referred to as ‘‘ethnographic’’ in that we follow in the tradition of cross-cultural researchers,
such as George Peter Murdock, in creating standardized codes that have ethnographic meaning
capturing elements of culture, society, and economics. We compare two automated coding
approaches against a gold standard—namely the Human Relations Area Files (HRAF), which are
documents (i.e., ethnographies) that were manually coded by researchers using the Outline of
Cultural Materials (OCM) code book, one created over the span of George Peter Murdock’s
career. In comparison, the Rapid Ethnographic Retrieval (RER) represents the semi-automated
approach that includes a dictionary, which took a year for people to develop, and software
(AutoMap and ORA; Carley et al., 2011a,b, respectively) that automatically coded and analyzed
the data based on a dictionary that involved humans in the analytical loop. Although somewhat
labor intensive, the value of these two dictionaries is that they can be modified for use in the
analysis of other texts with little effort. In contrast, the automated approach used a dictionary that
was generated in one day by automating the OCM codes (AOCM) and software (AutoMap and
ORA) that automatically coded and analyzed the data with little human involvement. We
compare the three coding approaches in terms of recall (number of concepts coded) and precision
(codes the concept correctly).
1.2. Ethnographies and ethnologies
There is a long history of using coded texts to produce large-scale models of human systems;
these include ethnographies—studies of individual cultures—and ethnologies, or cross-cultural
comparisons. Ethnographies usually require extensive field work to gather such descriptions.
Researchers often try to understand the lived experience (Spradley, 1979) of informants through
ethnographies, which can be extracted from narratives—such as open ended and structured
interviews, as well as historical documents—through participant observation. In one tradition,
after collection, a large corpus of these rich narratives is then coded. However, these coding
endeavors have been very labor intensive—involving a large number of coders, usually graduate
students, who would code vast volumes of ethnographic materials with respect to a
predetermined set of themes, in part to restrict the work to a manageable workload. Most
notable advances in ethnologies are based on ethnographies from multiple researchers that were
classified by George Peter Murdock who produced several cross cultural data sets (Whiting,
1986)—including the Standard Cross-Cultural Sample—that classified distinct cultures and
societies as the units of analysis for cross-cultural studies and a large number of ethnographically
coded variables. These data sets were derived from the corpus of known ethnographies of the
time that were collected through extensive field work by social scientists and the coded variables
were produced by human coders—mostly coders who were not part of the original ethnographies. The
coding of the texts was labor intensive, but their efforts were able to produce reliable codes for
often abstract concepts and variables across a large sample of societies from around the world.
Such coded data could be used to model, study and account for patterns observed across cultures
that might help to illuminate, for example, the cultural factors related to animal husbandry or
matrilineal descent. This coding enterprise took place in the 1950s and 1960s, at a time when the
number of ethnographic texts, although growing, was still relatively small.
Today, ethnographies are evolving from one-shot case studies focused solely on the lived
experience in a cultural setting (Spradley, 1979) to include studies that integrate measures of
social structure, as well as multi-disciplinary, multi-temporal (Abello et al., 2012), and multigroup analyses. For example, Johnson and Orbach (2002) have found that the structural features
of the social system, such as social networks, show how personal relationships are related to
success in establishing political change. Experiential and environmental factors are increasingly
viewed as important drivers that shape culture or shared knowledge (Boster and Johnson, 1989;
Reyes-Garcia et al., 2004). Recent behavioral change or adaptation studies include geospatial and
ecological factors, in addition to traditional measures of knowledge and technology (Van Holt,
2012). Still, longitudinal studies are rare, despite the fact that this type of experimental or
analytical design provides the strongest scientific evidence to understand changes in human
behavior. One such compelling study is Godoy et al. (2005), which tracks informants’ knowledge
through time as they became integrated into the global market. As studies discover more factors
that are predictive of socio-cultural behavior, ethnographic studies become increasingly
interdisciplinary requiring larger teams, more time and greater expenses to cover this expanding
set of factors. There also may be higher risks and costs in research conducted in regions of high
conflict. For all these reasons, a more scalable approach to collect ethnographic information is
needed, and digital texts offer this opportunity.
1.3. Network analysis of word co-occurrences
Once coded, the themes can then be analyzed by word frequency, word co-occurrences, and
network analysis (Bernard and Ryan, 2009). The relationships among themes can be displayed
and further analyzed via multidimensional scaling, clustering algorithms, network analysis and
visualization techniques such as spring embedders (Bernard and Ryan, 2009; Johnson and
Krempel, 2004), or they can be geospatially mapped (Van Holt et al., 2012). Word co-occurrences
characterize relationships and interactions among actors and events, behavior, etc. Osgood’s
(1959) analysis of the co-occurrence of words in W.J. Cameron’s talks on the Ford Sunday Evening
Hour radio program was the precursor to the use of network analytics to study and model texts.
Osgood coded the talks (37 in total) according to 27 concepts; tested the significance of the
associations; and displayed the synthesis of the topics in a crude multi-dimensional scaling
visualization, where the size of the node (word) reflected its prominence and ties (arcs) signified
that two concepts co-occurred during a given talk. Carley (1994) has shown that a map analysis of
texts (also a precursor to network analysis) is a way to integrate cognition and culture; one
example in the study discusses how the definition of robots has changed over time from being
perceived in a negative to a positive light. Centering resonance analysis (CRA), developed by Steve
Corman and Kevin Dooley, uses network analytics to produce abstract representations of text by
linking together words in texts (CRA, 2001). However, CRA is a data reduction method that
involves a combination of natural language processing and network methods to produce an
abstract representation of the overall content in the original text(s).
Moving away from network analysis, Bourdieu and Wacquant (1992) argued that
correspondence analysis is useful for analyzing relational thinking, specifically that of field
theory. Nonetheless, de Nooy (2003) argued that network analysis can provide additional
information that correspondence analysis cannot, which is a way to quantify social capital and
other person to person relationships. Johnson and Griffith (1998) confirmed some of these same
findings but advocated the integration of the two approaches (correspondence analysis and
network analysis) as both are relationally based but provide different types of information about
the nature of such relations. That said, Krinsky (2010) coded news items from a LEXIS-NEXIS
search on ‘‘Workfare and New York City’’ for actors (judges, state officials, advocates, etc.) and
their political claims. Network analysis was used to identify conversations and bridging power
across conversations of actors that spanned multiple claims. Thus, a network approach to
analyze relationships offers potential to provide more context to the automated content approach,
increasing the ability to mimic human coders.
1.4. Semi-automated text analysis
Content analysis is a methodology for summarizing and assessing the content of texts (Holsti,
1969; Neuendorf, 2002). This approach mainly relies on content dictionaries (i.e., tables of text
terms), which can consist of one or multiple words that are associated with a particular theme
(Carley, 1994; Gerner et al., 1994; Roberts, 1997; van Cuilenburg et al., 1986). Content dictionaries
used to be developed by hand, but automated approaches have been developed, applied and
evaluated to accelerate this process (Diesner and Carley, 2008; Diesner, 2012; King and Lowe,
2003; Schrodt and Gerner, 1994). Most of these approaches use a (mixture of) lexical, syntactic,
semantic, logical and statistical information from the text documents, and sometimes meta-data
on the texts, such as key words and index terms (for a review, see Diesner and Carley, 2010).
Automated text analysis began in the 1960s with Philip Stone, who developed the first
dictionary-based content analysis program, the General Inquirer (Stone et al., 1962). Today the
General Inquirer is associated with some of the most developed code books for coding text. These
include the Harvard IV-4 dictionary that contains 182 categories. The dictionary also has valence
categories that are used to refine the meaning of the text. Coding for valence has recently started
to gain momentum again under the labels of sentiment analysis and opinion mining. Additional
code books have been added, although it is unclear how the legacy of the Harvard IV will evolve.
The Kansas Event Data System (KEDS), including the Textual Analysis by Augmented
Replacement Instructions (TABARI), is also a dictionary-based approach that codes political
events into predefined and continuously updated categories (Gerner et al., 1994). Most content
analysis software supports the construction of dictionaries, but it may be cumbersome to
consistently and coherently customize and integrate multiple dictionaries for analytical efforts
other than for what they were originally intended.
Our approach combines (1) semi-automated content analysis, (2) mapping text terms to
concepts (e.g., ‘‘Omar Hassan Ahmad Al-Bashir’’ and ‘‘Bashir’’ both to ‘‘al Bashir’’), (3)
mapping concepts to ontological categories (e.g., ‘‘al Bashir’’ to ‘‘agent’’), and (4) a link
extraction technique (Carley et al., 2007; Diesner and Carley, 2008). In steps one and two, a
dictionary is used that is seeded with a set of generalizations and aliases, as well as automated
solutions for stemming and reference resolution, to identify a set of concepts. Using a human-in-the-loop approach referred to as the data-to-model process (D2M), the human researcher can apply
further generalizations and identify aliases to reduce the concept set (Carley et al., 2012). For
example, a trained coder reviews the results of automated strategies and has the chance to make
changes to automatically generated outputs and solutions. The result from this phase of the
coding, like other content analysis procedures, is a list of concepts and their cumulative
frequency, which is used to improve and expand a dictionary. In step three, we classify concepts
into a set of ontological categories. These categories are agents (e.g., people), organizations (e.g.,
tribes), locations (e.g., New York City), tasks (e.g., conflict), resources (e.g., potable water), and
others. This is accomplished using a probabilistic entity extraction model trained via Conditional
Random Fields, a supervised machine learning technique appropriate for working with sparse,
large-scale data (Diesner, 2012; Diesner and Carley, 2008). Then, using a human-in-the-loop
approach, the suggested classification is vetted—resulting in improved coding (Carley et al.,
2012; on the impact of this verification step, see Diesner, 2012). The fourth step of the coding
process is the extraction of links among instances of ontological classes. This is done
automatically using a proximity-based approach, where the user specifies a window size within
which all concepts are linked to each other (Danowski, 1993; for the impact of proximity-based
coding and related error rates, see Diesner, 2012). The result of the overall D2M process is a
network representation of the textual information, that is, a network where each concept is
associated with an ontological category. The extracted concept networks contain weighted, bidirectional links. The weight indicates the cumulative frequency with which a link was observed.
The coded text is then analyzed and visualized with the ORA software (Carley et al., 2011b).
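As a toy sketch of steps one through three, the process amounts to two lookup tables and a filter. The dictionary entries below are invented for illustration; the actual AutoMap/RER dictionaries are far larger and are refined by a human in the loop.

```python
# Hypothetical sketch of D2M steps 1-3 (illustrative entries only):
# steps 1-2 map raw text terms to canonical concepts via aliases;
# step 3 assigns each canonical concept to an ontological category.

ALIASES = {          # text terms -> canonical concepts
    "bashir": "al_bashir",
    "cows": "cattle",
}

ONTOLOGY = {         # canonical concepts -> ontological categories
    "al_bashir": "agent",
    "cattle": "resource",
    "conflict": "task",
}

def code_tokens(tokens):
    """Return (concept, category) pairs for tokens the dictionary covers."""
    coded = []
    for tok in tokens:
        concept = ALIASES.get(tok.lower(), tok.lower())
        if concept in ONTOLOGY:
            coded.append((concept, ONTOLOGY[concept]))
    return coded

print(code_tokens(["Bashir", "raided", "cows"]))
# -> [('al_bashir', 'agent'), ('cattle', 'resource')]
```

In the actual D2M process, a trained coder vets these automatically generated mappings before the link-extraction step.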
1.5. Illustration of going from texts to networks
Our dictionary codes for different ontology categories, such as knowledge, resource, task, etc.
For a link to appear in the network, the coded concept appeared within n words of another coded
Fig. 2. An example of how a text is coded as a network.
concept, where n is set by the user. The concept networks can be viewed per ontological
category or they can be integrated. In this example (Fig. 2) text from the Sudan Tribune was
extracted and a user-generated content dictionary was applied by AutoMap. After the
processing, Bor would code to bor, an ethnic group in Sudan, and then be cross-classified as an
organization. Cattle would code to ecology_resource and then be classified as a resource, and the
word attacks would code to conflict and then be classified as a task. To generate the networks, we set a
window size of seven, which means that coded concepts within seven words of each other share a
link. This window size was found to be optimal for our data set, as it resulted in neither too many
nor too few links in the coded networks. AutoMap generates an XML file (DyNetML) that is imported
into ORA, a network analysis software system that reads each ontology category (i.e.,
knowledge, resource, and task, etc.) as a separate network and also integrates ontology
categories. In ORA, social network metrics can be computed. Moreover, network visualizations
can be generated that show the structure, patterns, and strength of relationships among
categories and the network overall. For example, in Fig. 2, we see that the tie between conflict
and ecology_resource is thicker than other ties—indicating multiple co-occurrences of these
concepts in the text and a strong link between conflict and ecology_resource (in this case, cattle).
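The proximity-based linking just described can be sketched as follows. The three dictionary entries are illustrative stand-ins for the real RER dictionary, and we treat ‘‘within seven words’’ as a positional distance of less than seven.

```python
from collections import Counter
from itertools import combinations

# Sketch of proximity-based link extraction with a window of seven:
# any two coded concepts within seven words of each other gain a
# link, and repeated co-occurrences raise the link weight.

DICTIONARY = {"bor": "bor", "cattle": "ecology_resource", "attacks": "conflict"}

def extract_links(words, window=7):
    # Keep only dictionary hits, remembering each hit's position.
    hits = [(i, DICTIONARY[w.lower()]) for i, w in enumerate(words)
            if w.lower() in DICTIONARY]
    links = Counter()  # undirected links with cumulative weights
    for (i, a), (j, b) in combinations(hits, 2):
        if j - i < window and a != b:
            links[tuple(sorted((a, b)))] += 1
    return links

text = "Bor youths launched attacks and seized cattle near the river".split()
print(extract_links(text))
```

The cumulative weights correspond to the tie thickness shown in Fig. 2: links observed repeatedly across a text accumulate higher weights.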
One concern, however, is how accurately these codes capture the meaning and intent of
the original texts. To investigate this issue, we now turn to the accuracy of our content
dictionaries in comparison to other methods for coding, including the use of human coders only
and an automated approach. In addition, we demonstrate the visual and analytical capabilities
of using AutoMap and ORA in the context of coding ethnographic data to answer substantive
questions about social groups.
2. Research design and methods
Here we analyzed 25 paragraphs of an HRAF coded ethnography, Shaping of Somali Society:
Reconstructing the History of a Pastoral People, 1600–1900 (Cassanelli, 1982), where human
coders hand-coded each paragraph according to the OCM code book. We compare the human-coded OCM data to computer-processed data coded by the automated content dictionary of the
OCM categories, which we will call AOCM categories, and the semi-automated Rapid
Ethnographic Retrieval process (RERs).
The OCM coded files were obtained from the HRAF Archive of Ethnography, which includes
over a million pages of ethnographies, where each paragraph has been hand coded—or, more
precisely, indexed—by professional anthropologists according to culture traits and cultural groups
using the OCM categories (Murdock, 1987) and Outline of World Cultures (OWC) (Murdock,
1983). The intent of coding each section (a single paragraph or group of paragraphs) in the
ethnographies is to facilitate cross-cultural research (ethnological studies). Researchers identify a
cultural trait of interest and sample texts—in this case, ethnographies—containing those traits
across multiple cultures using the Standardized Cross Cultural Sample (Murdock and White,
1969) or other sampling techniques. The HRAF are electronically available and the OCM
categories span a wide range of culture traits, thus making HRAF an ideal source of texts to
compare coding approaches. The OCM categories are posted at the end of each paragraph or
group of paragraphs that have been manually coded by HRAF employees.
To generate the automated AOCM codes, we used a highly efficient approach. We spent one
day automating the coding of the OCM categories. First we downloaded the descriptions of each
of the OCM categories as provided on the HRAF website or in the OCM manual (Murdock,
1987). We then parsed out each independent word or phrase using punctuation characters and
created a dictionary, also known as a thesaurus, where each independent word or phrase mapped to
the corresponding OCM code, which we then call an AOCM code. The OCM descriptions are
incomplete; for example, the fishing category (OCM_226) lists fish and shellfish in the
description but does not include algae. In other cases, the descriptions are difficult to automate;
for example, economic importance of fishing, part of the description of OCM_226, was simply
not mapped. The purpose of our comparison is not to automate each OCM code precisely, but
rather to see how, by searching for a few of the descriptor words provided in the OCM
guidebook, fully automated techniques compare to manual and semi-automated techniques. Even if only a small
fraction of the text is coded correctly, we hope to understand what the computers code efficiently,
what human coders are required for, and how to improve upon this fully automated coding
approach.
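The AOCM construction can be approximated in a few lines. The description string below paraphrases an OCM entry for illustration rather than quoting the manual.

```python
import re

# Rough sketch of AOCM dictionary generation: split each OCM category
# description on punctuation and map every resulting word or phrase
# back to the category code.

def build_aocm(descriptions):
    """descriptions: {code: free-text description} -> {phrase: code}."""
    dictionary = {}
    for code, text in descriptions.items():
        for phrase in re.split(r"[;,.:()]", text):
            phrase = phrase.strip().lower()
            if phrase:
                dictionary[phrase] = code
    return dictionary

aocm = build_aocm({"OCM_226": "Fishing; shellfish gathering, fishing gear"})
print(aocm)
# -> {'fishing': 'OCM_226', 'shellfish gathering': 'OCM_226', 'fishing gear': 'OCM_226'}
```

As the text notes, such a dictionary inherits every gap in the source descriptions: a phrase absent from the description (e.g., algae) simply never codes to the category.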
In contrast to this efficient technique, it took us over a year to create the RER categories,
compiling lists of words within each category to build our dictionary. Each term in the
data set goes through a hierarchical classification system following from the term itself to the
RER code. The classification system begins with the broadest categories (i.e., the ontological
categories that include knowledge, resources and tasks, etc.). All of the verbs and their variants
that the human coders observed or could think of were assigned to the categories in the task
ontology. All terms that are resources, such as oil, cattle, or diamonds, were assigned to their
specific categories within the resource category. Other complex concepts representing
information were assigned to categories within the knowledge class. Each of these categories
has several sub-categories called RERs for Rapid Ethnographic Retrieval. The analyses
presented here use the finest level of resolution of RER categories. The RER approach is not a
direct match to the OCM, but we did use this code book as a guide to make sure that we
characterized the majority of cultural materials. At this time, a word can only code to one
category, though in the future, we plan to add multiple categories.
We compare the total number and content of codes found (recall), as well as the accuracy
(precision), of the AOCM, OCM, and RER approaches. We then compare the percentage match between
the human coded texts (OCM) and automated coding approaches (AOCM and RER). We identify
what types of additional codes the automated approaches are picking up on (false positives). We
then compare the automated approaches (AOCM and RER) for coding frequency and accuracy
for each coding event. Finally we visualize the AOCM and RER networks, and compare the
relationships among codes using network statistics.
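These comparison metrics reduce to simple set arithmetic. The code sets below are invented for illustration; in the study, accuracy judgments came from a human coder.

```python
# Toy illustration of the comparison metrics: recall as the number of
# codes an approach finds, precision as the share judged accurate,
# and percent match between two approaches' code sets.

def precision(found, accurate):
    return len(accurate & found) / len(found)

def percent_match(reference, candidate):
    return len(reference & candidate) / len(reference)

human = {"fishing", "kinship", "conflict"}          # hand-coded set
auto = {"fishing", "conflict", "weather", "trade"}  # machine-coded set
accurate = {"fishing", "conflict"}                  # vetted by a human coder

print(f"recall: {len(auto)} codes, "
      f"precision: {precision(auto, accurate):.0%}, "
      f"match: {percent_match(human, auto):.0%}")
# -> recall: 4 codes, precision: 50%, match: 67%
```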
The RER network contains coded concepts from three different ontologies—knowledge,
task and resource—that are each represented in the network with a different icon. These
categories were selected as they are central to our content domain. The AOCM does not have
these distinct categories; only one network is generated, and all nodes are represented with the
same icon. In order to make the two networks more comparable to each other, ORA was used to
merge the three RER ontologies into one. The task and resource ontologies were moved into
knowledge. This allows network-level metrics to include all connected nodes in order to
produce a single set of measurements. For visual clarity, only the main component of each
network is visualized and all isolates (i.e., nodes not connected to any other node) were
deleted. Both networks were symmetrized and then characterized by computing the degree
centrality and betweenness centrality for each coded concept. Degree centrality measures how
many times a concept was adjacent to any other concept; a concept with high degree is tied to
many other concepts in the text. Betweenness centrality measures the extent to which a code or
node lies on the shortest paths between other nodes; it reflects how important a node is in
connecting other nodes or bridging concepts. Since the OCM data were coded at the paragraph
level, they were not included in the network-level comparison, to avoid comparability problems.
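A stdlib-only sketch of this post-processing: symmetrize the links, keep only the main (largest) component, and compute raw degree centrality. Betweenness centrality would follow the same pattern with shortest-path counting (e.g., Brandes' algorithm); the edge list here is invented for illustration.

```python
from collections import deque

def symmetrize(edges):
    """Build an undirected adjacency map from a directed edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def main_component(adj):
    """Return the largest connected component (isolates drop out)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:                       # breadth-first search
            for nxt in adj[queue.popleft()] - comp:
                comp.add(nxt)
                queue.append(nxt)
        seen |= comp
        best = max(best, comp, key=len)
    return best

edges = [("conflict", "cattle"), ("cattle", "biomes"), ("rain", "drought")]
adj = symmetrize(edges)
core = main_component(adj)                 # minor components drop out
degree = {n: len(adj[n]) for n in core}    # raw degree centrality
print(sorted(core), degree["cattle"])
```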
3. Results
3.1. Accuracy assessment and unique codes
The highly efficient automated AOCM technique had a high recall rate but low precision, i.e.,
it identified the highest number of unique codes (138); however, this approach was also the least
accurate1 (only 38% correct matches; see Table 1). The human coded (OCM) technique resulted
in low recall but high precision in that it found the fewest codes, but these codes were all accurate
(100%). The content dictionary based semi-automated RER approach was also highly accurate
(96%) and identified 84 codes, and it had the highest balance of recall and precision out of the
tested methods.
Out of the 33 different concepts coded for in the human-coded OCM, 24 (73%) were
picked up by the AOCM automated approach (Table 2). Twenty-nine of the OCM codes (88%)
and 42 (80%) of the AOCM codes could be accounted for in the semi-automated RER approach
through similar code matches (similar matches were used because the RER code book does not
precisely match the OCM or AOCM code book). The RER approach correctly coded for 24
additional concepts not picked up by the AOCM and 46 additional codes not picked up by the
OCM.
In comparing the automated and semi-automated approaches (all coding events), the
automated AOCM approach coded (or recalled) 593 items of text throughout the 25 segments
analyzed, of which 32% of the coding events were accurate (precise) because they reflected the
context of the original text (Table 3). The semi-automated RER coded (recalled) many more
items of text (824) than the AOCM, and nearly tripled the accuracy (precision), with 93% of
coding events correctly coding for the concept.
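As a consistency check (assuming the accurate-event counts of 190 and 765 reported in Table 3), the stated percentages follow directly:

```python
# 190 accurate of 593 AOCM coding events, and 765 accurate of 824 RER
# coding events, round to the reported 32% and 93% precision figures.

aocm_precision = 190 / 593
rer_precision = 765 / 824
print(f"AOCM: {aocm_precision:.0%}, RER: {rer_precision:.0%}")
# -> AOCM: 32%, RER: 93%
```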
3.2. Example of errors and breadth of coding
The following texts are excerpts from The Shaping of Somali Society: Reconstructing the History
of a Pastoral People, 1600–1900 (Cassanelli, 1982), and they demonstrate coding with OCM,
AOCM, and RER codes at the sentence level. In the following texts, the bold words were coded to
the italicized terms in parentheses, labeled by AOCM or RER classification. The OCM codes
appear at the end of each excerpt.
1 A code was considered accurate if it reflected the text at least one time.
T. Van Holt et al. / Poetics 41 (2013) 366–383
Table 1
The semi-automated RER approach has the highest balance between recall (total codes identified) and precision (codes
that were accurately coded). The automated AOCM approach has high recall but low precision. The human-coded OCM
codes had the highest precision but lowest recall. Accuracy was determined by a human coder.

Coding method    Total codes found    Accurate codes          Inaccurate codes
                                      Total      Percent      Total      Percent
OCM              33                   33         100%         0          0%
AOCM             138                  52         38%          86         62%
RER              84                   81         96%          3          4%

Note: Accuracy was determined if the code was correct at least 1 time.
Table 2
Percent match among human-coded (OCM), automated (AOCM) and semi-automated (RER) coded concepts. The number
of additional codes found in each comparison is identified as well.

          Compared with AOCM                            Compared with RER
          Number     Percent    Total additional        Number     Percent    Total additional correct
          match      match      codes found             match      match      codes found
OCM       24 a       73%        28                      29 b       88%        46
AOCM                                                    42 c       80%        24

Note: Accuracy was determined if the code was correct at least 1 time. Total additional codes found in the RER analysis
may not sum to 81 because multiple RERs may closely match a single OCM or AOCM code.
a See Appendix 1 for a list of OCM and AOCM codes found and those that matched.
b See Appendix 2 for a list of OCM and RER codes found and those that matched.
c See Appendix 3 for a list of AOCM and RER codes found and those that matched.
Table 3
In terms of total coding events (individual times a concept is coded), the semi-automated (RER) approach had higher
recall and precision than the automated (AOCM) approach.

                                                    AOCM     RER
Recall (total coding events)                        593      824
Accurate coding events                              190      765
Precision (percent of events that were accurate)    32%      93%
Although they were outstanding (RER adjective) breeders of camels (RER livestock),
cattle (AOCM 233 pastoral activities; RER livestock), and horses (RER livestock), they
never raised animals (AOCM 363 streets and traffic; RER fauna) specifically (RER adverb
neg) to meet the needs of any large nonpastoral population (RER population).
In the northern part of the country, these goods were available annually at the winter
bazaars (AOCM 443 retail marketing) in the coastal (AOCM 131 location; RER biomes
and land cover) towns (AOCM 632 towns) and periodically from itinerant traders who set
up makeshift shelters near water (AOCM 318 environment quality; RER biomes and land
cover) holes or wadis (AOCM 527 rest days and holidays) in the interior (AOCM 131
location; RER valence).
OCM codes: 221_annual_cycle, 233_pastoral_activities, and 439_external trade.
All coding schemes picked up on pastoral activities well. An example of a positive match is
AOCM 233 pastoral activities and RER livestock both coding for cattle. Water was coded as
AOCM 318 environment quality under the AOCM code book, which is in the same domain, but
not a precise code. The OCM codes picked up on an OCM_221 annual cycle, which neither the
RER nor AOCM codes accounted for. The RER, in addition, coded for parts of speech,
population, valence, and biomes and land cover. False positives in the above text include AOCM
363 streets and traffic for animals and AOCM 527 rest days and holidays for wadis.
Gellner formulates the distinction as ‘‘simple’’ vs. ‘‘symbiotic’’ nomadism (AOCM 221
annual cycle; RER livestock) and he sees the latter (RER adjective) as characteristic (RER
valence) of Middle (AOCM 911 chronologies and culture sequences) Eastern pastoralists
(RER livestock).
OCM codes: 221 annual cycle, 233 pastoral activities, 619 tribe and nation
The AOCM and OCM coding of the above text both coded for annual cycle. The AOCM
picked up on the word nomadism, which was an element of the OCM dictionary, whereas the
OCM coder read the paragraph and coded it as annual cycle. The RER and OCM
coding both picked up on livestock. Only the OCM coded for tribe and nation. The AOCM
approach falsely coded AOCM 911 chronologies and culture sequences for Middle.
3.3. Network-level analysis
The network-level characteristics generated via the semi-automated RER and automated
AOCM show that much more information is generated via the RER approach. The RER approach
generated 73 nodes, as opposed to 33 AOCM nodes, since only accurate codes were displayed
(Table 4).
Table 4
The word co-occurrence network generated via the semi-automated (RER) approach produced a slightly denser network,
coding for more concepts and with more connections among concepts, in comparison to the automated (AOCM) generated
network.

                             AOCM      RER
Node count                   33        73
Clustering coefficient
  Network density            0.312     0.338
  Node average               0.312     0.338
  Standard deviation         0.365     0.180
  Min                        0.000     0.000
  Max                        1.000     0.619
Betweenness
  Network centralization     0.403     0.108
  Node average               0.085     0.021
  Standard deviation         0.117     0.030
  Min                        0.000     0.000
  Max                        0.475     0.136
Total degree
  Network centralization     0.034     0.059
  Node average               0.012     0.019
  Standard deviation         0.011     0.027
  Min                        0.003     0.001
  Max                        0.044     0.155
Table 5
Degree centrality for nodes (concepts) coded by the semi-automated (RER) and automated (AOCM) approaches. A higher
degree signifies that the concept was tied to more other coded concepts in the network.

AOCM                                                      Degree    RER                        Degree
592 household                                             0.044     Livestock                  0.155
131 location                                              0.044     Kinship                    0.136
632 towns                                                 0.037     Political boundary         0.078
121 theoretical orientation in research and its results   0.027     Biomes and land cover      0.064
721 instigation of war                                    0.024     Conflict                   0.063
613 lineages                                              0.020     Economy                    0.054
443 retail marketing                                      0.017     Land use                   0.043
312 water supply                                          0.001     Land resource              0.039
The RER network was denser; that is, there were more connections across concepts, and it
was easier for one concept to connect with another concept in the RER approach. This effect
can be a simple function of the method's higher recall rate.
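Network density and normalized degree centrality, as used in this comparison, can be illustrated on a small co-occurrence network; the nodes, edges, and helper functions below are our own toy example, not the study's actual network or ORA's implementation:

```python
# Density and normalized degree centrality for an undirected concept
# co-occurrence network. The node and edge lists are a toy example.

def density(nodes, edges):
    """Share of possible ties that are present: E / (N * (N - 1) / 2)."""
    possible = len(nodes) * (len(nodes) - 1) / 2
    return len(edges) / possible

def degree_centrality(nodes, edges):
    """Ties per node, normalized by the maximum possible (N - 1)."""
    deg = {v: 0 for v in nodes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return {v: d / (len(nodes) - 1) for v, d in deg.items()}

nodes = ["livestock", "kinship", "conflict", "economy"]
edges = [("livestock", "kinship"), ("livestock", "conflict"),
         ("kinship", "economy"), ("livestock", "economy")]

print(f"density = {density(nodes, edges):.3f}")  # 4 of 6 possible ties
print(degree_centrality(nodes, edges))           # "livestock" is most central
```

A denser network simply has a larger share of its possible ties realized, which is why higher recall alone can raise density.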
By analyzing the eight most important codes from a network perspective in terms of degree
centrality and betweenness centrality, we can see that AOCM and RER picked up on many
similar concepts and these concepts had high degree—meaning that they were salient concepts in
the analyzed text (Table 5 and Fig. 3). The AOCM network shows AOCM 592 household and
AOCM 131 location as sharing the highest degree measurement. The RER network analysis
shows that RER livestock had the most ties to other concepts in the text (high degree centrality),
while the AOCM had no similar corresponding code in its network. AOCM 613 lineages and
AOCM 592 household may be representative of the RER kinship. AOCM 632 towns and AOCM
131 locations may represent RER political boundary. AOCM 721 instigation of war is similar to
RER conflict. AOCM 443 retail marketing is similar to RER economy. AOCM 121 theoretical
orientation in research and its results had no similar code in the RER network because this code
was not represented in the RER content dictionary. The RER approach picked up on more
environmental concepts (RER Biomes and Land Cover and RER Land Resource), and these
concepts had high degree centrality, meaning that they were salient in the text; however, these
terms were not picked up by the AOCM approach.

Fig. 3. Main components for the automated (AOCM) and semi-automated (RER) approaches. Concept nodes are sized by
degree centrality.

Table 6
Betweenness centrality for nodes (concepts) coded by the semi-automated (RER) and automated (AOCM) approaches. A
higher betweenness centrality signifies that the concept is in a unique position because it bridges together other coded
concepts in the network.

AOCM                                                      Betweenness    RER                          Betweenness
592 household                                             0.475          Livestock                    0.136
121 theoretical orientation in research and its results   0.356          Kinship                      0.114
721 instigation of war                                    0.332          Land resource                0.105
613 lineages                                              0.240          Political boundary           0.090
312 water supply                                          0.229          Economy task                 0.083
131 location                                              0.187          Agriculture                  0.073
443 retail marketing                                      0.158          Intermediate conflict task   0.070
162 composition of population                             0.121          Biomes and land cover        0.061
In both networks, kinship, political boundaries, and intermediate conflict were important
bridging concepts (Table 6 and Fig. 4). Of course, since the AOCM did not code for livestock,
that concept did not appear as an important bridging component, nor did any of the
environmental terms, in contrast to the RER analysis. AOCM 162 composition of population
and RER agriculture ranked high on betweenness centrality (as opposed to degree centrality).
Terms associated with livestock (animal by-products; bone, horn, and shell technology) do not
appear highly central in the AOCM approach. The AOCM approach may have more false
positives, but most of the major concepts and main points appear in both the AOCM and RER
approaches when using the degree and betweenness centrality metrics that focus on the most
important topics. On the other hand, livestock, an important bridging concept, was not picked
up by the automated approach; this can be remedied by coding for these terms.
Fig. 4. Main components for the automated (AOCM) and semi-automated (RER) approaches. Concept nodes are sized by
betweenness centrality.
4. Discussion
The large amount of time needed to develop the semi-automated RER code scheme is
justified because the RER approach had the highest balance of recall and precision: the RER
codes matched 88% of the similar concepts that the human coder found. In contrast, the automatically
generated dictionaries can likely pick up on broad concepts, but need to be refined for more
precise coding. This finding agrees with prior research which has shown that, even though
network metrics run on matrices extracted with different methods from the same underlying text
data, the agreement in highest scoring key players can still be high (Diesner, 2012). This prior
work has also shown that the automated mapping of text terms to ontological categories—as
provided in AutoMap and used herein—generalizes strongly across different genres, with the
genres considered being email data, news wire data and funded research proposals (Diesner,
2012). Technologies that support creation of codes like the RER are needed, and indeed, due to
this work were developed and have been shown to reduce coding time significantly (Carley et al.,
2011b). The systematicity of the AOCM and RER are such that they generate on average more
consistent coding than will human coding, particularly for large corpora. Another feature of this
work is that the RER codes are not specific to Somalia or this data and can be re-used. Currently,
they are being used to code a large corpus of texts (all news articles for eight years from the Sudan
Tribune online newspaper; Van Holt et al., 2012). However, adaptation to other genres and
content domains might still require additional human labor.
As noted above, we added new coding terms. We have referred to these as RER terms. These
were added to the original OCM codes for the simple reason that technology, social evolution and
advances in science leading to the discovery of new drivers of cultural change had led to factors of
interest that we wanted to code for that were not part of the original set of the OCM codes. We note
that this is likely to be the norm and that a canonical set of comprehensive terms is unlikely to
emerge in the near future. This makes the need for technologies that support rapid
human-in-the-loop coding, like RER, critical. Moreover, machine learning methods in general,
and statistical natural language processing methods in particular, such as bootstrapping and
semi-supervised domain adaptation, have been shown to be useful for adapting existing
dictionaries to new domains and data sets with no or minimal human intervention (Daumé,
2007; Gupta and Sarawagi, 2009).
4.1. Error analysis and word sense disambiguation
Because each automated code (AOCM and RER) picked up on a word or group of words, a
word can be taken out of context and inappropriately classified. As of now, our dictionary can
only code each term to one concept, so word sense disambiguation is a concern (see the
examples below); such ambiguities account for most of the false positives. For example, RER
ecological concepts was coded in three instances, each time triggered by nature, but the text
was not in an ecological context.
The fluid and pragmatic nature of Somali politics. . . (text 33)
. . .the fragmented nature of the Somalis’ historical experience. . . (text 38)
One concerns the nature of the Somali ‘‘conquest’’ of the Horn of Africa. . . (text 46)
Also, RER oil appeared once throughout the texts but incorrectly coded for reserves, which in the
document referred to animal reserves and not oil reserves.
. . . access to dry season grazing reserves. . . (text 33)
Author's personal copy
380
T. Van Holt et al. / Poetics 41 (2013) 366–383
The RER property incorrectly coded for will. A will may be considered property if it refers to a
last will and testament, but in the majority of cases it will be used as a verb to proclaim future
action or intention, such as it does in the analyzed texts.
In later chapters I will describe. . . (text 35)
‘‘If a man (of the town) lives long enough, he will get to see everything. . . (text 42)
Lambs conceived on that night will be born about 150 days later. . . (text 71)
One possibility would be to use linguistic context to provide dual meanings. We could ask the
computer to search for the phrase nature of and code it to another concept, such as RER
characteristics (a new category), or move nature of to the delete list for this analysis. Another
possibility is to use the phrase a will to code for the document sense that is related to property.
Also, AutoMap can analyze parts of
speech, which helps to disambiguate terms. For example, will can be a noun (in the sense of a
document) or a verb. For a human coder, this disambiguation needs to be done on a per concept
basis because the user must go back to the original text and see how well the analysis reflects the
meaning of the text.
Of the 98 correct RER codes, 19 had both correct and incorrect coding instances. The RER
fauna accounted for the most incorrect coding occurrences from a single RER, with ten incorrect
and only eight correct. In every instance of incorrect coding, horn coded to RER fauna but in a
context referring to the Horn of Africa. Horn of Africa could be coded to RER boundary and
horn could remain RER fauna. If such a change is made, it is important to program the software
so that horn of Africa is searched for first and horn is searched for second.
4.2. Network analysis
The network analysis on the resulting data quantifies how the concepts are related. For general
concepts, the AOCM and RER network analyses showed similar findings. If this RER coding
system and network analysis were used to index HRAF texts, cross-cultural texts could be
sampled by how salient a concept was in a text or how a concept was related to another code. So,
we could find out not only which texts discussed kinship, but also we could see the relationships
between kinship and other concepts. We then could analyze the structure of those texts with the
RER to ontology classification to see how the structure of society is distinct and relates to
ecological and geospatial attributes, for example. Finally, we could evaluate which texts had
kinship as more or less central concepts (degree centrality) and when kinship was a key part of
society that bridged together other parts of society (betweenness centrality). In fact, new types of
ethnological studies could be developed by creating units of analysis for comparison based on the
relationships among and within cultures, societies, ethnic groups, etc., rather than the
characteristics of a single culture or subgroup alone.
4.3. Conclusions
Content analysis of newspaper articles, blogs, and other text data resources offers social
scientists a new approach for the rapid ethnographic assessment of text-based data sources, where
large-scale, over time data can be integrated with other concepts that address human behavior.
The considered automatically generated dictionary (AOCM) approach had a high recall rate
(many items coded for) but low precision (many false positives), in contrast to the semi-automated
RER approach, which had few false positives and matched the gold standard Human Relations
Area Files (OCM). The RER approach, however, had higher recall and picked up on even more
concepts
than the human coders (OCM). The main false positives in the RER analysis were word sense
disambiguation issues: words that coded for more than one concept. What does that
mean for other text coding projects? When large data sets are available, precision is often more
crucial than recall, which makes RER the more suitable approach. When only a small data set is
to be coded, recall is often considered more important than precision, which means that AOCM
might be a useful strategy, though it will require manual refining of the generated dictionary. As
another remedy, we hope to use parts of speech to resolve some of these issues in the future.
When comparing only the correctly coded AOCM and RER via a network approach, both
coding schemes picked up on broadly similar topics (the top eight topics as per betweenness and
degree centrality network measures). Network metrics help to characterize salient concepts
across texts. Although we took the union of separate ontologies for purposes of comparison, our
approach allows us to visualize and analyze multiple concepts, such as knowledge, resources,
and tasks, and therefore to integrate data from multiple disciplines over space and time. The
network metrics and visualization can transform how ethnological studies are conducted, by
basing the units of analysis for comparison on relationships among and between cultures in
addition to characteristics within cultures. Internet resources and other digitized textual data
sources allow us to view data over time at a finer resolution. In combination, this offers a new
analytical pathway for improving our understanding of human behavior with the use of textual
data sources.
Acknowledgements
This work is supported, in part, by the Office of Naval Research (ONR), United States Navy
(ONR MURI N000140811186). Additional support was provided by the Center for
Computational Analysis of Social and Organizational Systems (CASOS). The views and
conclusions contained in this document are those of the authors and should not be interpreted as
representing the official policies, either expressed or implied, of the Office of Naval Research, or
the U.S. government.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at https://
dx.doi.org/10.1016/j.poetic.2013.05.004.
References
Abello, J., Broadwell, P., Tangherlini, T.R., 2012. Computational folkloristics. Communications of the ACM 55 (7) 60–70.
Bernard, H.R., Ryan, G.W., 2009. Analyzing Qualitative Data: Systematic Approaches. Sage, Thousand Oaks, CA.
Boster, J.S., Johnson, J.C., 1989. Form or function: a comparison of expert and novice judgment of similarity among fish.
American Anthropologist 91, 866–889.
Bourdieu, P., Wacquant, L.J.D., 1992. An Invitation to Reflexive Sociology. University of Chicago Press, Chicago.
Brandt, P.T., Freeman, J.R., Schrodt, P.T., 2011. Real time, time series forecasting of inter- and intra-state political
conflict. Conflict Management and Peace Science 28 (1) 40–63.
Carley, K.M., 1994. Extracting culture through textual analysis. Poetics 22, 291–312.
Carley, K.M., Diesner, J., Reminga, J., Tsvetovat, M., 2007. Toward an interoperable dynamic network analysis toolkit:
decision support systems (Special Issue on Cyberinfrastructure for Homeland Security). Advances in Information
Sharing, Data Mining, and Collaboration Systems 43 (4) 1324–1347.
Carley, K.M., Columbus, D., Bigrigg, M., Kunkel, F., 2011a. AutoMap User’s Guide 2011. Technical Report CMU-ISR-11-108. Carnegie Mellon University, School of Computer Science, Institute for Software Research.
Carley, K.M., Reminga, J., Storrick, J., Columbus, D., 2011b. ORA User’s Guide 2011. Technical Report CMU-ISR-11-107. Carnegie Mellon University, School of Computer Science, Institute for Software Research.
Carley, K.M., Bigrigg, M.W., Diallo, B., 2012. Data-to-model: a mixed initiative approach for rapid ethnographic
assessment. Computational and Mathematical Organization Theory 18 (3) 300–327.
Cassanelli, L.V., 1982. The Shaping of Somali Society: Reconstructing the History of a Pastoral People, 1600–1900.
University of Pennsylvania Press, Philadelphia.
CRA, 2001. Analyses of News Stories on the Terrorist Attack. Available at: https://locks//locsks.asu.edu/terror.
Danowski, J.A., 1993. Network analysis of message content. Progress in Communication Sciences 12, 198–221.
Daumé, H., 2007. Frustratingly easy domain adaptation. In: Proceedings of 45th Annual Meeting of the Association of
Computational Linguistics (ACL), Prague, Czech Republic, pp. 256–263.
De Nooy, W., 2003. Fields and networks: correspondence analysis and social network analysis in the framework of field
theory. Poetics 31, 305–327.
Diesner, J., 2012. Uncovering and Managing the Impact of Methodological Choices for the Computational Construction
of Socio-Technical Networks from Texts. Technical Report, Carnegie Mellon CMU-ISR-12-101. (Ph.D. Thesis).
Diesner, J., Carley, K.M., 2008. Conditional random fields for entity extraction and ontological text coding. Journal of
Computational and Mathematical Organization Theory 14, 248–262.
Diesner, J., Carley, K., 2010. Extraktion relationaler Daten aus Texten (Relation extraction from text). In: Stegbauer, C.,
Häußling, R. (Eds.), Handbuch Netzwerkforschung (Handbook of Network Research). VS Verlag, Wiesbaden, pp.
507–521.
Gerner, D., Schrodt, P., Francisco, R., Weddle, J., 1994. Machine coding of event data using regional and international
sources. International Studies Quarterly 38 (1) 91–119.
Godoy, R., Reyes-García, V., Byron, E., Leonard, W., Vadez, V., 2005. The effect of market economies on the well-being
of indigenous peoples and on their use of renewable natural resources. Annual Review of Anthropology 34, 121–138.
Gupta, R., Sarawagi, S., 2009. Domain adaptation of information extraction models. ACM SIGMOD Record 37 (4)
35–40.
Holsti, O.R., 1969. Content Analysis for the Social Sciences and Humanities. Addison-Wesley, Reading, MA.
Johnson, J.C., Griffith, D.C., 1998. Visual data: collection, analysis, and representation. In: de Munck, V., Sabo, E.
(Eds.), Using Methods in the Field: A Practical Introduction and Casebook. Altamira Press, Walnut Creek, CA,
pp. 211–228.
Johnson, J.C., Krempel, L., 2004. Network visualization: ‘‘The Bush Team’’ in Reuters News Ticker 9/11-11/15. Journal
of Social Structure 5 (4) .
Johnson, J.C., Orbach, M.K., 2002. Perceiving the political landscape: ego biases in cognitive political networks. Social
Networks 24, 291–310.
King, G., Lowe, W., 2003. An automated information extraction tool for international conflict data with performance as
good as human coders: a rare events evaluation design. International Organization 57 (3) 617–642.
Krinsky, J., 2010. Dynamics of hegemony: mapping mechanisms of cultural and political power in the debates over
workfare in New York City, 1993–1999. Poetics 38, 625–648.
Mack, A., 2007. Global Political Violence: Explaining the Post-Cold War Decline, Coping with Crisis. Working Paper
Series. International Peace Academy, New York.
Murdock, G.P., 1983. Outline of World Cultures, 6th edition. Human Relations Area Files, New Haven, CT.
Murdock, G.P., 1987. Outline of Cultural Materials, 5th edition. Human Relations Area Files, New Haven, CT.
Murdock, G.P., White, D.R., 1969. Standard cross-cultural sample. Ethnology 8, 329–369.
National Research Council, 2008. Behavioral modeling and simulation: from individuals to societies. Committee on
Organizational Modeling from Individual to Societies (G.L. Zacharias, J. McMillan, H. Arrow, S.P. Borgatti, R.
Burton, K.M. Carley, C. Dibble, E. Hudlicka, J.C. Johnson, S.E. Page, A. Sage, L.S. Tesfatsion, and M.J. Zyda). In:
Zacharias, G.L., McMillan, J., Van Hemel, S. (Eds.), Board on Behavioral, Cognitive, and Sensory Sciences,
Division of Behavioral and Social Sciences and Education. The National Academies Press, Washington, DC.
Neuendorf, K.A., 2002. The Content Analysis Guidebook. Sage, Thousand Oaks, CA.
O’Reilly, T., 2005. What is Web 2.0? Design Patterns and Business Models for the Next Generation Software. Available
at: https://oreilly.com/web2/archive/what-is-web-20.html.
Osgood, C., 1959. Suggestions for winning the real war with communism. Journal of Conflict Resolution 3, 295–325.
Reyes-García, V., Byron, E., Vadez, V., Godoy, R., Limache, E.P., Leonard, W.R., Wilkie, D., 2004. Measuring culture as
shared knowledge: do data collection formats matter? Cultural knowledge of plant uses among Tsimane’ Amerindians,
Bolivia. Field Methods 16 (2) 135–156.
Rinner, D., Kebler, C., Andrulis, S., 2008. The use of Web 2.0 concepts to support deliberation in spatial decision-making.
Computers, Environment, and Urban Systems 3, 386–395.
Roberts, C.W., 1997. A generic semantic grammar for quantitative text analysis: applications to East and West Berlin
radio news content from 1979. Sociological Methodology 27, 89–129.
Schneider, G., Gleditsch, N.P., Carey, S., 2011. Forecasting in international relations: one quest, three approaches. Conflict
Management and Peace Science 28 (1) 5–14.
Schrodt, P.A., Gerner, D.J., 1994. Validity assessment of a machine-coded event data set for the Middle East, 1982–92.
American Journal of Political Science 38 (3) 825–854.
Spradley, J., 1979. The Ethnographic Interview. Wadsworth Group, Belmont, CA.
Stone, P.J., Bales, R.F., Namenwirth, J.Z., Ogilvie, D.M., 1962. The General Inquirer: a computer system for content
analysis and retrieval based on the sentence as a unit of information. Behavioral Science 7, 484–498.
van Cuilenburg, J., Kleinnijenhuis, J., de Ridder, J., 1986. A theory of evaluative discourse: towards a graph theory of
journalistic texts. European Journal of Communication 1 (1) 65–96.
Van Holt, T., 2012. Landscape influences on fisher success: adaptation strategies in closed and open access fisheries in
Southern Chile. Ecology and Society 17 (1) 28–44.
Van Holt, T., Johnson, J., Brinkley, J., Carley, K., Caspersen, J., 2012. Structure of ethnic violence in Sudan: a semi-automated network analysis of online news (2003–2010). Computational and Mathematical Organization Theory 18
(3) 340–355.
Whiting, J.W.M., 1986. George Peter Murdock (1897–1985). American Anthropologist 88 (3) 682–686.
Dr. Tracy Van Holt’s interests include human–environment interactions as they relate to natural resource use, the
consequences of landscape and environmental change, climate change, conflict, and sustainable development. She works
with communities and research questions in tropical and temperate ecosystems as well as wetland and coastal
environments. She integrates geospatially explicit remotely sensed data with social and environmental data.
Dr. Jeffrey C. Johnson’s interests include the influence of technological and environmental factors on the organization of
work, leisure, and cognition, particularly in groups in extreme and isolated environments. He has focused a major portion
of his teaching and research program around the use of social network theories and methods for understanding social
structure and organization. His recent substantive interests have focused on the relationship between cognition and social
structure. The bulk of his research has focused on these concerns among the maritime peoples of the Pacific basin,
especially the insular Central Pacific, the Caribbean, and coastal North America. Interdisciplinary in both training and
orientation, he has had teaching experience in economics, anthropology, sociology, statistics, and Pacific studies.
Dr. Kathleen M. Carley’s interests include dynamic network analysis, computational, social and organization theory,
adaptation and evolution, text mining, and the impact of telecommunication technologies and policy on communication,
information diffusion, belief evolution, disease contagion and response within and among groups particularly in disaster
or crisis situations. In her research, she combines results and approaches from cognitive science, organization science,
social networks and computer science to address complex social and organizational problems. She and her Center for
Computational Analysis of Social and Organizational Systems (CASOS) have developed advanced technologies for
network analytics and visualization (ORA), network extraction from texts (AutoMap), and network evolution and
diffusion (Construct and BioWar).
James Brinkley is a PhD student in the Coastal Resource Management program and a graduate research assistant for the
Institute for Coastal Science and Policy at East Carolina University. His interests include coastal hazards, cultural
knowledge of maritime communities, coastal and environmental planning, and resource conflict issues. Current work in
his interdisciplinary academic program involves using various social science and geospatial analyses to study cultural
understanding of coastal natural hazards.
Jana Diesner is an Assistant Professor at the Graduate School of Library and Information Science at the University of
Illinois at Urbana-Champaign. Jana conducts research at the nexus of network analysis, natural language processing and
machine learning. Her goal is to contribute to the computational analysis and better understanding of the interplay and coevolution of information and networks. She develops and analyzes methods and technologies for extracting network data
from text corpora and considering the content of information for network analysis. She studies networks from the
business, science and geopolitical domain, and is particularly interested in covert information and covert networks.