Reducing networks of ethnographic codes co-occurrence
in anthropology
Alberto Cottica, Veronica Davidov, Magdalena Góralska, Jan Kubik, Guy
Melançon, Richard Mole, Bruno Pinaud, Wojciech Szymański
To cite this version:
Alberto Cottica, Veronica Davidov, Magdalena Góralska, Jan Kubik, Guy Melançon, et al.. Reducing
networks of ethnographic codes co-occurrence in anthropology. International Conference on Quantitative Ethnography 2022, Oct 2022, Copenhagen, Denmark. pp.43-57, 10.1007/978-3-031-31726-2_4.
hal-03770039v2
HAL Id: hal-03770039
https://hal.science/hal-03770039v2
Submitted on 25 Oct 2022
Reducing networks of ethnographic codes
co-occurrence in anthropology ⋆
Alberto Cottica1[0000-0003-2527-6233], Veronica Davidov1,2[0000-0002-7098-4338],
Magdalena Góralska3,4[0000-0001-9491-6682], Jan Kubik4,5[0000-0003-0017-381X],
Guy Melançon6[0000-0003-3193-7261], Richard Mole4[0000-0002-4790-5654],
Bruno Pinaud6[0000-0003-4814-3273], and Wojciech Szymański4[0000-0002-2773-494X]
1 Edgeryders
2 Monmouth University
3 University of Warsaw
4 University College London
5 Rutgers University
6 Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800
Abstract. The use of data and algorithms in the social sciences allows
for exciting progress, but also poses epistemological challenges. Operations that appear innocent and purely technical may profoundly influence final results. Researchers working with data can make their process
less arbitrary and more accountable by making theoretically grounded
methodological choices.
We apply this approach to the problem of reducing networks representing ethnographic corpora. Their nodes represent ethnographic codes, and
their edges the co-occurrence of codes in a corpus. We introduce and discuss four techniques to reduce such networks and facilitate visual analysis. We show how the mathematical characteristics of each one are aligned
with a specific approach in sociology or anthropology: structuralism and
post-structuralism; identifying the central concepts in a discourse; and
discovering hegemonic and counter-hegemonic clusters of meaning.
Keywords: anthropology · ethnography · sociology · networks · reduction · visual analytics
1 Introduction
Since their inception, the social sciences have been split between qualitative
and quantitative approaches. One of their most challenging undertakings has
been to develop multi-method approaches that combine the strengths of both
and minimize their weaknesses. We are working on a method that relies on
both qualitative and quantitative techniques to increase the benefits of their
complementarity. The former are employed at the stage of data collection – via
in-depth interviews – and at the stage of analysis, when the ethnographically
established contextual knowledge is employed in an iterative interpretation of
the collected material in order to reveal repeatable, and thus in some sense
“deeper”, patterns of thought. Ethnographic coders – who are immersed in the
studied societies and cultures – generate rich sets of codes. We analyze them
not just to calculate frequencies of themes and motifs, but also to reveal their
pattern of connectivity, which we then render in compelling visualizations. In these
visualizations, an ethnographic corpus is represented as a network [11], whose
nodes correspond to ethnographic codes; the edges connecting them represent
the co-occurrence of codes in the same part of the corpus. We call this network
a codes co-occurrence network (CCN).
⋆ Supported by the European Commission’s Horizon 2020 programme, grant agreement 822682.
A problem that commonly arises is that CCNs are too large and dense for
human analysts to process visually. Network science has come up with several
algorithms to reduce networks, based on identifying and discarding the least
important edges in a network. It is relatively easy to apply them to this type
of graph. What is harder is to justify the choice of one or the other of these
techniques, and of the values assigned to the tuning parameters that they usually
require. In previous work, we have proposed criteria for choosing a technique
to reduce a CCN [10], and evaluated four candidate techniques against those
criteria. In this paper, we highlight the affinity of each of the four techniques
with a prominent method of analysis, associated in turn with a specific school
of thought in sociology or anthropology. Next, we use data from a study on
Eastern European populism to demonstrate how they work. Our objective is
to contribute to the rigor and transparency of the methodological choices of
researchers when dealing with large ethnographic corpora.
We proceed as follows. After discussing work related to our own, we introduce
the codes co-occurrence network, which is the network to be reduced. Next, we
lay out criteria for choosing a technique to reduce a CCN for qualitative analysis,
and introduce four such techniques. We then propose a mapping of reduction
techniques onto methods of analysis widely used in sociology or anthropology.
Finally, we proceed to apply them to our data, to show how the choice of a
reduction technique sheds light on a specific facet of the studied phenomena.
2 Related work
The turn towards big data, fueled by improvements in computing power, has led
to renewed faith in the ability of quantitative work to provide knowledge that
is more generalizable than that obtainable from qualitative studies, or from
quantitative projects relying on smaller numbers of cases, yet just as valid (that
is, knowledge that preserves some of the richness of case-derived insights) [3].
This has led to exciting progress. At the same time, however, it has highlighted a pressing need for methodological robustness. As scientific work based
on large datasets addresses increasingly precise questions, more steps are needed
to move from raw data to final result. As a consequence, the methods themselves
may be hard to check against the insights derived from intimate familiarity with
specific cases. In combination with “publish or perish” and with the premium
placed by journals on counterintuitive, glamorous results, this has led to various epistemological crises. The replication crisis in social psychology is the most
famous of them [27], but not the only one. For example, it is claimed that half
of the total expenditure on preclinical research in the US goes towards nonreplicable studies [15]; and that ostensibly innocent choices about data cleanup
prior to analysis might lead to divergent results [12]. Even controlled experiments with different researchers working with the same datasets on the same
research questions have led to spectacularly divergent results, for reasons that
are not yet entirely clear [33, 5].
Qualitative sociological and anthropological research is not expected to be
replicable. Rather, its claim to generating reliable knowledge comes from the
rigor and accountability of the methods applied systematically and self-consciously
to a specific case or a small range of cases in well-defined spatial and temporal
contexts. Therefore, careful, transparent choices about one’s method are necessary every step of the way, even more so when research applies mixed methods [3].
This paper is offered as a contribution to the literature on the significance of
such choices in a particular case: that of reducing semantic networks that express
qualitative data. The literature on semantic networks originates in computer science [35, 36, 42, 32]; its main idea is to use mathematical objects – graphs – to
support human reasoning. Building on this tradition, we focus on the idea of network reduction. In doing so, we factor in previous work on the cognitive limits of
humans to correctly infer the topological characteristics of a network from visual
inspection [16, 28, 29, 34]. Such work confirms that large and dense networks are
hard to process visually, and supports the case for network reduction.
It is important to maintain full awareness of the ways network reduction influences visual interpretation, and to account for them in the analysis. To enhance
accountability, we require our mathematical techniques to directly support the
specific requirements of knowledge creation in ethnography, and to be intuitive
enough to ethnographers. In this sense, this work is inscribed in the tradition
of scholars who aim to apply systematic visualization techniques, while still retaining sensitivity to informants’ contextual, interactional, and socioculturally
specific understandings of concepts [14, 19, 39, 6].
3 The codes co-occurrence network and its interpretation
Consider an annotated ethnographic corpus. We call any text data encoding the
point of view of one informant (interview transcript, field notes, post on an online
forum and so on) a contribution. Contributions are then coded by one or more
ethnographers. Coding consists of associating snippets of the contribution’s text
to keywords, called codes. The set of all codes in a study constitutes an ontology
of the key concepts emerging from the community being observed and pertinent
to that study’s research questions.⁷
⁷ For a complete description of the data generation process, see Section 3 of [11].
We can think of such an annotated corpus as a two-mode network. Nodes are
of two types, contributions and codes. By associating a code to a contribution,
the ethnographer creates an edge between the respective nodes.
From the two-mode network described above, we induce, by projection, the
one-mode CCN. Recall that this is a network where each node represents an
ethnographic code. An edge is induced between any two codes for every contribution that is annotated with both those codes (Figure 1). The CCN is undirected (A → B ≡ B → A). There can be more than one edge between each pair
of nodes.
Fig. 1: Inducing a co-occurrence edge between ethnographic codes
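The projection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus, the code names and the function name `project_to_ccn` are hypothetical, and each contribution is assumed to carry a set of distinct codes (a code applied twice to the same contribution counts once).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: contribution id -> set of codes annotating it.
annotations = {
    "c1": {"church", "politics", "trust"},
    "c2": {"church", "politics"},
    "c3": {"trust"},
}

def project_to_ccn(annotations):
    """Project the two-mode network onto codes.

    Returns a Counter mapping each unordered code pair to the number of
    parallel co-occurrence edges, one per contribution carrying both codes.
    """
    edges = Counter()
    for codes in annotations.values():
        # Sorting makes (a, b) and (b, a) the same key: the CCN is undirected.
        for a, b in combinations(sorted(codes), 2):
            edges[(a, b)] += 1
    return edges

ccn = project_to_ccn(annotations)
print(ccn)
```

In this toy example, the pair (church, politics) receives two parallel edges because two contributions carry both codes, while the single-code contribution c3 induces no edge.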
We interpret co-occurrence as association. If two codes co-occur, it means
that one informant has made references to the concepts or entities described
by the codes in the same contribution, seen as a unit. Hence, we assume, both
concepts belong to this person’s culture-generated mental map. The corpus-wide pattern of co-occurrences is taken to encode the collective mental map of
informants.
CCNs tend to be large and dense, hence resistant to visual analysis. They are
large because a large study is likely to use thousands of codes. They are dense
as a result of the interaction of two processes. The first one is ethnographic
coding. A rich contribution might be annotated 10 or 20 times, with as many
codes associated with it. The second one is the projection from the 2-mode codes-to-contribution network to the 1-mode co-occurrence network. By construction,
each contribution gives rise to a complete network of all the codes associated
to it, each connected to all the others. Large, dense networks are known to be
difficult to interpret by the human eye [16, 28].
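As a rough illustration of why the projection step drives density (plain arithmetic, not a result from the study): a contribution annotated with n distinct codes induces a complete subgraph on those codes, so it alone contributes

```latex
\[
  \binom{n}{2} = \frac{n(n-1)}{2}
  \quad\text{edges, e.g.}\quad
  \binom{10}{2} = 45, \qquad \binom{20}{2} = 190 .
\]
```

The number of edges thus grows quadratically with the number of codes attached to a contribution.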
4 Techniques for network reduction
Any network reduction entails a loss of information, and has to be regarded as a
necessary evil. Reduction methods should always be theoretically founded, and
applied as needed, and with caution. We propose four reduction techniques, each
one related to a distinct theoretical tradition in the social sciences, particularly
anthropology.
Following [21], we propose that a good reduction technique should:
1. Usefully support inference, understood as a simplifying interpretation of the
emerging intersubjective picture of the world. The main contribution of network reduction to ethnographic inference is that it makes the CCN small and
sparse enough to be processed visually [28, 16]. A well-established literature
– and techniques such as layout algorithms – help us define what a “good”
network visualization is [20].
2. Reinforce reproducibility and transparency. Reproducibility means that applying the same technique to the same dataset will always produce the same
interpretive result (even if the technique has a stochastic component). Transparency means that the researcher understands how the technique operates, and can explain to her peers how that particular technique contributes
to addressing her research question.
3. Not foreclose the possibility of updating via abductive reasoning. Algorithms
alone do not decide how parameters should be set to get optimal readability.
Rather, the values of the parameters are co-determined by the ethnographers,
who possess rich empirical and theoretical knowledge of relevant contexts.
4. Combine harmoniously with other steps of the data processing cycle, such
as coding and network construction. This means making sure that the interpretations of the data and their network representation are consistent across
the whole cycle.
With that in mind, we turn to the discussion of four candidate techniques.
Each of them can be tuned by choosing the value of a reduction parameter (different for each technique) that determines how many edges to discard. The value
of this parameter is determined by the researcher, depending on the patterns
she explores and on the network topology.
Association depth. A first way to reduce the CCN is the following. For each
pair of nodes in the network connected by at least one edge, remove all d edges
connecting them, and replace them with one single edge of weight d. This yields
a weighted, undirected network with no parallel edges.
d has an intuitive interpretation in the context of ethnographic research.
Consider an edge e = code1 ↔ code2. d(e) is the count of the number of times in
which code1 and code2 co-occur. Since we interpret co-occurrence as association,
it makes sense to interpret d(e) as the depth of the association encoded in e.
This gives us a basis for ranking edges according to the value of d. The higher
the value of d of an edge, the more important that edge.
To reduce the network, we choose an integer d∗ and drop all edges for which
d(e) ≤ d∗. As the value of d∗ increases, so does the degree to which the reduced
network encodes high-depth associations between codes.
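A minimal sketch of this reduction follows, assuming the multigraph is available as a plain list with one code pair per co-occurrence. The data and the function names `association_depth` and `reduce_by_depth` are hypothetical and serve only to illustrate the thresholding rule.

```python
from collections import Counter

# Hypothetical multigraph: one (code1, code2) entry per co-occurrence.
parallel_edges = [
    ("church", "politics"), ("church", "politics"), ("politics", "church"),
    ("church", "trust"),
    ("politics", "trust"), ("politics", "trust"),
]

def association_depth(parallel_edges):
    """Collapse parallel edges into one weighted edge per pair; weight is d(e)."""
    # Sorting each pair treats (a, b) and (b, a) as the same undirected edge.
    return Counter(tuple(sorted(edge)) for edge in parallel_edges)

def reduce_by_depth(depth, d_star):
    """Drop every edge with d(e) <= d_star, keeping those with d(e) > d_star."""
    return {edge: d for edge, d in depth.items() if d > d_star}

depth = association_depth(parallel_edges)
reduced = reduce_by_depth(depth, d_star=1)
print(reduced)
```

With d∗ = 1, the single co-occurrence between church and trust is discarded, while the two deeper associations survive.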
Association breadth. A second way of reducing the CCN is the following. For all
pairs of nodes code1, code2 in the network, remove all edges e : code1 ↔ code2
connecting them, and replace them with one single edge of weight b, where b
is the number of informants who have authored the contributions underpinning
those edges. As with association depth, this yields a weighted network of codes
with no parallel edges, but now edge weight has a different interpretation: it is
a count of the related informants. This has a straightforward interpretation for
ethnographic analysis. The greater the value of b(e : code1 ↔ code2), the more
widespread the association between code1 and code2 is in the community that we
are studying. We interpret it as association breadth. Notice that b(e) ≤ d(e),
because each informant counted in b(e) authored at least one of the co-occurrences counted in d(e).
As we did for depth, we reduce by choosing an integer b∗ and dropping all
edges for which b(e) ≤ b∗. As the value of b∗ increases, so does the degree to
which the reduced network encodes broadly shared associations between codes.
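The difference between depth and breadth can be made concrete with a short sketch. It assumes that each contribution is recorded together with the identifier of the informant who authored it; the informant names, codes and the function `depth_and_breadth` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (informant id, set of codes) for each contribution.
contributions = [
    ("anna", {"church", "politics"}),
    ("anna", {"church", "politics"}),
    ("anna", {"church", "politics", "trust"}),
    ("piotr", {"church", "politics"}),
]

def depth_and_breadth(contributions):
    """Return (depth, breadth): co-occurrence counts and distinct-informant counts."""
    depth = defaultdict(int)
    informants = defaultdict(set)
    for informant, codes in contributions:
        for pair in combinations(sorted(codes), 2):
            depth[pair] += 1                 # one more co-occurrence
            informants[pair].add(informant)  # a set ignores repeat authors
    breadth = {pair: len(who) for pair, who in informants.items()}
    return dict(depth), breadth

depth, breadth = depth_and_breadth(contributions)
b_star = 1
reduced = {pair: b for pair, b in breadth.items() if b > b_star}
print(depth, breadth, reduced)
```

Here the association between church and politics has depth 4 but breadth 2, since three of the four co-occurrences come from the same informant; breadth never exceeds depth for any edge.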
Highest core values. A third way of reducing the CCN is to consider a co-occurrence edge important if it connects two nodes that are both connected to
a large number of other nodes. A community of such nodes can be identified by
computing the CCN’s k-cores. A k-core is a maximal subgraph in which every node has degree
at least k within that subgraph, where k is an integer. k-cores are used to identify cohesive structures
in graphs [17].
After computing all the k-cores of a network, its nodes can be assigned a core
value. A node’s core value is the highest value of k for which that node is part
of a k-core.
To find the most important edges in the CCN, we again replace all edges
between any pair of connected codes code1 and code2 with one single edge
e(code1, code2). Next, we choose an integer k∗ and remove all the codes c whose
core values k(c) ≤ k∗, together with their incident edges.
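Core values can be computed by repeatedly peeling off the node of lowest remaining degree. The sketch below is a simple, unoptimized version of that standard procedure, written for illustration only; the toy graph and the function names `core_numbers` and `reduce_by_core` are hypothetical, and the graph is assumed to be simple (parallel edges already collapsed) and stored as a dictionary of neighbour sets.

```python
def core_numbers(adjacency):
    """Return {node: core value} by peeling the lowest-degree node first."""
    degree = {node: len(neigh) for node, neigh in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        # The core value never decreases along the peeling order.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core

def reduce_by_core(adjacency, k_star):
    """Keep only codes whose core value exceeds k_star, and edges among them."""
    core = core_numbers(adjacency)
    keep = {n for n, k in core.items() if k > k_star}
    return {n: adjacency[n] & keep for n in keep}

# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(core_numbers(adjacency))
print(reduce_by_core(adjacency, k_star=1))
```

In the toy graph, the pendant node d has core value 1 and is removed for k∗ = 1, leaving the triangle, whose nodes all have core value 2.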
Simmelian backbone. A fourth approach to identify a network’s most important
edges is to extract its Simmelian backbone. A network’s Simmelian backbone
is the subset of its edges which display the highest values of a property called
redundancy [30]. An edge is redundant if it is part of multiple triangles. The idea
is that, if two nodes have many common neighbors, the connection between the
two is structural. This method applies best to weighted graphs; in this paper,
we use association depth as edge weight.
This technique uses a granularity parameter, k. We set k to be equal to the
average degree of the CCN, rounded to the nearest integer. At this point, for
each pair of connected nodes n1, n2, we can compute the redundancy of the edge
e(n1, n2) as the overlap between the k strongest-tied neighbors of n1 and those
of n2. To reduce the network, we choose an integer r∗ and drop edges for which
the redundancy r(e) ≤ r∗.
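The redundancy computation as described in this paragraph can be sketched as follows. This follows only the description given here, not necessarily every detail of the method in [30]; the toy weights, the alphabetical tie-breaking rule and the function names are assumptions made for illustration.

```python
def top_k_neighbours(weights, node, k):
    """Return the k neighbours of `node` with the largest edge weight.

    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    ranked = sorted(weights[node].items(), key=lambda item: (-item[1], item[0]))
    return {neighbour for neighbour, _ in ranked[:k]}

def redundancy(weights, k):
    """Map each edge (n1, n2) to the overlap of the two nodes' top-k neighbour sets."""
    result = {}
    for n1 in weights:
        for n2 in weights[n1]:
            if n1 < n2:  # visit each undirected edge once
                shared = top_k_neighbours(weights, n1, k) & top_k_neighbours(weights, n2, k)
                result[(n1, n2)] = len(shared)
    return result

# Hypothetical symmetric weights: weights[u][v] is the association depth of (u, v).
weights = {
    "church": {"politics": 5, "power": 4, "trust": 1},
    "politics": {"church": 5, "power": 3},
    "power": {"church": 4, "politics": 3},
    "trust": {"church": 1},
}

# k is the average degree, rounded. Note that Python's round() sends exact
# halves to the nearest even integer.
k = round(sum(len(neigh) for neigh in weights.values()) / len(weights))
r = redundancy(weights, k)
r_star = 0
backbone = {edge: value for edge, value in r.items() if value > r_star}
print(k, r, backbone)
```

In this toy network the edge between church and trust has redundancy 0, because the two codes share no strongly tied neighbour, and it is dropped; the three edges of the triangle each have redundancy 1 and are kept.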
5 Mapping network reduction techniques onto four major
approaches in sociology and anthropology
Deciding which network reduction technique is best suited to a particular research project depends on the researcher’s ontological and epistemological beliefs, as well as on the nature of the project itself and of its research questions.
Each reduction technique reveals a different set of attributes of semantic networks.
It also turns out that each technique fits the objectives of a prominent
method of analysis, associated in turn with an identifiable approach in sociology
or anthropology. Based on this fit, we propose that the researcher’s approach
suggests the choice of a reduction technique.
Association depth. Determining association depth is in its essence a method of
uncovering the structure of a society or culture. Key works in anthropology –
Anthropologie structurale [25] and La Pensée sauvage [26] – and in social theory
[1, 31] initiated a whole host of structuralist and post-structuralist approaches.
For post-structuralist sociologists and anthropologists, social relations can
only be understood by analysing how they are constituted and organized through
discourse. In other words, social hierarchies, norms and practices are legitimized
(or delegitimized) by granting the meaning attached to specific concepts a dominant position, enabling certain ideas to become hegemonic, i.e. widely accepted
as the “Truth”. For example, the idea that ethnic nations are natural entities
growing out of shared kinship ties (all academic evidence to the contrary notwithstanding) is
used to legitimize political control by the core nation and the marginalisation
of minority ethnicities. Moreover, discourse scholars work from the assumption
that the meaning respondents attach to floating signifiers is relational within
a discourse. Within a patriarchal discourse, the meaning attached to ‘woman’
is directly determined by the meaning attached to ‘man’, for instance. To understand the meaning of concepts, it is thus essential to understand their interrelationships; discerning which meanings are hegemonic further requires us to
understand which interrelationships between concepts are dominant. Focusing
on association depth is thus a useful way of bringing into sharper focus the interrelationships between concepts that are most commonly used by informants,
thereby providing a picture of the basic structure of discourse in a given community, within which informants create meaning and make sense of the world
around them.
Association breadth. We see association breadth as an alternative point of view
on the structure of discourse. Whereas association depth encodes the raw number
of co-occurrences between codes, association breadth emphasizes how widespread
across different informants those co-occurrences are. In the analysis of Section 6
below, we used association breadth to check that high-depth edges were not the
artefact of just one (or very few) informants who happened to be obsessively
associating those particular codes.
Highest core values. The technique based on core values of codes is designed
to determine the centrality of certain concepts in a discourse. While it does
not allow for the reduction of edges, it shows which concepts have most edges
associated with them. It facilitates, therefore, a more systematic determination
of which discursive elements constitute what is known in cultural anthropology
as root paradigms, key metaphors, dominant schemata or central symbols of a
given culture [40, 2].
Simmelian backbone. Finally, the Simmelian backbone extraction can contribute
to the discovery of hegemonic and counter-hegemonic clusters (subcultures) of
meaning in an analyzed body of discourse [18, 23]. No society or culture is fully
integrated and each is subjected to centripetal and centrifugal forces simultaneously. As a result, even in the most “homogenous” societies and cultures one can
identify at least embryonic subcultures or – in another formulation – for every
hegemony there is a budding or fully articulated counter-hegemony. The point is
that a hegemony or counter-hegemony is usually built not on a single symbol or
concept but on their interconnected cluster. This reduction technique helps to
identify such clusters and assess with greater precision their shape and internal
coherence.
6 An application
We used the corpus of a project we are working on to show how each of the four
aforementioned reduction techniques can be seen as broadly corresponding to a
paradigm in anthropology – a convergence that attests to the utility of such a
synthesis. This application is not meant as a full methodological primer. Rather,
it is meant to be a “proof of concept”, and to show the possibilities of synthesizing
quantitative and qualitative techniques in the service of ethnographic insight.
The data were gathered in the spring and summer of 2021, as a part of a larger
research project on populism in Central and Eastern Europe, to be completed
by the end of 2022. They consist of 17 semi-structured interviews with Polish-speaking Internet users, who used social media to seek and share information
about health against the backdrop of the COVID-19 pandemic. Research participants were asked about their opinion on the current state of affairs in their
respective countries, and their political choices over the years and at present.
The interviews’ transcriptions (about 78,000 words) were then split into contributions, in the sense of Section 3: each question of the interviewer and each answer
of the interviewee was considered a contribution. In what follows, two codes
are considered to co-occur if, and only if, they were both used in annotating the
same contribution (as opposed to the same interview). Computed this way, the
CCN is built from 1,116 contributions and 2,152 annotations. The
latter use 600 unique codes, connected by 16,370 co-occurrence edges. The data
are available as open data [9].
We apply reduction techniques to the CCN in sequence, trying for different
levels of the respective reduction parameters (d, b, k, r) in order to achieve a
good combination of legibility (more edges discarded) and completeness (fewer
edges discarded). In each reduced network, we focus on the ego network of one
code in particular, Catholic Church. Ego network analysis is widely used in
anthropology, for example in the conventions of kinship charts. We selected this
particular code in the expectation that the Catholic Church would be fairly
central in any ethnographic study of populism in Poland, and that, therefore, it
would appear in most reduced networks.
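Extracting an ego network from a reduced CCN can be sketched as below. The sketch assumes the common definition in which the ego network contains the ego, its direct neighbours, and all edges among them; whether edges between neighbours are included in the figures that follow is an assumption, and the data and the function name `ego_network` are hypothetical.

```python
def ego_network(edges, ego):
    """Return the edges among `ego` and its direct neighbours.

    `edges` maps unordered code pairs (tuples) to an edge weight.
    """
    neighbours = {b if a == ego else a for a, b in edges if ego in (a, b)}
    members = neighbours | {ego}
    return {
        edge: weight
        for edge, weight in edges.items()
        if edge[0] in members and edge[1] in members
    }

# Hypothetical reduced CCN with association depth as edge weight.
edges = {
    ("church", "politics"): 3,
    ("church", "trust"): 1,
    ("politics", "trust"): 2,
    ("trust", "vaccines"): 4,
}
print(ego_network(edges, "church"))
```

The edge between trust and vaccines is left out, because vaccines is not directly connected to the ego.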
Highest core values. Anthropology as a discipline has a long history of trying
to identify “core” dimensions of culture, both to better theorize how a given
culture is constituted, and as a useful heuristic for ethnographic fieldwork (cf.
Boas’s outer and inner forces [4], Kroeber’s reality and value culture [22], Steward’s cultural core [38]). In our approach we are particularly inspired by Victor
Turner, a founding figure in symbolic anthropology – a theoretical approach in
British anthropology arising in the 1960s – that viewed culture as an independent system of meaning deciphered by interpreting key symbols and rituals [37]
and theorized that “beliefs, however unintelligible, become comprehensible when
understood as part of a cultural system of meaning” [13]. Turner subscribed to
a definition of symbol as “a thing regarded by general consent as naturally typifying or representing or recalling something by possession of analogous qualities
or by association in fact or thought” [41]. As we are invested in holistically understanding and visualizing how cultural beliefs and discourses are assembled, it
is the recollection and association aspects that are of particular interest to us.
Turner did not seek to define a fixed core of concepts within a culture the way
Steward, for example, did. Nevertheless, he did write about symbols “variously
known as ‘dominant,’ ‘core,’ ‘key,’ ‘master,’ ‘focal,’ ‘pivotal,’ or ‘central’ [that]
constitute semantic systems in their own right [with a] complex and ramifying
series of associations as modes of signification.”
We envision network reduction based on the highest core values as revealing something akin to such a semantic system. We approach it in the spirit
of Turner’s notion of “positional meaning” articulated in his methodology for
studying rituals – a level of symbolic meaning derived from analyzing a symbol’s
association to other symbols and cultural concepts, in other words, contextual
meaning: “The positional meaning of a symbol derives from its relationship to
other symbols in totality, a Gestalt whose elements acquire their significance
from the system as a whole. This level of meaning is directly related to the important property of ritual symbols... their polysemy. Such symbols possess many
senses, but contextually it may be to stress one or a few of them only.” [41]
In our data, we see the highest core values reduction yielding an innermost
nucleus of nodes (ethnographic codes) that recur most often in relation with
each other. Catholic Church is close to the center of the symbols expressing
this culture. Mathematically, it belongs to one of the innermost k-cores (k = 28,
containing 82 codes, shown in Figure 2), though not the absolute innermost. Two
k-cores exist in the graph where k is higher than 28 (k = 29, k = 42). The
analysis supports the conclusion that the Catholic Church is one of the core
symbols in this culture.
Fig. 2: The full CCN. The 28-core is shown highlighted in blue. It contains
Catholic Church (in red).
Simmelian backbone. Next, we explore the neighborhood of Catholic Church
through the lens of the Simmelian backbone reduction technique. Recall that
this technique detects communities of nodes connected by redundant links, and
was developed to identify homophily and strong ties in a social network of actors
[30]. Here, we use it to identify communities of ethnographic codes. In a way,
when applied to concepts rather than human actors, this approach literalizes the
notion of certain ideas being “in conversation” with each other. The visualization reveals several such “conversations”. The community structure itself maps
onto the anthropological notion of culture as a field of competing forces, with
different clusters of codes encoding different strands of culture. In the words
of Jean and John Comaroff, “culture [is] the semantic space, the field of signs
and practices, in which human beings construct and represent themselves and
others, and hence their societies and histories. . . culture always contains within
it polyvalent, potentially contestable messages, images, and actions.” [8] This
approach stresses that culture is neither monolithic nor fixed, but rather always
contingent and in flux, and allows us to see, from a bird’s eye perspective, how
various “signifiers-in-action” coalesce into identifiable semantic subspaces.
Catholic Church belongs to a community of codes that are political rather
than spiritual – such as abuse of power, political marketing, and right
wing (Figure 3). In fact, the highest-redundancy edge incident to Catholic Church
is to politicisation (r = 55). Our ethnographic interpretation is that people
have concerns pertaining to the Catholic Church, both in the context of what
they conceive as this institution’s excessive politicization and more personal concerns, anxieties, and anomic tendencies. This can be used as a foundation to build
on iteratively in future research on a range of subjects, including but not limited to political cultures, epistemologies, various dimensions of trust and belief,
and the position of the Catholic Church in the public space and the country’s
culture.
Fig. 3: The ego network of Catholic Church, with only edges with edge redundancy r > 30 shown.
Association depth and association breadth. We now turn to the association depth
and association breadth reduction techniques, which work in tandem to deepen
our understanding of the underlying structures of discursive associations. The
association depth visualization shows us which associative links between concepts are the strongest – in other words, which codes emerge as being mentioned
together most often. Association breadth helps evaluate the diffusion of these
“deep” edges among informants. When the results produced through the depth
and breadth reductions align, it confirms that deep associations are not generated by a small number of interviews with people who frame a topic by linking it
repetitively with a constant, limited set of other topics, but rather a broad agreement that emerges from the analysis of many interviews or conversations. We can
see how this plays out with the Catholic Church code (Figure 4a): the three deepest
associations are formed between it and the abuse of power, politicisation,
and Polish catholicism codes. If we choose lower (but still significant, in the
sense that the number of edges in the CCN is reduced by over 95%) levels of the
reduction parameter d, codes like LGBT, discrimination and Law and Justice
party appear.
The association breadth-reduced CCN shows that the broadest links to Catholic Church are very similar to the deepest ones. The very broadest three
connect it to politicisation, Polish catholicism, and discrimination.
Edges to “political” codes like LGBT, inequality, abuse of power, abortion
and Law and Justice party survive reductions of over 95% in the number of
edges in the CCN (Figure 4b). In our case, these two reduction methods yield
closely aligned results. Both attest to the Catholic Church figuring as an institution associated with politics more than with faith or spirituality among the
informants. Even though there are some codes visible in the graph that may
correlate to spirituality, the broadest associations still link the Catholic Church
with political codes and the issue of abuse of power.
(a) Only edges with association depth d > 4, and incident codes, are shown.
(b) Only edges with association breadth b > 2, and incident codes, are shown.
Fig. 4: The ego network of Catholic Church in two CCN reductions.
7 Discussion and conclusions
As ethnographers working with this form of data analysis, we look for patterns
that are of interest to us either for their novelty (unexpected connections) or
because they confirm previous research or initial impressions formed during
data collection. In this particular example, the association of the Church with politics aligns with existing survey
studies on Polish Catholicism today, which show that most Poles disapprove of
the Church’s direct involvement in politics [7].
More broadly, this approach is synergetic with anthropology’s long-standing
interest in structures. While we don’t aspire to resurrect the classic structuralist goal of uncovering deep underlying structures or cross-cultural universals à
la Claude Lévi-Strauss, there is methodological value in understanding cultural
structures in a way aligned with schema theory developed by cognitive anthropologists rather than old-school structuralists. We are aware that such structures are
historically contingent and subject to change. Nevertheless, these visualizations
offer us a synchronic snapshot of how people mentally organize their experiences
and understanding. From a methodological standpoint this can be valuable not
only as new insight or confirmation, but also as a part of an iterative research
process. Once we have a sense of what ideas the people under study believe link
together most strongly, that knowledge can inform subsequent questionnaires,
interviews, and selection of sites for participant observation. For example, perhaps the most salient participant observation in an ethnographic project on the
Catholic Church in Poland today would have to take place, counterintuitively,
outside the churches, in the domain of politics.
Lévi-Strauss believed that universal deep cognitive structures underpin all
human cultural experience; in that, he exemplified the cross-cultural universalism
position in anthropology. We do not subscribe to such a position; nevertheless, his
approach to myth analysis resonates with our reduction techniques. According
to him, all existing versions of the myth had to be aggregated, so that one could
isolate what he called “gross constituent units” – clusters of specific types of
relations that are present in all versions of the myths (e.g. characters overrating
kinship relations, characters underrating kinship relations) [24]. These units,
Lévi-Strauss posited, revealed deep structures expressed through the language
of myth. In a similar way, we also look at the highest-redundancy edges in order
to glean what they reveal about deep associations structuring cultural discourses
in a corpus.
In conclusion, through this demonstration we aim to make a contribution to
the ongoing and worthwhile conversations in the social sciences geared toward synthesizing qualitative and quantitative methods. The reduction techniques discussed
in this paper can be instrumental in supporting ethnographic insights, and the
accountability of methodological choices in ethnographic research. The ethnographer’s goal and research question inform the choice of a reduction technique;
the appropriateness of such choice can be transparently argued by the researcher.
Moreover, since the steps to build and reduce the CCN are reproducible (given
the value of the reduction parameter), other researchers can validate, dispute,
or improve upon her interpretation, thereby contributing to the accountability
of qualitative research. The highest core values reduction identifies concepts of
central significance, and can help map a starting point of entry into the data;
the Simmelian backbone reduction maps heterogeneous communities of meaning,
and may be especially helpful in identifying hegemonic and counter-hegemonic
discourses at work within a community. Finally, the association depth and association breadth reductions, working in tandem, can help illuminate and validate
the most significant associative structures of meaning in specific domains within
a community under study.
References
1. Althusser, L.: For Marx. Verso Books (1965)
2. Aronoff, M.J., Kubik, J.: Anthropology and political science: A convergent approach, vol. 3. Berghahn Books (2013)
3. Beaulieu, A., Leonelli, S.: Data and Society: A Critical Introduction. SAGE (2021)
4. Boas, F.: The aims of anthropological research. Science 76(1983), 605–613 (1932)
5. Breznau, N., Rinke, E.M., Wuttke, A., Adem, M., Adriaans, J., Alvarez-Benjumea,
A., Andersen, H.K., Auer, D., Azevedo, F., Bahnsen, O., et al.: Observing many
researchers using the same data and hypothesis reveals a hidden universe of data
analysis. MetaArXiv (2021). https://doi.org/10.31222/osf.io/cd5j9
6. Burrell, J.: The field site as a network: A strategy for locating ethnographic research. Field Methods 21(2), 181–199 (2009)
7. CBOS Foundation: Postawy wobec obecności religii i Kościoła w przestrzeni publicznej (2022), https://www.cbos.pl/SPISKOM.POL/2022/K_003_22.PDF
8. Comaroff, J., Comaroff, J.: Ethnography and the historical imagination. Routledge
(2019)
9. Cottica, A., Davidov, V., Góralska, M., Kubik, J., Melançon, G., Mole, R., Pinaud, B., Szymański, W.: Comparing techniques to reduce networks of ethnographic
codes co-occurrence. In: Advances in Quantitative Ethnography. Fourth International Conference, ICQE 2022, Copenhagen, Denmark, October 15–19, 2022, Proceedings. Springer, Copenhagen (October 2022, in press)
10. Cottica, A., Hassoun, A., Kubik, J., Melançon, G., Mole, R., Pinaud, B., Renoust,
B.: Comparing techniques to reduce networks of ethnographic codes co-occurrence
(Jul 2021). https://doi.org/10.5281/zenodo.5801464
11. Cottica, A., Hassoun, A., Manca, M., Vallet, J., Melançon, G.: Semantic social
networks: A mixed methods approach to digital ethnography. Field Methods 32(3),
274–290 (2020)
12. Decuyper, A., Browet, A., Traag, V., Blondel, V.D., Delvenne, J.C.: Clean up or
mess up: the effect of sampling biases on measurements of degree distributions in
mobile phone datasets. arXiv preprint arXiv:1609.09413 (2016)
13. Des Chene, M.: Symbolic anthropology. In: Encyclopedia of Cultural Anthropology.
Henry Holt (1996)
14. Dressler, W.W., Borges, C.D., Balierio, M.C., dos Santos, J.E.: Measuring cultural
consonance: Examples with special reference to measurement theory in anthropology. Field Methods 17(4), 331–355 (2005)
15. Freedman, L.P., Cockburn, I.M., Simcoe, T.S.: The economics of reproducibility
in preclinical research. PLoS biology 13(6), e1002165 (2015)
16. Ghoniem, M., Fekete, J.D., Castagliola, P.: On the readability of graphs using
node-link and matrix-based representations: a controlled experiment and statistical
analysis. Information Visualization 4(2), 114–135 (2005)
17. Giatsidis, C., Thilikos, D.M., Vazirgiannis, M.: Evaluating cooperation in communities with the k-core structure. In: 2011 International conference on advances in
social networks analysis and mining. pp. 87–93. IEEE (2011)
18. Gramsci, A.: I quaderni del carcere. Einaudi (1975)
19. Hannerz, U.: The global ecumene as a network of networks. In: Kuper, A. (ed.)
Conceptualizing society, pp. 34–56. Routledge (1992)
20. Herman, I., Melancon, G., Marshall, M.: Graph visualization and navigation in
information visualization: A survey. IEEE Transactions on Visualization and Computer Graphics 6(1), 24–43 (2000). https://doi.org/10.1109/2945.841119
21. King, G., Keohane, R.O., Verba, S.: Designing social inquiry: Scientific inference
in qualitative research. Princeton university press (1994)
22. Kroeber, A.: Reality culture and value culture. Science 111, 456–457 (1950)
23. Laitin, D.D.: Hegemony and culture: Politics and change among the Yoruba. University of Chicago Press (1986)
24. Lévi-Strauss, C.: The structural study of myth. The journal of American folklore
68(270), 428–444 (1955)
25. Lévi-Strauss, C.: Anthropologie structurale, vol. 171. Plon Paris
(1958)
26. Lévi-Strauss, C., et al.: La pensée sauvage, vol. 289. Plon Paris (1962)
27. Maxwell, S.E., Lau, M.Y., Howard, G.S.: Is psychology suffering from a replication
crisis? what does “failure to replicate” really mean? American Psychologist 70(6),
487 (2015)
28. Melançon, G.: Just how dense are dense graphs in the real world? a methodological
note. In: Proceedings of the 2006 AVI workshop on BEyond time and errors: novel
evaluation methods for information visualization. pp. 1–7 (2006)
29. Munzner, T.: Visualization analysis and design. CRC press (2014)
30. Nick, B., Lee, C., Cunningham, P., Brandes, U.: Simmelian backbones: Amplifying
hidden homophily in facebook networks. In: Advances in Social Networks Analysis
and Mining (ASONAM), 2013 IEEE/ACM International Conference on. pp. 525–
532 (Aug 2013)
31. Poulantzas, N.: On social classes. New Left Review (1973)
32. Shapiro, S.C.: Representing and locating deduction rules in a semantic network.
ACM SIGART Bulletin (1977). https://doi.org/10.1145/1045343.1045350
33. Silberzahn, R., Uhlmann, E.L., Martin, D.P., Anselmi, P., Aust, F., Awtrey, E.,
Bahnı́k, Š., Bai, F., Bannard, C., Bonnier, E., et al.: Many analysts, one data set:
Making transparent how variations in analytic choices affect results. Advances in
Methods and Practices in Psychological Science 1(3), 337–356 (2018)
34. Soni, U., Lu, Y., Hansen, B., Purchase, H.C., Kobourov, S., Maciejewski, R.: The
perception of graph properties in graph layouts. Computer Graphics Forum 37(3),
169–181 (2018). https://doi.org/10.1111/cgf.13410
35. Sowa, J.F.: Conceptual structures: information processing in mind and machine.
Addison-Wesley Pub., Reading, MA (1983)
36. Sowa, J.F., et al.: Knowledge representation: logical, philosophical, and computational foundations, vol. 13. Brooks/Cole Pacific Grove, CA (2000)
37. Spencer, J.: Symbolic anthropology. In: Encyclopedia of Social and Cultural Anthropology. Henry Holt (1996)
38. Steward, J.H.: Theory of culture change: The methodology of multilinear evolution.
University of Illinois Press (1972)
39. Strathern, M.: Cutting the network. The Journal of the Royal Anthropological
Institute 2(3), 517–535 (1996)
40. Turner, V.: Liminal to liminoid, in play, flow, and ritual: An essay in comparative
symbology. Rice Institute Pamphlet-Rice University Studies 60(3) (1974)
41. Turner, V.: Symbolic studies. Annual Review of Anthropology 4(1), 145–161 (1975)
42. Woods, W.A.: What’s in a link: Foundations for semantic networks. In: Representation and understanding: Studies in Cognitive Science, pp. 35–82. Elsevier (1975)
Comparing techniques to reduce networks of
ethnographic codes co-occurrence
Alberto Cottica, Amelia Hassoun, Guy Melançon, Jan Kubik, Benjamin
Renoust, Bruno Pinaud, Richard Mole
To cite this version:
Alberto Cottica, Amelia Hassoun, Guy Melançon, Jan Kubik, Benjamin Renoust, et al.. Comparing
techniques to reduce networks of ethnographic codes co-occurrence. 7th International Conference on
Computational Social Science, Jul 2021, Zurich, Switzerland. hal-03277204
HAL Id: hal-03277204
https://hal.science/hal-03277204
Submitted on 2 Jul 2021
7th International Conference on Computational Social Science IC²S²
July 27-31, 2021, ETH Zürich, Switzerland
Comparing techniques to reduce networks of
ethnographic codes co-occurrence
Alberto Cottica⋆ , Amelia Hassoun◦ , Guy Melançon• , Jan Kubik◦ , Benjamin
Renoust⊙ , Bruno Pinaud• and Richard Mole×
⋆ EdgeRyders,
Belgium; ◦ Univ. of Oxford, UK; • Univ. of Bordeaux, France; ◦ The State Univ. of New
Jersey, USA; × Univ. College London, UK; ⊙ University of Osaka, Japan
Keywords: ethnography, anthropology, networks, reduction, visual analytics
Extended Abstract
Semantic Social Network Analysis (SSNA) is a research method conceived for the social sciences. It combines techniques drawn from digital ethnography and network science. The digital ethnography lineage ensures that SSNA remains open-ended and
exploratory [1]. The network science lineage provides quantitative insight into the extent to
which a statement vouched for by one informant is shared by the others [2]. SSNA aims to combine the depth of ethnography with the breadth of surveys. To achieve this, it encodes annotated ethnographic corpora as structured data. Raw data consist of Contributions,
testimonies authored by informants and recorded in a database; Annotations, database objects
created as ethnographers associate snippets of text they find in contributions with keywords,
called Codes; and the codes themselves. We represent these data as a social network where the
nodes are informants and edges represent conversational interactions. Codes – associated to
edges via annotations – encode the semantics of that interaction. We call this a semantic social
network (SSN), and its analysis semantic social network analysis (SSNA). Here we focus on a
transformation of the SSN, called the codes co-occurrence network (CCN) where nodes represent ethnographic codes and undirected edges represent co-occurrence. An edge between two
codes means that both codes were used to code the same contribution.
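The construction of the CCN described above can be sketched in a few lines. This is a minimal illustration, not the SSNA software: it assumes that each annotation is available as a (contribution, informant, code) triple, and the function name build_ccn, the informant names and the toy annotations are hypothetical. Each edge records how many contributions the two codes share and which informants authored them.

```python
from collections import defaultdict
from itertools import combinations


def build_ccn(annotations):
    """Build a codes co-occurrence network from (contribution, informant, code) triples.

    Returns a dict mapping an unordered code pair, stored as a sorted tuple,
    to its co-occurrence count and the set of informants behind it.
    """
    codes_by_contribution = defaultdict(set)
    author = {}
    for contribution, informant, code in annotations:
        codes_by_contribution[contribution].add(code)
        author[contribution] = informant

    ccn = {}
    for contribution, codes in codes_by_contribution.items():
        # Every pair of distinct codes on the same contribution co-occurs once.
        for pair in combinations(sorted(codes), 2):
            edge = ccn.setdefault(pair, {"count": 0, "informants": set()})
            edge["count"] += 1
            edge["informants"].add(author[contribution])
    return ccn


# Invented toy corpus: four contributions by two informants.
annotations = [
    ("c1", "anna", "church"), ("c1", "anna", "politics"), ("c1", "anna", "power"),
    ("c2", "anna", "church"), ("c2", "anna", "politics"),
    ("c3", "piotr", "church"), ("c3", "piotr", "politics"),
    ("c4", "piotr", "migration"),  # a single code induces no edge
]

ccn = build_ccn(annotations)
for pair, edge in sorted(ccn.items()):
    print(pair, edge["count"], sorted(edge["informants"]))
```

Storing the informants alongside the count is a design choice made here so that the same structure can later be filtered either by number of co-occurrences or by number of informants.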
We can think of CCNs as patterns of free associations, specifying the connections between
the concepts encoded in ethnographic codes [6]. Ethnographers find them highly intuitive [2].
However, in a typical study (100–500 informants, 1,000–5,000 contributions) 1,000–2,000
codes might arise. Thus, CCNs tend to be both large and dense: with tens of thousands of
edges, they belong to the class of dense networks known to be difficult to visualize [4]. Reducing a CCN
could make it amenable to visual analysis by ethnographers. However, any network reduction
entails a loss of information, and has to be regarded as a necessary evil. Reduction methods
should always be theoretically founded, and applied with caution. We compare some reduction
techniques and their theoretical groundings, and discuss their interpretations. First, we induce
a CCN from a corpus obtained from an online forum discussing populist politics in Eastern
Europe (336 informants, 2,284 contributions, 5,863 annotations and 1,445 codes, connected by
85,174 co-occurrence edges). Next, we apply alternative reduction techniques to it. Finally,
we systematically assess each one in terms of criteria of quality prevalent in the literature on
methods for qualitative research. Following [5], we evaluate the extent to which each reduction
technique: usefully supports inference, understood as an interpretation of the emerging intersubjective picture of the world; reinforces reproducibility and transparency that help to increase
the researcher’s ability to assess equivalence between any two implementations; does not foreclose the possibility of updating via abductive reasoning (algorithms alone do not decide how
parameters should be set to get optimal readability); combines harmoniously with other parts
of SSNA, such as coding and network construction.
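To give a sense of scale for the corpus described above, the sketch below computes two common summary measures from the reported counts (1,445 codes and 85,174 co-occurrence edges). It assumes the standard definition of density for a simple undirected graph; the function names are hypothetical, and the choice of these two measures is ours rather than the authors'.

```python
def density(n_nodes, n_edges):
    """Share of all possible undirected edges that are actually present."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def mean_degree(n_nodes, n_edges):
    """Average number of edges incident to a node."""
    return 2 * n_edges / n_nodes


n_codes, n_edges = 1445, 85174  # counts reported for the corpus

ccn_density = density(n_codes, n_edges)
ccn_mean_degree = mean_degree(n_codes, n_edges)

print(f"density: {ccn_density:.3f}, mean degree: {ccn_mean_degree:.1f}")
```

Under these assumptions roughly 8% of all possible code pairs are connected, and a code co-occurs with about 118 other codes on average, which helps explain why a node-link drawing of the unreduced CCN is hard to read.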
Reduction techniques include: (i) reduction by dropping co-occurrences that occur only
once or a few times (Figure 1); (ii) reduction by dropping co-occurrences associated with a low
number of informants; (iii) reduction by dropping edges not belonging to high-k k-cores [3]. We
propose an interdisciplinary approach in evaluating techniques to process ethnographic data.
These techniques are, in themselves, purely mathematical, but are evaluated by an interdisciplinary team in terms of how well they support qualitative research in the social sciences.
Figure 1: Detail of a CCN. Edge color maps to number of co-occurrences.
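The first two reductions amount to thresholding an edge attribute; the k-core based reduction is less immediate. The sketch below illustrates it with the standard peeling procedure for core numbers. It is not the authors' implementation: the function names core_numbers and k_core_edges are hypothetical, the toy edge list is invented, and it assumes that an edge is kept when both of its endpoints have core number at least k.

```python
from collections import defaultdict


def core_numbers(edges):
    """Core number of every node, by repeatedly removing a minimum-degree node."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)

    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])  # core numbers never decrease along the peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def k_core_edges(edges, k):
    """Keep only the edges whose two endpoints both belong to the k-core."""
    core = core_numbers(edges)
    return [(a, b) for a, b in edges
            if a != b and core.get(a, 0) >= k and core.get(b, 0) >= k]


# Invented toy network: a 4-clique (A, B, C, D) with a tail A - E - F.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("A", "E"), ("E", "F")]

print(core_numbers(edges))
print(k_core_edges(edges, 3))
```

The quadratic-time selection of the minimum-degree node keeps the sketch short; on a network with tens of thousands of edges a bucket-based implementation, as found in standard graph libraries, would be preferable.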
References
[1] M. H. Agar et al. The professional stranger: An informal introduction to ethnography,
volume 2. Academic press San Diego, CA, 1996.
[2] A. Cottica, A. Hassoun, M. Manca, J. Vallet, and G. Melançon. Semantic social
networks: A mixed methods approach to digital ethnography. Field Methods, page
1525822X20908236, 2020.
[3] S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes. K-core organization of complex
networks. Physical review letters, 96(4):040601, 2006.
[4] M. Ghoniem, J.-D. Fekete, and P. Castagliola. On the readability of graphs using node-link
and matrix-based representations: a controlled experiment and statistical analysis. Information Visualization, 4(2):114–135, 2005.
[5] G. King, R. O. Keohane, and S. Verba. Designing social inquiry: Scientific inference in
qualitative research. Princeton university press, 1994.
[6] M. Stella, S. De Nigris, A. Aloric, and C. S. Siew. Forma mentis networks quantify crucial
differences in stem perception between students and experts. PloS one, 14(10):e0222870,
2019.