Academic collaboration on large language model studies increases overall but varies across disciplines

Lingyao Li [1] (equal contribution), Ly Dinh [2] (equal contribution), Songhua Hu, Libby Hemphill

[1] School of Information, University of Michigan, Ann Arbor, MI, United States

[2] School of Information, University of South Florida, Tampa, FL, United States

[3] Senseable City Laboratory, Massachusetts Institute of Technology, Cambridge, MA, United States

Abstract

Interdisciplinary collaboration is crucial for addressing complex scientific challenges. Recent advancements in large language models (LLMs) have shown significant potential in benefiting researchers across various fields. To explore the application of LLMs in scientific disciplines and their implications for interdisciplinary collaboration, we collect and analyze 50,391 papers from OpenAlex, an open-source platform for scholarly metadata. We first employ Shannon entropy to assess the diversity of collaboration in terms of authors’ institutions and departments. Our results reveal that most fields have exhibited varying degrees of increased entropy following the release of ChatGPT, with Computer Science displaying a consistent increase. Other fields such as Social Science, Decision Science, Psychology, Engineering, Health Professions, and Business, Management & Accounting have shown minor to significant increases in entropy in 2024 compared to 2023. Statistical testing further indicates that the entropy in Computer Science, Decision Science, and Engineering is significantly lower than that in health-related fields like Medicine and Biochemistry, Genetics & Molecular Biology. In addition, our network analysis based on authors’ affiliation information highlights the prominence of Computer Science, Medicine, and other Computer Science-related departments in LLM research. Regarding authors’ institutions, our analysis reveals that entities such as Stanford University, Harvard University, University College London, and Google are key players, either dominating centrality measures or playing crucial roles in connecting research networks. Overall, this study provides valuable insights into the current landscape and evolving dynamics of collaboration networks in LLM research. Our findings also suggest potential areas for fostering more diverse collaborations and highlight the need for continued research on the impact of LLMs on scientific research practices and outcomes.

keywords:
Large language model, interdisciplinary collaboration, diversity analysis, Shannon entropy, network analysis

1 Introduction

Interdisciplinary collaboration is increasingly recognized as critical for addressing complex scientific challenges, which often involve investigators from multiple fields of expertise [1, 2, 3]. Recent advances in generative AI models and applications, particularly the development of large language models (LLMs) such as OpenAI’s ChatGPT, Google’s PaLM, and Meta’s Llama, are reshaping research activities across disciplines. Researchers have demonstrated LLMs’ utility in research tasks such as literature search [4, 5], content analysis [6], and findings generation from provided data [7].

The impact of LLMs extends far beyond their computer science origins or general application in research activities. In health-related fields, LLMs are being employed to interpret protein structures [8], process electronic health records [9], and even aid in drug discovery [10]. Engineering benefits from these models through advancements in autonomous driving [11] and remote sensing [12] technologies. Social scientists are leveraging LLMs for large-scale text analysis, including attitude simulation [13] and content moderation on social media platforms [14]. Their impact also extends to finance, where they are being used to streamline document review and perform financial analysis [15]. This wide-ranging applicability across disciplines leads us to believe that LLMs are transforming multiple aspects of the research process, potentially impacting how researchers collaborate.

Traditionally, interdisciplinary collaboration has been essential to combine expertise from diverse disciplines to find novel solutions to scientific problems. However, the advent of LLMs is reshaping this paradigm. Researchers have leveraged tools like ChatGPT to gain cross-disciplinary insights and interpret their work from various perspectives, potentially reducing the need for direct collaboration with experts from other fields [16]. LLMs have also been used to automate certain aspects of the scholarly writing process, such as synthesizing existing literature from multiple disciplines on a specific topic [17], and reframing the main findings from different disciplines into domain-specific language [18].

LLMs may have lowered the barriers to conducting interdisciplinary research, especially for those unfamiliar with language processing techniques, and they offer significant potential to enhance research practices. By automating routine tasks [5, 19], such as crunching numbers and formatting manuscripts according to journal guidelines, LLMs can free up time for researchers to focus on developing innovative solutions and gaining deeper insights [20, 17]. Well-known LLMs such as ChatGPT have also illustrated capabilities to bridge language gaps between disciplines, such as between environmental science and ecology [17] and between statistics and biology [18]. However, it is crucial to recognize that LLMs’ effectiveness is primarily illustrated in language-related tasks. They are not substitutes for the critical examination of findings or the nuanced and domain-specific understanding essential for truly bridging disciplinary language [21, 22].

Given that the use of LLMs is reshaping the way scholars work and collaborate, two questions arise: how have LLMs transformed interdisciplinary collaboration, and in what ways have they changed how different disciplines interact and collaborate? Our study therefore explores the application of LLMs in scientific disciplines and their implications for interdisciplinary collaboration by addressing two key research questions, as listed below. The first question aims to examine the diversity of co-authors in terms of their institutional and departmental affiliations, using the Shannon entropy measure. The second question aims to understand the structural patterns of co-authorship collaborations, using network analysis to identify key researchers, institutions, and disciplines that are active producers of LLM research, as well as those facilitating collaborations across disciplinary boundaries.

  • RQ1. How diverse are the coauthors of papers utilizing LLMs in terms of their research institutions and departments across different fields?

  • RQ2. What are the structural patterns of co-authorship networks in research utilizing LLMs, and what roles do key entities (leading researchers, institutions, and departments) play in facilitating and enhancing collaboration?

To answer these questions, we collect 50,391 papers from OpenAlex related to LLMs from 27 scientific fields published between the release of the BERT model in October 2018 and June 2024. We use Shannon entropy to measure the authorship diversity and social network techniques to reveal the collaboration network structures. Our findings indicate that since the advent of LLMs, particularly with the release of ChatGPT in November 2022, collaboration diversity has increased across many fields, notably Computer Science, Social Science, and Psychology. Medicine is the only discipline where diversity shows a significant decrease. Our network analysis further reveals that Computer Science and Medicine consistently play influential roles in connecting different research communities regarding LLM applications. Overall, our findings demonstrate that LLM research has grown exponentially since 2022 and holds the potential to enhance interdisciplinary collaboration. They also encourage the involvement of prominent academic institutions in leading LLM-based research and applications across different domains.

2 Data and Methods

Aligned with the two research questions, our data collection and methods involve two units of analysis: papers and authors. The first research question examines papers to analyze the diversity of coauthors across research fields and institutions. The second research question centers on authors to explore the structural patterns of co-authorship networks and identify key researchers and their roles in facilitating collaboration. Combining these two units of analysis allows us to capture (1) which disciplines and institutions are most impacted by the advent of LLMs in terms of collaboration diversity, as well as (2) key researchers in LLM research and their respective departmental and institutional affiliations. These analyses help us understand how LLMs are impacting scientific collaboration by revealing patterns of interdisciplinary and cross-institutional partnerships, as well as identifying the influential actors driving LLM research across disciplinary boundaries.

2.1 Data preparation

We select OpenAlex [23] for collecting related studies for two main reasons. First, OpenAlex is an open-source repository of scholarly metadata, allowing us to gather both recently archived preprints and published articles from journals and conferences. This is particularly useful for our analysis, given the prevalence of preprinted studies in the field of LLMs. Second, OpenAlex provides its data freely and openly, ensuring that our analysis can be easily replicated by the community without any licensing restrictions.

The workflow for data cleaning is presented in subsection A.1. Within OpenAlex, we collect relevant papers on the most popular LLMs and their respective models, as detailed in Table 2 in subsection A.2. We use two general terms, “large language model” and “LLM,” along with popular open-source models (e.g., BERT, Flan-T5, LLaMA) and closed-source models (e.g., ChatGPT, Claude) based on the MMLU benchmark [24]. We observe that some models, like Yi or Phi, do not yield relevant papers, potentially introducing significant noise during paper screening. Additionally, models like grok-1 or Galactic did not return any search results. Our data collection, conducted as of June 22, 2024, results in a total of 97,242 papers.

To ensure the collected papers are relevant to the topic of LLMs, we restrict our search to titles and abstracts. However, some papers containing these keywords might still be irrelevant. Therefore, we implement several steps to filter out irrelevant papers. First, we consider only articles and preprints, excluding types such as editorials and opinions. Second, we remove potential duplicates, including those with duplicated titles and papers initially published as preprints and later as journal articles with similar titles. We use Jaccard similarity to identify and remove these duplicates (see subsection A.3). Third, we employ GPT-4 models to evaluate the relevance of a paper to LLM topics based on its title, abstract, and keywords, and filter out those papers that are not relevant to LLM topics (see subsection A.4). Applying these filtering criteria reduces the original collection to 50,391 papers for the subsequent analysis.
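The duplicate-removal step can be illustrated with a minimal sketch of title-level Jaccard similarity. The tokenization (lowercased whitespace split), the function names, and the 0.9 threshold are assumptions made for illustration only; the actual procedure and threshold are those described in subsection A.3.

```python
def jaccard_similarity(title_a: str, title_b: str) -> float:
    """Jaccard similarity between the word sets of two titles."""
    tokens_a = set(title_a.lower().split())
    tokens_b = set(title_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty titles are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def deduplicate(titles, threshold=0.9):
    """Keep a title only if it is not too similar to any title already kept.

    The threshold value is a hypothetical choice, not the one used in the study.
    """
    kept = []
    for title in titles:
        if all(jaccard_similarity(title, k) < threshold for k in kept):
            kept.append(title)
    return kept


titles = [
    "Large Language Models in Medicine",
    "large language models in medicine",  # same title, different casing
    "BERT for legal text",
]
unique_titles = deduplicate(titles)
```

This pairwise comparison is quadratic in the number of titles, so a collection of this size would likely need blocking or hashing in practice.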

2.2 Measure of collaboration diversity

Our first research question seeks to assess the diversity of collaboration. OpenAlex offers a variety of information about author affiliations, including details about departments, institutions, and countries. This allows us to represent the authors’ affiliation information for a paper using a set as follows,

$$A(x_i) = \{D(x_i), I(x_i), C(x_i)\} \tag{1}$$

where $x_i$ denotes the $i$th author of a paper, $A(x_i)$ denotes the set of authors’ affiliation information given a paper, $D(x_i)$ represents their department information, $I(x_i)$ represents their institution information, and $C(x_i)$ represents their country information (see Appendix D for the analysis of countries). It is important to note that these three types of information can vary significantly within a single paper. For instance, all collaborating authors could belong to the same institution and country but be affiliated with different departments. Our subsequent analysis particularly focuses on the collaboration between institutions and departments; therefore, we consider the first two sets of variables in Equation 1.

Next, we use Shannon entropy to measure the collaboration diversity given authors’ affiliation information in a paper. Shannon entropy quantifies the uncertainty or randomness in a set of possible outcomes. In the context of information theory, it represents the average amount of information produced by a stochastic set of data sources. Mathematically, for a discrete random variable $Y$ with possible values $y_1, y_2, \ldots, y_n$ and probability distribution $P(Y) = \{p(y_1), p(y_2), \ldots, p(y_n)\}$, the Shannon entropy $H(Y)$ is calculated as:

$$H(Y) = -\sum_{i=1}^{n} p(y_i) \log_2 p(y_i) \tag{2}$$

where $Y$ belongs to one of the aspects (e.g., $D$, $I$) in the set of authors’ affiliation, and $p(y_i)$ is the probability of the outcome $y_i$. We use this metric to measure the diversity given authors’ affiliation information, such as their affiliated departments. For example, if a paper has five authors with affiliated departments $D(x_i)$ represented as $\{d_1(x_1), d_1(x_2), d_2(x_3), d_2(x_4), d_3(x_5)\}$, then the probabilities of $d_1$, $d_2$, and $d_3$ are calculated as 0.4, 0.4, and 0.2, respectively. Using Equation 2, the Shannon entropy is: $H(D) = -\left(0.4\log_2 0.4 + 0.4\log_2 0.4 + 0.2\log_2 0.2\right) = 1.5219$. In general, higher entropy indicates greater diversity in collaboration based on authors’ affiliation information, while lower entropy suggests that authors’ affiliations are more uniform. An entropy of $0$ implies that all authors of a paper share the same affiliation (for example, the same institution or the same department).
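The worked example can be reproduced with a short sketch of Equation 2. The function name and the string labels for departments are placeholders chosen for illustration; the probabilities are taken as the relative frequencies of each affiliation among a paper's authors, as in the example above.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (base 2) of the affiliation labels of one paper."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # Sum of -p * log2(p) over the distinct affiliations.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Five authors: two in d1, two in d2, one in d3.
example = ["d1", "d1", "d2", "d2", "d3"]
h_example = shannon_entropy(example)

# All authors share one department, so diversity is zero.
h_uniform = shannon_entropy(["d1", "d1", "d1"])
```

Rounded to four decimals, `h_example` matches the value of 1.5219 reported in the text, and `h_uniform` is 0.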

2.3 Measures of network structure

We construct co-authorship networks based on the bipartite network projection, which involves converting a paper-author network to a co-authorship network whereby two authors are connected if they have co-authored at least one paper together. Each connection is weighted based on the total number of papers that each pair of researchers co-authored together. This method, as described by [25, 2], allows us to identify key researchers that have notable collaborative influence in the field, as well as any differences in collaboration patterns depending on the authors’ disciplines, institutions, and countries. The formula for our weighted projection approach is shown below:

$$w_{ij} = \sum_{p} a_i^p a_j^p \tag{3}$$

where $a_i^p$ denotes whether author $i$ contributes to paper $p$ (with 1 indicating authorship and 0 indicating no authorship), and $a_j^p$ similarly indicates whether author $j$ contributes to paper $p$. The product $a_i^p a_j^p$ is 1 if $i$ and $j$ are both authors of paper $p$ and 0 otherwise. This method assigns a full weight of 1 to each co-authorship instance and sums these weights across all papers where $i$ and $j$ are co-authors to obtain $w_{ij}$.
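A minimal sketch of the weighted projection in Equation 3 is given below, using only the Python standard library. The input format (one list of author identifiers per paper) and the function name are assumptions for illustration, not the data structures used in the study.

```python
from collections import Counter
from itertools import combinations


def project_coauthorship(papers):
    """Project a paper-author incidence list onto weighted co-author pairs.

    `papers` is an iterable of author lists. The returned Counter maps an
    unordered author pair (stored as a sorted tuple) to the number of papers
    the two authors share, i.e. w_ij in Equation 3.
    """
    weights = Counter()
    for authors in papers:
        # set() guards against an author being listed twice on one paper.
        for i, j in combinations(sorted(set(authors)), 2):
            weights[(i, j)] += 1
    return weights


papers = [["A", "B", "C"], ["A", "B"], ["C"]]
weights = project_coauthorship(papers)
```

Here authors A and B share two papers, so their edge weight is 2, while the single-author paper contributes no edges.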

With the resulting co-authorship networks, we analyze the structural properties in terms of (1) overall cohesion, (2) topology, (3) community structure, and (4) centrality measures to identify influential researchers. We compute these measures using Python’s NetworkX and NetworKit, visualize the networks with R’s ggraph, and modify a subset of measures based on our operationalization. Cohesion measures include the density, clustering coefficient, average path length, and size of the largest component, which contain details on the overall connectedness of the network, as well as how efficient the network is in facilitating collaborations between researchers from different disciplines, institutions, and countries.

We also determine whether a co-authorship network follows a power-law degree distribution, indicating a hubs-and-spokes structure where a few hubs accumulate most of the connections. In the co-authorship context, this means that certain key researchers act as central hubs, coordinating the majority of collaborations across the network. To do this, we compute the $\alpha$ goodness-of-fit value, which indicates if a simulated network with the same number of nodes and edges as the co-authorship network exhibits a similar power-law distribution. Generally, an $\alpha$ value between 2 and 3 suggests that a power-law distribution is a good fit [26].
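As a rough illustration of how an exponent $\alpha$ can be estimated from a degree sequence, the sketch below uses a standard maximum-likelihood approximation for discrete data, $\alpha \approx 1 + n \left[\sum_i \ln \frac{k_i}{k_{\min} - 1/2}\right]^{-1}$. This is a swapped-in technique and not necessarily the fitting procedure used in this study; the function name, the `k_min` parameter, and the toy degree sequence are all hypothetical.

```python
import math


def powerlaw_alpha(degrees, k_min=1):
    """Approximate MLE of the power-law exponent for degrees >= k_min.

    Uses the discrete approximation alpha = 1 + n / sum(ln(k / (k_min - 0.5))).
    """
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1 + len(tail) / log_sum


# Toy degree sequence, invented for illustration only.
toy_degrees = [1, 1, 1, 2, 2, 3, 5, 10]
alpha_toy = powerlaw_alpha(toy_degrees)
```

The approximation is known to be less accurate for small `k_min`, so a careful analysis would also choose `k_min` from the data and test the fit against alternative distributions.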

3 Results

In subsection 3.1, we analyze the diversity of collaboration based on the institutions and departments affiliated with authors across various fields. In subsection 3.2, we examine the network to understand how different institutions or departments collaborate with each other. It should be noted that the field categorization of a paper is provided by OpenAlex (see Appendix C for more details). OpenAlex has established a BERT model that generates a score distribution for identified topics using a paper’s title, abstract, and citations [23]. Their model provides up to three topics with the highest scores, which are then mapped to fields according to the ASJC structure [23].

In addition, we rank the fields of the 50,391 collected papers by paper count, as presented in Figure 1(a). For the subsequent analysis, we focus on the top 12 fields with the most publications on the topic of LLMs: (1) Computer Science, (2) Medicine, (3) Social Science, (4) Decision Science, (5) Biochemistry, Genetics & Molecular Biology, (6) Psychology, (7) Engineering, (8) Health Professions, (9) Business, Management & Accounting, (10) Neuroscience, (11) Arts & Humanities, and (12) Materials Science. We then filter for papers with complete authors’ affiliation information, resulting in 25,933 papers with complete institution information and 16,645 papers with complete department information. Overall, the entropy shows an increase based on authors’ affiliated institution (coefficient = 0.73) and department (coefficient = 0.81) information (see Figure 1(b)).

Figure 1: (a) The distribution of papers in the collection by field. (b) The temporal pattern of entropy and paper count based on authors’ affiliated institution (top) and department (bottom) information, respectively. In both subplots, the x-axis denotes the date by year. The primary y-axis denotes the Shannon Entropy calculated using Equation 2, while the secondary y-axis indicates the count of papers. Each point represents the averaged entropy based on papers published in one quarter. The dotted gray line marks the date when ChatGPT was released by OpenAI.

3.1 Analysis of collaboration diversity

Figure 2(a) and Figure 2(b) show the collaboration diversity measured by Shannon Entropy for authors’ institutions and departments, respectively, together with the paper count. Nearly all fields exhibit a sharp increase in the number of LLM-related papers after the release of ChatGPT. Before that date, only Computer Science shows a comparatively large number of LLM-related studies. This is likely because BERT models, such as DeBERTa and RoBERTa, were widely studied before the release of ChatGPT. It is also noteworthy that other fields, such as Biochemistry, Genetics & Molecular Biology, Psychology, Neuroscience, Social Science, Engineering, and Arts & Humanities, demonstrated significant applications of LLMs even before the release of ChatGPT. After the release of ChatGPT, Medicine shows a sharp and significant increase, with the peak close to that of Computer Science. Similarly, Business, Management & Accounting and Health Professions have also witnessed a sharp increase after the release of ChatGPT.

Regarding the entropy analysis based on institutions and departments, the Shannon Entropy trends in Figure 2(a) and Figure 2(b) closely align for each respective field. Compared to 2023, most fields show a minor increase in the averaged entropy in 2024. Exceptions include Biochemistry, Genetics & Molecular Biology, Engineering, Neuroscience, Business, Management & Accounting, and Materials Science. Moreover, the variance of entropy is much wider before ChatGPT’s release, possibly due to fewer papers on the topic of LLMs and more diverse collaborations. One possible explanation for this pattern is that researchers from other fields might have sought partnerships with Computer Science experts for such research topics.

In addition, the observed changes in entropy suggest a potential shift in research collaboration dynamics following the introduction of ChatGPT across several fields. First, papers in the field of Computer Science display a stable increasing trend. A plausible explanation for this is that many Computer Science researchers are seeking collaborations with domain experts, focusing on areas such as AI for science applications or AI for social good. However, Medicine exhibits a notable decrease in entropy. This is the only field showing a statistically significant decrease in entropy after ChatGPT’s release, while other fields display either minor increases or negligible changes. These findings highlight the varying impacts of ChatGPT’s introduction on interdisciplinary collaborations across different academic fields.

We then conduct a Wilcoxon rank-sum test to compare the entropy before and after the release of ChatGPT, with detailed results presented in Appendix B (see Figure 5). Our analysis reveals statistically significant changes in entropy across several fields. Computer Science, Biochemistry, Genetics & Molecular Biology, and Psychology show a statistically significant increase in entropy, while Medicine displays a statistically significant decrease. The other fields do not show any significant increase or decrease in entropy. In particular, we observe that the median of the entropy distribution for most fields remains at 0, with the first quartile consistently at 0. This suggests that many researchers primarily collaborate with colleagues from their own institutions or departments on LLM-related topics.
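For readers unfamiliar with the test, the following is a simplified, standard-library sketch of a two-sided Wilcoxon rank-sum test using the normal approximation. It assigns average ranks to ties but applies no tie correction to the variance, which matters for entropy data with many zeros, so it should be read as an illustration rather than the exact procedure used here. The function name and the toy samples are invented for the example.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p). Ties receive average ranks; no tie correction is applied.
    """
    # Pool both samples, remembering which group (0 = x, 1 = y) each value is from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2
    std_dev = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / std_dev
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Toy entropy values "before" and "after" an event (not real data).
z_stat, p_value = rank_sum_test([0, 0, 0, 0], [1, 1, 1, 1])
z_same, p_same = rank_sum_test([1, 2, 3], [1, 2, 3])
```

A negative `z_stat` indicates that the first sample tends to have lower values than the second.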

We conduct additional Wilcoxon rank-sum tests to compare entropy across different fields, with results presented as heatmaps in Figure 6. Several consistent patterns emerge when examining both institution and department results before and after ChatGPT’s release. Certain fields like Computer Science, Medicine, and Neuroscience exhibit more significant differences compared to other fields, as evidenced by darker blue cells in the heatmaps. Following ChatGPT’s release, there is also an increase in significant differences across fields, illustrated by a higher prevalence of darker cells in the right heatmap in Figure 6 than in the left heatmap of the same figure. Another interesting observation is that Computer Science consistently demonstrates significantly lower entropy than Medicine, Biochemistry, Genetics & Molecular Biology, Health Professions, and Neuroscience.

Figure 2: Collaboration diversity over time based on (a) authors’ institutions and (b) authors’ departments. In both plots, the primary y-axis denotes the Shannon Entropy calculated using Equation 2, while the secondary y-axis indicates the count of papers. Panel (a) is based on 25,933 papers with complete institution information, while Panel (b) is based on 16,645 papers with complete department information. Each point in the panel represents the averaged entropy based on papers published in one year. The dotted gray line marks the date when ChatGPT was released by OpenAI.

3.2 Analysis of collaboration network

We analyze the co-authorship networks in terms of structural cohesion, topology, community structures, and influential researchers with respect to various dimensions of centrality. The network metrics are presented in Table 1, and the network visuals are presented in Figure 3.

Based on Table 1, the institution-based and department-based networks both have low density (0.001 and 0.0008, respectively) but high clustering coefficients (0.61 and 0.75, respectively), indicating that while direct collaborations are limited, existing ones form tight-knit clusters. The largest components comprise a notable portion of all network nodes (77% for the institution-based network and 70% for the department-based network), meaning that most researchers are part of research communities that are reachable from each other.
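The density and largest-component shares can be checked against the node, edge, and component counts listed in Table 1. The sketch below assumes the usual definition of density for an undirected simple graph, $2E / (N(N-1))$; whether the reported values were computed exactly this way is an assumption.

```python
def density(n_nodes: int, n_edges: int) -> float:
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Counts taken from Table 1.
inst_density = density(8391, 35080)
dept_density = density(6604, 16811)

# Share of nodes that fall in the largest connected component.
inst_share = 6485 / 8391
dept_share = 4616 / 6604
```

With these inputs the densities round to 0.001 and 0.0008, and the largest components hold about 77% and 70% of the nodes in the institution-based and department-based networks, respectively.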

The relatively high average shortest path length (16.74 for institution-based, and 16.61 for department-based networks) suggests extensive reach within the networks, indicating that while direct connections between researchers might be limited, there are alternative paths that connect them. The primary difference is that the department-based network exhibits a stronger fit towards a power-law distribution ($\alpha = 2.32$), suggesting that central departments, namely Computer Science and Medicine, are major hubs that facilitate numerous collaborations. On the other hand, the institution-based network ($\alpha = 4.06$) does not exhibit a clear hub-and-spokes structure, indicating a more evenly distributed pattern of collaborations across various institutions.

Degree centrality results highlight the prominence of Computer Science, Medicine, and other Computer Science-related departments in LLM research. Furthermore, the high betweenness centrality of these same departments highlights their roles in facilitating collaboration between departments that would have otherwise been unconnected. As illustrated in Figure 3, the clusters of departments between co-authors not only demonstrate the dominance of Computer Science and Medicine and their related disciplines (the orange cluster and the green cluster for Computer Science and Medicine-related disciplines, respectively) in the network, but also show that these two disciplines connect fields with little to no overlap. For instance, Medicine lies on the shortest path between Engineering and Social Science in 196 instances. Similarly, Computer Science lies on the shortest path between Medicine and Psychology and Neuroscience in 384 instances.

With respect to key institutions (the “Degree Centrality” and “Betweenness Centrality” rows in Table 1), Stanford University is consistently the highest in terms of degree and betweenness centrality, indicating its active involvement as both a collaborator and a facilitator of collaboration between institutions in LLM research. While Harvard University and University College London are not as central in terms of direct collaborations with other universities, their high betweenness indicates they actively connect institutions with each other, as well as with industrial tech companies. For instance, University College London connects Google (United States) and University of Zurich (162 instances), and Harvard University connects Allen Institute and University of Washington (210 instances).

Closeness centrality results show how easily researchers from departments and institutions can, on average, be reached by other researchers. The top five departments by closeness centrality are in language-related disciplines, such as Computational Language Modeling, German Language, and Literature & Education, which shows the extensive application of LLMs in domains where language analysis or content generation is needed. The top five institutions by closeness centrality are primarily universities from countries outside the U.S. (see Table 1). This finding shows that LLM-related research is a global effort, with international institutions acting as connectors across geographical boundaries.

Eigenvector centrality results show more consistency with degree centrality and betweenness centrality results than with closeness centrality, highlighting researchers who are not only well-connected but also connected to other influential researchers. Researchers with the highest eigenvector centrality scores are also from Medicine, Computer Science, and their related fields. Biomedical Informatics has the highest eigenvector centrality, as it is directly connected to Medicine and Computer Science, the two most influential departments in our dataset. The eigenvector centrality analysis for institutions underscores the critical role of medical research institutes, such as the University of Colorado Anschutz Medical Campus, European Bioinformatics Institute, and Lawrence Berkeley National Laboratory, in facilitating collaborations that significantly impact the LLM research community.
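As a minimal sketch of how the degree and shortest-path statements above can be read, the following builds a toy department-level co-authorship graph and counts (i) each department's number of distinct neighbours and (ii) how many shortest paths between other departments pass through it. The paper records, department names, and function names are hypothetical, and the raw path counts are an illustrative assumption: betweenness centrality additionally normalizes such counts, and this sketch is not the pipeline used for Table 1.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical toy data: each paper lists the departments of its authors.
papers = [
    {"Computer Science", "Medicine"},
    {"Medicine", "Engineering"},
    {"Medicine", "Social Science"},
    {"Computer Science", "Psychology"},
]

def build_graph(papers):
    """Undirected graph: departments are nodes, co-occurrence on a paper is an edge."""
    graph = defaultdict(set)
    for departments in papers:
        for a, b in combinations(sorted(departments), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree(graph):
    """Number of distinct neighbours per node (unnormalized degree)."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def shortest_paths(graph, source, target):
    """All shortest paths from source to target, found by breadth-first search."""
    dist = {source: 0}
    parents = defaultdict(list)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                parents[nxt].append(node)
    if target not in dist:
        return []

    def unwind(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in parents[node] for path in unwind(p)]

    return unwind(target)

def bridging_counts(graph):
    """Count, per node, the shortest paths between other node pairs that pass through it."""
    counts = defaultdict(int)
    for s, t in combinations(sorted(graph), 2):
        for path in shortest_paths(graph, s, t):
            for node in path[1:-1]:
                counts[node] += 1
    return dict(counts)

graph = build_graph(papers)
deg = degree(graph)
bridges = bridging_counts(graph)
print(deg["Medicine"], bridges["Medicine"])
```

In this toy graph, Medicine has the most neighbours and also sits on the most shortest paths, which mirrors the dual role of collaborator and connector described above.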

Figure 3: Co-authorship networks based on authors’ (a) institution and (b) department affiliations. Each node represents a researcher’s institution or department, and each edge represents a co-authorship between pairs of researchers from the respective institutions or departments. Node colors represent the clusters to which the nodes belong, determined based on the Louvain modularity. Node labels represent the top 20% of nodes ranked by degree centrality. Edge thickness represents the frequency of co-authorship between the connected researchers.
Table 1: Network metrics based on authors’ institutions and departments
| Metric | Institutions | Departments |
| --- | --- | --- |
| No. Nodes | 8391 | 6604 |
| No. Edges | 35080 | 16811 |
| Density | 0.001 | 0.0008 |
| Avg. Shortest Path | 16.74 | 16.61 |
| Clustering Coeff. | 0.61 | 0.75 |
| No. Components | 1569 | 1678 |
| Largest Component | 6485 | 4616 |
| Power-law Exponent | 4.06 | 2.32 |
| No. Comm. (Louvain) | 1613 | 1730 |
| No. Comm. (CNM) | 1684 | 1775 |
| Degree Centrality (top 5) | Stanford University (343) | Computer Science (835) |
| | University of Washington (287) | Medicine (518) |
| | Peking University (262) | Computer Science & Eng. (422) |
| | Tsinghua University (256) | Artificial Intelligence (289) |
| | University of Waterloo (227) | Engineering (226) |
| Betweenness Centrality (top 5) | University College London (0.028) | Computer Science (0.089) |
| | Harvard University (0.026) | Computer Science & Eng. (0.038) |
| | Stanford University (0.021) | Medicine (0.035) |
| | University of Waterloo (0.021) | Psychology (0.034) |
| | University of Washington (0.021) | Engineering (0.034) |
| Closeness Centrality (top 5) | University of Craiova (0.5) | Computational Lang. Modeling (0.50) |
| | Constantin Brâncuşi Univ. (0.5) | German Language (0.50) |
| | Nutrition Center of Philippines (0.5) | Information & Architecture (0.50) |
| | West Visayas State University (0.5) | Literature & Education (0.50) |
| | Yalova University (0.5) | Vet. Public Health & Epidem. (0.50) |
| Eigenvector Centrality (top 5) | Univ. of Colorado Anschutz (0.61) | Biomedical Informatics (0.45) |
| | Lawrence Berkeley Natl. Lab. (0.54) | Computer Science (0.40) |
| | European Bioinfo. Institute (0.37) | Medicine (0.38) |
| | Critical Path Institute (0.31) | Envir. Genomics & Sys. Biology (0.30) |
| | University of Illinois System (0.11) | Bioinformatics (0.26) |
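The density values in Table 1 can be checked against the node and edge counts. The sketch below assumes the networks are treated as undirected simple graphs, so that density is the number of observed edges divided by the number of possible node pairs; the function name is illustrative.

```python
def density(num_nodes, num_edges):
    """Density of an undirected simple graph: observed edges over possible node pairs."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

# Node and edge counts taken from Table 1.
institution_density = density(8391, 35080)
department_density = density(6604, 16811)

print(round(institution_density, 3), round(department_density, 4))
```

Under that assumption, both values round to the densities reported in the table (0.001 and 0.0008), indicating very sparse networks.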

4 Discussion

4.1 Key findings and implications

Collaboration diversity results, as well as co-authorship structures, reveal the notable and complex effects that LLMs have on fostering interdisciplinary and cross-institutional collaborations. The integration of LLMs into research areas has led to a potential shift in collaboration patterns. Below, we summarize several key findings and implications.

First, there has been an overall significant increase in the number of publications on LLM-related topics since the advent of ChatGPT in 2022. This surge extends beyond Computer Science and artificial intelligence to disciplines such as Medicine, Social Science, and Engineering. Additionally, the overall entropy, which measures the diversity of collaboration, has increased. This observation implies that collaboration diversity has broadened across these fields.
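To illustrate how entropy reflects collaboration diversity, the following minimal sketch computes Shannon entropy over the affiliations listed on a single paper. The affiliation lists are hypothetical, and the choice of base-2 logarithms and of a per-paper distribution over author affiliations are assumptions made for illustration rather than a restatement of the exact procedure used in this study.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    probabilities = [count / total for count in counts.values()]
    return sum(-p * math.log2(p) for p in probabilities)

# Hypothetical author affiliations for two papers.
single_department = ["Computer Science"] * 4
mixed_departments = ["Computer Science", "Medicine", "Psychology", "Medicine"]

print(shannon_entropy(single_department))
print(shannon_entropy(mixed_departments))
```

A paper whose authors all share one department has zero entropy, whereas a paper spread over several departments scores higher, so a rising average indicates broader collaboration.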

The impact on specific fields varies. Computer Science sees a stable increase in entropy, suggesting that computer scientists continue to seek collaboration with researchers from other disciplines. This trend aligns with our observation that researchers are increasingly exploring topics in AI for social good or AI for science [27]. Health-related fields, including Medicine, Neuroscience, Health Professions, and Biochemistry, Genetics, & Molecular Biology, exhibit higher entropy compared to Computer Science, Social Sciences, and Engineering. However, Medicine is the only area showing a significant decrease in entropy. One possible explanation is that the advancement of LLMs, particularly with their simple deployment interfaces, could enable researchers from disciplines like Medicine to more effectively use these AI models for domain-specific issues compared to the past. This is already being seen in applications such as drug discovery and repurposing [28].

Therefore, while LLMs might be associated with broader collaborations across disciplines and institutions, they may also encourage specialized collaborations within certain fields, such as Medicine, where domain-specific knowledge is often needed. In particular, medical research may have specific data privacy requirements, such as HIPAA regulations for electronic health records [29], and may require domain-specific expertise, such as in pathology [30] and clinical diagnosis [31], to evaluate whether LLM-predicted outputs are correlated with improved patient outcomes. Unlike with previous AI models, researchers in Medicine can now use LLMs to address domain-specific issues more effectively without always seeking external collaborations.

These findings suggest that interdisciplinary collaboration may flourish in contexts where LLMs are used to analyze and generate data applicable to multiple disciplines, as opposed to health-related fields like Medicine, where the data is typically specific to that field. Additionally, creating clear guidelines and support for managing data privacy, ethical considerations, and domain-specific adaptations of LLM technologies can be crucial for leveraging their full potential in sensitive fields like Medicine.

Structural analysis of co-authorship connections shows that despite differences in collaboration diversity after the advent of ChatGPT, Computer Science and Medicine remain the most represented disciplines in the network. In particular, researchers from these two disciplines (via departmental affiliation) have the highest degree and betweenness centrality scores, indicating their roles as both active researchers and facilitators of cross-field collaborations. This finding is especially important for Computer Science because, although its own papers represent fewer disciplines (lower entropy than other disciplines), the field is structurally necessary to connect other disciplines. For instance, Medicine bridges Computer Science and Engineering and Information Technology (334 times), Engineering and Social Science (196 times), and Neurosurgery and Intelligent Technology and Engineering (144 times). Relatedly, the influence of Medicine is also reflected in the cross-institutional analysis, where medical institutions and associated national laboratories (e.g., University of Colorado Anschutz Medical Campus, European Bioinformatics Institute, Jackson Laboratory) are the most influential in terms of eigenvector centrality, highlighting their central roles in connecting with other influential institutions and facilitating extensive cross-institutional collaborations.

Overall, the study of author collaboration and network analysis provides valuable insights into the dynamics and patterns of interdisciplinary research in LLMs, highlighting how knowledge and expertise are exchanged across various fields. Our analysis reveals that the diversity of academic collaborations increases overall but varies across disciplines. Fields with methodological expertise and domain generality, like Computer Science, exhibit increasing collaboration diversity, as they may have sought applications in other fields, such as Psychology and Social Sciences. In contrast, disciplines like Medicine show decreased collaboration diversity after the release of ChatGPT, possibly because domain-specific knowledge is needed in addition to methodological expertise and because these AI models can now be applied more effectively than in the past. Extensive cross-institutional and cross-country collaborations are also facilitated by two dominant fields, Computer Science and Medicine. Leading research institutions and associated national labs in the U.S., U.K., and China are actively conducting research using LLMs and fostering interdisciplinary partnerships while sharing expertise and resources.

4.2 Opportunities for future work

This study presents several avenues for future research, each addressing current limitations and opportunities for expansion. One primary constraint lies in the data quality of OpenAlex, which has been shown to have missing information issues, particularly in author affiliation and abstract details [32]. These gaps could potentially lead to a loss of valuable insights from papers or preprints not fully captured by OpenAlex. Future research could benefit from conducting sensitivity analyses to assess the impact of these missing values on the patterns observed in this study, thereby providing a more robust understanding of the data’s reliability and comprehensiveness.

Another area for improvement concerns OpenAlex’s categorization of disciplines. While OpenAlex utilizes a BERT-based model to identify topics and subsequently maps them to fields based on Scopus’s ASJC structure (see Appendix C), there is no clear statement of the training and testing data sizes or of the performance metrics for discipline classification. A valuable direction for future work would be to evaluate the accuracy and reliability of this discipline classification system, potentially developing more refined or domain-specific categorization methods for LLM-related research.

Next, future research could focus on continuously collecting papers and expanding the scope to encompass the broader concept of generative AI, such as multimodal LLMs (e.g., GPT-4o). Given that the landscape of LLMs has undergone dramatic changes, with new models and breakthroughs emerging at an unprecedented pace, this work could allow for a more comprehensive longitudinal analysis of trends, innovations, and shifts in academia. Additionally, we could explore the interplay between academic research and industry developments and track how breakthroughs in LLMs influence and are influenced by practical applications. This could provide valuable insights into the trajectory of LLM and generative AI development and help identify emerging subfields, interdisciplinary connections, and potential areas for future breakthroughs.

Last, while this study conducts a before-and-after comparison, the observed differences should not be interpreted as causal, given that inherent time-varying trends in collaboration and other confounding effects are not well controlled. Preliminary analyses using the Bayesian Structural Time Series (BSTS) model are conducted in this study (see Figure 7); however, a more robust quasi-experimental design incorporating control and treatment groups could merit further exploration to infer the causal effects of LLMs on collaboration patterns.

Declarations

  • Conflict of interest/Competing interests.
    The authors declare no competing interests.

  • Data availability.
    The data files can be accessed at: https://doi.org/10.5281/zenodo.13118978

  • Code availability.
    The code files can be accessed at: https://github.com/Lingyao1219/llm-science

Appendix A Data Preparation

A.1 Data preparation workflow

Figure 4 outlines the process for collecting and cleaning the dataset of LLM-relevant papers for our analysis. The process begins with a broad search using general terms related to LLMs and popular models based on the MMLU benchmark (see subsection A.2) [24], spanning from October 2018 to June 2024. We apply this search to the title and abstract to avoid excessive noise in the dataset. This initial search yields a collection of 97,242 papers, which then undergo a series of filtering steps to enhance relevance and remove duplicates.

The cleaning process involves several steps, each progressively narrowing down the dataset. First, the collection is limited to preprints and articles, reducing it to 87,489 papers. The next step involves removing duplicates based on titles. Additionally, since a preprint might change its title upon official publication, we examine papers with slight variations between preprints and formal publications using Jaccard similarity (see subsection A.3). After reviewing the flagged pairs, we find that a paper with a similarity score of 0.6 or higher is very likely to be a duplicate, and we remove those potential duplicates. However, it is still possible for a paper to contain LLM-related keywords but not be relevant to LLM studies, such as the keyword “PaLM,” which often appears in papers discussing “palm trees” or “palm oil.” To address this issue, we employ the GPT-4o model to support the evaluation of each paper’s relevance (see subsection A.4). This filtering process results in a final set of 50,391 papers, representing a focused and highly relevant collection for our analysis. The distribution of these 50,391 papers by field is presented in Figure 1.

Figure 4: An illustrative workflow for paper collection and cleaning.

A.2 Paper collection

Table 2 below shows the specific search terms used to collect papers from OpenAlex. The search terms include two general terms, “large language model” and “LLM,” along with popular open-source models (e.g., BERT, Flan-T5, LLaMA) and closed-source models (e.g., ChatGPT, Claude) based on the MMLU benchmark [24].

Table 2: LLM-related search terms
| Category | Search terms |
| --- | --- |
| General LLM terms | (Large language model) OR LLM |
| Closed-source | (GPT 2) OR (GPT 3) OR (GPT 4) OR ChatGPT OR (Claude Instant) OR (Claude 1.3) OR (Claude 2) OR (Claude 3) OR (Google PaLM) OR (Google Gemini) OR (PaLM 2) OR (Gemini Pro) OR (Gemini Ultra) OR (davinci-002) OR (davinci-003) OR (Chinchilla 70B) |
| Open-source | BERT OR RoBERTa OR (Meta LLaMA) OR (LLaMA 2) OR (LLaMA 3) OR Mistral OR Mixtral OR Qwen OR DBRX OR (Falcon 40B) OR (Falcon 7B) OR (Falcon 180B) OR (OPT 66B) OR (BLOOM 176B) OR (GLM 130B) OR (Flan T5) OR (Flan PaLM) |

A.3 Duplicates removal

We first remove duplicated papers based on title information. This step reduces the collection to 70,644 papers. Given that a preprint might change its title when officially published, we then use the Jaccard similarity coefficient to identify works whose titles vary but that are actually the same paper. The Jaccard similarity coefficient is a measure of similarity between two sets. It is defined as the size of the intersection divided by the size of the union of the sets, as shown below:

$$J(A,B)=\frac{|A\cap B|}{|A\cup B|} \tag{4}$$

In our context, sets $A$ and $B$ represent the collections of words from the titles and author names of a preprint and its paired articles, respectively. $|A\cap B|$ denotes the number of common words between the titles and author names of the preprint and its paired articles, while $|A\cup B|$ is the total number of unique words in both titles and author names combined. We output all works with $J(A,B)\geq 0.5$ and find that $J(A,B)\geq 0.6$ often implies the preprint and the published article are the same paper. Therefore, we remove all those preprints showing $J(A,B)\geq 0.6$.
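The duplicate check in Equation (4) can be sketched as follows. The tokenization rule (lowercasing and splitting on non-word characters), the helper names, and the example titles and author names are illustrative assumptions; only the Jaccard definition and the 0.6 threshold come from the text above.

```python
import re

DUPLICATE_THRESHOLD = 0.6  # threshold reported in the text

def word_set(title, authors):
    """Lowercased set of words drawn from a title and its author names."""
    text = " ".join([title, *authors]).lower()
    return set(re.findall(r"\w+", text))

def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical preprint and published article with a slightly changed title.
preprint = word_set(
    "Large language models in medicine: a survey", ["A. Author", "B. Writer"]
)
article = word_set(
    "Large language models in medicine: a review", ["A. Author", "B. Writer"]
)

score = jaccard(preprint, article)
is_duplicate = score >= DUPLICATE_THRESHOLD
print(round(score, 3), is_duplicate)
```

Here the two records share 9 of 11 unique words, so the pair exceeds the threshold and the preprint would be dropped in favour of the published article.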

A.4 Relevance evaluation

This step involves filtering out papers that do not actually discuss LLM-related topics, even if their abstracts contain key terms such as “PaLM.” The filtering takes two steps. First, we consider papers with identified topics (provided by OpenAlex) including “natural language processing,” “artificial intelligence,” “machine learning,” “text mining,” “deep learning,” “transfer learning,” “question answering,” and “speech recognition” as LLM-relevant. Then, we use GPT-4o to determine their relevance based on the title, abstract, and topics. The prompt is designed as follows:

Please identify if the following paper is related to the topic of large
language models (LLMs) or involves the use of LLMs based on the
following provided information.
----------
Paper Title: {title}
Abstract (if available): {abstract}
Topics (if available): {topics}
----------
If the abstract is unavailable, please use the title and topics to make
your determination. Please note that a paper mentioning concepts like
'neural network', 'machine learning', 'artificial intelligence', or
any NLP tasks always suggests a connection to LLMs. For example, a
paper titled 'The improved neural network model in humor detection'
is very likely to involve LLMs. Respond 'Yes' for such papers. Please
only respond with either 'Yes' or 'No'. Please do not return other
output.

To validate the classification, we manually review 190 randomly selected papers, of which 86 are annotated as irrelevant and 104 as relevant. We then compare these annotations with the labels returned by GPT-4o, using F1-score and accuracy to evaluate performance. Overall, GPT-4o’s classification achieves an agreement of 0.96 with manual annotation, with F1-scores of 0.96 for both the irrelevant and relevant classes.
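The validation metrics can be computed as in the minimal sketch below. The label lists are small hypothetical examples, not the 190 annotated papers, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Share of items on which the two label lists agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_class(y_true, y_pred, label):
    """F1-score treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical manual annotations and model responses.
manual = ["Yes", "Yes", "Yes", "No", "No", "Yes", "No", "No"]
model = ["Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes"]

acc = accuracy(manual, model)
f1_relevant = f1_for_class(manual, model, "Yes")
f1_irrelevant = f1_for_class(manual, model, "No")
print(acc, f1_relevant, f1_irrelevant)
```

Reporting F1 per class, as done above, guards against a classifier that performs well only on the majority class.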

Appendix B Statistical Analysis

B.1 Wilcoxon rank-sum test

Figure 5 and Figure 6 provide statistical support for the analysis presented in subsection 3.1. Both figures employ the Wilcoxon rank-sum test, a non-parametric method ideal for comparing two groups of data that are either interval-scaled or not normally distributed, to quantify the statistical significance of observed differences in entropy. Figure 5 examines whether there is a significant difference in entropy before and after the release of ChatGPT, with subfigure (a) illustrating the entropy distribution and test results based on authors’ institutions, and subfigure (b) presenting the same analysis for authors’ departments. Figure 6 investigates whether two fields display significant differences in entropy, with subfigure (a) showing the Wilcoxon rank-sum test results comparing entropy across two fields before and after ChatGPT’s release based on authors’ institutions, and subfigure (b) presenting the same comparison based on authors’ departments.
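For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the large-sample normal approximation, without tie or continuity corrections. The entropy values are hypothetical, the function names are illustrative, and a statistical package would normally be used in practice, particularly for small samples where exact p-values are preferable.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (z, two-sided p) for the rank sum of x within the pooled sample."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    std_dev = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum - expected) / std_dev
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical per-paper entropy values before and after an event.
before = [0.0, 0.2, 0.5, 0.7, 0.9, 1.0]
after = [1.1, 1.3, 1.4, 1.6, 1.8, 2.0]

z, p_value = rank_sum_test(before, after)
print(round(z, 3), round(p_value, 4))
```

A negative z indicates that the first group tends to have lower ranks, here lower entropy, than the second.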

Figure 5: The entropy distribution for each field before (blue) and after (red) ChatGPT’s release based on (a) authors’ affiliated institutions and (b) authors’ affiliated departments. The x-axis shows the fields, and the y-axis denotes the Shannon entropy. The Wilcoxon rank-sum test is conducted to compare the distribution of entropy before and after ChatGPT’s release.
Figure 6: Wilcoxon rank-sum test results for pairs of fields before (left) and after (right) ChatGPT’s release based on (a) authors’ affiliated institutions and (b) authors’ affiliated departments. The lightest blue indicates that the two fields do not differ significantly in entropy, while darker blue indicates a more statistically significant difference.

B.2 Bayesian structural time series analysis

The entropy changes after the launch of ChatGPT cannot be solely attributed to causal effects due to inherent trends in entropy over time. To address this, a Bayesian structural time series (BSTS) model is fitted for each field for time series forecasting and causal inference [33, 34]. Figure 7 displays results where the impact of ChatGPT is statistically significant. Trained on pre-intervention data, the BSTS model predicts post-intervention entropy (the dashed line in each panel’s top subfigure). The difference between these predictions and observations provides an estimate of the intervention’s pointwise effect (second subfigure in each panel), and their cumulative sum indicates the total effect over time (third subfigure in each panel). As shown, the release of ChatGPT consistently leads to a statistically significant increase in the entropy in Computer Science. It also leads to a statistically significant decrease in the entropy in Medicine based on institutions, and a statistically significant decrease in the entropy in Business, Management & Accounting based on departments. Other fields, however, do not show significant impacts from ChatGPT based on the BSTS causal inference.
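The pointwise and cumulative effects described above can be illustrated with a deliberately simplified sketch. It replaces the BSTS model with an ordinary least-squares linear trend fitted to the pre-intervention series, so it shows only the counterfactual logic (predict, subtract, accumulate) and provides none of the uncertainty estimates that BSTS yields. The series values and function names are hypothetical.

```python
def fit_linear_trend(values):
    """Least-squares fit of values against time index 0..n-1; returns (intercept, slope)."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

def intervention_effects(pre, post):
    """Counterfactual predictions, pointwise effects, and their running sum."""
    intercept, slope = fit_linear_trend(pre)
    start = len(pre)
    predicted = [intercept + slope * (start + i) for i in range(len(post))]
    pointwise = [obs - pred for obs, pred in zip(post, predicted)]
    cumulative = []
    running = 0.0
    for effect in pointwise:
        running += effect
        cumulative.append(running)
    return predicted, pointwise, cumulative

# Hypothetical mean entropy per period before and after an intervention.
pre = [1.0, 1.1, 1.2, 1.3, 1.4]
post = [1.7, 1.8, 1.9]

predicted, pointwise, cumulative = intervention_effects(pre, post)
print([round(v, 2) for v in pointwise], round(cumulative[-1], 2))
```

In this toy series the pre-period trend alone would have predicted continued growth, so only the gap above that trend is counted as the effect.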

Figure 7: BSTS outcomes for entropy based on authors’ affiliated (a) institutions and (b) departments. The results plotted only include fields where the release of ChatGPT in November 2022 has shown a statistically significant impact.

Appendix C OpenAlex topic and field classification

OpenAlex has identified 4,516 topics based on the publication-level classification system developed by Waltman and Van Eck [35]. The classification of topics for each paper is based on OpenAlex’s proprietary model, which fine-tunes the multilingual BERT (mBERT) model for topic classification. The model’s input comprises a paper’s title, abstract, and citations, although only approximately half of the papers have usable abstracts. According to OpenAlex’s performance report, when all information for a paper is available, the model achieves a (top K = 1) accuracy of 0.72. The (top K = 1) accuracy refers to the percentage of samples where the correct label appears as the top prediction. On average, their model achieves a (top K = 1) accuracy of 0.53 and a (top K = 3) accuracy of 0.73. OpenAlex provides up to three topics for each paper [23].
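The top-K accuracy definition above can be made concrete with a short sketch. The topic labels and ranked predictions are hypothetical and unrelated to OpenAlex’s actual evaluation data.

```python
def top_k_accuracy(true_labels, ranked_predictions, k):
    """Share of samples whose true label appears among the first k ranked predictions."""
    hits = sum(
        truth in ranked[:k]
        for truth, ranked in zip(true_labels, ranked_predictions)
    )
    return hits / len(true_labels)

# Hypothetical true topics and model predictions ranked from most to least likely.
true_topics = ["nlp", "genomics", "robotics", "nlp"]
predictions = [
    ["nlp", "ml", "speech"],
    ["ml", "genomics", "nlp"],
    ["vision", "ml", "nlp"],
    ["ml", "speech", "nlp"],
]

top1 = top_k_accuracy(true_topics, predictions, k=1)
top3 = top_k_accuracy(true_topics, predictions, k=3)
print(top1, top3)
```

Because a larger K admits more candidate labels, top-3 accuracy can never be lower than top-1 accuracy, which matches the ordering of the figures reported above.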

Subsequently, OpenAlex establishes a one-to-one relationship between these topics and higher-level fields. The fields are organized in a hierarchical structure, including subfields, fields, and domains, derived from Scopus’s ASJC (All Science Journal Classification) structure. This matching process is conducted using an LLM and further verified by OpenAlex’s annotators. As a result, each paper can be associated with up to three fields, corresponding to the three identified topics [23].

While the classification performance for field classification is not explicitly reported, it is reasonable to assume that the accuracy for mapping the 4,516 topics to 26 fields could be significantly higher than the reported (top K = 1) accuracy for topic classification. This assumption is based on the fact that the field classification represents a narrowing-down process from a larger set of topics to a smaller set of fields [23].

Appendix D Network analysis for countries

The network metrics based on authors’ affiliated country information are presented in Table 3, and the network visuals are presented in Figure 8.

Figure 8: Co-authorship networks based on countries of researchers. Each node represents a researcher’s country of affiliation, and each edge represents a co-authorship between pairs of researchers from the respective countries. Node colors represent the clusters to which the nodes belong, determined based on Louvain modularity. Node labels represent the top 20% of nodes ranked by degree centrality. Edge thickness represents the frequency of co-authorship between the connected researchers.
Table 3: Network metrics based on authors’ affiliated countries
| Metric | Countries |
| --- | --- |
| No. Nodes | 155 |
| No. Edges | 1745 |
| Density | 0.15 |
| Avg. Shortest Path | 9.36 |
| Clustering Coeff. | 0.77 |
| No. Components | 16 |
| Largest Component | 140 |
| Power-law Exponent | 5.22 |
| No. Comm. (Louvain) | 20 |
| No. Comm. (CNM) | 19 |
| Degree Centrality (top 5) | US (109), GB (91), CN (81), DE (79), CA (75) |
| Betweenness Centrality (top 5) | PL (0.056), GB (0.053), CA (0.051), FR (0.050), TR (0.050) |
| Closeness Centrality (top 5) | PT (0.157), MX (0.157), QA (0.155), TR (0.154), SY (0.154) |
| Eigenvector Centrality (top 5) | US (0.630), CN (0.498), GB (0.395), DE (0.222), CA (0.194) |

References

  • Shi and Evans [2023] Shi, F., Evans, J.: Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines. Nature Communications 14, 1641 (2023) https://doi.org/10.1038/s41467-023-36741-4
  • Dinh et al. [2024] Dinh, L., Barley, W.C., Johnson, L., Allan, B.F.: Hyperauthored papers disproportionately amplify important egocentric network metrics. Quantitative Science Studies, 1–24 (2024)
  • Venturini et al. [2024] Venturini, S., Sikdar, S., Rinaldi, F., Tudisco, F., Fortunato, S.: Collaboration and topic switches in science. Scientific Reports 14(1), 1258 (2024) https://doi.org/10.1038/s41598-024-51606-6
  • Khraisha et al. [2024] Khraisha, Q., Put, S., Kappenberg, J., Warraitch, A., Hadfield, K.: Can large language models replace humans in systematic reviews? evaluating gpt-4’s efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages. Research Synthesis Methods (2024)
  • Le [2023] Le, F.: How chatgpt is transforming the postdoc experience. Nature 622, 655 (2023)
  • Pilny et al. [2024] Pilny, A., McAninch, K., Slone, A., Moore, K.: From manual to machine: assessing the efficacy of large language models in content analysis. Communication Research Reports, 1–10 (2024)
  • Byun et al. [2023] Byun, C., Vasicek, P., Seppi, K.: Dispensing with humans in human-computer interaction research. In: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1–26 (2023)
  • Ferruz et al. [2022] Ferruz, N., Schmidt, S., Höcker, B.: Protgpt2 is a deep unsupervised language model for protein design. Nature communications 13(1), 4348 (2022)
  • Yang et al. [2022] Yang, X., Chen, A., PourNejatian, N., Shin, H.C., Smith, K.E., Parisien, C., Compas, C., Martin, C., Costa, A.B., Flores, M.G., et al.: A large language model for electronic health records. NPJ digital medicine 5(1), 194 (2022)
  • Savage [2023] Savage, N.: Drug discovery companies are customizing chatgpt: here’s how. Nat Biotechnol 41(5), 585–586 (2023)
  • Ma et al. [2024] Ma, Y., Cui, C., Cao, X., Ye, W., Liu, P., Lu, J., Abdelraouf, A., Gupta, R., Han, K., Bera, A., et al.: Lampilot: An open benchmark dataset for autonomous driving with language model programs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15141–15151 (2024)
  • Kuckreja et al. [2024] Kuckreja, K., Danish, M.S., Naseer, M., Das, A., Khan, S., Khan, F.S.: Geochat: Grounded large vision-language model for remote sensing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 27831–27840 (2024)
  • Argyle et al. [2023] Argyle, L.P., Busby, E.C., Fulda, N., Gubler, J.R., Rytting, C., Wingate, D.: Out of one, many: Using language models to simulate human samples. Political Analysis 31(3), 337–351 (2023)
  • Li et al. [2024] Li, L., Fan, L., Atreja, S., Hemphill, L.: “hot” chatgpt: The promise of chatgpt in detecting and discriminating hateful, offensive, and toxic comments on social media. ACM Transactions on the Web 18(2), 1–36 (2024)
  • Wu et al. [2023] Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., Kambadur, P., Rosenberg, D., Mann, G.: Bloomberggpt: A large language model for finance. arXiv preprint arXiv:2303.17564 (2023)
  • Susarla et al. [2023] Susarla, A., Gopal, R., Thatcher, J.B., Sarker, S.: The janus effect of generative ai: Charting the path for responsible conduct of scholarly activities in information systems. Information Systems Research 34(2), 399–408 (2023)
  • Dwivedi et al. [2023] Dwivedi, Y.K., Kshetri, N., Hughes, L., Slade, E.L., Jeyaraj, A., Kar, A.K., Baabdullah, A.M., Koohang, A., Raghavan, V., Ahuja, M., et al.: Opinion paper:“so what if chatgpt wrote it?” multidisciplinary perspectives on opportunities, challenges and implications of generative conversational ai for research, practice and policy. International Journal of Information Management 71, 102642 (2023)
  • Einarsson et al. [2024] Einarsson, H., Lund, S.H., Jónsdóttir, A.H.: Application of chatgpt for automated problem reframing across academic domains. Computers and Education: Artificial Intelligence 6, 100194 (2024)
  • Owens [2023] Owens, B.: How nature readers are using chatgpt. Nature 615(7950), 20 (2023)
  • Agathokleous et al. [2023] Agathokleous, E., Saitanis, C.J., Fang, C., Yu, Z.: Use of chatgpt: What does it mean for biology and environmental science? Science of The Total Environment 888, 164154 (2023)
  • Meyer et al. [2023] Meyer, J.G., Urbanowicz, R.J., Martin, P.C., O’Connor, K., Li, R., Peng, P.-C., Bright, T.J., Tatonetti, N., Won, K.J., Gonzalez-Hernandez, G., et al.: Chatgpt and large language models in academia: opportunities and challenges. BioData Mining 16(1), 20 (2023)
  • Barley et al. [2022] Barley, W.C., Dinh, L., Workman, H., Fang, C.: Exploring the relationship between interdisciplinary ties and linguistic familiarity using multilevel network analysis. Communication Research 49(1), 33–60 (2022) https://doi.org/10.1177/0093650220926001
  • Priem et al. [2022] Priem, J., Piwowar, H., Orr, R.: Openalex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv preprint arXiv:2205.01833 (2022)
  • Hendrycks et al. [2020] Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., Steinhardt, J.: Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300 (2020)
  • Breiger [1974] Breiger, R.L.: The duality of persons and groups. Social Forces 53(2), 181–190 (1974)
  • Newman [2005] Newman, M.E.: Power laws, pareto distributions and zipf’s law. Contemporary physics 46(5), 323–351 (2005)
  • Tomašev et al. [2020] Tomašev, N., Cornebise, J., Hutter, F., Mohamed, S., Picciariello, A., Connelly, B., Belgrave, D.C., Ezer, D., Haert, F.C.v.d., Mugisha, F., et al.: Ai for social good: unlocking the opportunity for positive impact. Nature Communications 11(1), 2468 (2020)
  • Chakraborty et al. [2023] Chakraborty, C., Bhattacharya, M., Lee, S.-S.: Artificial intelligence enabled chatgpt and large language models in drug target discovery, drug discovery, and development. Molecular Therapy-Nucleic Acids 33, 866–868 (2023)
  • Jiang et al. [2023] Jiang, L.Y., Liu, X.C., Nejatian, N.P., Nasir-Moin, M., Wang, D., Abidin, A., Eaton, K., Riina, H.A., Laufer, I., Punjabi, P., et al.: Health system-scale language models are all-purpose prediction engines. Nature 619(7969), 357–362 (2023)
  • Lu et al. [2024] Lu, M.Y., Chen, B., Williamson, D.F., Chen, R.J., Zhao, M., Chow, A.K., Ikemura, K., Kim, A., Pouli, D., Patel, A., et al.: A multimodal generative ai copilot for human pathology. Nature, 1–3 (2024)
  • Peng et al. [2023] Peng, C., Yang, X., Chen, A., Smith, K.E., PourNejatian, N., Costa, A.B., Martin, C., Flores, M.G., Zhang, Y., Magoc, T., et al.: A study of generative large language model for medical research and healthcare. NPJ digital medicine 6(1), 210 (2023)
  • Zhang et al. [2024] Zhang, L., Cao, Z., Shang, Y., Sivertsen, G., Huang, Y.: Missing institutions in openalex: possible reasons, implications, and solutions. Scientometrics, 1–23 (2024)
  • Brodersen et al. [2015] Brodersen, K.H., Gallusser, F., Koehler, J., Remy, N., Scott, S.L.: Inferring causal impact using bayesian structural time-series models. Annals of Applied Statistics 9, 247–274 (2015)
  • Hu and Chen [2021] Hu, S., Chen, P.: Who left riding transit? examining socioeconomic disparities in the impact of covid-19 on ridership. Transportation Research Part D: Transport and Environment 90, 102654 (2021)
  • Waltman and Van Eck [2012] Waltman, L., Van Eck, N.J.: A new methodology for constructing a publication-level classification system of science. Journal of the American Society for Information Science and Technology 63(12), 2378–2392 (2012)