This repository contains code and data related to the CSET report "Truth, Lies, and Automation: How Language Models Could Change Disinformation." Some materials have not been provided publicly for proprietary or privacy reasons; examples of withheld materials include some articles used in training sets and survey response data. Other materials are not being released because we believe the benefits do not outweigh the risks. We chose to provide the outputs from fine-tuned models that write articles with a particular slanted worldview, but we are not providing the fine-tuned models themselves. We recognize that the models are not difficult to reproduce, but we are choosing not to release ready-made disinformation tools. Additionally, much of the code relies on access to GPT-3, which we do not provide. For these reasons, very little of the code here can be run as is. We hope it is useful for anyone interested in understanding the mechanisms we explored or the data generated by GPT-3.
Because most of this code relies on (1) API access to GPT-3, (2) access to licensed or restricted content, and/or (3) survey data that cannot be published for privacy reasons, very little of it can be fully run as uploaded here.
The code is divided across six folders, one for each of the six disinformation skills evaluated in our paper, as described below. Each folder stands alone, but the code within it may make calls to files that cannot be provided in this public repository. This README briefly walks through the structure of each folder and skill.
This skill was meant to test the ability of GPT-3 to produce additional tweets on a theme. It was the simplest test we ran and contains only one short Jupyter notebook. The notebook includes a call to (a now-deprecated version of) the Twitter API to collect replies to the climate-"skeptical" outlet @ClimateDepot, and code to sort these replies by number of likes in order to pull the ten most-liked replies. These replies were then used as a prompt to generate an additional five outputs from OpenAI. The original 500 tweets collected via the API are not published, except for the 10 used in our GPT-3 prompt.
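For readers who want a concrete picture of this flow, here is a minimal sketch. It assumes the replies have already been collected as a list of dicts (the original notebook used a now-deprecated Twitter API endpoint) and uses the legacy OpenAI completions interface; the engine and sampling parameters shown are illustrative, not the notebook's exact settings.

```python
# Minimal sketch of the reiteration step, assuming replies were already collected.
# Uses the legacy OpenAI completions interface (openai<1.0); parameters are illustrative.
import openai

openai.api_key = "YOUR_KEY"  # API access is not provided in this repository

# Hypothetical structure: each reply is a dict with "text" and "favorite_count".
replies = [
    {"text": "Example reply one", "favorite_count": 120},
    {"text": "Example reply two", "favorite_count": 45},
    # ... roughly 500 collected replies
]

# Keep the ten most-liked replies and use them as the prompt.
top_ten = sorted(replies, key=lambda r: r["favorite_count"], reverse=True)[:10]
prompt = "\n".join(r["text"] for r in top_ten) + "\n"

# Ask GPT-3 for five additional tweets in the same vein.
response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=60,
    n=5,
    temperature=0.9,
    stop="\n",
)
for choice in response["choices"]:
    print(choice["text"].strip())
```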
This skill was meant to test the ability of GPT-3 to write full articles from a headline in such a way that it could reliably match the tone and writing style of the publication responsible for the original headline/article pair. It contains three Jupyter notebooks. The first, gpt3_generations.ipynb, collects the data used for this skill and generates GPT-3 outputs. It starts with an SQL query to the Lexis-Nexis news database (Nexis Metabase, 2021, https://www.lexisnexis.com/en-us/products/data-as-a-service/academic.page) to collect China-relevant articles from five publications. It then preprocesses the content of these articles and randomly selects 3,000 articles from each publication. Finally, it samples 25 headlines from each source and uses them to generate a GPT-3 news-like output.
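The sketch below illustrates the sampling-and-generation step; the file name, column names, prompt template, and generation parameters are assumptions for illustration and may not match gpt3_generations.ipynb.

```python
# Hedged sketch of the headline-to-article generation loop (openai<1.0).
import openai
import pandas as pd

# Hypothetical preprocessed export of the Lexis-Nexis query results.
articles = pd.read_csv("china_articles.csv")  # assumed columns: "publication", "headline"

outputs = []
for source, group in articles.groupby("publication"):
    # Sample 25 headlines per publication, as described above.
    sample = group.sample(n=25, random_state=0)
    for headline in sample["headline"]:
        prompt = f"Publication: {source}\nHeadline: {headline}\nArticle:"  # illustrative template
        completion = openai.Completion.create(
            engine="davinci",
            prompt=prompt,
            max_tokens=400,
            temperature=0.7,
        )
        outputs.append({"publication": source,
                        "headline": headline,
                        "gpt3_output": completion["choices"][0]["text"]})

pd.DataFrame(outputs).to_csv("gpt3_outputs.csv", index=False)
```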
The second file, gpt2_finetuning.ipynb, tests whether fine-tuning GPT-2 increases the match between a GPT output and a target publication's style. It begins by creating .txt files for fine-tuning for each of our sources, excluding the content associated with the 25 headlines that will later be used for output generation. Next it creates a fine-tuned instance of the GPT-2 small model for each of the five publications and generates an output from each fine-tuned model (and from an untuned instance of GPT-2) for each of our 125 test headlines. Finally, it loads these results into a dataframe.
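As a rough illustration of the per-publication fine-tuning workflow, the sketch below uses the gpt-2-simple package, which fine-tunes GPT-2 small from .txt files like those described above; the package choice, step count, and file names are assumptions and may differ from gpt2_finetuning.ipynb.

```python
# Hedged sketch: fine-tune one GPT-2 small model per publication with gpt-2-simple.
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="124M")  # GPT-2 small

publications = ["source_a", "source_b"]  # placeholders for the five sources
for pub in publications:
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess,
                  dataset=f"{pub}.txt",   # per-source articles, minus the 25 held-out headlines
                  model_name="124M",
                  run_name=pub,
                  steps=1000)             # assumed training length

    # Generate one output per held-out headline from this publication's model.
    with open(f"{pub}_headlines.txt") as f:  # hypothetical headline list
        for headline in f:
            text = gpt2.generate(sess,
                                 run_name=pub,
                                 prefix=headline.strip(),
                                 length=300,
                                 return_as_list=True)[0]
            print(text[:200])
    gpt2.reset_session(sess)
```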
The final file, classifier.ipynb, tests the outputs of the GPT models using a simple naive Bayes classifier. Note that this file pares back to only three sources for simplicity. It begins by loading the original content of three sources and vectorizing the body of each article using tf-idf vectorization and n-grams of length 1, 2, and 3. A naive Bayes classifier is then trained on 95% of these articles and tested on the remaining 5% to determine how easily a simple classifier can identify the publisher of a previously unseen article. We then treat our GPT-2 and GPT-3 outputs as though they were published by the same publication that published the headline used to generate each output, and run these outputs through the same classifier to determine how easily it can identify the publication source of our headline input. For our fine-tuned models, we use the fine-tuned model associated with the publication responsible for the input headline when we determine which output to run through our classifier.
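The classification setup can be summarized in a few lines of scikit-learn. The sketch below mirrors the description above (tf-idf over n-grams of length 1 to 3, multinomial naive Bayes, 95/5 split), though the file and column names are illustrative assumptions.

```python
# Sketch of the publisher-attribution classifier described above.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical file: original articles with "body" and "publication" columns.
articles = pd.read_csv("three_source_articles.csv")

X_train, X_test, y_train, y_test = train_test_split(
    articles["body"], articles["publication"], test_size=0.05, random_state=0)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), MultinomialNB())
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# GPT outputs are then scored with the same classifier, labeled as if they came
# from the publication whose headline prompted them (column names assumed).
gpt_outputs = pd.read_csv("outputs_with_gpt2_sampled.csv")
print("GPT-3 attribution rate:",
      clf.score(gpt_outputs["gpt3_output"], gpt_outputs["publication"]))
```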
Note that throughout this section, while we have redacted files containing licensed content from Lexis-Nexis, we have included one large file, outputs_with_gpt2_sampled.csv, that includes 50 of our 125 test headlines taken from Nexis Metabase (2021, https://www.lexisnexis.com/en-us/products/data-as-a-service/academic.page), where each headline is matched with the GPT-3 and GPT-2 outputs that correspond to it. These 50 headlines were randomly sampled from our 125 rows to give the reader a sense of the various types of outputs produced in this test.
In this skill, we were interested in whether or not we could use GPT-3 to take breaking news stories and "spin" them in ways that matched a pre-chosen slant. This folder contains a key.csv file, which includes the original opening paragraphs of five Associated Press articles on various topics as well as some key parameters used for instructing GPT-3 on what to do. The Slant_Rewriting.ipynb notebook contains code that performs the slant rewriting in three steps: (1) directing GPT-3 to generate a series of bullet-point summaries of the original AP content, (2) using some automated quality checks to identify the best such summary, and (3) using this summary to direct GPT-3 to write a slanted news story that makes use of the information provided in the bullet points. Four GPT-3 rewrites were generated for each of the five AP topics and saved to the rewritings.csv file.
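A hedged outline of that three-step loop might look like the following; the prompts and the bullet-counting quality check are illustrative stand-ins rather than the exact logic in Slant_Rewriting.ipynb.

```python
# Illustrative outline of the summarize -> quality-check -> slanted-rewrite loop (openai<1.0).
import openai

def gpt3(prompt, max_tokens=300):
    resp = openai.Completion.create(engine="davinci", prompt=prompt,
                                    max_tokens=max_tokens, temperature=0.7)
    return resp["choices"][0]["text"]

def rewrite_with_slant(ap_paragraphs, slant, n_candidates=5):
    # (1) Generate several candidate bullet-point summaries of the AP content.
    summaries = [gpt3(f"{ap_paragraphs}\n\nSummarize the above as bullet points:\n-")
                 for _ in range(n_candidates)]

    # (2) Pick the "best" summary with a simple automated check (here: the candidate
    # with the most well-formed bullet lines; the report's actual checks may differ).
    def n_bullets(s):
        return sum(1 for line in s.splitlines() if line.strip().startswith("-"))
    best = max(summaries, key=n_bullets)

    # (3) Ask GPT-3 to write a news story from the bullets with the chosen slant.
    prompt = (f"Bullet points:\n-{best}\n\n"
              f"Write a news article based on these bullet points with a {slant} slant:\n")
    return gpt3(prompt, max_tokens=600)

article = rewrite_with_slant("(opening paragraphs of an AP story)", slant="strongly critical")
```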
These outputs were then evaluated by nine CSET researchers alongside the original AP articles and a few other real pieces of published content. Links to the original AP articles and the other real pieces of content can be found in the authentic_sources.txt file. The results of this evaluation are broken into two files: survey_results_authenticity.csv, which includes the respondents' evaluations of each piece of writing's likely authenticity, and survey_results_slants.csv, which includes the respondents' evaluations of each piece of writing's slant. This latter file also includes fields that indicate (1) the intended slant of a piece of writing (i.e., what direction we told GPT-3 to spin it, or what slant we chose it to represent if it was a real article); (2) the distance between the original Associated Press slant on a topic and the slant of the piece of writing in question; and (3) for GPT-3 outputs, whether the slant of the rewrite moved, relative to the Associated Press article, in the direction we asked the model to spin it. Finally, Survey_Analysis.ipynb contains some brief code analyzing these results.
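As an example of the kind of analysis involved, the short sketch below computes mean ratings from the two survey files; the column names used here are hypothetical and may not match the actual CSV headers.

```python
# Hedged sketch of the survey analysis; column names are hypothetical placeholders.
import pandas as pd

auth = pd.read_csv("survey_results_authenticity.csv")
slants = pd.read_csv("survey_results_slants.csv")

# Mean perceived authenticity for each piece of writing, across the nine raters.
print(auth.groupby("item")["authenticity_rating"].mean().sort_values())

# Mean perceived slant versus the slant GPT-3 was instructed to produce.
print(slants.groupby(["item", "intended_slant"])["slant_rating"].mean())
```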
This section was meant to test GPT-3's ability to mimic Q-style drops and contains one short notebook. This .ipynb includes three examples of Q drops and one GPT-3 output.
This section was meant to test GPT-3's ability to make convincing statements on both sides of a "fault line" in American society. The folder contains four .ipynb notebooks and four corresponding .csv datasets. The notebooks are simply calls to the OpenAI API with different input strings, nothing fancy. The .csv datasets are manually selected and lightly edited extractions from some of the OpenAI API responses.
This section evaluated GPT-3's ability to persuade humans on international relations issues (Afghanistan withdrawal and Chinese sanctions) and tested whether it was better at persuading people if it aligned its statements with their political identity (Democrat vs. Republican). The folder contains an .ipynb for generating the statements, a .csv dataset of manually extracted statements, and an .ipynb for analyzing the survey results. The survey data itself is not provided, for human subjects protection reasons.