About the Sources and Process

The DNSA’s Kissinger Collection comprises 15,502 telephone conversation transcripts (telcons) and 2,163 meeting memoranda transcripts (memcons).

[Image: example memcon and telcon]

Following declassification, the DNSA gathered, analyzed, and curated these documents and hosted them on its website, along with a page of metadata for each document. This metadata was scraped and converted into a table with one row per document and one column for every available metadata property.
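As a rough illustration of the table-building step, here is a minimal sketch in Python. It assumes the scraped metadata has already been parsed into one dictionary per document; the field names in the usage note (`title`, `date`, `participants`) are hypothetical and are not the DNSA's actual metadata properties.

```python
import csv
import io


def records_to_table(records):
    """Turn a list of per-document metadata dicts into (columns, rows).

    Every property seen in any record becomes a column; a document that
    lacks a property gets an empty string in that cell.
    """
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)  # keep first-seen order
    rows = [[rec.get(col, "") for col in columns] for rec in records]
    return columns, rows


def table_to_csv(columns, rows):
    """Serialize the table as CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `records_to_table([{"title": "A", "date": "1973-01-01"}, {"title": "B", "participants": "X"}])` would give three columns, with blank cells where a document has no value for a property.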


With the metadata cleaned and organized, the documents were put through Optical Character Recognition (OCR). A limited spell check of the output suggested an error rate of roughly 6%. These OCR results are interesting for a number of reasons. The spikes in the error rate correspond to documents with no correctly spelled words at all: the originals had been replaced with handwritten withdrawal slips, which makes the error rate an unintended finding aid. It’s also important to note that if OCR misread a word as another, correctly spelled word (e.g. ‘see’ / ‘sea’), that would not count as an error in this calculation.
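The kind of calculation described here can be sketched as follows. This is an assumption about the general approach, not the actual spell checker used: `dictionary` stands in for whatever word list the check relied on, and the tokenizing rule is a simplification.

```python
import re


def ocr_error_rate(text, dictionary):
    """Share of words in `text` that are not found in `dictionary`.

    Returns None when the text contains no words. A misread that
    happens to be a real word ('sea' for 'see') is not counted,
    which mirrors the limitation noted above.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return None
    misses = sum(1 for w in words if w not in dictionary)
    return misses / len(words)
```

Under this sketch, a document whose rate is 1.0 contains no recognized words at all, which is the pattern the withdrawal slips produce.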


The resulting text files (spell-checked but not corrected) were then processed with a number of tools: AntConc for word frequency and collocation, MALLET for topic modeling, and LIWC2007 for sentiment analysis.
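For readers unfamiliar with word frequency and collocation counts, the sketch below shows the basic idea in Python. It is not AntConc's implementation, and it omits the statistical scoring that dedicated tools typically offer; the window size and the tokenizing rule are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def collocates(tokens, node, window=2):
    """Count the words appearing within `window` tokens of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts
```

For example, `collocates(tokenize(text), "president", window=1)` would count only the words immediately before and after each occurrence of "president".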
