Category Archives: Topic Modeling

Topic Modeling Stream Graphs

The colored streams represent the 40 topics of the topic models created for the memcons (top) and the telcons (bottom). The pie graph to the right of each stream graph shows the relative proportion of topic weight for each month of correspondence. The difference in density between the memcons (which show more activity at the end of Kissinger’s tenure) and the telcons (which show more activity at the beginning) is explained in large part by his promotion to Secretary of State in 1973. Before that time, as National Security Advisor, Kissinger used telephone conversations to address most of the issues confronting him. After his promotion, he shifted to the more official forum of meetings and memoranda for most of his work.

This interactive diagram can be played back, and individual months explored in more detail. For example, the largest spikes in the telcons and memcons correspond to the timing of Kissinger’s promotion to Secretary of State, and to meetings regarding the October 1973 Yom Kippur War and the resulting flurry of diplomatic activity to broker agreements between the combatants in May 1974.

Interactive Topic Model Stream Graphs


Topic Modeling Area Graphs

The ability to go beyond merely counting word frequencies to measuring how the frequencies of different words correlate is a powerful tool for computational historical research. This technique, called ‘topic modeling,’ relies upon complex probabilistic mathematics beyond the training of most historians. Using a variant of MALLET (open-source topic modeling software), I have assembled topic models of the Kissinger collections. The initial run produced a list of 40 topics for both the memcons and telcons collections. Compiling the topic modeling data and graphing each topic’s frequency data over time as an x/y line/area graph yields a contextual, historical timeline for each of the 40 Kissinger memcon and telcon topics. Peaks in the graphs indicate the dates of documents that carry the highest cumulative ‘weighting,’ or relevance, to the respective topic. The graphed data immediately invites questions, since many of the peaks on the topic graphs line up well with related events in the historical record. Examining each topic graph against these historical timelines is in itself a useful exercise for researchers looking for content related to a particular topic.
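As a rough illustration of the compiling step described above, the sketch below sums each topic's weight over all documents dated in the same month and then finds the month in which a topic peaks. The input structure (a date paired with a list of per-topic weights), the function names, and the sample values are all assumptions made for illustration; they are not MALLET's output format or the data behind these graphs.

```python
from datetime import date


def monthly_topic_weights(docs):
    """Sum each topic's weight over all documents dated in the same month.

    docs: iterable of (date, [weight per topic]) pairs (hypothetical structure).
    Returns a dict mapping (year, month) to a list of summed weights per topic.
    """
    totals = {}
    for doc_date, weights in docs:
        key = (doc_date.year, doc_date.month)
        if key not in totals:
            totals[key] = [0.0] * len(weights)
        for topic, weight in enumerate(weights):
            totals[key][topic] += weight
    return totals


def peak_month(totals, topic):
    """Return the (year, month) with the highest cumulative weight for a topic."""
    return max(totals, key=lambda month: totals[month][topic])


# Made-up illustrative data: three documents, two topics.
docs = [
    (date(1970, 5, 1), [0.75, 0.25]),
    (date(1970, 5, 20), [0.5, 0.5]),
    (date(1973, 10, 8), [0.25, 0.75]),
]
totals = monthly_topic_weights(docs)
print(peak_month(totals, 0))
```

Plotting the per-month totals for one topic against a timeline would give a line/area graph of the kind described here; the peak lookup is simply a shortcut to the dates a researcher would read off the graph.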

For example, researchers interested in the documents most closely associated with a particular area of Kissinger’s activity can use these graphs to locate the relevant dates and documentation much more easily than by consulting a traditional index. Such areas include the wars in Indochina and Kissinger’s Paris Peace Conference talks with Le Duc Tho and Xuan Thuy; Chairman Mao and Chou En-lai; the Cambodia Campaign and the resulting public outcry in May 1970; and the ‘Backchannel’ and SALT talks with Dobrynin, Gromyko, and Brezhnev.

Memcons: Interactive Topic Model Area Graphs

Telcons: Interactive Topic Model Area Graphs

Topic Modeling Force-Directed Graphs (Interactive)

Memcons: Interactive Topic Model Force Graph

The placement of the ‘Cambodia’ topic outside the arc of military topics, much closer to ‘Laughter’ than, say, ‘Vietnam’ or ‘Soviet,’ is very interesting. It suggests that the archive may contain only those Cambodia documents of a less contentious or more generic nature than the documents behind those other topics. The ‘Cambodia’ topic’s comparative proximity to the ‘Laughter’ topic, clearly visible in this graph, reflects an uncharacteristically ‘jovial’ slant in the content of its documents compared with those of other topics of similarly grave military importance. It is an odd result, and it supports other findings that the archive’s ‘Cambodia’ material on which this topic is based is likely a hand-picked, sanitized, and non-representative selection of only the more congenial exchanges regarding Cambodia, specifically excluding tense and difficult situations. Memoranda detailing the planning and execution of disavowed military incursions, involvement in the installation of the Lon Nol regime, and other incidents are largely absent from the archive. Computational techniques, combined here with a historian’s subjective assessment of the inapplicability of ‘laughter’ to topics like Cambodia, have thus uncovered a strong relationship between a document’s classification status and its subject matter. Further interpretations of the proximity of the ‘Laughter’ topic (among others) to these geopolitical foci are detailed in greater depth in the written paper.

Telcons: Interactive Topic Model Force Graph


Topic Modeling performed using ‘MALLET Topic Modeling Toolkit.’

Topic Modeling Force-Directed Graphs (Static)

Instead of a more traditional x/y axis graph, each memcon in the archive and its relation to the 40 topics of the topic model are represented here using a ‘force-directed’ diagram. More than the prior figures, this graph is off-putting at first and requires a bit of orientation to understand. Each document is represented by one of a network of small circles, connected by lines to the larger circles (the topics) and placed at a distance from each topic according to its association with that topic. The size of each topic circle and its text label reflects the total weight afforded to that topic by the documents in the archive, and the color of the small document circles and connecting lines reflects the classification status of each document.
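To make the structure of such a diagram concrete, here is a minimal sketch of how document and topic data could be arranged as nodes and links before being handed to a force-directed layout. The field names (`id`, `classification`, `weights`), the `min_weight` cutoff, and the sample documents are hypothetical; the graphs shown here may well have been assembled differently.

```python
def build_force_graph(docs, topic_labels, min_weight=0.1):
    """Build a nodes/links structure for a force-directed layout.

    docs: list of dicts with 'id', 'classification', and 'weights'
          (one weight per topic); this shape is an assumption.
    Links below min_weight are dropped to keep the graph readable.
    """
    topic_totals = [0.0] * len(topic_labels)
    nodes, links = [], []
    for doc in docs:
        # One small node per document, carrying its classification for coloring.
        nodes.append({"id": doc["id"], "kind": "document",
                      "classification": doc["classification"]})
        for topic, weight in enumerate(doc["weights"]):
            topic_totals[topic] += weight
            if weight >= min_weight:
                links.append({"source": doc["id"],
                              "target": topic_labels[topic],
                              "weight": weight})
    # One large node per topic, sized by the total weight across all documents.
    for label, total in zip(topic_labels, topic_totals):
        nodes.append({"id": label, "kind": "topic", "size": total})
    return {"nodes": nodes, "links": links}


# Made-up illustrative data: two documents, two topics.
sample_docs = [
    {"id": "memcon-001", "classification": "Top Secret", "weights": [0.75, 0.25]},
    {"id": "memcon-002", "classification": "Unclassified", "weights": [0.0, 1.0]},
]
graph = build_force_graph(sample_docs, ["Vietnam", "Laughter"])
print(len(graph["nodes"]), len(graph["links"]))
```

In a layout of this kind, a heavier link would typically pull a document closer to its topic, which is what produces the clusters described in this section.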

Memcons: Static Topic Model Force Graph

This graph elegantly demonstrates in one view the interrelated ‘clusters’ of documents by proximity, their classification status, and the complex ways in which all documents relate to their constituent topic(s) and to one another. Even more than the line/area graphs, this image synthesizes the information gathered through metadata analysis, n-gram counting, and topic modeling to present inter-relationships not always readily apparent from a tabular view of the underlying data.

The blue dots and lines represent documents with ‘Top Secret’ classification status, the yellow dots are ‘Secret,’ the pink dots are ‘Unclassified,’ and the 40 topics of the topic model are displayed as grey circles with text. Documents sharing similar topic weightings are clustered together and placed at a relative distance from those topics. The placement of documents and topics related to matters of high military or national security significance in the bluish upper left region is unsurprising, as is the placement of ‘Laughter’ far on the other side of the graph. It is also notable that this upper left area of the graph contains the countries at the heart of Nixon and Kissinger’s vaunted “triangular diplomacy.” The topics concerning the Soviet Union, China, and Vietnam, along with related topics, occupy a close-knit area of the graph, suggesting that when those topics were mentioned they were often mentioned together.

This graph also reveals another fascinating topic in the topic model, one with a unique significance. The “Laughter” topic is based upon those documents in which the transcriber literally inserted the notation “[laughter],” marking jovial, lighthearted moments in Kissinger’s correspondence in which the participants had a chuckle. A historian would expect these sorts of emotional expressions to occur in inverse proportion to the gravity of their respective topics (for example, the least ‘laughter’ during those negotiations in which relations were at their most sensitive, tense, and/or adversarial), and the placement of the “Laughter” topic at the furthest possible point from the topics relating to the Soviet Union, China, and Vietnam negotiations supports this interpretation.
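One way to check the "mentioned together" reading numerically, independent of where a layout happens to place the circles, is to compare two topics' weight vectors across all documents. The sketch below uses cosine similarity for that purpose. This is a stand-in measure chosen for illustration, not the computation behind the force graph, and the topic order and weights are made up.

```python
import math


def topic_cosine(doc_weights, a, b):
    """Cosine similarity between topics a and b across all documents.

    doc_weights: list of per-document weight lists (hypothetical structure).
    Values near 1 mean the two topics tend to be weighted in the same
    documents; values near 0 mean they rarely share a document.
    """
    col_a = [weights[a] for weights in doc_weights]
    col_b = [weights[b] for weights in doc_weights]
    dot = sum(x * y for x, y in zip(col_a, col_b))
    norm = (math.sqrt(sum(x * x for x in col_a))
            * math.sqrt(sum(y * y for y in col_b)))
    return dot / norm if norm else 0.0


# Made-up illustrative data; assumed topic order: Soviet, China, Laughter.
doc_weights = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(topic_cosine(doc_weights, 0, 1))  # Soviet vs. China
print(topic_cosine(doc_weights, 0, 2))  # Soviet vs. Laughter
```

On real data, a high score for the Soviet Union and China topics alongside a low score for either of them against ‘Laughter’ would be consistent with the placement described above.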

Topic Model Line Graphs

These graph thumbnails and close-ups each detail the weighting of one of the 40 topics in the topic model, laid out on a timeline. The red lines mark selected historical events (listed in the sidebar to the right of the graph), displayed on the timeline for comparison with the changing topic data.

Memcons Topic Model Thumbnails

Telcons Topic Model Thumbnails

Detailed Timeline – Memcons ‘Cambodia’ Topic

Detailed Timeline – Memcons ‘Le-Duc-Tho-Agreement’ Topic

Additional Findings
