• Overview
  • Topic
  • Word
  • Word index
  • Settings
  • About

Settings

Use these controls to adjust how much information is displayed on some of the browser pages.

Project description: This site provides a visualization of the topic modelling output for the hearing transcripts of 35 oil pipeline project proposals reviewed by the National Energy Board of Canada between 1993 and 2018. More details about the project and methodology can be found on the About page.

  • Grid
  • Scaled
  • List
  • Stacked

click a circle for more about a topic

scroll to zoom; shift-drag to pan; click for more about a topic

click a column label to sort; click a row for more about a topic

y-axis:

  • % (conditional)
  • word counts (joint)
topic | variation | top words | proportion of corpus | average probability

Project description: This model-browser provides a visualization of the topic modelling output used to analyze the content of the hearing transcripts of 35 oil pipeline project proposals reviewed by the National Energy Board of Canada between 1993 and 2018. In Canada, major pipeline projects were reviewed by the National Energy Board (NEB) until 2019, when regulatory changes replaced the NEB with the Canadian Energy Regulator (CER). The NEB’s mandate was to “review applications to build and operate new energy pipelines and make its decision or recommendation based on the Canadian public interest” (National Energy Board, n.d.-a). As part of the review process, the NEB held public hearings where stakeholders could participate to challenge the plans and evidence presented by the project proponent and provide their own evidence and views. The project proponent, in turn, was permitted to respond to the stakeholders’ interventions. The NEB’s responsibility was to take these arguments into consideration in its final decision.

Data collection: The dataset used in this analysis consists of 411 documents, containing 44,231 pages and 14.9 million words, associated with the hearings held as part of the National Energy Board’s review of 35 oil pipeline project proposals between 1993 and 2018. These documents were publicly accessible from the National Energy Board’s website (https://apps.neb-one.gc.ca/) and were downloaded in May 2018.

Preparing the corpus: To prepare our corpus, we took several steps aimed at three objectives: first, to transform the original documents (i.e., hearing volumes) into one document per actor, so that each document contains the verbatim transcript of an individual actor during the hearing; second, to estimate the size of each actor’s document as a proxy for the length of their participation in the hearing, which was used to exclude actors without significant participation (i.e., around 100 words or less); and third, to prepare the resulting collection of documents (one per actor) for topic extraction and topic-weight quantification through topic modelling. The final corpus comprised 3,074 documents, one per actor, associated with the 35 cases mentioned earlier.
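
As an illustration of the per-actor split and the size filter, the Python sketch below groups speaking turns by speaker label and drops actors with roughly 100 words or less. The speaker-label pattern, directory layout, and function names are assumptions for illustration, not the project’s actual code.

    import re
    from collections import defaultdict
    from pathlib import Path

    MIN_WORDS = 100  # actors at or below this approximate threshold are excluded

    def split_by_actor(transcript_text):
        """Group the verbatim text of one hearing volume by speaker.

        Assumes each speaking turn starts with an upper-case speaker label
        followed by a colon, e.g. "MR. SMITH:" -- a simplification of the
        real transcript layout.
        """
        turns = defaultdict(list)
        current = None
        for line in transcript_text.splitlines():
            m = re.match(r"^([A-Z][A-Z .'-]+):\s*(.*)", line)
            if m:
                current = m.group(1).strip()
                turns[current].append(m.group(2))
            elif current:
                turns[current].append(line)
        return {actor: " ".join(parts) for actor, parts in turns.items()}

    def build_actor_corpus(volume_dir, out_dir):
        """Aggregate all hearing volumes into per-actor documents and
        drop actors with negligible participation."""
        actor_docs = defaultdict(list)
        for path in Path(volume_dir).glob("*.txt"):
            for actor, text in split_by_actor(path.read_text(errors="ignore")).items():
                actor_docs[actor].append(text)

        Path(out_dir).mkdir(parents=True, exist_ok=True)
        for actor, parts in actor_docs.items():
            doc = " ".join(parts)
            if len(doc.split()) <= MIN_WORDS:
                continue  # exclude actors with ~100 words or less
            safe = re.sub(r"\W+", "_", actor).strip("_").lower()
            (Path(out_dir) / f"{safe}.txt").write_text(doc)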

Topic extraction and topic-weight quantification through topic modelling: To extract the topics from the transcripts and quantify them for each actor, we relied on topic modelling (TM). The TM algorithm used in this study was Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003), a generative probabilistic model based on the assumption that documents within a corpus exhibit multiple themes that can be represented as a probabilistic mixture of topics, and that topics are in turn a probabilistic mixture of words. The entire corpus was examined to calculate the distribution of words over topics and the distribution of topics over documents (documents represent actors, in our case). The software used was MALLET, embedded in our TM processing pipeline through both Python and R. This approach allowed us to inductively identify the topics mobilized by each actor within the hearing, as well as their weights (i.e., probability distributions). We calculated models ranging from 5 to 150 topics and then selected the best model based on its coherence measure and manual examination; the optimal model had 60 topics.
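
For reference, a minimal sketch of how this step could be scripted in Python around the MALLET command-line tool is shown below. The directory names, iteration count, and 5-topic step size are illustrative assumptions rather than the project’s actual settings, and coherence scoring plus manual inspection of the topic keys are assumed to happen in a separate step.

    import subprocess
    from pathlib import Path

    MALLET = "mallet"            # MALLET launcher script, assumed to be on PATH
    CORPUS_DIR = "actor_docs"    # one plain-text file per actor (illustrative name)
    OUT_DIR = Path("models")

    def import_corpus(mallet_file="corpus.mallet"):
        """Convert the per-actor text files into MALLET's binary corpus format."""
        subprocess.run([
            MALLET, "import-dir",
            "--input", CORPUS_DIR,
            "--output", mallet_file,
            "--keep-sequence",       # required by train-topics
            "--remove-stopwords",
        ], check=True)
        return mallet_file

    def train(mallet_file, num_topics):
        """Train one LDA model with the given number of topics."""
        out = OUT_DIR / f"k{num_topics}"
        out.mkdir(parents=True, exist_ok=True)
        subprocess.run([
            MALLET, "train-topics",
            "--input", mallet_file,
            "--num-topics", str(num_topics),
            "--num-iterations", "1000",
            "--optimize-interval", "10",                          # hyperparameter optimization
            "--output-doc-topics", str(out / "doc_topics.txt"),   # topic weights per actor
            "--output-topic-keys", str(out / "topic_keys.txt"),   # top words per topic
            "--output-state", str(out / "state.gz"),
        ], check=True)

    if __name__ == "__main__":
        corpus = import_corpus()
        # Candidate models from 5 to 150 topics (step size is illustrative);
        # coherence scores and manual inspection of topic_keys.txt then guide
        # the choice of the final model (60 topics in this project).
        for k in range(5, 151, 5):
            train(corpus, k)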

Labelling topics into general concern categories: We took an abductive approach (Dubois & Gadde, 2002), “going back and forth” between our analysis of the topics and the three dimensions that characterize the notion of the public interest as described by the NEB: “environmental, economic, and social interests” (National Energy Board, n.d.-a).

This topic modelling browser is based on the model-browser interface by Andrew Goldstone; the source is available on GitHub. Made using d3.js and Bootstrap. Zip support provided by JSZip.

Select a topic from the "Topic" menu above.

Top words

Word | Weight

Conditional probability of words in the topic

Click a bar to limit to the documents it represents

Top documents

There are no documents containing this topic.

Document | % | Tokens

Choose a specific document to view from the bibliography or from a topic page.

Below: the last-viewed document. Stable link to this view:

... tokens. (view original document)

Topic | Top words | % | Tokens

Choose a specific word to view from the list of all words or from a topic page.

Below: the last-viewed word. Stable link to this view:

Prominent topics for

Click row labels to go to the corresponding topic page; click a word to show the topic list for that word.

There are no topics in which this word is prominent.

Sort:

jump to:

top

All words prominent in any topic

Words not prominent in any topic are not listed