
ALL THE METADATA'S A STAGE

Exploratory Bibliographic and Text Network Analysis of British Comedy Dramas Across 17th-19th Centuries Using OpenRefine, Python and Gephi

Author: Chahna Ahuja

Course: Introduction to Digital Humanities (2025–26), taught by Dr. Margherita Fantoli at KU Leuven (MSc Digital Humanities)

---

Introduction to the Project

This project explores the British Library’s bibliographic dataset of digitized British dramas from the 17th–19th centuries, using an exploratory distant reading approach.

The workflow of this notebook proceeds in three stages:

  1. Descriptive Statistics and Visualization of the Dataset
    Provides a structured overview of the dataset for both the researcher and the reader, clarifying its size, structure, and suitability for further analysis.

  2. Exploratory Data Analysis on Comedy Subdataset
    Investigates temporal trends in publications, spatial networks of publishing locations, authorship distribution, and lexical patterns in comedy titles.

  3. Text Mining and Network Analysis
    Constructs a syntactic dependency network of frequent lemmas in comedy titles to address the central research question:

    How do 17th–19th century British comedy titles encode gendered roles and archetypes, and what do they reveal about the social expectations of the period?
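The final stage feeds a weighted edge list into Gephi. A minimal sketch of that step, assuming (child lemma, head lemma) dependency pairs have already been extracted from the titles with spaCy; the pairs below are hypothetical examples, and the output follows Gephi's default `Source,Target,Weight` edge CSV layout:

```python
import csv
import io
from collections import Counter

# Hypothetical (child_lemma, head_lemma) pairs, as a spaCy dependency
# parse of comedy titles might yield. Real pairs come from the notebook.
dependency_pairs = [
    ("country", "wife"), ("old", "bachelor"), ("provoked", "wife"),
    ("country", "wife"), ("beaux", "stratagem"),
]

# Aggregate repeated pairs into weighted edges.
edge_weights = Counter(dependency_pairs)

# Write a Gephi-compatible edge list (Source, Target, Weight).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Source", "Target", "Weight"])
for (source, target), weight in sorted(edge_weights.items()):
    writer.writerow([source, target, weight])

print(buffer.getvalue())
```

Importing such a CSV via Gephi's Data Laboratory produces the syntactic dependency network explored in stage 3.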


GitHub Repository Structure


Dataset

  • Original Source: British Library digitized drama metadata, cleaned as part of a group assignment for the Introduction to Digital Humanities class taught by Dr. Margherita Fantoli in the 2025–26 academic year at KU Leuven (MSc Digital Humanities).
  • Data Wrangling Process:
    • OpenRefine used to wrangle data while preserving original information.
    • Facet filtering to isolate subsets and identify inconsistencies.
    • GREL scripting to standardize and transform columns.
    • Wikidata reconciliation to enrich author information with contextual metadata.
  • Cleaned Datasets: Main dataset: dh_group7_drama.csv. Three subdatasets (Comedy, Tragedy, Plays) were cleaned in OpenRefine by Chahna Ahuja, Xinran Liu, and Liangyu Gan, respectively, as the group component of this assignment. (Check this GitHub repository to learn more!)
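Loading the main dataset and isolating the comedy subset can be sketched as follows; in the notebook this would be `pd.read_csv("dh_group7_drama.csv")`, while the inline sample and the column names (`Title`, `Genre`, `Year`) here are hypothetical stand-ins, not the actual headers of the cleaned file:

```python
import io

import pandas as pd

# Stand-in for pd.read_csv("dh_group7_drama.csv"); rows and column
# names are illustrative only.
sample = io.StringIO(
    "Title,Genre,Year\n"
    "The Country Wife,Comedy,1675\n"
    "The Duchess of Malfi,Tragedy,1623\n"
    "The Beaux' Stratagem,Comedy,1707\n"
)
drama = pd.read_csv(sample)

# Isolate the comedy subdataset for exploratory analysis.
comedy = drama[drama["Genre"] == "Comedy"]
print(len(comedy))  # number of comedy records in this sample
```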

Methods

  • Data Analysis Environment: Jupyter Notebook combining code, output, and narrative. Interactive charts via Plotly, word clouds via WordCloud. Use of Wikidata API.
  • Text Mining Using NLP: spaCy (tokenization, lemmatization, dependency parsing), NLTK (preprocessing & exploratory analysis)
  • Network Visualization: Gephi used to construct the comedy title syntactic dependency network to explore the hypothesis.

Tools for Python and Gephi