Carcanet Project Archivist Paul Carlyle writes:
In March I wrote about Palladium (Providing Access to Large Literary Archives in a Digital Medium), a project currently underway at the John Rylands Research Institute and Library. Many of you will remember that we’re looking at the email archive of the Carcanet Press, the renowned Manchester-based literary publisher, and exploring how we accession emails, manage the archive’s long-term preservation, provide access and support research. One of our most important tasks is to develop methods that we can use to appraise the emails we currently have and those that will arrive in the future. We’re also looking at how, as part of the appraisal process, we review emails for confidential or sensitive information, and how we can develop and enhance ePADD, email preservation software created by Stanford, to assist us with both appraisal and sensitivity reviews.
What is appraisal?
Put simply, appraisal is a process used by archivists to decide what to keep and what to discard. It helps them understand the nature of the records created or inherited by an individual or an organisation. Archivists identify records that are likely to have enduring historical value: records that will be of interest or use to researchers now and in the future, and that ought to be preserved permanently.
Why do we need to decide? In the past the intimidating volume of digital records encouraged archivists to consider keeping everything. Nowadays, most archivists realise that digital collections cannot go untended. The estimated number of emails sent globally increases every year (it was reported last year that more than 293 billion emails were sent/received in 2019; it’s estimated that this will increase to 347 billion by the end of 2023), and legal obligations compel libraries and archives to take a more active approach. Carcanet’s emails contain great riches for those interested in modern literature, but they also contain large amounts of ephemera familiar to anyone with an email account: password resets, junk mail, out-of-office replies. Retaining everything can weaken the coherence of a collection, making it difficult to find the significant records. Only once a collection is appraised can attention be given to access.
How do we appraise Carcanet’s emails?
The Rylands has been accessioning Carcanet’s emails for a decade. The archive contains approximately 800,000 emails and attachments combined, and presents significant appraisal challenges:
- Carcanet employs several staff and has different departments with wide-ranging responsibilities. It generates thousands of emails, and will continue to do so while email remains the dominant form of professional communication. The number of emails makes it impractical for anyone to go through them one by one.
- No two people manage their email accounts the same way. Some email accounts have elaborate folder structures, while others are largely concentrated in a single folder. It’s like working with a number of discrete collections.
- There is often little or no separation between the personal and professional. Vast quantities of personal data and special category data reside in the emails. Finding email addresses and other contact information is easier than finding information about a person’s health or their political beliefs. An author might employ euphemism, or use nicknames, shorthand or some sort of code in their emails, making such information more difficult to locate with the available search tools.
- Sensitivity is not confined to data protection law. Some information may be of no legal concern, but making it accessible could nevertheless have significant consequences for individuals: for their reputations, the reception of their work, their careers, or their personal and professional relationships.
We’re breaking the archive into smaller pieces, identifying parts of email accounts that present few, if any, difficulties, and those that are likely to be more problematic. We’ve developed selection criteria, which we’ll continue to refine. Knowledge of an organisation and its work is essential if appraisal is to be effective. Understanding the nature of Carcanet’s work across several decades means we can identify: emails that are relevant to its activities as a publisher, and its relationship with its authors, arts and cultural organisations, other publishers, and academic institutions; emails that document Carcanet’s participation in literary debates, and those that document changes to Carcanet as an organisation.
As part of this project we’re looking at how innovative computational methods will support and improve our work.
ePADD is the tool we’re using to manage Carcanet’s emails. Stanford has used it with some of its own collections, most notably the archive of the poet Robert Creeley. The British Library has used it to manage the poet Wendy Cope’s emails, while the Harry Ransom Center has used it with the emails of novelist Ian McEwan. These email archives have come from individuals, so Carcanet’s emails present a good test of how ePADD handles emails generated by an organisation.
ePADD allows us to do many things that will assist the appraisal and sensitivity review:
- standard keyword and advanced searches
- bespoke lexicon searches that can be tailored to different subjects and individual email accounts
- ‘regular expression’ searches, which help identify formulaic information such as bank account numbers
- ‘entity searches’: ePADD extracts ‘entities’ – these include people and places, among other things – and converts them into structured data which can be searched
- labelling and bulk-restricting sensitive emails
- attachment searches
It ‘resolves’ names and email addresses associated with a single correspondent, rationalising what would otherwise be an unwieldy list, and ‘deduplicates’ emails, which may help weed out some of the digital ephemera.
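To give a rough sense of what two of these features involve, here is a minimal Python sketch of a ‘regular expression’ search and of deduplication. It is an illustration only and not ePADD’s own code: the patterns (a UK-style sort code and an eight-digit account number), the function names and the dictionary fields `sender`, `subject` and `body` are all assumptions made for the example.

```python
import re

# Hypothetical patterns for illustration; these are not ePADD's built-in expressions.
SORT_CODE = re.compile(r"\b\d{2}-\d{2}-\d{2}\b")   # e.g. 12-34-56
ACCOUNT_NUMBER = re.compile(r"\b\d{8}\b")          # e.g. 12345678


def find_bank_details(text):
    """Return strings in the text that look like sort codes or account numbers."""
    return {
        "sort_codes": SORT_CODE.findall(text),
        "account_numbers": ACCOUNT_NUMBER.findall(text),
    }


def deduplicate(emails):
    """Keep the first email seen for each (sender, subject, body) combination.

    Each email is assumed to be a dict with 'sender', 'subject' and 'body' keys.
    """
    seen = set()
    unique = []
    for email in emails:
        key = (
            email["sender"].lower(),
            email["subject"].strip(),
            email["body"].strip(),
        )
        if key not in seen:
            seen.add(key)
            unique.append(email)
    return unique
```

A pattern like this will inevitably flag some harmless strings (any eight-digit number, for instance), so matches of this kind are best treated as prompts for human review rather than as decisions in themselves.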
We plan to test ePADD’s capabilities, look at how we could enhance or add to its existing features, and see what we can learn through our approach to appraisal and sensitivity review. This work will contribute to the ePADD+ Project (a collaborative project between the University of Manchester, Stanford and Harvard), which my colleague Jessica Smith wrote about recently.
Please keep an eye on the blog for further updates.