One of the main challenges of the PhD project that Harkive is part of is devising a means of revealing the insights held within the stories people have told the project since 2013. The largely text-based data collected represents a considerable challenge in that regard, leading to a methodological focus on collaborative and experimental analytical methods. Such an approach is by no means unique to this project. Academic researchers in a number of disciplines have been embracing new methods and experimental approaches for several years, leading to the genesis of entirely new fields: Social Computing, Digital Humanities and Cultural Analytics. At the same time, barriers to entry and access in terms of data collection, storage and analysis are falling, enabling people to engage critically and artistically with data in interesting ways. Think of terms such as Citizen Data Science, or movements such as The Quantified Self.
Harkive, and the doctoral research project that underpins it, reside somewhere within the broad and emerging area described above. What makes this exciting for the project is that, just as the landscape of modern popular music is a fascinating and dynamic space, so – increasingly – is the field of human-data interaction.
To put all of this another way: just as the stories Harkive collects are ‘crowd-sourced’, one avenue this project is keen to explore is whether some of the analysis might come from a similar method. What questions would other people like to ask of this data? What could be built with it? What would it sound like as a piece of music? These are the questions that come from having an inquiring mind and an interesting data set! There are more possibilities and questions than there is time, however, and it is with this in mind that we have created the Harkive API, full details of which are provided below.
For those of you reading who may be unaware of the function of an API (Application Programming Interface): in simple terms, it allows access to data in a structured, reliable way, so that applications, visualisations and other online tools (and even pieces of music) can be created by making use of the data. The crucial point is that although the data held within an API may change over time, the structure the data is held within remains constant. This means that anything built upon an API can change dynamically in line with changes to the data, without having to change its own structure. APIs are thus powerful tools for developers and, increasingly, academic researchers.
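A minimal sketch may make that point concrete. The payload and field names below are purely illustrative (they are not the actual Harkive API schema); the idea is that code written against a fixed structure keeps working as new stories arrive:

```python
import json

# Hypothetical example payload; the real Harkive API schema may differ.
payload = """
{
  "stories": [
    {"id": 1, "source": "twitter", "text": "Listening on the train #harkive"},
    {"id": 2, "source": "email",   "text": "Played vinyl all morning"}
  ]
}
"""

data = json.loads(payload)

# Code written against the structure ("stories" -> a list of objects with
# "source" and "text" fields) is unaffected when more stories are added.
sources = [story["source"] for story in data["stories"]]
print(sources)  # ['twitter', 'email']
```

A visualisation built this way only needs to re-fetch the data; the parsing logic never changes.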
Data Visualisations created with Harkive API
A better way to understand the above is to look at the small number of visualisations that Nick Moreton has built using the Harkive API. These are hosted on a dashboard at www.harkive.com and relate to the Harkive 2016 data. The data is, and will remain, dynamic – as people tell their stories, they generate more data – but the structure of the API stays the same. Because the visualisations on www.harkive.com are built with the API, they will change as more stories are gathered.
Here are some examples:
Story Sources: will display the ratio of total stories according to the various submission methods. For a full list of the available story-telling methods, please visit the How To Contribute page. From the screenshot below, it is easy to see the dominance of Twitter in terms of conversations about Harkive, but these ratios may change on 19th July as stories begin to be posted elsewhere.
Harkive Around The World: will display details of Tweets sent with the #harkive hashtag, where Twitter users have enabled location settings.
WordCloud: Following the automatic removal of stopwords and other phrases (including the word Harkive, which features prominently in collected posts), this visualisation produces a word cloud based on the content of Harkive stories. As the screenshot below shows, ‘tell’ is a prominent word at this point in time, because of the promotional posts (and shares of those posts) encouraging people to ‘tell their story’ to Harkive.
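The cleaning step behind the word cloud can be sketched as follows. This is an illustrative example only – the sample posts and the stopword list are invented, and the real pipeline will use a fuller stopword set – but it shows why ‘tell’ currently dominates:

```python
import re
from collections import Counter

# Hypothetical sample of collected posts; real stories come from the API.
posts = [
    "Tell your story to Harkive today",
    "I tell everyone about the records I play",
    "Harkive: tell us how you listened to music",
]

# A minimal stopword list; in practice a much fuller list (plus the word
# 'harkive' itself) is removed before the word cloud is built.
stopwords = {"the", "to", "i", "you", "your", "us", "how", "about", "harkive"}

words = []
for post in posts:
    words.extend(re.findall(r"[a-z']+", post.lower()))

counts = Counter(w for w in words if w not in stopwords)
print(counts.most_common(3))  # [('tell', 3), ('story', 1), ('today', 1)]
```

Promotional wording that appears in many posts, such as ‘tell’, rises straight to the top of the counts, which is exactly what the current word cloud shows.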
The basic examples above demonstrate some of the many ways that different levels of insight can be derived from data. They represent, however, only the tip of the iceberg of what is possible.
Shortly after the 2016 story-gathering element of the project ends next week, we will begin the process of sorting, cleaning and analysing the data. For the purposes of the immediate concern of the PhD project Harkive forms the basis of, this analysis will proceed according to three broad themes: Formats and Technology; Data, Privacy, Identity and Ownership; Recommendation and Discovery.
If you would like to get involved with this process please do contact firstname.lastname@example.org. There are already a small number of academic researchers, analysts and data scientists working on ideas for the data, so please do consider collaborating with us.
If, on the other hand, you would simply like to play with the API and the data it contains in order to create something cool – perhaps even a piece of music? – then please do so. Just remember to let us know what you come up with so that we can share it with the wider Harkive audience.
Further information on the Harkive API
Documentation is available at http://developer.harkive.com.
The Harkive API allows developers access to limited elements of the data collected by The Harkive Project. In particular, and based on the Research Ethics underpinning the project, the API does not provide access to personal information gathered by the project.
The API currently contains only stories collected by the 2016 instance of Harkive. Stories from 2013-2015 will be retrospectively added shortly after Harkive 2016.
The automated collection methods that place new data within the API structure at present capture everything related to Harkive, so they will necessarily include tweets (and other types of posts) that merely mention the project. Although tweets sent from the official @Harkive twitter account have been excluded from certain counts in the visualisations, anything posted by others online ahead of Tuesday 19th July will be displayed. This data is included at this stage primarily to demonstrate the API and visualisations. Shortly after 19th July, data contained within the API will be sorted and cleaned, leaving only stories.
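As a hedged sketch of that sorting and cleaning step (the field names and account handling here are assumptions for illustration, not the project's actual pipeline), excluding posts from the official account might look like this:

```python
# Hypothetical collected posts; field names ("user", "text") are illustrative.
posts = [
    {"user": "Harkive",    "text": "Tell us your story! #harkive"},
    {"user": "listener01", "text": "Cycling with a podcast #harkive"},
    {"user": "listener02", "text": "Vinyl before breakfast #harkive"},
]

OFFICIAL_ACCOUNT = "Harkive"

# Keep only posts from contributors, i.e. drop the project's own
# promotional tweets, leaving just the stories.
stories = [p for p in posts if p["user"] != OFFICIAL_ACCOUNT]
print(len(stories))  # 2
```

The real cleaning process will involve more than this single filter, but the principle is the same: promotional and incidental posts are stripped out so that only genuine listening stories remain in the API.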