Exploring Music Purchases with R

One of my favourite things to do is buy music. I rarely pass up the chance to mooch through the racks of record shops, or charity shops, or indeed any shop where I might pick up a bargain. I’ve been doing this for over 25 years and I recognise that it’s probably something of a compulsion by now – mostly it’s about a love of music and a desire to add to my collection, but a good deal of it is just plain old habit. A regular conversation between me and my wife goes something like this:

Me: “I just bought this for £2. It’s worth about £25”

My Wife: “Ok, but you’ll never sell it, so it’s worth -£2”

She has a point, of course. Although I’ll periodically have a purge and clear out some records, the majority remain on the shelves in the front room of our house and are slowly colonising our living space. I’ll probably never stop.

I’ve always bought a lot of vinyl records, but have recently found myself buying CDs for the first time (they are ludicrously cheap in charity shops these days). A year or so back I also bought a gramophone (£10 on eBay!), so I have been picking up a few 78 rpm discs too.

On Christmas Eve last year, during yet another ‘quick look’ around a charity shop, I bought more records. As I’d been experimenting with data analysis in R as part of my PhD research and was looking for projects to work on, I decided to keep a record of everything I bought from that point on, with a view to running some analysis on my habit. Mainly I wondered how much I spent doing this: £1 here and £5 there on a regular basis probably added up to a frightening amount, I figured. I was also curious as to how often I bought records, where, in what volume, and whether I was unearthing buried treasure or just buying rubbish.

To help find out the answers to these questions I started a spreadsheet where I recorded the things I bought, along with the price I paid, the format the music was on, whether the item was new or 2nd hand, and where and when I bought it. I also started adding the purchases to Discogs (something I plan to do with my entire collection at some point) and as I did that I made a note of the Median Sales Price each of the items fetched in the Discogs marketplace. This enabled me to arrive at a rough calculation of how much I was ‘up’ in a purely theoretical sense (since, as we’ve established, I’ll never sell the majority of the things I buy). Clearly this is not a precise measurement as there are lots of other factors to consider in terms of the actual value of items – condition of the record/sleeve, fluctuating prices in the marketplace, and so on – but as a general rule of thumb it seemed like a useful barometer.
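As a rough illustration of how a spreadsheet like this might look once loaded into R, here is a minimal sketch. The column names, dates and shop labels are assumptions made up for the example; only the prices paid and approximate Discogs values come from items mentioned later in this post.

```r
# Illustrative layout only: column names, dates and shops are hypothetical.
# Prices and Discogs values are the approximate figures quoted in this post.
purchases <- data.frame(
  date           = as.Date(c("2016-12-24", "2017-01-14", "2017-01-21")),
  title          = c("Woodface", "I Saw Mommy Kissing Santa Claus",
                     "Beach Boys boxset"),
  format         = c("LP", "78", "CD boxset"),
  condition      = c("2nd hand", "2nd hand", "2nd hand"),
  shop           = c("charity shop", "junk shop", "charity shop"),
  paid           = c(3.00, 0.35, 4.99),
  discogs_median = c(25, 15, 20),
  stringsAsFactors = FALSE
)

# Theoretical amount 'up' per item: Discogs median minus price paid.
purchases$difference <- purchases$discogs_median - purchases$paid

str(purchases)
```

Keeping one row per item means every later question (how much, how often, where, what format) becomes a simple filter or grouping on these columns.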

Here are some basic insights from the data I’ve collected so far. The following includes the handful of records I purchased in the last week of 2016, which have been lumped in with the 2017 purchases.

– Since 24th December I’ve bought 63 items for a total of £199.80.

– This breaks down as follows: 29 vinyl LPs; 19 CDs; 10 78s; 1 CD boxset; and 4 12” singles.

– 57 of the items were 2nd hand, 6 were new (5 LPs and 1 12” single).

– The average amount spent per item was £3.17. Excluding new purchases this fell to £1.99 per item.
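Summaries like the ones above take only a few lines of base R. The sketch below uses a handful of made-up rows (not my real data) and assumed column names, just to show the shape of the calculation.

```r
# Hypothetical rows for illustration; not the real purchase data.
purchases <- data.frame(
  format    = c("LP", "LP", "CD", "78", "12in"),
  condition = c("new", "2nd hand", "2nd hand", "2nd hand", "2nd hand"),
  paid      = c(15.00, 3.00, 0.25, 0.35, 1.00)
)

# Number of items per format and per condition.
items_by_format    <- table(purchases$format)
items_by_condition <- table(purchases$condition)

# Total spend and average spend per item.
total_spent <- sum(purchases$paid)
mean_paid   <- mean(purchases$paid)

# Average spend per item once new purchases are excluded.
mean_paid_used <- mean(purchases$paid[purchases$condition != "new"])

items_by_format
round(c(total = total_spent, mean = mean_paid, mean_used = mean_paid_used), 2)
```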

In terms of when and how many, it appears that I’ve bought 4 or more items in all but two weeks so far this year. The spike around the middle of January coincides with my birthday, which was when I spent some Amazon vouchers I’d received as gifts. Over my birthday weekend I also found a pile of 78 rpm records in a junk shop (more on those in a moment), which contributed to that splurge.
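Counting purchases per week is one way to produce a picture like this. The sketch below is a minimal example with invented dates, and it assumes weeks run Monday to Sunday, which is the default for `cut()` on dates.

```r
# Hypothetical purchase dates, one entry per item bought.
dates <- as.Date(c("2016-12-24", "2017-01-03", "2017-01-14",
                   "2017-01-14", "2017-01-15", "2017-01-28"))

# Label each purchase with the Monday that starts its week.
# Weeks with no purchases are kept as empty levels, so gaps show up as zeros.
week_start <- cut(dates, breaks = "week")

# Items bought in each week.
per_week <- table(week_start)
per_week

# A simple bar chart of the weekly counts.
barplot(per_week, las = 2, ylab = "Items bought")
```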


In terms of my making a ‘profit’ from digging around in record shops, the data from Discogs is reasonably pleasing (caveats above notwithstanding).

– The £199.80 I’ve spent equates to a theoretical Discogs value of £466.62, which means I am ‘up’ by £266.82.

– The average Discogs sale price was £7.40 (compared to £3.17 purchase price), which is an average ‘profit’ of £4.23 per item.
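As a quick sanity check, the headline figures can be recomputed from just the three totals quoted in this post. This is a sketch of the arithmetic rather than the calculation on the underlying spreadsheet, and the variable names are my own.

```r
# Totals quoted in the post.
items         <- 63
spent         <- 199.80
discogs_total <- 466.62

# Theoretical amount 'up' overall.
up <- discogs_total - spent

# Per-item averages derived from the totals.
avg_paid   <- spent / items
avg_value  <- discogs_total / items
avg_profit <- up / items

# Rounded to pence; the per-item values land within a penny of the
# figures quoted above, depending on where the rounding is applied.
round(c(up = up, avg_paid = avg_paid,
        avg_value = avg_value, avg_profit = avg_profit), 2)
```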

The following chart shows price paid against Discogs ‘value’, with the black average line indicating that in all but a couple of cases I’ve managed to pick things up for less than they are ‘worth’. You’ll notice that the majority of new items (shown as triangles) have roughly the same purchase and sale price, but it’s reasonable to assume this will change in my favour over time. The further an item sits from the black line, the bigger the difference between its purchase price and its potential value. The two dots in the bottom right-hand corner are a Beach Boys CD boxset I picked up for £4.99 that sells for around £20, and a vinyl copy of Crowded House’s ‘Woodface’ that fetches around £25 and that I got for £3.
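A chart along these lines can be sketched in base R as follows. This is not the code behind the chart described above: the rows are hypothetical apart from the last two (the boxset and ‘Woodface’), the column names are assumptions, and I have drawn a simple break-even line (paid equals value) as a stand-in for the reference line.

```r
# Hypothetical rows; only the last two use figures quoted in the post.
purchases <- data.frame(
  paid      = c(15.00, 1.00, 0.50, 4.99, 3.00),
  value     = c(15.50, 4.00, 0.50, 20.00, 25.00),
  condition = c("new", "2nd hand", "2nd hand", "2nd hand", "2nd hand")
)

# Triangles (pch 17) for new items, filled dots (pch 16) for 2nd hand ones.
shape <- ifelse(purchases$condition == "new", 17, 16)

# Discogs value on the x axis, price paid on the y axis, so that
# bargains end up towards the bottom right.
plot(purchases$value, purchases$paid, pch = shape,
     xlab = "Discogs median sale price (GBP)",
     ylab = "Price paid (GBP)")

# Break-even line: points below it cost less than their Discogs value.
abline(a = 0, b = 1)

# How many items were bought for less than their Discogs value.
below_line <- purchases$paid < purchases$value
sum(below_line)
```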



From Discogs I also recorded the year of release for things I’ve purchased, and this combined with formats and prices/‘profit’ makes for another interesting visualisation that gives some insights into my hoarding habits and the current marketplace for digging around in charity shops and junk shops.

– 78rpm records are mostly very cheap, mainly because most of them are rubbish, but if you choose wisely you can easily pick up things that are worth £5 or more. The best I did here was a copy of The Andrews Sisters’ ‘I Saw Mommy Kissing Santa Claus’ in its original picture sleeve, which I picked up for 35p and which could sell for around £15.

– CDs from the 1990s and 2000s are very cheap at the moment – you can pick them up for as little as 25p in charity shops – and although the majority are worth what you pay for them, some are surprisingly sought after. I got a copy of Exotica ’92 – a collection of novelty football records – that is worth upwards of £10. Based on this small dataset, it appears that 1990s CDs are worth picking up at the moment.

– It looks like I buy a lot of 1960s and 1970s vinyl, but very little from the 1980s. This would fit with my general tastes, I think, but it now has me wondering whether I should address this gap (…oh dear).

– Good stuff on vinyl from the 1990s onwards is unsurprisingly thin on the ground (there was less of it about, for a start), but it’s worth picking up if you see it. The 1991 Crowded House record I mentioned before is the big pink dot towards the top of the picture.
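Observations like these come from grouping items by decade of release and format. The sketch below shows one way to do that in base R. Apart from the 1991 LP bought for £3 and valued at around £25, the years, prices and values are invented for illustration, and the column names are assumptions.

```r
# Hypothetical rows, except the 1991 LP (paid 3, valued at about 25).
purchases <- data.frame(
  year   = c(1952, 1967, 1973, 1991, 1996, 2003),
  format = c("78", "LP", "LP", "LP", "CD", "CD"),
  paid   = c(0.35, 2.00, 1.50, 3.00, 0.25, 0.50),
  value  = c(15.00, 6.00, 4.00, 25.00, 1.00, 0.50)
)

# Decade of release, e.g. 1967 becomes 1960.
purchases$decade <- floor(purchases$year / 10) * 10

# Theoretical 'profit' per item.
purchases$profit <- purchases$value - purchases$paid

# How many items of each format were bought from each decade.
counts <- table(purchases$decade, purchases$format)
counts

# Average 'profit' for each decade and format combination.
mean_profit <- aggregate(profit ~ decade + format, data = purchases, FUN = mean)
mean_profit
```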


This is as far as I’ve managed to dig into the data so far, but I’ll be adding to the 63 items over the course of the year and will be working on some other analysis. Once I have some more/better analysis I’ll provide code and sample data so that you can adapt and use it for your own purposes, should you wish to explore your own habits.

One thing I would like to try is to combine this data set with my Spotify audio scrobbles that I collect from Last.FM; it would be interesting to see how my record/CD buying influences my listening on digital services, and vice versa. I’ll post on that soon.