Get to know the music you like – self hosted “your spotify” data analysis

Published by Oliver on

Do you want to know which artist you like most, or at what time of day you usually listen to music? With the (unofficial) self-hosted “your spotify” software you can get this data and display it in a great-looking interface.

The data you don’t own

I am, like many out there, using Spotify to listen to the songs I like. It works great, streams pretty much any song instantly, is available on most platforms, and I can use Spotify Connect to play music via my Echos and other devices (like this wall-mounted one). The best part: I don’t have to do anything. No buying music, no curating my music library, no server, nothing.

Of course a subscription model like this also has some significant downsides. The most obvious one might be ongoing costs – I have to pay Spotify a certain (ever increasing) amount of money to keep using their service (or at least some of the features).

A less obvious downside is losing control over your data. If you stream every single song via Spotify, they get to know a lot about you. Some of that they share during the yearly Spotify Wrapped event (and in quite a nice way, I have to admit), but some of it they don’t share. Fortunately, more recent (EU) regulations have forced companies to share at least some of your data with you. Combine that with some nice open source software and you can get very interesting insights without another service involved.

Your spotify

The your spotify project is a great-looking piece of open source software that can request your listening data from Spotify and display the insights you can gain from it in a nice way. The best part: you can fully host this yourself for free. Disclaimer: of course it’s not official software provided by Spotify.

Once it is running, the software connects to Spotify’s APIs to download your data (more on that later) and shows it in a nice dashboard. If you have a Docker-based server like my smart home server or my storage server, then you can use a simple docker-compose YAML file to get started.

Here is my your_spotify.yaml file:

version: "3"

services:
  server:
    container_name: spotify_server
    image: yooooomi/your_spotify_server
    restart: unless-stopped
    ports:
      - "8081:8080"
    depends_on:
      - mongo
    environment:
      API_ENDPOINT: http://homeserver:8081
      CLIENT_ENDPOINT: http://homeserver:3001
      SPOTIFY_PUBLIC: ${SPOTIFY_PUB_KEY}
      SPOTIFY_SECRET: ${SPOTIFY_SECRET}

  mongo:
    container_name: mongo
    restart: unless-stopped
    image: mongo:4.4.18
    volumes:
      # host folder : MongoDB's data directory inside the container
      - ${DATADIR}/spotify/db:/data/db

  web:
    container_name: spotify_db
    image: yooooomi/your_spotify_client
    restart: unless-stopped
    ports:
      - "3001:3000"
    environment:
      API_ENDPOINT: http://homeserver:8081

It looks a bit different from the official one suggested on GitHub because I adapted it to my needs. I switched to an older version of MongoDB as the ARM-based Pi does not support the newer versions. I also added the more reasonable restart: unless-stopped policy to restart any failing container unless I stopped it manually.

You will notice that I also removed the links part from the server, as in my case all containers run in the default network. Otherwise of course you might want to set up a specific network for these three containers. I also modified the ports used to point to other ports on the server (8081 and 3001) as the default 8080 and 3000 were already in use here.

I also changed the API endpoints to point to the homeserver the software is running on (you could also use its IP address) instead of the localhost that you would use when running it on your local computer. Finally, I added variables like ${SPOTIFY_PUB_KEY} for the Spotify-related secrets. Their values are stored in a .env file in the same folder. If you need an example have a look at this one.
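Since a missing or empty variable in the .env file is an easy mistake to make, here is a minimal sketch of a check you could run before starting the containers. It assumes a plain KEY=VALUE file; the helper names parse_env and missing_vars are made up for this illustration, and the values in the sample are placeholders, not real keys.

```python
# Variables referenced as ${...} in the compose file above.
REQUIRED_VARS = ("SPOTIFY_PUB_KEY", "SPOTIFY_SECRET", "DATADIR")

# Placeholder content showing the assumed shape of the .env file.
SAMPLE_ENV = """\
# Spotify application keys (placeholders)
SPOTIFY_PUB_KEY=your-client-id
SPOTIFY_SECRET=your-client-secret
DATADIR=/path/to/data
"""


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(text, required=REQUIRED_VARS):
    """Return the required variable names that are absent or empty."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]


# Usage: pass the content of your own .env file instead of the sample,
# e.g. missing_vars(open(".env", encoding="utf-8").read())
print(missing_vars(SAMPLE_ENV))
```

An empty list means every variable used in the compose file has a value. The parser is deliberately simple and does not cover every syntax Docker Compose accepts in .env files.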

If you want to aggregate all the logs from these containers in one easy-to-use tool, then I recommend giving Loki & Grafana a try. You can find all the information about the setup in my guide here. To use it, simply add a couple of lines to each container like:

version: "3"

services:
  server:
    # ... existing settings ...
    logging:
      driver: loki
      options:
        loki-url: "http://localhost:3100/loki/api/v1/push"
        max-size: "10m"
  # other containers

Getting Spotify API keys

Before we get started, we need a way for this locally hosted application to actually get access to your Spotify account and data in the cloud. Fortunately there is a way for applications to access the data via an API provided by Spotify itself. To use that you register a new application (your instance of your_spotify in this case) with Spotify and get a set of keys from them. You remember the two variables from earlier? That’s them.

To register this new application you can follow the official guide. Basically you just have to log in at https://developer.spotify.com/ and find the dashboard (via your username on the top right). Then you click the “create app” button and give it a name and description. That can be anything but I suggest picking something that clearly states the reason you create this app.

You will also need to add a redirect URI. That should be the API endpoint from earlier plus /oauth/spotify/callback. In my case I added http://homeserver:8081/oauth/spotify/callback. The idea behind this is that the your spotify application will later redirect you to the official Spotify login page to authenticate and authorize you. After the login it will redirect you to the URL provided here (so back to the your spotify application).

For this redirect to work the your spotify application does not need to be public, just reachable from the browser/computer you are using.
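To avoid typos when registering the redirect URI, it can be derived from the API endpoint. The following is only a small sketch of the rule described above; the function name redirect_uri is made up for this example.

```python
from urllib.parse import urlsplit

# Callback path named in the text above.
CALLBACK_PATH = "/oauth/spotify/callback"


def redirect_uri(api_endpoint):
    """Build the redirect URI to register for a given API endpoint."""
    parts = urlsplit(api_endpoint)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {api_endpoint!r}")
    # Avoid a double slash if the endpoint was written with a trailing "/".
    return api_endpoint.rstrip("/") + CALLBACK_PATH


print(redirect_uri("http://homeserver:8081"))
```

The value printed here has to match the API_ENDPOINT in the compose file, including the port, otherwise the redirect back from the Spotify login will not reach the application.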

Finally you need to copy the client ID and secret shown for the Spotify application and put them into the environment variables for your docker compose file. As a last step you then have to add your own user to the application by switching to the user management tab and adding a new user with your email address.

Add a new user to your application

Using your spotify

Now it’s time to start. Use docker-compose -f your_spotify.yaml up -d (or the newer docker compose … instead) to start the application. If you don’t see any errors and the output of docker-compose -f your_spotify.yaml logs looks good too, you can start using the app.

Go to the client endpoint, in my case http://homeserver:3001, and log in with your Spotify account data. You will be redirected to the your spotify dashboard. The first time I logged in it took a couple of minutes for the application to load all the needed data and calculate statistics so be patient.

Great looking your spotify dashboard – just not a lot of music for me on that day

Once all of that is done you will start seeing data about which songs & artists you like to listen to and some quite interesting statistics about the overall time spent listening and the distribution over the day. Unfortunately, by default you will only see data for roughly the last 24 hours.

Adding historic data

The reason for this limited data is simple: there are technical restrictions on the API side, and you don’t want to spam their servers anyway. Fortunately there is a way to get the full historic data: you can use Spotify’s privacy tools to get access to it. Using their privacy dashboard you can request the data of the past year (and wait 5 days) or all of it (and wait 30 days). The waiting time is a bit unfortunate but you only need to do it once.

I requested the full data and got an email from Spotify with a link to confirm the request, then another one confirming that they were working on it. Then I waited… A couple of weeks later I got an email with a link to the requested data. You have to sign in and will then get a link to download a couple of files.

Once you have received the data you can use the import function in the settings menu of your spotify to import all of that historical data and finally see the full statistics. To do that, simply select either the account data or the extended streaming history (depending on what you requested from Spotify) and then proceed to upload the JSON files from your history.

This is where you can import the data requested from Spotify

In my case I chose the extended streaming history. It contained a readme file explaining the data format and a set of JSON files containing the actual data. I think the number of files depends on how many songs you played in the past; in my case I had 2 audio-related JSON files.

Content of my downloaded Spotify data folder
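If you want to know how many entries to expect before starting the import, you can count them locally. This is a minimal sketch that assumes each history file is a JSON array with one object per played item; check the readme in your download if your files look different. The function name count_entries and the folder name in the usage comment are placeholders.

```python
import json
from pathlib import Path


def count_entries(folder):
    """Return {file name: number of entries} for every JSON list in folder."""
    counts = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if isinstance(data, list):  # skip JSON files that are not entry lists
            counts[path.name] = len(data)
    return counts


# Usage (folder name is a placeholder for wherever you unpacked the download):
# counts = count_entries("my_spotify_data")
# print(counts, "total:", sum(counts.values()))
```

The total gives a rough idea of how long the import will take on your server.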

Now select them all together via the SELECT button and wait for quite a bit while the file entries are processed by your server. In my case I had around 21k entries and it took more than an hour. You can simply run it overnight. The numbers in the UI did not update until the process was done in my case, but you can follow along in real time via the logs.

Import history shown once you upload the JSON files

Once that process is done you can finally see all your data over the last years. Some of the results were quite obvious for me (distribution over the day), others were really interesting (number of songs & artists listened to over the years). Here are some of the results in my case (yes I like Imagine Dragons a lot ;)).

Some of the your spotify stats
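To get a feeling for what such a statistic is built from, here is a rough sketch that ranks artists by listening time directly from the history entries. It is not how your spotify computes its numbers. It assumes the extended streaming history fields master_metadata_album_artist_name and ms_played, so compare them with the readme in your download; top_artists is a name made up for this example and the sample entries are invented.

```python
from collections import Counter

MS_PER_HOUR = 3_600_000


def top_artists(entries, limit=5):
    """Rank artists by total listening time, returned in hours."""
    played_ms = Counter()
    for entry in entries:
        artist = entry.get("master_metadata_album_artist_name")
        if artist:  # skip entries without an artist name
            played_ms[artist] += entry.get("ms_played") or 0
    return [(artist, ms / MS_PER_HOUR) for artist, ms in played_ms.most_common(limit)]


# Invented sample entries using the assumed field names.
sample = [
    {"master_metadata_album_artist_name": "Artist A", "ms_played": 3_600_000},
    {"master_metadata_album_artist_name": "Artist B", "ms_played": 1_800_000},
    {"master_metadata_album_artist_name": "Artist A", "ms_played": 1_800_000},
]
print(top_artists(sample))
```

For real data, load the entries from your history files with json.load and pass the combined list to the function.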

Overall I love the way you can get some more extensive statistics about this data that is all about you but otherwise only available to Spotify. Using EU-enforced privacy mechanisms to get that data and locally hosted open source software to display it is the cherry on top of the cake 🙂 Thanks a lot to Timothee Boussus for creating this project.
