HitPredict: Predicting Billboard Hits Using Spotify Data
Elena Georgieva, Marcella Suta, Nicholas Burton
Professors Andrew Ng and Ron Dror
Abstract
Methods
Results
• The Billboard Hot 100 chart [1] remains one of the definitive ways to measure the success of a popular song. We investigated using machine-learning techniques to predict which songs will become Billboard Hot 100 hits.
• We predicted the Billboard success of a song with ~75% accuracy using machine-learning algorithms including Logistic Regression, Gaussian Discriminant Analysis (GDA), SVM, Decision Trees, and Neural Networks.
Increasing order of importance
Logistic Regression             | Neural Network
Feature            | Accuracy   | Feature            | Accuracy
Artist Score       | 72.9%      | Danceability       | 65.3%
Instrumentalness   | 73.2%      | Acousticness       | 69.6%
Danceability       | 73.2%      | Speechiness        | 73.0%
Acousticness       | 75.3%      | Valence            | 73.4%
Speechiness        | 75.8%      | Energy             | 74.9%
Loudness           | 75.8%      | Artist Score       | 74.6%
Tempo              | 75.9%      | Instrumentalness   | 75.1%
Valence            | 75.7%      | Tempo              | 76.5%
Energy             | 74.0%      | Liveness           | 76.4%
Liveness           | 74.3%      | Loudness           | 72.7%
Table 2. Error analysis for the two strongest-performing algorithms. The features at the end of the list decreased the accuracy of predictions.
Figure 3. Billboard hit prediction accuracy results for five machine-learning algorithms (bars: Training (75%) and Validation (25%)).
Features and Data
• Ten audio features were extracted from the Spotify API [4] (Table 1).
• We created the Artist Score metric, assigning a score of 1 to a song if the artist previously had a Billboard hit, and 0 otherwise.
Audio Features
Danceability
Liveness
Instrumentalness
Speechiness
Acousticness
Loudness
Valence
Tempo
Energy
Artist Score
Table 1. Audio features extracted from Spotify's API. Spotify assigns each song a value between 0 and 1 for these features, except loudness, which is measured in decibels, and tempo, which is measured in beats per minute.
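The Artist Score listed in Table 1 can be illustrated with a short sketch. The poster does not say how "previously" was determined, so this version assumes release years are available and treats a hit as previous only if it came in a strictly earlier year. The Song record and its field names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Song(NamedTuple):
    """Hypothetical record; field names are illustrative, not from the poster."""
    title: str
    artist: str
    year: int      # release year (assumed granularity)
    is_hit: bool   # True if the song appeared on the Billboard Hot 100


def artist_scores(songs: List[Song]) -> List[int]:
    """Return 1 for a song whose artist had a hit in an earlier year, else 0."""
    first_hit_year: Dict[str, int] = {}
    for song in songs:
        if song.is_hit:
            earliest = first_hit_year.get(song.artist, song.year)
            first_hit_year[song.artist] = min(earliest, song.year)
    # Assumption: a hit counts as "previous" only if the artist's first hit
    # came strictly before the song's own year.
    return [
        1 if first_hit_year.get(song.artist, song.year) < song.year else 0
        for song in songs
    ]
```

Under these assumptions an artist's first hit scores 0, while every later song by that artist scores 1.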
Top Songs: December 2018
[Chart, scale 0 to 1. Axes: Dance, Valence, Liveness, Energy, Speech, Acoustic. Tracks: "Sicko Mode" (Travis Scott); "Thank You, Next" (Ariana Grande); "Happier" (Marshmello & Bastille); "Without Me" (Halsey); "Girls Like You" (Maroon 5 ft. Cardi B).]
Figure 1. Illustration of audio features for the 5 top tracks of December 2018. Our algorithm predicted their Billboard success with 100% accuracy.
Figure 2. A plot of songs' danceability vs. energy vs. loudness (dB). Black circles represent Billboard hits and red marks represent non-hits.
• Data for ~4000 songs was collected using a Python API for Billboard data [3] and the Million Song Dataset [5]. Songs were from 1990-2018.
• Each song was labeled 1 if it was a Billboard hit and 0 otherwise.
• Audio features for each song were extracted from the Spotify Web API [4].
• Five machine-learning algorithms were used to predict a song's Billboard success.
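The labeling and feature steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes hits are matched on a case-insensitive (title, artist) pair, and that each song's audio features arrive as a dictionary keyed by the field names Spotify uses for its audio-features object. The helper names are hypothetical, and no API calls are made.

```python
from typing import Dict, List, Set, Tuple

# Nine Spotify audio features in the order of Table 1; Artist Score is appended last.
AUDIO_KEYS = [
    "danceability", "liveness", "instrumentalness", "speechiness",
    "acousticness", "loudness", "valence", "tempo", "energy",
]


def song_key(title: str, artist: str) -> Tuple[str, str]:
    """Normalize case and whitespace so chart and dataset entries can be matched."""
    return (" ".join(title.casefold().split()), " ".join(artist.casefold().split()))


def label(title: str, artist: str, hits: Set[Tuple[str, str]]) -> int:
    """Return 1 if the song is in the set of Billboard hits, else 0."""
    return 1 if song_key(title, artist) in hits else 0


def feature_row(audio: Dict[str, float], artist_score: int) -> List[float]:
    """Build the ten-value feature vector for one song."""
    return [float(audio[key]) for key in AUDIO_KEYS] + [float(artist_score)]
```

Exact-match keys are the simplest choice; real chart data would likely need fuzzier matching for featured artists and alternate titles.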
Algorithms
• Supervised learning: the data was split 75/25 into training and validation sets. Logistic Regression and GDA yielded the strongest results.
• Bagging using random forests kept the SVM from over-fitting.
• The Decision Tree performed poorly because it over-fit severely.
• Neural Network with regularization: one hidden layer of six units with the sigmoid activation function. L2 regularization was added to the cost function to avoid over-fitting.
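The network described in the last bullet can be sketched as a forward pass plus an L2-regularized cost. The poster gives only the hidden-layer size, the activation, and the use of L2 regularization; the sigmoid output unit, the cross-entropy loss, and the lam / (2m) scaling of the penalty are assumptions made here for illustration.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x: List[float], W1: List[List[float]], b1: List[float],
            W2: List[float], b2: float) -> float:
    """One hidden layer (one row of W1 per hidden unit), sigmoid throughout."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)


def l2_cost(X: List[List[float]], y: List[int], W1: List[List[float]],
            b1: List[float], W2: List[float], b2: float, lam: float) -> float:
    """Mean cross-entropy plus an L2 penalty on the weights (biases excluded)."""
    eps = 1e-12  # guards against log(0)
    m = len(X)
    cross_entropy = 0.0
    for x, t in zip(X, y):
        p = forward(x, W1, b1, W2, b2)
        cross_entropy -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    penalty = sum(w * w for row in W1 for w in row) + sum(w * w for w in W2)
    return cross_entropy / m + lam / (2 * m) * penalty
```

With six rows in W1 this matches the six-unit hidden layer; larger values of lam penalize large weights more heavily, which is what discourages over-fitting.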
[Figure 3 bar chart. x-axis: Logistic Regression, GDA, SVM (w/ Bagging), Decision Tree, Neural Network; y-axis: accuracy (%).] LR and NN give the highest prediction accuracy on the validation set.
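The training versus validation comparison in Figure 3 rests on a 75/25 split. A minimal sketch of that evaluation for logistic regression follows, assuming batch gradient descent on the log loss and a 0.5 decision threshold; the poster states neither, and the function names, learning rate, and epoch count are illustrative.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def split_75_25(X: List[List[float]], y: List[int], seed: int = 0):
    """Shuffle indices and return (X_train, y_train, X_val, y_val)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.75 * len(idx))
    train, val = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in val], [y[i] for i in val])


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 300) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - t
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def accuracy(X: List[List[float]], y: List[int], w: List[float], b: float) -> float:
    """Fraction of songs whose thresholded prediction matches the label."""
    correct = sum(
        (sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5) == (t == 1)
        for x, t in zip(X, y)
    )
    return correct / len(X)
```

Calling accuracy on both halves of the split gives the two numbers compared per algorithm; a training accuracy far above the validation accuracy is the over-fitting signature noted for the Decision Tree.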
[Figure 4 plot: accuracy (%), y-axis 50 to 100, for Logistic Regression and Neural Network.]
Figure 4. Algorithms yield higher accuracy for more recent songs. Features of pop songs are unique to their time period.
[Figure 5 plot: y-axis 50 to 100, for Logistic Regression and Neural Network; x-axis groups: Sept - Oct, Summer, Feb - May, Winter.]
Figure 5. Features of songs released in winter differ from features of other songs. We did not observe the same trend for songs of the summer.
References
[1] Billboard. (2018). Billboard Hot 100 Chart.
[2] Chinoy, S. and Ma, J. (2018). Why Songs of the Summer Sound the Same.
[3] Guo, A. Python API for Billboard Data.
[4] Spotify Web API. https://developer.
[5] Bertin-Mahieux, T., Ellis, D. P. W., Whitman, B., and Lamere, P. (2011). The Million Song Dataset. ISMIR Conference.
Stanford Machine Learning Poster Session | Stanford, California
2018
Contact: {egeorgie, msuta, ngburton} @stanford.edu