What movie should you watch tonight?
If you need to brush up on your movie classics but don't know which ones are available through your streaming subscriptions, then this post is for you.
Streaming services regularly buy rights to new movies and update the list of movies they offer to their customers.
So let's try to answer the question: which movies can you watch via your TV subscriptions (Netflix, Disney+, ...)?
In this post, I describe how to use the (unofficial) JustWatch API to build a list of the movies available to you, ranked by the number of Academy Awards received.
List of movies by Academy Awards
First, we download the list of all movies that received at least one Academy Award in any category.
import pandas as pd
# Download the table from the Wikipedia page
awards = pd.read_html('https://en.wikipedia.org/wiki/List_of_Academy_Award-winning_films')[0]
# Wikipedia table cleaning: strip parenthetical notes "(...)" and footnote markers "[...]"
# (raw strings keep the regex backslashes from being read as string escapes)
awards['Film'] = awards['Film'].str.replace(r'\(.+\)','', regex=True).str.strip()
awards['Film'] = awards['Film'].str.replace(r'\[.+\]','', regex=True).str.strip()
awards['Awards'] = awards['Awards'].str.replace(r'\(.+\)','', regex=True).str.strip()
awards['Awards'] = awards['Awards'].str.replace(r'\[.+\]','', regex=True).str.strip()
awards['Awards'] = awards['Awards'].astype(int)
awards['Nominations'] = awards['Nominations'].str.replace(r'\(.+\)','', regex=True).str.strip()
awards['Nominations'] = awards['Nominations'].str.replace(r'\[.+\]','', regex=True).str.strip()
awards['Nominations'] = awards['Nominations'].astype(int)
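To see what the two cleaning patterns do without downloading anything, here is a minimal sketch that applies them to a few made-up cell values with the standard `re` module. The helper name `clean_cell` and the sample strings are assumptions for illustration only; they are not part of the script above.

```python
import re

def clean_cell(text: str) -> str:
    """Remove "(...)" notes and "[...]" footnote markers, then trim whitespace."""
    text = re.sub(r'\(.+\)', '', text)
    text = re.sub(r'\[.+\]', '', text)
    return text.strip()

# Made-up cell values mimicking the annotations found in the Wikipedia table
samples = ['Some Film [a]', 'Another Film (remake)', '3 (1)']
cleaned = [clean_cell(s) for s in samples]
print(cleaned)
```

One thing to keep in mind: `.+` is greedy, so a value with two parenthesized groups loses everything between the first `(` and the last `)`. For example, `clean_cell('A (x) B (y)')` returns `'A'`.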
List of TV providers for each movie
JustWatch is a search engine for movies: search for a title and it returns a list of offers showing where you can watch it. JustWatch doesn't offer an open API, so here we use the (unofficial) JustWatch API package. Install it by running pip install JustWatch; for more information and documentation, check out its GitHub page.
from justwatch import JustWatch
import time
# Download the list of providers in your country
just_watch = JustWatch(country='US')
provider_details = just_watch.get_providers()
providers_dict = {p['id'] : p['technical_name'] for p in provider_details}
# Loop through the table to download the list of providers for each movie
awards['Provider'] = ''
for i in range(awards.shape[0]):
    if i % 10 == 0:
        print(f'{i}/{awards.shape[0]}')
    film = awards['Film'].iloc[i]
    results = just_watch.search_for_item(query=film)
    if len(results['items']) > 0 and 'offers' in results['items'][0]:
        # select only offers available through ["flatrate", "ads", "free"]
        # dict.fromkeys removes duplicate provider ids while keeping their order
        free_results = list(dict.fromkeys(r['provider_id'] for r in results['items'][0]['offers'] if r['monetization_type'] in ["flatrate", "ads", "free"] and r['provider_id'] in providers_dict))
        # write with .loc: chained assignment (awards['Provider'].iloc[i] = ...) may modify a copy
        awards.loc[awards.index[i], 'Provider'] = ';'.join([providers_dict[p] for p in free_results])
    time.sleep(.5) # necessary to avoid a "too many requests" error from the server
# awards.to_csv('movies_towatch.csv')
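The filtering step inside the loop can be tried offline. The sketch below pulls it into a small function and runs it on made-up data; the function name `free_provider_names`, the provider ids and the `fake_results` dictionary are all assumptions. The dictionary only mimics the fields the loop reads (`items`, `offers`, `provider_id`, `monetization_type`) and is not a real JustWatch response.

```python
FREE_TYPES = {'flatrate', 'ads', 'free'}

def free_provider_names(results: dict, providers: dict) -> str:
    """Return a ';'-joined string of provider names for the first search hit."""
    items = results.get('items', [])
    if not items or 'offers' not in items[0]:
        return ''
    ids = [o['provider_id'] for o in items[0]['offers']
           if o['monetization_type'] in FREE_TYPES and o['provider_id'] in providers]
    # dict.fromkeys drops duplicates while keeping first-seen order
    return ';'.join(providers[i] for i in dict.fromkeys(ids))

# Made-up data shaped like the fields the loop reads; ids and names are hypothetical
providers = {8: 'netflix', 9: 'amazonprime'}
fake_results = {'items': [{'offers': [
    {'provider_id': 8, 'monetization_type': 'flatrate'},
    {'provider_id': 8, 'monetization_type': 'flatrate'},  # duplicate offer, kept once
    {'provider_id': 9, 'monetization_type': 'rent'},      # paid rental, skipped
    {'provider_id': 99, 'monetization_type': 'free'},     # unknown provider, skipped
]}]}
print(free_provider_names(fake_results, providers))
```

Keeping this logic in a function makes it easy to check the edge cases (no search hit, no offers) before spending ten minutes on the full download.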
Filling in the Provider column takes about 10 minutes, because the loop pauses half a second between requests to avoid overloading the API. Once everything is downloaded, enter the list of your TV subscriptions (or the platforms you want to use) and get the list of movies ranked by number of awards received:
subscriptions = ['netflix','amazonprime','appletvplus','hbonow','hbomax','disneyplus','hulu','youtubefree','kanopy']
awards[awards['Provider'].str.contains('|'.join(subscriptions))].sort_values('Awards', ascending=False).head(20)
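Note that `str.contains` does a regex substring match, so a subscription name also matches any longer provider name that contains it. If that ever causes false positives, one possible alternative is to split the Provider string on `;` and compare whole names. The sketch below assumes the `;`-separated format built earlier; the helper name `has_subscription` and the provider name `hulupremium` are hypothetical.

```python
def has_subscription(provider_field: str, subscriptions: set) -> bool:
    """True if at least one ';'-separated provider name is exactly in subscriptions."""
    return not subscriptions.isdisjoint(provider_field.split(';'))

my_subscriptions = {'netflix', 'hulu'}
print(has_subscription('netflix;kanopy', my_subscriptions))  # exact name present
print(has_subscription('hulupremium', my_subscriptions))     # only a substring match
print(has_subscription('', my_subscriptions))                # movie with no provider
```

On the table itself, this could be applied with `awards[awards['Provider'].apply(lambda s: has_subscription(s, set(subscriptions)))]` in place of the `str.contains` filter.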
List of Best Picture Films
If you are only interested in films that won Best Picture:
# Download the Best Picture tables (one per decade)
best_pictures_decades = pd.read_html('https://en.wikipedia.org/wiki/Academy_Award_for_Best_Picture')[1:12]
cleaned_decades = []
for bp in best_pictures_decades:
    bp.columns = ['Year','Film','Producer']
    bp = bp.dropna().reset_index(drop=True)
    bp['Year'] = bp['Year'].str.replace(r'\[.+\]','', regex=True)
    bp['Year'] = bp['Year'].str.replace(r'\(.+\)','', regex=True)
    cleaned_decades.append(bp)
# DataFrame.append was removed in pandas 2.0, so combine the decades with pd.concat
best_pictures = pd.concat(cleaned_decades, ignore_index=True)
# Add winner/nomination to each result
# (assumes the first film listed for each year is the winner)
best_pictures['Result'] = 'Nomination'
first_of_year = best_pictures['Year'] != best_pictures['Year'].shift()
best_pictures.loc[first_of_year, 'Result'] = 'Winner'
# List of best pictures available to you via your TV subscriptions
subscriptions = ['netflix','amazonprime','appletvplus','hbonow','hbomax','disneyplus','hulu','youtubefree','kanopy']
best_pictures_providers = best_pictures.loc[best_pictures['Result'] == 'Winner',['Year','Film']].merge(awards[['Year','Film','Provider']], on=['Year','Film'])
best_pictures_providers[best_pictures_providers['Provider'].str.contains('|'.join(subscriptions))].sort_values('Year', ascending=False).head(20)
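The winner/nomination labelling above relies on one rule: within each year, the first film listed is treated as the winner and the rest as nominations. Here is a minimal sketch of that rule in plain Python on made-up years; the function name `flag_winners` is an assumption, and the rule only holds if the source table really lists the winner first.

```python
def flag_winners(years):
    """Label the first entry of each consecutive run of equal years as the winner."""
    labels = []
    previous = object()  # sentinel that never equals a real year
    for year in years:
        labels.append('Winner' if year != previous else 'Nomination')
        previous = year
    return labels

# Made-up sequence of years, in the order the rows would appear in the table
years = ['1990', '1990', '1991', '1991', '1991']
print(flag_winners(years))
```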
2022-05-05 00:21