It's 2024, so of course we had to have a chat component in Cover. We're going to let y'all in on a little secret: It's ChatGPT. That's it. Ok, next.
Chat with Cover
Just kidding :) Well, it is ChatGPT, but there's a little more to it that actually makes it a "Pop Culture Companion" and not just a ChatGPT wrapper.
What's great about Cover is that the more you use it, the more it will help you. So, for example, if you're looking for a show to watch, Cover will use your existing shows to help find something that you might like. To enable this, we use some of OpenAI's features to simulate "memory" so that Cover knows where you're at in your shows and then uses that to help you find something.
We explored a few different options for this but ultimately decided to use OpenAI's "Threads" API. Threads covers everything we wanted by storing all previous messages and progressively summarizing them as the conversation approaches the context window limit. Now, when a user begins chatting with Cover, we check to see if there is a thread for that user. If there isn't, we create one and store the id in our database. We then seed the conversation with all the show-related records we have for that user (what they're watching, what they've indicated they want to watch, etc.) so that Cover has context on what the user is looking for.
And if there is an existing thread, we simply retrieve the id and pass it to the assistant to continue the conversation. And then it's just one less thing to worry about.
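If you're curious, the thread lookup is only a few lines. Here's a minimal sketch using the OpenAI Node SDK's beta Threads API (the in-memory Map stands in for our actual database, and the seeding message and assistant id are simplified placeholders):

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Stand-in for our database; in reality the thread id lives on the user record
const threadIds = new Map<string, string>();

async function getOrCreateThread(userId: string, showHistory: string): Promise<string> {
  // Reuse the existing thread if this user has chatted with Cover before
  const existing = threadIds.get(userId);
  if (existing) return existing;

  // Otherwise create a new thread, seeded with the user's show-related records
  const thread = await openai.beta.threads.create({
    messages: [
      {
        role: "user",
        content: `Here's what I'm watching and what I want to watch:\n${showHistory}`,
      },
    ],
  });

  threadIds.set(userId, thread.id);
  return thread.id;
}

// Later messages just get appended to the same thread before running the assistant
async function chat(userId: string, showHistory: string, message: string, assistantId: string) {
  const threadId = await getOrCreateThread(userId, showHistory);
  await openai.beta.threads.messages.create(threadId, { role: "user", content: message });

  const run = await openai.beta.threads.runs.createAndPoll(threadId, { assistant_id: assistantId });
  if (run.status !== "completed") throw new Error(`Run ended with status ${run.status}`);

  const messages = await openai.beta.threads.messages.list(threadId);
  return messages.data[0]; // most recent message first
}
```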
As a side note, we really liked CrewAI and loved how easy it was to create role-based agents and "Chain-of-Thought" reasoning prompts. Integrating this approach would let us make our recommendations much more sophisticated and pull in sources from around the web to inform them in real time.
Our biggest concern, though, was how easy it was to explode the number of tokens used as the agents worked through their tasks. Granted, there are likely some best practices we could (and would) implement in the future, but in a quick test of generating TV show recommendations with GPT-4o, it used 345k tokens, meaning a total cost (for one recommendation) of $13!
To be fair, the recommendation it gave was honestly great and not something we would have thought of, but that is just crazy expensive. There are absolutely ways to reduce the tokens used and make the process more efficient, and we're working on making it better, but it's not something we want to put in the hands of our users until we're comfortable with the cost and confident it has reasonable limits. So some more basic recommendations are going to have to do for now.
We've used a lot of transcription services in the past. Some perform better than others, but the problem we kept running into was cost. On the one hand, something like Otter.ai was high accuracy but aimed (and priced) more as a notetaking app for meetings. Rev was always a great service, with the option to get human or AI transcription. The only problem is that pop culture podcasters tend to talk a lot (see House of R, which published a 3-hour podcast on a 2.5-minute Across the Spiderverse trailer - we love you, Mal and Jo!), and at $0.25 a minute for AI transcription, we were looking at $20-30 per episode. We're bootstrapped at Cover, and even if we weren't, we're just cheap people. That just wasn't going to work for us.
Then we found Whisper, and after some testing, it hit the sweet spot of price ($Free.99) and performance.
We started by running Whisper on CPUs to see how it would perform. We ran the base model first, but for audio like this (multi-hour, multi-speaker), the accuracy wasn't great. We then moved to the small model and found it took ~40-50 minutes to transcribe an episode, which is around the expected 4x relative speed (meaning if the audio is 40 minutes long, it takes 10 minutes to transcribe).
Generally you have to trade performance for accuracy
The bonus here was that each process only took 2GB of memory, meaning we could run multiple processes on a relatively cheap server.
But if we're going to use a cheap server, what's the harm in using a cheap GPU? Thankfully, we happened to have one of those, so we loaded up the Turbo model and configured it to use CUDA cores. We should have seen ~7-8x relative speed (about the same as the base model) and improved accuracy, but in practice we found the gains only materialize when using the GPU: Turbo on CPU was twice as slow as on GPU and eight times slower than the base model.
Turbo performed best on GPUs while smaller models worked best on CPUs
To be fair, it is a very old GPU, so we could definitely see drastic improvements from a more modern one with more cores. Ultimately, because we liked the results of the Turbo transcription, we decided to push back the feature that these transcriptions power and let our little GPU that could transcribe 24/7 in the background. There are currently around 3,600 episodes in the backlog, so it's going to take some time to catch up, but the beautiful thing is that once an episode is transcribed, it's done. Moving forward, we expect it to keep up with new podcast episodes on a daily basis and even take some time off in between.
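If you want to try something similar, the setup is pretty minimal. Here's roughly the shape of the worker that shells out to the Whisper CLI (a sketch; the queueing and file paths are simplified, and on a CPU-only box you'd drop down to a smaller model):

```typescript
import { spawn } from "node:child_process";

// Transcribe one episode with the openai-whisper CLI.
// On our box: Turbo + CUDA; on a CPU-only server you'd use e.g. "--model small --device cpu".
function transcribeEpisode(audioPath: string, outputDir: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const whisper = spawn("whisper", [
      audioPath,
      "--model", "turbo",
      "--device", "cuda",
      "--output_format", "txt",
      "--output_dir", outputDir,
    ]);

    whisper.stderr.on("data", (chunk) => process.stderr.write(chunk)); // progress bar goes to stderr
    whisper.on("error", reject);
    whisper.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`whisper exited with code ${code}`))
    );
  });
}

// Usage (hypothetical paths): await transcribeEpisode("./audio/episode.mp3", "./transcripts");
```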
As part of our effort to make Cover the most useful pop culture companion on the internet, we track ~175k shows and ~4.3MM episodes of television. But anyone who has been sucked into a show knows that the best part of enjoying television is the community that builds up around it. Fan theories we obsess over, easter eggs we may have missed, recaps that bring in larger backstories or lore: these are all things that truly bring the shows alive and into our hearts.
So to build up this bigger context, we have crawlers that run every day looking for content that could be relevant to these shows. But that's a lot of articles, and believe it or not, we don't work 24/7. Season renewals and cast updates don't wait for our schedule, though, so we need to make sure they're classified and indexed immediately.
Now, classification is a relatively simple task, and models that can support it have existed for years. It's actually such a common use case that the popular model repository Hugging Face has multiple categories just for specific types of classification (text, token, zero-shot, audio, video, etc.). The problem is that a classification model takes time to train and, more importantly, it takes a meaningful dataset to train it on. Unfortunately, to the best of our knowledge, there isn't a dataset that categorizes every article that has been written about every episode and season of every show. And even if there were, it still wouldn't be sufficient to support new shows as they come out.
But this is where LLMs come in -- we use OpenAI's Assistants API to pass a custom (and secret - shhh) prompt to GPT-4o-mini along with the title and body of the article. We then give the model access to a vector store of the existing shows in our database, along with descriptions of what each show is about, and ask it to classify the article with the appropriate showId based on its text. This is then passed to another assistant that checks the work of the first and makes sure the show is classified correctly (it does occasionally hallucinate and make up a completely fake id) before a call is made to the database to create the association between the show and the article.
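A stripped-down version of that first classifier looks something like this (the instructions, the vector store id, and the response handling are placeholders here; the real prompt stays secret):

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// One-time setup: an assistant with file_search pointed at our vector store of show descriptions.
// "vs_shows_placeholder" is a made-up id, not our real vector store.
async function createClassifier() {
  return openai.beta.assistants.create({
    name: "Article classifier",
    model: "gpt-4o-mini",
    instructions:
      "Given an article, return the showId of the show it is about, using the attached show descriptions.",
    tools: [{ type: "file_search" }],
    tool_resources: { file_search: { vector_store_ids: ["vs_shows_placeholder"] } },
  });
}

async function classifyArticle(assistantId: string, title: string, body: string): Promise<string> {
  const thread = await openai.beta.threads.create({
    messages: [{ role: "user", content: `Title: ${title}\n\n${body}` }],
  });

  const run = await openai.beta.threads.runs.createAndPoll(thread.id, { assistant_id: assistantId });
  if (run.status !== "completed") throw new Error(`Classification run ended with status ${run.status}`);

  const messages = await openai.beta.threads.messages.list(thread.id);
  const first = messages.data[0].content[0];
  if (first.type !== "text") throw new Error("Unexpected response type");

  return first.text.value.trim(); // the proposed showId, which a second assistant then double-checks
}
```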
This one's kinda boring -- there aren't any features to highlight or tables to show. But it's still really important to how we work and what we're able to get done.
Everyone at Cover is full-stack, which means we're often using a tool that we're not super familiar with. This is important to us because it's part of our ethos that everyone should understand the entire environment so they can jump in wherever they're needed. And, from a more technical standpoint, so much of what we're building is interconnected that being able to do just front-end or just back-end work doesn't help that much. Have you ever tried to build a new feature without being able to modify the API that supports it? It's a non-starter -- it blocks the entire process.
But what this means is that we're constantly running into syntax errors or unknown parameters in libraries one of us has rarely used before. That is where we've found OpenAI stumble and Anthropic's Claude really shine. Copy/pasting error messages and giving Claude the context of what we're trying to do works more often than not. And even with things we clearly know how to do, we're finding it's far faster to write clear instructions than to go through the tedious process of manually writing that one-off script or SQL query. There are sometimes a couple of things we need to correct or tweak after it's done, but more often than not, it's 90% of the way there.
For a real example, we used it to write a custom SQL query to get the next episode for each show a user is watching (according to a custom ruleset we follow), something that wasn't possible in our ORM and was far too slow to do outside the DB (in Node.js it took about 50 seconds). After a few iterations with Claude, we were happy with it -- that query is 137 lines long, creates seven subqueries, and uses five joins in the final output. And it takes 6 seconds to run.
The end result of all of this is not quite having a full developer who can do everything by themselves, but more like having a senior dev we can bounce questions off of and refine our own ideas with. Oh, and that senior dev can also give you customized documentation of how nearly any library or tool should adapt to our use case and instantly identify where you might have gone wrong. In fact, Claude is responsible for the Substack-inspired styling of this blog post.
This one we're excited about both because it is an early preview of something we have coming and because it's a great showcase of something that would be really difficult to build if we weren't using AI.
We know that Cover will only work well if it is an accurate representation of what you're watching, sharing, and enjoying. So we want Cover to be out in the world with you wherever you're enjoying your content.
There are a lot of places where that happens, and to begin, we're focusing on browser extensions. This is the area with the most fleshed-out ecosystem and the most straightforward API, so we thought it'd be a good place to start.
The way this will work is similar to the article classification, but instead of having a clear title and body, we're sending a combination of HTML tags and network requests to the assistant to make an assessment of what show is being watched. This is because some streaming services intentionally obfuscate the details of the show in the HTML. We're not saying this is malicious, but even as a byproduct of another engineering decision, the end result is that you can't necessarily just look at a meta tag and know what is being watched.
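On the extension side, the content script piece is fairly mundane: grab whatever signals the page does expose and hand them off. A simplified sketch (the message shape is made up for illustration, and the network-request capture happens in the background worker, which isn't shown):

```typescript
// content-script.ts (Manifest V3) -- collects page signals for show identification

type PageSignals = {
  url: string;
  title: string;
  metaTags: Record<string, string>;
};

function collectSignals(): PageSignals {
  const metaTags: Record<string, string> = {};
  // Grab every meta tag the page exposes, even if the useful ones are sparse
  document.querySelectorAll("meta[name], meta[property]").forEach((el) => {
    const key = el.getAttribute("name") ?? el.getAttribute("property") ?? "";
    const value = el.getAttribute("content") ?? "";
    if (key && value) metaTags[key] = value;
  });
  return { url: location.href, title: document.title, metaTags };
}

// Hand the signals to the background worker, which combines them with observed
// network requests and forwards everything to the assistant for identification.
chrome.runtime.sendMessage({ type: "page-signals", payload: collectSignals() });
```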
When everything is said and done, our users will be able to install the extension, get notified when a show has been identified, and choose whether to add it to their Cover profile without having to log in and do it manually.
Remember when we said we were transcribing podcasts? Well, we're not doing that for its own sake. All of that work is being done to power our Summaries feature. We're excited to show you what we've been working on, but not just yet. We can give a little hint, though.
We feel that Rotten Tomatoes is...fine. That's not entirely fair, as it was a pretty groundbreaking way to assess movies at the time, and we really liked it when it first came out (yeah, we're old - what are ya gonna do?). But Rotten Tomatoes post-Fandango acquisition hasn't really brought about anything new. To be entirely honest, it still feels like they're riding the coattails of their pre-Flixster era model. Which, again, we liked. But maybe the world deserves something new? We think it does. What if the world of reviews was more interactive instead of reducing a movie to a single score?
That's all we can say for the moment, but it's exciting for us, and hopefully it will be for y'all as well. We've just got to make sure to tamp down how much these agents like to chat with each other when assessing a show. It's great, they're enthusiastic, but come on, y'all. Wrap it up. Move it along.
Thanks for reading! If you enjoyed this, please consider checking out Cover!