• CMU Movie Summary Corpus : The chosen dataset for the project. It contains plot summaries, box office revenue, genre, release date, runtime, language, and character/actor information.
  • The Movies Dataset from Kaggle : This dataset’s content is very similar to the CMU dataset’s, which works to our advantage because we used it to complete missing data in our main dataset. This gives us a more robust dataset and broadens the scope of our analysis.
  • IMDB Dataset from Kaggle : Gross revenue alone isn’t a sufficient metric to evaluate public perception, which is critical to our analysis. This additional dataset fills that gap: it offers key indicators of a movie’s success that are absent from our initial dataset, such as the average rating and the number of votes a movie received.
  • The Oscar Award, 1927-2023 from Kaggle : This dataset lists the Oscar winners and nominees for each year and each category. Oscar wins and nominations are a significant measure of a movie’s acclaim and success. Moreover, they are often closely correlated with the film’s release date, which adds a temporal dimension to the success metrics in our analysis.
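
The idea of completing missing CMU values from The Movies Dataset can be illustrated with a minimal sketch. It is not the project's actual pipeline: the join key (`title`, `year`), the field names, the helper `fill_missing`, and the toy rows are all assumptions made for illustration, and plain Python is used only so the example runs without dependencies. A dataframe library would likely do the same job in practice.

```python
def fill_missing(primary, secondary, key_fields, fill_fields):
    """Return copies of `primary` rows whose None fields are filled from `secondary`.

    Hypothetical helper: rows are dicts, and two rows describe the same movie
    when all of their `key_fields` values are equal.
    """
    # Index the secondary rows by join key for constant-time lookup.
    lookup = {tuple(row[k] for k in key_fields): row for row in secondary}
    completed = []
    for row in primary:
        merged = dict(row)  # copy, so the input rows are left untouched
        match = lookup.get(tuple(row[k] for k in key_fields))
        if match is not None:
            for field in fill_fields:
                # Only fill gaps; values already present in the primary row win.
                if merged.get(field) is None and match.get(field) is not None:
                    merged[field] = match[field]
        completed.append(merged)
    return completed


# Toy rows with made-up values, for illustration only.
cmu_rows = [
    {"title": "Movie A", "year": 1999, "revenue": None, "runtime": 120},
    {"title": "Movie B", "year": 2005, "revenue": 5_000_000, "runtime": None},
]
kaggle_rows = [
    {"title": "Movie A", "year": 1999, "revenue": 1_000_000, "runtime": 118},
    {"title": "Movie B", "year": 2005, "revenue": 7_000_000, "runtime": 95},
]

completed = fill_missing(
    cmu_rows, kaggle_rows, ("title", "year"), ("revenue", "runtime")
)
```

Values already present in the primary rows are kept even when the secondary source disagrees, so the secondary dataset only fills gaps and never overrides the main one.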