Given the Bollywood movie data, please perform one or more of the following data analysis tasks:
Enable multimodal question answering on top of this dataset.
Users should be able to ask questions (in text format), and the output should be text and/or an image.
Users may also provide an image as input, and the output should be the plot/points relevant to that image.
Convert the movie plot into an entity-relationship graph in which each path traversal provides a different story arc of the movie.
The dataset has been used to show the gender bias present in Bollywood (http://proceedings.mlr.press/v81/madaan18a/madaan18a.pdf).
You may extend this work to change the movie text to be gender neutral.
Can you show a relationship between the backdrop of a movie and the gender bias present in it?
For example - Is gender bias more prevalent in movies with a rural setting than with an urban one?
You may come up with any other innovative use of the dataset that proposes a new text, image, or video task along with an early solution.
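As a starting point for the entity-relationship graph task above, here is a minimal sketch. It assumes the plot has already been reduced to (subject, relation, object) triples, for example by an OpenIE-style extractor; the extraction step is not shown. The function names build_graph and story_arcs and the sample triples are hypothetical and are not part of the dataset.

```python
from collections import defaultdict


def build_graph(triples):
    """Map each subject entity to its outgoing (relation, object) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


def story_arcs(graph, start):
    """Yield every maximal simple path from `start`.

    Each path is a list of (subject, relation, object) steps and is
    treated here as one candidate story arc. Entities are never
    revisited within a path, so cycles in the plot cannot loop forever.
    """
    def walk(node, visited, steps):
        extended = False
        for relation, target in graph.get(node, []):
            if target in visited:
                continue
            extended = True
            yield from walk(target, visited | {target},
                            steps + [(node, relation, target)])
        # A path is complete once it cannot be extended any further.
        if not extended and steps:
            yield steps

    yield from walk(start, {start}, [])


if __name__ == "__main__":
    # Hypothetical triples, for illustration only.
    triples = [
        ("hero", "loves", "heroine"),
        ("heroine", "is engaged to", "rival"),
        ("hero", "travels to", "village"),
    ]
    for arc in story_arcs(build_graph(triples), "hero"):
        print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in arc))
```

Enumerating simple paths is only one possible reading of "path traversal"; weighting edges by their order in the plot or by character centrality would be natural refinements.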
This repository contains three types of Bollywood Data:
scripts-data
trailers-data
wikipedia-data
Trailers Data
This dataset contains the gender and emotion data for all Bollywood Movie Trailers released from 2008 to 2017.
The dataset includes the following folder:
individual-trailer-data: Contains the gender and emotion data detected at each frame of each trailer video.
The repository also includes the following files:
trailers_list.csv: Contains the movie name and year of release for every trailer in the dataset.
complete-data.csv: Contains the gender and emotion information for each of the trailers in the data folder. It has the following columns:
frame_number - the frame number of the trailer in which emotion and gender detection occurred
man/woman - whether the detected person was a man or a woman
emotion - the emotion portrayed by the man/woman detected in the image
year - the year in which the movie was released
movie_name - the name of the movie
indiviual-trailer-data.zip: Compressed archive of all the individual trailers' data.
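The column description above suggests a simple first analysis: counting how often each emotion is detected for men and for women. The sketch below uses only the Python standard library. It assumes that the CSV header uses the column names exactly as listed above (in particular that the gender column is literally named "man/woman") and that its values are strings such as "man" and "woman"; check both against the actual file. The function name count_gender_emotions is hypothetical.

```python
import csv
import io
from collections import Counter

# Assumed header names, taken from the column list above.
# Adjust them if the actual CSV header differs.
GENDER_COLUMN = "man/woman"
EMOTION_COLUMN = "emotion"


def count_gender_emotions(csv_file):
    """Count detections per (gender, emotion) pair.

    `csv_file` is any open text file or file-like object whose first
    line is the header row.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        gender = row[GENDER_COLUMN].strip().lower()
        emotion = row[EMOTION_COLUMN].strip().lower()
        counts[(gender, emotion)] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up sample with the assumed layout, for illustration only.
    sample = io.StringIO(
        "frame_number,man/woman,emotion,year,movie_name\n"
        "1,man,happy,2010,example_movie\n"
        "2,woman,sad,2010,example_movie\n"
        "3,man,happy,2010,example_movie\n"
    )
    for (gender, emotion), n in sorted(count_gender_emotions(sample).items()):
        print(gender, emotion, n)
```

To run it on the real data, open the file and pass the handle, for example: with open("complete-data.csv", newline="") as f: counts = count_gender_emotions(f).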
Wikipedia Data
This dataset contains data collected from Wikipedia for Bollywood movies.
It also contains data files generated by processing the Wikipedia output. Details of each file are given below:
avg_wv_relation.csv - Contains word vector relation data used at the inter-sentence level
coref_plot.csv - Contains coreferenced plot using OpenIE
female_adjectives.csv - Contains adjectives used for females extracted using Stanford Dependency Parser
female_adjverb.csv - Contains adjectives and verbs generated using Stanford Dependency Parser
female_centrality.csv - Contains centrality for females across all movies in text
female_mentions_centrality.csv - Contains centrality and mentions of females in movies
female_verb.csv - Contains verbs used for females generated using Stanford Dependency Parser
male_adjectives.csv - Contains adjectives used for males extracted using Stanford Dependency Parser
male_adjverb.csv - Contains adjectives and verbs generated using Stanford Dependency Parser
male_centrality.csv - Contains centrality for males across all movies
male_mentions_centrality.csv - Contains centrality and mentions of males in movies in text
male_verb.csv - Contains verbs used for males generated using Stanford Dependency Parser
songsDB.csv - Contains soundtrack information
songsFrequency.csv - Contains soundtrack frequency
image_and_plot_mentions_fequency.csv - Contains poster mentions and text mentions for males and females in each movie
Submission Format:
Please submit an A2-sized poster explaining your work.
Indraprastha Institute of Information Technology, Delhi
Okhla Industrial Estate, Phase III
(Near Govind Puri Metro Station)
New Delhi, India - 110020
Phone No: 91-11-26907400-7404 (5 lines)
Fax: 91-11-26907405