Notebook for exploring the movies data set and the summaries-to-genres classifier.


In [1]:
import pandas as pd
import numpy as np
import plotly
from plotly import graph_objs as go

from inaworld import MovieGenres
from inaworld import vectors

plotly.offline.init_notebook_mode()

MOVIE_DATA_PATH = './inaworld/movie_data.csv'



In [2]:
def plot_genre_counts(genre_counts):
    """Given a dict mapping genre tokens to counts, plot the counts as a bar
    chart sorted in descending order.
    """
    g, c = list(zip(*sorted(
        genre_counts.items(), key=lambda t: t[1], reverse=True)))
    data = [go.Bar(x=g, y=c)]
    layout = go.Layout(
        width=1100, height=400,
        bargap=0.1,
        xaxis=dict(tickfont=dict(size=10)),
        margin=dict(b=150))
    return plotly.offline.iplot(go.Figure(data=data, layout=layout), show_link=False)

def genres_tokens(g):
    """Given a single string of genres, return alphabetized, lowercase tokens.
    """
    return sorted(s.lower() for s in vectors.genres_tokenizer(g))

Load the data set from file


In [3]:
movies = pd.read_csv(MOVIE_DATA_PATH)

In [4]:
movies.head()


Out[4]:
id title release_date box_office_revenue runtime genres summary
0 0 Ghosts of Mars 2001-08-24 14010832.0 98.0 ["Space western", "Horror", "Supernatural", "T... Set in the second half of the 22nd century, th...
1 1 White Of The Eye 1987 NaN 110.0 ["Erotic thriller", "Psychological thriller", ... A series of murders of rich young women throug...
2 2 A Woman in Flames 1983 NaN 106.0 ["Drama"] Eva, an upper class housewife, becomes frustra...
3 3 The Sorcerer's Apprentice 2002 NaN 86.0 ["Adventure", "Fantasy", "World cinema", "Fami... Every hundred years, the evil Morgana returns...
4 4 Little city 1997-04-04 NaN 93.0 ["Romance Film", "Ensemble Film", "Comedy-dram... Adam, a San Francisco-based artist who works a...

How many movies?


In [5]:
len(movies)


Out[5]:
42204

How many movies with empty genre lists?


In [6]:
len(movies[movies['genres'].str.len() <= 2])


Out[6]:
411
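
The length test above relies on an empty list being stored as exactly `[]` (two characters). A more explicit check, sketched here on the assumption that the `genres` column holds JSON-encoded lists (as the `head()` output suggests), parses each string first. The helper name `is_empty_genre_list` is hypothetical and not part of `inaworld`.

```python
import json


def is_empty_genre_list(raw):
    """Return True if `raw` is a JSON-encoded empty list (assumed format)."""
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON: treat as "not an empty list".
        return False
    return isinstance(parsed, list) and len(parsed) == 0


# Toy stand-ins for values in the `genres` column.
samples = ['[]', '[ ]', '["Drama"]', '["Space western", "Horror"]']
empty_flags = [is_empty_genre_list(s) for s in samples]
n_empty = sum(empty_flags)
```

With the notebook's DataFrame, the same helper could be applied as `movies['genres'].map(is_empty_genre_list).sum()`.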

How many movies with empty summaries?


In [7]:
len(movies[movies['summary'].str.len() == 0])


Out[7]:
0
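
A count of zero here only rules out zero-length strings: missing values (NaN) and whitespace-only summaries do not match `str.len() == 0`. A minimal sketch of a broader check follows. `is_blank` is a hypothetical helper, and whether the data set actually contains such rows is not verified here.

```python
import math


def is_blank(value):
    """Return True for None, NaN, or strings that are empty after stripping."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ''


# Toy stand-ins for values in the `summary` column.
samples = ['A series of murders...', '', '   ', float('nan'), None]
blank_flags = [is_blank(v) for v in samples]
```

On the real column this could be run as `movies['summary'].map(is_blank).sum()`.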

Load all movies with non-empty genres and get genre stats


In [8]:
mg = MovieGenres(min_genre_count=1).load()

What's hidden in the class?


In [9]:
print('Row-filtered, but unprocessed data: ', mg.data.keys())
print()
print('Various parameters: ', {k: v for k, v in vars(mg).items() if not isinstance(v, dict)})


Row-filtered, but unprocessed data:  dict_keys(['genres', 'genre_tokens', 'genre_vectors', 'summaries'])

Various parameters:  {'path': '/Users/epfahl/python_projects/inaworld/inaworld/movie_data.csv', 'min_genre_count': 1, 'test_size': 0.25, 'binary_classifier': <class 'sklearn.svm.classes.LinearSVC'>, 'stratify_split': True}

Explore genre stats


In [10]:
counts = mg.genre_counts()
plot_genre_counts(counts)


(The plot is interactive: hover, zoom, and pan to explore!)

Rare genres


In [11]:
rare_count_max = 3
counts_rare = {k: v for k, v in counts.items() if v <= rare_count_max}
plot_genre_counts(counts_rare)


(Whoa! There are some heavy topics in here, like 'breakdance' and 'comdedy'...)
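
For readers without the `inaworld` package at hand, the counting and filtering steps can be sketched with the standard library. This is an illustration only: the token lists are made up, and `Counter` is a stand-in for whatever `mg.genre_counts()` does internally.

```python
from collections import Counter

# Made-up token lists, one per movie (not taken from the data set).
toy_genre_tokens = [
    ['drama', 'thriller'],
    ['drama'],
    ['comedy', 'drama'],
    ['breakdance'],
]

# Count how many movies carry each genre token.
toy_counts = Counter(
    token for tokens in toy_genre_tokens for token in tokens)

# Keep only genres at or below the rarity threshold, as in the cell above.
toy_rare_count_max = 1
toy_counts_rare = {
    k: v for k, v in toy_counts.items() if v <= toy_rare_count_max}
```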

Train the classifier


In [12]:
%time mg.train()


/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 48 is present in all training examples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 118 is present in all training examples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 213 is present in all training examples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 241 is present in all training examples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 243 is present in all training examples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 245 is present in all training examples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 253 is present in all training examples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 254 is present in all training examples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 281 is present in all training examples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 324 is present in all training examples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 350 is present in all training examples.

CPU times: user 32.9 s, sys: 1.71 s, total: 34.6 s
Wall time: 53.3 s
Out[12]:
<inaworld.inaworld.MovieGenres at 0x1111ccbe0>
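
The repeated warnings most likely mean that the numbered labels have no positive examples in the training split: with `min_genre_count=1`, a genre that occurs only once can land entirely in the test set. This reading is an assumption based on how scikit-learn's one-vs-rest wrapper words that warning, and it has not been confirmed against the `inaworld` source. A standard-library sketch of how such labels could be identified, using made-up data and a hypothetical helper name:

```python
def labels_missing_from_train(train_label_sets, n_labels):
    """Return label indices that never occur in the training examples."""
    seen = set()
    for labels in train_label_sets:
        seen.update(labels)
    return [i for i in range(n_labels) if i not in seen]


# Made-up label indices per example; label 2 occurs only in the test split.
toy_train = [{0, 1}, {0}, {1, 3}]
toy_test = [{2}, {0, 3}]
missing = labels_missing_from_train(toy_train, n_labels=4)
```

Raising `min_genre_count` would probably remove most such labels, at the cost of dropping the rarest genres.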

Let's try some random spot checks.


In [13]:
idx = np.random.randint(0, high=mg.data['summaries'].size, size=20)
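
`np.random.randint` draws with replacement and is unseeded here, so the same movie can appear twice and the sample changes on every run. If distinct, repeatable spot checks are wanted, one option is sketched below. It uses the standard library's `random` module rather than NumPy, and the seed and sizes are arbitrary placeholders.

```python
import random

n_summaries = 1000   # stand-in for mg.data['summaries'].size
sample_size = 20

# A seeded generator makes the draw repeatable;
# sample() never returns the same index twice.
rng = random.Random(0)
idx_unique = rng.sample(range(n_summaries), sample_size)
```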

In [14]:
for i in idx:
    s = mg.data['summaries'][i]
    p = mg.predict(s)
    t = genres_tokens(mg.data['genres'][i])
    print('Summary: ', s)
    print('Predicted Genres: ', p)
    print('Truth Genres:    ', t)
    print()
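
The loop above prints predicted and true genres side by side, which works for a handful of examples. To summarize agreement numerically, one simple option is the Jaccard index of the two token sets (shared genres divided by all genres mentioned). This is a sketch with a hypothetical helper, not a metric used by `inaworld`.

```python
def jaccard(predicted, truth):
    """Jaccard index of two genre lists; defined as 1.0 when both are empty."""
    p, t = set(predicted), set(truth)
    if not p and not t:
        return 1.0
    return len(p & t) / len(p | t)


# Genre lists in the same form as the spot-check printout.
exact = jaccard(['supernatural', 'thriller'], ['supernatural', 'thriller'])
partial = jaccard(['horror', 'thriller'], ['horror'])
disjoint = jaccard(['drama'], ['comedy'])
```

Inside the loop, `jaccard(p, t)` could be printed alongside each pair or collected and averaged over the sample.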


Summary:  College students exploring an abandoned insane asylum accidentally shatter canisters holding the cremains of former mental patients; inhaling the dusty ash filling the air, they’re soon possessed by the souls once held within them. One, is a convicted serial killer from 1950. A.D. Calvo was inspired to write the story in 2005 upon reading a New York Times article about the cremation of Oregon's long-forgotten mentally ill.{{cite news}}
Predicted Genres:  ['supernatural', 'thriller']
Truth Genres:     ['supernatural', 'thriller']

Summary:  The movie begins with Jerry and his little nephew Tuffy  watching the Christmas ballet. Later, Jerry goes to the empty stage floor, where magic begins to happen. Toys come alive including Nelly the horse  and Paulie the Christmas Ornament . The magic then makes a Music Box Ballerina come to life, and Jerry dances with her. The stage is transformed into a wintry wonderland, where the toys are enjoying a dinner. Tom, who is in the alley looking for something to eat, hears this, and, with the other cats, raid the feast, trapping the toys. Jerry, Paulie, and Nelly try to stop this, but are shot out of a cannon. Tom traps the Ballerina in a cage, then brings her to the Cat King, who asks her to dance for him but is shot down. The Cat King tells the Ballerina there is nothing she can do about it, but she reminds him about Jerry, and that he will never give up. Later on, Tom is called to gather other cats and stop Jerry. Tuffy gives the Ballerina a string attached to a ring of keys. He then goes to warn Jerry and stop the cats. Meanwhile, Jerry, Paulie, and Nellie decide to follow the star to a man called the Toy Maker. They stop in front of a frozen river. All make it safely, except Jerry who falls in, and becomes tangled in a weed. He is freed, and is pulled up by Nelly and Paulie. This makes Paulie unravel. Tuffy gets to Tom, and dresses up as an angel and a devil. He is found out, and ends up sticking a trident in Tom's eye, causing the tower of cats to fall off of a cliff . He continues on to Jerry, warning him of the cats. Tom and his friends, disguised as Christmas trees, surround Jerry, but Tom gets attacked by squirrels, and shredded in a tree shredder. The cats attack, but the heroes escape, inside a tree. The cats beat up Tom by mistake. The heroes then come to a hill, where Paulie's head is sent flying into another hole. They go into the hole, only to find a fiery world with lava pits and dragons . A flame fairy gives Paulie his head back. 
A dragon wakes up, but is hypnotized by Jerry into lifting them out of the pit. They launch a cannon, which blasts Jerry and his friends into a house with clocks. Tom gives chase, but is pecked on by wooden birds.They are chased by the cats again, and run into a fairground. Tom is virtually destroyed here, being crushed again and again - of course, this being a cartoon, he always revives. They make it to a ridge, and Jerry blows up balloons with which they make it off safely. Tom, though, is blasted by cannons. One cat shoots an arrow, bursting Nelly's balloon. Tuffy grabs on to her, and unravels more of Paulie. Nelly is let down, and chased by the cats. Jerry saves her, but his grip fails him and her string slips out of his hand. The cats pull her string, and she tells them where the others are headed. The remaining three make it to the Toy Maker, who fixes Paulie and gives them a key which allows them to awaken an army of toy soldiers. The three depart with their newly attained army in order to take back their kingdom. Later, when the cats attempt to escape the army of toy soldiers, the Ballerina appears with the other toys, and she leads them in an army in rebellion against the Cat King. Tom vacuums up many of the soldiers, but the vacuum explodes, and they are blown onto the cats. Jerry and Tuffy are eaten by Tom, but Nelly returns, and throws a hammer, smashing Tom's teeth. Jerry then pushes a toy train and all the cats ride on it as it crosses the stage and exits, hitting the wall of the building next door to the theater and falling into a Dumpster. Then the Ballerina hugs Jerry telling him that she never doubted him. Soon, the wall next to them begins to crumble and collapses on Nelly. However, the magic revives her and removes her string, allowing her to talk without a string. Jerry and the Ballerina dance after receiving their crowns back. 
The Toy Maker and the ballerina from the Christmas play are shown to be watching from the audience; the real ballerina throws Jerry a rose, and the curtain is let down, ending the show.
Predicted Genres:  ['animation', "children's/family", 'historical epic']
Truth Genres:     ['animation', "children's/family", 'historical epic']

Summary:  Three very different siblings: Hans-Jörg a librarian who is a sex addict; Werner a politician in a troubled marriage with a son who enjoys to discredit his father; Martin who is now Agnes after having a gender reassignment operation. Agnes works as a dancer and is suffering from unrequited love.
Predicted Genres:  ['drama', 'family drama', 'lgbt', 'world cinema']
Truth Genres:     ['drama', 'family drama', 'lgbt', 'romantic drama', 'world cinema']

Summary:  Following a horrible street accident Agata is in a coma – and Catalina begins to experience the pain and terror that her unconscious sister is going through…Eunice Martínez Arias. Orgulloso de su ‘miedo’. February 24, 2007. El Siglo de Torreón. Catalina must try to solve the mystery of her twin sister’s accident at ‘Km. 31’ and discovers a local legend that tells of malignant spirits that prowl the highway ‘Km 31’ and who are said to prey on travellers…Fausto Ponce. Kilómetro 31: Una “pesadilla” hecha realidad. Revista Proceso. Following a series of terrifying events, Catalina realizes that their link is growing stronger and that her sister Agata is screaming for help from her unconscious state.Km 31, vuelve el cine de terror mexicano. January 22, 2007. Quinta Dimensión. With the help of Nuño, Agata’s long time friend and Omar, Agata’s boyfriend, they soon discover that not only is Agata in a coma, but she is also trapped between life and death, between reality and a terrible netherworld of evil spirits and ancient legend."Kilómetro 31" continua con éxito. February 28, 2007, La Voz.
Predicted Genres:  ['horror', 'thriller']
Truth Genres:     ['horror']

Summary:  The film is about Lakkhi  a Bengali village woman whose husband fell from a roof while she was expecting her first baby, and has remained cripple ever since. She has lost her baby from the shock, and as a meagre compensation, is offered work as a wet-nurse in middle-class families where presumably women have other things to do than feed their children. Perhaps because of the social difference, she is in no position to resist the advances from the “gentlemen” who take advantage of her presence, and she soon finds herself caught in a system whereby if she wants to bring some cash home, and feed an ailing husband, she needs to continue to breast-feed. Naturally this can only be achieved if she delivers again, and so she does. But she has no children. One day, before important elections, alongside with a street-cleaning programme, she is asked to leave town because she’s a prostitute. She then tries to meet the “gentlemen” who formerly used to be pleased with her in more than one way, but all of them reject her, now that the word “prostitute” has been pronounced publicly. She will end up a pensioner in the brothel and the last picture is that of one young man coming to look at her, a young man whom she had lovingly suckled when he was a baby.{{cite web}}
Predicted Genres:  ['art film', 'drama']
Truth Genres:     ['art film', 'drama']

Summary:  Frannie and Calvin, a couple in their late twenties in Manhattan, had been dating for a few months when Frannie discovers she is pregnant. Calvin leaves on a tour with his band before Frannie finds the courage to tell him about the pregnancy. Frannie goes home to visit her parents just outside Toronto where her mother convinces her to tell Calvin, though the cell phone signal is weak and distorted and she believes he hung up on her. On her way back to New York a border guard refuses Frannie entrance to the USA as her visa has expired. When she says she has her apartment and job in New York she receives no sympathy. She then confesses to being with child hoping to gain some understanding from the female border guard. The border guard then bars her from entry to the USA for 12 months. Frannie is forced to do her work as a magazine editor from her parent's home. Calvin shows up a little while later at Frannie's parents' home. Calvin and Frannie soon realise they have no idea how to cook, keep house, or raise a child and their relationship deteriorates. Michael Tate, a famous writer, offers to help Frannie get her visa reinstated because of the difficulties of having Frannie work remotely as his editor. Frannie moves back to Manhattan and begins to develop a relationship with Michael. In the end Frannie realises that the glamour and romance Michael has to offer is not what she wants and she seeks out Calvin who has returned to New York and his experimental jazz band that incorporates 'found instruments'.
Predicted Genres:  ['comedy']
Truth Genres:     ['comedy']

Summary:  Augustus Vinero  is a wealthy international criminal known for his habit of sending explosive wristwatches or necklaces to those not in his favor. When he hears of Ramel , a small boy who may know the location of the fabled Valley of Gold in Mexico, he sends a death squad of plainclothes mercenaries which destroys the farmhouse  where Ramel is being sheltered. Prior to his murder, the head of the farmhouse summoned his old friend Tarzan to track the kidnappers and rescue the boy. Aware of Tarzan's arrival, Vinero uses one of his assassins to impersonate a taxi driver to meet Tarzan at the airport. Tarzan is driven to an ambush in an empty stadium. After the driver is killed, Tarzan kills the sniper by crushing him with a giant Coca-Cola bottle used in the stadium for advertising. When meeting the local authorities, Tarzan is offered troops, technology and weapons for his mission. Tarzan turns them down in favor of his own equipment-a chimpanzee scout, a lion named Major, his weapons of a hunting knife and longbow and his uniform of a loincloth. Meanwhile Vinero and his private army are being led to the lost city by Ramel. Vinero's uniformed private army is well equipped with American World War II small arms, an M3 Stuart light tank, an M3 Half-track and a Bell 47 helicopter. Along the way, Tarzan rescues Sophia Renault , Vinero's mistress who attempted to help Ramel, only to be rewarded with an exploding necklace that Tarzan removes. Tarzan and Major kill Vinero's plainclothes mercenaries, and Tarzan, using a captured M1919 Browning machine gun  and bag of Mk 2 grenades, brings down the helicopter attacking them. Tarzan truthfully informs Vinero of his exploits and losses to Vinero's forces on the deceased party's radio, and that Vinero is next in line for similar treatment unless he releases the boy. Ignoring Tarzan's warning, Vinero's army, led by Ramel, have discovered the entrance to the Valley of Gold through a cave. 
Losing time, they build a wider path in able to bring their vehicles to the valley. Upon arrival in the peaceful city, Vinero demands all the gold in the city and provides motivation by having his tank shell the buildings which kills several of the city's inhabitants. All the gold is brought to Vinero who has his troops load the half track up with the items. However, the Chief of the village says there is only one more piece of gold that the greedy Vinero demands. Tracking Vinero's army to the cave entrance to the lost city, Tarzan further demonstrates his expertise in weaponry by wiping out Vinero's rear guard ambush party by crushing them with stalactites hanging over them which he shoots down with a captured M1918 Browning Automatic Rifle. Tarzan then kills the tank driver who is watching the rest of the army load the gold onto the halftrack. Tarzan eliminates the remainder of the army by expertly using the cannon of the tank on the halftrack and the army. Meanwhile the Chief has led Vinero to an empty room holding only one golden ornament on a wall. As Vinero eagerly attempts to pull it off the wall the door shuts and is sealed and the ceiling releases enough gold dust to fill the room and smother Vinero. The finale involves Tarzan battling Vinero's hulking Oddjob-type henchman, Mr. Train , in unarmed combat to the death.
Predicted Genres:  ['action', 'adventure', 'jungle film']
Truth Genres:     ['adventure', 'jungle film']

Summary:  The film is based on manuscripts of the interrogations of Adolf Eichmann  before he was tried and hanged in a prison in Israel. Eichmann recounts events from his past to a young Israeli officer, Captain Avner Less , who is faced with the immense task of tricking the skilled manipulator into self-incrimination. While the world waits, Less' countrymen call for immediate execution, forcing him and Eichmann to confront each other in a battle of wills.
Predicted Genres:  ['drama', 'political drama']
Truth Genres:     ['biographical film', 'biography', 'drama', 'history', 'war film']

Summary:  Jimeoin, in the title role, plays a man obsessed with becoming famous. He is passionate about being a celebrity, but unfortunately he just isn't very talented. After trying to secure roles in myriad productions he finally finds employment as an extra, and what follows is his misadventures as he becomes involved with shady business men, producers and mobsters, all of whom are fixated with show business.
Predicted Genres:  ['comedy', 'romance film', 'romantic comedy']
Truth Genres:     ['comedy', 'romance film', 'romantic comedy']

Summary:  Crippled trapeze aerialist and former star Mike Ribble  sees great promise in young, brash Tino Orsini . Ribble—only the sixth man to have completed the dangerous triple somersault—thinks his protégé is capable, under his rigorous training, of matching his feat. However, Orsini is distracted by the third member of their circus act, the manipulative Lola . Tensions rise as a love triangle forms.
Predicted Genres:  ['drama']
Truth Genres:     ['drama']

Summary:  Ron Decker, a young man convicted for drug possession, is sent to prison where veteran con Earl Copen takes Decker under his wing and introduces him into his own gang. Copen first helps out Decker when three Puerto Ricans attempt to lure him into a cell block to rape him, however Copen sees through their plans and talks to the Pueto Ricans, who quickly abandon interest in Decker. Over the next few days, Copen helps Decker out by getting him better jobs, food, and even transferring him to his own cell block. Mainly however Copen helps Decker's case and points out that under a new article passed by the legislature, a judge can modify a sentence in the first 90 days if he sees fit, so Copen  helps write false reports and gives Decker advice to stay out of trouble, which will make Decker appear as a "very small threat to society". However, after large inmate Buck Rowan attempts to rape Decker in the bathroom, Decker stabs Rowan in a fight involving Copen, paralyzing Rowan. Rowan signs a statement claiming Decker and Copen are responsible and their cells are stripped and they are restricted to them. Because of the stabbing, Decker's attempt at a modified sentence is denied and his sentence remains five years. Meanwhile, Copen manages to get word out Rowan is "snitching", and an inmate working at the infirmary poisons Rowan's IV with cleaning fluid. The case against Copen and Decker is thrown out as the victim and main witness is dead. Shortly after their release, Copen tells Decker he plans to escape, and they plot to hide in a garbage truck and avoid being crushed by the compressor by using a bar to stop it. Decker escapes in one truck, Copen however stays behind, unable to jump into the truck after the appearance of one of the prison guards. Decker manages to flee to Costa Rica and Copen stays behind, after stating "This is my prison, after all" and quoting Satan from Paradise Lost by John Milton: "Better to reign in hell than serve in heaven."
Predicted Genres:  ['drama']
Truth Genres:     ['crime fiction', 'drama', 'prison']

Summary:  Professor Utonium hopes to create the perfect little girl using a mixture of sugar, spice, and everything nice to improve Townsville, a city plagued by villains. He is shoved by his laboratory assistant, a destructive chimpanzee named Jojo, causing him to accidentally break a container of a mysterious substance called Chemical X that spills into the mixture and explodes in Jojo's face. The Professor finds that the experiment was a success, having produced three little girls whom he names Blossom , Bubbles , and Buttercup . These girls also have superpowers as a result of the additional Chemical X, though they all immediately grow to love each other as a family. During their first day of school, the girls learn about the game tag and begin to play amongst themselves, which quickly grows destructive when they begin using their powers. They take their game downtown, accidentally causing massive damage to the city until the Professor calms them down and cautions them against using their powers outside. As a result of the destruction, the citizens of Townsville treat the girls as outcasts while the Professor is arrested for creating the girls. The despondent girls try to make their way home on foot, but become lost in an alleyway and are attacked by the Gangreen Gang. They are rescued by Jojo, whose brain has mutated and given him enhanced intelligence as a result of the Chemical X explosion. Planning control of the city, Jojo gains the girls' empathy by saying he is also hated for his powers, and manipulates them into helping him build a laboratory and machine over a volcano in the middle of town that he claims will gain them the affections of the city. He also has them steal a batch of Chemical X from the Professor's lab. As a reward, Jojo takes the girls to the local zoo and secretly implants small transportation devices on all the primates there. 
That night, Jojo transports all the primates from the zoo into his volcano lair and uses his new machine to inject them with Chemical X, turning them into evil mutant primates like himself. The next morning, after the Professor is released from prison, the girls show him all the "good" they have done, only to discover the city being attacked by the monkeys. Jojo, renaming himself Mojo Jojo, publicly denounces the girls as his assistants, turning everyone, including the distraught Professor, against them. The girls blast off into space, dejected. Mojo Jojo announces his intentions to rule the planet, but becomes frustrated when his minions, now as intelligent and evil as he is, begin concocting their own plans to terrorize the people of Townsville. Overhearing this turmoil from space, the girls return to Earth and use their powers to defeat the primates and rescue the citizens. In response, Mojo injects himself with Chemical X and grows into a giant monster, but the girls defeat him after an intense battle by pushing him off a skyscraper. Hoping to help the girls, the Professor develops an antidote for Chemical X which Mojo Jojo lands on, shrinking him down to his original size. The girls consider using the Antidote X to erase their powers, thinking they would be accepted as normal little girls, but the citizens of Townsville protest, apologizing for misjudging the girls and thanking them for their heroic deeds. At the insistence of the Mayor, the girls agree to use their powers to defend Townsville and become the city's beloved crime-fighting team of superheroes: the Powerpuff Girls.
Predicted Genres:  ['adventure', 'animation', "children's/family", 'comedy', 'family film', 'family-oriented adventure', 'fantasy', 'superhero movie']
Truth Genres:     ['adventure', 'animation', "children's/family", 'comedy', 'family film', 'family-oriented adventure', 'fantasy', 'superhero movie']

Summary:  The film begins with Renzo Arbore and Luciano De Crescenzo driving in Rome, while discussing an original idea for a new movie. They pass under the window of real-life filmmaker Federico Fellini, who is writing a screenplay entitled F.F.S.S . A wind causes the screenplay to fall to the road below, and the two pick it up and decide to use Fellini's ideas themselves. Renzo Arbore plays Onliù Caporetto, a manager from Irpinia trying to bring success to Lucia Canaria . While travelling across Italy, they become involved in TV commercial in Milan, then go to Rome looking for a recommendation to work in RAI. Eventually they encounter Sceicco Beige , a music celebrity. They participate in the Festival di Sanremo 1983, where Raffaella Carrà sings Soli sulla luna and Ahi.
Predicted Genres:  ['comedy film']
Truth Genres:     ['comedy film']

Summary:  Tina Balser, an educated, frustrated housewife and mother, is in a loveless marriage with Jonathan, an insufferable, controlling, emotionally abusive, social-climbing lawyer in New York City. He treats her like a servant, undermines her with insults, and belittles her appearance, abilities and the raising of their two girls, who treat their mother with the same rudeness as their father. Searching for relief, she begins a sexually fulfilling affair with a cruel and coarse writer, George Prager, who treats her with similar brusqueness and contempt, which only drives her deeper into despair. She then tries group therapy, but this also proves fruitless when she finds her male psychiatrist, Dr. Linstrom, as well as the other participants, equally shallow and abusive.
Predicted Genres:  ['drama']
Truth Genres:     ['comedy-drama', 'drama', 'feminist film', 'marriage drama']

Summary:  Paglu , Abhi  and Amrita 'Amu'  are best friends since childhood. Abhi's father  is a very rich businessman, and his only son has no interest in work. Abhi has had many girlfriends, but none of them loved him. Paglu and Abhi's father tells Amu to marry Abhi because they know each other very well. One day Abhi comes and tells Amu that he loves someone that is his childhood friend; Amu thinks that she is the girl whom Abhi loves and she, too, falls in love with him. Abhi later reveals that Kartika  is whom he loves.. Karthika is actually after his money. Amu is heartbroken. Amu and Paglu start hating her and try to get them separated. Abhi gets angry with them and decides to marry her in another country. At the airport Amu hits Kartika and she falls unconscious. Amu and Paglu kidnap Kartika and force her to write a letter to Abhi saying that she doesn't love him. Paglu tells Abhi that Amu is the right girl for her, and Abhi falls in love with her. Paglu, with Karthika still kidnapped, finds out that her name is Anjali, not Karthika. She is not in love with Abhi but with his money, and she has a history of cheating people. Paglu frees her and she tells Abhi about the kidnapping. Abhi is unhappy hearing this and decides to marry Karthika anyway. On the wedding day, Paglu tries to stop them from getting them married by bringing all the people she cheated; Abhi gets angry and tells Paglu and Amu to get out. The next day, Abhi comes back, apologizes to them and says that he isn't married. He tells them that yesterday when they left Anjali told him that she has learned a lot from the time Paglu and Amu kept her in captivity. She apologizes that she can't marry him and that Amu is the right girl for him. Abhi proposes to Amu. Later, Paglu ends up with the changed Anjali.
Predicted Genres:  ['bollywood', 'drama', 'musical', 'romance film', 'romantic drama', 'world cinema']
Truth Genres:     ['bollywood', 'drama', 'musical', 'romance film', 'romantic drama', 'world cinema']

Summary:  The film is set in contemporary Tehran, and portrays the city life in three distinctive episodes of sentiment, sensitivity and wit. In the first story, we see a young woman who has been beaten by her husband. The woman is about to complain legally, but the husband is concerned about his job and the embarrassment. The next story is about a clergyman whose wallet and documents have been stolen. The clergyman tries to get the documents back from the thief. The last story is the story of an elderly couple whose TV has broken. The couple is alone in the building and is afraid of opening the door to the young repairman.{{cite web}}
Predicted Genres:  ['drama']
Truth Genres:     ['comedy', 'drama']

Summary:  {{quote}} Tom Sharky is a narcotics cop in Atlanta who is working on a transaction with a drug dealer. Another member of the force, Smiley, shows up unexpectedly during the sting, causing the drug dealer to run and Sharky to give chase, ultimately shooting the suspect on a MARTA bus only after the wounding of the bus driver. In the aftermath, Sharky is demoted to vice-squad, which is considered the least desirable assignment in the police department. In the depths of the vice-squad division, led by Friscoe, the arrest of small-time hooker Mabel results in the accidental discovery of a high-class prostitution ring that includes a beautiful escort named Dominoe who charges $1,000 a night. Sharky and his new partners begin a surveillance of her apartment and discover that Dominoe is having a relationship with Hotchkins, a candidate running for governor. With a team of downtrodden fellow investigators that includes Papa, Arch, and Nosh, referred to by Friscoe sarcastically as Sharky's "machine," he sets out to find where the trail leads. During one of the stakeouts, a mysterious crime kingpin known as Victor comes to Dominoe's apartment. He has been controlling her life since Dominoe was a young girl, but now she wants out. Victor agrees but forces her to have sex with him one last time. The next day, Sharky witnesses  Dominoe blown away by a shotgun blast through her front door, killing her and disfiguring her face beyond recognition. Sharky has privately been developing feelings for her while viewing through binoculars and listening to her bugged conversations. The man who shot her, known as Billy Score, is a drug addict and Victor's brother. He answers to Victor, as does Hotchkins, who is in love with Dominoe but remains a powerless political stooge under Victor's rule. Dominoe suddenly turns up to Sharky's surprise, and is told that her friend Tiffany used her apartment and is the one who was mistakenly shot by Billy Score. 
Dominoe is convinced that if Victor wants her dead, she is going to be dead, but reluctantly leaves with Sharky to be hidden away at his childhood home in the West End neighborhood. Meanwhile, Nosh informs Sharky that most of the surveillance tapes have disappeared from the police station, leaving both of them wondering if the investigation has been compromised. Nosh is then confronted by Billy Score, who kills him offscreen. Sharky confronts Victor at his penthouse apartment in the Westin Peachtree Plaza and vows to bring him to justice. Victor smugly knows that Dominoe is dead and cannot testify against him, but is stunned to be told by Sharky that she is still alive. While attempting to find Nosh at his home, two men spring an attack on Sharky and he is knocked cold. He awakens on a boat, where he is held captive and tortured by Smiley, who turns out to be working for Victor. Smiley informs him of the killing of Sharky's old narcotics division boss JoJo , and reveals that Nosh is dead as well. He cuts off two of Sharky's fingers while demanding to know where Dominoe can be found. Sharky attacks and shoots Smiley, and he manages to escape. Later, Sharky turns up with Dominoe at a Hotchkins political rally, to the candidate's considerable shock. Hotchkins is placed under arrest, and Victor finds out about it on the evening newscasts. Billy Score, in an agitated state, shoots and kills Victor. Almost immediately, Sharky and other police officers arrive at Victor's penthouse in an attempt to catch Billy. He is pursued through the upper floors of the Westin, where like a ghostly apparition he appears and disappears, killing Papa and seriously wounding Arch. Billy ultimately is gunned down by Sharky, crashing through a window and plummeting to his death nearly 700 feet below. In the end, Sharky returns to his childhood home, where Dominoe is now living with him.
Predicted Genres:  ['action', 'crime fiction', 'thriller']
Truth Genres:     ['action', 'action thrillers', 'action/adventure', 'crime thriller', 'drama', 'gangster film', 'suspense', 'thriller']

Summary:  Los Jinetes del Alba is a film adaptation of a novel by Jesús Fernández Santos, made as a five parts TV miniseries. It follows the life of Marian, a young woman who greatest ambitions is to be the owner of the resort where she works. The action is set in Las Caldas, a small town in Asturias, where the lives of its inhabitants are forever changed with the arrival of the Asturian revolution of 1934 and the Spanish Civil War.
Predicted Genres:  ['drama']
Truth Genres:     ['drama']

Summary:  In this film, filmmaker David Fisher recruits his three brothers and one sister to set out on a journey to find their lost sister. After the death of their parents, Fisher feels that his family has been grown apart and that his siblings have gone their own separate way, focusing on their relationships with their spouses, children and problems at work. Fisher feels that a search for their sister, who was taken from their parents at birth, would provide a good opportunity to bring them closer. Fisher and his siblings, whose parents were Holocaust survivors, set out on a journey to look into their past, which is also the history of the State of Israel and the time when it was established. The siblings become amateur detectives, hoping to find a shred of evidence to lead them to their sister. During this journey the camera reveals some intimate moments within a family, struggling to survive.
Predicted Genres:  ['documentary', 'drama']
Truth Genres:     ['documentary']

Summary:  In the year 2030, a civil war breaks out in the United States. In a final attempt to restore order, the president declares martial law. In 2033, a massive prison camp known as "the Red Zone" is built in a desolate city that soon holds over one million insane, violent felons. The United States is declared safe. A dangerous criminal known as the Reaper  has been extracting sarin, which he plans to spill into the nation's water supply. One of the prisoners, FX  secretly films the Reaper with a Wi-Fi digital camera as he discusses these plans, and the state's governor, Reagan Black  finds out about them. Black develops a plan to hold a "death race" within the prison system, assembling four teams of racers: * The Severed Head Gang, consisting of Danny Satanico  and Fred "The Hammer" , two members of the largest gang in the United States, known for decapitating their enemies. The team is given a customized 1995 town car. * Homeland Security, consisting of Colonel Bob  and Captain Rudy Jackson , formerly honored, but now disgraced members of the United States Army. The team is given a vintage 1943 Willys MB. * Vaginamyte, consisting of Double-Dee Destruction  and Queen B , two serial killers who seduced and murdered over 72 male and female victims. The team is given a yellow Lotus with a black widow spider design. * Insane Clown Posse , whose violent form of hip hop was attributed as indirectly influencing multiple murders, acts of terrorism and a school massacre which resulted in the rappers being convicted for these murders and being dubbed as "the Charles Manson of their time". Although the group's music has been banned, it continues to retain a strong fanbase. Violent J and Shaggy 2 Dope are given an ice cream truck customized with a meat grinder, machine guns and "all the bling-bling these two Detroit locals could find". The race is televised live, hosted by anchors Harvey Winkler  and Jennifer Ramirez . 
Black offers the teams gathering points for killing loose prisoners, promising freedom to the team that brings back the Reaper—dead or alive. When Danny Satanico suggests that the four teams escape, Black reveals that each team member has a chip implanted in their bodies which would kill any member that breaks the rules, using Satanico to demonstrate. When Insane Clown Posse's truck gets a flat tire, a fight ensues between the teams and loose criminals. In the distance, Violent J witnesses an explosion. The teams investigate, finding the burning Homeland Security jeep with two corpses inside. Violent J and Shaggy 2 Dope find FX filming the race. He tells them that there will be an ambush at their first destination, and they let him ride in their van. Each of the teams work together to surprise and kill the ambushers. Metal Machine Man , under the order of the Reaper, kills FX and attacks the racers before being hit by missiles fired by a pair of mysterious men. The teams fix their cars before dispatching. Violent J and Shaggy 2 Dope arrive at the Reaper's lair, and successfully infiltrate the fortress, preventing the Reaper and his henchmen from releasing the sarin into the water. The mysterious men arrive, firing a rocket into the room, and reveal themselves to be Colonel Bob and Captain Rudy, who were hired by Governor Black as inside men, and faked their deaths to convince the other teams that they had a chance of winning. Believing the Reaper died in the explosion, Bob and Rudy retrieve his severed hand and leave in Insane Clown Posse's truck. Violent J and Shaggy 2 Dope emerge from the rubble. Because Violent J is injured, Shaggy 2 Dope goes after Bob and Rudy alone. The Reaper appears and attempts to release the sarin as Violent J attempts to stop him. The Homeland Security team members arrive at the finish line, presenting the Reapers hand to Governor Black. Shaggy 2 Dope rises from the back of the truck, shooting at Bob, Rudy and the governor. 
Black presses the button to activate the explosives in the bodies of the Insane Clown Posse team members. The sarin explodes, causing a chain reaction which destroys the country.
Predicted Genres:  ['action', 'action/adventure', 'auto racing', 'comedy', 'horror', 'sports', 'thriller']
Truth Genres:     ['action', 'action/adventure', 'auto racing', 'comedy', 'horror', 'indie', 'sports', 'thriller']

Most of these have pretty strong intersections between the predicted and truth sets.
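
One way to put a number on that overlap is the Jaccard index (size of the intersection divided by size of the union) together with the fraction of predicted genres that appear in the truth set. The sketch below hard-codes the four prediction/truth pairs printed just above. The helpers `jaccard` and `predicted_hit_rate` are introduced here for illustration only and are not part of `inaworld`.

```python
def jaccard(predicted, truth):
    """Jaccard index of two label collections: |P & T| / |P | T|."""
    p, t = set(predicted), set(truth)
    if not p and not t:
        return 1.0  # two empty sets are treated as a perfect match
    return len(p & t) / len(p | t)


def predicted_hit_rate(predicted, truth):
    """Fraction of predicted labels that also appear in the truth set."""
    p, t = set(predicted), set(truth)
    return len(p & t) / len(p) if p else 0.0


# (predicted, truth) pairs copied from the four examples printed above.
examples = [
    (['action', 'crime fiction', 'thriller'],
     ['action', 'action thrillers', 'action/adventure', 'crime thriller',
      'drama', 'gangster film', 'suspense', 'thriller']),
    (['drama'], ['drama']),
    (['documentary', 'drama'], ['documentary']),
    (['action', 'action/adventure', 'auto racing', 'comedy', 'horror',
      'sports', 'thriller'],
     ['action', 'action/adventure', 'auto racing', 'comedy', 'horror',
      'indie', 'sports', 'thriller']),
]

jaccard_scores = [jaccard(p, t) for p, t in examples]
hit_rates = [predicted_hit_rate(p, t) for p, t in examples]

for j, h in zip(jaccard_scores, hit_rates):
    print('jaccard=%.3f  predicted-in-truth=%.3f' % (j, h))
```

The two numbers can diverge: when the model predicts only a few genres for a movie with a long truth list, most of its predictions may be correct while the Jaccard index stays low.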

Classification performance


In [15]:
print(mg.report())


                                          precision    recall  f1-score   support

                               absurdism       1.00      0.48      0.65        25
                            acid western       1.00      1.00      1.00         4
                                  action       0.85      0.69      0.77      1485
                           action comedy       1.00      0.55      0.71        42
                        action thrillers       0.94      0.52      0.67        92
                        action/adventure       0.82      0.62      0.70       876
                         addiction drama       1.00      0.33      0.50        12
                                   adult       0.94      0.47      0.62        32
                               adventure       0.86      0.61      0.72       815
                        adventure comedy       1.00      0.47      0.64        32
                  airplanes and airports       1.00      0.54      0.70        13
                             albino bias       0.00      0.00      0.00         4
                              alien film       1.00      0.82      0.90        17
                          alien invasion       0.00      0.00      0.00         0
                               americana       0.85      0.50      0.63        22
                          animal picture       1.00      0.41      0.59        41
                                 animals       1.00      0.25      0.40         4
                        animated cartoon       0.94      0.68      0.79        25
                        animated musical       1.00      0.36      0.53        14
                               animation       0.90      0.71      0.79       599
                                   anime       1.00      0.68      0.81        84
                               anthology       1.00      1.00      1.00         2
                            anthropology       0.00      0.00      0.00         0
                                anti-war       1.00      0.50      0.67         8
                           anti-war film       1.00      0.44      0.62         9
apocalyptic and post-apocalyptic fiction       1.00      0.67      0.80         9
                             archaeology       0.00      0.00      0.00         1
                    archives and records       0.00      0.00      0.00         1
                                art film       1.00      0.46      0.63        90
                             auto racing       1.00      0.80      0.89        15
                             avant-garde       1.00      0.53      0.69        19
                                 b-movie       1.00      0.69      0.82       103
                               b-western       1.00      0.09      0.17        11
                       backstage musical       1.00      0.83      0.91         6
                                baseball       0.00      0.00      0.00         3
                              beach film       1.00      1.00      1.00         3
                        beach party film       1.00      1.00      1.00         2
                          bengali cinema       1.00      1.00      1.00         3
                              biker film       1.00      0.50      0.67         6
                       biographical film       0.93      0.47      0.62       168
                               biography       1.00      0.46      0.63       142
                          biopic feature       1.00      0.48      0.64        82
                            black comedy       0.98      0.47      0.64       189
                         black-and-white       0.89      0.60      0.72       931
                          blaxploitation       1.00      0.58      0.74        24
                bloopers & candid camera       1.00      1.00      1.00         1
                               bollywood       0.81      0.61      0.69       243
                                  boxing       1.00      0.55      0.71        11
                              breakdance       0.00      0.00      0.00         0
                     british empire film       1.00      0.25      0.40         8
                        british new wave       1.00      0.60      0.75         5
                         bruceploitation       1.00      1.00      1.00         1
                               buddy cop       1.00      1.00      1.00         3
                              buddy film       1.00      0.40      0.57        83
                           buddy picture       0.00      0.00      0.00         0
                                business       0.00      0.00      0.00         1
                                 c-movie       0.00      0.00      0.00         0
                                    camp       1.00      1.00      1.00         1
                             caper story       1.00      0.38      0.55        16
                            cavalry film       1.00      0.50      0.67         4
                             chase movie       1.00      0.50      0.67        22
                             chick flick       0.00      0.00      0.00         0
                         childhood drama       1.00      0.41      0.59        29
                              children's       0.95      0.52      0.67       104
                children's entertainment       1.00      0.33      0.50         3
                      children's fantasy       0.94      0.46      0.62        70
                       children's issues       0.00      0.00      0.00         0
                       children's/family       0.88      0.51      0.65       196
                          chinese movies       0.92      0.74      0.82       245
                          christian film       0.89      0.59      0.71        27
                         christmas movie       1.00      0.47      0.64        30
                          clay animation       1.00      1.00      1.00         2
                                cold war       1.00      1.00      1.00         2
                            combat films       1.00      0.38      0.55        24
                                 comdedy       0.00      0.00      0.00         0
                                  comedy       0.81      0.72      0.76      2605
                             comedy film       0.94      0.53      0.68       437
                           comedy horror       1.00      0.50      0.67         6
                        comedy of errors       1.00      0.53      0.69        49
                       comedy of manners       1.00      0.52      0.69        65
                         comedy thriller       1.00      0.43      0.60        14
                          comedy western       1.00      0.17      0.29         6
                            comedy-drama       0.97      0.44      0.61       337
                           coming of age       0.94      0.49      0.64       189
                      coming-of-age film       0.00      0.00      0.00         0
                      computer animation       1.00      0.59      0.74        46
                               computers       0.00      0.00      0.00         1
                            concert film       1.00      0.50      0.67         4
                      conspiracy fiction       0.00      0.00      0.00         0
                       costume adventure       1.00      0.36      0.53        22
                           costume drama       0.96      0.53      0.68        87
                          costume horror       0.88      0.54      0.67        13
                        courtroom comedy       1.00      0.67      0.80         3
                         courtroom drama       1.00      0.56      0.72        39
                           creature film       0.91      0.62      0.74       100
                                   crime       0.00      0.00      0.00         2
                            crime comedy       1.00      0.39      0.56        41
                             crime drama       0.98      0.41      0.57       106
                           crime fiction       0.87      0.62      0.73      1046
                          crime thriller       0.93      0.50      0.65       441
                                    cult       0.99      0.55      0.71       182
                       culture & society       1.00      0.41      0.58        44
                               cyberpunk       1.00      0.50      0.67         2
                   czechoslovak new wave       1.00      1.00      1.00         1
                                   dance       1.00      0.64      0.78        28
                           demonic child       1.00      0.50      0.67         2
                               detective       1.00      0.53      0.70        60
                       detective fiction       1.00      0.50      0.67        46
                                disaster       1.00      0.55      0.71        67
                               docudrama       1.00      0.52      0.69        61
                             documentary       0.94      0.69      0.80       311
                                dogme 95       1.00      0.50      0.67         4
                         domestic comedy       1.00      0.59      0.74        44
                           doomsday film       1.00      0.62      0.77        29
                                   drama       0.83      0.84      0.83      4788
                                dystopia       1.00      0.56      0.71        27
                         ealing comedies       0.00      0.00      0.00         2
                      early black cinema       1.00      0.67      0.80         3
                               education       0.00      0.00      0.00         0
                             educational       1.00      0.50      0.67         4
                           ensemble film       1.00      0.42      0.59        91
                   environmental science       0.00      0.00      0.00         0
                                    epic       0.96      0.52      0.68        48
                            epic western       1.00      0.50      0.67         4
                            erotic drama       1.00      0.51      0.68        35
                         erotic thriller       1.00      0.49      0.66        43
                                 erotica       1.00      0.50      0.67        56
                             escape film       1.00      0.73      0.84        11
                              essay film       0.00      0.00      0.00         1
                          existentialism       1.00      0.58      0.74        12
                       experimental film       1.00      0.58      0.73        26
                            exploitation       1.00      0.50      0.67         4
                           expressionism       0.00      0.00      0.00         0
                          extreme sports       1.00      1.00      1.00         2
                              fairy tale       1.00      0.65      0.79        17
         family & personal relationships       1.00      1.00      1.00         3
                            family drama       0.99      0.54      0.69       200
                             family film       0.89      0.65      0.75       763
               family-oriented adventure       1.00      0.46      0.63        57
                                fan film       1.00      0.67      0.80         3
                                 fantasy       0.89      0.57      0.69       496
                       fantasy adventure       0.96      0.59      0.73        44
                          fantasy comedy       0.96      0.46      0.62        52
                           fantasy drama       1.00      1.00      1.00         2
                            feature film       0.00      0.00      0.00         0
                       female buddy film       1.00      1.00      1.00         1
                           feminist film       1.00      0.21      0.35        19
                          fictional film       0.00      0.00      0.00         1
                                filipino       0.00      0.00      0.00         0
                         filipino movies       1.00      0.52      0.68        56
                                    film       1.00      0.33      0.50         6
               film & television history       1.00      0.40      0.57         5
                        film \u00e0 clef       1.00      0.18      0.31        11
                         film adaptation       0.99      0.45      0.62       320
                               film noir       1.00      0.55      0.71       134
                              film-opera       0.00      0.00      0.00         0
                             filmed play       0.00      0.00      0.00         2
                     finance & investing       0.00      0.00      0.00         0
                          foreign legion       1.00      0.25      0.40         4
                             future noir       1.00      0.40      0.57        10
                           gangster film       1.00      0.47      0.63        86
                                     gay       1.00      0.50      0.67        60
                            gay interest       1.00      0.50      0.67        60
                         gay pornography       1.00      0.67      0.80         3
                              gay themed       1.00      0.51      0.68        70
                           gender issues       1.00      0.80      0.89         5
                                  giallo       1.00      1.00      1.00         2
                     glamorized spy film       0.88      0.78      0.82         9
                              goat gland       0.00      0.00      0.00         1
                             gothic film       1.00      0.50      0.67        16
                  graphic & applied arts       0.00      0.00      0.00         2
                               gross out       1.00      0.58      0.73        19
                          gross-out film       1.00      0.58      0.73        19
                                gulf war       1.00      0.67      0.80         3
                             hagiography       1.00      0.25      0.40         4
                    hardcore pornography       0.00      0.00      0.00         0
                      haunted house film       1.00      0.50      0.67        24
                        health & fitness       0.00      0.00      0.00         0
               heaven-can-wait fantasies       1.00      0.30      0.46        10
                         heavenly comedy       1.00      0.33      0.50         6
                                   heist       1.00      0.46      0.63        28
                          hip hop movies       1.00      0.30      0.46        10
                historical documentaries       0.00      0.00      0.00         0
                        historical drama       1.00      0.64      0.78        33
                         historical epic       1.00      0.55      0.71        11
                      historical fiction       1.00      0.51      0.68        70
                                 history       1.00      0.55      0.71        77
                            holiday film       1.00      0.45      0.62        31
                           homoeroticism       0.00      0.00      0.00         0
                                  horror       0.93      0.79      0.86      1039
                           horror comedy       1.00      0.60      0.75        58
                            horse racing       1.00      0.50      0.67         2
                                  humour       1.00      0.50      0.67         2
                          hybrid western       1.00      0.50      0.67        12
                illnesses & disabilities       1.00      0.60      0.75         5
                          indian western       1.00      0.33      0.50         3
                                   indie       0.87      0.48      0.62       949
                     inspirational drama       1.00      0.33      0.50         3
                      instrumental music       1.00      1.00      1.00         1
             interpersonal relationships       1.00      0.33      0.50         6
                inventions & innovations       0.00      0.00      0.00         1
                         japanese movies       0.95      0.70      0.81       320
                              journalism       0.00      0.00      0.00         0
                         jukebox musical       1.00      1.00      1.00         2
                             jungle film       0.93      0.68      0.79        19
               juvenile delinquency film       1.00      0.56      0.71        18
                              kafkaesque       1.00      0.50      0.67         2
                    kitchen sink realism       1.00      0.50      0.67         2
                   language & literature       1.00      1.00      1.00         1
                                  latino       0.00      0.00      0.00         0
                             law & crime       1.00      1.00      1.00         6
                             legal drama       1.00      1.00      1.00         3
                                    lgbt       0.96      0.53      0.68       206
                libraries and librarians       0.00      0.00      0.00         2
                             linguistics       0.00      0.00      0.00         0
                             live action       1.00      0.33      0.50         3
                        malayalam cinema       1.00      1.00      1.00         2
                          marriage drama       0.88      0.48      0.62        31
                       martial arts film       0.88      0.68      0.77       174
                   master criminal films       1.00      0.50      0.67         2
                            media satire       1.00      0.68      0.81        19
                           media studies       0.00      0.00      0.00         0
                         medical fiction       1.00      0.67      0.80        12
                               melodrama       1.00      0.45      0.62       132
                            mockumentary       1.00      0.57      0.72        23
                              mondo film       0.00      0.00      0.00         0
                                 monster       0.96      0.71      0.81        31
                           monster movie       0.97      0.68      0.80        47
                            movie serial       1.00      0.33      0.50         3
                 movies about gladiators       0.00      0.00      0.00         1
                              mumblecore       1.00      0.50      0.67         2
                                   music       0.98      0.37      0.54       115
                                 musical       0.86      0.51      0.64       563
                          musical comedy       1.00      0.43      0.60        44
                           musical drama       1.00      0.68      0.81        40
                                 mystery       0.91      0.59      0.72       511
                    mythological fantasy       1.00      0.40      0.57         5
                        natural disaster       1.00      0.80      0.89         5
                    natural horror films       1.00      0.74      0.85        54
                                  nature       1.00      0.17      0.29         6
                                neo-noir       1.00      0.22      0.36        18
                              neorealism       0.00      0.00      0.00         1
                           new hollywood       1.00      0.50      0.67        24
                        new queer cinema       0.00      0.00      0.00         0
                                    news       1.00      0.50      0.67         2
                             ninja movie       0.00      0.00      0.00         0
                                northern       0.00      0.00      0.00         0
                         nuclear warfare       1.00      0.50      0.67         2
                                operetta       0.00      0.00      0.00         1
                                  outlaw       1.00      0.33      0.50         3
                       outlaw biker film       1.00      1.00      1.00         2
              parkour in popular culture       1.00      0.25      0.40         4
                                  parody       1.00      0.50      0.66       209
                          patriotic film       0.00      0.00      0.00         2
                           period horror       0.00      0.00      0.00         1
                            period piece       0.95      0.50      0.66       318
                              pinku eiga       1.00      0.17      0.29         6
                                  plague       0.00      0.00      0.00         2
                      point of view shot       0.00      0.00      0.00         0
                        political cinema       0.97      0.60      0.74        47
                    political documetary       1.00      1.00      1.00         2
                         political drama       0.98      0.51      0.67       117
                        political satire       1.00      0.63      0.77        27
                      political thriller       1.00      0.59      0.75        37
                      pornographic movie       1.00      0.41      0.59        29
                             pornography       1.00      1.00      1.00         1
                                pre-code       1.00      0.41      0.58        27
                                  prison       1.00      0.29      0.44        28
                           prison escape       0.00      0.00      0.00         0
                             prison film       1.00      0.60      0.75         5
                private military company       1.00      1.00      1.00         1
                         propaganda film       1.00      0.45      0.62        29
                            psycho-biddy       1.00      1.00      1.00         3
                    psychological horror       0.00      0.00      0.00         0
                  psychological thriller       0.97      0.51      0.67       292
                               punk rock       1.00      0.33      0.50         3
                              race movie       0.00      0.00      0.00         2
                                  reboot       1.00      0.60      0.75         5
                          religious film       1.00      0.42      0.59        12
                                  remake       1.00      0.38      0.56        26
                                 revenge       1.00      1.00      1.00         1
                  revisionist fairy tale       0.00      0.00      0.00         0
                     revisionist western       1.00      0.65      0.79        17
                              road movie       1.00      0.52      0.68        73
                             road-horror       1.00      0.75      0.86         4
             roadshow theatrical release       1.00      0.50      0.67         6
                          roadshow/carny       1.00      1.00      1.00         1
                            rockumentary       1.00      0.27      0.43        11
                            romance film       0.80      0.64      0.71      1672
                         romantic comedy       0.89      0.55      0.68       524
                          romantic drama       0.86      0.51      0.64       603
                        romantic fantasy       1.00      0.33      0.50        12
                       romantic thriller       0.00      0.00      0.00         0
                          samurai cinema       1.00      0.64      0.78        11
                                  satire       1.00      0.46      0.63       151
                            school story       0.00      0.00      0.00         1
          sci fi pictures original films       0.00      0.00      0.00         0
                        sci-fi adventure       1.00      0.50      0.67         6
                           sci-fi horror       1.00      0.50      0.67        34
                         sci-fi thriller       0.00      0.00      0.00         0
                         science fiction       0.92      0.76      0.83       560
                 science fiction western       0.00      0.00      0.00         1
                        screwball comedy       1.00      0.36      0.53        58
                              sex comedy       1.00      0.57      0.73        70
                           sexploitation       1.00      0.44      0.61        25
                              short film       0.95      0.71      0.81       833
                             silent film       0.96      0.51      0.67       325
                    silhouette animation       1.00      1.00      1.00         1
                          singing cowboy       1.00      1.00      1.00         2
                               slapstick       0.95      0.44      0.60       131
                                 slasher       0.97      0.63      0.76       142
                     slice of life story       1.00      0.68      0.81        28
                           social issues       1.00      0.39      0.56        28
                     social problem film       1.00      0.39      0.56        31
                           softcore porn       1.00      0.30      0.46        10
                             space opera       0.00      0.00      0.00         2
                           space western       1.00      0.50      0.67         4
                       spaghetti western       1.00      0.25      0.40        16
                           splatter film       1.00      0.67      0.80         6
                          sponsored film       1.00      0.25      0.40         4
                                  sports       0.95      0.64      0.77       166
                                     spy       0.94      0.58      0.72        84
                         stand-up comedy       0.00      0.00      0.00         1
                            star vehicle       0.00      0.00      0.00         0
                          statutory rape       0.00      0.00      0.00         0
                               steampunk       1.00      0.33      0.50         3
                             stoner film       1.00      0.58      0.74        12
                             stop motion       1.00      0.60      0.75        25
                               superhero       0.92      0.48      0.63        23
                         superhero movie       0.94      0.54      0.69        61
                        supermarionation       0.00      0.00      0.00         0
                            supernatural       0.98      0.61      0.75       160
                              surrealism       1.00      0.33      0.50        15
                                suspense       0.98      0.53      0.69       172
                      swashbuckler films       1.00      0.56      0.72        25
                        sword and sandal       1.00      0.60      0.75         5
                       sword and sorcery       1.00      0.57      0.73         7
                 sword and sorcery films       1.00      0.67      0.80         9
                            tamil cinema       1.00      0.28      0.43        18
                                    teen       0.91      0.55      0.69       196
                        television movie       1.00      0.57      0.73       166
         the netherlands in world war ii       0.00      0.00      0.00         0
                          therimin music       0.00      0.00      0.00         1
                                thriller       0.82      0.70      0.75      1600
                             time travel       1.00      0.76      0.87        17
                               tokusatsu       0.00      0.00      0.00         2
                               tollywood       1.00      0.78      0.88         9
                                 tragedy       1.00      0.62      0.76        21
                             tragicomedy       1.00      0.47      0.64        15
                                  travel       1.00      0.25      0.40         4
                          vampire movies       0.00      0.00      0.00         2
                              war effort       1.00      1.00      1.00         2
                                war film       0.88      0.72      0.79       404
                        werewolf fiction       1.00      1.00      1.00         1
                                 western       0.95      0.77      0.85       242
                                whodunit       1.00      0.17      0.29        12
                   women in prison films       1.00      0.33      0.50         6
                        workplace comedy       1.00      0.43      0.60        14
                            world cinema       0.81      0.58      0.68      1196
                           world history       1.00      0.50      0.67         4
                                   wuxia       1.00      0.40      0.57        25
                                 z movie       0.00      0.00      0.00         0
                             zombie film       1.00      0.76      0.87        55

                             avg / total       0.89      0.63      0.73     37717

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/metrics/classification.py:1113: UndefinedMetricWarning:

Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/metrics/classification.py:1115: UndefinedMetricWarning:

Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples.

(This actually looks pretty good: average precision 0.89, recall 0.63, and F1 0.73 over 37717 label instances. Go figure...)
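
A note on reading the `avg / total` row: in scikit-learn releases of this vintage, that row is (to the best of my knowledge) a support-weighted mean of the per-label scores, so a handful of frequent genres dominate it and the many rare genres with a score of 0.00 barely register. The sketch below illustrates the difference using five F1 rows copied from the report above. The helper names `macro_average` and `weighted_average` are made up for illustration and are not part of `inaworld` or scikit-learn.

```python
# Rows copied from the stratified report above: (label, f1-score, support).
rows = [
    ("thriller", 0.75, 1600),
    ("world cinema", 0.68, 1196),
    ("short film", 0.81, 833),
    ("space opera", 0.00, 2),
    ("tokusatsu", 0.00, 2),
]


def macro_average(rows):
    """Unweighted mean: every label counts equally (hypothetical helper)."""
    return sum(score for _, score, _ in rows) / len(rows)


def weighted_average(rows):
    """Support-weighted mean: labels count in proportion to their test examples."""
    total_support = sum(support for _, _, support in rows)
    return sum(score * support for _, score, support in rows) / total_support


macro_f1 = macro_average(rows)
weighted_f1 = weighted_average(rows)
print(f"macro F1:    {macro_f1:.2f}")
print(f"weighted F1: {weighted_f1:.2f}")
```

On these five rows the macro average is about 0.45 while the weighted average is about 0.74, which is why the summary row can look healthy even when many rare genres score zero.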

What's the performance without stratification of the train/test split?

(For this, we need to reinstantiate the class and retrain.)


In [16]:
mg_new = MovieGenres(stratify_split=False).load().train()
print(mg_new.report())


/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning:

Label not 308 is present in all training examples.

                                          precision    recall  f1-score   support

                               absurdism       0.00      0.00      0.00        25
                            acid western       0.00      0.00      0.00         2
                                  action       0.58      0.38      0.46      1439
                           action comedy       0.00      0.00      0.00        25
                        action thrillers       0.75      0.03      0.05       106
                        action/adventure       0.52      0.25      0.34       844
                         addiction drama       0.00      0.00      0.00         8
                                   adult       0.00      0.00      0.00        26
                               adventure       0.60      0.25      0.35       839
                        adventure comedy       0.00      0.00      0.00        29
                  airplanes and airports       0.00      0.00      0.00        12
                             albino bias       0.00      0.00      0.00         1
                              alien film       0.00      0.00      0.00        25
                          alien invasion       0.00      0.00      0.00         1
                               americana       0.00      0.00      0.00        22
                          animal picture       1.00      0.03      0.07        29
                                 animals       0.00      0.00      0.00         7
                        animated cartoon       0.00      0.00      0.00        30
                        animated musical       0.00      0.00      0.00        11
                               animation       0.77      0.42      0.54       639
                                   anime       0.73      0.11      0.20        70
                               anthology       0.00      0.00      0.00         1
                            anthropology       0.00      0.00      0.00         0
                                anti-war       0.00      0.00      0.00        10
                           anti-war film       0.00      0.00      0.00        11
apocalyptic and post-apocalyptic fiction       0.00      0.00      0.00         5
                             archaeology       1.00      0.67      0.80         3
                    archives and records       0.00      0.00      0.00         2
                                art film       0.00      0.00      0.00        82
                             auto racing       0.00      0.00      0.00        11
                             avant-garde       0.00      0.00      0.00        16
                                 b-movie       0.50      0.01      0.02       105
                               b-western       0.00      0.00      0.00        10
                       backstage musical       0.00      0.00      0.00         3
                                baseball       0.00      0.00      0.00         3
                              beach film       0.50      0.33      0.40         3
                        beach party film       0.00      0.00      0.00         0
                          bengali cinema       0.00      0.00      0.00         4
                              biker film       0.00      0.00      0.00         5
                       biographical film       0.25      0.01      0.01       151
                               biography       0.67      0.01      0.03       154
                          biopic feature       0.00      0.00      0.00        82
                            black comedy       0.50      0.00      0.01       220
                         black-and-white       0.63      0.23      0.34       946
                          blaxploitation       0.00      0.00      0.00        22
                bloopers & candid camera       0.00      0.00      0.00         1
                               bollywood       0.40      0.16      0.23       274
                                  boxing       0.50      0.11      0.18         9
                     british empire film       0.00      0.00      0.00         7
                        british new wave       0.00      0.00      0.00         5
                         bruceploitation       0.00      0.00      0.00         0
                               buddy cop       0.00      0.00      0.00         1
                              buddy film       0.00      0.00      0.00        85
                                business       0.00      0.00      0.00         1
                                    camp       0.00      0.00      0.00         1
                             caper story       0.00      0.00      0.00        20
                            cavalry film       0.00      0.00      0.00         2
                             chase movie       0.00      0.00      0.00        28
                         childhood drama       0.00      0.00      0.00        27
                              children's       0.25      0.02      0.03       122
                children's entertainment       0.00      0.00      0.00         3
                      children's fantasy       0.50      0.02      0.03        64
                       children's/family       0.53      0.04      0.08       202
                          chinese movies       0.79      0.53      0.64       244
                          christian film       1.00      0.07      0.13        29
                         christmas movie       0.50      0.06      0.11        16
                          clay animation       0.00      0.00      0.00         0
                                cold war       0.00      0.00      0.00         3
                            combat films       0.00      0.00      0.00        28
                                  comedy       0.59      0.44      0.50      2553
                             comedy film       0.28      0.02      0.04       430
                           comedy horror       0.00      0.00      0.00         5
                        comedy of errors       0.00      0.00      0.00        37
                       comedy of manners       0.00      0.00      0.00        61
                         comedy thriller       0.00      0.00      0.00        16
                          comedy western       0.00      0.00      0.00        15
                            comedy-drama       0.17      0.01      0.01       313
                           coming of age       0.50      0.02      0.04       173
                      coming-of-age film       0.00      0.00      0.00         0
                      computer animation       1.00      0.03      0.05        40
                               computers       0.00      0.00      0.00         2
                            concert film       0.00      0.00      0.00         3
                      conspiracy fiction       0.00      0.00      0.00         0
                       costume adventure       0.00      0.00      0.00        17
                           costume drama       0.67      0.09      0.16        86
                          costume horror       0.50      0.08      0.13        13
                        courtroom comedy       0.00      0.00      0.00         0
                         courtroom drama       0.00      0.00      0.00        48
                           creature film       0.44      0.10      0.17        78
                                   crime       0.00      0.00      0.00         5
                            crime comedy       0.00      0.00      0.00        51
                             crime drama       1.00      0.01      0.02       109
                           crime fiction       0.56      0.30      0.39      1052
                          crime thriller       0.42      0.07      0.12       440
                                    cult       0.81      0.08      0.15       158
                       culture & society       0.00      0.00      0.00        59
                               cyberpunk       0.00      0.00      0.00         1
                   czechoslovak new wave       0.00      0.00      0.00         3
                                   dance       0.50      0.10      0.16        21
                           demonic child       1.00      0.50      0.67         2
                               detective       0.33      0.01      0.03        69
                       detective fiction       0.33      0.02      0.04        50
                                disaster       1.00      0.09      0.17        54
                               docudrama       0.00      0.00      0.00        57
                             documentary       0.84      0.45      0.59       298
                                dogme 95       0.00      0.00      0.00         2
                         domestic comedy       0.00      0.00      0.00        47
                           doomsday film       1.00      0.05      0.09        22
                                   drama       0.67      0.67      0.67      4795
                                dystopia       0.00      0.00      0.00        23
                         ealing comedies       0.00      0.00      0.00         2
                      early black cinema       0.00      0.00      0.00         1
                               education       0.00      0.00      0.00         0
                             educational       0.00      0.00      0.00         5
                           ensemble film       0.00      0.00      0.00        99
                   environmental science       0.00      0.00      0.00         0
                                    epic       0.33      0.02      0.04        48
                            epic western       0.00      0.00      0.00         5
                            erotic drama       0.00      0.00      0.00        28
                         erotic thriller       0.00      0.00      0.00        44
                                 erotica       1.00      0.03      0.06        69
                             escape film       0.00      0.00      0.00        10
                              essay film       0.00      0.00      0.00         3
                          existentialism       0.00      0.00      0.00         9
                       experimental film       0.00      0.00      0.00        22
                            exploitation       0.00      0.00      0.00         1
                           expressionism       0.00      0.00      0.00         1
                          extreme sports       0.00      0.00      0.00         3
                              fairy tale       1.00      0.11      0.20         9
         family & personal relationships       0.00      0.00      0.00         6
                            family drama       0.33      0.02      0.03       175
                             family film       0.70      0.32      0.44       853
               family-oriented adventure       0.50      0.02      0.03        56
                                fan film       0.00      0.00      0.00         4
                                 fantasy       0.54      0.18      0.27       528
                       fantasy adventure       0.33      0.02      0.04        50
                          fantasy comedy       0.00      0.00      0.00        52
                           fantasy drama       0.00      0.00      0.00         8
                            feature film       0.00      0.00      0.00         3
                       female buddy film       0.00      0.00      0.00         0
                           feminist film       0.00      0.00      0.00        18
                          fictional film       0.00      0.00      0.00         2
                                filipino       0.00      0.00      0.00         2
                         filipino movies       0.67      0.04      0.07        56
                                    film       0.00      0.00      0.00         4
               film & television history       0.00      0.00      0.00         2
                             film à clef       0.00      0.00      0.00         8
                         film adaptation       0.27      0.01      0.02       296
                               film noir       1.00      0.01      0.03       138
                              film-opera       0.00      0.00      0.00         0
                             filmed play       0.00      0.00      0.00         0
                     finance & investing       0.00      0.00      0.00         0
                          foreign legion       0.00      0.00      0.00         0
                             future noir       0.00      0.00      0.00        11
                           gangster film       0.60      0.04      0.07        81
                                     gay       0.00      0.00      0.00        52
                            gay interest       0.00      0.00      0.00        51
                         gay pornography       0.00      0.00      0.00         6
                              gay themed       0.50      0.03      0.06        62
                           gender issues       0.00      0.00      0.00         4
                                  giallo       0.00      0.00      0.00         2
                     glamorized spy film       0.80      0.31      0.44        13
                              goat gland       0.00      0.00      0.00         2
                             gothic film       0.00      0.00      0.00        19
                  graphic & applied arts       0.00      0.00      0.00         1
                               gross out       0.00      0.00      0.00        11
                          gross-out film       0.00      0.00      0.00        11
                                gulf war       0.00      0.00      0.00         1
                             hagiography       0.00      0.00      0.00         9
                    hardcore pornography       0.00      0.00      0.00         1
                      haunted house film       0.00      0.00      0.00        22
                        health & fitness       0.00      0.00      0.00         0
               heaven-can-wait fantasies       0.00      0.00      0.00         7
                         heavenly comedy       0.00      0.00      0.00         6
                                   heist       0.00      0.00      0.00        36
                          hip hop movies       0.00      0.00      0.00         3
                historical documentaries       0.00      0.00      0.00         1
                        historical drama       0.00      0.00      0.00        33
                         historical epic       0.00      0.00      0.00        18
                      historical fiction       1.00      0.01      0.03        75
                                 history       0.00      0.00      0.00        87
                            holiday film       0.50      0.03      0.06        29
                                  horror       0.82      0.60      0.70      1057
                           horror comedy       1.00      0.02      0.04        53
                            horse racing       0.00      0.00      0.00         1
                                  humour       0.00      0.00      0.00         2
                          hybrid western       0.00      0.00      0.00         3
                illnesses & disabilities       0.00      0.00      0.00         5
                          indian western       0.00      0.00      0.00         4
                                   indie       0.32      0.05      0.09       993
                     inspirational drama       0.00      0.00      0.00         7
                      instrumental music       0.00      0.00      0.00         2
             interpersonal relationships       0.00      0.00      0.00         9
                inventions & innovations       0.00      0.00      0.00         2
                         japanese movies       0.75      0.34      0.46       307
                              journalism       0.00      0.00      0.00         1
                         jukebox musical       0.00      0.00      0.00         2
                             jungle film       0.71      0.29      0.42        17
               juvenile delinquency film       0.00      0.00      0.00        13
                              kafkaesque       0.00      0.00      0.00         4
                    kitchen sink realism       0.00      0.00      0.00         6
                   language & literature       0.00      0.00      0.00         3
                                  latino       0.00      0.00      0.00         1
                             law & crime       0.00      0.00      0.00         4
                             legal drama       0.00      0.00      0.00         2
                                    lgbt       0.81      0.15      0.25       197
                libraries and librarians       0.00      0.00      0.00         2
                             live action       0.00      0.00      0.00         0
                        malayalam cinema       0.00      0.00      0.00         3
                          marriage drama       0.00      0.00      0.00        39
                       martial arts film       0.70      0.36      0.47       160
                   master criminal films       0.00      0.00      0.00         7
                            media satire       0.00      0.00      0.00        18
                           media studies       0.00      0.00      0.00         3
                         medical fiction       0.00      0.00      0.00         9
                               melodrama       0.00      0.00      0.00       137
                            mockumentary       0.00      0.00      0.00        15
                              mondo film       0.00      0.00      0.00         0
                                 monster       0.83      0.45      0.59        22
                           monster movie       0.67      0.04      0.08        46
                            movie serial       0.00      0.00      0.00         2
                 movies about gladiators       0.00      0.00      0.00         2
                              mumblecore       0.00      0.00      0.00         2
                                   music       0.29      0.02      0.04        86
                                 musical       0.47      0.13      0.20       640
                          musical comedy       0.00      0.00      0.00        48
                           musical drama       0.00      0.00      0.00        35
                                 mystery       0.52      0.14      0.22       564
                    mythological fantasy       0.00      0.00      0.00        10
                        natural disaster       0.00      0.00      0.00         4
                    natural horror films       1.00      0.05      0.09        44
                                  nature       0.00      0.00      0.00         6
                                neo-noir       0.00      0.00      0.00        18
                           new hollywood       0.00      0.00      0.00        24
                                    news       0.00      0.00      0.00         2
                                northern       0.00      0.00      0.00         3
                         nuclear warfare       0.00      0.00      0.00         0
                                operetta       0.00      0.00      0.00         2
                                  outlaw       0.00      0.00      0.00         2
                       outlaw biker film       0.00      0.00      0.00         3
              parkour in popular culture       0.00      0.00      0.00         1
                                  parody       0.50      0.01      0.01       194
                           period horror       0.00      0.00      0.00         1
                            period piece       0.32      0.02      0.03       334
                              pinku eiga       0.00      0.00      0.00        10
                                  plague       0.00      0.00      0.00         0
                      point of view shot       0.00      0.00      0.00         2
                        political cinema       0.00      0.00      0.00        36
                         political drama       0.33      0.01      0.02       123
                        political satire       0.00      0.00      0.00        17
                      political thriller       0.00      0.00      0.00        43
                      pornographic movie       1.00      0.05      0.10        38
                             pornography       0.00      0.00      0.00         1
                                pre-code       0.00      0.00      0.00        33
                                  prison       0.50      0.05      0.09        20
                           prison escape       0.00      0.00      0.00         1
                             prison film       0.00      0.00      0.00         4
                private military company       0.00      0.00      0.00         0
                         propaganda film       0.00      0.00      0.00        19
                            psycho-biddy       0.00      0.00      0.00         2
                    psychological horror       0.00      0.00      0.00         0
                  psychological thriller       0.30      0.02      0.04       290
                               punk rock       0.00      0.00      0.00         8
                              race movie       0.00      0.00      0.00         0
                                  reboot       0.00      0.00      0.00         1
                          religious film       0.00      0.00      0.00        15
                                  remake       0.00      0.00      0.00        24
                                 revenge       0.00      0.00      0.00         0
                  revisionist fairy tale       0.00      0.00      0.00         0
                     revisionist western       0.00      0.00      0.00        20
                              road movie       0.00      0.00      0.00        69
                             road-horror       0.00      0.00      0.00         3
             roadshow theatrical release       0.00      0.00      0.00         9
                          roadshow/carny       0.00      0.00      0.00         0
                            rockumentary       0.00      0.00      0.00        12
                            romance film       0.51      0.29      0.37      1683
                         romantic comedy       0.59      0.12      0.20       527
                          romantic drama       0.43      0.07      0.13       694
                        romantic fantasy       0.00      0.00      0.00        13
                          samurai cinema       0.00      0.00      0.00         8
                                  satire       0.00      0.00      0.00       146
                            school story       0.00      0.00      0.00         2
          sci fi pictures original films       0.00      0.00      0.00         0
                        sci-fi adventure       0.00      0.00      0.00        13
                           sci-fi horror       0.00      0.00      0.00        33
                         sci-fi thriller       0.00      0.00      0.00         3
                         science fiction       0.78      0.46      0.58       620
                 science fiction western       0.00      0.00      0.00         1
                        screwball comedy       0.00      0.00      0.00        57
                              sex comedy       1.00      0.02      0.03        66
                           sexploitation       0.00      0.00      0.00        17
                              short film       0.79      0.47      0.59       757
                             silent film       0.78      0.14      0.24       302
                          singing cowboy       0.00      0.00      0.00         0
                               slapstick       0.50      0.05      0.09       116
                                 slasher       0.90      0.16      0.27       173
                     slice of life story       0.00      0.00      0.00        30
                           social issues       1.00      0.04      0.07        27
                     social problem film       0.00      0.00      0.00        28
                           softcore porn       0.00      0.00      0.00        14
                             space opera       0.00      0.00      0.00         1
                           space western       0.00      0.00      0.00         3
                       spaghetti western       1.00      0.07      0.12        15
                           splatter film       0.00      0.00      0.00         7
                          sponsored film       0.00      0.00      0.00         1
                                  sports       0.73      0.32      0.44       169
                                     spy       0.63      0.13      0.21        95
                         stand-up comedy       0.00      0.00      0.00         4
                            star vehicle       0.00      0.00      0.00         2
                               steampunk       0.00      0.00      0.00         7
                             stoner film       0.00      0.00      0.00        10
                             stop motion       1.00      0.03      0.06        31
                               superhero       0.50      0.08      0.14        25
                         superhero movie       0.72      0.21      0.32        63
                        supermarionation       0.00      0.00      0.00         0
                            supernatural       0.71      0.07      0.12       149
                              surrealism       0.00      0.00      0.00        22
                                suspense       0.50      0.01      0.01       152
                      swashbuckler films       0.67      0.17      0.27        24
                        sword and sandal       0.00      0.00      0.00         5
                       sword and sorcery       0.00      0.00      0.00        10
                 sword and sorcery films       0.50      0.11      0.18         9
                            tamil cinema       0.00      0.00      0.00        17
                                    teen       0.54      0.11      0.18       199
                        television movie       0.00      0.00      0.00       157
         the netherlands in world war ii       0.00      0.00      0.00         0
                          therimin music       0.00      0.00      0.00         1
                                thriller       0.57      0.37      0.45      1633
                             time travel       0.00      0.00      0.00        30
                               tokusatsu       0.00      0.00      0.00         0
                               tollywood       0.00      0.00      0.00         6
                                 tragedy       0.00      0.00      0.00        13
                             tragicomedy       0.00      0.00      0.00         8
                                  travel       0.00      0.00      0.00         5
                          vampire movies       0.00      0.00      0.00         1
                                war film       0.75      0.46      0.57       380
                        werewolf fiction       0.00      0.00      0.00         0
                                 western       0.86      0.43      0.58       237
                                whodunit       0.00      0.00      0.00        14
                   women in prison films       0.00      0.00      0.00         5
                        workplace comedy       0.00      0.00      0.00        23
                            world cinema       0.45      0.20      0.27      1312
                           world history       0.00      0.00      0.00         6
                                   wuxia       1.00      0.05      0.09        21
                                 z movie       0.00      0.00      0.00         0
                             zombie film       0.90      0.29      0.44        66

                             avg / total       0.54      0.28      0.34     37944

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/metrics/classification.py:1113: UndefinedMetricWarning:

Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.

/Users/epfahl/anaconda/lib/python3.6/site-packages/sklearn/metrics/classification.py:1115: UndefinedMetricWarning:

Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples.

(Yeah, definitely not as good: average precision drops from 0.89 to 0.54, recall from 0.63 to 0.28, and F1 from 0.73 to 0.34.)
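
For reference, here is a minimal sketch of what stratifying a train/test split means, assuming the simplest case of one label per example. This is not how `MovieGenres` implements `stratify_split` (that code is not shown here, and the real problem is multi-label); `stratified_split` and the toy label counts are hypothetical. The idea is only that each label is split separately, so even a rare genre keeps examples on both sides.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices so each label keeps roughly the same share in train and test.

    Hypothetical single-label helper, for illustration only.
    """
    rng = random.Random(seed)

    # Group example indices by their label.
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    # Shuffle and cut each group independently.
    train, test = [], []
    for idx in by_label.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Toy, made-up label distribution: one common genre and two rarer ones.
labels = ["drama"] * 80 + ["western"] * 16 + ["wuxia"] * 4

train_idx, test_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

With a plain random split there is no such guarantee: a genre with only a few examples can land entirely in the test set, leaving the classifier nothing to learn from. Whether that accounts for the gap between the two reports above is not something this sketch establishes.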