As part of the course Topics in Digital Humanities, we decided to focus on the Israeli cinema industry from the country's establishment (1948) until today (2019). Our goal was to map the social network of everyone involved in this industry, from directors and actors to sound engineers and cinematographers.
Two entities x and y are directly related if and only if they took part in the same movie. Connections are transitive: if x is related to y and y is related to z, then x and z are connected through y.
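The relation above can be illustrated with a small sketch. It assumes NetworkX (the library the project uses) and a hypothetical in-memory mapping from movie titles to participants; the sample names, the `build_graph` helper, and the `movie` edge attribute are illustrative assumptions, not the project's actual code or data.

```python
from itertools import combinations

import networkx as nx

# Hypothetical sample data: movie title -> people who took part in it.
movies = {
    "Movie A": ["Dana", "Eli", "Noa"],
    "Movie B": ["Noa", "Yossi"],
}


def build_graph(movies):
    """Add an edge between every two people who share a movie."""
    graph = nx.Graph()
    for title, people in movies.items():
        for x, y in combinations(people, 2):
            # Keep the first shared movie as the edge label (an assumption).
            if not graph.has_edge(x, y):
                graph.add_edge(x, y, movie=title)
    return graph


graph = build_graph(movies)
print(graph.has_edge("Dana", "Noa"))        # True: both took part in Movie A
print(graph.has_edge("Dana", "Yossi"))      # False: no shared movie
print(nx.has_path(graph, "Dana", "Yossi"))  # True: connected through Noa
```

The direct relation is an edge, while a connection in the transitive sense is any path made of such edges.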
We extracted the data for the social network from the cinemaofisrael site using a web crawler written in JavaScript. The data was then expanded with Wikidata in order to create a wide and comprehensive MongoDB database (database details below).
Web crawler files:
Crawlers and database integration:
All the connections in the Israeli cinema industry were mapped from our database to a massive social network graph, which is saved in JSON format.
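As a rough illustration of storing such a graph as JSON and restoring it, here is a minimal sketch. The file layout (`nodes` and `edges` keys), the helper names, and the placeholder Hebrew labels are assumptions made for this example; the project's actual JSON schema may differ, and NetworkX also ships its own JSON helpers in `networkx.readwrite.json_graph`.

```python
import json

import networkx as nx


def graph_to_json(graph):
    """Serialize nodes and movie-labelled edges to a JSON string."""
    data = {
        "nodes": list(graph.nodes),
        "edges": [
            [u, v, attrs.get("movie")]
            for u, v, attrs in graph.edges(data=True)
        ],
    }
    # ensure_ascii=False keeps Hebrew names readable in the output.
    return json.dumps(data, ensure_ascii=False)


def graph_from_json(text):
    """Rebuild an undirected graph from the layout produced above."""
    data = json.loads(text)
    graph = nx.Graph()
    graph.add_nodes_from(data["nodes"])
    for u, v, movie in data["edges"]:
        graph.add_edge(u, v, movie=movie)
    return graph


# Placeholder labels ("Actor A", "Actor B", "Movie A"), not real records.
original = nx.Graph()
original.add_edge("שחקן א", "שחקן ב", movie="סרט א")

restored = graph_from_json(graph_to_json(original))
print(restored.has_edge("שחקן א", "שחקן ב"))
```

Writing the string to a file with `encoding="utf-8"` would complete the round trip on disk.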
To discover a connection between two cinema entities, use find_connection("json_graph.json", name1, name2) from this file. We use the NetworkX library to build and handle the graphs.
To discover more than one connection, use find_all_connections(graph_file, name1, name2, maximal_length).
The database, cinema-of-israel-db, is stored in MongoDB and includes two collections:
- movies: 1021 movie records, with information (in Hebrew) such as cast, characters, brief description, and years.
- persons: 16352 cinema entity records, with information such as gender, years, and acting career.
To restore the DB: install the MongoDB server, then from the project directory (tdh192) run 'mongorestore --db cinema-of-israel-db ./db-backup/cinema-of-israel-db' in the console.
find_connection:
Input: the name of the JSON graph file to restore, the first entity name (source), and the second entity name (target).
Output: the connection information as a dictionary, and a connection chart (a path graph) between name1 and name2, if a connection exists.
This function restores the relations graph from JSON, then finds and displays a connection between name1 and name2 (if one exists). To discover a connection between two cinema entities, use find_connection("json_graph.json", name1, name2).
The source and target nodes are colored red. Each edge specifies the movie that connects the two entities.
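The behaviour described for find_connection can be sketched roughly as follows. This is not the project's implementation: it assumes the graph is already loaded, that edges carry a `movie` attribute, and that the connection returned is a shortest path (the text does not say which path is chosen). The function name, the dictionary keys, and the non-endpoint color are hypothetical.

```python
import networkx as nx


def find_connection_sketch(graph, name1, name2):
    """Return one shortest connection as a dict, or None if there is none."""
    try:
        path = nx.shortest_path(graph, source=name1, target=name2)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None
    # One connecting movie per edge along the path.
    movies = [graph[u][v]["movie"] for u, v in zip(path, path[1:])]
    # Endpoints in red; the color for other nodes is an arbitrary choice.
    colors = ["red" if n in (name1, name2) else "lightblue" for n in path]
    return {"path": path, "movies": movies, "node_colors": colors}


# Hypothetical sample graph.
g = nx.Graph()
g.add_edge("Dana", "Noa", movie="Movie A")
g.add_edge("Noa", "Yossi", movie="Movie B")
g.add_node("Rina")  # took part in no shared movie

result = find_connection_sketch(g, "Dana", "Yossi")
print(result["path"])    # ['Dana', 'Noa', 'Yossi']
print(result["movies"])  # ['Movie A', 'Movie B']
print(find_connection_sketch(g, "Dana", "Rina"))  # None
```

The `node_colors` list could be passed to a drawing routine to highlight the source and target.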
find_all_connections:
Input: the name of the JSON graph file to restore, the first entity name (source), the second entity name (target), and the maximal connection length.
Output: an array that includes the information of each connection found, and a connection chart for each connection found.
Similar to find_connection, but this function finds all connections between name1 and name2 (if any exist) with at most maximal_length connecting movies (edges).
Call this query with find_all_connections(graph_file, name1, name2, maximal_length).
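One way to enumerate length-bounded connections is NetworkX's `all_simple_paths` with its `cutoff` argument, which limits the number of edges in each returned path. The sketch below is an assumption about how such a query could work, not the project's code; the function name, the returned dictionary keys, and the sample graph are hypothetical.

```python
import networkx as nx


def find_all_connections_sketch(graph, name1, name2, maximal_length):
    """List every simple path with at most maximal_length edges."""
    if name1 not in graph or name2 not in graph:
        return []
    connections = []
    paths = nx.all_simple_paths(graph, name1, name2, cutoff=maximal_length)
    for path in paths:
        movies = [graph[u][v]["movie"] for u, v in zip(path, path[1:])]
        connections.append({"path": path, "movies": movies})
    # Shortest connections first.
    connections.sort(key=lambda c: len(c["movies"]))
    return connections


# Hypothetical sample graph with a 2-edge and a 3-edge route.
g2 = nx.Graph()
g2.add_edge("Dana", "Noa", movie="Movie A")
g2.add_edge("Noa", "Yossi", movie="Movie B")
g2.add_edge("Dana", "Eli", movie="Movie C")
g2.add_edge("Eli", "Rina", movie="Movie D")
g2.add_edge("Rina", "Yossi", movie="Movie E")

print(len(find_all_connections_sketch(g2, "Dana", "Yossi", 2)))  # 1
print(len(find_all_connections_sketch(g2, "Dana", "Yossi", 3)))  # 2
```

Enumerating simple paths can grow quickly on a dense graph, so a small maximal_length keeps the query tractable.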
Queries from this file.
Examples:
find_connection("json_graph.json", "גיל רוזנטל", "פיני טבגר")
Output:
find_all_connections("json_graph.json", "גיל רוזנטל", "פיני טבגר", 6)
Output:

