tutorial-node-classification/README-MH.qmd at main · yfiua/tutorial-node-classification · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
---
title: Tutorial on node classification in networks
author:
  - name:
      given: Jun
      family: Sun
    email: jun.sun@gesis.org
    orcid: 0000-0002-4789-7316
    affiliations:
      - name: "GESIS - Leibniz Institute for the Social Sciences"
csl: apa.csl
format:
  html: default
  ipynb: default
license: Apache-2.0
---


# Tutorial on node classification in networks

## Learning goals

* Understand the task of node classification in networks
* Learn how to prepare network data for node classification tasks
* Learn how to perform node classification using NetworkX
* Learn how to perform node classification using deep learning
* Learn how to evaluate node classification results

## Target audience

This tutorial is aimed at social scientists and data scientists who are interested in analyzing network data. The tutorial is suitable for beginners and intermediate learners
with some knowledge in Python and supervised machine learning, such as classification tasks.

## Social science use cases

Node classification is a common task in social network analysis, aiming to predict the labels of nodes within a network using its structure and properties. Node classification is essential in various applications in computational social science, including social network analysis, role analysis, and trust analysis.
For example, in a citation network, we can classify research papers into different research fields. In a social network, we can classify users into different groups based on their social interactions.

## Estimated duration

The tutorial is designed to be completed in 4-5 hours. The duration may vary depending on the prior knowledge of the participants.
The exercises are optional and can be done after the tutorial, which may take 2-3 hours.
The entire content can be covered in a full-day workshop.

## Setting up the computational environment

### Hardware
The tutorial can be run on a laptop or desktop computer.
A CPU is sufficient for completing the content in the tutorial and exercises.
For large datasets and deep learning models, a GPU is recommended.

### Software
Python >= 3.9 is required. In addition, the necessary packages are listed in [requirements.txt](requirements.txt). Using `pip` as an example, install the packages with

```sh
pip install -r requirements.txt
```

### Datasets
Run `prep.sh` to download some datasets that are used in the tutorial. Alternatively you can download them manually during the tutorial.

```sh
./prep.sh
```

## Table of contents

### [Introduction: Node classification in networks](introduction.md)

### [Airport classification Using NetworkX](classification-nx-airports.ipynb)

### [Twitch user classification using PyTorch Geometric (PyG)](classification-dl-twitch.ipynb)

### Exercise

* [Exercise: Classification of research papers in a citation network](exercise.md)
* [Solution 1: With NetworkX](classification-nx-cora.ipynb)
* [Solution 2: With PyG](classification-dl-cora.ipynb)

### [Supplementary material: Import a network in PyG](supplementary.ipynb)

## Author
[Dr. Jun Sun](https://github.com/yfiua) @ GESIS