The Network of Climate-Weather related Wikipedia pages

Moses Boudourides
Nov 24, 2019


In one of the projects for the Networks course that I taught at New York University Abu Dhabi in Fall 2019, my students explored the network of Climate-Weather related Wikipedia pages.

Starting from three Wikipedia pages (‘Global Warming’, ‘Carbon Dioxide’, and ‘Weather’) and using the Python module wikipedia 1.4.0, we collected the Wikipedia pages hyperlinked from each of them:

  • 495 pages from ‘Global Warming’ (redirected from ‘Climate Change’),
  • 1140 pages from ‘Carbon Dioxide’, and
  • 290 pages from ‘Weather’.
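
The collection step can be sketched roughly as follows. This is a minimal sketch, not the code used in the course: the helper names `fetch_links` and `build_network` are hypothetical, and it assumes that the nodes are exactly the pages hyperlinked from the starting pages, that only hyperlinks between collected pages are kept, and that self-links and pages that fail to load are dropped. The only calls taken from the wikipedia module are `wikipedia.page(...)` and the `.links` property.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def fetch_links(title: str) -> List[str]:
    """Return the titles hyperlinked from one Wikipedia page (needs network access)."""
    import wikipedia  # third-party module: pip install wikipedia

    # auto_suggest=False avoids silently resolving to a differently named page.
    return wikipedia.page(title, auto_suggest=False).links


def build_network(
    seeds: Iterable[str],
    get_links: Callable[[str], List[str]] = fetch_links,
) -> Tuple[Dict[str, Set[str]], Set[Tuple[str, str]]]:
    """Collect the pages linked from each seed and the hyperlinks among them."""
    # Pages hyperlinked from each starting page.
    collected = {seed: set(get_links(seed)) for seed in seeds}
    # Assumption: the node set is the union of the collected pages.
    nodes = set().union(*collected.values())
    edges = set()
    for page in nodes:
        try:
            targets = get_links(page)
        except Exception:
            continue  # assumption: skip pages that cannot be loaded
        for target in targets:
            # Keep only hyperlinks between collected pages, without self-links.
            if target in nodes and target != page:
                edges.add((page, target))
    return collected, edges
```

Calling `build_network` with the three starting page titles would query Wikipedia once per collected page. Since Wikipedia changes over time, a run today would likely return different counts from those reported here.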

The three sets of pages overlap: ‘Global Warming’ and ‘Carbon Dioxide’ hyperlink to 207 common pages, ‘Global Warming’ and ‘Weather’ to 41, ‘Carbon Dioxide’ and ‘Weather’ to 24, and all three starting pages to 16.

After merging duplicates, the three sets contain 1,669 distinct pages, and there are 179,104 hyperlinks among these pages. This network therefore has 1,669 nodes (webpages) and 179,104 edges (hyperlinks).
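
The node count can be checked against the reported set sizes and overlaps with the inclusion-exclusion principle. The short sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Reported number of pages hyperlinked from each starting page.
sizes = {"Global Warming": 495, "Carbon Dioxide": 1140, "Weather": 290}

# Reported numbers of pages shared by each pair of starting pages.
pairwise = {
    ("Global Warming", "Carbon Dioxide"): 207,
    ("Global Warming", "Weather"): 41,
    ("Carbon Dioxide", "Weather"): 24,
}

# Reported number of pages shared by all three starting pages.
triple = 16

# Inclusion-exclusion: |A ∪ B ∪ C| = sum of sizes - sum of pairwise overlaps + triple overlap.
distinct_pages = sum(sizes.values()) - sum(pairwise.values()) + triple
print(distinct_pages)  # 1669

# Pages hyperlinked from at least two starting pages.
at_least_two = sum(pairwise.values()) - 2 * triple
print(at_least_two)  # 240
```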

The Climate-Weather graph is a simple, unweighted directed graph without any isolates. It is weakly connected but not strongly connected, and it has 32 strongly connected components. Its density is 0.064, its transitivity is 0.543, and its reciprocity is 0.615. These last three numbers indicate that the Climate-Weather graph is sparse, with a moderate level of triangle closure and a rather high percentage of reciprocated links.
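
Two of these measures are easy to reproduce by hand. The sketch below assumes the usual definitions for a simple directed graph: density is the number of edges divided by n(n − 1), and reciprocity is the fraction of edges whose reverse edge is also present. The post does not say which library computed the reported values, so these definitions are an assumption, and the toy edge list is made up for illustration.

```python
def density(n_nodes, n_edges):
    """Density of a simple directed graph: edges / (n * (n - 1))."""
    return n_edges / (n_nodes * (n_nodes - 1))


def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also an edge."""
    edge_set = set(edges)
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


# Density from the reported node and edge counts.
print(round(density(1669, 179104), 3))  # 0.064

# Toy graph: a <-> b is reciprocated, a -> c and c -> d are not.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(reciprocity(toy_edges))  # 0.5
```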

The following is a plot of the Climate-Weather graph:

The largest strongly connected component of the Climate-Weather graph has 1,637 nodes and 177,095 edges. Its density is 0.066, its transitivity is 0.551, and its reciprocity is 0.622. It contains 98.1% of the pages and 98.9% of the links of the whole graph, so most of the Climate-Weather graph lies inside its largest strongly connected component. The diameter of this component is 6, which means that the maximum geodesic distance between any two of its webpages is 6.
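
For readers who want to see what these quantities mean operationally, here is a small pure-Python sketch on a made-up toy graph. It finds strongly connected components by intersecting forward and backward reachability, which is simple rather than fast, and computes the diameter by breadth-first search from every node of a component. The function names are illustrative, not the ones used to produce the reported values.

```python
from collections import deque


def reachable(adj, start):
    """Set of nodes reachable from start by following adj (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def strongly_connected_components(nodes, edges):
    """SCC of a node = nodes it can reach that can also reach it."""
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)
    remaining = set(nodes)
    components = []
    while remaining:
        start = next(iter(remaining))
        component = reachable(succ, start) & reachable(pred, start)
        components.append(component)
        remaining -= component
    return components


def diameter(component, edges):
    """Longest shortest-path length between any two nodes of a strongly connected component."""
    succ = {u: set() for u in component}
    for u, v in edges:
        if u in component and v in component:
            succ[u].add(v)
    longest = 0
    for source in component:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in succ[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        longest = max(longest, max(dist.values()))
    return longest


# Toy graph: a -> b -> c -> a is a cycle, and d is only reachable from c.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
components = strongly_connected_components(toy_nodes, toy_edges)
largest = max(components, key=len)
print(sorted(largest), diameter(largest, toy_edges))  # ['a', 'b', 'c'] 2

# Shares of the whole graph, from the reported counts.
print(round(1637 / 1669, 3), round(177095 / 179104, 3))  # 0.981 0.989
```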

The following is the plot of the largest strongly connected component of the Climate-Weather graph:

Coloring nodes (pages) according to where they originate from, we get the following four categories:

  • Pages hyperlinked only from the “Global Warming” starting page (dark sea green)
  • Pages hyperlinked only from the “Carbon Dioxide” starting page (salmon)
  • Pages hyperlinked only from the “Weather” starting page (pink)
  • Pages hyperlinked by at least two starting pages (pale turquoise)
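
The four categories can be derived from the three collected sets, for instance as in the sketch below. The function name `categorize`, the label "multiple", and the toy page names are hypothetical. The color strings are the standard CSS color names that match the colors listed above, and the post does not say which plotting library or color names were actually used.

```python
def categorize(collected):
    """Map each page to its single starting page, or to 'multiple' if it has several."""
    origins = {}
    for seed, pages in collected.items():
        for page in pages:
            origins.setdefault(page, set()).add(seed)
    return {
        page: next(iter(seeds)) if len(seeds) == 1 else "multiple"
        for page, seeds in origins.items()
    }


# Assumed color names for the four categories.
COLORS = {
    "Global Warming": "darkseagreen",
    "Carbon Dioxide": "salmon",
    "Weather": "pink",
    "multiple": "paleturquoise",
}

# Toy example with made-up page names.
toy_collected = {
    "Global Warming": {"A", "B"},
    "Carbon Dioxide": {"B", "C"},
    "Weather": {"D"},
}
categories = categorize(toy_collected)
print(sorted(categories.items()))
```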

The assortativity coefficient of the Climate-Weather network with respect to the origin of nodes (pages), that is, the four categories above, is 0.689743. This value is fairly high, meaning that pages from a category tend to be hyperlinked to other pages from the same category.
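
As an illustration of how such a coefficient is obtained, the sketch below implements Newman's attribute assortativity for a directed graph: r = (Σ e_ii − Σ a_i b_i) / (1 − Σ a_i b_i), where e_ij is the fraction of edges going from category i to category j, a_i is the fraction of edges leaving category i, and b_i is the fraction entering it. That the reported value was computed with exactly this formula is an assumption, and the toy graph is made up.

```python
from collections import Counter


def attribute_assortativity(edges, category):
    """Newman's assortativity coefficient for a categorical node attribute."""
    # Count edges by (category of source, category of target).
    counts = Counter((category[u], category[v]) for u, v in edges)
    total = sum(counts.values())
    labels = {c for pair in counts for c in pair}
    # a_i: fraction of edges leaving category i; b_i: fraction entering it.
    out_frac = {c: sum(n for (i, _), n in counts.items() if i == c) / total for c in labels}
    in_frac = {c: sum(n for (_, j), n in counts.items() if j == c) / total for c in labels}
    # Fraction of edges that stay within a category.
    trace = sum(counts[(c, c)] for c in labels) / total
    # Fraction expected within categories if edges ignored the attribute.
    expected = sum(out_frac[c] * in_frac[c] for c in labels)
    # Undefined (division by zero) when every edge joins the same single category.
    return (trace - expected) / (1 - expected)


toy_category = {"a": "X", "b": "X", "c": "Y", "d": "Y"}

# Every edge stays inside its category: perfectly assortative.
within_only = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")]
print(attribute_assortativity(within_only, toy_category))  # 1.0

# One cross-category edge lowers the coefficient.
with_cross = within_only + [("a", "c")]
print(round(attribute_assortativity(with_cross, toy_category), 4))  # 0.6154
```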

If we group the four categories of nodes (pages) inside separate blocks, the plot of the Climate-Weather network is the following:

Finally, we consider four separate egocentric subnetworks of the Climate-Weather network, around geothermal, acid, bacteria, and volcano. The color of each node (page) corresponds to its category of origin among the four above, and its size is proportional to the in-degree of the node (page).
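
An egocentric subnetwork can be extracted along the following lines. This is a sketch under stated assumptions: the ego network has radius 1, neighbors are taken in both link directions, and the in-degree is counted inside the subnetwork. The post does not specify any of these choices, and the function name and toy edges are hypothetical.

```python
def ego_network(edges, ego):
    """Return nodes, edges, and in-degrees of the radius-1 ego network of a node."""
    # Assumption: neighbors are pages linked from the ego or linking to it.
    neighbors = {v for u, v in edges if u == ego} | {u for u, v in edges if v == ego}
    nodes = neighbors | {ego}
    # Keep every hyperlink whose two endpoints are both in the ego network.
    sub_edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    # Assumption: in-degree is counted within the ego network, not the whole graph.
    in_degree = {n: 0 for n in nodes}
    for _, v in sub_edges:
        in_degree[v] += 1
    return nodes, sub_edges, in_degree


# Toy graph with made-up page names; d -> e lies outside the ego network of a.
toy_edges = [("a", "b"), ("b", "a"), ("c", "a"), ("b", "c"), ("d", "e")]
nodes, sub_edges, in_degree = ego_network(toy_edges, "a")
print(sorted(nodes), in_degree["a"])  # ['a', 'b', 'c'] 2
```

In a plot, the `in_degree` values would be scaled to node sizes and each node colored by its category of origin.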
