Tobias Berggren Jensen
Feb 15 · 14 min read

Authors: Aysha Irgens, Kristoffer Pehrson, Anne-Louise Dyrbing, Britt Vang, Tobias Berggren Jensen

We aim to investigate the controversy sparked by the company Area9, which launched an application designed for vulnerable kids in primary school; the application offers the kids a small amount of money for every completed school task. In our opinion, a controversy has emerged: on one side are those who believe in the potential of implementing digital devices in school; on the other are those who criticize the application in the strong belief that it is bad for kids’ learning (Debatten, DR, 2019).

To map this controversy, we established a list of issues that we wish to investigate through our network in Gephi, namely:

  • How can digital solutions help and motivate vulnerable kids to solve their school tasks?
  • What influence will digital learning tools have on students’ psychosocial abilities?
  • What challenges and possibilities does the implementation of digital solutions create for school kids?
  • How will the classroom change with the implementation of new digital technologies?

Harvesting a dataset of Wikipedia pages

Figure 1. Our protocol, showing the steps by which we harvested our data for use in different software platforms to create networks, graphs and timelines.

Firstly, we ran a script to extract all ‘in-text’ links from the Wikipedia category ‘Educational Technology’. Running the script with a crawl depth of 1, we harvested 1381 pages, drawn from the category’s pages (level 0) and the sub-categories’ pages (level 1). By running a script that scrapes the HTML of all the pages, we could create a network containing only the pages connected by ‘in-text’ links.
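The filtering idea behind this step, keeping only links that point to other harvested pages, can be illustrated with a minimal sketch. It is not the script used for the harvest: the function name `build_link_network` and the sample data are hypothetical, and in practice the link lists would come from the scraped HTML.

```python
def build_link_network(outlinks):
    """Return directed edges (source, target) between harvested pages only.

    `outlinks` maps each harvested page title to the titles it links to
    in its body text. Links to pages outside the harvest are dropped.
    """
    harvested = set(outlinks)
    edges = set()
    for source, targets in outlinks.items():
        for target in targets:
            # Keep the link only if the target was harvested too,
            # and skip self-links.
            if target in harvested and target != source:
                edges.add((source, target))
    return sorted(edges)


# Hypothetical sample: three harvested pages and one link leaving the harvest.
sample = {
    "M-learning": ["Learning management system", "Smartphone"],
    "Learning management system": ["M-learning", "E-learning"],
    "E-learning": ["M-learning"],
}
edges = build_link_network(sample)
```

An edge list like this could then be written to a CSV file with `Source` and `Target` columns, a layout that Gephi can import.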

Secondly, we gathered the full text of all the pages in order to search for the keywords ADHD, classroom environment, grades, motivation, social and stress. These were integrated into our network so that we could interpret the relations between the pages, the keywords and the issues.
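A keyword search of this kind can be sketched as below. This is an assumption about how such a search might work, not the procedure actually used: it counts case-insensitive whole-word matches, so ‘stressful’ would not count as ‘stress’. The function name and the sample text are hypothetical.

```python
import re

KEYWORDS = ["ADHD", "classroom environment", "grades", "motivation", "social", "stress"]


def count_keyword_mentions(text, keywords=KEYWORDS):
    """Count case-insensitive whole-word matches of each keyword in `text`."""
    counts = {}
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        counts[keyword] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical page text, for illustration only.
page_text = "Motivation and stress affect grades. Social motivation matters."
mentions = count_keyword_mentions(page_text)
```

Running the function over every harvested page would give a per-page count for each keyword, which could be attached to the nodes of the network as an attribute.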

Lastly, we created a timeline starting from 2014-01-01 to see when changes were made to the pages, and thereby interpret these changes.

A network of Wikipedia pages related to each other

The network of ‘Educational Technology’ is visualized in Gephi, a software package for network visualization and analysis, where we interpret the network of pages by examining its different clusters.

In Gephi, the clusters are divided by modularity, which makes them easier to interpret, and we already see the themes differ between clusters.
Some clusters focus on online education, distance education and learning, while others focus on technological devices and hardware in various ways.
We also see clusters that fall outside our interest, such as the cluster labelled Assistive Technology, which concerns the use of technology for disabled people.

Figure 2. Network of all gathered Wikipedia pages and their relation to each other.

Our interpretation of six different clusters within the network

Through a keyword search on six school-related terms we were able to see changes in our Wikipedia network. We could thereby interpret, keyword by keyword, how and why each keyword was related to the others in the network, or whether some of the six keywords showed the same pattern. The six keywords represent themes we were interested in exploring and that we presumed could tell us something related to educational apps like Area9’s, or that could indicate controversies or reveal specific scientific disciplines related to the keywords.

However, we found that some of our keywords were too specific (ADHD) and therefore had few mentions in our Wikipedia network, while others were too generic (motivation). Nevertheless, the keyword search indicates that some words, and hence sub-themes, are more represented in scientific research within ‘Educational Technology’, and thus it illuminates gaps within certain areas of ‘Educational Technology’.

Figure 3. Six derived networks, one for each keyword.

A semantic analysis of noun-phrases of Wiki data

By using CorTexT we are able to interpret how the debate is articulated on our different Wikipedia pages. We extract the 100 most articulated noun-phrases on our 1381 pages, which produces a network. In this way we are able to discover discursive differences below the generic level in our dataset, meaning that we can map how the terms in our wiki pages occur together (co-occur) in the same pages. Co-occurrence indicates how many times a term is found together with other terms in the same wiki page. When a term is very unevenly distributed across all Wikipedia pages, it gets a high specificity score.
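To make the notion of co-occurrence concrete, the sketch below counts, for each pair of terms, the number of pages in which both appear. It only illustrates raw co-occurrence counting; CorTexT’s own term extraction, weighting and specificity scoring are more involved and are not reproduced here. The function name and the sample noun-phrases are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(pages_terms):
    """Count, for each pair of terms, the number of pages containing both."""
    counts = Counter()
    for terms in pages_terms:
        # Deduplicate per page and sort so each pair has one canonical order.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


# Hypothetical noun-phrases per page, for illustration only.
pages = [
    ["learning platform", "student", "teacher"],
    ["student", "teacher"],
    ["assistive technology", "student"],
]
pairs = cooccurrence_counts(pages)
```

Pairs with a high count become strongly weighted links in a co-occurrence network, while terms that never share a page, such as ‘assistive technology’ and ‘teacher’ in this sample, remain unconnected.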

The specific terms selected by CorTexT enable us to detect distinct discourses within certain clusters, and thus help us judge relevance for our investigation. For example, in this network we see a wider dispersion of topics such as Assistive Technology, which was tightly clustered in the previous network. Although Assistive Technology belongs to our category, we can see that it might not be a relevant topic for us, as its pages do not share the same interests or use the same terms as the clusters on the left side.

It seems that the three clusters on the left side are closely related in the language used on their pages, with many school- or student-related terms, which suggests that this area is worth investigating in relation to our chosen controversy.

Figure 4. A co-occurrence network containing the extracted noun-phrases within our Wikipedia pages.

Timelines for two pages found in our network of pages

To create a timeline with the visualization tool RawGraphs, we used a script that extracts the revision history of two chosen Wikipedia pages found in our network, namely M-Learning and Learning Management System. We chose M-Learning since it is the term Area9 uses for their learning application. As the user revisions give us an overview of the development of the pages, we can see that:

  1. M-Learning has not changed radically over the last couple of years, but changed a lot at the beginning of 2014, and probably also at the end of 2013.
  2. Learning Management System is in constant change, peaking especially at the beginning of 2015 and in the middle of 2017. This may be due to the rapid development within schools, where systems and devices are increasingly used in teaching.

The timeline may be an indicator of disagreement within the editing community, or it may simply be that new knowledge is being generated.
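Turning a revision history into timeline data could look like the following sketch, which counts revisions per month from 2014-01-01 onwards. It is an illustration under stated assumptions rather than the script used here: the function name is hypothetical, the timestamps are made up, and they are assumed to be in the ISO 8601 form that MediaWiki revision histories use.

```python
from collections import Counter
from datetime import datetime


def revisions_per_month(timestamps, start="2014-01-01"):
    """Count revisions per 'YYYY-MM' month, ignoring those before `start`."""
    cutoff = datetime.strptime(start, "%Y-%m-%d")
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        if moment >= cutoff:
            counts[moment.strftime("%Y-%m")] += 1
    # Return the months in chronological order.
    return dict(sorted(counts.items()))


# Hypothetical timestamps, for illustration only.
stamps = [
    "2013-12-30T10:00:00Z",
    "2014-01-05T08:15:00Z",
    "2014-01-20T19:30:00Z",
    "2015-02-01T12:00:00Z",
]
monthly = revisions_per_month(stamps)
```

Months with unusually many revisions are the peaks a timeline makes visible; whether a peak reflects disagreement or simply new knowledge still has to be judged by reading the revisions themselves.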

Figure 5. The timeline for M-Learning.
Figure 6. The timeline for Learning Management System.

Introduction to our further research within the topic ‘Educational Technology’

From our previous investigation we discovered that a controversy within the field of ‘Educational Technology’, more specifically the one around the application launched by Area 9 Lyceum (a tech startup), is hard to find within Wikipedia pages. Since ‘Educational Technology’ is a generic term for our field of interest, technology in school, and the Area 9 Lyceum application was launched in 2018, we decided to change our research strategy. Our aim is to investigate whether ‘Educational Technology’ provides any learning outcomes for students in primary school. We therefore want to collect the scientific articles that combine the topic ‘Educational Technology’ with the term ‘learning outcome’. This strategy enables us to map the scientific field of ‘Educational Technology’ and look into the effects, issues, controversies and different discourses academia might provide. We harvested the new data from the scientific platform Scopus, where we were able to gain insight into how scientists are debating this topic, allowing us to compare our results with the findings of our Wikipedia research. By focusing on learning outcomes in our new search, we wanted to map a new dataset in five different ways:

  • Network of harvested author keywords from Scopus articles
  • Timelines and interpretations of our collected author keywords
  • Interpretation of our network of co-occurring terms in our collected articles
  • Indication of relevant noun-phrases and articles
  • Top 3 articles that are most typical for each cluster

To contextualize both the keywords and the noun-phrases, we carried out a semantic analysis, which enabled us to relate syntactic structures and to identify their relationships to other words or phrases.

Figure 7. Our protocol, showing the steps by which we harvested our data from Scopus.

Network of harvested author keywords from Scopus articles

Our starting point is to harvest scientific articles from Scopus, a search engine containing articles within the life sciences, social sciences, physical sciences and health sciences. This database allows us to find articles within the domain of our chosen controversy. We limited the search to title, abstract and keywords, with the word combination “educational technolog*” AND “learning outcom*”.

After harvesting our dataset of 1028 articles in total, we created a network in Gephi to visualize the relations between the authors’ keywords. To limit our network to the most relevant nodes, we removed all nodes that occurred only once; we also removed ‘educational technology’, ‘higher education’ and ‘e-learning’, which were the most central nodes of the network and thus obscured the visualization.
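The pruning described above can be sketched in a few lines. This is an illustration, not the procedure carried out in Gephi: it assumes that ‘occurred only once’ means a keyword used in a single article, it lowercases keywords before counting (also an assumption), and the function name and sample keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations

# The three dominant keywords removed in the text above.
EXCLUDED = {"educational technology", "higher education", "e-learning"}


def keyword_edges(articles_keywords, min_occurrences=2, excluded=EXCLUDED):
    """Build weighted keyword co-occurrence edges after pruning nodes."""
    normalized = [{k.strip().lower() for k in kws} for kws in articles_keywords]
    frequency = Counter(k for kws in normalized for k in kws)
    # Keep keywords used in enough articles and not on the exclusion list.
    kept = {k for k, n in frequency.items() if n >= min_occurrences and k not in excluded}
    edges = Counter()
    for kws in normalized:
        for pair in combinations(sorted(kws & kept), 2):
            edges[pair] += 1
    return edges


# Hypothetical author keywords for three articles.
articles = [
    ["Educational technology", "Gamification", "Blended learning"],
    ["Gamification", "Blended learning", "Motivation"],
    ["E-learning", "Gamification"],
]
edges = keyword_edges(articles)
```

In this sample only ‘gamification’ and ‘blended learning’ survive the pruning, which shows how quickly the network thins out once single-use and dominant keywords are removed.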

Although the network is very spread out, we are still able to identify different clusters, each of which says something about a theme originating from the authors’ keywords in the aforementioned articles. The network shows us some relevant and interesting clusters in relation to our topic, such as ‘teaching/classroom environment’, ‘blended learning’ and ‘student life’, because they contain keywords related to our research interest. To limit our focus we decided not to look into the cluster ‘medical education’, despite its density, as we found it irrelevant to our investigation.

Besides the described clusters, there are other nodes that do not belong to any cluster but are part of the network. They are spread out on the map, on some occasions as sub-clusters and between other clusters, which could be because they are very generic and broad words.

These un-clustered nodes include some relevant keywords, such as gamification and distance learning, but most of them are less important to our further investigation.

Figure 8. Network of all gathered keywords made by authors in each article — and their relation to each other.

Timelines of article keywords

We are interested in locating the scientific field of the keyword ‘Educational Technology’ to get a picture of how publications in academia have developed. The graph shows that the author keyword ‘Educational Technology’ first appeared in 1994 and that its use has increased from 2004 until 2018.

Figure 9. Volume of published papers each year.

We then extracted the most commonly used keywords in our collected 1028 academic articles. These indicate which research topics the field consists of, which we found useful for our further investigation. The timeline enables us, as controversy mappers, to easily narrow the search for traces of dispute. The graph visualizes author keywords, and thereby the most written-about “themes” within ‘Educational Technology’, year by year. It shows that the author keyword E-learning is one of the most used keywords (41 occurrences). Not surprisingly, this keyword did not occur as frequently in articles before 2005, because ‘Educational Technology’ was still relatively unexplored. The timeline only accounts for the year of publication and does not take into account that many articles have been in progress over a long period. This might give an imperfect indication of when ‘Educational Technology’ became a growing scientific field.
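The data behind a keyword timeline of this kind can be sketched as a count of keyword occurrences per publication year. The sketch is illustrative only: the function name and the records are hypothetical, and lowercasing the keywords before counting is an assumption.

```python
from collections import Counter, defaultdict


def keyword_timeline(records):
    """Map each keyword to a {year: occurrences} dict.

    `records` is an iterable of (publication_year, author_keywords) pairs.
    """
    timeline = defaultdict(Counter)
    for year, keywords in records:
        # Count each keyword at most once per article.
        for keyword in {k.strip().lower() for k in keywords}:
            timeline[keyword][year] += 1
    return {k: dict(sorted(v.items())) for k, v in timeline.items()}


# Hypothetical records, for illustration only.
records = [
    (2005, ["E-learning"]),
    (2017, ["E-learning", "Online learning"]),
    (2017, ["e-learning"]),
    (2018, ["Online learning"]),
]
timeline = keyword_timeline(records)
```

Because only the publication year is used, the counts share the limitation noted above: they say when an article appeared, not when the research was carried out.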

Furthermore, ‘Educational Technology’, e-learning and online learning are mentioned the most in 2017 and 2018, and this arguably mirrors the digitalisation of our society in general.

Figure 10. Timeline of author keywords and their occurrences.

A semantic analysis of noun-phrases of Scopus data

To explore and visualize the most articulated terms within the abstracts of our 1028 articles, we extracted noun-phrases through CorTexT, which mapped the phrases as networks based on their co-occurrence in our collected articles. This means that the terms are connected to other terms through their co-occurrence across the articles in the dataset, just as we described earlier with the Wiki pages. For the co-occurrence network, we used the same Scopus dataset of 1028 articles. We then harvested the 200 terms with the highest specificity score. Harvesting more than this (we tried both 300 and 500 terms) created a dense and unclear “hairball network”, implying that CorTexT begins to select more generic or less relevant terms as the specificity score decreases. We limited our search to noun-phrases that occur in at least 3 articles.

If we compare this visualization to the noun-phrases in our Wikipedia network, we can see many similarities, such as learning systems, universities and online learning platforms, which are clearly relevant to both platforms. We find the noun-phrases in the Scopus network more specific and targeted, which opens up our research and enables us to step deeper into our field of interest. We are aware that these noun-phrases are found solely in the abstracts, unlike our Wikipedia search, which looked for the 100 most specific noun-phrases across the Wikipedia pages.

Figure 11. Co-occurrence network of noun-phrases: the 200 most specific noun-phrases within 1028 articles.

Our clusters seem to be better connected in the center of the network and less connected at the edges. We found that both the light green and the purple clusters are relevant to our research, as we aimed, through our search terms, to harvest articles that talk about learning difficulties, outcomes, activities, games and middle/elementary school, in line with our set issues. The noun-phrases extracted from these two clusters therefore correlate with our field of interest.

Although these two clusters are relevant and closely interlinked, we see that the green and orange clusters are also interlinked. The orange cluster, however, is of no relevance to us, as it concerns medical schools and systems, indicating higher education, which is not an area within our topic.

Some clusters, like the two blue clusters (middle and top), may be relevant to us, but judging from the noun-phrases mapped in the network they are not entirely so. They use useful terms, but refer mostly to higher education.

Indication of relevant noun-phrases and articles

To test our assumptions about the co-occurrence network in more depth, we projected various kinds of indicators onto it to make sense of its clusters.

Firstly, we made a ‘heat map’ showing where articles from different scientific disciplines tend to be located, to investigate what each discipline represented in our dataset. To do this we extracted all articles from four disciplines, namely psychology, computer science, social sciences and engineering. We chose these disciplines from the wide range of Scopus disciplines because they showed the largest impact on our investigation. The result gave us more articles in total, namely 1038, than the original dataset of 1028 articles. The 1038 are spread over 135 articles within engineering, 774 articles in social sciences, only 14 articles within psychology, and finally 215 articles in computer science. The difference may be due to new publications on Scopus, and we therefore chose to include the “new” dataset in the visualizations that follow.

Figure 12. Heat maps of the four chosen disciplines on our co-occurrence map.

The figure above illustrates our co-occurrence network with heated areas, which differ between the disciplines. The heat projected onto each cluster shows where the terms extracted from the 1028 articles (the dataset) are most commonly found.

Our heat maps indicate in which discipline the most common terms in our dataset are found, which helps us investigate our controversy. By mapping the noun-phrases and heating the clusters, we are able to collect the titles of the 3 articles that are most typical for every cluster.

Most influential articles by discipline

The heat maps showed us the terms relating to the four scientific disciplines. Now we take a deeper look into the literature by extracting the top 3 articles that are most typical for each cluster. The network (see below) shows the top 30 articles given our extraction of the 200 terms with the highest specificity score across the articles in the corpus.

Figure 13. A network of the 3 articles that are most typical for each cluster.

To find out whether these articles provided any results of interest, we focused on Educational Technology and Learning Outcomes and decided to read the abstracts of these 30 articles manually. We had five criteria of relevance, based on our research interest:

  • Similar context (location specific) — does the study take place in primary school?
  • Similar context (tool specific) — does the study provide information about an app, a device or digital learning platform for teaching?
  • Effects — does the study show any effects related to ‘Educational Technology’?
  • Side effects — does the study provide any different results, that might impact our field?
  • Methodology — does the study provide a case description that could enhance our approach and understanding?

Top 10 selected articles.

We rated the 30 articles and chose the 10 most relevant based on the above-mentioned criteria. Looking into these articles, it appears that most of them fall within the disciplines of computer science and social science, which matches what our heat maps showed. A large share of the articles investigate how games and other digital learning tools can be used to improve different aspects of learning outcomes. Although the articles provide information relevant to our controversy, we nevertheless find a gap in the research, as we also seek to find out which psychosocial effects digital learning has on children, and how. This could be an area for further investigation in the relatively new field of ‘Educational Technology’ used as a teaching tool in primary schools.
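The rating itself was done by reading abstracts, but the selection logic can be illustrated with a small sketch that ranks articles by how many of the five criteria they meet. Everything in it is hypothetical: the criterion labels are shorthand for the list above, the titles are placeholders rather than articles from the corpus, and counting criteria equally is an assumption, not the rating scheme actually used.

```python
CRITERIA = ("primary_school", "tool_specific", "effects", "side_effects", "methodology")


def top_articles(ratings, n=10):
    """Rank articles by the number of criteria met and return the top `n` titles.

    `ratings` maps an article title to the set of criteria it satisfies.
    Ties keep the input order because Python's sort is stable.
    """
    scored = [(title, len(set(met) & set(CRITERIA))) for title, met in ratings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [title for title, _ in scored[:n]]


# Hypothetical ratings; the titles are placeholders.
ratings = {
    "Article A": {"tool_specific", "effects"},
    "Article B": {"primary_school", "tool_specific", "effects", "methodology"},
    "Article C": {"effects"},
}
ranking = top_articles(ratings, n=2)
```

A simple count like this treats all criteria as equally important; weighting the primary-school criterion more heavily would be one way to reflect the research interest more closely.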


If we look at our list of issues, which we aimed to address in this research, we see overall that the scientific literature contains an increasing number of articles on Educational Technology and learning outcomes. Furthermore, we can conclude that most of the literature published so far is found in the research disciplines of social science, computer science and engineering.

Most of these studies focus on technological aspects of ‘Educational Technology’, as opposed to the less represented human aspects within psychology and social science. In this regard, we see that very few of the collected articles concern primary school; instead they involve open course institutions, higher education etc.

Since the Danish government is testing the application in primary schools, we think it would be relevant to inform its decision about implementing the application on a larger scale with studies of the social and psychological effects, in our case in Danish primary schools. We find it important that further scientific investigation engages actors such as pedagogues, teachers, children, parents and psychologists.
