A statistical look at the blogging project

A visual summary of how much we have procrastinated and what we have talked about

Thimbles
4 min read · Apr 4, 2014

I wanted to write something special for my last blog. Something meaningful and fun…but what exactly? Being a Linguistics/Neuroscience stream CogSci major, I don’t really have anything clever to say about psychological tests. Then this idea came to me. Why don’t I try to do a little analysis of what we have been blogging about all semester? It took two sleep-deprived days plus all the programming and linguistic skills I could muster, but I made it.

Sadly, as you will see, this turned out to be just a showcase of some graphs and charts rather than the math-y statistical analysis that I’d hoped for. And there is a reason for that other than my poor grasp of all statistical concepts. The data I collected from our Medium collection are mostly qualitative and come with minimal information about the bloggers, so putting them through quantitative analyses just wouldn't make any sense at all.

Dataset/Corpus:

A total of 401 blogs submitted by students of PSYC 406 to the Medium collection before 9 pm on 03/Apr/2014. All submissions from Dr. Stotland have been excluded. (Sorry :P)

*In figure 1, some blogs are grouped by their time of update rather than the original submission date.

**I expect a lot more (about a hundred?) blogs to show up on the last day (Apr.4th), but there just wouldn't be enough time for the data processing, so…we will work with the 401 we got.

First up, take a look at the overall trend of blogs submitted daily from the day this project was announced to the due date. Not surprisingly, we see a gradual increase in early March, followed by rapid increases in the number of submissions closer to the deadline. This class has 104 students, so a total of 520 blogs are supposed to be submitted by Friday the 4th. Assuming that all 520 blogs are submitted by the deadline, 246 out of 520 blogs will have been submitted/updated in the last week (Mar.31st-Apr.4th). That makes 47.3% of the total number of blogs. This percentage suggests that…well…we are a bunch of procrastinators.
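For anyone who wants to check the arithmetic, here is a minimal Python sketch using only the figures quoted above. The variable names are made up for illustration, and the 246 already assumes that every one of the 520 blogs arrives on time.

```python
# Figures quoted in the post; variable names are illustrative only.
students = 104
expected_total = 520   # blogs due by the deadline
last_week = 246        # submitted/updated Mar. 31 - Apr. 4 (assumes all 520 arrive)

blogs_per_student = expected_total / students
last_week_share = last_week / expected_total * 100

print(f"{blogs_per_student:.0f} blogs per student")
print(f"{last_week_share:.1f}% of all blogs land in the last week")
```

Running it gives 5 blogs per student and a last-week share of 47.3%.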

Figure 1. No blogs were submitted during midterm weeks (approx. Feb.14-20). There was a slight increase during Reading Week (Mar.3-7). See those huge spikes from just days before the deadline? ☺

Next, let’s take a look at the length of these blogs. The majority of them fall within the 2-3 minute range, which works out to about 390-635 words on average (calculated by taking the mean of 20 random 2-min and 3-min blogs, respectively).

Last but not least, what words are used most often in our collection?

*Keep in mind that all function words such as “the” and “a” are excluded from the list, as well as any word with a frequency lower than 60.

**Brackets indicate that the frequencies of words with the same stem have been combined together.

  • test — 2406 (test/tests/testing)
  • psychology — 814 (psychology/psychological)
  • personality — 418
  • intelligence — 271 (intelligence/IQ)
  • disorder — 236 (disorder/disorders)
  • mental — 212
  • construct — 205 (construct/constructs)
  • validity — 193
  • pain — 166
  • social — 160
  • learn — 153 (learn/learning)
  • reliability — 141
  • health — 130
  • blog — 123
  • sleep — 115
  • job — 114
  • anxiety — 105
  • attachment — 101
  • cognitive — 89
  • sexual — 88
  • love — 83
  • language — 72
  • MBTI — 67
  • ADHD — 67
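A count like this can be sketched in a few lines of Python. This is only an illustration of the general idea (drop function words, fold variants into one entry, keep words above a cutoff), not the script actually used for this post. The stop-word list, the `STEM_GROUPS` mapping, and the `word_frequencies` function are all hypothetical; the variant groups are copied from the bracketed entries in the list above.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "are"}

# Variants folded into one entry, following the bracketed groups in the list.
STEM_GROUPS = {
    "tests": "test", "testing": "test",
    "psychological": "psychology",
    "iq": "intelligence",
    "disorders": "disorder",
    "constructs": "construct",
    "learning": "learn",
}

def word_frequencies(texts, min_freq=60):
    """Count content words across texts, merging variants and dropping rare words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in STOP_WORDS:
                continue
            counts[STEM_GROUPS.get(word, word)] += 1
    # Keep only words at or above the cutoff, most frequent first.
    return [(w, n) for w, n in counts.most_common() if n >= min_freq]

# Two made-up sentences standing in for the blog texts.
sample = ["The test is testing the tests.", "IQ tests and intelligence."]
print(word_frequencies(sample, min_freq=2))
# prints [('test', 4), ('intelligence', 2)]
```

With the default cutoff of 60 the toy sample returns nothing, which is why the example lowers `min_freq` to 2.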

The two most popular words are self-explanatory (it’s the name of the course!). In third position, with a frequency of 418 (that’s on average more than once in every blog!), we have “personality”. Apparently students of PSYC 406 really enjoy discussing personality tests. Then there are “intelligence” and “mental” “disorder” (I suppose the latter two words usually come as a phrase). The “construct” of a test, as well as its “validity” and “reliability”, get their share of popularity too. The methods we use to measure “pain” also appear to be an interesting topic. I find it quite amusing that the word “blog” is mentioned 123 times; I guess I am not the only one to view this project as somewhat of a social psychology experiment. Another thing that caught my eye is that “sleep”, “job” and “anxiety” have similar frequencies and are grouped closely together. The same grouping pattern is seen in “attachment”, “sexual” and “love”. Language tests are discussed by some bloggers too, and finally MBTI tests and ADHD are both mentioned 67 times.

Now that I have presented all of my data, I am not really sure what to make of it. Perhaps just as a visual summary of what the class of PSYC 406 has done in a semester? Or it could be a guideline to future PSYC 406 students on the blogging project? A warning to not procrastinate on the blogs like we have? All I know is that I have certainly had A LOT of fun reading your blogs and doing this data analysis :D

To wrap it up, here’s a pretty word cloud generated from OUR data ☺
