The Type of My Activity on Reddit

In general, there are two main types of activity on Reddit: creating new content or interacting with someone else’s existing content. One could argue that commenting on someone else’s post is also an interaction with existing content, but for the sake of simplicity I will count it as content creation in this analysis, because a comment is new content in its own right, with its own separate score and sub-comments. Therefore, I will focus on the posts and comments that I have created and the posts and comments that I have voted on.

In light of the data I have collected, I wanted to compare these two types of activity in terms of what leads me to do one rather than the other. After some exploration, I decided to focus on the following three questions:

  1. Does the subreddit itself have any effect on the type of my activity?
  2. Does the type of a subreddit, or its tags, have any effect on the type of my activity?
  3. Is there a significant difference between me voting or creating on a subreddit that I am not subscribed to?


The Subreddits

First, let’s take a look at the subreddits that I have interacted with. I want to see whether there is any connection between the subreddit and the type of my activity, that is, whether the subreddits I create content on differ significantly from the subreddits I vote on. I want to test the following hypotheses:
Null Hypothesis: The subreddit and the type of my activity are independent.
Alternative Hypothesis: The subreddit and the type of my activity are dependent.

Figure 4.1 - Created and Voted Counts by Subreddits

After combining the “created” and “voted” data in a crosstable, I performed a Chi-Square Test at a significance level of 0.05. The test resulted in a p-value of 1.39E-11, which is far below the significance level. The test therefore rejects the null hypothesis, which means that the subreddit and the type of my activity are dependent. However, as we can see from the figure, the number of “vote” interactions is much higher than the number of “create” interactions. So I also calculated Cramer’s V to see the strength of the relationship. The result was 0.214, which indicates a moderate strength.
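As a rough illustration of what such a test computes, here is a minimal sketch that uses only the Python standard library. The tooling actually used for the analysis is not stated, so the function names and the counts below are assumptions made up for this example; in practice a library routine such as `scipy.stats.chi2_contingency` would normally be used instead of the hand-written p-value.

```python
import math


def chi2_survival(x, dof):
    """P(X > x) for a chi-square variable with `dof` degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms exist for dof = 1 and dof = 2; each further pair of
    # degrees of freedom adds one term of the incomplete-gamma recurrence.
    if dof % 2:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < dof:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def chi_square_independence(table):
    """Pearson chi-square test of independence for a table of counts.

    `table` is a list of rows, e.g. one row per subreddit with the columns
    [created, voted]. It needs at least two rows and two columns, and no
    row or column may sum to zero.
    Returns (chi2, dof, p_value, cramers_v).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    smaller_dim = min(len(table) - 1, len(col_totals) - 1)
    dof = (len(table) - 1) * (len(col_totals) - 1)
    cramers_v = math.sqrt(chi2 / (n * smaller_dim))
    return chi2, dof, chi2_survival(chi2, dof), cramers_v


# Made-up counts for illustration only: [created, voted] per subreddit.
counts = {
    "r/example_a": [12, 340],
    "r/example_b": [30, 45],
    "r/example_c": [5, 210],
}
chi2, dof, p_value, cramers_v = chi_square_independence(list(counts.values()))
print(f"chi2={chi2:.2f} dof={dof} p={p_value:.3g} V={cramers_v:.3f}")
```

The null hypothesis is rejected when the returned p-value is below the chosen significance level (0.05 here), while Cramer's V is read separately as the strength of the association.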

I also wanted to see the relationship between the “voted” count and the “created” count for each subreddit. The following figure shows the ratio of the voted count to the total count for each subreddit. Note that the figure is sorted by the total number of interactions.

The slider can be used to hide or show subreddits according to a minimum number of total interactions.
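The ratio and the slider filter described above could be computed along these lines. This is only a sketch: the function name `voted_ratios`, the `min_total` parameter and the counts are hypothetical and are not taken from the actual analysis code.

```python
def voted_ratios(counts, min_total=0):
    """Return (subreddit, total, voted_ratio) rows, largest total first.

    `counts` maps a subreddit name to a (created, voted) pair.
    `min_total` plays the role of the slider: subreddits with fewer
    total interactions are hidden.
    """
    rows = []
    for name, (created, voted) in counts.items():
        total = created + voted
        if total == 0 or total < min_total:
            continue
        rows.append((name, total, voted / total))
    # Sort by the total number of interactions, as in the figure.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Made-up counts for illustration only: (created, voted) per subreddit.
counts = {
    "r/example_a": (12, 340),
    "r/example_b": (30, 45),
    "r/example_c": (0, 3),
}
for name, total, ratio in voted_ratios(counts, min_total=10):
    print(f"{name}: total={total} voted ratio={ratio:.2f}")
```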

Figure 4.2 - Voted Ratio by Subreddits

This figure also clearly displays the sheer size difference between the two types. However, now we can see some interesting details. It is clear that as the number of interactions decreases, I tend to just vote. These are probably subreddits that I am not active on or even subscribed to, such as r/meirl.

However, there are some subreddits that are noticeably close to a 0.5 ratio (excluding the ones with a very low number of interactions), such as r/flashcarts, r/NintendoDSi, and r/webdev, all of which are mainly question-and-answer subreddits. This might be because I tend to ask questions or make comments on these subreddits rather than voting and moving on. I think it is also interesting to note that I am not subscribed to any of these three subreddits.



The Tags

I wanted to run almost exactly the same analysis, but this time with the tags, to see if there is any relationship between the tags and my activity. To test for a dependency between the tags and the type of my activity, I again performed a Chi-Square Test at a significance level of 0.05 on the following hypotheses:
Null Hypothesis: The tag and the type of my activity are independent.
Alternative Hypothesis: The tag and the type of my activity are dependent.

The following figure shows the distribution of the total number of "create" and "vote" interactions for each tag.

Figure 4.3 - Created and Voted Counts by Tags

The test rejected the null hypothesis with a p-value of 4.66E-6, which means that the tag and the type of my activity are not independent. However, the effect size is very small, with a Cramer’s V of 0.09. This means that the tag is not a good predictor of my activity type.
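For reference, the usual textbook definition of Cramer's V helps explain how a tiny p-value and a small effect size can occur together. The symbols are standard notation rather than names from the analysis: n is the total number of interactions in the crosstable, r the number of rows (tags) and c the number of columns (here the two activity types).

```latex
V = \sqrt{\frac{\chi^2}{n \,\min(r-1,\; c-1)}}
\qquad\Longrightarrow\qquad
V = \sqrt{\frac{\chi^2}{n}}
\quad \text{when } c = 2 \text{ and } r \ge 2
```

Rearranged, this gives \chi^2 = n V^2, so for a fixed V the test statistic grows with the number of interactions. With enough data, even a weak association can produce a very small p-value.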
Again, this result is highly affected by the huge difference between the numbers of “create” and “vote” interactions. So I also calculated the ratio of the “vote” count to the total count for each tag. The following figure shows the result. Note that the figure is sorted by the total number of interactions.

Figure 4.4 - Voted Ratio by Tags

This figure also makes it clear that for any tag I vote far more than I create: votes make up around 90% of the interactions. This shows the effect of the size difference on the previous test. Also, compared to subreddits, tags show almost no variation in the ratio. This might be because tags are more general and there are only eight of them for every subreddit, so in aggregation they tend to become more uniform, pulled towards the much larger number of voting interactions.
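The aggregation effect can be illustrated with a small sketch. The helper name `pooled_voted_ratio` and the counts are made up for this example, and it assumes that a tag's counts are simply the sums over the subreddits carrying that tag. Under that assumption the pooled ratio is a weighted average of the subreddit ratios, so it lands close to the ratio of the busiest subreddit.

```python
def pooled_voted_ratio(pairs):
    """Voted ratio after summing (created, voted) pairs, e.g. over one tag."""
    created = sum(c for c, _ in pairs)
    voted = sum(v for _, v in pairs)
    return voted / (created + voted)


# Made-up (created, voted) pairs for three subreddits sharing one
# hypothetical tag: two small subreddits and one large, vote-heavy one.
subreddits = [(20, 20), (1, 9), (30, 970)]

individual = [voted / (created + voted) for created, voted in subreddits]
print("per subreddit:", [round(r, 2) for r in individual])
print("pooled by tag:", round(pooled_voted_ratio(subreddits), 2))
```

The small subreddit with an even split barely moves the pooled value, which is one way the per-subreddit variation could disappear at the tag level.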



The Subscriptions

Lastly, I wanted to see if there is any significant difference between me voting or creating on a subreddit that I am not subscribed to. So, I have performed a Chi-Square Test with a significance level of 0.05 again to test the following hypotheses:
Null Hypothesis: The subscription and the type of my activity are independent.
Alternative Hypothesis: The subscription and the type of my activity are dependent.

Figure 4.5 - Created and Voted Counts by Subscriptions

This time the test was not able to reject the null hypothesis. The resulting p-value was 0.66, which is far above the significance level. So the test gives no evidence that the subscription and the type of my activity are dependent. Considering the two previous tests, this result is not surprising, because the subscription is like an even more general version of the tag system, with only two options.
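Because this crosstable has only two rows and two columns, the test reduces to a short closed form. The sketch below assumes no continuity correction (some libraries apply Yates' correction to 2x2 tables by default, and it is not stated which variant was used here); the function name and the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p_value). No continuity correction is applied, and
    every row and column total must be non-zero.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts for illustration only.
# Rows: subscribed / not subscribed. Columns: created / voted.
chi2, p_value = chi_square_2x2(90, 910, 22, 205)
alpha = 0.05
print(f"chi2={chi2:.3f} p={p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Failing to reject the null hypothesis only means that the data show no detectable dependence at the chosen level; it does not prove that the two variables are independent.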

Figure 4.6 - Voted Ratio by Subscriptions

However, together with the ratio figure, which again shows a voting ratio of about 90%, this test's result was the most surprising for me personally. For some reason I had expected a higher ratio of "creation" in the subreddits that I am not subscribed to, because I would be more likely to ask questions or make comments in those subreddits while almost never voting, since they are not on my feed. Since the test and figures contradict my first thoughts, I decided to think about my own behavior and try to find a reason for this result. I have come up with a possible one.

There are actually some subreddits, such as r/BaldursGate3, that I was subscribed to for a considerable amount of time and interacted with a lot, but then at some point I lost interest and unsubscribed from them to clear my feed. Since the data I have collected does not include any information on my past subscriptions, these subreddits are counted as not subscribed even though there is a history of subscription and high interaction.