My Votes on Reddit
Reddit also provided a dataset of all my votes, covering both posts and comments. Voting is essentially a like/dislike system: I can upvote or downvote any post or comment. The data also records what I voted on, so I wanted to see whether there was any pattern in my voting behavior.
Looking at the total number of votes I have made in each direction, it can be seen that I almost never downvote, so it would be hard to find patterns from the vote counts alone. To work around this, I used the Reddit API to collect the total score of each item I voted on, which is the number of upvotes minus the number of downvotes. This lets me instead form a hypothesis about the general popularity of the items I voted on.
The Total Score
One analysis that quickly comes to mind is to check whether I follow the general trend, in other words whether I tend to upvote more popular items and downvote less popular ones. One way of looking into this is to compare the total scores of the items I upvoted with those of the items I downvoted. The following figure shows the distribution of the total score for both groups. Note that the data includes huge outliers and irregular distributions, because subreddits differ greatly in size, so I used a log scale for the x-axis.
I formed the following hypotheses:
Null Hypothesis: My upvote or downvote is not correlated with the total score of the post or comment.
Alternative Hypothesis: I vote up on posts and comments that have a higher score and down on posts and comments that have a lower score.
I then performed a t-test at a significance level of 0.05 to see whether I could reject the null hypothesis. The resulting p-value was large, 0.81, so I cannot reject the null hypothesis.
However, this test might be a bit problematic, because there is still a huge difference between the number of upvotes and the number of downvotes I have made. Also, because subreddits differ in activity level and popularity, and because of the feed display algorithm, the items I voted on tend to stay within a certain range of total score.
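As a rough illustration of this kind of comparison, here is a minimal sketch using scipy's ttest_ind. It assumes a two-sided Welch's t-test (unequal variances), which may not be the exact variant I ran, and the function name compare_scores and the scores in it are made up for the example rather than taken from my vote data.

```python
import numpy as np
from scipy import stats


def compare_scores(up_scores, down_scores, alpha=0.05):
    """Compare the mean scores of upvoted and downvoted items.

    Assumption: a two-sided Welch's t-test, chosen here because the
    two groups differ a lot in size and possibly in variance.
    """
    up = np.asarray(up_scores, dtype=float)
    down = np.asarray(down_scores, dtype=float)
    # equal_var=False selects Welch's variant of the t-test.
    result = stats.ttest_ind(up, down, equal_var=False)
    reject_null = bool(result.pvalue < alpha)
    return float(result.statistic), float(result.pvalue), reject_null


# Made-up scores for illustration only; not the data from this post.
rng = np.random.default_rng(0)
up_scores = rng.lognormal(mean=3.0, sigma=1.5, size=500)
down_scores = rng.lognormal(mean=3.0, sigma=1.5, size=20)

t_stat, p_value, reject_null = compare_scores(up_scores, down_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject null: {reject_null}")
```

With group sizes as unbalanced as mine, the smaller group dominates the uncertainty of the test, which is part of why the result should be read with caution.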
The Tags
Every item on Reddit belongs to a subreddit. So, technically, I could analyze my voting behavior per subreddit; however, looking into each individual subreddit would be a bit messy here. Instead, we can use the tags that are assigned to each subreddit via human annotation (see About for more details). I wanted to see whether there is any relationship between the tags and my voting behavior. To test for a dependency between tag and vote direction, I performed a Chi-Square Test at a significance level of 0.05 using scipy’s chi2_contingency.
The following figure shows the distribution of the total number of upvotes and downvotes for each tag.
The test rejected the null hypothesis with a p-value of 8.3E-6, which means that tag and vote direction are not independent. However, the effect size is very small, with a Cramér's V of 0.09. This means that the tag might not be a good predictor of my voting behavior.
One possible reason for this result might again be the difference between the number of upvotes and downvotes I have made. Also, the tags are assigned to subreddits and not to individual posts or comments. So, the tags might generalize the content of the posts or comments a bit too much, which flattens out the differences in voting behavior.
However, a quick direct look at the votes shows that I tend to downvote posts and comments that are tagged as game. One possible reason for this behavior might be the vast diversity of taste in the gaming community, and people's tendency to get into arguments about their own and others' tastes, which leads to downvotes.
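To make the two numbers above concrete, here is a small sketch of how a chi-square test of independence and Cramér's V can be computed from a tag-by-direction contingency table. The helper name chi_square_with_cramers_v and the counts are hypothetical, and V is computed with the standard formula sqrt(chi2 / (n * (min(rows, cols) - 1))) without bias correction, which I am assuming here for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency


def chi_square_with_cramers_v(table):
    """Run a chi-square test of independence and compute Cramér's V.

    `table` holds counts with one row per tag and one column per
    vote direction (upvotes, downvotes).
    """
    observed = np.asarray(table, dtype=float)
    chi2, p_value, dof, _expected = chi2_contingency(observed)
    n = observed.sum()
    # Smaller table dimension minus one; equals 1 for a (tags x 2) table.
    k = min(observed.shape) - 1
    cramers_v = float(np.sqrt(chi2 / (n * k)))
    return float(chi2), float(p_value), int(dof), cramers_v


# Hypothetical counts for illustration only; not the data from this post.
# Rows are tags, columns are (upvotes, downvotes).
counts = [
    [120, 3],
    [340, 25],
    [210, 4],
    [95, 2],
]

chi2, p_value, dof, cramers_v = chi_square_with_cramers_v(counts)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4g}, dof = {dof}, V = {cramers_v:.3f}")
```

A small p-value together with a small V is what a large sample with a weak association tends to produce: the dependency is detectable, but it explains little of the variation in vote direction.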