HackerNews – Project Introduction

This Python project covers several data science areas, including:

  • Data Engineering: an ETL (extract, transform, load) pipeline that retrieves data from the public Hacker News API.
  • Database Design: storing the data in a PostgreSQL database hosted on Microsoft Azure.
  • Data Visualization: building visualizations in Microsoft Power BI.
  • Machine Learning: making predictions from the collected data with machine learning models.
  • Statistical Analysis: comparing two groups of users.
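The README does not show the pipeline itself, so the following is only a minimal sketch of how the three ETL steps could be structured against the public Hacker News API at https://hacker-news.firebaseio.com/v0. Its /topstories endpoint returns up to 500 ranked IDs (job postings included), and /item/<id> returns each item. The helper functions, the table name top_stories, and its columns are hypothetical and are not the project's actual schema.

```python
import json
import urllib.request
from typing import Any

# Public Hacker News API (Firebase); no API key is required.
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def fetch_json(path: str) -> Any:
    """Extract: download one JSON document from the API."""
    with urllib.request.urlopen(f"{BASE_URL}/{path}.json", timeout=10) as resp:
        return json.load(resp)


def extract_top_stories(limit: int = 500) -> list:
    """Extract: /topstories lists up to 500 ranked IDs; fetch each item in order."""
    ids = fetch_json("topstories")[:limit]
    return [fetch_json(f"item/{item_id}") for item_id in ids]


def transform(items: list) -> list:
    """Transform: keep live stories and flatten the fields of interest into rows.

    The rank is taken from the item's position in the original list, so it is
    preserved even when earlier entries (jobs, deleted/dead items) are dropped.
    """
    rows = []
    for rank, item in enumerate(items, start=1):
        if not item or item.get("type") != "story":
            continue  # skip missing items and non-story types such as "job"
        if item.get("deleted") or item.get("dead"):
            continue
        rows.append((
            item["id"],
            rank,
            item.get("by"),
            item.get("title", ""),
            item.get("score", 0),
            item.get("descendants", 0),  # total comment count
            item.get("time"),            # Unix timestamp
        ))
    return rows


def load(conn, rows: list, placeholder: str = "%s") -> None:
    """Load: write rows through any DB-API 2.0 connection.

    psycopg (PostgreSQL) uses %s placeholders; sqlite3 uses ?.
    Hypothetical schema, not the project's actual table.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS top_stories ("
        "id BIGINT, list_rank INTEGER, author TEXT, title TEXT, "
        "score INTEGER, comments INTEGER, posted_at BIGINT)"
    )
    marks = ", ".join([placeholder] * 7)
    cur.executemany(f"INSERT INTO top_stories VALUES ({marks})", rows)
    conn.commit()
```

A weekly run would call `load(conn, transform(extract_top_stories()))` with a PostgreSQL connection (for example from psycopg) and the default `%s` placeholder. Fetching 500 items one request at a time is slow but keeps the sketch simple; batching or concurrency would be a natural next step.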

This project aims to answer the following questions:

  1. Once a user’s post ranks among the top 500 posts on Hacker News, does the user become more active?
  2. Do a post’s attributes and popularity correlate with whether the post makes it into the top 500 in subsequent weeks?
  3. Is there a statistically significant difference in activity between users whose posts are among the top 500 and users whose posts are not?
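The README does not say which test is used for question 3. One assumption-light option is a two-sided permutation test on the difference in mean activity, sketched here with the standard library only. "Activity" is assumed to be a single number per user (for example, a hypothetical count of submissions in a fixed window), and the function name permutation_test is illustrative rather than the project's code.

```python
import random
from statistics import fmean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean activity.

    Returns (observed difference of means, estimated p-value).
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle them and recompute the statistic.
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        # Small tolerance so ties with the observed value are not lost to rounding.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value
```

Called as `permutation_test(top_500_activity, other_activity)`, it returns the mean difference and a p-value to compare against a chosen significance level such as 0.05. A permutation test makes no normality assumption, which may suit skewed per-user activity counts; `scipy.stats.permutation_test` or a Welch t-test are alternatives if SciPy is already a dependency.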