This Python project covers several data science concepts, including:
- Data engineering: using an ETL (extract, transform, load) process to retrieve information from the public Hacker News API.
- Database design with Microsoft Azure Postgres.
- Data visualization with Microsoft Power BI.
- Predictive modeling with machine learning.
- Statistical analysis comparing two groups of users.
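
As a rough illustration of the extract and transform steps, the sketch below reads the current top story IDs from the public Hacker News API and flattens each item into a row. The function names, the chosen columns, and the injectable `fetch` parameter are assumptions made for this example, not the project's actual code; the load step into Postgres is left out.

```python
import json
from urllib.request import urlopen

# Public Hacker News API root; the endpoints below are documented by Hacker News.
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def extract_top_stories(limit=5, fetch=fetch_json):
    """Extract: read the top story IDs, then fetch each item by ID.

    `fetch` is injectable (an assumption of this sketch) so the function
    can be exercised without network access.
    """
    story_ids = fetch(f"{BASE_URL}/topstories.json")[:limit]
    return [fetch(f"{BASE_URL}/item/{story_id}.json") for story_id in story_ids]


def transform(items):
    """Transform: keep a hypothetical set of columns for a posts table."""
    rows = []
    for item in items:
        if not item:  # the API returns null for missing items
            continue
        rows.append(
            {
                "id": item["id"],
                "author": item.get("by"),
                "score": item.get("score", 0),
                "title": item.get("title"),
                "time": item.get("time"),
            }
        )
    return rows
```

Calling `transform(extract_top_stories(limit=5))` would return up to five row dictionaries, ready to be written to a database table.
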
This project aims to answer the following questions:
- Once a user’s post is considered one of the top 500 posts on Hacker News, does the user become more active?
- Are a post’s attributes and popularity correlated with whether it makes the top 500 in subsequent weeks?
- Is there a statistically significant difference in activity between users whose posts are among the top 500 and users whose posts are not?
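
One way the last question could be approached is a two-sided permutation test on the difference in mean activity between the two user groups. This is a minimal sketch using only the standard library; the choice of test, the function name, and its parameters are assumptions for illustration, and the project may use a different method.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)
```

Here `group_a` and `group_b` would hold one activity measure per user, for example posts per week for users with and without a top 500 post. A small p-value suggests the observed difference is unlikely under random group assignment.
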