[Completed] SDS CP #20 - Sentiment Analysis using YouTube Comments

Project Overview

This project performs sentiment analysis on comments collected from YouTube channels. It targets beginner to intermediate-level data scientists and involves building a machine learning pipeline to extract, analyze, and predict the sentiment of comments on YouTube videos. An ETL (Extract, Transform, Load) pipeline orchestrated with Apache Airflow will automate data management, and a machine learning model will be deployed in a Streamlit app for real-time sentiment analysis.

Project Objectives

1. Data Collection and Storage:

  • Use the YouTube API to collect comments from a YouTube channel's recent or relevant videos.

  • Store the collected comments in a structured database for analysis.

2. ETL Pipeline Setup:

  • Build an ETL pipeline using Apache Airflow to automate the process of fetching, cleaning, and storing comment data.

3. Sentiment Analysis Model Development:

  • Perform data preprocessing and exploratory data analysis (EDA).

  • Use a machine learning model from Hugging Face to classify the sentiment of comments as positive, negative, or neutral.

4. Model Deployment:

  • Deploy the sentiment analysis model using Streamlit/Hugging Face to provide an interactive web application for real-time analysis.


Workflow

Phase 1: Setup (1 Week)

  • Follow the Intro to Git & GitHub tutorial on SDS to clone the SDS GitHub repo to your laptop or desktop.

  • Set up an account on Google Developer using the following video for guidance (https://www.youtube.com/watch?v=th5_9woFJmk).

Phase 2: Data Collection (1 Week)

  • Register and authenticate with the YouTube API.

  • Write Python scripts to fetch and store comments.
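As a rough illustration of the fetch-and-store step, here is a minimal sketch that uses only the Python standard library. It assumes the YouTube Data API v3 `commentThreads` endpoint and an API key. The function names, the `comments` table, and its columns are hypothetical choices for this example, not part of the project repo.

```python
import json
import sqlite3
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def fetch_comment_page(video_id, api_key, page_token=None):
    """Request one page (up to 100 threads) of top-level comments."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "key": api_key,
        "maxResults": 100,
        "textFormat": "plainText",
    }
    if page_token:
        params["pageToken"] = page_token
    url = f"{API_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def parse_comments(payload):
    """Flatten an API response into tuples ready for insertion."""
    rows = []
    for item in payload.get("items", []):
        top = item["snippet"]["topLevelComment"]
        snip = top["snippet"]
        rows.append((
            top["id"],
            item["snippet"].get("videoId"),
            snip.get("authorDisplayName"),
            snip.get("textDisplay"),
            snip.get("likeCount", 0),
            snip.get("publishedAt"),
        ))
    return rows


def fetch_all_comments(video_id, api_key, max_pages=5):
    """Follow nextPageToken until the pages run out or max_pages is hit."""
    rows, token = [], None
    for _ in range(max_pages):
        payload = fetch_comment_page(video_id, api_key, token)
        rows.extend(parse_comments(payload))
        token = payload.get("nextPageToken")
        if not token:
            break
    return rows


def store_comments(conn, rows):
    """Insert rows into a (hypothetical) comments table, skipping duplicates."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS comments (
               comment_id TEXT PRIMARY KEY,
               video_id TEXT,
               author TEXT,
               text TEXT,
               like_count INTEGER,
               published_at TEXT
           )"""
    )
    conn.executemany(
        "INSERT OR IGNORE INTO comments VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
```

A typical call would be `store_comments(sqlite3.connect("comments.db"), fetch_all_comments(video_id, api_key))`. The primary key on `comment_id` keeps repeated runs from duplicating rows. SQLite is used here only for convenience, and any structured database would serve the same role.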

Phase 3: ETL Pipeline Development (1 Week)

  • Build Airflow DAGs for automated comment extraction, cleaning, and storage.

  • Test and optimize data loading into a database.
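One possible shape for such a DAG is sketched below, assuming Airflow 2.4 or later and its TaskFlow decorators. The DAG id, the daily schedule, the start date, and the cleaning rules are illustrative assumptions, and the extract and load tasks are stand-ins for the real API call and database write.

```python
import html
import re
from datetime import datetime

try:  # Airflow is optional here so the cleaning helpers can be tried without it
    from airflow.decorators import dag, task
except ImportError:
    dag = task = None

_TAG_RE = re.compile(r"<[^>]+>")
_URL_RE = re.compile(r"https?://\S+")
_WS_RE = re.compile(r"\s+")


def clean_comment(text):
    """Strip HTML tags, decode entities, drop URLs, collapse whitespace."""
    text = _TAG_RE.sub(" ", text)
    text = html.unescape(text)
    text = _URL_RE.sub(" ", text)
    return _WS_RE.sub(" ", text).strip()


def transform_rows(rows):
    """Clean each row's text and drop rows that end up empty."""
    cleaned = []
    for row in rows:
        text = clean_comment(row.get("text", ""))
        if text:
            cleaned.append({**row, "text": text})
    return cleaned


if dag is not None:

    @dag(
        dag_id="youtube_comments_etl",  # hypothetical name
        schedule="@daily",              # assumed cadence
        start_date=datetime(2024, 1, 1),
        catchup=False,
    )
    def youtube_comments_etl():
        @task
        def extract():
            # Stand-in for the real YouTube API call
            return [{"comment_id": "demo-1", "text": "Nice <b>video</b>!"}]

        @task
        def transform(rows):
            return transform_rows(rows)

        @task
        def load(rows):
            # Stand-in for the real database write
            print(f"would load {len(rows)} cleaned comments")

        load(transform(extract()))

    youtube_comments_etl()
```

Keeping the cleaning logic in plain functions outside the tasks makes it easy to unit test without a running scheduler. Passing rows between tasks this way relies on XCom, which suits small batches. For larger volumes, writing to intermediate storage between tasks is a common alternative.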

Phase 4: Sentiment Analysis Model (1 Week)

  • Conduct EDA and preprocess the comment data.

  • Evaluate a sentiment analysis model.
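A minimal sketch of the evaluation step follows, assuming a small hand-labeled sample of comments is available. The model name is one publicly available three-class sentiment model chosen as an assumption for this example, since the project does not prescribe one. The helper names are hypothetical.

```python
from collections import Counter

# Assumed model choice; any positive/negative/neutral classifier would do.
MODEL_NAME = "cardiffnlp/twitter-roberta-base-sentiment-latest"


def load_classifier(model_name=MODEL_NAME):
    """Return a function mapping a list of texts to lowercase labels."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import pipeline

    pipe = pipeline("sentiment-analysis", model=model_name)

    def classify(texts):
        results = pipe(list(texts), truncation=True)
        return [r["label"].lower() for r in results]

    return classify


def evaluate(classify, texts, gold):
    """Compare predictions against hand-labeled gold sentiments."""
    if not gold or len(texts) != len(gold):
        raise ValueError("texts and gold must be non-empty and equal length")
    preds = classify(texts)
    correct = sum(p == g for p, g in zip(preds, gold))
    errors = [(t, g, p) for t, g, p in zip(texts, gold, preds) if p != g]
    return {
        "accuracy": correct / len(gold),
        "predicted": Counter(preds),
        "errors": errors,
    }
```

With the model loaded, `evaluate(load_classifier(), texts, gold)` would return overall accuracy, the predicted label distribution, and the misclassified comments for manual review. Accuracy alone can mislead when one sentiment dominates the sample, so inspecting the error list is worthwhile.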

Phase 5: Model Deployment (1 Week)

  • Build a Streamlit app with input options for choosing which YouTube channel or video comments to retrieve.

  • Deploy the app on Streamlit/Hugging Face.
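The app itself could look roughly like the sketch below. It assumes two callables are supplied from earlier phases: `get_comments`, which takes a video ID and returns a list of comment strings, and `classify`, which takes that list and returns one label per comment. Both names, and the page layout, are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id):
    """Accept a watch URL, a youtu.be short link, or a bare video ID."""
    value = url_or_id.strip()
    parsed = urlparse(value)
    if parsed.netloc.endswith("youtu.be"):
        return parsed.path.lstrip("/")
    if "youtube.com" in parsed.netloc:
        return parse_qs(parsed.query).get("v", [""])[0]
    return value


def sentiment_share(labels):
    """Return the fraction of comments carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def main(get_comments, classify):
    """Render the page; get_comments and classify are injected callables."""
    import streamlit as st  # imported lazily so the helpers work without it

    st.title("YouTube Comment Sentiment")
    raw = st.text_input("YouTube video URL or ID")
    if st.button("Analyze") and raw:
        video_id = extract_video_id(raw)
        comments = get_comments(video_id)
        labels = classify(comments)
        for label, share in sentiment_share(labels).items():
            st.metric(label, f"{share:.0%}")
        st.dataframe({"comment": comments, "sentiment": labels})
```

Saved as `app.py` with a final `main(...)` call that passes in the real fetch and classify functions, this would be launched with `streamlit run app.py`.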


Link to GitHub: https://github.com/SuperDataScience-Community-Projects/SDS-CP020-sentiment-analysis-using-youtube
