
Web Scraping

June 23, 2023


YouTube Scraper

Data Retrieval

Streamlit App

YouTube API Scraper Streamlit App

The app includes:

  • Basic info about the channel
  • Scrollable table for sorting channel videos by views, likes, or upload date
  • Distribution/box plots for video likes/views
  • Scatter plot of views vs. likes with a regression line
  • Views/Likes by year
  • Time-series plot of videos by upload date

LuLu Market Place Scraper

Data Retrieval

  • Using the Scrapy library, product details were scraped for the products listed in the various sub-categories of the electronics section.
  • The scraped data is saved in a JSON file as a list of dictionaries, where each dict holds a product's id, name, price, URL, and summary.
  • The spider code that extracts this data is in lulu_product_details.py.
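
A minimal sketch for reading the spider's output back in and checking its shape. It assumes the JSON keys are literally `id`, `name`, `price`, `url`, and `summary`; the write-up names these fields but not the exact keys. One common way to produce such a file is Scrapy's feed export, e.g. `scrapy crawl <spider_name> -O products.json`, where the spider and file names are placeholders.

```python
import json

# Assumed key names: the write-up lists "product id, name, price, url, and summary"
# but not the exact keys used in the JSON file.
REQUIRED_KEYS = {"id", "name", "price", "url", "summary"}

def parse_products(raw):
    """Parse the spider's JSON output (a list of dicts) and check each record's keys."""
    products = json.loads(raw)
    if not isinstance(products, list):
        raise ValueError("expected a JSON list of product dicts")
    for i, item in enumerate(products):
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"product {i} is missing keys: {sorted(missing)}")
    return products
```

Usage would be along the lines of `parse_products(open("products.json", encoding="utf-8").read())`.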

Streamlit App

LuLu Electronics Streamlit App

The app includes:

  • Category distribution
  • Price analysis
  • Word cloud for each sub-category based on product descriptions
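
The word clouds and price analysis can be sketched as plain aggregations. This assumes each product record also carries its sub-category under a hypothetical `category` key (the scraped fields listed above do not include one), and it uses a small illustrative stop-word list rather than the app's actual one.

```python
import re
from collections import Counter, defaultdict
from statistics import mean, median

# Small illustrative stop-word list (an assumption, not the app's actual list).
STOP_WORDS = {"a", "an", "and", "the", "with", "for", "of", "in", "to"}

def word_frequencies(products):
    """Count description words per sub-category; a word cloud is drawn from such counts."""
    freqs = defaultdict(Counter)
    for p in products:
        words = re.findall(r"[a-z0-9]+", p["summary"].lower())
        freqs[p["category"]].update(w for w in words if w not in STOP_WORDS)
    return dict(freqs)

def price_summary(products):
    """Min, median, mean and max price per sub-category."""
    prices = defaultdict(list)
    for p in products:
        prices[p["category"]].append(p["price"])
    return {
        cat: {"min": min(v), "median": median(v), "mean": mean(v), "max": max(v)}
        for cat, v in prices.items()
    }
```

If the `wordcloud` package is used for rendering, each per-category `Counter` can be passed to `WordCloud().generate_from_frequencies(...)`.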

Yahoo Finance Stock Market Data Mining

Data Retrieval

  • This crawler uses Scrapy to crawl finance.yahoo.com and collects information on stocks in various sectors, including healthcare, technology, energy, and more. For each stock it collects fields such as the ticker, name, price, and change in price, and stores them.

    • Example data, collected on 1 August 2023, is included.
    • To practise storing data in PostgreSQL and MongoDB on the local machine with Scrapy, two pipeline scripts were written, along with the necessary additions to settings.py.
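
Scrapy item pipelines are plain classes with `open_spider`, `process_item`, and `close_spider` hooks. The sketch below shows the general shape of a SQL pipeline, using the standard-library `sqlite3` module as a stand-in for PostgreSQL so it runs without a database server. The class name, table layout, and item keys (`ticker`, `name`, `intraday_price`, `price_change`, `percent_change`, `sector`) are assumptions, not the project's actual scripts.

```python
import sqlite3

class SQLiteStockPipeline:
    """Hypothetical Scrapy item pipeline that writes each stock item to a SQL table.

    sqlite3 stands in for PostgreSQL here so the sketch runs without a server.
    """

    def __init__(self, db_path="stocks.db"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection, create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS stocks (
                   ticker TEXT PRIMARY KEY,
                   name TEXT,
                   intraday_price REAL,
                   price_change REAL,
                   percent_change REAL,
                   sector TEXT
               )"""
        )

    def process_item(self, item, spider):
        # Called for every scraped item; replace any existing row for the same ticker.
        self.conn.execute(
            "INSERT OR REPLACE INTO stocks VALUES (?, ?, ?, ?, ?, ?)",
            (item["ticker"], item["name"], item["intraday_price"],
             item["price_change"], item["percent_change"], item["sector"]),
        )
        return item  # hand the item on to any later pipelines

    def close_spider(self, spider):
        # Called once when the spider finishes: persist and release the connection.
        self.conn.commit()
        self.conn.close()
```

Scrapy would enable it through an entry such as `ITEM_PIPELINES = {"<project>.pipelines.SQLiteStockPipeline": 300}` in settings.py (module path hypothetical). A PostgreSQL version would open the connection with a driver such as psycopg2 (which uses `%s` placeholders instead of `?`), and a MongoDB version would write each item with pymongo's `insert_one` or `update_one`.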

Streamlit App

  • A Streamlit app was developed to visualize the collected data. It includes the following visualizations:
    • Top 10 companies for a selected sector and numerical column ('avg_vol_3_month', 'intraday_price', 'market_cap', 'pe_ratio_ttm', 'percent_change', 'price_change', 'volume')
    • Data distribution and boxplot for selected sector and numerical column
    • Donut chart showing % distribution of companies within a sector based on a numerical column divided into 6 ranges.
    • Scatter plot for Market Cap vs. PE Ratio for a selected sector. The size of a point is determined by average volume
    • Scatter matrix showing pairwise relationships between numerical columns
    • Violin Plot showing distribution of selected numerical column by sector
    • Bar Chart showing average volume by sector
    • Pie Chart showing the distribution of companies by sector
    • Heatmap representing the correlation matrix between numerical columns
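
Two of the computations behind these charts can be sketched with the standard library: picking the top 10 companies for a sector and column, and splitting a column into 6 ranges for the donut chart. Equal-width ranges are an assumption (quantile bins would be another reasonable reading), and the `sector` key is hypothetical. In pandas, the equivalents would be `DataFrame.nlargest` and `pd.cut(..., bins=6)`.

```python
import heapq
from collections import Counter

def top_n(rows, sector, column, n=10):
    """Return the n companies in `sector` with the largest value in `column`."""
    # Skip rows where the selected column is missing.
    in_sector = [r for r in rows if r["sector"] == sector and r[column] is not None]
    return heapq.nlargest(n, in_sector, key=lambda r: r[column])

def range_shares(values, bins=6):
    """Split values into `bins` equal-width ranges and return the % of values in each.

    Returns a list of ((low, high), percent) pairs, the data a donut chart needs.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    # Clamp so the maximum value falls into the last range instead of a 7th one.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    return [
        ((lo + i * width, lo + (i + 1) * width), 100 * counts[i] / len(values))
        for i in range(bins)
    ]
```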
