Web Scraping and EDA with Python

Web Scraping a Real Website with Python & Exploratory Data Analysis of the Highest-Ranked United Kingdom Companies by Forbes

Introduction

We scraped data from Wikipedia using two Python libraries, Beautiful Soup and requests, and then performed exploratory data analysis on the United Kingdom's 50 highest-ranked companies in the 2021 Forbes ranking.
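A minimal sketch of the scraping step, assuming the data sits in a standard Wikipedia `wikitable` and that the table's first row holds the column headers. The function names `fetch_html` and `parse_wikitable` are hypothetical, and the page URL is left as a parameter because the exact article is not named here.

```python
from __future__ import annotations

import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page with requests; a descriptive User-Agent is polite for Wikipedia."""
    response = requests.get(url, headers={"User-Agent": "eda-tutorial/0.1"}, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text


def parse_wikitable(html: str, table_index: int = 0) -> tuple[list[str], list[list[str]]]:
    """Return (headers, rows) from the Nth <table class="wikitable"> in the HTML.

    Assumes the first <tr> contains the <th> header cells.
    """
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed
    table = soup.find_all("table", class_="wikitable")[table_index]
    rows = table.find_all("tr")
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    body = []
    for tr in rows[1:]:
        # Some wikitables put the row label in a <th>, so collect both cell types.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:  # skip empty spacer rows
            body.append(cells)
    return headers, body
```

For example, `headers, rows = parse_wikitable(fetch_html(url))` returns the header labels and one list of cell strings per company. Splitting download from parsing keeps the parser testable against a saved HTML file without hitting the network.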

The scraping produced a dataset that records each company's headquarters location and industry, along with its 2021 revenue, profit, assets, and market value in billions of US dollars.
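Scraped cells arrive as strings, often with thousands separators, currency symbols, and footnote markers. The sketch below shows one way to turn the rows into a pandas DataFrame with numeric columns in billions of US dollars; the column names `Revenue`, `Profit`, `Assets`, and `Value` and the helper `to_dataframe` are assumptions for illustration, not taken from the original notebook.

```python
import pandas as pd

# Assumed column names; adjust them to match the scraped table's headers.
NUMERIC_COLUMNS = ["Revenue", "Profit", "Assets", "Value"]


def to_dataframe(headers: list, rows: list) -> pd.DataFrame:
    """Build a DataFrame and coerce the financial columns (US$ billions) to floats."""
    df = pd.DataFrame(rows, columns=headers)
    for col in NUMERIC_COLUMNS:
        cleaned = (
            df[col]
            .str.replace("\u2212", "-", regex=False)   # Unicode minus sign -> ASCII hyphen
            .str.replace(r"\[.*?\]", "", regex=True)    # drop footnote markers such as [1]
            .str.replace(r"[^0-9.\-]", "", regex=True)  # drop $, commas and spaces
        )
        # Cells that still cannot be parsed (e.g. empty) become NaN instead of raising.
        df[col] = pd.to_numeric(cleaned, errors="coerce")
    return df
```

Wikipedia tables may write negative numbers with the Unicode minus sign (U+2212), so the helper converts it before stripping other characters; otherwise a loss would silently turn into a profit.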

Next, we moved on to exploratory data analysis. Using the pandas and seaborn libraries, we explored the distributions of revenue, profit, assets, and value across these companies, counted the companies in each industry and location, and examined the correlations between the financial variables.
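The analysis steps described above map onto a handful of pandas calls. This is an illustrative sketch rather than the author's notebook; it assumes columns named `Industry`, `Headquarters`, `Revenue`, `Profit`, `Assets`, and `Value`, and the helper name `summarize` is hypothetical. Seaborn charts (for example `seaborn.histplot` for the distributions and `seaborn.heatmap` for the correlation matrix) would be drawn from these same tables.

```python
import pandas as pd

FINANCIALS = ["Revenue", "Profit", "Assets", "Value"]  # assumed column names


def summarize(df: pd.DataFrame) -> dict:
    """Numeric summaries behind the EDA: spread, skew, counts and correlations."""
    return {
        # count, mean, std, min, quartiles and max for each financial column
        "describe": df[FINANCIALS].describe(),
        # sample skewness; a value > 0 indicates a long right tail (positive skew)
        "skew": df[FINANCIALS].skew(),
        # number of companies per industry and per headquarters location
        "by_industry": df["Industry"].value_counts(),
        "by_location": df["Headquarters"].value_counts(),
        # pairwise Pearson correlation matrix of the financial variables
        "corr": df[FINANCIALS].corr(),
    }
```

`value_counts` sorts in descending order by default, so the first entries of `by_industry` and `by_location` are the most common industry and headquarters location.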


Here are the key insights from our analysis of the top 50 UK companies listed in the Forbes Global 2000:

Revenue, Profit, Assets, and Value Distributions: Most of the companies have revenue, profit, assets, and value below US$100 billion. All four distributions are positively (right-) skewed, meaning a few companies have extremely high revenue, profit, assets, and value that stretch the tail to the right.
Industry Analysis: 'Banking' has the most companies among the top 50 UK companies, followed by 'Oil & Gas Operations' and 'Insurance'. 'Telecommunications services', 'Pharmaceuticals', 'Conglomerates', and 'Food Retail' have the fewest.
Location Analysis: London is home to the most companies among the top 50 UK companies, followed by Slough and Brentford. Every other location has considerably fewer companies.
Correlation Analysis: 'Revenue' has a strong positive correlation with 'Profit', indicating that companies with higher revenue tend to have higher profit. 'Profit' has a low positive correlation with 'Value', suggesting that profit is only weakly associated with a company's value. 'Profit' and 'Assets' have a moderate positive correlation, while 'Assets' and 'Value' have a low positive correlation.
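The strength words used above (strong, moderate, low) depend on cut-offs that the analysis does not state. A minimal sketch, assuming common rule-of-thumb thresholds of 0.1, 0.3, and 0.7 on |r| (our choice for illustration, not necessarily the author's), shows how each pair in a Pearson correlation matrix could be labeled. The helpers `describe_strength` and `label_pairs` are hypothetical.

```python
from __future__ import annotations

import itertools

import pandas as pd


def describe_strength(r: float) -> str:
    """Label a correlation coefficient using assumed cut-offs of 0.1, 0.3 and 0.7."""
    size = abs(r)
    if size < 0.1:
        return "negligible"
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "low"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


def label_pairs(corr: pd.DataFrame) -> dict[tuple[str, str], str]:
    """Label every distinct pair of variables in a square correlation matrix."""
    return {
        (a, b): describe_strength(corr.loc[a, b])
        for a, b in itertools.combinations(corr.columns, 2)
    }
```

For example, `label_pairs(df[["Revenue", "Profit", "Assets", "Value"]].corr())` yields one label per pair. Moving the thresholds changes the wording, so stating them alongside the results makes the findings reproducible; a correlation label also says nothing about cause and effect.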

These findings describe the characteristics and performance of the top 50 UK companies and can help inform business strategy and decision-making. However, they are based on 2021 data and the current situation may differ, so business decisions should rely on the most recent data available.