Global Market Insights after the Pandemic.
January 21, 2021 by Chris
2020 was a difficult year, and most market segments diverged from their projections! With the new facts at hand and new needs emerging after the pandemic, we should expect the market to change.
Below is an attempt to get a feel for the emerging sectors by analyzing some Global Market Insights reports.
And as always, let's automate ...
Package Installation
!pip install requests beautifulsoup4

import requests
import re
from bs4 import BeautifulSoup
import string
import pandas as pd
class GlobalMarketInsights:
    __DEFAULT_BASE_URL = 'https://www.gminsights.com/industry-reports'

    @staticmethod
    def _escape(text):
        # Replace every character outside printable ASCII with a space.
        printable = string.ascii_letters + string.digits + string.punctuation + ' '
        return ''.join(c if c in printable else ' ' for c in text)

    def _description_matcher(self, descr):
        descr = GlobalMarketInsights._escape(
            descr.replace('\t', ' ').replace('\n', ' ').replace('\r', ' '))
        start_date = end_date = percentage = market_name = None

        # Market name: the text in front of "Market" or "Aftermarket".
        r1 = re.search(r'^(.*) (?:Market|Aftermarket) .*$', descr, re.IGNORECASE)
        if r1:
            market_name = r1.group(1).strip()

        # Period phrased as "between <year> and <year>" or "between <year> to <year>".
        r2 = re.search(r'.* between (\d+) (?:and|to) (\d+)', descr)
        if r2:
            start_date = r2.group(1).strip()
            end_date = r2.group(2).strip()

        # Period phrased as "from <year> to <year>" or "of <year> to <year>".
        r3 = re.search(r'.* (?:from|of) (\d+) to (\d+)', descr)
        if r3:
            start_date = r3.group(1).strip()
            end_date = r3.group(2).strip()

        # First number in the description that is followed by a percent sign.
        r4 = re.search(r'([-+]?\d*\.\d+|\d+)%', descr)
        if r4:
            percentage = r4.group(1)

        if None in (market_name, percentage, start_date, end_date):
            raise Exception(f"Couldn't parse: {descr}")
        else:
            return {
                "market": market_name,
                "percentage": float(percentage),
                "start": int(start_date),
                "end": int(end_date)
            }

    def get(self, page=1):
        response = requests.get(f"{GlobalMarketInsights.__DEFAULT_BASE_URL}?page={page}")
        soup = BeautifulSoup(response.text, 'html.parser')
        single_rds = soup.find_all('div', class_='single_rd')
        reports = []
        for single_rd in single_rds:
            single_rd_children = single_rd.findChildren()
            for single_rd_child in single_rd_children:
                if single_rd_child.has_attr('class') and single_rd_child['class'][0] == 'rd_desc':
                    description = single_rd_child.getText()
                    try:
                        reports.append(self._description_matcher(description))
                    except Exception as e:
                        # Descriptions that do not match any pattern are skipped.
                        # print(e)
                        pass
                    break
        return reports

    def fetch_all_reports(self):
        # get the total number of pages and start iterating
        response = requests.get(f"{GlobalMarketInsights.__DEFAULT_BASE_URL}?page=1")
        lun_q = r'Displaying \d+ records out of (\d+) on Page \d+ of (\d+)'
        r = re.search(lun_q, response.text)
        if r:
            number_of_records = r.group(1)
            number_of_pages = r.group(2)
        else:
            raise Exception('No pages or data!')
        all_reports = []
        for page in range(1, int(number_of_pages) + 1, 1):
            page_reports = self.get(page=page)
            all_reports += page_reports
        return int(number_of_records), all_reports
Scraping web pages is always challenging. In this case especially, the task was a bit tedious since the report descriptions did not follow a single pattern.
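To make the pattern problem concrete, here is a minimal standalone sketch that applies the same regular expressions to a few sample descriptions. The sample sentences are made up for illustration (they are not text scraped from the site), and `parse_description` is a hypothetical helper that returns `None` instead of raising when a field is missing.

```python
import re

# Made-up descriptions for illustration only; these are not scraped text.
SAMPLES = [
    "Food Phosphate Market size is set to grow at 6% CAGR between 2021 and 2027.",
    "Cooking Coconut Milk Market will expand at 8.5% CAGR from 2020 to 2026.",
    "A description with no dates or growth rate.",
]

# The two period phrasings handled by the scraper's regular expressions.
PERIOD_PATTERNS = [
    r'.* between (\d+) (?:and|to) (\d+)',
    r'.* (?:from|of) (\d+) to (\d+)',
]


def parse_description(descr):
    """Return the extracted fields, or None when any of them is missing."""
    name = re.search(r'^(.*) (?:Market|Aftermarket) .*$', descr, re.IGNORECASE)
    rate = re.search(r'([-+]?\d*\.\d+|\d+)%', descr)
    period = None
    for pattern in PERIOD_PATTERNS:
        # A later pattern overrides an earlier one when both match.
        period = re.search(pattern, descr) or period
    if not (name and rate and period):
        return None
    return {
        "market": name.group(1).strip(),
        "percentage": float(rate.group(1)),
        "start": int(period.group(1)),
        "end": int(period.group(2)),
    }


for sample in SAMPLES:
    print(parse_description(sample))
```

The first two samples use different phrasings for the forecast period and both parse; the third matches none of the patterns and is dropped, which is what happens to unparseable descriptions in the scraper.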
global_market_insights = GlobalMarketInsights()
number_of_records, all_reports = global_market_insights.fetch_all_reports()
print(f"Parsed {len(all_reports)} out of {number_of_records} report descriptions!")
Parsed 1200 out of 1964 report descriptions!
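That is roughly 61% of the descriptions. Since failed descriptions are silently skipped, one possible refinement is to keep them for later inspection. The sketch below is only an illustration: `parse_with_failures` and `coverage` are hypothetical helpers, and the built-in `int` stands in for the real description parser.

```python
def parse_with_failures(descriptions, parser):
    """Apply parser to each item; return (parsed results, items that failed)."""
    parsed, failed = [], []
    for description in descriptions:
        try:
            parsed.append(parser(description))
        except Exception:
            # Keep the raw text so the missing pattern can be studied later.
            failed.append(description)
    return parsed, failed


def coverage(parsed_count, total_count):
    """Share of successfully parsed items, in percent."""
    return 100.0 * parsed_count / total_count


# Toy run: int() plays the role of the parser and fails on "x".
parsed, failed = parse_with_failures(["1", "x", "3"], int)
print(parsed, failed)

# Coverage for the numbers reported above.
print(f"{coverage(1200, 1964):.1f}%")
```

Looking at the collected failures would show which phrasings the regular expressions still miss.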
Next, we add the reports to a dataframe for better presentation and easier data manipulation.
gmi_reports_df = pd.DataFrame(all_reports)
gmi_reports_df.head()
| | market | percentage | start | end |
|---|---|---|---|---|
| 0 | Food phosphate | 6.0 | 2021 | 2027 |
| 1 | Supply Chain Analytics | 16.0 | 2021 | 2027 |
| 2 | Cooking Coconut Milk | 8.5 | 2020 | 2026 |
| 3 | Steel Rebar | 4.0 | 2021 | 2027 |
| 4 | 2,5-Dimethyl-2,4-Hexadiene | 2.5 | 2021 | 2027 |
So far, so good! Let's sort by percentage and see which sectors are projected to grow by more than 30% over the coming years.
sector_projection_descending = gmi_reports_df.sort_values('percentage', ascending=False)
sector_projection_descending.loc[(sector_projection_descending['percentage'] > 30) & (sector_projection_descending['start'] >= 2020)]
| | market | percentage | start | end |
|---|---|---|---|---|
| 435 | SD-WAN | 60.0 | 2020 | 2026 |
| 575 | Cannabidiol (CBD) | 52.7 | 2020 | 2026 |
| 393 | (Light Fidelity) Li-Fi | 50.0 | 2020 | 2030 |
| 40 | Healthcare Artificial Intelligence | 43.7 | 2020 | 2026 |
| 153 | Automotive Subscription Services | 40.0 | 2020 | 2026 |
| 964 | AI in Manufacturing | 40.0 | 2020 | 2025 |
| 363 | Artificial Intelligence (AI) in BFSI | 40.0 | 2020 | 2026 |
| 50 | Robotic Process Automation | 40.0 | 2020 | 2026 |
| 207 | Fuel Cell Electric Vehicle | 38.0 | 2020 | 2026 |
| 637 | AI in Automotive | 35.0 | 2020 | 2026 |
| 503 | Artificial Intelligence Chipsets | 35.0 | 2020 | 2026 |
| 105 | Total Knee Replacement | 34.7 | 2020 | 2026 |
| 212 | Vaginal Rejuvenation | 33.7 | 2020 | 2026 |
| 342 | Carbon Wheels | 32.3 | 2020 | 2026 |
Artificial Intelligence clearly stands out: five of the fourteen markets projected above 30% are AI-related, which makes it an attractive area for investments :)
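To put a number on that impression, the market names from the filtered table can be checked for AI keywords. This is a rough sketch: the list is copied by hand from the table above, and the keyword pattern is an assumption about what counts as "AI-related" (it does not count Robotic Process Automation, for example).

```python
import re

# Market names copied from the filtered table above.
TOP_MARKETS = [
    "SD-WAN",
    "Cannabidiol (CBD)",
    "(Light Fidelity) Li-Fi",
    "Healthcare Artificial Intelligence",
    "Automotive Subscription Services",
    "AI in Manufacturing",
    "Artificial Intelligence (AI) in BFSI",
    "Robotic Process Automation",
    "Fuel Cell Electric Vehicle",
    "AI in Automotive",
    "Artificial Intelligence Chipsets",
    "Total Knee Replacement",
    "Vaginal Rejuvenation",
    "Carbon Wheels",
]

# Assumed definition of "AI-related": the standalone token "AI"
# or the phrase "Artificial Intelligence" appears in the name.
AI_PATTERN = re.compile(r'\bAI\b|Artificial Intelligence')

ai_markets = [name for name in TOP_MARKETS if AI_PATTERN.search(name)]
print(f"{len(ai_markets)} of {len(TOP_MARKETS)} markets are AI-related")
for name in ai_markets:
    print(" -", name)
```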