In this article we retrieve data by web scraping and use it to plot some graphs about energy consumption. The bp.com website publishes useful material on the energy sector. First, we install and import everything we need.

url = "https://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy/year-in-review.html"

!pip install requests
!pip install beautifulsoup4
!pip install ipython

from IPython.display import IFrame

# Preview the page inside the notebook.
IFrame(url, width=800, height=450)

import requests
from bs4 import BeautifulSoup

response = requests.get(url)
html = response.content

type(response)

requests.models.Response

response.status_code

200

Here is the beginning of our HTML page:

html[0:1000]

b'\n    <!DOCTYPE HTML>\n    <html lang="en">\n        <head>\n            \n            <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push(\n{\'gtm.start\': new Date().getTime(),event:\'gtm.js\'}\n);var f=d.getElementsByTagName(s)[0],\nj=d.createElement(s),dl=l!=\'dataLayer\'?\'&l=\'+l:\'\';j.async=true;j.src=\n\'https://www.googletagmanager.com/gtm.js?id=\'+i+dl;f.parentNode.insertBefore(j,f);\n})(window,document,\'script\',\'dataLayer\',\'GTM-WJFXK46\');</script>\n            \n            \n    <meta charset="utf-8"/>\n    <meta http-equiv="X-UA-Compatible" content="IE=edge"/>\n    <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>\n    <meta name="keywords" content="Statistical Review of World Energy,Advancing the energy transition,Energy economics,Energy industry,Power generation,Spencer Dale"/>\n    <meta name="description" content="Growth in energy markets slowed in 2019 in line with weaker economic growth and a partial unwinding of some of the one-off factors that boosted energy demand in 2018'

We will use BeautifulSoup to parse the page instead of working on the raw bytes.

soup = BeautifulSoup(html, 'html.parser')
type(soup)

bs4.BeautifulSoup

The parser turns our HTML into a more readable object that we can search.

soup # to see the result

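Then we use a CSS selector to find what we want. Here we want to extract the table "Fuel shares of primary energy and contributions to growth in 2019".

# Select every table cell on the page.
element = soup.select('td')

# A more specific selector matches nothing on this page and returns an empty list.
soup.select("div.field-items table tr td")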

[]

type(element)

list
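The empty list above is worth a note: `select` does not raise an error when nothing matches, it simply returns an empty list. Here is a minimal sketch on a made-up HTML fragment. The fragment and the `div.table-wrapper` class are assumptions for illustration, not the real bp.com markup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real bp.com page uses different markup.
sample_html = """
<div class="table-wrapper">
  <table>
    <tr><td>Oil</td><td>193.0</td></tr>
    <tr><td>Gas</td><td>141.5</td></tr>
  </table>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# A broad selector matches every <td> in the document.
all_cells = sample_soup.select("td")

# A scoped selector only matches cells under the given chain of ancestors.
scoped_cells = sample_soup.select("div.table-wrapper table tr td")

# A selector whose ancestor does not exist returns an empty list, not an error.
missing_cells = sample_soup.select("div.field-items table tr td")

print(len(all_cells), len(scoped_cells), len(missing_cells))
```

So when a selector comes back empty, it is usually the ancestor chain that does not match the page, and falling back to a broader selector such as `'td'` is a reasonable first step.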

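# Keep only the text of each cell.
texte_data = [elements.get_text() for elements in element]

# The first 35 cells (7 rows x 5 columns) make up the table we want.
texte_data = texte_data[0:35]
texte_data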

['\xa0Oil',
 '\xa0193.0',
 '\xa01.6',
 '\xa033.1%\xa0',
 '\xa0-0.2%',
 '\xa0Gas',
 '\xa0141.5',
 '\xa02.8',
 '\xa024.2%',
 '\xa00.2%',
 '\xa0Coal',
 '\xa0157.9',
 '\xa0-0.9',
 '\xa027.0%',
 '\xa0-0.5%',
 '\xa0Renewables*',
 '\xa029.0',
 '\xa03.2',
 '\xa05.0%',
 '\xa00.5%',
 '\xa0Hydro',
 '\xa037.6',
 '\xa00.3',
 '\xa06.4%',
 '\xa00.0%',
 '\xa0Nuclear',
 '\xa024.9',
 '\xa00.8',
 '\xa04.3%',
 '\xa00.1%',
 '\xa0Total',
 '\xa0583.9',
 '\xa07.7',
 '\xa0',
 '\xa0']
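Every cell carries a `\xa0` character, which is a non-breaking space left over from the HTML. As a small sketch of one way to deal with it early (the article itself removes it later with pandas), Python's `str.strip()` treats `\xa0` as whitespace. The `raw_cells` list is just a few values copied from the output above.

```python
# A few raw cells copied from the scraped output.
raw_cells = ["\xa0Oil", "\xa0193.0", "\xa033.1%\xa0", "\xa0-0.2%", "\xa0"]

# str.strip() with no argument removes all leading and trailing Unicode
# whitespace, and "\xa0" (non-breaking space) counts as whitespace.
clean_cells = [cell.strip() for cell in raw_cells]

print(clean_cells)
```

The cells that contained only `\xa0` become empty strings, which makes the missing values of the "Total" row easier to spot.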

# The table has 5 columns, so every 5th cell belongs to the same column.
EnergySource = texte_data[0::5]
ConsumptionExajoules = texte_data[1::5]
AnnualChangeExajoules = texte_data[2::5]
ShareOfPrimaryEnergy = texte_data[3::5]
PercentagePointChangeShare2018 = texte_data[4::5]

EnergySource
ConsumptionExajoules

['\xa0193.0',
 '\xa0141.5',
 '\xa0157.9',
 '\xa029.0',
 '\xa037.6',
 '\xa024.9',
 '\xa0583.9']
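The slices with a step of 5 work because the cells are stored row by row. An equivalent way to see it, sketched here on the first two rows only (values copied from the output, with the `\xa0` removed for readability), is to cut the flat list into rows and then transpose them with `zip`. The names `cells`, `rows` and `columns` are illustrative.

```python
# First two rows of the table, flattened row by row.
cells = ["Oil", "193.0", "1.6", "33.1%", "-0.2%",
         "Gas", "141.5", "2.8", "24.2%", "0.2%"]
n_columns = 5

# Cut the flat list into rows of n_columns cells each.
rows = [cells[i:i + n_columns] for i in range(0, len(cells), n_columns)]

# Transpose the rows to get one list per column.
columns = [list(column) for column in zip(*rows)]

# Each column is exactly what the stepped slice returns.
for offset in range(n_columns):
    assert columns[offset] == cells[offset::n_columns]

print(columns[0], columns[1])
```

The stepped slices are shorter to write; the row-based version makes it easier to check that the number of cells is a multiple of the number of columns.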

import pandas as pd

df = pd.DataFrame({
    "Energy Source": EnergySource,
    "Consumption Exajoules": ConsumptionExajoules,
    "AnnualChangeExajoules": AnnualChangeExajoules,
    "ShareOfPrimaryEnergy": ShareOfPrimaryEnergy,
    "PercentagePointChangeShare2018": PercentagePointChangeShare2018,
})
df

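# The "Total" row (index 6) has no share values, so we put "0" as a placeholder.
# Using .loc avoids pandas' chained-assignment warning.
df.loc[6, "PercentagePointChangeShare2018"] = "0"
df.loc[6, "ShareOfPrimaryEnergy"] = "0"
df.ShareOfPrimaryEnergy[0]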

'\xa033.1%\xa0'

df.ShareOfPrimaryEnergy = df.ShareOfPrimaryEnergy.str.replace("\xa0", "").str.replace("%", "").str.replace(" ", "").astype(float)

df

df.dtypes

Energy Source                      object
Consumption Exajoules              object
AnnualChangeExajoules              object
ShareOfPrimaryEnergy              float64
PercentagePointChangeShare2018     object
dtype: object

df["Consumption Exajoules"]=df["Consumption Exajoules"].astype(float)

df.AnnualChangeExajoules = df.AnnualChangeExajoules.astype(float)

df.PercentagePointChangeShare2018 = df.PercentagePointChangeShare2018.str.replace("%", "").astype(float)

df.PercentagePointChangeShare2018

0   -0.2
1    0.2
2   -0.5
3    0.5
4    0.0
5    0.1
6    0.0
Name: PercentagePointChangeShare2018, dtype: float64
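Filling the empty "Total" cells with "0" makes the conversion to float work, but it also stores a value that was never on the page. As a hedged alternative sketch, `pd.to_numeric` with `errors="coerce"` turns anything that is not a number into `NaN`. The small `raw` Series below is a stand-in built from three values of the scraped column, not the full DataFrame.

```python
import pandas as pd

# Three raw values from the ShareOfPrimaryEnergy column, including the
# empty "Total" cell.
raw = pd.Series(["\xa033.1%\xa0", "\xa024.2%", "\xa0"], name="ShareOfPrimaryEnergy")

# Remove the percent sign and the non-breaking spaces as literal strings.
cleaned = (
    raw.str.replace("%", "", regex=False)
       .str.replace("\xa0", "", regex=False)
)

# Values that cannot be parsed as numbers become NaN instead of raising.
share = pd.to_numeric(cleaned, errors="coerce")

print(share)
```

With this approach the missing values stay missing, and plots or sums simply skip them.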

# regex=False treats "*" as a literal character rather than a regex quantifier.
df["Energy Source"] = (
    df["Energy Source"]
    .str.replace("*", "", regex=False)
    .str.replace("\xa0", "", regex=False)
    .str.replace(" ", "", regex=False)
)

df["Energy Source"]

0           Oil
1           Gas
2          Coal
3    Renewables
4         Hydro
5       Nuclear
6         Total
Name: Energy Source, dtype: object

ax = df.plot(kind='bar', x='Energy Source', figsize=(15,3))
ax.set_title("Energy breakdown by source")
ax.set_ylabel("Exajoule")

Text(0, 0.5, 'Exajoule')

df.to_csv('Consumption.csv')

df

df[0:6].pivot_table(index='Energy Source', values='Consumption Exajoules')

df[0:6].pivot_table(index='Energy Source', values='Consumption Exajoules').plot(kind='pie', subplots=True)

array([<matplotlib.axes._subplots.AxesSubplot object at 0x0000023603EC70B8>],
      dtype=object)

df[0:6].pivot_table(index='Energy Source', values='Consumption Exajoules').plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0x2360871ac50>

df[["Energy Source","Consumption Exajoules"]][0:6]
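The plots above use `df[0:6]` to leave out the "Total" row, since it is the sum of the six sources and would dwarf them. As a quick sanity check on the scraped numbers, the sketch below recomputes the total and the shares from the consumption column. The `consumption` dictionary is typed in from the table; it is not read from `df`.

```python
import math

# Consumption in exajoules for the six sources, copied from the table.
consumption = {
    "Oil": 193.0,
    "Gas": 141.5,
    "Coal": 157.9,
    "Renewables": 29.0,
    "Hydro": 37.6,
    "Nuclear": 24.9,
}
reported_total = 583.9  # value of the "Total" row

# The six sources should add up to the reported total.
total = sum(consumption.values())
assert math.isclose(total, reported_total, abs_tol=1e-6)

# Share of each source in percent, rounded to one decimal like the table.
shares = {source: round(100 * value / total, 1)
          for source, value in consumption.items()}

print(shares)
```

The recomputed shares match the ShareOfPrimaryEnergy column (33.1, 24.2, 27.0, 5.0, 6.4 and 4.3), which suggests the cells were assigned to the right columns.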

df

Webscraping in Energy Sector

Written by Stéphan

df as first built, with the raw scraped strings:

   Energy Source  Consumption Exajoules  AnnualChangeExajoules  ShareOfPrimaryEnergy  PercentagePointChangeShare2018
0            Oil                  193.0                    1.6                 33.1%                           -0.2%
1            Gas                  141.5                    2.8                 24.2%                            0.2%
2           Coal                  157.9                   -0.9                 27.0%                           -0.5%
3    Renewables*                   29.0                    3.2                  5.0%                            0.5%
4          Hydro                   37.6                    0.3                  6.4%                            0.0%
5        Nuclear                   24.9                    0.8                  4.3%                            0.1%
6          Total                  583.9                    7.7

df after cleaning ShareOfPrimaryEnergy:

   Energy Source  Consumption Exajoules  AnnualChangeExajoules  ShareOfPrimaryEnergy  PercentagePointChangeShare2018
0            Oil                  193.0                    1.6                  33.1                           -0.2%
1            Gas                  141.5                    2.8                  24.2                            0.2%
2           Coal                  157.9                   -0.9                  27.0                           -0.5%
3    Renewables*                   29.0                    3.2                   5.0                            0.5%
4          Hydro                   37.6                    0.3                   6.4                            0.0%
5        Nuclear                   24.9                    0.8                   4.3                            0.1%
6          Total                  583.9                    7.7                   0.0                               0

df after all columns are cleaned:

   Energy Source  Consumption Exajoules  AnnualChangeExajoules  ShareOfPrimaryEnergy  PercentagePointChangeShare2018
0            Oil                  193.0                    1.6                  33.1                            -0.2
1            Gas                  141.5                    2.8                  24.2                             0.2
2           Coal                  157.9                   -0.9                  27.0                            -0.5
3     Renewables                   29.0                    3.2                   5.0                             0.5
4          Hydro                   37.6                    0.3                   6.4                             0.0
5        Nuclear                   24.9                    0.8                   4.3                             0.1
6          Total                  583.9                    7.7                   0.0                             0.0

Pivot table of consumption by energy source (Total row excluded):

               Consumption Exajoules
Energy Source
Coal                           157.9
Gas                            141.5
Hydro                           37.6
Nuclear                         24.9
Oil                            193.0
Renewables                      29.0