Retrieve Eneco Energy Data using Selenium-Wire and Power BI
In a previous article, I described how to manually retrieve (hourly) Eneco energy consumption data using Postman. In this article, I will describe how to automate these steps using Selenium Wire and how to load the results into Power BI automatically.
Introduction
The Eneco website offers the possibility to download your energy consumption data. Unfortunately, the lowest level of granularity is ‘day level’, which is a big limitation. We would rather have the data at hourly level, as shown in the dashboards on both the website and the mobile app. To get data at this level, we need to re-run the API call that the website makes in the background. In this article, I will go through the approach and the Python code to do this.
High-level solution — a few words on Selenium Wire
The main advantage of Selenium Wire (Python) is that it not only allows us to select DOM elements and click on buttons, but also gives us access to the underlying web/API requests that are made. This allows us to read the API keys and authorization header values that are sent with these requests. Once we have these header values, we can construct a new API request of our own. From there we can load the data into Power BI.
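To illustrate what this looks like in practice, here is a minimal, stand-alone sketch (not the full script discussed below) of how Selenium Wire exposes the intercepted requests and their headers:
from seleniumwire import webdriver  # note: seleniumwire, not plain selenium

# With recent Selenium versions the driver is resolved automatically;
# otherwise point it at your chromedriver as shown later in this article.
driver = webdriver.Chrome()
driver.get("https://example.com")

# Every request the browser made is available afterwards,
# including its URL and headers.
for request in driver.requests:
    if request.response:
        print(request.url, request.headers.get('authorization'))

driver.quit()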
Contents of this article / the approach
We will start by going through the Jupyter Notebook that helps us test/verify the code. From there we will export the Python code. We will then add a ‘Python Source’ step in Power BI / Power Query so we can load the data into Power BI.
Preparation
In order to run the code you will need to install Selenium Wire, ChromeDriver, and possibly Visual Studio Code to run the Jupyter Notebook.
I recommend using Visual Studio Code to run your Jupyter Notebook. The main advantages that I see are:
- Variables are nicely displayed and one can open DataFrames or lists to inspect the contents. It's also very pleasant that the ‘Type’ is given:
- You can easily collapse the output from cells with the keyboard shortcut ‘O’
I normally develop Python using Anaconda, but since we are calling Python from Power BI, I noticed that it works better with another kernel. We do need to make sure all packages are installed:
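For reference (the exact package list is assumed from the imports used in this article), installing them from a terminal or Anaconda prompt looks roughly like this:
pip install selenium-wire pandas requests matplotlib
(matplotlib is not used by the script itself, but the Power BI Python integration expects it; see the error mentioned near the end of this article.)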
The code / Solution
The Jupyter Notebook can be found here.
Let's go through the code.
The first block imports the required packages. The second block specifies the path to the chromedriver. The chromedriver runs the browser that we can control via Python code. You need to make sure that the path to the chromedriver file that you downloaded is specified correctly.
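The imports are not shown here; based on the code used throughout this article, they will look roughly like this:
from seleniumwire import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests
import json
import pandas as pd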
path = "C:\\Users\\PATH TO YOUR CHROMEDRIVER\\chromedriver.exe"
website = "https://inloggen.eneco.nl/"
In this block we open up the website:
options = webdriver.ChromeOptions()
options.set_capability("goog:loggingPrefs", {'performance': 'ALL'})
# Note: in newer Selenium versions, executable_path and chrome_options are
# deprecated in favour of a Service object and the options= keyword.
driver = webdriver.Chrome(executable_path=path, chrome_options=options)
driver.implicitly_wait(20)  # wait up to 20 seconds for elements to appear
driver.get(website)
From there we wait for certain elements to appear so we can either input our username/password or click on buttons. You will need to specify your username and password in the following code blocks:
try:
    time.sleep(5)
    inlog = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.CSS_SELECTOR, "input#okta-signin-username")))
    inlog.send_keys('your e-mail')
except:
    driver.save_screenshot('fail_login.png')
    driver.quit()
    print("fail")
try:
    time.sleep(5)
    inlog = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.CSS_SELECTOR, "input#input52")))
    inlog.send_keys('YOUR PASSWORD')
except:
    driver.save_screenshot('fail_pwd.png')
    driver.quit()
    print("fail")
By the way, I had to add the time.sleep(5) command to prevent the server from blocking us.
A couple of blocks down, the magic happens. This is where the recent requests are evaluated by searching for API requests with ‘usages’ in the URL (see the previous article to understand why we search for ‘usages’):
for r in driver.iter_requests():
    if str(r).find('usages') > 0:
        # copy the header values from the intercepted 'usages' request
        au = r.headers['authorization']
        apikey = r.headers['apikey']
This allows us to copy the authorization and apikey header values. Once we have these, we can execute a new API call with adjusted parameters (the time period that we are interested in, as described in the previous article):
p.s. scroll to the right to see the header keys: authorization and apikey. Also notice the API call asks for data for a couple of days.
r=requests.get("https://api-digital.enecogroup.com/dxpweb/nl/eneco/customers/41128821/accounts/2/usages?aggregation=Day&interval=Hour&start=2022-12-25&end=2023-01-03&addBudget=false&addWeather=true&extrapolate=false", headers={"authorization":au, "apikey":apikey}, verify=False)
result = r.content
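If you prefer the request to always cover a recent window instead of a hard-coded period, the start and end parameters can be built dynamically. This is a small sketch of my own (the account, aggregation, and other query parameters are simply taken from the call above):
from datetime import date, timedelta

end = date.today()
start = end - timedelta(days=7)  # for example, the last seven days

url = (
    "https://api-digital.enecogroup.com/dxpweb/nl/eneco/customers/41128821/accounts/2/usages"
    f"?aggregation=Day&interval=Hour&start={start}&end={end}"
    "&addBudget=false&addWeather=true&extrapolate=false"
)
r = requests.get(url, headers={"authorization": au, "apikey": apikey}, verify=False)
result = r.content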
You might wonder how we can determine which header keys need to be copied. For this, we can use Postman, as described in the next paragraph.
Figuring out what parameters to provide
In the previous article, we already learned how to copy an API call and how to replay this in Postman. We apply the same approach here, but now we deselect as many header keys as possible to determine which header keys we need to copy (as described above).
It turns out that the ‘apikey’ and ‘authorization’ headers are required, as shown below:
Loading the results in a Pandas DataFrame
Once we have made the request, we can parse the JSON results to retrieve the hours and the corresponding energy consumption:
r = json.loads(result)
df = pd.DataFrame(r['data']['usages'])

dates = []
usages = []
for entry in df['entries']:
    for a in entry:
        dates.append(a['actual']['date'])
        usages.append(a['actual']['totalUsageCostInclVat'])
You might wonder how to figure out the best way to navigate the JSON structure. For this, I recommend using Power BI as well: we can use the file (and the manual approach) from the previous article to explore the structure:
The final step in the Python code is loading the results into a Pandas DataFrame, since this is what Power BI expects:
result = pd.concat([pd.DataFrame(dates), pd.DataFrame(usages)], axis=1)
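Optionally (this is a small addition of my own, not part of the original script), you can give the two columns descriptive names so they show up more clearly in Power BI:
result.columns = ['date', 'totalUsageCostInclVat']  # hypothetical names; pick whatever suits your model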
I recommend running all the steps from the Jupyter Notebook in Visual Studio Code first. When you run each cell, you will see Selenium opening a browser, providing the required input (username/password), and clicking the buttons. This allows you to verify that all steps are working properly.
I had some issues running Anaconda from Power BI, so I recommend using another kernel and specifying that one in Power BI (Options -> Python scripting). Select the one where you (pip) installed ‘Selenium Wire’ and the other packages.
Once we see that everything is working properly, we can export the Python file so we can run this from Power BI:
Within Power Query we add a new Data Source, a Python source:
I recommend first creating the steps to load a simple table:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'],dtype=float)
From there, add a step to load the results. Please notice that we reference ‘result’, as that is the name of the DataFrame that holds the results:
let
    Source = Python.Execute("your python script"),
    df1 = Source{[Name="result"]}[Value]
in
    df1
I recommend adding the Python code only after creating these two steps; otherwise, the Python code runs multiple times. So click on ‘Source’ here:
Then quickly press ‘load and apply’. Within the UI, we then press ‘refresh’. From there we will see Selenium opening the browser and performing the specified steps:
Once the refresh is completed our dashboard has the data from the API call!
You might get an error like this; in that case, you need to install the ‘matplotlib’ package. See the ‘Preparation’ paragraph for details:
Conclusion
This concludes this article. Personally, I think Selenium Wire and its ability to inspect API calls is extremely powerful. It allows us to modify and re-run API calls to get the data we are interested in, without having to scrape DOM elements from the UI. I have used this approach successfully for multiple use cases, and I recommend that you apply it as well.
Please let me know your thoughts in the comments!
Further reading/viewing
In case you are interested in collecting/scraping data from the web using Power BI / Power Query, please take a look at this article as well. Hope that it's of value to you!