Diary of a kind-hearted hacker: Part 4

Scraping the unscrapable

Raymond 'Red' Reddington
3 min read · Jan 29, 2022

DISCLAIMER: This article is translated from a private blog of the Chinese hacker community. The translator takes no responsibility for the content of the original article.

If you can see a webpage in your browser, you can also save it. So there are no “unscrapable” websites, only ones that are hard to scrape.

Once, I got a request to grab a list of the electric car parks and charging zones in Bavaria (the green points on the map at https://ladeatlas.elektromobilitaet-bayern.de).

First, I looked at the HTML structure to see whether I could parse it quickly. Not an option… The map is powered by Google, so it renders images instead of the actual content.
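
For context, that quick first check is usually a few lines of Python. Here is a minimal sketch using requests and BeautifulSoup (the CSS selector is purely illustrative), which comes back empty because the map draws its points client-side:

    import requests
    from bs4 import BeautifulSoup

    # Fetch the page and look for the charging points in the raw HTML.
    resp = requests.get("https://ladeatlas.elektromobilitaet-bayern.de")
    soup = BeautifulSoup(resp.text, "html.parser")

    # If the green points were plain HTML elements, a selector like this
    # (hypothetical) would find them. It finds nothing: the map is rendered
    # as images on the client.
    markers = soup.select("div.map-marker")
    print(len(markers))  # 0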

Then I went to the “Network” tab and found the service request among all the other garbage being loaded. It was a POST request to getObjects.php. Quite intuitive, wasn’t it?

I copied the headers and cookies, put them into my script, and made the first attempt. Nothing! A few minutes later, I figured out that I had missed the request payload. When I added it (the payload itself was an unreadable blob), I got a bad feeling…
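
In script form, the attempt looked roughly like this; a minimal sketch with requests, where the exact URL path, header values, and payload are placeholders standing in for what you copy from DevTools:

    import requests

    # The path is an assumption; only the getObjects.php endpoint name is known.
    url = "https://ladeatlas.elektromobilitaet-bayern.de/getObjects.php"

    # Headers and cookies copied from the "Network" tab (values are placeholders).
    headers = {
        "User-Agent": "Mozilla/5.0",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": "https://ladeatlas.elektromobilitaet-bayern.de/",
    }
    cookies = {"PHPSESSID": "<copied-from-browser>"}

    # The part I missed at first: the POST body captured from the same request.
    payload = "<opaque-blob-copied-from-devtools>"

    resp = requests.post(url, headers=headers, cookies=cookies, data=payload)
    print(resp.status_code)
    print(resp.content[:200])  # not readable, as it turned out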

This time the request went through smoothly, but my suspicions were confirmed: like the payload in the request, the response data was not readable either. It couldn’t be a compression issue, since Python’s requests library handles standard decompression automatically. I tried every encoding and encryption scheme I could think of, but in vain. “Well done, developers, you won this time,” I thought…
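
Those attempts boiled down to something like the following sketch: feed the opaque bytes to the usual suspects and see what sticks (here, nothing did):

    import base64
    import gzip
    import io
    import zlib

    def try_decodings(blob: bytes) -> None:
        """Run opaque bytes through common decodings and report what sticks."""
        attempts = [
            ("base64", base64.b64decode),
            ("zlib", zlib.decompress),
            ("gzip", lambda b: gzip.GzipFile(fileobj=io.BytesIO(b)).read()),
            ("hex", lambda b: bytes.fromhex(b.decode())),
        ]
        for name, decode in attempts:
            try:
                print(name, "->", decode(blob)[:80])
            except Exception as exc:
                print(name, "failed:", exc)

    # try_decodings(resp.content)  # every attempt failed on this payload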

After a few cups of coffee and some angry music, I returned to the Ladeatlas and found the exact place in the JavaScript where this getObjects.php call is triggered. This time, the “Stack Trace” tab helped.

Although the script was minified and barely readable, it was only a matter of time before I found the exact place where the response is parsed and a JSON object is revealed.

The last step of this journey was to somehow get hold of this desired JSON. I achieved it by evaluating the object in the console and copying the result.
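
If you wanted to skip the manual copying, the same trick could be driven from Python with Selenium, which loads the page in a real browser and evaluates an expression in page context. This is a sketch, not the original author’s method, and "window.decodedObjects" is a hypothetical name that would have to be read off the minified script:

    import time

    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://ladeatlas.elektromobilitaet-bayern.de")
    time.sleep(10)  # crude wait for the map and its data to finish loading

    # Evaluate the decoded object in page context, as in the DevTools console.
    raw = driver.execute_script("return JSON.stringify(window.decodedObjects);")

    with open("ladeatlas.json", "w") as f:
        f.write(raw)

    driver.quit()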
