A setup to semi-automate a check for third-party scripts
Open Web Privacy Measurement with OpenWPM
In May 2018, when GDPR panic was at its peak, I needed a setup to check websites for GDPR-relevant third-party scripts. Since I administer more than 30 websites, I needed to automate this task somehow.
The more technical parts of GDPR all relate to third-party scripts and cookies.
Even for sites I developed myself, I could not guarantee that editors had not pasted embed codes into the WordPress editor. Embedding a YouTube video, for example, loads scripts, styles and images from YouTube's servers into the visitor's browser. The visitor's IP address is therefore shared with YouTube, which is, strictly speaking, already a violation of GDPR, since you (the website owner) are transferring personal data (your visitor's IP address) to a third party without that person's consent.
So to be safe, you should check which third-party services are used on each page.
The setup is the following:
- Extract the URLs from a website: not just the homepage, but at least every linked page. (There are many scripts for this.)
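The URL extraction step could look roughly like the following. This is only a minimal sketch using the Python standard library, not the script I actually used; the names `LinkCollector` and `internal_links` are made up for illustration, and it only looks at one page's HTML that you have already downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, base_url):
    """Return the sorted, de-duplicated same-host URLs linked from html."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        # Keep only http(s) links that stay on the same host.
        if parsed.scheme in ("http", "https") and parsed.netloc == base_host:
            links.add(absolute)
    return sorted(links)
```

Feeding the homepage HTML into `internal_links` gives the list of pages to hand to the crawler. Fetching the HTML itself is left out so the sketch runs offline.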
OpenWPM produces 14 tables that you can inspect and cross-check.
The following OpenWPM script runs its test on the big German newspaper “spiegel.de”.
Python Script for OpenWPM:
This will produce an SQLite database at the location you specified, with a lot of interesting data.
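To get a first overview of such a database, you can list its tables with Python's built-in `sqlite3` module. The sketch below runs against an in-memory stand-in with two hypothetical tables so that it works without a real crawl; the table definitions are assumptions and not the real OpenWPM schema.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an SQLite database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Stand-in database. For a real crawl, open the produced file instead:
# conn = sqlite3.connect("<path to the crawl database>")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE javascript (script_url TEXT)")    # hypothetical schema
conn.execute("CREATE TABLE javascript_cookies (host TEXT)")  # hypothetical schema

print(list_tables(conn))
```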
Some simple queries can give you a list of the hosts that set cookies in your browser:
“www.spiegel.de” “.ioam.de” “.spiegel.de” “c.spiegel.de” “.doubleclick.net” “.yieldlab.net” “.xplosion.de” “ups.xplosion.de” “.config.parsely.com” “.theadex.com” “.adfarm1.adition.com” “ad13.adfarm1.adition.com” “.twiago.com” “.adsrvr.org”
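A query that yields a host list like the one above can be sketched as follows. The table name `javascript_cookies` and the column name `host` are assumptions; check the schema of your own crawl database before relying on them. The rows inserted here are stand-in data so that the example runs on its own.

```python
import sqlite3


def cookie_hosts(conn):
    """Return the distinct hosts found in the cookie table, sorted."""
    # Table and column names are assumptions about the crawl schema.
    rows = conn.execute(
        "SELECT DISTINCT host FROM javascript_cookies ORDER BY host"
    )
    return [host for (host,) in rows]


# In-memory stand-in for the crawl database, with illustrative rows only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE javascript_cookies (host TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO javascript_cookies VALUES (?, ?)",
    [
        (".spiegel.de", "cookie_a"),  # hypothetical cookie names
        (".ioam.de", "cookie_b"),
        (".spiegel.de", "cookie_c"),
    ],
)

print(cookie_hosts(conn))
```

`SELECT DISTINCT` collapses the many cookie rows per host into one entry each, which is what makes the result short enough to read.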
Let's do the same on another interesting table:
And this is the result (18 distinct script_urls)
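For the GDPR check, the full script URLs matter less than the hosts behind them. The following sketch reduces a list of `script_url` values to the third-party hosts. The function name `third_party_hosts` and the example URLs are made up for illustration, and the plain suffix comparison is a simplification: a proper check would compare registrable domains using the Public Suffix List.

```python
from urllib.parse import urlparse


def third_party_hosts(script_urls, first_party_domain):
    """Return the sorted hosts in script_urls outside first_party_domain."""
    hosts = set()
    for url in script_urls:
        host = urlparse(url).hostname
        if not host:
            continue  # skip empty or malformed URLs
        # Treat the domain itself and all of its subdomains as first party.
        if host == first_party_domain or host.endswith("." + first_party_domain):
            continue
        hosts.add(host)
    return sorted(hosts)


# Illustrative input only, not taken from a real crawl.
urls = [
    "https://www.example.org/app.js",
    "https://cdn.example.net/lib.js",
    "https://ads.example.com/tag.js",
    "https://cdn.example.net/other.js",
]
print(third_party_hosts(urls, "example.org"))
```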
I hope this GDPR law shows more teeth (Denglish) soon.
Maybe OpenWPM could help with some reliable and large-scale privacy data.