Whilst Selenium is great for website automation, it sometimes feels as though a frontend should exist where the user can click the actual website elements. When the purpose shifts from "automating" a website to "watching a website for changes", web-browser automation like Selenium becomes just a web client of sorts, and a much smaller set of options has to be implemented in order to obtain a good end-user product.
The bottom line is that not all websites present hooks, for example RSS or any other "subscription mechanism" that does not shuffle data around the Internet (i.e. Disqus spam), such that, more often than not, checking a website for updates becomes a mundane daily chore for an Internet user.
changedetection.io, together with its sister project sockpuppetbrowser, is a website-watching solution: changedetection.io uses sockpuppetbrowser as a web client via the playwright protocol in order to retrieve the string contents of some element on a website and watch whether that element has changed in any way. In the background, sockpuppetbrowser launches headless Chrome browsers that navigate to the page, perform whatever automation is necessary to get there (for example, logging into a website) and then read the string value of an HTML element, which is compared against the hash of the value retrieved on the previous attempt.
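The comparison step described above can be sketched in a few lines of Python. This only illustrates the idea of hashing the extracted text and comparing it with the previously stored hash; it is not changedetection.io's actual implementation, and the function names `text_fingerprint` and `has_changed`, the choice of SHA-256 and the sample balance strings are all assumptions made for the example.

```python
import hashlib
from typing import Optional


def text_fingerprint(text: str) -> str:
    """Return a hex digest of the element text, ignoring outer whitespace."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def has_changed(current_text: str, previous_hash: Optional[str]) -> bool:
    """Compare the hash of the current text with the stored hash.

    A missing previous hash (the very first check) is not reported
    as a change, because there is nothing to compare against yet.
    """
    if previous_hash is None:
        return False
    return text_fingerprint(current_text) != previous_hash


if __name__ == "__main__":
    # Hypothetical element contents from two consecutive checks.
    stored = text_fingerprint("L$ 1,250")
    print(has_changed("L$ 1,250", stored))  # same text as before
    print(has_changed("L$ 1,300", stored))  # text differs from before
```

Storing only the hash keeps the saved state small, at the cost of not being able to show what the previous value was.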
When a change is noticed, changedetection.io sends a notification to any supported notification server. As an example, Gotify was used for this task because it is a great self-hosted notification server, and Winify is a great tool created by Wizardry and Steamworks that implements a simple but effective notification scheme for Windows.
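To give an idea of what a notification delivered to Gotify amounts to, the following Python sketch builds a POST request to the `/message` endpoint of a Gotify server, authenticated with an application token. It is only an illustration and not how changedetection.io delivers its notifications internally; the helper name `build_gotify_request`, the server URL and the token are placeholders.

```python
import json
import urllib.request


def build_gotify_request(base_url: str, app_token: str, title: str,
                         message: str, priority: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST request to Gotify's /message endpoint."""
    payload = json.dumps(
        {"title": title, "message": message, "priority": priority}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/message",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "X-Gotify-Key": app_token,  # application token created in the Gotify UI
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical server URL and token; replace with real values.
    request = build_gotify_request(
        "https://gotify.example.com", "APP_TOKEN",
        "Balance changed", "The L$ balance is now different.",
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request, timeout=10)
```

The request is built separately from being sent so that the payload can be inspected without a running Gotify server.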
For example, the following image depicts a flow that checks the avatar balance of our store account in Second Life.
The automation involves first going to the login page, logging in and then navigating to the marketplace, where the balance in L$ is displayed on the top bar. Of course, any other data can be pulled, provided that it is displayed in the browser, but getting the in-world balance is a nice application. After the balance is fetched, a notification is sent to Gotify to inform the user of a balance change.
sockpuppetbrowser runs a "full browser" that even performs the graphical "rendering", because the browser is able to produce a screenshot (just like Selenium), which means that, for a large number of websites to watch, launching additional browsers can be costly in terms of resources.
Fortunately, sockpuppetbrowser is driven using playwright and accessed via a websocket URL, such that the request can be intercepted by haproxy and then passed on to multiple sockpuppetbrowser instances behind haproxy that are spread out across a cluster.
In this particular case, a Docker swarm was used out of convenience, such that one replica per machine was dispatched, with haproxy providing some uptime and/or performance arbitration for the backend machines running sockpuppetbrowser. The following is a rough sketch of what the network topology looks like with all the pieces in place:
                                                 +-------------------+
                                            +--->| sockpuppetbrowser |
    +--------------------+    +---------+   |    +-------------------+
--->| changedetection.io +--->| haproxy +---+
    +--------------------+    +---------+   |    +-------------------+
                                            +--->| sockpuppetbrowser |
                                                 +-------------------+
                                                           .
                                                           .
                                                           .
or, in words:
Naturally, one instance of sockpuppetbrowser would be running on every node in order to spread out the resource consumption without hogging one single node.
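The way connections end up spread over the instances can be pictured as a simple rotation. The Python sketch below assumes round-robin scheduling, which is what haproxy uses when no other balancing algorithm is configured; the function name `round_robin`, the backend addresses and the watch names are purely illustrative.

```python
from itertools import cycle
from typing import Iterator, List


def round_robin(backends: List[str]) -> Iterator[str]:
    """Return an endless iterator that hands out backends in turn."""
    if not backends:
        raise ValueError("at least one backend is required")
    return cycle(backends)


if __name__ == "__main__":
    # Hypothetical backend addresses, one sockpuppetbrowser per node.
    picker = round_robin(["node1:3000", "node2:3000"])
    for check in ["watch-a", "watch-b", "watch-c"]:
        print(check, "->", next(picker))
```

With two backends, every second browser session lands on the same node, so the load of the headless browsers is split roughly in half.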
To that end, here is what a haproxy configuration could look like:
frontend sockpuppetbrowser_main
    bind :5550
    default_backend sockpuppetbrowser_servers

backend sockpuppetbrowser_servers
    timeout tunnel 1h
    server docker1 docker.internal:5551 check
    server docker2 docker.internal:5552 check
where:
5550 is the front-facing port that changedetection.io accesses when requesting a response from sockpuppetbrowser,
5551 and 5552 are two ports on two machines within the Docker swarm, both running sockpuppetbrowser and listening on separate meshed ports

Here is the matching configuration of Docker services using compose files.
version: '3.8'
services:
  changedetection:
    image: dgtlmoon/changedetection.io:latest
    ports:
      - 5500:5000
    volumes:
      - SOME_STORAGE_PATH:/datastore
    environment:
      - PLAYWRIGHT_DRIVER_URL=ws://docker:5550
    deploy:
      replicas: 1
      placement:
        max_replicas_per_node: 1
such that:
PLAYWRIGHT_DRIVER_URL points to the HAproxy machine with the hostname docker and with HAproxy listening on port 5550,
SOME_STORAGE_PATH is the path to some directory on the host system where changedetection.io can store its data

version: '3.8'
services:
  haproxy-sockpuppetbrowser:
    image: haproxy:latest
    ports:
      - 5550:5550
    volumes:
      - SOME_CONFIG_PATH:/usr/local/etc/haproxy
    deploy:
      replicas: 1
      placement:
        max_replicas_per_node: 1
where:
SOME_CONFIG_PATH is the path to the directory on the host system that holds the haproxy configuration

version: '3.8'
services:
  sockpuppetbrowser:
    image: dgtlmoon/sockpuppetbrowser:latest
    ports:
      - 5551-5552:3000
    healthcheck:
      test: python3 /usr/src/app/docker-health-check.py --host http://localhost
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    deploy:
      replicas: 2
      placement:
        max_replicas_per_node: 1
which runs two replicas of sockpuppetbrowser in the Docker swarm with one replica per swarm node as the only constraint (in this scenario, in order to ensure that every physical machine runs an instance of sockpuppetbrowser).
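When the number of replicas changes, the published port range in this compose file and the server lines in the haproxy backend have to be changed together. A small helper could generate the server lines from the port range; the following Python code is a sketch only, the function name `backend_server_lines` is hypothetical, and the `docker1`, `docker2` naming simply follows the haproxy configuration shown earlier.

```python
from typing import List


def backend_server_lines(host: str, first_port: int, last_port: int) -> List[str]:
    """Return one haproxy `server` line per published port in the range."""
    if first_port > last_port:
        raise ValueError("first_port must not exceed last_port")
    return [
        f"server docker{index} {host}:{port} check"
        for index, port in enumerate(range(first_port, last_port + 1), start=1)
    ]


if __name__ == "__main__":
    # Same host and port range as in the configuration above.
    for line in backend_server_lines("docker.internal", 5551, 5552):
        print("    " + line)
```

The printed lines can be pasted into the backend section of the haproxy configuration after adjusting the port range in the compose file.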