About

While Selenium is great for website automation, it sometimes feels as though a frontend should exist in which the user can click the actual website elements. When the purpose shifts from "automating" a website to "watching a website for changes", browser automation such as Selenium becomes little more than a web client, and a much smaller set of features has to be implemented in order to obtain a good end-user product.

The bottom line is that not all websites offer hooks, such as RSS or any other "subscription mechanism" that does not shuffle data around the Internet (i.e. Disqus spam), so checking a website for updates more often than not becomes a mundane daily chore for an Internet user.

changedetection.io & sockpuppetbrowser

changedetection.io, together with its sister project sockpuppetbrowser, is a website-watching solution: changedetection.io uses sockpuppetbrowser as a web client via the playwright protocol in order to retrieve the string contents of an element on a website and determine whether that element has changed. In the background, sockpuppetbrowser launches headless Chrome browsers that navigate to the page, perform whatever automation is necessary to proceed (for example, logging into a website) and then read the string value of an HTML element so that it can be compared against the hash stored from the previous retrieval.

When a change is noticed, changedetection.io sends a notification to any supported notification server. Here, Gotify was used because it is a great self-hosted notification server, together with Winify, a tool created by Wizardry and Steamworks that implements a simple but effective notification scheme for Windows.
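The comparison step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code changedetection.io runs: the function names content_fingerprint and has_changed are hypothetical, and SHA-256 is an assumption made for the example.

```python
import hashlib
from typing import Optional


def content_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the element text, ignoring outer whitespace."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def has_changed(current_text: str, previous_fingerprint: Optional[str]) -> bool:
    """Report a change only when a previous fingerprint exists and differs."""
    if previous_fingerprint is None:
        # First check: there is nothing to compare against yet.
        return False
    return content_fingerprint(current_text) != previous_fingerprint


if __name__ == "__main__":
    baseline = content_fingerprint("L$ 1,250")
    print(has_changed("L$ 1,250", baseline))  # same text as the baseline
    print(has_changed("L$ 1,300", baseline))  # different text
```

Storing only the fingerprint keeps the stored state small, at the cost of not being able to show what the previous value was.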

For example, the following image depicts a flow that checks the avatar balance of our store account in Second Life.

The automation first goes to the login page, logs in and then navigates to the marketplace, where the balance in L$ is displayed on the top bar. Of course, any other data can be pulled provided that it is displayed in the browser, but getting the in-world balance is a nice application. After the balance is fetched, a notification is sent to Gotify to inform the user of a balance change.
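changedetection.io takes care of the notification itself, but for illustration, the following sketch shows roughly what posting a message to a Gotify server looks like. It assumes Gotify's /message endpoint with the application token passed in the X-Gotify-Key header; the server URL, the token and the helper names build_gotify_request and send are placeholders invented for this example.

```python
import json
import urllib.request


def build_gotify_request(base_url: str, app_token: str, title: str,
                         message: str, priority: int = 5) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Gotify /message endpoint."""
    payload = json.dumps(
        {"title": title, "message": message, "priority": priority}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/message",
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Assumption: the application token is sent in this header.
            "X-Gotify-Key": app_token,
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical server and token; replace with real values before sending.
    req = build_gotify_request("http://gotify.example:8080", "APP_TOKEN",
                               "Balance changed", "New balance: L$ 1,300")
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes the payload easy to inspect without a running Gotify server.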

Scaling Up

sockpuppetbrowser runs a "full browser" that even performs the graphical rendering, since the browser is able to produce a screenshot (just like Selenium). As a consequence, when a large number of websites has to be watched, launching additional browsers can be costly in terms of resources.

Fortunately, sockpuppetbrowser is driven using playwright and accessed via a WebSocket URL, so the connection can be intercepted by haproxy and then distributed to multiple sockpuppetbrowser instances behind haproxy that are spread out across a cluster.

In this particular case, a Docker swarm was used out of convenience, with one replica dispatched per machine and haproxy providing some uptime and/or performance arbitration for the backend machines running sockpuppetbrowser. The following is a rough sketch of the network topology with all the pieces in place:

                                                 +-------------------+
                                            +--->| sockpuppetbrowser |
    +--------------------+    +---------+   |    +-------------------+
--->| changedetection.io +--->| haproxy +---+
    +--------------------+    +---------+   |    +-------------------+
                                            +--->| sockpuppetbrowser |
                                                 +-------------------+
                                                           .
                                                           .
                                                           .

or, in words:

  • the user makes a request, or a request is scheduled, to changedetection.io to see whether a website has changed,
  • changedetection.io tries to connect to sockpuppetbrowser, but the WebSocket URL configured for sockpuppetbrowser in fact points to haproxy,
  • haproxy distributes the connection to the machines in the cluster that are running sockpuppetbrowser.
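The last step in the list above can be pictured with a small Python model. It is not how haproxy is implemented; it only mimics a round-robin selection that skips servers marked as down, which is the behaviour assumed here for a backend with no explicit balancing algorithm. The Backend and RoundRobinPool names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    name: str
    address: str
    healthy: bool = True


class RoundRobinPool:
    """Hand out healthy backends in rotation."""

    def __init__(self, backends: List[Backend]) -> None:
        self._backends = backends
        self._next = 0  # index of the backend to try first on the next pick

    def pick(self) -> Optional[Backend]:
        """Return the next healthy backend, or None if every backend is down."""
        total = len(self._backends)
        for offset in range(total):
            index = (self._next + offset) % total
            candidate = self._backends[index]
            if candidate.healthy:
                self._next = (index + 1) % total
                return candidate
        return None


if __name__ == "__main__":
    pool = RoundRobinPool([
        Backend("docker1", "docker.internal:5551"),
        Backend("docker2", "docker.internal:5552"),
    ])
    for _ in range(4):
        chosen = pool.pick()
        print(chosen.name if chosen else "no backend available")
```

Each new connection from changedetection.io would land on the next instance in turn, so browser sessions are spread evenly as long as all instances are up.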

Naturally, one instance of sockpuppetbrowser would be running on every node in order to spread out the resource consumption without hogging one single node.

To that end, here is what a haproxy configuration could look like:

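# changedetection.io connects here instead of to sockpuppetbrowser directly
frontend sockpuppetbrowser_main
        bind :5550
        default_backend sockpuppetbrowser_servers

backend sockpuppetbrowser_servers
        # keep idle WebSocket tunnels open for up to one hour
        timeout tunnel 1h
        # "check" enables health checks so that unreachable servers are skipped
        server docker1 docker.internal:5551 check
        server docker2 docker.internal:5552 check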

where:

  • 5550 is the front-facing port that changedetection.io connects to when requesting a response from sockpuppetbrowser,
  • 5551 and 5552 are two ports on two machines within the Docker swarm, both running sockpuppetbrowser and listening on separate meshed ports.
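The check keyword on each server line makes haproxy probe the backends. Assuming the default probe is a plain TCP connection attempt, its effect can be approximated with the following sketch; the tcp_check helper is hypothetical and the host name and ports are simply taken from the configuration above.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unresolvable: treat the server as down.
        return False


if __name__ == "__main__":
    for port in (5551, 5552):
        state = "up" if tcp_check("docker.internal", port) else "down"
        print(f"docker.internal:{port} is {state}")
```

Running this from the machine that hosts haproxy is a quick way to verify that both published ports are reachable before pointing changedetection.io at port 5550.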

Docker Configuration

Here is the matching configuration of Docker services using compose files.

changedetection.io

version: '3.8'
services:
  changedetection:
    image: dgtlmoon/changedetection.io:latest
    ports:
      - 5500:5000
    volumes:
      - SOME_STORAGE_PATH:/datastore
    environment:
      - PLAYWRIGHT_DRIVER_URL=ws://docker:5550
    deploy:
      replicas: 1
      placement:
        max_replicas_per_node: 1

such that:

  • PLAYWRIGHT_DRIVER_URL points to the machine with the hostname docker, where haproxy is listening on port 5550,
  • SOME_STORAGE_PATH is the path to a directory on the host system where changedetection.io can store its data.

HAProxy

version: '3.8'
services:
  haproxy-sockpuppetbrowser:
    image: haproxy:latest
    ports:
      - 5550:5550
    volumes:
      - SOME_CONFIG_PATH:/usr/local/etc/haproxy
    deploy:
      replicas: 1
      placement:
        max_replicas_per_node: 1

where:

  • SOME_CONFIG_PATH is the path to a directory on the host system that holds the haproxy configuration (haproxy.cfg) and is mounted at /usr/local/etc/haproxy inside the container.

sockpuppetbrowser

version: '3.8'
services:
  sockpuppetbrowser:
    image: dgtlmoon/sockpuppetbrowser:latest
    ports:
      - 5551-5552:3000
    healthcheck:
      test: python3 /usr/src/app/docker-health-check.py --host http://localhost
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    deploy:
      replicas: 2
      placement:
        max_replicas_per_node: 1

which runs two replicas of sockpuppetbrowser in the Docker swarm with one replica per swarm node as the only constraint (in this scenario, in order to ensure that every physical machine runs an instance of sockpuppetbrowser).


docker/scaling_website_automations.txt · Last modified: 2025/06/30 11:55 by office

Wizardry and Steamworks

© 2025 Wizardry and Steamworks
