ChangeLog

  • 28 November 2018 - updated.
  • 16 April 2015 - disabled SSL checks.

About

Sometimes it is interesting to see where an image can be found on the Internet, and Google does a good job of that. However, Google does not expose an API for image searches by picture upload. To work around that, a few packages can be combined into a Python script that automatically uploads a batch of images and checks whether Google returns any results.

Ingredients

For this project, you will need the following:

  • Firefox (just for testing).
  • Selenium for automation and creation of the initial scripts.
  • PhantomJS for running a headless browser to get rid of the Firefox dependency.

Using Selenium

Using Firefox, browse to the Selenium download page and install the latest release. It will most likely install about three addons for Firefox.

Once the addons are installed, you can go to Firefox→Preferences…→Addons and select Preferences on the Selenium IDE addon. On the very first page, you will find an option named Enable Experimental Features which should be enabled.

Next, browse to a page and select Tools→Selenium IDE from the Firefox menu to open the macro recorder. Press the red record button on Selenium's pane and perform a few actions in the browser; Selenium IDE writes a script as you go. Once you are done, click the red record button again to stop recording.

Now, to generate a script for a particular language, select the Selenium IDE window, go to Options→Format and choose the language in which the script will be generated. Finally, select the Source tab in Selenium IDE and copy the final script somewhere.

The example script on this page uses the Python bindings for Selenium, which may need to be installed first. On OSX, this can be accomplished using easy_install and pip:

sudo easy_install pip
pip install -U selenium
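
To confirm that the bindings can be imported by the interpreter that will run the script, a quick check along these lines may help. This is only a sketch: the helper name selenium_version is made up for illustration, and it relies on the __version__ attribute that the selenium package exposes.

```python
import importlib


def selenium_version():
    """Return the installed Selenium version string, or None if it cannot be imported.

    Hypothetical helper, not part of the original script.
    """
    try:
        module = importlib.import_module("selenium")
    except ImportError:
        return None
    # The selenium package exposes its version as __version__.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = selenium_version()
    if version is None:
        print("Selenium is not installed for this interpreter")
    else:
        print("Selenium " + version + " is available")
```

If the check reports that Selenium is missing even though pip succeeded, pip may have installed the package for a different Python interpreter than the one running the check.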

Installing PhantomJS

PhantomJS can be used by Selenium such that your script will run headless (without any GUI). To install PhantomJS on OSX using Homebrew, simply issue:

brew install phantomjs

Note that the script posted in the code section below was designed for PhantomJS 2.1.1 and that you may have to install an older PhantomJS if your current version does not work.

Other operating systems may have their own way of installing PhantomJS. In any case, just follow the installation procedures to install PhantomJS.
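
Whichever installation route is used, Selenium has to be able to find the phantomjs binary. The sketch below, which is not part of the original script, searches the PATH for the binary and asks it for its version. The helper names find_executable and phantomjs_version are hypothetical, and the code assumes a Unix-like system such as OSX where the binary is simply called phantomjs.

```python
import os
import subprocess


def find_executable(name):
    """Return the full path of `name` if it is on the PATH, else None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def phantomjs_version():
    """Return the output of `phantomjs --version`, or None if PhantomJS is missing."""
    path = find_executable("phantomjs")
    if path is None:
        return None
    return subprocess.check_output([path, "--version"]).decode("utf-8").strip()


if __name__ == "__main__":
    version = phantomjs_version()
    if version is None:
        print("phantomjs was not found on the PATH")
    else:
        print("Found PhantomJS " + version)
```

Comparing the reported version against 2.1.1 is a quick way to tell whether the installed PhantomJS matches the one the script was designed for.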

Pulling it Together

The following Python script runs on the command line, takes a directory as its parameter, and submits every image inside that directory to Google image search. If an image is found on other pages, the script saves a screenshot of the Google results page in the current working directory, that is, the directory where the command was run.

#!/usr/bin/python
###########################################################################
##  Copyright (C) Wizardry and Steamworks 2014 - License: GNU GPLv3      ##
##  Please see: http://www.gnu.org/licenses/gpl.html for legal details,  ##
##  rights of fair usage, the disclaimer and warranty conditions.        ##
###########################################################################
 
############################### Defines ####################################
# These are the messages that appear on the page once an image is found on 
# other pages or when a searched image is unique. For images found on other
# pages, the Google search page will contain the text:
COMMON_INDICATOR = "Pages that include matching images"
# For unique images, the Google search page will contain the text:
UNIQUE_INDICATOR = "Your search did not match any documents"
# For images that are similar (colors, background, etc...)
VISUAL_INDICATOR = "Visually similar images"
# These do not have to be localised because we are using google.com.
###########################################################################
 
# imports and packages
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import time, os, sys, random
 
# the tool takes as parameter a directory so check the command-line arguments
if len(sys.argv) != 2:
    print "Syntax: " + sys.argv[0] + " " + "<directory>"
    sys.exit(1)
folder = os.path.abspath(sys.argv[1])
if not os.path.isdir(folder):
    print "Syntax: " + sys.argv[0] + " " + "<directory>"
    sys.exit(1)
 
# we need to set the user-agent because the default user-agent mentions X11 which
# makes Google offer an image search page without the possibility to upload an image
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    # Google Chrome User-Agent
    "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"
)
# setup webdriver with phantomjs
driver = webdriver.PhantomJS(
    desired_capabilities=dcap,
    # ignore any SSL errors and disable any system proxy usage
    service_args=['--ignore-ssl-errors=true', '--proxy-type=none']
)
driver.set_window_size(800, 1024)
 
# connect through HTTPS to the image search, specify English as the language (hl), and open the search pane (sbi).
base_url = "https://www.google.com/imghp?hl=en&sbi=1"
wait = WebDriverWait(driver, 60)
 
# open the folder specified on the command-line and for every file perform the following actions:
#   * go to https://images.google.com
#   * click the camera button
#   * click the upload button
#   * send the path to the file in the file-picker
#   * wait until the page contains an indicator for unique, respectively common images
#   * if it is a common image (found on other pages), take a screenshot of the results
#   * if it is not a common image, print out the name of the image indicating that it is unique
#   * if any error occurs during processing, print an error message and take a screenshot
listing = os.listdir(folder)
for infile in listing:
    if not infile.startswith('.'):
        # random intervals are added to avoid Google's bot detection - this slows the search
        # but makes the whole process look more like a human being searching for images
        time.sleep(random.uniform(1, 5))
        try:
            driver.get(base_url)
            time.sleep(random.uniform(1, 5))
            driver.find_element_by_link_text("Upload an image").click()
            time.sleep(random.uniform(1, 5))
            # send the path of the file to the "Choose File" input.
            driver.find_element_by_id("qbfile").send_keys(os.path.join(folder,infile))
            # wait until one of the indicators shows up in the results page.
            wait.until(
                lambda d:
                    COMMON_INDICATOR in d.page_source or
                    UNIQUE_INDICATOR in d.page_source or
                    VISUAL_INDICATOR in d.page_source
            )
            if COMMON_INDICATOR in driver.page_source:
                driver.save_screenshot('COMMON_' + infile + '.png')
                print 'Image: ' + infile + ' is not unique'
            else:
                print 'Image: ' + infile + ' is unique'
        except Exception as e:
            driver.save_screenshot('ERROR_' + infile + '.png')
            print 'Error processing: ' + infile + ' : ' + str(e)
# quit() ends the session and also stops the PhantomJS process.
driver.quit()

Note that the script needs selenium and phantomjs to be installed as indicated above.
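
The script above reports every image without a "Pages that include matching images" section as unique, including images for which Google only lists visually similar results. One possible refinement, offered as a sketch rather than as part of the original script, is to classify the results page into three cases. The function name classify_results and its return labels are invented for illustration; the indicator strings are the ones defined in the script.

```python
COMMON_INDICATOR = "Pages that include matching images"
UNIQUE_INDICATOR = "Your search did not match any documents"
VISUAL_INDICATOR = "Visually similar images"


def classify_results(page_source):
    """Map the text of a results page to 'common', 'similar', 'unique' or None.

    Hypothetical helper: None means that no indicator was found, for
    example because the page has not finished loading.
    """
    # Check the strongest signal first: the image appears on other pages.
    if COMMON_INDICATOR in page_source:
        return "common"
    # Only look-alikes were listed, not the image itself.
    if VISUAL_INDICATOR in page_source:
        return "similar"
    if UNIQUE_INDICATOR in page_source:
        return "unique"
    return None
```

Inside the loop, classify_results(driver.page_source) could replace the test on COMMON_INDICATOR. Since None is falsy, the same function could also serve as the wait condition, for example wait.until(lambda d: classify_results(d.page_source)).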

Postamble

Note that automating the Google Image Search is, apparently, a violation of their Terms of Service. Be vigilant.


web/automating_google_image_search.txt · Last modified: 2022/04/19 08:28 by 127.0.0.1
