Sometimes it is interesting to see where an image can be found on the Internet, and Google does a good job of that. However, Google does not expose an API for image searches by picture upload. To work around that, a few packages can be combined into a Python script that automatically uploads a batch of images and checks whether Google returns any results.
For this project, you will need Selenium (the Firefox addon and the Python bindings) and PhantomJS, each of which is covered in turn.
Using Firefox, browse to the Selenium download page and install the latest release. It will most likely install about three addons for Firefox.
Once the addons are installed, you can go to Firefox→Preferences…→Addons and select Preferences on the Selenium IDE addon. On the very first page, you will find an option named Enable Experimental Features, which should be enabled.
Next, navigate to any page and select Tools→Selenium IDE from the Firefox menu to open the macro recorder. Press the red record button on Selenium's pane and perform a few actions in the browser; you will notice a script being written in Selenium IDE as you go. Once you are done, click the red record button again to stop recording.
Now, to generate a script for various languages, select the Selenium IDE window, go to Options→Format and choose the language in which the script will be generated. Finally, select the Source tab in Selenium IDE and copy the final script somewhere.
If you are using the Python bindings, as the example script here does, you may need to install them first. On OSX, this can be accomplished using easy_install and pip:
sudo easy_install pip
pip install -U selenium
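To confirm that the bindings ended up in the interpreter that will run the script, a quick import check can help. The following is only a minimal sketch: the helper name has_module is hypothetical (it is not part of Selenium), and it merely reports whether a top-level module can be imported.

```python
def has_module(name):
    """Return True if the top-level module `name` can be imported.

    `has_module` is a hypothetical helper written for this check only.
    """
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if has_module("selenium"):
        print("selenium bindings found")
    else:
        print("selenium bindings missing; try: pip install -U selenium")
```

Run it with the same python binary that will later run the search script, since each interpreter has its own set of installed packages.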
PhantomJS can be used by Selenium such that your script will run headless (without any GUI). To install PhantomJS, on OSX using homebrew, simply issue:
brew install phantomjs
Note that the script posted in the code section below was designed for phantomjs at 2.1.1 and that you may have to install an older phantomjs if your current phantomjs does not work.
Other operating systems may have their own way of installing PhantomJS. In any case, just follow the installation procedures to install PhantomJS.
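Since the script was written against a specific PhantomJS release, it can be useful to check which version is actually on the PATH before running it. The sketch below assumes the binary is called phantomjs and relies on its --version flag; the function names parse_version and phantomjs_version are hypothetical helpers, not part of PhantomJS or Selenium.

```python
import re
import subprocess


def parse_version(text):
    """Extract a (major, minor, patch) tuple from text such as '2.1.1'.

    Returns None when no version number is found at the start of the text.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def phantomjs_version(binary="phantomjs"):
    """Return the version tuple reported by `<binary> --version`.

    Returns None if the binary is missing or exits with an error.
    """
    try:
        output = subprocess.check_output([binary, "--version"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_version(output.decode("ascii", "replace"))


if __name__ == "__main__":
    version = phantomjs_version()
    if version is None:
        print("phantomjs was not found or could not be run")
    else:
        print("phantomjs version: " + ".".join(str(n) for n in version))
```

If the reported version differs from the one the script was designed for and the script misbehaves, that mismatch is a reasonable first thing to investigate.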
The following Python script runs on the command line, takes a directory as its only parameter, and submits every file inside that directory to Google image search. Whenever Google reports that an image also appears on other pages, the script saves a screenshot of the results page to the working directory from which the command was run.
#!/usr/bin/python
###########################################################################
##  Copyright (C) Wizardry and Steamworks 2014 - License: GNU GPLv3      ##
##  Please see: http://www.gnu.org/licenses/gpl.html for legal details,  ##
##  rights of fair usage, the disclaimer and warranty conditions.        ##
###########################################################################

############################### Defines ####################################
# These are the messages that appear on the page once an image is found on
# other pages or when a searched image is unique. For images found on other
# pages, the Google search page will contain the text:
COMMON_INDICATOR = "Pages that include matching images"
# For unique images, the Google search page will contain the text:
UNIQUE_INDICATOR = "Your search did not match any documents"
# For images that are similar (colors, background, etc...)
VISUAL_INDICATOR = "Visually similar images"
# These do not have to be localised because we are using google.com.
###########################################################################

# imports and packages
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support import expected_conditions as EC
from contextlib import contextmanager
import unittest, time, re, os, sys, random

# the tool takes as parameter a directory so check the command-line arguments
if len(sys.argv) != 2:
    print("Syntax: " + sys.argv[0] + " " + "<directory>")
    sys.exit(1)

folder = os.path.abspath(sys.argv[1])
if not os.path.isdir(folder):
    print("Syntax: " + sys.argv[0] + " " + "<directory>")
    sys.exit(1)

# we need to set the user-agent because the default user-agent mentions X11 which
# makes Google offer an image search page without the possibility to upload an image
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    # Google Chrome User-Agent
    "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"
)

# setup webdriver with phantomjs
driver = webdriver.PhantomJS(
    desired_capabilities=dcap,
    # ignore any SSL errors and disable any system proxy usage
    service_args=['--ignore-ssl-errors=true', '--proxy-type=none']
)
driver.set_window_size(800, 1024)

# connect through HTTPS to the image search, specify english as the language (hl),
# and open the search pane (sbi).
base_url = "https://www.google.com/imghp?hl=en&sbi=1"
wait = WebDriverWait(driver, 60)

# open the folder specified on the command-line and for every file perform the following actions:
#   * go to https://images.google.com
#   * click the camera button
#   * click the upload button
#   * send the path to the file in the file-picker
#   * wait until the page contains an indicator for unique, respectively common images
#   * if it is a common image (found on other pages), take a screenshot of the results
#   * if it is not a common image, print out the name of the image indicating that it is unique
#   * if any error occurs during processing, print an error message and take a screenshot
listing = os.listdir(folder)
for infile in listing:
    if not infile.startswith('.'):
        # random intervals are added to avoid Google's bot sensing - this slows the search
        # but makes the whole process look more like a human being searching for images
        time.sleep(random.uniform(1, 5))
        try:
            driver.get(base_url)
            time.sleep(random.uniform(1, 5))
            driver.find_element_by_link_text("Upload an image").click()
            time.sleep(random.uniform(1, 5))
            # send the file path to the "Choose File" input.
            driver.find_element_by_id("qbfile").send_keys(os.path.join(folder, infile))
            # wait until one of the indicators shows up in the results page
            wait.until(
                lambda d: COMMON_INDICATOR in d.page_source or
                UNIQUE_INDICATOR in d.page_source or
                VISUAL_INDICATOR in d.page_source
            )
            if COMMON_INDICATOR in driver.page_source:
                driver.save_screenshot('COMMON_' + infile + '.png')
                print('Image: ' + infile + ' is not unique')
            else:
                print('Image: ' + infile + ' is unique')
        except Exception as e:
            driver.save_screenshot('ERROR_' + infile + '.png')
            print('Error processing: ' + infile + ' : ' + str(e))

driver.close()
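The two decisions the script makes outside the browser, namely which files to submit and how to interpret the results page, can be isolated as plain functions and tried without Selenium. The following is a sketch under stated assumptions: candidate_files and classify are hypothetical names, the indicator strings are copied from the script, and the check that skips sub-directories is an addition the script itself does not perform.

```python
import os

# Indicator strings copied from the script.
COMMON_INDICATOR = "Pages that include matching images"
UNIQUE_INDICATOR = "Your search did not match any documents"
VISUAL_INDICATOR = "Visually similar images"


def candidate_files(folder):
    """Return sorted full paths of the non-hidden files in `folder`.

    Skipping names that start with '.' mirrors the script; skipping
    sub-directories is an extra safeguard added in this sketch.
    """
    paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not name.startswith('.') and os.path.isfile(path):
            paths.append(path)
    return paths


def classify(page_source):
    """Interpret a results page the same way the script does.

    Returns "common" if the image was found on other pages, "unique" if
    only the no-match or visually-similar indicator is present, and None
    if no indicator has appeared yet (the page may still be loading).
    """
    if COMMON_INDICATOR in page_source:
        return "common"
    if UNIQUE_INDICATOR in page_source or VISUAL_INDICATOR in page_source:
        return "unique"
    return None
```

Keeping the classification in one function means that, should Google change the wording of its results page, only the indicator strings need to be revisited.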
Note that the script needs selenium and phantomjs to be installed as indicated above.
Note that automating the Google Image Search is, apparently, a violation of their Terms of Service. Be vigilant.