I ran all one hundred of Time's Most Influential Images of All Time through Google's Cloud Vision API, an image recognition tool, in an attempt to answer two questions: "How does Artificial Intelligence see the world?" and "How does Artificial Intelligence interpret visual information?"
 

About the Cloud Vision API:

Google's Cloud Vision API is an image scanning and recognition tool. It analyzes pictures, sorts them into thousands of categories and labels, detects individual objects and faces within images, and finds and reads words contained within images.

In other words, Google's Cloud Vision API uses Computer Vision, Deep Learning, and Machine Learning to scan images and attempt to make sense of them through a number of different means:

 
  • Interpreting Insight From Your Images: Easily detect broad sets of objects in your images, from flowers, animals, or transportation to thousands of other object categories commonly found within images.

  • Leveraging the Power of the Web: Vision API uses the power of Google Image Search to find topical entities like celebrities, logos, or news events.

  • Applying Labels to Images: Detect broad sets of categories within an image, ranging from modes of transportation to animals.

  • Identifying Explicit Content: Detect explicit content like adult content or violent content within an image.

  • Identifying Logos: Detect popular product logos within an image.

  • Identifying Landmarks: Detect popular natural and man-made structures within an image.

  • Reading and Extracting Text from Images: Detect and extract text within an image, with support for a broad range of languages, along with support for automatic language identification.

  • Scanning for Facial Detection: Detect multiple faces within an image, along with the associated key facial attributes like emotional state or wearing headwear.

  • Identifying Image Attributes: Detect general attributes of the image, such as dominant colors and appropriate crop hints.
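The capabilities listed above correspond to feature types that can be requested together in a single call to the API's REST `images:annotate` method. The following is a minimal sketch of building such a request body, not a record of how the images in this experiment were actually submitted. The helper name `build_annotate_request` and the placeholder image bytes are assumptions made for illustration, and sending the request (authentication, HTTP call) is left out.

```python
import base64
import json

# Feature type names accepted by the Cloud Vision REST method `images:annotate`.
FEATURE_TYPES = [
    "LABEL_DETECTION",
    "WEB_DETECTION",
    "SAFE_SEARCH_DETECTION",
    "LOGO_DETECTION",
    "LANDMARK_DETECTION",
    "TEXT_DETECTION",
    "FACE_DETECTION",
    "IMAGE_PROPERTIES",
    "CROP_HINTS",
]


def build_annotate_request(image_bytes, max_results=10):
    """Hypothetical helper: build the JSON body for one image and every feature above."""
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded bytes.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in FEATURE_TYPES
                ],
            }
        ]
    }


# Placeholder bytes stand in for a real image file read from disk.
body = build_annotate_request(b"placeholder-image-bytes")
print(json.dumps(body, indent=2))
```

Requesting every feature for every image is what makes quirks like the 'created' text possible: text detection runs whether or not the picture contains any words.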

 

The Image Analyses:


After running Time's The Most Influential Images of All Time through Google Cloud Vision, a few interesting trends emerged, along with a couple of interesting tidbits of analysis:

 
  • The Cloud Vision API is an incredibly smart and interesting tool that gets a lot of things about images right, but also a lot of things wrong.
     
  • The system sometimes lacks awareness of what it is attempting to label or identify, which can lead to incorrect or upsetting analyses.
     
  • Sometimes the image associations can be directly correlated with the picture, but other times the associations can be way off.
     
  • The image labels can be contradictory, because the Artificial Intelligence is often guessing at what might be there.
     
  • Since the tool is told to search for text in every image, it will occasionally 'create' text in an attempt to identify it.
     
  • The recognition is often lacking emotional intelligence or contextual awareness of an image and its meaning.
     
  • Empathy and Human Understanding are two major attributes that Artificial Intelligence is currently lacking.
     
  • The system is trained to get smarter, so it will be interesting to track changes and confidence levels over time to see how it grows and learns.
     
 

Below are a few excerpts from the analysis that highlight some errors, flaws, or quirks in the system:

In The Oscars Selfie, Cloud Vision identified Meryl Streep as "Very Likely" to be showing Surprise with 57% confidence. In contrast, Channing Tatum's "Joy" emotion was identified as "Very Unlikely" with 58% confidence.
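Ratings such as "Very Likely" and "Very Unlikely" come from the API's likelihood scale, which face annotations report per emotion in fields such as `joyLikelihood` and `surpriseLikelihood`. The sketch below ranks those values to pick the strongest emotion for a face. The two face dictionaries are hypothetical, hand-built from the ratings quoted above rather than copied from real API output, and the percentage shown next to each rating is not modeled here.

```python
# Values of the Vision API likelihood scale, ordered from weakest to strongest.
LIKELIHOOD_ORDER = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]

# Emotion name -> field name used in a face annotation.
EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
    "surprise": "surpriseLikelihood",
}


def emotion_ranks(face):
    """Map each emotion to its position on the likelihood scale (0 = UNKNOWN)."""
    return {
        name: LIKELIHOOD_ORDER.index(face.get(field, "UNKNOWN"))
        for name, field in EMOTION_FIELDS.items()
    }


def strongest_emotion(face):
    """Return the emotion with the highest likelihood rank for this face."""
    ranks = emotion_ranks(face)
    return max(ranks, key=ranks.get)


# Hypothetical annotations reflecting the ratings described in the text.
streep = {"surpriseLikelihood": "VERY_LIKELY"}
tatum = {"joyLikelihood": "VERY_UNLIKELY"}

print(strongest_emotion(streep))
print(emotion_ranks(tatum))
```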
 

Although Osama Bin Laden is not pictured in The Situation Room, the tool was able to associate this image with "Osama Bin Laden", using other information from the internet to provide context to the photograph.
 

Starving Child and Vulture was labeled as "Adventure" with 55% confidence, a label that shows no situational awareness or emotional intelligence.
 

Cloud Vision rated the Molotov Man photograph as "Unlikely" to contain violence, yet also registered the label "Militia" with 59% confidence.
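One way to see how such a contradiction can arise is that the violence rating and the labels are reported in separate parts of the response (`safeSearchAnnotation` and `labelAnnotations`), each scored on its own. The sketch below reads both side by side. The sample response is hypothetical, assembled by hand from the figures quoted above, and it assumes the percentages correspond to the `score` field (a value between 0 and 1) multiplied by 100.

```python
# Hypothetical response fragment built from the figures in the text;
# it is not actual output captured from the API.
sample_response = {
    "labelAnnotations": [{"description": "Militia", "score": 0.59}],
    "safeSearchAnnotation": {"violence": "UNLIKELY"},
}


def summarize(response, min_score=0.5):
    """Collect labels at or above min_score alongside the separate violence rating."""
    labels = [
        (annotation["description"], round(annotation["score"] * 100))
        for annotation in response.get("labelAnnotations", [])
        if annotation["score"] >= min_score
    ]
    violence = response.get("safeSearchAnnotation", {}).get("violence", "UNKNOWN")
    return {"labels": labels, "violence": violence}


summary = summarize(sample_response)
print(summary)
```

Nothing in this structure ties the two results together, so reconciling a "Militia" label with an "Unlikely" violence rating is left to whoever reads the output.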
 

The AI does not cross-reference attributes to form a 'best guess' of what the image is. The Coffin Ban photograph is labeled with "Public Transport" with 78% confidence and "Sports Venue" with 70% confidence.
 

The Hooded Man, the iconic photograph of a prisoner at Abu Ghraib, received a childlike label: the subject was labeled as wearing a "Costume" with 63% confidence.
 

The First Cell Phone Picture, a photograph of newborn Sophie Kahn, was labeled as having "Possible" violence.
 

It often feels like the system is playing a game of Pictionary, guessing at what might be in the image. The labels for Michael Jordan included guesses of "Motocross" and "Parachuting" with 58% confidence.
 

The system was 'forcing' text out of this image, producing a combined text string of "= 99That real 99 OLy "NOTHI RUTG94 ve 19 DAYS Tw E998 ision (mth HIM. MASTODIAN it a reason to are a source as a contain".
 

Despite the upsetting and graphic nature of the Alan Kurdi photograph, the Cloud Vision tool labeled the photograph with "Vacation" and "Fun", both with a 76% confidence level.
 

Falling Man captured a man who jumped out of the Twin Towers on 9/11, but the image was labeled as containing a "Bird" with 70% confidence and a "Finch" with 83% confidence. It also received a label for "Interior Design" with 69% confidence, due to the building pattern.
 

The famous Tank Man photograph registered 96% confidence for containing a Tank, which raises the question: What would it take to achieve 100% confidence?
 

The Terror of War was labeled with "Winter" at 70% confidence and "Sports" at 64% confidence, labels that show no human empathy or emotional intelligence.
 


View All of The Most Influential Images of All Time:


