Over the past couple of months, my colleague Sung and I spent a large part of our creative time analyzing the “Whys” and “Hows” of Google’s local carousel. After we determined some of the factors for optimizing local results and their effects on businesses, we decided to extend our research outside of the local carousel and dive into ranking factors for Google’s Image carousel. Our mission: to expose opportunities for top industry terms in a much less competitive space. But first, we would have to figure out why certain categories were being listed, and why the images within those categories were being shown. This half of our joint post is divided into two sections, focusing on the ranking factors for the categories and for the individual images within those categories.
Image Categories:
Sung insisted that we begin our research by searching for the competitive industry term “wedding dresses” (not true!). Later, we used more specific terms such as “bridesmaids dresses” and “maid of honor dresses”. Using Google Suggest, we wanted to learn more about what users were looking for specifically, hoping that the terms listed there would correspond with what was being shown in the image category carousel.
Okay, so users are searching for wedding dresses by year, by style, by material (is lace a material?), and by designer. No big deal here. Hit ENTER.
As you can see, only some of the terms in gSuggest correspond with the groupings (lace and sweetheart line). You’ll also notice that there are 6 groupings listed, each with multiple images (from different websites) within the category groups. Why 6? Your guess is as good as ours. (Later in our research we discovered that Google changed the number from 6 to 5 categories for some terms.) So how is Google deciding what groups of images to show? If the group sequence is Big-Sweetheart Neckline-Strapless-Lace Back-Color-Simple today, will it be there tomorrow? What about next week? What’s causing those groups to shift and how can we predict those changes? Here’s a quick look at the movement over the course of 4 days:
7/23: Big - Sweetheart Neckline - Strapless - Lace Back - Color - Simple
7/24: Big - Sweetheart Neckline - Strapless - Lace Back - Color - Simple
7/25: Sweetheart Neckline - Strapless - Lace Back - Color - Red Wedding Dresses - Simple
7/26: Red Wedding Dresses - Sweetheart Neckline - Strapless - Lace Back - Color - Simple
On 7/23 and 7/24, Google Images displayed identical category sequences with “Big Wedding Dresses” in the lead category position. On 7/25 “Red Wedding Dresses” appears in the 5th position, then in the 1st position the following day. Where the heck did “Big Wedding Dresses” go? There must have been a shift in interest or search volume, right?
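For anyone who wants to track this kind of movement themselves, here is a minimal sketch in Python that compares one day's category sequence with the next. The sequences are the four listed above; the function name `diff_sequences` and the output layout are our own assumptions for illustration, not part of any tool we used.

```python
# Category sequences for the query "wedding dresses", as listed above.
sequences = {
    "7/23": ["Big", "Sweetheart Neckline", "Strapless", "Lace Back", "Color", "Simple"],
    "7/24": ["Big", "Sweetheart Neckline", "Strapless", "Lace Back", "Color", "Simple"],
    "7/25": ["Sweetheart Neckline", "Strapless", "Lace Back", "Color", "Red Wedding Dresses", "Simple"],
    "7/26": ["Red Wedding Dresses", "Sweetheart Neckline", "Strapless", "Lace Back", "Color", "Simple"],
}


def diff_sequences(previous, current):
    """Hypothetical helper: report categories that entered, dropped out,
    or changed position (positions are 1-based) between two days."""
    entered = [c for c in current if c not in previous]
    dropped = [c for c in previous if c not in current]
    moved = {
        c: (previous.index(c) + 1, current.index(c) + 1)
        for c in current
        if c in previous and previous.index(c) != current.index(c)
    }
    return {"entered": entered, "dropped": dropped, "moved": moved}


# Walk through consecutive days and print what changed.
days = list(sequences)
for prev_day, day in zip(days, days[1:]):
    print(prev_day, "->", day, diff_sequences(sequences[prev_day], sequences[day]))
```

Run against the four days above, this flags “Big” dropping out and “Red Wedding Dresses” entering on 7/25, then “Red Wedding Dresses” moving from position 5 to position 1 on 7/26.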
For the week of July 21-27, “red wedding dresses” had a slight spike in interest, whereas “big wedding dresses” had an uneventful week. This could explain why the RWD category managed to enter the sequence and work its way up to the leading spot. But if you look at search interest over the course of the year, the RWD category has consistently had more search interest than “big wedding dresses”, despite its own downward trend. So how did BWD snag the leading position and remain there for two consecutive days? Clearly, search interest must not be the only factor for a category to rank in the carousel.
The image above shows the day that “Red Wedding Dresses” made it into the category carousel. The following day, 7/26, is when search interest dropped to 56, but again, it somehow managed to snatch the leading group position that same day. (2013 data for “big wedding dresses” not available). Obviously, the trends aren’t captured/recorded in real time, so it’s likely that the category was riding on a spike from the day before. So, does a spike in search interest boost a competitive term into the category carousel? We’d be fools to think it was that simple, which is what prompted us to look into search volume for clues.
For the month of July, “red wedding dresses” had more search volume than “big wedding dresses” at 40,500 and 22,000 respectively. But if RWD had the higher search volume of the two terms, then why was it missing from the 7/23 and 7/24 category sequences? It looks as though the category sequences aren’t indicative of the data in Google Trends or the search volume data.
In order to track the shifts in categories and, later, the movement of images within those groups, Sung and I used OutWit to scrape image search queries and their results. In short, OutWit allowed us to extract the data from an image search query URL and organize it.
[Sung] OutWit is a great tool that we used to scrape the wedding dress categories along with the picture URLs. Instead of having to go back every day and copy and paste the URLs manually, we used OutWit’s custom scraper to find where in the source code the category name was listed.
Here’s an example of how the data was organized for the image search query: “wedding dresses.”
Below you can see the date of the scrape, the carousel categories of the day, and the top 10 URLs listed within those groupings.
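As a rough illustration of that layout (scrape date, the day's categories in order, and the URLs within each grouping), here is a hypothetical Python representation. This is not OutWit's export format; the class names, field names and the example.com URLs are placeholders we made up for the sketch. Only the query, the date and the category names come from our data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CategoryScrape:
    """One carousel category and the image URLs listed under it."""
    name: str
    position: int  # 1-based position in the carousel
    image_urls: List[str] = field(default_factory=list)


@dataclass
class CarouselScrape:
    """One day's scrape for a single image search query."""
    query: str
    scraped_on: str  # kept as the "month/day" label used in the post
    categories: List[CategoryScrape] = field(default_factory=list)

    def category_names(self):
        """Category names in carousel order."""
        ordered = sorted(self.categories, key=lambda c: c.position)
        return [c.name for c in ordered]

    def rows(self):
        """Flatten to spreadsheet-style rows: date, category, rank, URL."""
        for category in sorted(self.categories, key=lambda c: c.position):
            for rank, url in enumerate(category.image_urls, start=1):
                yield (self.scraped_on, category.name, rank, url)


# Placeholder URLs only; the real scrape held up to 10 URLs per category.
scrape = CarouselScrape(
    query="wedding dresses",
    scraped_on="7/23",
    categories=[
        CategoryScrape("Big", 1, ["https://example.com/placeholder-1.jpg"]),
        CategoryScrape("Sweetheart Neckline", 2, ["https://example.com/placeholder-2.jpg"]),
    ],
)
for row in scrape.rows():
    print(row)
```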
Unfortunately, the tool is unable to retrieve DA, PA, backlinks or any of the other metrics that value a site; it merely pulls the surface data that you’re looking for. We didn’t see much that told us why these categories were ranking in the image carousel, so we figured it depended on the collective strength of the URLs within each group, which leads us to our next section.
Individual Images:
So how is Google determining which images to show in the results of these groups? Contextuality. Ya, I said it. Page context. Image context. Google loves when you make it easy for it to discern what a site, page and image are about. To start, we selected “lace back”, then clicked the leading image shown within this group which directed us to a website called Confetti Daydreams.
Here is what we noticed about the page containing the image: the title tags are clean; it uses keywords that are descriptive and natural; the URL also contains a natural keyword description. “Lace” + “Wedding Dresses” are used throughout the page and its tags, signaling to Google the page’s relevance to the query and the category in which it belongs.
While this isn’t news to anyone, it’s important to note Google’s affinity for clarity and context, but none of this really tells us why or how an image makes it into the groupings. In order to gain insight into why certain images are shuffled through the top ten and why others maintain their position, we looked at the top four images in the first two categories on a weekly basis and found that, more often than not, Google rewards images that meet the following criteria:
- Short and descriptive alt tags: for example, <img src="http://www.sunglooksgreatinaweddingdress.com/stunninglacebackweddingdress.jpg" alt="Stunning Lace Back Wedding Dress" />
- Descriptive image file name: Using the example above, a descriptive but not overly optimized file name can help Google determine what the picture contains and how to categorize it. <img src="http://www.sunglooksgreatinaweddingdress.com/stunninglacebackweddingdress.jpg" alt="Stunning Lace Back Wedding Dress" /> Nearly all of the images we found within the image groupings were .jpg files. (Note: if you’re trying to optimize an image, the alt tags and file names don’t always have to be the same. What you should try to remember when saving an image to your computer is to keep the file name unique and descriptive. So, save as “stunninglacebackweddingdress.jpg” rather than “weddingdress.jpg”.)
- Surrounding Text & Images: Any text and/or context you can attach to an image is always a good idea. Most of the images that we researched had natural keyword captions, and were surrounded by similar images with descriptive alt tags, descriptive file names and a moderate amount of content surrounding the image. Others had descriptive alt tags and image file names, but had little to no surrounding content to support the image (these usually originated on spammy or foreign sites).
- Inbound links to the image: Obviously, the more trusted and authoritative links pointing to a website (and/or one of its pages), the stronger it becomes. Unfortunately, the rules are applied differently when ranking images. When we compared the backlinks of multiple leading category images to those of their neighbors in the group, we found that the leading images had either a few backlinks or none at all.
- DA: We also checked the domain authority of each image within its category (usually around 7-9 pictures) hoping to find them in decreasing order. No dice there. In fact, it seems that Google takes the time to consider these metrics (DA, PA) for only the first URL, then assigns the positions of the other image URLs at random.
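The first two criteria above (alt tags and file names) are easy to spot-check on your own pages. Here is a minimal sketch using only Python's standard library. The thresholds (`max_alt_words`, `min_name_length`) and the function name `audit_images` are arbitrary assumptions on our part, not anything Google has published, so treat the output as a rough checklist rather than a score.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageCollector(HTMLParser):
    """Collects (src, alt) pairs from every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            # Missing or valueless attributes are treated as empty strings.
            self.images.append((attr_map.get("src") or "", attr_map.get("alt") or ""))


def audit_images(html, max_alt_words=8, min_name_length=15):
    """Hypothetical checker: flag images with missing or long alt text,
    or with short, generic file names. Thresholds are assumptions."""
    collector = ImageCollector()
    collector.feed(html)
    report = []
    for src, alt in collector.images:
        file_name = posixpath.basename(urlparse(src).path)
        stem, _extension = posixpath.splitext(file_name)
        issues = []
        if not alt.strip():
            issues.append("missing alt text")
        elif len(alt.split()) > max_alt_words:
            issues.append("alt text is long")
        if len(stem) < min_name_length or not any(ch.isalpha() for ch in stem):
            issues.append("file name is not descriptive")
        report.append({"file": file_name, "alt": alt, "issues": issues})
    return report


sample = (
    '<img src="http://www.sunglooksgreatinaweddingdress.com/'
    'stunninglacebackweddingdress.jpg" alt="Stunning Lace Back Wedding Dress" />'
    '<img src="/images/weddingdress.jpg">'
)
for row in audit_images(sample):
    print(row)
```

With the sample markup, the first image passes both checks, while “weddingdress.jpg” is flagged for missing alt text and a file name that falls under the length threshold.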
To wrap up, the only pattern we’ve noticed is that there's no pattern at all. Google seems to be an equal opportunity category image ranker, pulling them from ecommerce sites, blogs lacking value, and foreign spam sites. We thought that Google was ranking the leading category image because it had more "instances" or "uses" across the web than its neighbors within the category, but we came up empty-handed there. Many of the images meet the contextual aspects (surrounding text, file names, alt tags, etc.), but some of them live on wildly over-optimized pages/websites.
Though we weren’t able to draw any firm conclusions based on our small sample size (and hopefully this will prompt others to do some investigating of their own), we can hypothesize the following:
- Google may not only recognize the quality of an image, but also the elements within the image (smiling faces, sharper images, color analysis). It’s likely that Google is using visual recognition software or possibly assigning unique ID codes to specific entities of an image.
- Google may be ranking images based on the frequency of images uploaded to the page and/or website where the image lives.
- Tags within the images’ source code may be used to filter out low-quality photos.