PDF Author Metadata in Google Search Results
Looking deeper into the first document in the sample SERP above, I opened the file in Adobe Acrobat and clicked on Document Properties. This is what we get:
Does PDF metadata have any significant effect on ranking?
This probably needs further testing before anyone can offer a good theory backed by consistent, repeatable results. Whether it is worth the time to test depends on how significant it is to your business.
There are factors we are aware of, and there are factors we really do not know about until they are tested further, so we can only make intelligent guesses. But then again, a guess is a guess. Nevertheless, my best guesses based on this observation are:
- Google reads metadata in PDFs.
- The author will appear in the SERP description snippet if it is set in the metadata properties.
- Your PDF will get extra screen real estate in the SERPs, which will hopefully make it more enticing to click.
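If you would rather check the Author field without opening Acrobat, here is a minimal Python sketch of the idea. It assumes the PDF keeps its Info dictionary uncompressed, with Author and Title stored as plain literal strings, which is not true of every PDF. The function name `read_info_fields` and the sample bytes are my own illustration, and a dedicated PDF library would be the safer choice for real files.

```python
import re

# Matches literal-string entries such as "/Author (Jane Doe)".
# Assumption: the Info dictionary is uncompressed and uses literal strings.
_INFO_FIELD = re.compile(rb"/(Author|Title)\s*\(((?:\\.|[^\\()])*)\)")


def read_info_fields(pdf_bytes: bytes) -> dict:
    """Return the first Author and Title values found in raw PDF bytes."""
    fields = {}
    for match in _INFO_FIELD.finditer(pdf_bytes):
        key = match.group(1).decode("ascii")
        raw = match.group(2)
        # Undo backslash escapes for parentheses and backslashes only.
        value = re.sub(rb"\\([()\\])", rb"\1", raw).decode("latin-1")
        # Keep the first occurrence of each key.
        fields.setdefault(key, value)
    return fields


# Hypothetical fragment of a PDF file, for illustration only.
sample = b"1 0 obj\n<< /Title (Sample Paper) /Author (Jane Doe) >>\nendobj\n"
print(read_info_fields(sample))
```

For a file on disk, you would pass in the result of `open(path, "rb").read()`. An empty result only means this simple pattern found nothing, not that the PDF has no metadata.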
But read further into this post and it seems Google knows more than what is in the metadata.
“Cited by:” Google SERPs Description Snippet
When you are looking for PDF files, Google now displays the authors of the PDF file and its publication date, but only when they are available.
It may be related to Google Scholar (“cited by…”).
Have you ever seen that before?
I decided to check this further, starting with the results of this query, the same example used on GoogleBlogoscoped.
As explained above, the author in the SERPs is most probably pulled from the metadata of this PDF file. Clicking the Cited by link leads to Google Scholar, which shows this on the page:
I downloaded the PDF file and started looking for the citation. Nowhere in the whole document is the earlier document mentioned except on the references page.
I’m quite surprised to see that there isn’t even a link. It was simply mentioned on the references page. It looks like Google is doing some advanced bibliography deciphering. Bibliographies are where citations in books are officially placed, and they follow standard formats that differ only slightly from book to book. If Google can read bibliographies well, then this is a whole new kind of PDF interconnectivity.
This may or may not have any bearing on the rank of a PDF document in the universal search results, but I see no reason why it should not. If you have a PDF of a popular book that many people have used as a reference because of its excellent content, that already shows the authority of the book's author. I see it as justified to give that book some higher authority for its bibliography references.
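To make the idea of bibliography-based "links" concrete, here is a small Python sketch. It is only my illustration of the concept, not how Google Scholar actually works: it treats a paper as citing another when the normalized title appears in one of its reference entries. The function names and the sample data are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation so different citation styles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def cited_by(target_title: str, documents: dict) -> list:
    """Return the names of documents whose reference list mentions target_title."""
    needle = normalize(target_title)
    return [
        name
        for name, references in documents.items()
        if any(needle in normalize(entry) for entry in references)
    ]


# Hypothetical reference lists, keyed by document name.
docs = {
    "paper_a.pdf": ['Doe, J. (2005). "A Study of Widgets." Journal of Things, 3(2).'],
    "paper_b.pdf": ["Roe, R. (2001). Unrelated Work. Some Press."],
}
print(cited_by("A study of widgets", docs))
```

Note that no hyperlink is involved anywhere. A plain text mention in the references is enough to count as a citation, which is what makes this different from ordinary link building.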
Is the author information really coming from the metadata?
In Google Scholar, the authors are placed above the description snippet.
After downloading that document and checking the metadata, something is quite different this time. The author and title metadata do not match the author and title printed in the document, yet Google seems to know the right authors to place in the SERPs.
What does this suggest? Like bibliographies, a research paper as a whole follows a writing format. If Google Scholar is all about scholarly research papers, then all of its documents follow the general research paper format. Working from these patterns, Google seems able to pull the essential data out of the document text and parse it into the database fields it needs.
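As a rough Python sketch of what format-based extraction could look like, the snippet below assumes the simplest possible paper layout: the first non-empty line of the first page is the title and the second is the author list. That layout, the function name `guess_title_and_authors`, and the sample text are my assumptions for illustration. Whatever Google really does is certainly more sophisticated.

```python
import re


def guess_title_and_authors(first_page_text: str) -> dict:
    """Guess title and authors from the first two non-empty lines of a page."""
    lines = [ln.strip() for ln in first_page_text.splitlines() if ln.strip()]
    title = lines[0] if lines else ""
    author_line = lines[1] if len(lines) > 1 else ""
    # Split the author line on commas and on the standalone word "and".
    authors = [a.strip() for a in re.split(r",|\band\b", author_line) if a.strip()]
    return {"title": title, "authors": authors}


# Hypothetical first page of a research paper.
page = """
A Study of Widgets
Jane Doe, John Roe and Ann Poe

Abstract. We study widgets.
"""
print(guess_title_and_authors(page))
```

Comparing a guess like this against the PDF's Author metadata would flag documents where the two disagree, which is the situation I ran into above.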
If the bibliographies are a new form of link building, is this exploitable?
There are crazy link builders out there who build links like… well, crazy. That would probably not work here. All Google Scholar results come from research journals that have archive copies on the web. To get into Google Scholar, you really have to produce some good, scholarly quality work and try your best to get it into research journals first.
Do I perceive this as essential in real life SEO?
It depends on whom you are doing SEO for. A client that can be an authority in science, engineering, technology and similar industries may benefit from this by leveraging the white papers on their technology. Creating highly informative research that becomes popular purely because of the quality of its content may work as well as link bait does.
I do not see myself suggesting that a client come up with a high quality research paper, especially if there is nothing to develop, but I would ask whether they already have any research published in journals and try to leverage everything else from there.
Just some Twitter trails to more info…
I follow a great many people on Twitter, many of them from the same industry I play in. I first saw the tweet from GoogleOS that led to his blog post: Google Search Results Show Metadata for Scientific Papers.
I then drilled down further, which led me to his source, GoogleBlogoscoped, where he mentions PDF files with authors' names in the SERPs. Intrigued by their observations, I decided to look into this further, knowing that many scientific, industrial, engineering and B2B companies have PDF white papers and may benefit from this, especially those with KPIs that revolve around getting more downloads of these PDF documents.
Update: Author data is also shown in MS Word Documents and HTML files
Indonesian blogger Busby SEO Challenge also ran a test search for MS Word documents and HTML files, and these also show the Author and Cited By in the SERP description snippet.