Most people are at least broadly familiar with the concept of search engines. You type the word or phrase for which you need information, and the tool goes and fetches the best results, right? This is not a new concept. But comparatively few people have any idea what’s going on under the hood or how the search engine algorithms actually work.
Colibri Digital Marketing is a top San Francisco digital marketing agency. We’re right next door to Silicon Valley, and we get more than a few questions about how search engines function. To give our answers some context, we always end up explaining the history of search engine optimization (SEO).
If you’re curious about how search engines came to be, read this post explaining the history of SEO. You’ll learn where it all began, and why. You’ll learn the basic principles of SEO, and you’ll get a feel for the 2017 SEO landscape from the perspective of a top San Francisco digital marketing agency. Happy reading!
What Is SEO?
In short, SEO describes the various processes and strategies that allow a particular website to outrank other, similar websites in the results a search engine returns for a particular query. So, if I search “top San Francisco digital marketing agencies,” I’ll get a slew of results (and there’s a reason why Colibri Digital Marketing often appears on the first page). SEO encompasses the total set of things we do to make sure that Colibri is at the top of that list.
So, what exactly is a search engine and what does it aim to do?
Developing a Search Engine
The first description of something like a modern search engine comes from Dr. Vannevar Bush’s article “As We May Think,” published in The Atlantic in July 1945. In this article he envisions a tool to help solve a problem he was facing. Though the mountain of scientific knowledge was constantly growing, the “methods of transmitting and reviewing the results of research [were outdated and] totally inadequate for their purpose.” However, he saw salvation in the works of Leibnitz and Babbage, each of whom had laid the groundwork for “calculating machines” of a sort. If a machine of sufficient complexity could be built, it might be used to store information for later access. Indeed, it could facilitate “the collection of data and observations, the extraction of parallel material from the existing record, and the final insertion of new material into the general body of the common record.” In effect, Dr. Bush envisioned a sort of hybrid search engine and FTP server for the scientific community.
That kind of archive-and-retrieval system would take a number of different forms in the decades to come, but what we today think of as a full search engine got started in the early 90s.
In February of 1993, a group of Stanford students (you’ll find that “a group of Stanford students” is something of a running theme) created the foundation for Excite, one of the first search engines. Their system, Architext, sorted information by keywords extracted from the content. By December of that year, a number of other search engines (JumpStation, RBSE spider, and World Wide Web Worm, among others) were in operation which used “bots” (crawlers, spiders, trawlers, etc.) to evaluate and catalog webpages to improve search results for users.
By mid-1994, Yahoo, Lycos, and Infoseek were all in play (AltaVista would follow in late 1995), and the notion of a “search engine” was gradually permeating the public consciousness. Things really took off, however, in 1996, when Stanford students Sergey Brin and Larry Page began building what would develop into the biggest, most recognizable search engine in the world: BackRub.
What? You thought I was going to say “Google,” didn’t you? Well, you’re not entirely wrong. BackRub would evolve into Google (the domain was registered in 1997) with the advent of PageRank.
The pun in the name wasn’t lost on Larry Page when he and Brin were naming their algorithm. At heart, PageRank was a way of scoring web pages so that they could be put into an ordered list automatically. It worked in a very clever way: PageRank treated the link network of the web as a whole as a ranking factor. Pages were ranked by the likelihood that a user would land on them just by following links at random, so pages with more inbound links, especially links from pages that were themselves well linked, would rank higher than pages that weren’t as thoroughly enmeshed.
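To make the “random link following” idea concrete, here is a minimal sketch of a PageRank-style calculation in Python. It is a textbook simplification, not Google’s production algorithm: the `pagerank` function, the 0.85 damping factor, the fixed iteration count, and the toy four-page web are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate how likely a random link-follower is to be on each page.

    links: dict mapping a page name to the list of pages it links to.
    damping: assumed probability that the surfer follows a link rather
             than jumping to a random page (0.85 is a common textbook value).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the surfer equally likely to be anywhere.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "hub"; nothing links to "c".
toy_web = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}

ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, “hub” comes out on top because every other page links to it, while “c” lands at the bottom because nothing links to it. That is the whole intuition: a link acts like a vote, and votes from well-linked pages count for more.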
SPAM and Misuse
In the very early 2000s, there was an emergent problem. Sites with more links tended to rank better than sites with fewer, and pages with greater densities of keywords would show up for a greater number of possible searches. The natural outcome was a sort of arms race between sites to stuff keywords and farm links more and more lavishly. A whole cottage industry sprang up under the guise of SEO that would link your page from ten thousand spam directories, and cram your page full of invisible keywords trying to outpace every other site vying for the top spot.
Today, those shady tactics fall under the broad category of “black hat SEO,” but back then, they weren’t being restrained or punished. It wasn’t good for Google, it wasn’t good for users, and it wasn’t really any good for the sites themselves.
Something had to change.
In 2003, Florida, an update to Google’s algorithm, started to watch for keyword stuffing, but it wasn’t perfect. The update caused a drastic decline in the overall quality of the results returned for a query, because the algorithm overreached. See, to a simple algorithm, there’s no clear way to tell the difference between keyword stuffing and especially relevant content based solely on keyword frequency and distribution. Bad content was so good at masquerading as good content, structurally, that Florida ended up throwing the baby out with the bathwater. The tech wasn’t there, yet, to interpret context.
Machine Learning and Natural Language Processing
That brings us to the concept of machine learning. We’re getting ahead of ourselves, chronologically, but bear with me for a minute. The problem with Florida was that it didn’t have a way to get a grip on the overall content. The math wasn’t there to describe the relationship between keywords (raw vocabulary) and quality (syntax and meaning).
Put yourself in the mind of a basic algorithm for a second. You’ve been told to cull pages that seem to overuse phrases that would lend themselves to search. Can you tell the difference between these two passages?
“The Shiba Inu is a small, agile dog that copes very well with mountainous terrain. The Shiba Inu was originally bred for hunting. It looks similar to and is often mistaken for other Japanese dog breeds like the Akita Inu or Hokkaido, but the Shiba Inu is a different breed with a distinct blood line, temperament and smaller size than other Japanese dog breeds.”
“Shiba Inu dog is Japanese hunting dog breed Hokkaido is small agile dog breed and good dog for rough terrain like akita husky Hokkaido shiba inu Japanese dog breeds.”
Since you, the simple algorithm, lack any real understanding of the meaning of the words, you have no context for the keywords you see, so you cull pages indiscriminately.
It’s not a great system.
What Google lacked in 2003 was a way to algorithmically quantify meaning and context.
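The thought experiment above can be sketched in a few lines of Python. To be clear, this is not how Florida worked (Google has never published that); it is a hypothetical, deliberately naive filter. The `KEYWORDS` set, the 0.2 threshold, and the function names are all assumptions made up for the example.

```python
import re

# The two passages from the thought experiment above.
NATURAL = (
    "The Shiba Inu is a small, agile dog that copes very well with "
    "mountainous terrain. The Shiba Inu was originally bred for hunting. "
    "It looks similar to and is often mistaken for other Japanese dog "
    "breeds like the Akita Inu or Hokkaido, but the Shiba Inu is a "
    "different breed with a distinct blood line, temperament and smaller "
    "size than other Japanese dog breeds."
)
STUFFED = (
    "Shiba Inu dog is Japanese hunting dog breed Hokkaido is small agile "
    "dog breed and good dog for rough terrain like akita husky Hokkaido "
    "shiba inu Japanese dog breeds."
)

# Hypothetical list of search-friendly terms the filter watches for.
KEYWORDS = {
    "shiba", "inu", "dog", "japanese", "breed",
    "breeds", "hokkaido", "akita", "hunting",
}


def keyword_density(text, keywords):
    """Return the fraction of words in the text that are watched keywords."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in keywords)
    return hits / len(words)


def looks_stuffed(text, keywords=KEYWORDS, threshold=0.2):
    """Naive rule: flag any page whose keyword density exceeds the threshold."""
    return keyword_density(text, keywords) > threshold


for label, passage in (("natural", NATURAL), ("stuffed", STUFFED)):
    density = keyword_density(passage, KEYWORDS)
    print(label, round(density, 2), "flagged:", looks_stuffed(passage))
```

With this assumed threshold, both passages get flagged, even though only one of them is spam. You could raise the threshold until the natural passage slips through, but then slightly subtler spam slips through too. A word count alone has no notion of whether the sentences actually say anything.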
Panda and Penguin
Panda and Penguin, unleashed in 2011 and 2012, respectively, were much closer to the mark when it came to weeding out low-quality pages. Panda worked to promote high-quality content and demote thin, spammy, or otherwise unhelpful pages, while Penguin targeted sites which were spamming unnatural links (“link farming,” as above).
What they had in common was the ability to more accurately quantify the context of their subjects.
Penguin, for instance, was able to discern the relevance of a backlink from the semantic content of the page. An article on dogs, say, which linked to a page about the Shiba Inu would be a more valid backlink than a directory listing of unrelated webpages.
From then onward, Google would continue refining their algorithms as their tech evolved, with the ultimate goal of creating an algorithm that would parse text with the same kind of understanding as a human user, and so present that user with the best possible results.
SEO in 2017
Today, the state of SEO is at once more and less complicated than in decades past. While the number of ranking factors continues to increase and diversify, and the precise weight of each factor fluctuates, the overall experience is much more fluid.
We’re not quite there yet, but we’re as close as we’ve ever come to real natural language processing by the various algorithms and systems which ultimately feed the SERPs. More than ever before, strong, relevant content is being promoted while thin, spammy content is being suppressed.
So, what does this mean for you?
Well, as we’re fond of saying here at Colibri Digital Marketing, content is queen. The best thing you can do for your site, your brand, your business, and your digital presence in general is to just produce really strong content. Of course, pairing your content with the keywords your target audiences are searching for goes a long way, too.
(For more SEO tips, check out our post on off-page SEO.)
So, there you have it. SEO in a nutshell from its inception right through to the present day. If you’ve got any comments or questions, don’t hesitate to hit us up on Facebook or Twitter or reach out with the big friendly button below to sign up for a complimentary digital marketing strategy session, and find out how we can help you grow your digital presence.
Colibri Digital Marketing
We’re the digital marketing agency San Francisco trusts to focus on the triple bottom line of people, planet, and profit. Based in the Bay Area, close to Silicon Valley, we’re the team with the sneak-peek into the future of digital marketing. If you’re ready to work with San Francisco’s first and only full-service B Corp-Certified digital marketing agency, drop us a line to schedule a free digital marketing strategy session!