{"id":998,"date":"2019-12-05T08:00:00","date_gmt":"2019-12-05T07:00:00","guid":{"rendered":"http:\/\/cmagazine.test\/how-search-engines-work\/"},"modified":"2021-12-23T13:00:24","modified_gmt":"2021-12-23T12:00:24","slug":"how-search-engines-work","status":"publish","type":"post","link":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/","title":{"rendered":"How Search Engines Work"},"content":{"rendered":"<p>The <span id=\"urn:enhancement-19dc56a2\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/internet\">Internet<\/span> is pretty big. But when you need a hand in this ocean of useless <span id=\"urn:enhancement-fe02aa54\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data\">data<\/span>, a <span id=\"urn:enhancement-f1a5ecee\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine is your friend.<\/p>\n<p>In this journey through the web, we will understand which algorithms and techniques form a <span id=\"urn:enhancement-519b9542\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine and what we need to create one.<\/p>\n<h2>Different kinds of queries, different kinds of results<\/h2>\n<p>As you may <span id=\"urn:enhancement-ba4124b8\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/knowledge\">know<\/span>, a <span id=\"urn:enhancement-9026edb3\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine is not as simple as it looks. On our end, we just type a couple of words and, in a matter of seconds, we have what we want. 
But under the surface, a lot is going on.<\/p>\n<p>First, depending on the information we\u2019re looking for, a <span id=\"urn:enhancement-de9dae01\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine uses a different representation.<\/p>\n<p>For instance, if we ask when a certain event will take place, a <span id=\"urn:enhancement-4df65235\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine will just answer our question. After all, that is what we needed, and the <span id=\"urn:enhancement-55fbf023\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine knows it.<\/p>\n<p>\u00a0<\/p>\n<center><br \/><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1147\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/image1-1.jpg\" alt=\"\" width=\"823\" height=\"623\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/image1-1.jpg 823w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/image1-1-300x227.jpg 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/image1-1-768x581.jpg 768w\" sizes=\"auto, (max-width: 823px) 100vw, 823px\" \/><\/center>\n<p>A similar approach is applied to location searches. The difference is that, typically, asking for a place to eat will result in a list of nearby restaurants. 
This time the <span id=\"urn:enhancement-3c857084\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine knows that the restaurant we want is in that list but can\u2019t possibly <span id=\"urn:enhancement-9a636456\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/knowledge\">know<\/span> our preferences.<\/p>\n<p>\u00a0<\/p>\n<p>But what about more <span id=\"urn:enhancement-91cadbc7\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/complexity\">complex<\/span> questions? This is where the magic begins.<\/p>\n<h2>Before searching<\/h2>\n<p>If you ever use <span id=\"urn:enhancement-8c9bcb1f\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/google\">Google<\/span>, you probably noticed that, for every question, a massive number of results is shown. But how can a <span id=\"urn:enhancement-e68b7622\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine possibly <span id=\"urn:enhancement-6b185d7d\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/knowledge\">know<\/span> that many pages? 
The answer is <b>Crawling<\/b>.<\/p>\n<p>Crawling is a fundamental operation for a <span id=\"urn:enhancement-9547a119\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine because it scans the <span id=\"urn:enhancement-b662e3ed\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/internet\">internet<\/span> starting from all known <span id=\"urn:enhancement-492ae488\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/uniform_resource_locator\">URLs<\/span> and, moving concentrically from link to link, creates a sort of map. Every page that has been crawled is then added to a <b><span id=\"urn:enhancement-23be64cc\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/bloom_filter\">Bloom Filter<\/span><\/b>, a probabilistic <span id=\"urn:enhancement-c944e3a0\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data\">data<\/span> structure that lets the crawler cheaply check whether a URL has already been seen, at the cost of occasional false positives.<\/p>\n<p>The next step is to <span id=\"urn:enhancement-7a2b52a4\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/knowledge\">know<\/span> what the actual content of these pages is. For this purpose, we need to introduce a new operation: <b>Indexing<\/b>.<\/p>\n<p>What happens is that, for every term of interest, the <span id=\"urn:enhancement-ef6e17a3\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine scans <span id=\"urn:enhancement-eb94e6e1\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/document\">documents<\/span> and counts occurrences. 
This produces a list of pages for every given term, called <b>Posting <span id=\"urn:enhancement-d724edcc\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/html_element\">List<\/span><\/b>. Having this makes it really easy to know which <span id=\"urn:enhancement-82b31574\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/document\">documents<\/span> are more suitable for any given request.<\/p>\n<p>Now that we have a list of accessible pages and we know their content, we need a way to decode a <span id=\"urn:enhancement-74dfc47\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/user_computing\">user<\/span>\u2019s question and to make it easily understood by a <span id=\"urn:enhancement-e127924a\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine. This operation is called <b><span id=\"urn:enhancement-d836dd5b\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/parsing\">Parsing<\/span><\/b>.<\/p>\n<p>First, the <span id=\"urn:enhancement-87da74c4\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine identifies the kind of <span id=\"urn:enhancement-bb540fca\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/document\">document<\/span> that is needed and the language used. Then it tokenises words, removing punctuation and reducing letters to lower-case. Next, it removes stop-words like \u201ca\u201d or \u201cthe\u201d. 
Finally, the <span id=\"urn:enhancement-9ecbde47\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine looks for synonyms and related words. Now we have a list of basic keywords and parsing is complete.<\/p>\n<p>But this is only the beginning.<\/p>\n<h2>The life-cycle of a Query<\/h2>\n<p>Starting from the moment we type our question and click enter, a <b>query<\/b> is born. However, many steps are required before an acceptable answer is produced. Let\u2019s look at it like a <b>production line<\/b>; using our query as a <span id=\"urn:enhancement-50b5a079\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/conceptual_model\">model<\/span>, the <span id=\"urn:enhancement-53b27a71\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine will take raw results and refine them until they\u2019re ready to use.<\/p>\n<p>To begin, the <span id=\"urn:enhancement-20a1f716\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine &#8211; once it has received and parsed our query &#8211; will start looking it up in posting lists.<\/p>\n<p>This will usually result in billions of positive matches. We obviously need to process this huge amount of <span id=\"urn:enhancement-b274ce17\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data\">data<\/span> to determine which <span id=\"urn:enhancement-8c82450a\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/document\">documents<\/span> are more suitable. 
For this purpose, <b><span id=\"urn:enhancement-ed48ef46\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/boolean_algebra\">Boolean Logic<\/span><\/b> is applied, letting only <span id=\"urn:enhancement-f1c6072a\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/document\">documents<\/span> that contain several of the query\u2019s terms survive.<\/p>\n<p>Now we have a stricter and more relevant list, but we still need to <b>rank and sort<\/b> the results. To achieve this, a <span id=\"urn:enhancement-294f5eed\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine can use different algorithms like <span id=\"urn:enhancement-c2a718b0\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/tf-idf\">TF-IDF<\/span>, a function that weighs how often the searched term occurs in a document against how common that term is across the whole collection. In any case, the outcome will be a <b>list of top-k <span id=\"urn:enhancement-d3ad57a3\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/document\">documents<\/span><\/b>.<\/p>\n<p>The final stage is once again a ranking one, just more <span id=\"urn:enhancement-3e5c108c\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/complexity\">complex<\/span>. The top-k <span id=\"urn:enhancement-9dfee30a\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/document\">documents<\/span> list is still too big and not refined enough, so how can we perfect the result?<\/p>\n<p>The answer is <b>Cascade Ranking<\/b>. 
With the help of different algorithms, the <span id=\"urn:enhancement-6d381058\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine can perform re-ranking and build a <b>multi-level architecture<\/b>, more <span id=\"urn:enhancement-c581bca4\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/complexity\">complex<\/span> every time. In this re-ranking, the page\u2019s popularity is taken into consideration. The simplest way to measure popularity on the <span id=\"urn:enhancement-64a380d9\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/internet\">internet<\/span> is by counting links: the more links lead to a page, the more popular a page is. Moreover, a <span id=\"urn:enhancement-6fd6766\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine can apply different <b>Machine <span id=\"urn:enhancement-948521cf\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/machine_learning\">Learning<\/span><\/b> algorithms such as LambdaMART or RankSVM to additionally improve the result.<\/p>\n<p>After following these steps, we have a structured and truly relevant set of pages that, with a high probability, can satisfy most users.<\/p>\n<h2>Is your Search Engine a good one?<\/h2>\n<p>The first question we need to ask is: what makes a <span id=\"urn:enhancement-8162fff2\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine a \u201cgood\u201d <span id=\"urn:enhancement-ca540588\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine? 
The most commonly accepted answer is <b>relevance<\/b>. If the results of a <span id=\"urn:enhancement-88cb7d7f\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> are pertinent to the query, the <span id=\"urn:enhancement-dc074597\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine did a good job.<\/p>\n<p>The second question is: who decides if results are relevant? Ideally, the best way to measure pertinence is human judgement, but it can be very slow and expensive. So the solution of choice is typically an automatised <b>Search Quality Evaluation Tool<\/b> like RRE or trec_eval.<\/p>\n<p>\u00a0<\/p>\n<center><br \/><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1149\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/image2-1.jpg\" alt=\"\" width=\"1024\" height=\"766\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/image2-1.jpg 902w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/image2-1-300x225.jpg 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/image2-1-768x575.jpg 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/center>\n<p>Another thing you must consider when building a <span id=\"urn:enhancement-479c208e\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine is <b>spam<\/b>. The <span id=\"urn:enhancement-ecfeefcc\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/internet\">internet<\/span>, as you may know, is really competitive and, for this reason, a lot of pages are manipulated to achieve high rankings. 
This practice, known as <b><span id=\"urn:enhancement-b9f433fe\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_optimization\">Search Engine Optimisation<\/span><\/b>, is essential for new sites but is also often abused by spammers. To counter this behaviour, it is advisable to use <b><span id=\"urn:enhancement-fbab0a15\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/machine_learning\">machine learning<\/span> classifiers<\/b> and to design more <b>complex ranking<\/b> solutions.<\/p>\n<p>\u00a0<\/p>\n<p>So now that you have an idea of how a <span id=\"urn:enhancement-5a41b8\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/search_engine_technology\">search<\/span> engine works, you can download one of many <b><span id=\"urn:enhancement-764bf868\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/open_source\">Open Source<\/span> <span id=\"urn:enhancement-6dc905dd\" class=\"textannotation disambiguated wl-other\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/web_search_engine\">Search Engines<\/span><\/b> and start experimenting. And if you need inspiration, go take a look at <b>PISA<\/b>, created by Antonio Mallia.<\/p>\n\n","protected":false},"excerpt":{"rendered":"<p>The Internet is pretty big. But when you need a hand in this ocean of useless data, a search engine is your friend. In this journey through the web, we will understand which algorithms and techniques form a search engine and what we need to create one. 
Different kinds of queries, different kinds of results&#8230; <a class=\"more-link\" href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/\">Read more<\/a><\/p>\n","protected":false},"author":18,"featured_media":956,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[46],"tags":[],"collections":[],"class_list":{"0":"post-998","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-ai-ml","8":"entry"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v26.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>How Search Engines Work - Codemotion Magazine<\/title>\n<meta name=\"description\" content=\"Curious about techniques and algorithms behind the hoods of search engines? Check this article to find out how these tools work!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How Search Engines Work\" \/>\n<meta property=\"og:description\" content=\"Curious about techniques and algorithms behind the hoods of search engines? 
Check this article to find out how these tools work!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/\" \/>\n<meta property=\"og:site_name\" content=\"Codemotion Magazine\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Codemotion.Italy\/\" \/>\n<meta property=\"article:published_time\" content=\"2019-12-05T07:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-12-23T12:00:24+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1080\" \/>\n\t<meta property=\"og:image:height\" content=\"675\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Valerio Bernardi\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:site\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Valerio Bernardi\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/\"},\"author\":{\"name\":\"Valerio Bernardi\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/a194cd70b3fa57d323a992c19691159f\"},\"headline\":\"How Search Engines Work\",\"datePublished\":\"2019-12-05T07:00:00+00:00\",\"dateModified\":\"2021-12-23T12:00:24+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/\"},\"wordCount\":1010,\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg\",\"articleSection\":[\"AI\/ML\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/\",\"name\":\"How Search Engines Work - Codemotion Magazine\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg\",\"datePublished\":\"2019-12-05T07:00:00+00:00\",\"dateModified\":\"2021-12-23T12:00:24+00:00\",\"description\":\"Curious about techniques and algorithms 
behind the hoods of search engines? Check this article to find out how these tools work!\",\"breadcrumb\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#primaryimage\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg\",\"width\":1080,\"height\":675},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI\/ML\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Machine Learning\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/machine-learning\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"How Search Engines Work\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"name\":\"Codemotion Magazine\",\"description\":\"We code the future. 
Together\",\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\",\"name\":\"Codemotion\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"width\":225,\"height\":225,\"caption\":\"Codemotion\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/Codemotion.Italy\/\",\"https:\/\/x.com\/CodemotionIT\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/a194cd70b3fa57d323a992c19691159f\",\"name\":\"Valerio Bernardi\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/c6da4dc781eeca8a2e8ba25141d78c8d2e10c832c2e29231ab23be4b155bf538?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/c6da4dc781eeca8a2e8ba25141d78c8d2e10c832c2e29231ab23be4b155bf538?s=96&d=mm&r=g\",\"caption\":\"Valerio Bernardi\"},\"description\":\"Valerio Bernardi is a Computer Engineering Student at Roma 3 University (Rome), with a passion for innovation and video games. 
Tech writer since 2015., his field his automation, ranging between industries and AI.\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/author\/valerio-bernardi\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"How Search Engines Work - Codemotion Magazine","description":"Curious about techniques and algorithms behind the hoods of search engines? Check this article to find out how these tools work!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/","og_locale":"en_US","og_type":"article","og_title":"How Search Engines Work","og_description":"Curious about techniques and algorithms behind the hoods of search engines? Check this article to find out how these tools work!","og_url":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/","og_site_name":"Codemotion Magazine","article_publisher":"https:\/\/www.facebook.com\/Codemotion.Italy\/","article_published_time":"2019-12-05T07:00:00+00:00","article_modified_time":"2021-12-23T12:00:24+00:00","og_image":[{"width":1080,"height":675,"url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg","type":"image\/jpeg"}],"author":"Valerio Bernardi","twitter_card":"summary_large_image","twitter_creator":"@CodemotionIT","twitter_site":"@CodemotionIT","twitter_misc":{"Written by":"Valerio Bernardi","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#article","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/"},"author":{"name":"Valerio Bernardi","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/a194cd70b3fa57d323a992c19691159f"},"headline":"How Search Engines Work","datePublished":"2019-12-05T07:00:00+00:00","dateModified":"2021-12-23T12:00:24+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/"},"wordCount":1010,"publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg","articleSection":["AI\/ML"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/","url":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/","name":"How Search Engines Work - Codemotion Magazine","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#primaryimage"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg","datePublished":"2019-12-05T07:00:00+00:00","dateModified":"2021-12-23T12:00:24+00:00","description":"Curious about techniques and algorithms behind the hoods of search engines? 
Check this article to find out how these tools work!","breadcrumb":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#primaryimage","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg","width":1080,"height":675},{"@type":"BreadcrumbList","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/how-search-engines-work\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.codemotion.com\/magazine\/"},{"@type":"ListItem","position":2,"name":"AI\/ML","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/"},{"@type":"ListItem","position":3,"name":"Machine Learning","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/machine-learning\/"},{"@type":"ListItem","position":4,"name":"How Search Engines Work"}]},{"@type":"WebSite","@id":"https:\/\/www.codemotion.com\/magazine\/#website","url":"https:\/\/www.codemotion.com\/magazine\/","name":"Codemotion Magazine","description":"We code the future. 
Together","publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.codemotion.com\/magazine\/#organization","name":"Codemotion","url":"https:\/\/www.codemotion.com\/magazine\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","width":225,"height":225,"caption":"Codemotion"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Codemotion.Italy\/","https:\/\/x.com\/CodemotionIT"]},{"@type":"Person","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/a194cd70b3fa57d323a992c19691159f","name":"Valerio Bernardi","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/c6da4dc781eeca8a2e8ba25141d78c8d2e10c832c2e29231ab23be4b155bf538?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c6da4dc781eeca8a2e8ba25141d78c8d2e10c832c2e29231ab23be4b155bf538?s=96&d=mm&r=g","caption":"Valerio Bernardi"},"description":"Valerio Bernardi is a Computer Engineering Student at Roma 3 University (Rome), with a passion for innovation and video games. 
Tech writer since 2015. His field is automation, ranging from industry to AI.","url":"https:\/\/www.codemotion.com\/magazine\/author\/valerio-bernardi\/"}]}},"featured_image_src":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper-600x400.jpg","featured_image_src_square":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper-600x600.jpg","author_info":{"display_name":"Valerio Bernardi","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/valerio-bernardi\/"},"uagb_featured_image_src":{"full":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg",1080,675,false],"thumbnail":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper-150x150.jpg",150,150,true],"medium":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper-300x188.jpg",300,188,true],"medium_large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper-768x480.jpg",768,480,true],"large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper-1024x640.jpg",1024,640,true],"1536x1536":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg",1080,675,false],"2048x2048":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg",1080,675,false],"small-home-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper.jpg",100,63,false],"sidebar-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper-180x128.jpg",180,128,true],"genesis-singular-images":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper-896x504.jpg",896,504,true],"archive-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper-400x225.jpg",400,225,true],"gb-block-post-grid-landscape":["https:\/\/www.codemotion.com\/magazine\/wp-content\/upl
oads\/2019\/09\/wallpaper-600x400.jpg",600,400,true],"gb-block-post-grid-square":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/09\/wallpaper-600x600.jpg",600,600,true]},"uagb_author_info":{"display_name":"Valerio Bernardi","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/valerio-bernardi\/"},"uagb_comment_info":1,"uagb_excerpt":"The Internet is pretty big. But when you need a hand in this ocean of useless data, a search engine is your friend. In this journey through the web, we will understand which algorithms and techniques form a search engine and what we need to create one. Different kinds of queries, different kinds of results&#8230;&hellip;","lang":"en","_links":{"self":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/998","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/users\/18"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/comments?post=998"}],"version-history":[{"count":3,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/998\/revisions"}],"predecessor-version":[{"id":2946,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/998\/revisions\/2946"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media\/956"}],"wp:attachment":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media?parent=998"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/categories?post=998"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/tags?post=998"},{"taxonomy":"collections","embed
dable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/collections?post=998"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}