The Journey from Search to AI Assistants — Part 1

Part 1: The problems with current search engines.

0xSingularity
6 min read · Jan 11, 2023
World Internet Penetration

Search is one of the most basic things the internet is used for.

It is estimated that people collectively spend over 800,000 years of time, every year, browsing for information using search engines.

Google is the most widely used search engine in the world. However, it is not without its flaws. One of the primary criticisms of it is that it often provides irrelevant, outdated, and cluttered information, which can make it difficult for users to find the specific information they are looking for.

Lack of Precision

Imagine going to a library and trying to find a book on a particular topic. Instead of being directed to the relevant section of the library, you are given a list of books that are only tangentially related to your topic, and some of the books are outdated and no longer relevant. This is similar to the experience that many users have with search engines: the results they receive are often not closely related to their query, and some of the information is outdated or no longer relevant.

To take another analogy, say you are at a beach, looking for a sample of a rare mineral found in 0.0001% of sand particles. The sand particles collectively correspond to all the information on the internet. Search engines give you a dump of the few million particles that look vaguely relevant to your query, whereas an AI search engine could find the one grain of sand you want out of quadrillions, because it actually "understands" the content at a deep level.

Now, imagine trying to find a particular brand of cereal. Instead of being directed to the correct aisle, you are given a list of every single cereal that the store carries, including those that are not related to your search. Similarly, with search engines, the results users receive can be cluttered and overwhelming, making it difficult to find the specific information they are looking for, as they have to wade through a large number of results to find what they need.
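The precision gap in the analogies above can be sketched in code. This is a toy illustration, not any real engine's algorithm: a literal keyword matcher versus a scorer that first maps words to shared concepts, standing in for what a learned embedding captures. The documents, query, and synonym table are all invented for the example.

```python
docs = [
    "cheap restaurants and diners in the city centre",
    "a history of restaurant architecture",
    "affordable office chairs on sale",
]

def keyword_score(query: str, doc: str) -> int:
    """Count how many query words literally appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

# Hypothetical synonym table standing in for a learned embedding: it
# "knows" that affordable/cheap and eat/restaurants are related concepts.
SYNONYMS = {"affordable": "cheap", "eat": "restaurants", "places": "diners"}

def semantic_score(query: str, doc: str) -> int:
    """Keyword overlap after mapping words to shared concepts."""
    q = {SYNONYMS.get(w, w) for w in query.lower().split()}
    d = {SYNONYMS.get(w, w) for w in doc.lower().split()}
    return len(q & d)

query = "affordable places to eat"
print([keyword_score(query, d) for d in docs])   # → [0, 0, 1]
print([semantic_score(query, d) for d in docs])  # → [3, 0, 1]
```

The literal matcher ranks the office-chair ad highest, because it shares the word "affordable" with the query, while the genuinely relevant first document scores zero. The concept-level scorer ranks the relevant document first.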

Skewed Incentives

It is said, "the best things in life are free." Unfortunately, this is not always the case with search engines, as they often prioritise showing ads over returning the most relevant non-ad search results.

Indeed, the ads-based model completely skews incentives away from simply giving the most useful information, towards giving information that prompts more searching, preferably in a commercially useful direction.

Now imagine going to a store and asking the clerk for their best-selling blender. Instead of directing you to the blender aisle, the clerk leads you to a different section of the store where they are trying to sell you a vacuum cleaner. While the vacuum cleaner may be a high-quality product, it is not what you are looking for!

Imagine going to a restaurant and ordering a pizza, only to be served a bowl of soup. While the soup may be delicious, it is not what you were looking for. In the same way, the presence of ads in the search results can create misaligned incentives for traditional search engines. Instead of focusing on providing the most relevant and useful search results to users, the search engines are more concerned with maximising ad revenue.

Google, for instance, uses various factors, including user history and inferred user data, to personalise the search results it returns to users. This means that the search results you see may be different from the results that another user sees for the same search query. It does this in an effort to provide a more personalised and relevant search experience for users. For example, if you have previously searched for “pizza restaurants” and you search for that term again, it might use your search history to infer that you are interested in finding local pizza restaurants. As a result, the search results you see might include a list of nearby pizza restaurants. While personalising the search results in this way can improve the user’s experience, it is important to note that it can also have some potential downsides.

For example, personalization can lead to a “filter bubble” effect, where the search results are tailored to the user’s existing beliefs and preferences, potentially limiting their exposure to new and diverse viewpoints.

This user data includes information collected through Google's various products and services, including search queries, location data, and more. Imagine going to a store and having every item you look at recorded by the store. This data could be used to personalise your shopping experience, but it could also be shared with third parties, such as advertisers. Isn't this stalker-like?

In addition to collecting and sharing user data, Google has also experienced data breaches in the past, which have exposed the personal information of some of its users. Imagine leaving your wallet at a store and having all of your personal information, including your credit cards and identification, exposed to anyone who finds it. This is similar to the impact of a data breach, which can be a serious concern for users.

Censorship is easy with keywords

Leading search engines have also faced criticism in the past for censoring certain types of information, leading some people to question the impartiality and reliability of the search results they return. One of the primary ways a search engine censors information is by removing certain websites or web pages from its results. This can be done for a variety of reasons, including if the website or page violates the engine's webmaster guidelines (such as Google's) or if its content is illegal or inappropriate.

Imagine going to a library and finding that certain books have been removed from the shelves. While the library might have good reasons for removing these books, such as if they contain inappropriate content or if they violate the library’s rules, it could still raise concerns about censorship and the availability of information. This is similar to the impact of Google censoring certain websites or web pages from its search results.

In addition to removing websites or pages from its search results, Google can also manipulate the ranking of search results to favour certain websites or to demote others. This can lead to a biased search experience, as the ranking of the results can be influenced by Google's own preferences or by those of its advertisers.

Imagine going to a store and finding that certain products are placed in more prominent locations or are given more favourable displays, while others are hidden in the back or are given less favourable treatment. While the store might have good reasons for doing this, such as if the products are more popular or if they are being promoted by the store, it could still raise concerns about fairness and impartiality. This is similar to the impact of Google manipulating the ranking of search results.
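The section heading claims that censorship is easy with keywords, and it is worth making the mechanism explicit: when documents are indexed by the words they contain, suppressing a topic is little more than a set-intersection test against a blocklist. A minimal sketch, with a hypothetical blocked term:

```python
# Hypothetical blocklist for illustration; a real one would be larger,
# but the filtering mechanism would be exactly this simple.
BLOCKED_TERMS = {"forbidden"}

def censor(results: list[str]) -> list[str]:
    """Drop any search result containing a blocked keyword."""
    return [r for r in results if not (set(r.lower().split()) & BLOCKED_TERMS)]

results = ["an article on a forbidden topic", "an article on gardening"]
print(censor(results))  # only the gardening article survives
```

Note that a paraphrase of the blocked topic slips straight through such a filter, which is also why keyword-level censorship tends to overblock entire terms rather than specific ideas.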

What Gives?

Over the last few years, the monopoly of leading search engines has resulted in a significant technological overhang: a gap between the performance of commercial search engines and the quality of answers achievable by state-of-the-art NLP models. The domination of big names has stunted the growth of competing engines.

Yet a market for premium question answering does exist, with many people who would be willing to pay for more intelligent and precise answers to their questions.

It is estimated that people collectively spend over 800,000 years of time, every year, browsing for information using search engines.

By making this core process more efficient and effective, we can enhance the productivity of people across the world by a remarkable amount.

LUCI is an attempt to optimise large language models with extensive fine-tuning on factual QA datasets, with the goal of producing an effective and useful Oracle AI. Building on OpenAI's GPT-3, we performed extensive statistically oriented fine-tuning, prompt development, and hyperparameter optimisation, leading to large gains over the previous best QA accuracy achieved by these models, in terms of precision, specificity, usefulness, and truthfulness.
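The post does not show LUCI's actual training data, but fine-tuning GPT-3 on factual QA in this era generally meant preparing prompt/completion pairs as JSONL records for OpenAI's fine-tuning endpoint. The questions, answers, and separator format below are hypothetical, for illustration only:

```python
import json

# Hypothetical QA pairs (not LUCI's real data). GPT-3 era fine-tuning
# expected JSONL with "prompt" and "completion" fields, using a fixed
# separator ("A:") so the model learns where the answer begins.
qa_pairs = [
    ("What is the boiling point of water at sea level?", "100 °C (212 °F)."),
    ("Who wrote 'On the Origin of Species'?", "Charles Darwin."),
]

records = [
    {"prompt": f"Q: {q}\nA:", "completion": f" {a}"}  # leading space aids tokenisation
    for q, a in qa_pairs
]

with open("qa_finetune.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```

The separate prompt-development and hyperparameter work the post mentions would sit on top of a dataset like this, varying the prompt template and training settings rather than the record format.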

Continued in Part 2, where we explore indexing methods, past and present.

Written by 0xSingularity

Tracking the journey of LUCI, a general-purpose question answering AI