DawnSearch

The open source distributed web search engine that searches by meaning

What is DawnSearch?

DawnSearch aims to be an alternative to the search engines controlled by big corporations.

DawnSearch is distributed, which means it does not run on a single server, or even on a single continent. Volunteers from all over the world can host their own DawnSearch instance, which will then connect to the others to form a global search engine.
DawnSearch is open, which means that everyone can take the source, and do with it whatever they want. Development is done in an open community, where every voice is valuable and is heard.
DawnSearch does not search for the words you type in directly. An AI model reads your search query and converts it into a list of numbers, which is interpreted as a location in 384-dimensional space. All documents in DawnSearch have also been analyzed and given a location in the same space. DawnSearch then looks for the documents that are closest to your query.
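The "closest documents" idea can be sketched in a few lines of Python. This is only an illustration, not DawnSearch's actual code: the vectors below are hand-written toy values with 3 dimensions instead of 384, the URLs and the function name nearest_documents are made up, and Euclidean distance is an assumed choice of metric, since the text above only says "closest".

```python
import math

def nearest_documents(query_vector, index, k=2):
    """Return the k document URLs whose vectors lie closest to the query."""
    # Sort all documents by their distance to the query, nearest first.
    ranked = sorted(
        index.items(),
        key=lambda item: math.dist(query_vector, item[1]),
    )
    return [url for url, _ in ranked[:k]]

# Hypothetical toy index: in a real system each vector would be produced
# by the AI model and would have 384 numbers instead of 3.
index = {
    "example.org/cats": [0.9, 0.1, 0.0],
    "example.org/dogs": [0.8, 0.3, 0.1],
    "example.org/tax-law": [0.0, 0.1, 0.9],
}

# Hypothetical vector for a query about pets.
query_vector = [1.0, 0.0, 0.0]

results = nearest_documents(query_vector, index, k=2)
print(results)
```

Sorting every document like this only works for a tiny index. A large index would need an approximate nearest-neighbor structure, which this sketch does not attempt to show.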

Privacy

This DawnSearch instance does not actively collect data on access, and does not store searches. However, some information may be temporarily stored in log files. Due to the way DawnSearch works, a processed form of your search query is sent to other instances. Do not use DawnSearch to search for any sensitive information.

Does this work as well as Google, Bing, Brave Search, etc.?

Currently, no. DawnSearch has loaded just 0.1% of the data of one large dataset, and that dataset itself covers only a part of the internet. Over the coming months the index will expand, and we will have to discover how that affects the quality of the results. As DawnSearch is an experiment, we still hope to find many improvements.

AI and statistics

DawnSearch uses AI and statistical techniques in order to search. This means that biases may be present. For example, certain kinds of language use may not be detected as 'English' and would then be excluded from the index. The AI model used, all-MiniLM-L6-v2, may also prefer certain content over other content. Whether it does is currently unknown. For example, it could favor pages written by male authors over pages written by female authors. Such biases may come from the training data itself, or they may arise because the AI is not human and thinks differently than we do.

Open Source / Free Software

The code for this instance is available on GitHub under an open source / free software license. In short, anyone is free to modify this software, with the important note that if they give other people access, they will also have to share their modifications with them.