Over the weekend I met with the CEO of BLUR Search Technologies. Jaime is also my brother-in-law, and he sponsored IndieWebCamp NYC in 2018. We mainly gathered for Thanksgiving, the second Thanksgiving, and finally leftovers.
As we all played clean-the-fridge, we snuck away to scope out a possible search engine for the IndieWeb community. BLUR Search Technologies will donate time and technology, but we will need some help implementing some of the building blocks: IndieAuth, the Post Type Discovery algorithm, etc.
We will also check out indiemap.org and see how much of it we can use. I think it will be a ton, plus we already have data to play with.
Opt-in with IndieAuth
Yes, many of us publish openly, even with liberal licenses that allow for remixing and forking, but this does not mean we want the data scraped, parsed, and sorted. The right thing and what you have the right to do are not always the same.
Thus the first feature we would need is an opt-in service using the IndieAuth protocol, meaning the only website data the search engine would collect would be that which you authorized.
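To make the opt-in idea concrete, here is a minimal sketch in Python. It does not implement IndieAuth itself. It assumes the sign-in has already happened elsewhere and has handed us a verified profile URL. `OptInRegistry` and its method names are hypothetical, made up for this post, and not part of any existing library.

```python
from urllib.parse import urlparse


def _domain(url):
    """Return the lowercased hostname of a full URL, or raise if there is none."""
    host = urlparse(url).hostname
    if not host:
        raise ValueError(f"expected a full URL such as https://example.com/, got {url!r}")
    return host.lower()


class OptInRegistry:
    """Hypothetical store: only domains that opted in may be indexed."""

    def __init__(self):
        self._domains = set()   # domains whose owners authorized indexing
        self._documents = {}    # url -> parsed data we are allowed to keep

    def opt_in(self, me):
        # ASSUMPTION: `me` is a profile URL already verified through IndieAuth.
        self._domains.add(_domain(me))

    def opt_out(self, me):
        # Opting out removes the domain and everything collected from it.
        domain = _domain(me)
        self._domains.discard(domain)
        self._documents = {
            url: data for url, data in self._documents.items()
            if _domain(url) != domain
        }

    def may_index(self, url):
        return _domain(url) in self._domains

    def add_document(self, url, data):
        """Store data only for opted-in domains; return whether it was stored."""
        if not self.may_index(url):
            return False
        self._documents[url] = data
        return True

    def documents(self):
        return dict(self._documents)
```

The design choice here is that the crawler never decides what to collect on its own: every write goes through `add_document`, and opting out deletes what was collected as well as stopping future indexing.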
Types of Tables
We first discussed what types of tables we need and what data is available to fill them. We did not decide whether each top-level h-* would get its own table or whether we would use the h-* as the first column.
Again we looked at Grant Richmond's UI, but the h-card directory would get parsed as soon as someone joins the search engine.
A feed reader could then be used to index sites. Using the Post Type Discovery algorithm and existing microformats parsers we can add columns for all the properties used in:
For large blogs with decades and gigs of posts, we will index the pages over time in the background. Adding sites quickly gets more expensive even quicker.
Some queries, like those involving people, would get hard-coded into the search engine. You could ask:
- Where is @x? Then the search engine would query the checkin posts for that person and tell you the last known location.
- Who is @x? Will present the h-card of a person. If there is a p-note or p-summary present then a tagline will appear in the results.
- What is @x's Mastodon name? Queries the directory and finds the rel-me link.
- What (movie, book, podcast) is most popular? This would query the frequency of the "p-name" in the "h-cite" of any watch, read, or listen post (or whatever is the correct answer, much of this is new). These queries could of course be date restricted.
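As a rough sketch of the last query, the Python below counts how often each cited name shows up. It assumes posts have already been parsed into microformats2 JSON (the usual `type` / `properties` shape), and it assumes watch, read, and listen posts use the experimental `watch-of`, `read-of`, and `listen-of` properties. As noted above, much of this is new, so those property names may well be the wrong answer. `most_popular` and `KIND_TO_PROPERTY` are hypothetical names made up for this post.

```python
from collections import Counter
from datetime import date

# ASSUMPTION: these experimental property names mark watch/read/listen posts.
KIND_TO_PROPERTY = {"movie": "watch-of", "book": "read-of", "podcast": "listen-of"}


def most_popular(entries, kind, since=None, until=None):
    """Rank cited names for one kind of post, optionally date restricted.

    `entries` is a list of h-entry dicts in microformats2 JSON form.
    `since` and `until` are optional datetime.date bounds (inclusive).
    """
    prop = KIND_TO_PROPERTY[kind]
    counts = Counter()
    for entry in entries:
        props = entry.get("properties", {})
        # Keep only the YYYY-MM-DD part; ISO dates compare correctly as strings.
        published = (props.get("published") or [""])[0][:10]
        if since or until:
            if not published:
                continue  # undated posts are skipped when a date range is given
            if since and published < since.isoformat():
                continue
            if until and published > until.isoformat():
                continue
        for cite in props.get(prop, []):
            # Only count embedded h-cite objects, not bare URL strings.
            if isinstance(cite, dict) and "h-cite" in cite.get("type", []):
                names = cite.get("properties", {}).get("name", [])
                if names:
                    counts[names[0]] += 1
    return counts.most_common()
```

Calling `most_popular(entries, "book", since=date(2019, 1, 1))` would return `(name, count)` pairs, most frequent first. Grouping by the cited URL instead of the name might be more robust, since people spell titles differently.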
The keyword search would look for exact matches in:
- first p-name after the h-*
- p-category or rel="tag"
These could then be weighted in some form of ranking:
- +100 if keyword in the p-name and also p-category
- +50 if p-name
- +25 if p-category
- +10 for each exact match in the content
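The weights above could be sketched in Python like this. Two things here are my assumptions, since we have not decided them: an "exact match" is treated as a case-insensitive whole-word match, and the +100, +50, and +25 tiers are treated as mutually exclusive (a post in the top tier does not also collect the +50 and +25). `score_post` and `count_exact` are hypothetical names.

```python
import re


def count_exact(keyword, text):
    """Count case-insensitive whole-word occurrences of keyword in text."""
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def score_post(keyword, name, categories, content):
    """Score one post for a keyword using the draft weights.

    name:       the first p-name after the h-*
    categories: list of p-category / rel="tag" values
    content:    plain-text content of the post
    """
    in_name = count_exact(keyword, name) > 0
    in_category = any(keyword.casefold() == c.casefold() for c in categories)

    # ASSUMPTION: the three tiers do not stack with each other.
    if in_name and in_category:
        score = 100
    elif in_name:
        score = 50
    elif in_category:
        score = 25
    else:
        score = 0

    # +10 for each exact match in the content, on top of the tier score.
    score += 10 * count_exact(keyword, content)
    return score


example = score_post(
    "indieweb",
    name="An IndieWeb search engine",
    categories=["indieweb", "search"],
    content="IndieWeb search for the indieweb community",
)
print(example)  # 120: 100 for name plus category, 10 for each of 2 content matches
```

Results would then be sorted by this score, highest first. Whole-word matching means a search for "search" does not count "research".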
We needed to scope out an MVP, which this blog post now completes. Next we will start testing the different microformats-to-JSON parsers, populating tables with dynamic columns to see which can become static columns.
We will start with my blog but need a few other volunteers. Find me in chat if interested.
We also need help from people with experience using the IndieWeb building blocks.
Can we add a micropub client so if you are signed into the search engine you can reply and interact with the results?
Can we develop APIs so people could add the search engine natively to their blogs for both local and network searches?
Could a private search engine help protect vulnerable blogging communities by not only controlling who can use the search engine but also giving users full control over what data is parsed?
Overall I think an opt-in search engine, where you can add and subtract your data as easily as every other time you use IndieLogin, will be great for the community. Because it combines search technologies with the building blocks the #IndieWeb has already created, such a search tool would be useful for other consumable feeds in the #fediverse as well.