In an age of LLMs, is it time to reconsider human-edited web directories?

Back in the early-to-mid '90s, one of the main ways of finding anything on the web was to browse through a web directory.

These directories generally had a list of categories on their front page: News/Sport/Entertainment/Arts/Technology/Fashion/etc.

Each of those categories had subcategories, and sub-subcategories that you clicked through until you got to a list of websites. These lists were maintained by actual humans.

Typically, these directories also had a limited web search that would crawl through the pages of websites listed in the directory.

Lycos, Excite, and of course Yahoo all offered web directories of this sort.

(EDIT: I initially also mentioned AltaVista. It did offer a web directory by the late '90s, but this was something it tacked on much later.)

By the late '90s, the standard narrative goes, the web got too big to index websites manually.

Google promised the world its algorithms would weed out the spam automatically.

And for a time, it worked.

But then SEO and SEM became a multi-billion-dollar industry. The spambots proliferated. Google itself began promoting its own content and advertisers above search results.

And now with LLMs, the industrial-scale spamming of the web is likely to grow exponentially.

My question is, if a lot of the web is turning to crap, do we even want to search the entire web anymore?

Do we really want to search every single website on the web?

Or just those that aren’t filled with LLM-generated SEO spam?

Or just those that don’t feature 200 tracking scripts, and passive-aggressive privacy warnings, and paywalls, and popovers, and newsletters, and increasingly obnoxious banner ads, and dark patterns to prevent you cancelling your “free trial” subscription?

At some point, does it become more desirable to go back to search engines that only crawl pages on human-curated lists of trustworthy, quality websites?

And is it time to begin considering what a modern version of those early web directories might look like?
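To make the idea a bit more concrete, here is a minimal sketch of how a crawler might restrict itself to sites listed in a human-curated directory. Everything in it is an assumption for illustration: the `CURATED_DIRECTORY` contents, the placeholder domains, and the function names are all hypothetical, and a real system would also need editorial workflow, robots.txt handling, and so on.

```python
from urllib.parse import urlsplit

# Hypothetical human-curated directory: category -> listed sites.
# The categories and domains are placeholders, not a real directory.
CURATED_DIRECTORY = {
    "News": ["news.example.org"],
    "Technology": ["tech.example.com", "blog.example.net"],
}


def allowed_hosts(directory):
    """Flatten the directory into a set of lowercase hostnames."""
    return {host.lower() for hosts in directory.values() for host in hosts}


def is_crawlable(url, hosts):
    """True if the URL's host is a listed site or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in hosts)


def filter_frontier(urls, directory):
    """Keep only the URLs the crawler is permitted to fetch."""
    hosts = allowed_hosts(directory)
    return [u for u in urls if is_crawlable(u, hosts)]
```

Calling `filter_frontier(candidate_urls, CURATED_DIRECTORY)` before each fetch would keep the index limited to pages on listed sites, which is roughly how the old directory-backed searches behaved.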

@degoogle #tech #google #web #internet #LLM #LLMs #enshittification #technology #search #SearchEngines #SEO #SEM

  • Atemu
    8 · 1 year ago

    I’d argue that link aggregators like Lemmy (from which I’m posting o/) are the new world version of that. Link aggregators are human-edited web directories; humans post links and other humans vote whether those links are relevant to the “category” (community) they’re in. The main difference is that it’s an open communal effort with implicit trust rather than closed groups of permitted editors.

  • @bsammon
    25 · 1 year ago

    “Lycos, Excite, AltaVista, and of course Yahoo all were originally web directories of this sort.”

    Both Wikipedia and my own memory disagree with you about Lycos and AltaVista. I’m pretty sure they both started as search engines. Maybe they briefly dabbled in being “portals”.

    • AJ Sadauskas (OP)
      5 · 1 year ago

      @bsammon And this Archive.org capture of Lycos.com from 1998 contradicts your memory: https://web.archive.org/web/19980109165410/http://lycos.com/

      See those links under “WEB GUIDES: Pick a guide, then explore the Web!”?

      See the links below that say Autos/Business/Money/Careers/News/Computers/People/Education/Shopping/Entertainment/Space/Sci-Fi/Fashion/Sports/Games/Government/Travel/Health/Kids

      That’s exactly what I’m referring to.

      Here’s the page where you submitted your website to Lycos: https://web.archive.org/web/19980131124504/http://lycos.com/addasite.html

      As far as the early search engines went, some were more sophisticated than others, and they improved over time. Some simply crawled the webpages on the sites in the directory, others

      But yes, Lycos was definitely an example of the type of web directory I described.

      • @bsammon
        7 · 1 year ago

        1998 isn’t “originally” when Lycos started in 1994. That 1998 snapshot would be their “portal” era, I’d imagine.

        And the page where you submitted your website to Lycos – that’s no different than what Google used to have. It just submitted your website to the spider. There’s no indication in that snapshot that suggests that it would get your site added to a curated web-directory.

        Those late ’90s web-portal sites were a pale imitation of the web indices that Yahoo, and later DMoz/ODP, were at their peak. I imagine that the Lycos portal, for example, was only managed/edited by a small handful of Lycos employees, and they were moving as fast as they could in the direction of charging websites for being listed in their portal/directory. The portal fad may have died out before they got many companies to pony up for listings.

        I think in the Lycos and AltaVista cases, they were both search engines originally (mid ’90s) and then jumped on the “portal” bandwagon in the late ’90s with half-assed efforts that don’t deserve to be held up as examples of something we might want to recreate.

        Yahoo and DMoz/ODP are the only two instances I am aware of that had a significant (like, numbered in the thousands) number of websites listed, and a good level of depth.

      • Michelle Hughes
        2 · 1 year ago

        @Emperor

        Yeah. Sorry, I was hesitant to post links at first before I vetted them.

        It looks like “Curlie” is the official continuation of the DMOZ project:

        https://curlie.org/

        The other ones I was seeing, it turns out, are static mirrors of 2017 DMOZ.

        • ᴇᴍᴘᴇʀᴏʀ 帝
          1 · 1 year ago

          Thanks for that, a real blast from the past. I have a vague memory that I was an editor on the ODP or dmoz back in the day.

          “Sorry, I was hesitant to post links at first before I vetted them.”

          Yes, perhaps not coincidentally, I thought it best to ask for a human-curated link.

          • Michelle Hughes
            1 · 1 year ago

            @Emperor

            Y’know, come to think of it, Wikipedia might be a better project to point to here. All the content on there is hand-curated. When I’m interested in a subject, I usually go to Wikipedia first instead of a search engine. Sometimes I am directed out to other websites from there.

            I set up a quick keyword search so I can type “wp blah blah blah” into my URL bar and it searches Wikipedia.

            https://support.mozilla.org/en-US/kb/how-search-from-address-bar?redirectslug=Smart+keywords&redirectlocale=en-US

            • ᴇᴍᴘᴇʀᴏʀ 帝
              1 · 1 year ago

              The only issue with Wikipedia (coming from a long, long time user and Administrator) is that a freely open and editable wiki needs a critical mass of users to become self-policing.

              One of the projects I’ve been kicking around for a while (and which has worked its way to the top of my list) is a wiki that integrates with Lemmy (and, potentially, other Fediverse services), which you could definitely use as a form of curated link directory - having an external links section was definitely one of the uses it could be put to (as well as holding an instance’s documentation and a community’s FAQs, for example).

  • Brad Enslen
    6 · 1 year ago

    @ajsadauskas @degoogle Since I run a small directory this is a fascinating conversation to me.

    There is a place for small human-edited directories, along with search engines like Wiby and Searchmysite, which have human review before websites are entered. Also of note: Marginalia search.

    I don’t see a need for huge directories like the old Yahoo, Looksmart and ODP directories. But directories that serve a niche ignored by Google are useful.

    • ᴇᴍᴘᴇʀᴏʀ 帝
      2 · 1 year ago

      “But directories that serve a niche ignored by Google are useful.”

      This is a good point - as search is increasingly enshittified too (from top down, with corporate interests, and bottom up, from SEO manipulation and dodgy sites) it makes sense for topics or communities often drowned out by the noise.

      I also see you are using webrings - another blast from the past that has its uses.

    • Bernard Sheppard
      3 · 1 year ago

      @bradenslen @ajsadauskas @degoogle looksmart! There’s a blast from the past.

      As a very early internet user (suburbia.org.au - look it up, and who ran it) and a database guy, what I learnt very early is that any search engine needed users who knew how to write highly selective queries to get highly specific results.

      Google - despite everything - can still be used as a useful tool - if you are a skilled user.

      I am still surprised that you are not taught how to perform critical internet searching in primary school. It is as important as the three Rs.

  • René Seindal
    4 · 1 year ago

    @ajsadauskas @degoogle DMOZ was once an important part of the internet, but it too suffered from abuse and manipulation for traffic.

    For many, DMOZ was the entry point to the web. Whatever you were looking for, you started there.

    Google changed that, first for the better, then for the worse.

  • @[email protected]
    4 · 1 year ago

    This is how it’s gonna go. We’ll get human-curated search results, before someone “innovates” by mildly automating the process, until someone “innovates” again by using AI to automate it further. Time is a circle.