# SearchHub

A local search engine for your browser bookmarks and history. Import bookmarks and history from Firefox, Zen, Chrome, or Chromium, search them with full-text queries, and optionally forward searches to external engines such as Wikipedia or SearXNG (which aggregates results from dozens of backends). Content can be tagged automatically using local ONNX embeddings (opt-in; set `tagging_enabled = true` in the config).

## Install

**Binaries** are available at [vit.am/~ololduck/search_hub/latest](https://vit.am/~ololduck/search_hub/latest/). Download the binary for your architecture, extract, and run.

**Source:** Install Rust (via [rustup](https://rustup.rs/)), then clone the [repository](https://vit.am/~ololduck/search_hub/repository.git) and build:

```sh
git clone https://vit.am/~ololduck/search_hub/repository.git search_hub
cd search_hub
cargo install --path .
```

This installs the `search_hub` binary to `~/.cargo/bin/search_hub`.

To update later, pull the latest code and reinstall.
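
Assuming the checkout lives in a directory such as `search_hub/` (as in the clone command above), an update could look like the sketch below. The `update_search_hub` function name is made up for illustration; only `git -C ... pull` and `cargo install --path` are real commands.

```sh
# Hypothetical helper: update a source install of search_hub.
# $1 is the path to the checkout (defaults to ./search_hub, an assumption).
update_search_hub() {
  repo=${1:-./search_hub}
  # Fetch the latest code, then rebuild and reinstall the binary.
  git -C "$repo" pull && cargo install --path "$repo"
}
```

Call it with the path to your own checkout, for example `update_search_hub ./search_hub`.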

## First steps

```sh
# Import bookmarks from Firefox (auto-discovers your profile)
search_hub import firefox

# Import from Chrome
search_hub import chrome

# Start the web UI
search_hub serve
```

Open http://127.0.0.1:8080 in your browser. You can now search your bookmarks.

Search queries are also forwarded to external engines: Wikipedia, [crates.io](https://crates.io) via its public JSON API, and optionally [SearXNG](https://searx.space) (which aggregates Google, Bing, DuckDuckGo, and dozens more) if `[[engines]]` is configured. SearchHub also works as a custom search provider in Firefox/Zen via the OpenSearch protocol (your browser should auto-discover it at `/opensearch.xml`).

## CLI reference

| Command | What it does |
|---------|-------------|
| `search_hub serve` | Start web UI on port 8080 |
| `search_hub serve --port 3000` | Start on a custom port |
| `search_hub import firefox` | Import bookmarks from Firefox |
| `search_hub import chrome` | Import from Chrome/Chromium |
| `search_hub import zen` | Import from Zen Browser |
| `search_hub search "query"` | Search bookmarks from the terminal |
| `search_hub list` | List all bookmarks |
| `search_hub insert "Title" "https://..."` | Add a bookmark (fetches content, auto-tags if enabled) |
| `search_hub remove --id 1` | Delete a bookmark by ID |
| `search_hub retag --all` | Re-run auto-tagging (requires `tagging_enabled = true` in config) |
| `search_hub init-config` | Create a default config file at `~/.config/search_hub/config.toml` |
| `search_hub self-update` | Check the abbaye Atom feed and update to the latest release |
| `search_hub self-update --dry-run` | Check for updates without downloading |
| `search_hub self-update --target x86_64-unknown-linux-gnu` | Override the target triple |

All commands use `~/.local/share/search_hub/bookmarks.db` by default. Override with `--db-path` or set `db_path` in the config file.

The first time you run a search or insert command, SearchHub downloads an ONNX embedding model (about 127 MB) to `$XDG_CACHE_HOME/search_hub` (default: `~/.cache/search_hub`).
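
To see where the model would land on your machine, the sketch below resolves the cache directory the way the XDG Base Directory convention does. It assumes SearchHub follows that convention (the standard `XDG_CACHE_HOME` variable, falling back to `~/.cache`) with a `search_hub` subdirectory; the `model_cache_dir` name is hypothetical.

```sh
# Hypothetical helper: print the directory the model cache would use,
# assuming the XDG convention ($XDG_CACHE_HOME if set and non-empty,
# otherwise ~/.cache).
model_cache_dir() {
  printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}/search_hub"
}

model_cache_dir

# Show how much disk space the cache uses, if it has been created yet.
if [ -d "$(model_cache_dir)" ]; then
  du -sh "$(model_cache_dir)"
fi
```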

## Configuration

Run `search_hub init-config` to create `~/.config/search_hub/config.toml` with all available options commented out. Or create it manually:

```toml
# Bookmark database path (default: platform data directory)
# db_path = "/home/you/.local/share/search_hub/bookmarks.db"

# Custom tags override the built-in defaults
# [[tags]]
# name = "my-custom-tag"
# examples = ["example text one", "example text two"]

# Whether auto-tagging is enabled (default: false, requires ONNX model download on first use)
# tagging_enabled = true

# Minimum confidence for auto-tagging (0.0 to 1.0, default: 0.6)
# tagging_threshold = 0.6

# Hosts to skip when fetching content for bookmarking (default: local addresses)
# exclude_urls = ["localhost", "127.0.0.1", "::1"]

# Per-engine configuration (optional)
# Multiple instances supported (e.g., public + private crates.io registries)
[[engines]]
type = "searxng"
instance = "https://search.kael.ink"
# timeout_secs = 10.0  # optional per-engine timeout
# Best: use an existing public instance (see https://searx.space).
# Also possible: run your own with Docker:
#   docker run -d --name searxng -p 8888:8080 searxng/searxng

# Custom crates.io registry (optional)
# [[engines]]
# type = "crates_io"
# url = "https://registry.example.com/api/v1/crates?q={}&per_page=10"
# timeout_secs = 5.0

# Wikipedia search (optional, defaults to English)
# [[engines]]
# type = "wikipedia"
# lang = "fr"
# timeout_secs = 5.0

# MDN Web Docs search (optional, defaults to en-US)
# [[engines]]
# type = "mdn"
# locale = "fr"
# timeout_secs = 5.0

# Generic HTML-scraped engine (use with any search site)
# Provide a URL template with `{}` for the query and a CSS selector
# targeting the result container. Results are extracted from `<a>` links
# inside that container (deduplicated, up to 10, http/https only).
#
# Note: most commercial search engines (Google, Bing, DuckDuckGo, etc.)
# block automated requests. This engine works best with small/niche sites
# that don't enforce bot detection. To find the right selector, view the
# page source or use browser dev tools on the search results page.
# [[engines]]
# type = "generic"
# name = "DuckDuckGo"
# url = "https://html.duckduckgo.com/html/?q={}"
# selector = "div.results"
# timeout_secs = 10.0
# shortcode = "ddg"           # optional: override auto-generated shortcode
# bang_enabled = false         # optional: disable ! redirect but keep @
# bang_url = "..."             # optional: custom redirect URL (keeps shortcode)
```

## Search shortcuts

SearchHub supports two query prefixes that use **shortcodes** — compact aliases
auto-generated from your configured `[[engines]]`.

| Prefix | Example | Behavior |
|--------|---------|----------|
| `!` | `!w Rust` | HTTP 302 redirect to the site's own search results page |
| `@` | `@w Rust` | Show search results from that engine only (bookmarks still shown) |
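
As a rough illustration of the table above, the following sketch splits a query into a mode, a shortcode, and the remaining search terms. It is not SearchHub's actual parser: the `parse_query` name, the tab-separated output, and the rule that the shortcode ends at the first space are all assumptions.

```sh
# Sketch: split a query into mode, shortcode, and search terms.
# Prints three tab-separated fields; parse_query is a hypothetical name.
parse_query() {
  q=$1
  case $q in
    '!'?*) mode=redirect ;;   # !code terms: redirect to the site itself
    '@'?*) mode=engine ;;     # @code terms: results from that engine only
    *) printf 'plain\t\t%s\n' "$q"; return ;;
  esac
  body=${q#?}          # drop the prefix character
  code=${body%% *}     # shortcode is everything up to the first space
  if [ -z "$code" ]; then
    # A bare "! " or "@ " has no shortcode; treat it as a plain query.
    printf 'plain\t\t%s\n' "$q"
    return
  fi
  case $body in
    *' '*) terms=${body#* } ;;   # everything after the first space
    *) terms= ;;
  esac
  printf '%s\t%s\t%s\n' "$mode" "$code" "$terms"
}

parse_query '!w Rust'
parse_query '@w Rust'
parse_query 'plain old query'
```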

### Auto-generated shortcodes

Each engine type gets a sensible default shortcode:

| Engine | Shortcode | Bang URL |
|--------|-----------|----------|
| Wikipedia (lang=en) | `w` | `https://en.wikipedia.org/w/index.php?search={}` |
| Wikipedia (lang=fr) | `wfr` | `https://fr.wikipedia.org/w/index.php?search={}` |
| MDN (locale=en-US) | `mdn` | `https://developer.mozilla.org/en-US/search?q={}` |
| MDN (locale=fr) | `mdnfr` | `https://developer.mozilla.org/fr/search?q={}` |
| crates.io | `crates` | `https://crates.io/search?q={}` |
| SearXNG | `sx` | `{instance}/search?q={}` |
| Generic | slugified name | the engine's own URL template |
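
The `{}` in each bang URL is a placeholder for the query. The sketch below shows one way such a template could be filled in. It assumes the query is percent-encoded (spaces as `%20`) and that the template contains a single `{}`; SearchHub's real encoding may differ, and the `urlencode` and `expand_bang` names are hypothetical.

```sh
# Hypothetical helper: percent-encode a string (ASCII-only sketch).
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s' "$out"
}

# Replace the first {} in a URL template with the encoded query.
expand_bang() {
  template=$1
  encoded=$(urlencode "$2")
  placeholder='{}'
  prefix=${template%%"$placeholder"*}
  suffix=${template#*"$placeholder"}
  printf '%s\n' "$prefix$encoded$suffix"
}

expand_bang 'https://en.wikipedia.org/w/index.php?search={}' 'Rust traits'
```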

### Overriding shortcodes per engine

Set `shortcode`, `bang_url`, or `bang_enabled` directly on the engine:

```toml
[[engines]]
type = "wikipedia"
lang = "fr"
shortcode = "wikifr"       # overrides "wfr"
bang_enabled = false       # disable ! redirect (still searchable via @)
bang_url = "https://..."   # custom redirect URL
```

### Custom bangs (standalone shortcuts — no @ support)

```toml
[[bangs]]
trigger = "gh"
url = "https://github.com/search?q={}"
name = "GitHub"

# Suppress an auto-generated shortcut
[[bangs]]
trigger = "crates"
enabled = false
```

### Collisions

If two engines produce the same shortcode, SearchHub panics at startup with
a message naming both engines. Set `shortcode` on one of them to resolve it.
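
If you maintain many engines, you can check a list of shortcodes for duplicates before restarting. This is only a sketch: the shortcode list is made up, and `find_duplicate_shortcodes` is a hypothetical helper, not part of SearchHub.

```sh
# Sketch: report shortcodes that appear more than once in a list.
# Substitute the shortcodes from your own config for the example values.
find_duplicate_shortcodes() {
  printf '%s\n' "$@" | sort | uniq -d
}

find_duplicate_shortcodes w wfr mdn crates w
```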

## Run the web server as a systemd user service

This keeps the web UI running in the background and starts it automatically on login.

```sh
VERSION=$(search_hub --version | cut -d' ' -f2)
mkdir -p ~/.config/systemd/user
wget -O ~/.config/systemd/user/search-hub-web.service https://vit.am/~ololduck/search_hub/repository/browse/v$VERSION/contrib/search-hub-web.service
systemctl --user daemon-reload
systemctl --user enable --now search-hub-web.service
```

Check status with `systemctl --user status search-hub-web`. View logs with `journalctl --user -u search-hub-web -f`.
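
The `VERSION` line relies on `search_hub --version` printing the program name followed by the version number, separated by a space. That output format is an assumption here, and `1.2.3` is a made-up value. The sketch shows how such output becomes the release tag in the download URL:

```sh
# Stand-in for `search_hub --version`; the output format and the 1.2.3
# version number are assumptions for illustration.
fake_version_output='search_hub 1.2.3'

# Take the second space-separated field, i.e. the version number.
VERSION=$(printf '%s\n' "$fake_version_output" | cut -d' ' -f2)

# Build the URL of the unit file for that release tag.
printf '%s\n' "https://vit.am/~ololduck/search_hub/repository/browse/v$VERSION/contrib/search-hub-web.service"
```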

## Auto-import with systemd

```sh
VERSION=$(search_hub --version | cut -d' ' -f2)
mkdir -p ~/.config/systemd/user
wget -O ~/.config/systemd/user/search-hub-import.service https://vit.am/~ololduck/search_hub/repository/browse/v$VERSION/contrib/search-hub-import.service
wget -O ~/.config/systemd/user/search-hub-import.timer https://vit.am/~ololduck/search_hub/repository/browse/v$VERSION/contrib/search-hub-import.timer
systemctl --user daemon-reload
systemctl --user enable --now search-hub-import.timer
```

This imports bookmarks from Zen Browser daily. To import from another browser, edit `~/.config/systemd/user/search-hub-import.service`.

## Auto-update with systemd

```sh
VERSION=$(search_hub --version | cut -d' ' -f2)
mkdir -p ~/.config/systemd/user
wget -O ~/.config/systemd/user/search-hub-update.service https://vit.am/~ololduck/search_hub/repository/browse/v$VERSION/contrib/search-hub-update.service
wget -O ~/.config/systemd/user/search-hub-update.timer https://vit.am/~ololduck/search_hub/repository/browse/v$VERSION/contrib/search-hub-update.timer
systemctl --user daemon-reload
systemctl --user enable --now search-hub-update.timer
```

This checks for new releases weekly and updates the binary automatically.

## Run with Podman / Docker

A container image is available at `oci.vit.am/search-hub:latest`. It serves on port 8080 as the `search_hub` user and expects:

- **Config** mounted at `/home/search_hub/.config/search_hub/config.toml`
- **Database** directory mounted at `/home/search_hub/.local/share/search_hub/`

```sh
# Pull and run
podman run -d --name search-hub \
  -p 8080:8080 \
  -v ~/.config/search_hub:/home/search_hub/.config/search_hub:ro \
  -v ~/.local/share/search_hub:/home/search_hub/.local/share/search_hub \
  oci.vit.am/search-hub:latest serve

# SIGHUP reload (re-reads config)
podman kill -s HUP search-hub

# Build locally from the Containerfile
podman build -t search-hub:latest -f Containerfile .
```

### docker-compose

```sh
docker compose up -d
```

See `docker-compose.yaml` at the project root. A SearXNG service is included as a commented-out example.

### Podman Quadlet (systemd-native)

```sh
mkdir -p ~/.config/containers/systemd
wget -O ~/.config/containers/systemd/search-hub.container https://vit.am/~ololduck/search_hub/repository/browse/main/contrib/search-hub.container
systemctl --user daemon-reload
systemctl --user enable --now search-hub
```

The Quadlet file uses `%h` (your home directory) for volume source paths.

## Resources

- **Downloads:** [vit.am/~ololduck/search_hub/latest](https://vit.am/~ololduck/search_hub/latest/)
- **Repository browser:** [vit.am/~ololduck/search_hub/repository](https://vit.am/~ololduck/search_hub/repository)
- **Git clone:** `git clone https://vit.am/~ololduck/search_hub/repository.git search_hub`
