<div class="wide64">
<p>
I first used <a href="https://www.gnu.org/software/emacs/">Emacs</a> as a text editor 20 years ago. For over a decade, I have used it daily — for writing and coding, task and finance management, email, as a calculator, and to interact with local and remote hosts. I continue to discover new functionality and techniques, and was surprised to see how this 50-year-old program has adapted to the frontier of technology.
</p>
<p>
This video shows a <a href="https://en.wikipedia.org/wiki/Large_language_model">large language model</a> (LLM), running on my workstation, using Emacs to determine my location, retrieve weather data, and email me the results. By "<a href="https://arxiv.org/abs/2201.11903">thinking</a>", the LLM determines how to chain available tools to achieve the desired result.
</p>
</div>
<video autoplay controls loop muted disablepictureinpicture
       class="video video-wide" src="/static/media/llm.mp4"
       type="video/mp4">
Your browser does not support video.
</video>
<div class="wide64">
<p>
With <a href="https://karthinks.com">karthink</a>'s <a href="https://github.com/karthink/gptel">gptel</a> package and some custom code, Emacs is capable of:
</p>
<ul>
<li>Querying models from hosted providers (<a href="https://www.anthropic.com/">Anthropic</a>, <a href="https://openai.com/">OpenAI</a>, <a href="https://openrouter.ai/">OpenRouter</a>), or local models (<a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a>, <a href="https://ollama.com/">ollama</a>).</li>
<li>Switching between models and configurations with only a few keystrokes.</li>
<li>Saving conversations to the local filesystem, and using them as context for other conversations.</li>
<li>Including files, buffers, and terminals as context for queries.</li>
<li>Searching the web and reading web pages.</li>
<li>Searching, reading, and sending email.</li>
<li>Consulting agendas, projects, and tasks.</li>
<li>Executing Emacs Lisp code and shell commands.</li>
<li>Generating images via the <a href="https://www.comfy.org/">ComfyUI</a> API.</li>
<li>Geolocating the device and checking the current date and time.</li>
<li>Reading <a href="https://en.wikipedia.org/wiki/Man_page">man</a> pages.</li>
<li>Retrieving the user's name and email.</li>
</ul>
<p>
Because LLMs understand and write <a href="https://en.wikipedia.org/wiki/Emacs_Lisp">Emacs Lisp</a> code, they can extend their own capabilities; the improvements are recursive. Below, I note some of the setup required to enable this functionality.
</p>
</div>

<div class="wide64">
<h2>Emacs</h2>
<p>
With <code><a href="https://www.gnu.org/software/emacs/manual/html_node/use-package/">use-package</a></code>, <a href="https://melpa.org/">MELPA</a>, and <a href="https://www.passwordstore.org/">pass</a> for password management, a minimal configuration for <code>gptel</code> looks like this:
</p>
<pre><code>(use-package gptel
  :commands (gptel gptel-send gptel-send-region gptel-send-buffer)
  :config
  (setq gptel-api-key (password-store-get "open-ai/emacs")
        gptel-curl--common-args
        '("--disable" "--location" "--silent" "--compressed" "-XPOST" "-D-")
        gptel-default-mode 'org-mode)
  :ensure t)</code></pre>
<p>
This is enough to start querying <a href="https://openai.com/api/">OpenAI's API</a> from Emacs.
</p>
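<p>
Outside the transient menu, <code>gptel-request</code> is the programmatic entry point. A minimal sketch of a one-off query; the prompt and the message handling here are only illustrative:
</p>
<pre><code>;; A one-off query from Emacs Lisp. The prompt and the way the result
;; is reported are illustrative only.
(gptel-request
 "In one sentence, what is a quine?"
 :callback (lambda (response info)
             (if (stringp response)
                 (message "gptel: %s" response)
               (message "gptel error: %s" (plist-get info :status)))))</code></pre>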
<p>
To use Anthropic's API:
</p>
<pre><code>(gptel-make-anthropic "Anthropic"
  :key (password-store-get "anthropic/api/emacs")
  :stream t)</code></pre>
<p>
I prefer OpenRouter, to access models across providers:
</p>
<pre><code>(gptel-make-openai "OpenRouter"
  :endpoint "/api/v1/chat/completions"
  :host "openrouter.ai"
  :key (password-store-get "openrouter.ai/keys/emacs")
  :models '(anthropic/claude-opus-4.5
            anthropic/claude-sonnet-4.5
            anthropic/claude-3.5-sonnet
            cohere/command-a
            deepseek/deepseek-r1-0528
            deepseek/deepseek-v3.1-terminus:exacto
            google/gemini-3-pro-preview
            mistralai/devstral-medium
            mistralai/magistral-medium-2506:thinking
            moonshotai/kimi-k2-0905:exacto
            moonshotai/kimi-k2-thinking
            openai/gpt-5.1
            openai/gpt-5.1-codex
            openai/gpt-5-pro
            perplexity/sonar-deep-research
            qwen/qwen3-max
            qwen/qwen3-vl-235b-a22b-thinking
            qwen/qwen3-coder:exacto
            z-ai/glm-4.6:exacto)
  :stream t)</code></pre>
<p>
The choice of model depends on the task and its budget. Even where those two parameters are comparable, it is sometimes useful to switch models. One may have a blind spot where another has insight.
</p>
<p>
With <code>gptel</code>, it is easy to switch models mid-conversation, or to use the output from one model as context for another. For example, I've used <a href="https://www.perplexity.ai/">Perplexity's</a> <a href="https://openrouter.ai/perplexity/sonar-deep-research">Sonar Deep Research</a> to create briefings, then used another LLM to summarize findings or answer specific questions, augmented with web search.
</p>
</div>

<div class="wide64">
<h3>Tools</h3>
<p>
Tools augment a model's perception, memory, or capabilities. The <code>gptel-make-tool</code> function allows one to define tools for use by an LLM.
</p>
<p>
When making tools, one can leverage Emacs' existing functionality. For example, the <code>read_url</code> tool uses <code><a href="https://www.gnu.org/software/emacs/manual/html_node/url/Retrieving-URLs.html">url-retrieve-synchronously</a></code>, while <code>get_user_name</code> and <code>get_user_email</code> read <code><a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/User-Identification.html#index-user_002dfull_002dname">user-full-name</a></code> and <code><a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/User-Identification.html#index-user_002dmail_002daddress">user-mail-address</a></code>. <code>now</code>, used to retrieve the current date and time, uses <code><a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Time-Parsing.html#index-format_002dtime_002dstring">format-time-string</a></code>:
</p>
<pre><code>(gptel-make-tool
 :name "now"
 :category "time"
 :function (lambda () (format-time-string "%Y-%m-%d %H:%M:%S %Z"))
 :description "Retrieves the current local date, time, and timezone."
 :include t)</code></pre>
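<p>
The <code>get_user_name</code> and <code>get_user_email</code> tools mentioned above are equally thin wrappers; a sketch of how they might be defined (the category name is arbitrary):
</p>
<pre><code>;; Thin wrappers around Emacs' user-identification variables.
(gptel-make-tool
 :name "get_user_name"
 :category "user"
 :function (lambda () user-full-name)
 :description "Retrieve the user's full name."
 :include t)

(gptel-make-tool
 :name "get_user_email"
 :category "user"
 :function (lambda () user-mail-address)
 :description "Retrieve the user's email address."
 :include t)</code></pre>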
<p>
Similarly, if Emacs is <a href="https://www.gnu.org/software/emacs/manual/html_node/emacs/Sending-Mail.html">configured to send mail</a>, the tool definition is straightforward:
</p>
<pre><code>(gptel-make-tool
 :name "mail_send"
 :category "mail"
 :confirm t
 :description "Send an email with the user's Emacs mail configuration."
 :function
 (lambda (to subject body)
   (with-temp-buffer
     (insert "To: " to "\n"
             "From: " user-mail-address "\n"
             "Subject: " subject "\n\n"
             body)
     (sendmail-send-it)))
 :args
 '((:name "to"
    :type string
    :description "The recipient's email address.")
   (:name "subject"
    :type string
    :description "The subject of the email.")
   (:name "body"
    :type string
    :description "The body of the email text.")))</code></pre>
<p>
For more complex functionality, I prefer writing shell scripts, for several reasons:
<ul>
<li>The tool definitions are simpler. For example, my <code>qwen-image</code> script includes a large <code>JSON</code> object for the ComfyUI flow. I prefer to leave it outside my Emacs configuration. (A generic wrapper for this pattern is sketched below.)</li>
<li>Tools are accessible to LLMs that may not be running in the Emacs environment (agents, one-off scripts).</li>
<li>Fluency. LLMs seem better at writing bash (or Python, or Go) than Emacs Lisp, so it is easier to lean on this inherent expertise in developing the tools themselves.</li>
</ul>
</p>
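<p>
As a sketch of the shell-script pattern, a small helper (my own, not part of <code>gptel</code>) can register any script on <code>PATH</code> that takes a single string argument:
</p>
<pre><code>;; Hypothetical helper: register a gptel tool that shells out to a
;; script on PATH, passing a single quoted string argument.
;; Assumes lexical-binding, so the lambda closes over SCRIPT.
(defun my/gptel-make-script-tool (name script description arg-description)
  "Register a gptel tool NAME that runs SCRIPT with one argument."
  (gptel-make-tool
   :name name
   :category "script"
   :function (lambda (input)
               (shell-command-to-string
                (format "%s %s" script (shell-quote-argument input))))
   :description description
   :args (list (list :name "input"
                     :type 'string
                     :description arg-description))))</code></pre>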
<img class="img-center" src="/static/media/drawing-hands.jpg">
<div class="caption">
<p>M.C. Escher, <i>Drawing Hands</i> (1948)</p>
</div>
</div>

<div class="wide64">
<h4>Web Search</h4>
<p>
For example, for web search, I initially used the tool described in the <code>gptel</code> <a href="https://github.com/karthink/gptel/wiki/Tools-collection">wiki</a>:
</p>
<pre><code>(defvar brave-search-api-key (password-store-get "search.brave.com/api/emacs")
  "API key for accessing the Brave Search API.")

(defun brave-search-query (query)
  "Perform a web search using the Brave Search API with the given QUERY."
  (let ((url-request-method "GET")
        (url-request-extra-headers
         `(("X-Subscription-Token" . ,brave-search-api-key)))
        (url (format "https://api.search.brave.com/res/v1/web/search?q=%s"
                     (url-encode-url query))))
    (with-current-buffer (url-retrieve-synchronously url)
      (goto-char (point-min))
      (when (re-search-forward "^$" nil 'move)
        (let ((json-object-type 'hash-table))
          (json-parse-string
           (buffer-substring-no-properties (point) (point-max))))))))

(gptel-make-tool
 :name "brave_search"
 :category "web"
 :function #'brave-search-query
 :description "Perform a web search using the Brave Search API"
 :args (list '(:name "query"
               :type string
               :description "The search query string")))</code></pre>
<p>
However, there are times I want to inspect the search results, so I use this script instead:
</p>
<pre><code>#!/usr/bin/env bash

set -euo pipefail

API_URL="https://api.search.brave.com/res/v1/web/search"

check_deps() {
  for cmd in curl jq pass; do
    command -v "${cmd}" >/dev/null || {
      echo "missing: ${cmd}" >&2
      exit 1
    }
  done
}

perform_search() {
  local query="${1}"
  local res

  res=$(curl -s -G \
    -H "X-Subscription-Token: $(pass "search.brave.com/api/emacs")" \
    -H "Accept: application/json" \
    --data-urlencode "q=${query}" \
    "${API_URL}")
  if echo "${res}" | jq -e . >/dev/null 2>&1; then
    echo "${res}"
  else
    echo "error: failed to retrieve valid JSON res: ${res}" >&2
    exit 1
  fi
}

main() {
  check_deps

  if [ $# -eq 0 ]; then
    echo "Usage: ${0} <query>" >&2
    exit 1
  fi

  perform_search "${*}"
}

main "${@}"</code></pre>
<p>
The script can be called manually from a shell: <code>brave-search 'quine definition' | jq -C | less</code>.
</p>
<p>
The tool definition condenses to:
</p>
<pre><code>(gptel-make-tool
 :name "brave_search"
 :category "web"
 :function
 (lambda (query)
   (shell-command-to-string
    (format "brave-search %s"
            (shell-quote-argument query))))
 :description "Perform a web search using the Brave Search API"
 :args
 (list '(:name "query"
         :type string
         :description "The search query string")))</code></pre>
</div>
<div class="wide64">
<h4>Context</h4>
<p>
One limitation that I have run into with tools is context overflow — when retrieved data exceeds an LLM's context window.
</p>
<p>
For example, this tool lets an LLM read <code>man</code> pages, helping it correctly recall command flags:
</p>
<pre><code>(gptel-make-tool
 :name "man"
 :category "documentation"
 :function
 (lambda (page_name)
   (shell-command-to-string
    (concat "man --pager cat " page_name)))
 :description "Read a Unix manual page."
 :args
 '((:name "page_name"
    :type string
    :description
    "The name of the man page to read. Can optionally include a section number, for example: '2 read' or 'cat(1)'.")))</code></pre>

<p>
It broke when calling the <a href="https://www.gnu.org/software/units/">GNU units</a> <code>man</code> page, which exceeds 40,000 tokens on my system. This was unfortunate, since some conversions, like temperature, are unintuitive:
</p>

<pre><code>units 'tempC(100)' tempF</code></pre>
<p>
With <code>gptel</code>, one fallback is Emacs' built-in <code>man</code> functionality. The appropriate region can be selected with <code>-r</code> in the transient menu. In some cases, this is faster than a tool call.
</p>
</div>
<video autoplay controls loop muted disablepictureinpicture
       class="video" src="/static/media/llm-temp.mp4"
       type="video/mp4">
Your browser does not support video.
</video>
<div class="wide64">
<p>
I ran into a similar problem with the <code>read_url</code> tool (also found in the <a href="https://github.com/karthink/gptel/wiki/Tools-collection">gptel wiki</a>). It can break if the response is larger than the context window.
</p>
<pre><code>(gptel-make-tool
 :name "read_url"
 :category "web"
 :function
 (lambda (url)
   (with-current-buffer
       (url-retrieve-synchronously url)
     (goto-char (point-min)) (forward-paragraph)
     (let ((dom (libxml-parse-html-region
                 (point) (point-max))))
       (run-at-time 0 nil #'kill-buffer
                    (current-buffer))
       (with-temp-buffer
         (shr-insert-document dom)
         (buffer-substring-no-properties
          (point-min)
          (point-max))))))
 :description "Fetch and read the contents of a URL"
 :args (list '(:name "url"
               :type string
               :description "The URL to read")))</code></pre>
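<p>
A crude mitigation is to cap what a tool may return. The sketch below uses an arbitrary limit, which should be tuned to the model's context size; it could wrap the strings returned by the <code>man</code> and <code>read_url</code> tools above:
</p>
<pre><code>;; Sketch: cap tool output so an oversized page or man page cannot
;; overflow the model's context window. The limit is arbitrary.
(defvar my/gptel-tool-output-limit 20000
  "Rough character budget for text returned by a tool.")

(defun my/gptel-truncate-output (text)
  "Return TEXT, truncated to `my/gptel-tool-output-limit' characters."
  (if (<= (length text) my/gptel-tool-output-limit)
      text
    (concat (substring text 0 my/gptel-tool-output-limit)
            "\n[...output truncated...]")))</code></pre>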
<p>
When I have run into this problem, the issue was bloated functional content — JavaScript code and CSS. If the content is not dynamically generated, one can fall back to Emacs' web browser, <code><a href="https://www.gnu.org/software/emacs/manual/html_mono/eww.html">eww</a></code>. The buffer or selected regions can be added as context. A more sophisticated tool could help in these cases. Long term, I hope that LLMs will steer the web back towards readability, either by acting as an aggregator and filter, or as evolutionary pressure in favor of static content.
</p>
</div>

<div class="wide64">
<h4>Security</h4>
<p>
The <code><a href="https://github.com/karthink/gptel/wiki/Tools-collection#run_command">run_command</a></code> tool, also found in the <code>gptel</code> tool collection, enables shell command execution, and requires care. A compromised model could issue malicious commands, or a poorly prepared command could have unintended consequences. <code>gptel</code>'s <code>:confirm</code> key can be used to inspect and approve tool calls.
</p>

<pre><code>(gptel-make-tool
 :name "run_command"
 :category "command"
 :confirm t
 :function
 (lambda (command)
   (with-temp-message
       (format "Executing command: =%s=" command)
     (shell-command-to-string command)))
 :description
 "Execute a shell command; returns the output as a string."
 :args
 '((:name "command"
    :type string
    :description "The complete shell command to execute.")))</code></pre>

<p>
Inspection limits the LLM's ability to operate asynchronously, without human intervention. There are a few solutions to this problem, the easiest being to offer tools with more limited scope, as sketched below.
</p>
</div>

<video autoplay controls loop muted disablepictureinpicture
       class="video" src="/static/media/llm-inspect.mp4"
       type="video/mp4">
Your browser does not support video.
</video>
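<div class="wide64">
<p>
For example, a read-only directory listing covers a common request without granting arbitrary execution. A sketch; the name and scope are illustrative:
</p>
<pre><code>(gptel-make-tool
 :name "list_directory"
 :category "filesystem"
 :function
 (lambda (directory)
   ;; Read-only: return file names only, never contents.
   (mapconcat #'identity
              (directory-files (expand-file-name directory) nil nil t)
              "\n"))
 :description "List the file names in a directory, one per line."
 :args
 '((:name "directory"
    :type string
    :description "The directory to list, for example: '~/projects'.")))</code></pre>
</div>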
<div class="wide64">
<h3>Presets</h3>
<p>
With <code>gptel</code>'s transient menu, only a few keystrokes are needed to add, edit, or remove context, switch the model one wants to query, change the input and output, or edit the system message. Presets accelerate switching between settings, and are defined with <code>gptel-make-preset</code>.
</p>
<p>
For example, with <a href="https://huggingface.co/openai/gpt-oss-120b">GPT-OSS 120B</a> (one of OpenAI's <a href="https://openai.com/open-models/">open weights</a> models), a system prompt is necessary to minimize the use of tables and excessive text styling. A preset can load the appropriate settings:
</p>
<pre><code>(gptel-make-preset 'assistant/gpt
  :description "GPT-OSS general assistant."
  :backend "llama.cpp"
  :model 'gpt
  :include-reasoning nil
  :system
  "You are a large language model queried from Emacs. Your conversation with the user occurs in an org-mode buffer.

- Use org-mode syntax only (no Markdown).
- Use tables ONLY for tabular data with few columns and rows.
- Avoid extended text in table cells. If cells need paragraphs, use a list instead.
- Default to plain paragraphs and simple lists.
- Minimize styling. Use *bold* or /italic/ only where emphasis is essential. Use ~code~ for technical terms.
- If citing facts or resources, output references as org-mode links.
- Use code blocks for calculations or code examples.")</code></pre>
<p>
From the transient menu, this preset can be selected with two keystrokes: <code>@</code> and then <code>a</code>. Alternatively, the preset can be applied in the prompt itself, like so: <code>@assistant/gpt When is the solstice this year?</code>
</p>
</div>

<div class="wide64">
<h4>Memory</h4>
<p>
Presets can be used to implement read-only memory for an LLM. This preset uses <a href="https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking">Qwen3 VL 30B-A3B</a> with a <code>memory.org</code> file automatically included in the context:
</p>

<pre><code>(gptel-make-preset 'assistant/qwen
  :description "Qwen Emacs assistant."
  :backend "llama.cpp"
  :model 'qwen3_vl_30b-a3b
  :context '("~/memory.org"))</code></pre>

<p>
The file can hold any information that should always be included as context. One could also grant LLMs the ability to append to <code>memory.org</code>, though I am skeptical that they would do so judiciously.
</p>
</div>
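<div class="wide64">
<p>
Such a tool would be small; a sketch of what append access might look like, with <code>:confirm</code> keeping a human in the loop:
</p>
<pre><code>;; Sketch of an append-only memory tool. The name and note format
;; are illustrative; :confirm requires approval before each write.
(gptel-make-tool
 :name "memory_append"
 :category "memory"
 :confirm t
 :function
 (lambda (note)
   (with-temp-buffer
     (insert "\n- " note "\n")
     (append-to-file (point-min) (point-max)
                     (expand-file-name "~/memory.org")))
   "Appended to memory.org.")
 :description "Append a short note to the user's memory.org file."
 :args
 '((:name "note"
    :type string
    :description "A single short note to remember.")))</code></pre>
</div>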
<div class="wide64">
<h2>Local LLMs</h2>
<p>
Running LLMs on one's own devices offers some advantages over third-party providers:
<ul>
<li>Redundancy: they work offline, even if providers are experiencing an outage.</li>
<li>Privacy: queries and data remain on the device.</li>
<li>Control: you know exactly which model is running, with what settings, at what quantization.</li>
</ul>
</p>
<p>
The main trade-off is intelligence, though for many purposes, the gap is closing fast. Local models excel at summarizing data, language translation, image and PDF extraction, and simple research tasks. I rely on hosted models primarily for complex coding tasks, or when a larger effective context is required.
</p>
<h3>llama.cpp</h3>
<p>
<a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a> makes it easy to run models locally:
</p>
<pre><code>git clone https://github.com/ggml-org/llama.cpp.git

cd llama.cpp

cmake -B build

cmake --build build --config Release

mv build/bin/llama-server ~/.local/bin/ # Or elsewhere in PATH.

llama-server -hf unsloth/Qwen3-4B-GGUF:q8_0</code></pre>
<p>
This will build <code>llama.cpp</code> with support for CPU-based inference, move <code>llama-server</code> into <code>~/.local/bin/</code>, and then download and run <a href="https://unsloth.ai/">Unsloth</a>'s <code>Q8</code> quantization of <a href="https://huggingface.co/Qwen/Qwen3-4B">Qwen3 4B</a>. The <code>llama.cpp</code> <a href="https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md">documentation</a> explains how to build for GPUs and other hardware — not much more work than the default build.
</p>
<p><code>llama-server</code> offers a web interface, available at port 8080 by default.</p>
</div>

<video autoplay controls loop muted disablepictureinpicture
       class="video" src="/static/media/llm-ls.mp4"
       type="video/mp4">
Your browser does not support video.
</video>

<div class="wide64">
<h3>Weights</h3>
<p>
Part of the art of using LLMs is selecting an appropriate model. Some factors to consider are available hardware, intended use (task, language), and desired pricing (input and output costs). Some models offer specialized capabilities — <a href="https://ai.google.dev/gemma/docs/core">Gemma3</a> and <a href="https://github.com/QwenLM/Qwen3-VL">Qwen3-VL</a> offer multimodal input, <a href="https://deepmind.google/models/gemma/medgemma/">MedGemma</a> specializes in medical knowledge, and <a href="https://mistral.ai/">Mistral</a>'s <a href="https://mistral.ai/news/devstral">Devstral</a> focuses on agentic use.
</p>
<p>
For local use, hardware tends to be the main limiter. One has to fit the model into available memory, and consider the acceptable performance for one's use case. A rough guideline is to use the smallest model or quantization that can handle the task at hand. Or, from the opposite direction, to look for the largest model that fits into available memory. The rule of thumb is that a <code>Q8_0</code> quantization uses about as much memory as there are parameters, so an 8 billion parameter model will use about 8 GB of RAM or VRAM. A <code>Q4_0</code> quant would use half that — 4 GB — while at 16-bit, 16 GB.
</p>
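<p>
The arithmetic is easy to keep at hand in Emacs; a rough sketch that counts weights only and ignores the KV cache, context length, and runtime overhead:
</p>
<pre><code>;; Rough estimate of weight memory alone, per the rule of thumb above.
;; Treat the result as a floor, not a guarantee of fit.
(defun my/llm-weight-gb (params-billions bits-per-weight)
  "Estimate weight memory in GB for PARAMS-BILLIONS at BITS-PER-WEIGHT."
  (/ (* params-billions bits-per-weight) 8.0))

;; (my/llm-weight-gb 8 8)  ;; => 8.0  (Q8_0)
;; (my/llm-weight-gb 8 4)  ;; => 4.0  (Q4_0)
;; (my/llm-weight-gb 8 16) ;; => 16.0 (16-bit)</code></pre>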
<p>
My workstation, laptop, and mobile (<code>llama.cpp</code> can be used from <code><a href="https://termux.dev/en/">termux</a></code>) all run different classes of weights. On my mobile device, I have about 12 GB of RAM, but background utilization is already around 8 GB. So, when necessary, I use 4B models at <code>Q8_0</code> or less: Gemma3, Qwen3-VL, and MedGemma. If a laptop has 16 GB of RAM with 2 GB in use, 8B models might run well enough. The workstation, which has a GPU, can run larger models, with longer context, faster. There are other tricks one can use — <a href="https://huggingface.co/docs/text-generation-inference/en/conceptual/flash_attention">flash attention</a>, <a href="https://research.google/blog/looking-back-at-speculative-decoding/">speculative decoding</a>, MoE offloading — to optimize performance across different hardware configurations.
</p>
</div>

<div class="wide64">
<h3>llama-swap</h3>
<p>
One current limitation of <code>llama.cpp</code> is that unless you load multiple models at once, switching models requires manually starting a new instance of <code>llama-server</code>. To swap models on demand, <code><a href="https://github.com/mostlygeek/llama-swap">llama-swap</a></code> can be used.
</p>
<p>
<code>llama-swap</code> uses a <code>YAML</code> configuration file, which is <a href="https://github.com/mostlygeek/llama-swap/wiki/Configuration">well documented</a>. I use something like the following:
</p>
<pre><code>logLevel: debug

macros:
  "models": "/home/llama-swap/models"

models:
  gemma3:
    cmd: |
      llama-server
      --ctx-size 0
      --gpu-layers 888
      --jinja
      --min-p 0.0
      --model ${models}/gemma-3-27b-it-ud-q8_k_xl.gguf
      --mmproj ${models}/mmproj-gemma3-27b-bf16.gguf
      --port ${PORT}
      --repeat-penalty 1.0
      --temp 1.0
      --top-k 64
      --top-p 0.95
    ttl: 900
    name: "gemma3_27b"
  gpt:
    cmd: |
      llama-server
      --chat-template-kwargs '{"reasoning_effort": "high"}'
      --ctx-size 0
      --gpu-layers 888
      --jinja
      --model ${models}/gpt-oss-120b-f16.gguf
      --port ${PORT}
      --temp 1.0
      --top-k 0
      --top-p 1.0
    ttl: 900
    name: "gpt-oss_120b"
  qwen3_vl_30b-a3b:
    cmd: |
      llama-server
      --ctx-size 131072
      --gpu-layers 888
      --jinja
      --min-p 0
      --model ${models}/qwen3-vl-30b-a3b-thinking-ud-q8_k_xl.gguf
      --mmproj ${models}/mmproj-qwen3-vl-30ba3b-bf16.gguf
      --port ${PORT}
      --temp 0.6
      --top-k 20
      --top-p 0.95
    ttl: 900
    name: "qwen3_vl_30b-a3b-thinking"</code></pre>
</div>
<div class="wide64">
<h3>nginx</h3>
<p>
Since my workstation has a GPU and can be accessed on the local network or via <a href="https://www.wireguard.com/">WireGuard</a> from other devices, I use <code><a href="https://nginx.org/">nginx</a></code> as a reverse proxy in front of <code>llama-swap</code>, with certificates generated by <code><a href="https://certbot.eff.org/">certbot</a></code>. For streaming LLM responses, <code>proxy_buffering off;</code> and <code>proxy_cache off;</code> are essential settings.
</p>

<pre><code>user http;
worker_processes 1;
worker_cpu_affinity auto;

events {
    worker_connections 1024;
}

http {
    charset utf-8;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    server_tokens off;
    types_hash_max_size 4096;
    client_max_body_size 32M;

    # MIME
    include mime.types;
    default_type application/octet-stream;

    # logging
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log warn;

    include /etc/nginx/conf.d/*.conf;
}</code></pre>

<p>Then, for <code>/etc/nginx/conf.d/llama-swap.conf</code>:</p>

<pre><code>server {
    listen 80;
    server_name llm.dwrz.net;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl;
    http2 on;
    server_name llm.dwrz.net;

    ssl_certificate /etc/letsencrypt/live/llm.dwrz.net/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm.dwrz.net/privkey.pem;

    location / {
        proxy_buffering off;
        proxy_cache off;
        proxy_pass http://localhost:11434;
        proxy_read_timeout 3600s;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}</code></pre>
</div>
<div class="wide64">
<h3>Emacs Configuration</h3>

<p>
<code>llama-server</code> offers an <a href="https://platform.openai.com/docs/api-reference/introduction">OpenAI</a>-compatible API.
<code>gptel</code> can be configured to use local models with something like the following:
</p>

<pre><code>(gptel-make-openai "llama.cpp"
  :stream t
  :protocol "http"
  :host "localhost"
  :models
  '((gemma3
     :capabilities (media tool json url)
     :mime-types ("image/jpeg"
                  "image/png"
                  "image/gif"
                  "image/webp"))
    gpt
    (medgemma_27b
     :capabilities (media tool json url)
     :mime-types ("image/jpeg"
                  "image/png"
                  "image/gif"
                  "image/webp"))
    (qwen3_vl_30b-a3b
     :capabilities (media tool json url)
     :mime-types ("image/jpeg"
                  "image/png"
                  "image/gif"
                  "image/webp"))
    (qwen3_vl_32b
     :capabilities (media tool json url)
     :mime-types ("image/jpeg"
                  "image/png"
                  "image/gif"
                  "image/webp"))))</code></pre>
</div>
<div class="wide64">
<h2>Techniques</h2>
<p>
Having covered the setup and configuration, here are some practical ways I use Emacs with LLMs, demonstrated with examples:
</p>
</div>
<div class="wide64">
<h3>Simple Q&A</h3>
<p>
With the <code>gptel</code> transient menu, press <code>m</code> to prompt from the minibuffer, <code>e</code> to output the answer to the echo area, and then <code>Enter</code> to submit the prompt.
</p>
</div>

<video autoplay controls loop muted disablepictureinpicture
       class="video" src="/static/media/llm-qa.mp4"
       type="video/mp4">
Your browser does not support video.
</video>

<div class="wide64">
<h3>Brief Conversations</h3>

<p>
For brief multi-turn conversations that require no persistence, <code>gptel</code> can be used in the <code>*scratch*</code> buffer. Context can be added via the transient menu (<code>-b</code>, <code>-f</code>, or <code>-r</code>) as necessary. The conversation is not persisted unless the buffer is saved.
</p>

<h3>Image-to-Text</h3>
<p>
With multimodal LLMs like Gemma3 and Qwen3-VL, one can extract text and tables from images.
</p>
</div>

<video autoplay controls loop muted disablepictureinpicture
       class="video" src="/static/media/llm-itt.mp4"
       type="video/mp4">
Your browser does not support video.
</video>

<div class="wide64">
<h3>Text-to-Image</h3>
<p>
My primary use case is to revisit themes from some of my dreams. Here, a local LLM retrieves a URL, reads its contents, and then generates an image with ComfyUI:
</p>
</div>
<video autoplay controls loop muted disablepictureinpicture
       class="video" src="/static/media/llm-image.mp4"
       type="video/mp4">
Your browser does not support video.
</video>

<div class="wide64">
<p>
The result:
<img class="img-center" src="/static/media/comfy-ui-dream.png">
</p>
</div>

<div class="wide64">
<h3>Research</h3>
<p>
If I know I will need to reference a topic later, I usually start out with an <code><a href="https://orgmode.org/">org-mode</a></code> file. In this case, I tend to use links to construct context, something like this:
<img class="img-center" src="/static/media/llm-links.png">
</p>
</div>

<div class="wide64">
<h3>Rewrites</h3>
<p>
Although I don't use it very often, <code>gptel</code> comes with rewrite functionality, activated when the transient menu is called on a selected region. It can be used on both text and code, and the output can be <code>diff</code>ed, iterated on, accepted, or rejected. Additionally, it can serve as a kind of autocomplete, by having an LLM implement the skeleton of a function or code block.
</p>
</div>

<div class="wide64">
<h3>Translation</h3>
<p>
For small or unimportant text, Google Translate via the command line with <code><a href="https://github.com/soimort/translate-shell">translate-shell</a></code> works well enough. Otherwise, I find the translation output from local LLMs is typically more sensitive to context.
</p>
</div>
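<div class="wide64">
<p>
A preset along the lines of those above keeps a translation configuration a couple of keystrokes away; the name, model, and system prompt here are only illustrative:
</p>
<pre><code>;; Illustrative translation preset; reuses the local llama.cpp backend.
(gptel-make-preset 'translate
  :description "Translate the prompt or selection into English."
  :backend "llama.cpp"
  :model 'qwen3_vl_30b-a3b
  :system
  "You are a careful translator. Translate the user's text into English.
Preserve tone and register, keep proper nouns unchanged, and note any
ambiguities briefly after the translation.")</code></pre>
</div>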
<video autoplay controls loop muted disablepictureinpicture
       class="video" src="/static/media/llm-translate.mp4"
       type="video/mp4">
Your browser does not support video.
</video>

<div class="wide64">
<h3>Code</h3>
<p>
My experience using LLMs for code has been mixed. For scripts and small programs, iterating in a single conversation works well. However, with larger codebases, I have not found that LLMs can contribute meaningfully and reliably. This used to be an area of relative strength for hosted models, but I surmise aggressive quantization has begun to reduce their effectiveness.
</p>

<p>
So far, I have had limited success with agents. My experience has been that they burn through tokens to understand context, but still manage to miss important nuance. This experience has made me hesitant to add tool support for file operations. I am actively exploring some techniques on this front.
</p>

<p>
For now, I have come to distrust the initial output from any model. Instead, I provide context through <code>org-mode</code> links in project-specific files. I have LLMs walk through potential changes, which I review and implement by hand. Generally, this approach saves time, but often, I still work faster on my own.
</p>
</div>

<div class="wide64">
<h2>Reflections</h2>
<blockquote>
<p>
<i>
The question of whether a computer can think is no more interesting than
the question of whether a submarine can swim.
</i>
</p>

<p>
Edsger Dijkstra
</p>
</blockquote>

<p>
Despite encountering frustrations with LLM use, it is hard to shake
the feeling of experiencing a leap in capability. There is something
magical to the technology, especially when run locally — the coil whine of
the GPU evoking the spirit of Rodin's
<a href="https://en.wikipedia.org/wiki/The_Thinker"><i>Thinker</i></a>.
Learning <a href="https://www.3blue1brown.com/topics/neural-networks">how
LLMs work</a> has offered <a href="https://arxiv.org/abs/2007.09560">another
lens</a> through which to view the world.
</p>

<p>
My hope is that time will distribute and democratize the technology, in terms of hardware (for local use) and software (system integration). For most users, the barrier to entry for Emacs is high.
Other frontends could unlock comparable power and flexibility with support for:
<ul>
<li>Assisting the user in developing custom tools</li>
<li>Notebooks featuring executable code blocks</li>
<li>Links for local and remote content, including other conversations</li>
<li>Switching models and providers at any point</li>
<li>Mail and task integration</li>
<li>Offline operation with local models</li>
<li>Remote access — Emacs can be accessed via <code><a href="https://www.openssh.org/">SSH</a></code>, <code>gptel</code> files via <code><a href="https://www.gnu.org/software/tramp/">TRAMP</a></code></li>
</ul>
</p>

<p>
There are many topics of concern and discussion around LLMs. From my work with them so far, I am more anxious about some than others. Local inference alone reveals how much energy these models can require. On the other hand, the limitations of the technology leave me extremely skeptical of imminent superintelligence. But what we have now, limitations included, is useful — and has potential.
</p>
</div>