commit 87c49a7cd23f563cd55381d34bd1c8c4a1027509
parent 247830430a37871a15fbc53f675e9cc4ef442d42
Author: dwrz <dwrz@dwrz.net>
Date: Sat, 29 Nov 2025 01:47:16 +0000
Fix Emacs LLM entry
Diffstat:
1 file changed, 50 insertions(+), 60 deletions(-)
diff --git a/cmd/web/site/entry/static/2025-12-01/2025-12-01.html b/cmd/web/site/entry/static/2025-12-01/2025-12-01.html
@@ -1,9 +1,12 @@
<div class="wide64">
<p>
- This video shows a <a href="https://en.wikipedia.org/wiki/Large_language_model">large language model</a> (LLM), running on my workstation, using <a href="https://www.gnu.org/software/emacs/">Emacs</a> to determine my location, retrieve weather data, and email me the results:
+ I first used <a href="https://www.gnu.org/software/emacs/">Emacs</a> as a text editor 20 years ago. For over a decade, I have used it daily — for writing and coding, task and finance management, email, as a calculator, and to interact with local and remote hosts. I continue to discover new functionality and techniques, and was surprised to see how this 50-year-old program has adapted to the frontier of technology.
+ </p>
+ <p>
+ This video shows a <a href="https://en.wikipedia.org/wiki/Large_language_model">large language model</a> (LLM), running on my workstation, using Emacs to determine my location, retrieve weather data, and email me the results. By "<a href="https://arxiv.org/abs/2201.11903">thinking</a>", the LLM determines how to chain available tools to achieve the desired result.
</p>
</div>
-<video autoplay loop muted disablepictureinpicture
+<video autoplay controls loop muted disablepictureinpicture
class="video video-wide" src="/static/media/llm.mp4"
type="video/mp4">
Your browser does not support video.
@@ -12,10 +15,9 @@
<p>
With <a href="https://karthinks.com">karthink</a>'s <a href="https://github.com/karthink/gptel">gptel</a> package and some custom code, Emacs is capable of:
</p>
-
<ul>
<li>Querying models from hosted providers (<a href="https://www.anthropic.com/">Anthropic</a>, <a href="https://openai.com/">OpenAI</a>, <a href="https://openrouter.ai/">OpenRouter</a>), or local models (<a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a>, <a href="https://ollama.com/">ollama</a>).</li>
- <li>Switching rapidly between models and configurations, with only a few keystrokes.</li>
+ <li>Switching between models and configurations with only a few keystrokes.</li>
<li>Saving conversations to the local filesystem, and using them as context for other conversations.</li>
<li>Including files, buffers, and terminals as context for queries.</li>
<li>Searching the web and reading web pages.</li>
@@ -44,8 +46,7 @@
gptel-curl--common-args
'("--disable" "--location" "--silent" "--compressed" "-XPOST" "-D-")
gptel-default-mode 'org-mode)
- :ensure t)
- </code></pre>
+ :ensure t)</code></pre>
<p>
This is enough to start querying <a href="https://openai.com/api/">OpenAI's API</a> from Emacs.
</p>
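<p>
As a usage sketch (the keybindings below are my own illustration, not part of the configuration above), a chat buffer, a send command, and the transient menu are the main entry points:
</p>
<pre><code>;; gptel, gptel-send, and gptel-menu are the commands provided by the package.
(global-set-key (kbd "C-c g g") #'gptel)      ; open a dedicated chat buffer
(global-set-key (kbd "C-c g s") #'gptel-send) ; send the region, or the buffer up to point
(global-set-key (kbd "C-c g m") #'gptel-menu) ; transient menu: model, tools, context</code></pre>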
@@ -54,8 +55,7 @@
</p>
<pre><code>(gptel-make-anthropic "Anthropic"
:key (password-store-get "anthropic/api/emacs")
- :stream t)
- </code></pre>
+ :stream t)</code></pre>
<p>
I prefer OpenRouter, to access models across providers:
</p>
@@ -82,8 +82,7 @@
qwen/qwen3-vl-235b-a22b-thinking
qwen/qwen3-coder:exacto
z-ai/glm-4.6:exacto)
- :stream t)
- </code></pre>
+ :stream t)</code></pre>
<p>
The choice of model depends on the task and its budget. Even where those two parameters are comparable, it is sometimes useful to switch models. One may have a blind spot, where another will have insight.
</p>
@@ -105,8 +104,7 @@
:category "time"
:function (lambda () (format-time-string "%Y-%m-%d %H:%M:%S %Z"))
:description "Retrieves the current local date, time, and timezone."
- :include t)
- </code></pre>
+ :include t)</code></pre>
<p>
Similarly, if Emacs is <a href="https://www.gnu.org/software/emacs/manual/html_node/emacs/Sending-Mail.html">configured to send mail</a>, the tool definition is straightforward:
</p>
@@ -132,12 +130,11 @@
:description "The subject of the email.")
(:name "body"
:type string
- :description "The body of the email text.")))
- </code></pre>
+ :description "The body of the email text.")))</code></pre>
<p>
For more complex functionality, I prefer writing shell scripts, for several reasons:
<ul>
- <li>The tool definitions are simpler. For example, my <code>qwen-image</code> script includes a large JSON for the ComfyUI flow. I prefer to leave it outside my Emacs configuration.</li>
+ <li>The tool definitions are simpler. For example, my <code>qwen-image</code> script includes a large <code>JSON</code> object for the ComfyUI flow. I prefer to leave it outside my Emacs configuration.</li>
<li>Tools are accessible to LLMs that may not be running in the Emacs environment (agents, one-off scripts).</li>
  <li>Fluency: LLMs seem better at writing bash (or Python, or Go) than Emacs Lisp, so it is easier to lean on this expertise when developing the tools themselves.</li>
</ul>
@@ -177,8 +174,7 @@
:description "Perform a web search using the Brave Search API"
:args (list '(:name "query"
:type string
- :description "The search query string")))
- </code></pre>
+ :description "The search query string")))</code></pre>
<p>
However, there are times I want to inspect the search results. I use this script:
</p>
@@ -223,10 +219,9 @@ main() {
fi
perform_search "${*}"
- }
+}
- main "${@}"
- </code></pre>
+main "${@}"</code></pre>
<p>
The script can be called manually from a shell: <code>brave-search 'quine definition' | jq -C | less</code>.
</p>
@@ -245,8 +240,7 @@ main() {
:args
(list '(:name "query"
:type string
- :description "The search query string")))
- </code></pre>
+ :description "The search query string")))</code></pre>
</div>
<div class="wide64">
<h4>Context</h4>
@@ -268,20 +262,18 @@ main() {
'((:name "page_name"
:type string
:description
- "The name of the man page to read. Can optionally include a section number, for example: '2 read' or 'cat(1)'.")))
- </code></pre>
+ "The name of the man page to read. Can optionally include a section number, for example: '2 read' or 'cat(1)'.")))</code></pre>
<p>
It broke when calling the <a href="https://www.gnu.org/software/units/">GNU units</a> <code>man</code> page, which exceeds 40,000 tokens on my system. This was unfortunate, since some conversions, like temperature, are unintuitive:
</p>
- <pre><code>units 'tempC(100)' tempF
- </code></pre>
+ <pre><code>units 'tempC(100)' tempF</code></pre>
<p>
With <code>gptel</code>, one fallback is Emacs' built-in <code>man</code> functionality. The appropriate region can be selected with <code>-r</code> in the transient menu. In some cases, this is faster than a tool call.
</p>
</div>
-<video autoplay loop muted disablepictureinpicture
+<video autoplay controls loop muted disablepictureinpicture
class="video" src="/static/media/llm-temp.mp4"
type="video/mp4">
Your browser does not support video.
@@ -310,8 +302,7 @@ main() {
:description "Fetch and read the contents of a URL"
:args (list '(:name "url"
:type string
- :description "The URL to read")))
- </code></pre>
+ :description "The URL to read")))</code></pre>
<p>
When I have run into this problem, the issue was bloated functional content — JavaScript and CSS. If the content is not dynamically generated, one can fall back to Emacs' web browser, <code><a href="https://www.gnu.org/software/emacs/manual/html_mono/eww.html">eww</a></code>. The buffer or selected regions can be added as context. A more sophisticated tool could help in these cases. In the long term, I hope that LLMs will steer the web back towards readability, either by acting as an aggregator and filter, or as evolutionary pressure in favor of static content.
</p>
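<p>
One possible sketch of such a tool (the helper name is hypothetical, and it assumes an Emacs built with libxml support) strips <code>script</code> and <code>style</code> elements before handing the text to the model:
</p>
<pre><code>(require 'dom) ; dom-by-tag, dom-remove-node, dom-texts

(defun my/read-url-text (url)
  "Fetch URL and return its text, without script or style elements."
  (with-current-buffer (url-retrieve-synchronously url t t 30)
    (goto-char (point-min))
    (re-search-forward "\n\n" nil t) ; skip the HTTP response headers
    (let ((dom (libxml-parse-html-region (point) (point-max))))
      (dolist (tag '(script style))
        (dolist (node (dom-by-tag dom tag))
          (dom-remove-node dom node)))
      (dom-texts dom " "))))

(gptel-make-tool
 :name "read_url_text"
 :category "web"
 :function #'my/read-url-text
 :description "Fetch a URL and return only its readable text content."
 :args (list '(:name "url"
               :type string
               :description "The URL to read")))</code></pre>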
@@ -320,7 +311,7 @@ main() {
<div class="wide64">
<h4>Security</h4>
<p>
- The <code><a href="https://github.com/karthink/gptel/wiki/Tools-collection#run_command">run_command</a></code> tool, also found in the <code>gptel</code> tool collection, enables shell command execution, and requires careful consideration. A compromised model could issue malicious commands, or a poorly prepared command could have unintended consequences. <code>gptel</code>'s <code>:confirm</code> key can be used to inspect and approve tool calls.
+ The <code><a href="https://github.com/karthink/gptel/wiki/Tools-collection#run_command">run_command</a></code> tool, also found in the <code>gptel</code> tool collection, enables shell command execution, and requires care. A compromised model could issue malicious commands, or a poorly prepared command could have unintended consequences. <code>gptel</code>'s <code>:confirm</code> key can be used to inspect and approve tool calls.
</p>
<pre><code>(gptel-make-tool
@@ -337,15 +328,14 @@ main() {
:args
'((:name "command"
:type string
- :description "The complete shell command to execute.")))
- </code></pre>
+ :description "The complete shell command to execute.")))</code></pre>
<p>
Inspection limits the LLM's ability to operate asynchronously, without human intervention. There are a few solutions to this problem, the easiest being to offer tools with more limited scope.
</p>
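<p>
For instance, a read-only tool like the following (a minimal sketch, with a name of my own choosing) can run without confirmation, since the worst it can do is produce a directory listing:
</p>
<pre><code>(gptel-make-tool
 :name "list_directory"
 :category "filesystem"
 :function (lambda (directory)
             (mapconcat #'identity (directory-files directory) "\n"))
 :description "List the files in a directory. Read-only; makes no changes."
 :args (list '(:name "directory"
               :type string
               :description "The absolute path of the directory to list.")))</code></pre>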
</div>
-<video autoplay loop muted disablepictureinpicture
+<video autoplay controls loop muted disablepictureinpicture
class="video" src="/static/media/llm-inspect.mp4"
type="video/mp4">
Your browser does not support video.
@@ -373,8 +363,7 @@ main() {
- Default to plain paragraphs and simple lists.
- Minimize styling. Use *bold* or /italic/ only where emphasis is essential. Use ~code~ for technical terms.
- If citing facts or resources, output references as org-mode links.
-- Use code blocks for calculations or code examples.")
- </code></pre>
+- Use code blocks for calculations or code examples.")</code></pre>
<p>
From the transient menu, this preset can be selected with two keystrokes: <code>@</code> and then <code>a</code>.
</p>
@@ -390,8 +379,7 @@ main() {
:description "Qwen Emacs assistant."
:backend "llama.cpp"
:model 'qwen3_vl_30b-a3b
- :context '("~/memory.org"))
- </code></pre>
+ :context '("~/memory.org"))</code></pre>
<p>
The file can hold any information that should always be included as context. One could also grant LLMs the ability to append to <code>memory.org</code>, though I am skeptical that they would do so judiciously.
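A sketch of what that might look like, with a hypothetical tool name and <code>:confirm</code> set so that each append is reviewed before it is written:
<pre><code>(gptel-make-tool
 :name "remember"
 :category "memory"
 :confirm t ; ask before every append
 :function (lambda (note)
             (with-temp-buffer
               (insert "\n" note "\n")
               (append-to-file (point-min) (point-max)
                               (expand-file-name "~/memory.org")))
             "Saved to memory.org.")
 :description "Append a note to the user's memory.org file."
 :args (list '(:name "note"
               :type string
               :description "The note to remember, as org-mode text.")))</code></pre>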
@@ -425,15 +413,14 @@ cmake --build build --config Release
mv build/bin/llama-server ~/.local/bin/ # Or elsewhere in PATH.
-llama-server -hf unsloth/Qwen3-4B-GGUF:q8_0
- </code></pre>
+llama-server -hf unsloth/Qwen3-4B-GGUF:q8_0</code></pre>
<p>
This will build <code>llama.cpp</code> with support for CPU-based inference, move <code>llama-server</code> into <code>~/.local/bin/</code>, and then download and run <a href="https://unsloth.ai/">Unsloth</a>'s <code>Q8</code> quantization of the <a href="https://huggingface.co/Qwen/Qwen3-4B">Qwen3 4B</a> model. The <code>llama.cpp</code> <a href="https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md">documentation</a> explains how to build for GPUs and other hardware — not much more work than the default build.
</p>
<p><code>llama-server</code> offers a web interface, available at port 8080 by default.</p>
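<p>
To query it from Emacs rather than the browser, the server can also be registered as an OpenAI-compatible <code>gptel</code> backend (a minimal sketch; the model name is illustrative and depends on what <code>llama-server</code> is serving):
</p>
<pre><code>(gptel-make-openai "llama.cpp"
  :host "localhost:8080"
  :protocol "http"
  :stream t
  :models '(qwen3-4b))</code></pre>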
</div>
-<video autoplay loop muted disablepictureinpicture
+<video autoplay controls loop muted disablepictureinpicture
class="video" src="/static/media/llm-ls.mp4"
type="video/mp4">
Your browser does not support video.
@@ -448,7 +435,7 @@ llama-server -hf unsloth/Qwen3-4B-GGUF:q8_0
For local use, hardware tends to be the main limiter. One has to fit the model into available memory, and consider the acceptable performance for one's use case. A rough guideline is to use the smallest model or quantization that can handle the task at hand. Or, from the opposite direction, to look for the largest model that fits into available memory. The rule of thumb is that a <code>Q8_0</code> quantization uses about one gigabyte of memory per billion parameters, so an 8-billion-parameter model will use about 8 GB of RAM or VRAM. A <code>Q4_0</code> quant would use half that — 4 GB — while at 16-bit, it would use 16 GB.
</p>
<p>
- My workstation, laptop, and mobile (<code>llama.cpp</code> can be used from <code><a href="https://termux.dev/en/">termux</a></code>) all run different classes of weights. On my mobile device, I have about 12GB of RAM, but background utilization is already around 8GB. So, when necessary, I use 4B models at <code>Q8_0</code> or less: Gemma3, Qwen3-VL, and Medgemma. If a laptop has 16GB of RAM with 2GB in use, 8B models might run well enough. The workstation, which has a GPU, can run larger models, faster. There are other tricks one can use — <a href="https://huggingface.co/docs/text-generation-inference/en/conceptual/flash_attention">flash attention</a>, <a href="https://research.google/blog/looking-back-at-speculative-decoding/">speculative decoding</a>, MoE offloading — to optimize performance across different hardware configurations.
+ My workstation, laptop, and mobile (<code>llama.cpp</code> can be used from <code><a href="https://termux.dev/en/">termux</a></code>) all run different classes of weights. On my mobile device, I have about 12 GB of RAM, but background utilization is already around 8 GB. So, when necessary, I use 4B models at <code>Q8_0</code> or less: Gemma3, Qwen3-VL, and Medgemma. If a laptop has 16 GB of RAM with 2 GB in use, 8B models might run well enough. The workstation, which has a GPU, can run larger models, with longer context, faster. There are other tricks one can use — <a href="https://huggingface.co/docs/text-generation-inference/en/conceptual/flash_attention">flash attention</a>, <a href="https://research.google/blog/looking-back-at-speculative-decoding/">speculative decoding</a>, MoE offloading — to optimize performance across different hardware configurations.
</p>
</div>
@@ -458,7 +445,7 @@ llama-server -hf unsloth/Qwen3-4B-GGUF:q8_0
One current limitation of <code>llama.cpp</code> is that unless you load multiple models at once, switching models requires manually starting a new instance of <code>llama-server</code>. To swap models on demand, <code><a href="https://github.com/mostlygeek/llama-swap">llama-swap</a></code> can be used.
</p>
<p>
- <code>llama-swap</code> uses a YAML configuration file, which is <a href="https://github.com/mostlygeek/llama-swap/wiki/Configuration">well documented</a>. I use something like the following:
+ <code>llama-swap</code> uses a <code>YAML</code> configuration file, which is <a href="https://github.com/mostlygeek/llama-swap/wiki/Configuration">well documented</a>. I use something like the following:
</p>
<pre><code>logLevel: debug
@@ -510,8 +497,7 @@ models:
--top-k 20
--top-p 0.95
ttl: 900
- name: "qwen3_vl_30b-a3b-thinking"
- </code></pre>
+ name: "qwen3_vl_30b-a3b-thinking"</code></pre>
</div>
<div class="wide64">
<h3>nginx</h3>
@@ -545,8 +531,7 @@ http {
error_log /var/log/nginx/error.log warn;
include /etc/nginx/conf.d/*.conf;
-}
- </code></pre>
+}</code></pre>
<p>Then, for <code>/etc/nginx/conf.d/llama-swap.conf</code>:</p>
@@ -574,8 +559,7 @@ server {
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
-}
- </code></pre>
+}</code></pre>
</div>
<div class="wide64">
<h3>Emacs Configuration</h3>
@@ -613,8 +597,7 @@ server {
:mime-types ("image/jpeg"
"image/png"
"image/gif"
- "image/webp"))))
- </code></pre>
+ "image/webp"))))</code></pre>
</div>
<div class="wide64">
<h2>Techniques</h2>
@@ -629,7 +612,7 @@ server {
</p>
</div>
-<video autoplay loop muted disablepictureinpicture
+<video autoplay controls loop muted disablepictureinpicture
class="video" src="/static/media/llm-qa.mp4"
type="video/mp4">
Your browser does not support video.
@@ -648,7 +631,7 @@ server {
</p>
</div>
-<video autoplay loop muted disablepictureinpicture
+<video autoplay controls loop muted disablepictureinpicture
class="video" src="/static/media/llm-itt.mp4"
type="video/mp4">
Your browser does not support video.
@@ -660,7 +643,7 @@ server {
My primary use case is to revisit themes from some of my dreams. Here, a local LLM retrieves a URL, reads its contents, and then generates an image with ComfyUI:
</p>
</div>
-<video autoplay loop muted disablepictureinpicture
+<video autoplay controls loop muted disablepictureinpicture
class="video" src="/static/media/llm-image.mp4"
type="video/mp4">
Your browser does not support video.
@@ -676,7 +659,7 @@ server {
<div class="wide64">
<h3>Research</h3>
<p>
- If I know I well need to reference a topc later, I usually start out with an <code><a href="https://orgmode.org/">org-mode</a></code> file. In this case, I tend to use links to construct context, something like this:
+ If I know I will need to reference a topic later, I usually start out with an <code><a href="https://orgmode.org/">org-mode</a></code> file. In this case, I tend to use links to construct context, something like this:
<img class="img-center" src="/static/media/llm-links.png">
</p>
@@ -685,7 +668,7 @@ server {
<div class="wide64">
<h3>Rewrites</h3>
<p>
- Although I don't use it very often, <code>gptel</code> comes with rewrite functionality, activated when the transient menu is called on a seleted region. It can be used on both text and code, and the output can be <code>diff</code>ed, iterated on, accepted, or rejected. Additionally, it can serve as a kind of autocomplete, by having a LLM implement the skeleton of a function or code block.
+ Although I don't use it very often, <code>gptel</code> comes with rewrite functionality, activated when the transient menu is called on a selected region. It can be used on both text and code, and the output can be <code>diff</code>ed, iterated on, accepted, or rejected. Additionally, it can serve as a kind of autocomplete, by having an LLM implement the skeleton of a function or code block.
</p>
</div>
@@ -696,7 +679,7 @@ server {
</p>
</div>
-<video autoplay loop muted disablepictureinpicture
+<video autoplay controls loop muted disablepictureinpicture
class="video" src="/static/media/llm-translate.mp4"
type="video/mp4">
Your browser does not support video.
@@ -733,13 +716,20 @@ server {
</blockquote>
<p>
- I first used Emacs as a text editor 20 years ago. For over a decade, I have used it daily — for writing and coding, task and finance management, email, as a calculator, and to interact with local and remote hosts. I continue to discover new functionality and techniques, and was suprised to see how this 50-year old program has adapted to the frontier of technology.
+ Despite encountering frustrations with LLM use, it is hard to shake
+ the feeling of experiencing a leap in capability. There is something
+ magical to the technology, especially when run locally — the coil whine of
+ the GPU evoking the spirit of Rodin's
+ <a href="https://en.wikipedia.org/wiki/The_Thinker"><i>Thinker</i></a>.
+ Learning <a href="https://www.3blue1brown.com/topics/neural-networks">how
+ LLMs work</a> has offered <a href="https://arxiv.org/abs/2007.09560">
+ another lens</a> through which to view the world.
</p>
<p>
- Unfortunately, for most users, the barrier to entry for Emacs is high. For other frontends, comparable power and flexibility could be unlocked with support for:
+ My hope is that time will distribute and democratize the technology, in terms of hardware (for local use) and software (system integration). For most users, the barrier to entry for Emacs is high. Other frontends could unlock comparable power and flexibility with support for:
<ul>
- <li>The ability to modify their own environment and capabilities</li>
+ <li>The ability to assist the user in developing custom tools</li>
<li>Notebooks featuring executable code blocks</li>
<li>Links for local and remote content, including other conversations</li>
<li>Switching models and providers at any point</li>
@@ -750,6 +740,6 @@ server {
</p>
<p>
- There are many topics of concern and discussion around LLMs. From my work with them so far, I am more anxious about some than others. Local inference alone reveals how much energy these models can require. On the other hand, the limitations of the technology leave me extremely skeptical of imminent superintelligence. What we have now, limitations included, is useful — and has potential.
+ There are many topics of concern and discussion around LLMs. From my work with them so far, I am more anxious about some than others. Local inference alone reveals how much energy these models can require. On the other hand, the limitations of the technology leave me extremely skeptical of imminent superintelligence. But what we have now, limitations included, is useful — and has potential.
</p>
</div>