src

Go monorepo.
git clone git://code.dwrz.net/src

commit e1c5f1461b3cd7cd44c771f848c4bc5a37423110
parent f809c1e7e078404680920043f020957bcf32ee37
Author: dwrz <dwrz@dwrz.net>
Date:   Wed, 26 Nov 2025 17:51:11 +0000

Update 2025-11-24 entry

Diffstat:
M cmd/web/site/entry/static/2025-11-24/2025-11-24.html | 164 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------
1 file changed, 105 insertions(+), 59 deletions(-)

diff --git a/cmd/web/site/entry/static/2025-11-24/2025-11-24.html b/cmd/web/site/entry/static/2025-11-24/2025-11-24.html @@ -1,9 +1,8 @@ - <p> - This video shows a <a href="https://en.wikipedia.org/wiki/Large_language_model">large language model</a> (<code>LLM</code>), running on my workstation, using <a href="https://www.gnu.org/software/emacs/">Emacs</a> to determine my location, retrieve the weather forecast, and email me the results. + This video shows a <a href="https://en.wikipedia.org/wiki/Large_language_model">large language model</a> (LLM), running on my workstation, using <a href="https://www.gnu.org/software/emacs/">Emacs</a> to determine my location, retrieve weather data, and email me the results: </p> -<video autoplay loop muted +<video autoplay loop muted disablepictureinpicture class="video" src="/static/media/llm.mp4" type="video/mp4"> Your browser does not support video. @@ -15,21 +14,21 @@ <ul> <li>Querying models from hosted providers (<a href="https://www.anthropic.com/">Anthropic</a>, <a href="https://openai.com/">OpenAI</a>, <a href="https://openrouter.ai/">OpenRouter</a>), or local models (<a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a>, <a href="https://ollama.com/">ollama</a>, etcetera).</li> - <li>Switching rapidly between models and configurations.</li> - <li>Saving conversations and using them as context for other conversations.</li> + <li>Switching rapidly between models and configurations, with only a few keystrokes.</li> + <li>Saving conversations to the local filesystem, and using them as context for other conversations.</li> <li>Including files, buffers, and terminals as context for queries.</li> <li>Searching the web and reading web pages.</li> <li>Searching, reading, and sending email.</li> <li>Consulting agendas, projects, and tasks.</li> <li>Executing Emacs Lisp code and shell commands.</li> - <li>Generating images via <a href="https://www.comfy.org/">ComfyUI</a>.</li> + <li>Generating images via the <a href="https://www.comfy.org/">ComfyUI</a> API.</li> <li>Geolocating the device and checking the current date and time.</li> <li>Reading <a href="https://en.wikipedia.org/wiki/Man_page">man</a> pages.</li> <li>Retrieving the user's name and email.</li> </ul> <p> - Because LLMs are able to understand and write <a href="https://en.wikipedia.org/wiki/Emacs_Lisp">Emacs Lisp</a> code, so one can use them to further extend their capabilities; the improvements are recursive. Below, I note some of the setup required to enable this functionality. + Because LLMs understand and write <a href="https://en.wikipedia.org/wiki/Emacs_Lisp">Emacs Lisp</a> code, they can help extend their own capabilities; the improvements are recursive. Below, I note some of the setup required to enable this functionality. </p> <h2>Emacs</h2> @@ -62,14 +61,14 @@ </code></pre> <p> - I use OpenRouter, which grants access to models across providers: + I prefer OpenRouter, to access models across providers: </p> <pre><code>(gptel-make-openai "OpenRouter" :endpoint "/api/v1/chat/completions" :host "openrouter.ai" :key (password-store-get "openrouter.ai/keys/emacs") - :models '(anthropic/claude-opus-4.1 + :models '(anthropic/claude-opus-4.5 anthropic/claude-sonnet-4.5 anthropic/claude-3.5-sonnet cohere/command-a @@ -92,11 +91,11 @@ </code></pre> <p> - The choice of model depends on the task and its budget. Even where those two parameters are comparable, it is sometimes useful to switch from one model to another. One may have a blind spot, where another will have insight. 
+ The choice of model depends on the task and its budget. Even where those two parameters are comparable, it is sometimes useful to switch models. One may have a blind spot, where another will have insight. </p> <p> - With <code>gptel</code>, it is easy to switch models at any point in a conversation, or to take the output of one to feed as context for another. For example, I've used <a href="https://www.perplexity.ai/">Perplexity's</a> <a href="https://openrouter.ai/perplexity/sonar-deep-research">Sonar Deep Research</a> to create briefings on a topic, and then used another LLM to summarize findings or answer specific questions, augmented with further web search. + With <code>gptel</code>, it is easy to switch models mid-conversation, or use the output from one model as context for another. For example, I've used <a href="https://www.perplexity.ai/">Perplexity's</a> <a href="https://openrouter.ai/perplexity/sonar-deep-research">Sonar Deep Research</a> to create briefings, then used another LLM to summarize findings or answer specific questions, augmented with web search. </p> <h3>Tools</h3> @@ -106,7 +105,7 @@ </p> <p> - In making a tool, on can rely on the extensive functionality already offered by Emacs. For example, the <code>read_url</code> tool uses <code><a href=" https://www.gnu.org/software/emacs/manual/html_node/url/Retrieving-URLs.html">url-retrieve-synchronously</a></code>, <code>get_user_name</code> and <code>get_user_email</code> read the <code><a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/User-Identification.html#index-user_002dfull_002dname">user-full-name</a></code> and <code><a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/User-Identification.html#index-user_002dmail_002daddress">user-mail-address</a></code> variables. <code>now</code>, used to retrieve the current date and time, uses <code><a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Time-Parsing.html#index-format_002dtime_002dstring">format_time_string</a></code>: + When making tools, one can leverage Emacs' existing functionality. For example, the <code>read_url</code> tool uses <code><a href=" https://www.gnu.org/software/emacs/manual/html_node/url/Retrieving-URLs.html">url-retrieve-synchronously</a></code>, while <code>get_user_name</code> and <code>get_user_email</code> read <code><a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/User-Identification.html#index-user_002dfull_002dname">user-full-name</a></code> and <code><a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/User-Identification.html#index-user_002dmail_002daddress">user-mail-address</a></code>. <code>now</code>, used to retrieve the current date and time, uses <code><a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Time-Parsing.html#index-format_002dtime_002dstring">format-time-string</a></code>: </p> <pre><code>(gptel-make-tool @@ -118,7 +117,7 @@ </code></pre> <p> - Similarly, if Emacs is <a href="https://www.gnu.org/software/emacs/manual/html_node/emacs/Sending-Mail.html">configured to send mail</a>, the tool definition in Emacs Lisp is straightforward: + Similarly, if Emacs is <a href="https://www.gnu.org/software/emacs/manual/html_node/emacs/Sending-Mail.html">configured to send mail</a>, the tool definition is straightforward: </p> <pre><code>(gptel-make-tool @@ -147,19 +146,23 @@ </code></pre> <p> - For more complex functionality, my preference has been to write shell scripts.
There are a few advantages to this approach: + For more complex functionality, I prefer writing shell scripts, for several reasons: <ul> - <li>Tools are easier to develop and debug, since they are easily invoked manually.</li> - <li>The tool definitions are simpler. For example, my <code>qwen-image</code> script includes a large JSON for the <code>ComfyUI</code> flow. I prefer to leave it outside my Emacs configuration.</li> + <li>The tool definitions are simpler. For example, my <code>qwen-image</code> script includes a large JSON for the ComfyUI flow. I prefer to leave it outside my Emacs configuration.</li> <li>Tools are accessible to LLMs that may not be running in the Emacs environment (agents, one-off scripts).</li> <li>Fluency. LLMs seem better at writing bash (or Python, or Go) than Emacs Lisp, so it is easier to lean on this inherent expertise in developing the tools themselves.</li> </ul> </p> +<img class="img-center" src="/static/media/drawing-hands.jpg"> +<div class="caption"> + <p>M.C. Escher, <i>Drawing Hands</i> (1948)</p> +</div> + <h4>Web Search</h4> <p> - For example, for web search, I initially used the tool described in the <a href="https://github.com/karthink/gptel/wiki/Tools-collection">gptel wiki</a>: + For example, for web search, I initially used the tool described in the <code>gptel</code> <a href="https://github.com/karthink/gptel/wiki/Tools-collection">wiki</a>: </p> <pre><code>(defvar brave-search-api-key (password-store-get "search.brave.com/api/emacs") @@ -189,7 +192,7 @@ </code></pre> <p> - However, there are times I want to inspect the search results, so I refactored to use a script: + However, there are times I want to inspect the search results. I use this script: </p> <pre><code>#!/usr/bin/env bash @@ -233,9 +236,9 @@ main() { fi perform_search "${*}" -} + } -main "${@}" + main "${@}" </code></pre> <p> @@ -264,11 +267,11 @@ main "${@}" <h4>Context</h4> <p> - One limitation that I have occasionally run into with tools is context overflow — when the data retrieved by the tool exceeds what can fit into the LLM's context. + One limitation that I have run into with tools is context overflow — when retrieved data exceeds an LLM's context window. </p> <p> - For example, the <code>man</code> tool makes it possible for an LLM to read <code>man</code> pages. It can help a model correctly recall flags for a command: + For example, this tool lets an LLM read <code>man</code> pages, helping it correctly recall command flags: </p> <pre><code>(gptel-make-tool @@ -287,7 +290,7 @@ main "${@}" </code></pre> <p> - This tool broke when calling the <a href="https://www.gnu.org/software/units/">GNU units</a> <code>man</code> page, which on my system, is currently about 40,000 tokens. This was unfortunate, since converting temperatures with <code>units</code> isn't intuitive: + It broke when calling the <a href="https://www.gnu.org/software/units/">GNU units</a> <code>man</code> page, which exceeds 40,000 tokens on my system. This was unfortunate, since some conversions, like temperature, are unintuitive: </p> <pre><code>units 'tempC(100)' tempF @@ -297,6 +300,12 @@ main "${@}" With <code>gptel</code>, one fallback is Emacs' built-in <code>man</code> functionality. The appropriate region can be selected with <code>-r</code> in the transient menu. In some cases, this is faster than a tool call. </p> +<video autoplay loop muted disablepictureinpicture + class="video" src="/static/media/llm-temp.mp4" + type="video/mp4"> + Your browser does not support video.
+</video> + <p> I ran into a similar problem with the <code>read_url</code> tool (also found on the <a href="https://github.com/karthink/gptel/wiki/Tools-collection">gptel wiki</a>). It can break if the response is larger than the context window. </p> @@ -325,13 +334,13 @@ main "${@}" </code></pre> <p> - When I have run into this problem, the issue was bloated functional content — JavaScript code CSS. If the content is not dynamically generated, one call fallback to Emacs' web browser, <code><a href="https://www.gnu.org/software/emacs/manual/html_mono/eww.html">eww</a></code>. The buffer or selected regions can be added as context. A more sophisticated tool could help in these cases. Otherwise, I hope that LLMs will help steer the web back towards readability, either by acting as an aggregator and filter, or as an evolutionary pressure in favor of static content. + When I have run into this problem, the issue was bloated functional content — JavaScript code and CSS. If the content is not dynamically generated, one can fall back to Emacs' web browser, <code><a href="https://www.gnu.org/software/emacs/manual/html_mono/eww.html">eww</a></code>. The buffer or selected regions can be added as context. A more sophisticated tool could help in these cases. Long term, I hope that LLMs will steer the web back towards readability, either by acting as an aggregator and filter, or as evolutionary pressure in favor of static content. </p> <h4>Security</h4> <p> - The <code><a href="https://github.com/karthink/gptel/wiki/Tools-collection#run_command">run_command</a></code> tool, also described in the <code>gptel</code> tool collection, confers the ability to execute shell commands. Its use requires care. A compromised model could use it to issue malicious commands; alternatively, a poorly formatted command could have unintended consequences. <code>gptel</code> offers the <code>:confirm</code> key to enable inspection and approval of a tool call. + The <code><a href="https://github.com/karthink/gptel/wiki/Tools-collection#run_command">run_command</a></code> tool, also found in the <code>gptel</code> tool collection, enables shell command execution and requires careful consideration. A compromised model could issue malicious commands, or a poorly formatted command could have unintended consequences. <code>gptel</code>'s <code>:confirm</code> key can be used to inspect and approve tool calls. </p> <pre><code>(gptel-make-tool @@ -352,17 +361,23 @@ main "${@}" </code></pre> <p> - Inspection limits the ability of the LLM to operate asynchronously, without human intervention. There are a few solutions to this problem, the easiest being to offer tools with more limited scope. + Inspection limits the LLM's ability to operate asynchronously, without human intervention. There are a few solutions to this problem, the easiest being to offer tools with more limited scope. </p> +<video autoplay loop muted disablepictureinpicture + class="video" src="/static/media/llm-inspect.mp4" + type="video/mp4"> + Your browser does not support video. +</video> + <h3>Presets</h3> <p> - <code>gptel</code>'s transient menu makes it fast and easy to manage LLM use. With a few keystrokes, it is possible to add, edit, or remove context, switch the model one wants to query, change the input and output, or edit the system message. This can be further accelerated by defining presets with <code>gptel-make-preset</code>.
+ With <code>gptel</code>'s transient menu, only a few keystrokes are needed to add, edit, or remove context, switch the model one wants to query, change the input and output, or edit the system message. Presets accelerate switching between settings, and are defined with <code>gptel-make-preset</code>. </p> <p> - For example, with <a href="https://huggingface.co/openai/gpt-oss-120b">GPT-OSS 120B</a> (one of OpenAI's <a href="https://openai.com/open-models/">open weights</a> models), I find a system prompt necessary to minimize the use of tables and excessive text styling. A preset loads the appropriate settings: + For example, with <a href="https://huggingface.co/openai/gpt-oss-120b">GPT-OSS 120B</a> (one of OpenAI's <a href="https://openai.com/open-models/">open weights</a> models), a system prompt is necessary to minimize the use of tables and excessive text styling. A preset can load the appropriate settings: </p> <pre><code>(gptel-make-preset 'assistant/gpt @@ -400,7 +415,7 @@ </code></pre> <p> - One could grant LLMs the ability to append to <code>memory.org</code> with a tool, though I am skeptical that they would use it judiciously. + The file can include any information that should always be included as context. One could also grant LLMs the ability to append to <code>memory.org</code>, though I am skeptical that they would do so judiciously. </p> <h2>Local LLMs</h2> @@ -415,7 +430,7 @@ </p> <p> - The main trade-off is intelligence, though for many purposes, the gap is closing fast. I've found local models can summarize or transform data effectively, help with language translation and learning, extract data from images and PDFs, and perform simple research tasks. I rely on hosted models primarily for complex coding tasks, or whenever larger effective context is required. + The main trade-off is intelligence, though for many purposes, the gap is closing fast. Local models excel at summarizing data, language translation, image and PDF extraction, and simple research tasks. I rely on hosted models primarily for complex coding tasks, or when a larger effective context is required. </p> <h3>llama.cpp</h3> @@ -441,18 +456,26 @@ llama-server -hf unsloth/Qwen3-4B-GGUF:q8_0 This will build <code>llama.cpp</code> with support for CPU-based inference, move <code>llama-server</code> into <code>~/.local/bin/</code>, and then download and run <a href="https://unsloth.ai/">Unsloth</a>'s <code>Q8</code> quantization of the <a href="https://huggingface.co/Qwen/Qwen3-4B">Qwen3 4B</a>. The <code>llama.cpp</code> <a href="https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md"> documentation</a> explains how to build for GPUs and other hardware — not much more work than the default build. </p> +<p><code>llama-server</code> offers a web interface, available at port 8080 by default.</p> + +<video autoplay loop muted disablepictureinpicture + class="video" src="/static/media/llm-ls.mp4" + type="video/mp4"> + Your browser does not support video. +</video> + <h3>Weights</h3> <p> - Part of the art of using LLMs is finding the appropriate model to use. Some factors to consider are available hardware, intended use (task, language), and desired pricing (money paid for input and output). Some models, like <a href="https://ai.google.dev/gemma/docs/core">Gemma3</a> and <a href=""https://github.com/QwenLM/Qwen3-VL">Qwen3-VL</a> offer multimodal support, and can parse images.
Others, like Google's <a href="https://deepmind.google/models/gemma/medgemma/">Medgemma</a> specialize in specific subject areas, or like <a href=https://mistral.ai/">Mistral</a>'s <a href="https://mistral.ai/news/devstral">Devstral</a>, agentic use. + Part of the art of using LLMs is selecting an appropriate model. Some factors to consider are available hardware, intended use (task, language), and desired pricing (input and output costs). Some models offer specialized capabilities — <a href="https://ai.google.dev/gemma/docs/core">Gemma3</a> and <a href="https://github.com/QwenLM/Qwen3-VL">Qwen3-VL</a> offer multimodal input, <a href="https://deepmind.google/models/gemma/medgemma/">Medgemma</a> specializes in medical knowledge, and <a href="https://mistral.ai/">Mistral</a>'s <a href="https://mistral.ai/news/devstral">Devstral</a> focuses on agentic use. </p> <p> - For local use, hardware tends to be the main limiter. One has to fit the model into available memory and consider the acceptable speed for one's use case. A rough guideline is to use the smallest model or quantization for the required task. From the opposite direction, one can look for the largest model that can fit into available memory. The rule of thumb is that a <code>Q8_0</code> uses about as much memory as there are parameters, so an 8 billion parameter model will use about 8 GB of RAM or VRAM. A <code>Q4_0</code> quant would use half that — 4 GB — while at 16-bit, 16 GB. + For local use, hardware tends to be the main limiter. One has to fit the model into available memory, and consider the acceptable performance for one's use case. A rough guideline is to use the smallest model or quantization for the required task. Or, from the opposite direction, to look for the largest model that can fit into available memory. The rule of thumb is that a <code>Q8_0</code> quantization uses about one byte per parameter, so an 8 billion parameter model will use about 8 GB of RAM or VRAM. A <code>Q4_0</code> quant would use half that — 4 GB — while at 16-bit, 16 GB. </p> <p> - My workstation, laptop, and mobile (<code>llama.cpp</code> can be built and run in <code><a href="https://termux.dev/en/">termux</a></code>) all run different classes of weights. On my mobile device, I have about 12GB of RAM, but background utilization is already around 8GB. So, when necessary, I use 4B models at <code>Q8_0</code>: Gemma3, Qwen3-VL, and Medgemma. If a laptop has 16GB of RAM with 2GB in use, 8B models might run well enough. The workstation, which has a GPU, can run larger models, faster. There are other tricks one can use — <a href="https://huggingface.co/docs/text-generation-inference/en/conceptual/flash_attention">flash attention</a>, <a href="https://research.google/blog/looking-back-at-speculative-decoding/">speculative decoding</a>, MoE offloading — to optimize model performance across different hardware configurations. + My workstation, laptop, and mobile (<code>llama.cpp</code> can be used from <code><a href="https://termux.dev/en/">termux</a></code>) all run different classes of weights. On my mobile device, I have about 12GB of RAM, but background utilization is already around 8GB. So, when necessary, I use 4B models at <code>Q8_0</code> or less: Gemma3, Qwen3-VL, and Medgemma. If a laptop has 16GB of RAM with 2GB in use, 8B models might run well enough. The workstation, which has a GPU, can run larger models, faster.
There are other tricks one can use — <a href="https://huggingface.co/docs/text-generation-inference/en/conceptual/flash_attention">flash attention</a>, <a href="https://research.google/blog/looking-back-at-speculative-decoding/">speculative decoding</a>, MoE offloading — to optimize performance across different hardware configurations. </p> <h3>llama-swap</h3> @@ -462,7 +485,7 @@ llama-server -hf unsloth/Qwen3-4B-GGUF:q8_0 </p> <p> - <code>llama-swap</code> uses a YAML configuration file; which is <a href="https://github.com/mostlygeek/llama-swap/wiki/Configuration">well documented</a>. I use something like the following: + <code>llama-swap</code> uses a YAML configuration file, which is <a href="https://github.com/mostlygeek/llama-swap/wiki/Configuration">well documented</a>. I use something like the following: </p> <pre><code>logLevel: debug @@ -521,7 +544,7 @@ models: <h3>nginx</h3> <p> - Since my workstation has a GPU and can be accessed via <a href="https://www.wireguard.com/">WireGuard</a> from other devices, it is set up to serve models. For HTTPS, I use <code><a href="https://certbot.eff.org/">certbot</a></code> with an <code><a href="https://nginx.org/">nginx</a></code> reverse proxy, running in front of <code>llama-swap</code>. With <code>nginx</code>, some settings are important for streaming responses from LLMs, namely <code>proxy_buffering off;</code> and <code>proxy_cache off;</code>. + Since my workstation has a GPU and can be accessed on the local network or via <a href="https://www.wireguard.com/">WireGuard</a> from other devices, I use <code><a href="https://nginx.org/">nginx</a></code> as a reverse proxy in front of <code>llama-swap</code>, with certificates generated by <code><a href="https://certbot.eff.org/">certbot</a></code>. For streaming LLM responses, <code>proxy_buffering off;</code> and <code>proxy_cache off;</code> are essential settings. </p> <pre><code>user http; @@ -623,24 +646,50 @@ server { <h2>Techniques</h2> <p> - There are a variety of ways I use Emacs with LLMs: + Having covered the setup and configuration, here are some practical ways I use Emacs with LLMs, demonstrated with examples: </p> <h3>Simple Q&A</h3> <p> With the <code>gptel</code> transient menu, press <code>m</code> to prompt from the minibuffer, and <code>e</code> to output the answer to the echo area, then <code>Enter</code> to input the prompt. + + <video autoplay loop muted disablepictureinpicture + class="video" src="/static/media/llm-qa.mp4" + type="video/mp4"> + Your browser does not support video. + </video> </p> <h3>Brief Conversations</h3> <p> - For brief multi-turn conversations that require no persistence, <code>gptel</code> can be used in the <code>*scratch*</code> buffer. I usually pair this with setting any context via the transient menu, <code>-b</code>, <code>-f</code>, or <code>-r</code> as necessary. + For brief multi-turn conversations that require no persistence, <code>gptel</code> can be used in the <code>*scratch*</code> buffer. Context can be added via the transient menu, <code>-b</code>, <code>-f</code>, or <code>-r</code> as necessary. The conversation is not persisted unless the buffer is saved. </p> <h3>Image-to-Text</h3> <p> - With multimodal LLMs like Gemma3 and Qwen3-VL, one can extract text and tables from images. + With multimodal LLMs like Gemma3 and Qwen3-VL, one can extract text and tables from images. 
+ + <video autoplay loop muted disablepictureinpicture + class="video" src="/static/media/llm-itt.mp4" + type="video/mp4"> + Your browser does not support video. + </video> +</p> + +<h3>Text-to-Image</h3> +<p> + Here, a local LLM retrieves a URL, reads its contents, and then + generates an image with ComfyUI. + <video autoplay loop muted disablepictureinpicture + class="video" src="/static/media/llm-image.mp4" + type="video/mp4"> + Your browser does not support video. + </video> + + The result: + <img class="img-center" src="/static/media/comfy-ui-dream.png"> </p> <h3>Research</h3> @@ -650,49 +699,46 @@ server { <h3>Translation</h3> <p> - For small or unimportant text, Google Translate via the command-line with <code><a href="https://github.com/soimort/translate-shell">translate-shell</a></code> works well enough. Otherwise, I find the translation output from local LLMs is typically better, more aware of the context. + For small or unimportant text, Google Translate via the command-line with <code><a href="https://github.com/soimort/translate-shell">translate-shell</a></code> works well enough. Otherwise, I find the translation output from local LLMs is typically more sensitive to context. + + <video autoplay loop muted disablepictureinpicture + class="video" src="/static/media/llm-translate.mp4" + type="video/mp4"> + Your browser does not support video. + </video> </p> <h3>Code</h3> <p> - My experience writing code with LLMs has been mixed. For scripts and small programs, iterating in a single conversation can work well enough. However, at work, with a larger codebase, only a few models have been able to contribute meaningfully. Hosted models have worked well at some points, but lately, the quality has dropped off significantly, I imagine due to excessive quantization. I believe the quantization of the model should be clearly labeled and priced accordingly. Since that is not the case, I have come to distrust the initial output from any model. + My experience using LLMs for code has been mixed. For scripts and small programs, iterating in a single conversation works well. However, with larger codebases, few models contribute meaningfully. While hosted models are typically stronger in this use case, I surmise aggressive quantization has reduced their reliability. I have come to distrust the initial output from any model. </p> <p> - So far, I have also had limited success with agents, which in some cases, also cost more than I would care to spend. I find agents waste too many resources understanding the context, and even then, often fail to capture important nuace. This is one reason I have not yet added tool support for reading or modify files and directories. + So far, I have had limited success with agents — which often burn through tokens to understand context, but still manage to miss important nuance. This experience has made me hesitant to add tool support for file operations. </p> <p> - Instead, I have found the middle ground to be manually providing context in project or task specific files, using <code>org-mode</code> links. I ask the LLMs to walk me through code changes, which I then review and implement by hand. In some cases, the output is good enough that it saves time. In others, it still ends up being faster to implement on my own, and in a few cases, I wish I had never bothered. + Instead, I provide context through <code>org-mode</code> links in project-specific files. I have the LLM walk through potential changes, which I review and implement by hand. 
Generally, this approach saves time, but often, I still work faster on my own. </p> <h2>Conclusion</h2> <p> - I first started using Emacs as my text editor 20 years ago. For over ten years now, I have used it on a daily basis — for writing and coding, email, managing finances and tasks, as my calculator, and for interacting with both local and remote hosts. I continue as a student of this software, discovering new functionality and techniques. I have been surprised by how well this 50-year old software has adapted to the frontier of technology. Despite flaws and limitations, the core fundamental design has ensured its endurance. + I first used Emacs as a text editor 20 years ago. For over a decade, I have used it daily — for writing and coding, task and finance management, email, as a calculator, and to interact with local and remote hosts. I continue to discover new functionality and techniques, and was surprised to see how well this 50-year-old program has adapted to the frontier of technology. Despite flaws and limitations, its endurance reflects its foundational design. </p> <p> - The barrier for entry for Emacs is high. For everyday users, the power and flexibility that <code>gptel</code> offers could be unlocked by offering support for: + The barrier for entry for Emacs is high. For everyday users, comparable power and flexibility could be unlocked with support for: <ul> - <li>Notebooks with support for code and other blocks</li> - <li>Links for local and remote content</li> - <li>Referencing conversations</li> + <li>Notebooks featuring executable code blocks</li> + <li>Links for local and remote content, including other conversations</li> <li>Switching models and providers, including local models</li> - <li>Mail and task maangement integration</li> - <li>Offline operation — Emacs will work with local models even offline.</li> - <li>Remote operation — Emacs can be accessed remotely via SSH or TRAMP.</li> + <li>Mail and task integration</li> + <li>Offline operation with local models.</li> + <li>Remote access — Emacs can be accessed remotely via SSH or TRAMP.</li> </ul> </p> <p> - My work with LLMs so far has given me both concern and optimism. Local inference surfaces the energy requirements, yet daily limitations make me skeptical of imminent superintelligence. In the same way that calculators are better in a domain than humans, LLMs may offer areas of comparative advantage. The key question is which tasks we can delegate to them reliably and efficiently, such that the effort of building scaffolding, maintaining guardrails, and managing operations costs less than doing the work ourselves. -</p> - -<p> - There are many areas of concern and discussion around LLMs. From my work with them so far, I am more anxious about some than others. Local LLMs reveal the computation and energy requirements for inference alone. On the other hand, barring the <a href="https://en.wikipedia.org/wiki/Eichmann_in_Jerusalem">banality of evil</a>, the limitations I see on a daily basis make me skeptical that we are anywhere close to seeing super-intelligent machines. Perhaps in the way that calculators are better than humans, we will see LLMs have areas of comparative strength — and weakness. -</p> - -<p> - My interest is primarily on the potential utility of the technology. It has already proven its ability to understand natural language, take the drudgery out of some work, or serve as a useful second set of eyes. The question for me is the magnitude of the impact.
Which of the more advanced tasks will we be able to hand off to LLMs, (a) in a trusted manner, and (b) in such a way that building the required guardrails and scaffolding, and managing their operation, will take less time than doing the task oneself? + So far, my experiments with LLMs have left me with both concern and optimism. Local inference reveals the energy requirements, yet daily limitations make me skeptical of imminent superintelligence. In the same way that calculators are better than humans within a narrow domain, LLMs may offer areas of comparative advantage. The key question is which tasks we can delegate reliably and efficiently, such that the effort of building scaffolding, maintaining guardrails, and managing operations costs less than doing the work ourselves. </p>
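
The hunks above elide the body of the now tool behind the @@ markers. As a rough sketch of what a gptel tool definition can look like, assuming gptel is installed, here is a hypothetical version of now built on format-time-string; the name, description, category, and time format are illustrative guesses, not the entry's actual code.

(require 'gptel)

;; Hypothetical sketch; not the entry's actual definition.
(gptel-make-tool
 :name "now"
 :function (lambda ()
             ;; format-time-string renders the current date and time.
             (format-time-string "%Y-%m-%d %H:%M:%S %Z"))
 :description "Get the current date and time."
 :args nil
 :category "emacs")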
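
A similar sketch works for the mail tool, assuming Emacs composes outgoing mail with message-mode; the tool name, argument names, and return string are assumptions rather than the entry's code.

;; Hypothetical mail tool; assumes message-mode handles outgoing mail.
(gptel-make-tool
 :name "send_message"
 :function (lambda (to subject body)
             ;; Compose a draft addressed to TO with SUBJECT, insert
             ;; the body, and send it with the configured mail backend.
             (compose-mail to subject)
             (message-goto-body)
             (insert body)
             (message-send-and-exit)
             (format "Message sent to %s." to))
 :description "Send an email."
 :args (list '(:name "to" :type string :description "Recipient email address.")
             '(:name "subject" :type string :description "Subject line.")
             '(:name "body" :type string :description "Body of the message."))
 :category "mail")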
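
The entry's preference for shell-script-backed tools can be illustrated with a thin wrapper that delegates to an external script. The script name web-search and its interface are hypothetical; the point is that the Elisp side stays small while the logic lives in a script that is also usable outside Emacs.

;; Hypothetical wrapper around an external "web-search" script on PATH.
(gptel-make-tool
 :name "search_web"
 :function (lambda (query)
             ;; Delegate to the script so the same tool can be reused
             ;; outside Emacs (agents, one-off scripts).
             (shell-command-to-string
              (format "web-search %s" (shell-quote-argument query))))
 :description "Search the web and return a list of results."
 :args (list '(:name "query" :type string :description "The search query."))
 :category "web")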
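
For the context-overflow problem described under Context, one possible mitigation, not taken from the entry, is to cap a tool's output before it reaches the model. The 20,000-character limit below is an arbitrary assumption.

;; Hypothetical variant of the man tool that truncates long pages.
(gptel-make-tool
 :name "man"
 :function (lambda (page)
             (let* ((limit 20000) ; arbitrary cap, not the entry's value
                    (output (shell-command-to-string
                             (format "man %s | col -bx"
                                     (shell-quote-argument page)))))
               (if (> (length output) limit)
                   (concat (substring output 0 limit) "\n[output truncated]")
                 output)))
 :description "Read a man page."
 :args (list '(:name "page" :type string :description "Name of the man page."))
 :category "documentation")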
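
The Security section's use of :confirm can be sketched with a minimal run_command tool; with :confirm set, gptel asks for approval before each call, which is what limits asynchronous operation.

;; Hypothetical run_command tool gated by :confirm.
(gptel-make-tool
 :name "run_command"
 :function (lambda (command)
             ;; Runs synchronously and returns the command's output.
             (shell-command-to-string command))
 :description "Execute a shell command and return its output."
 :args (list '(:name "command" :type string :description "The command to run."))
 :category "command"
 :confirm t) ; require approval before every invocation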
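
The preset hunk is likewise truncated. A hedged sketch of a gptel-make-preset call in the spirit of the Presets section follows; the backend name, model symbol, and system message are placeholders, not the entry's values.

;; Hypothetical preset; backend, model, and system text are placeholders.
(gptel-make-preset 'assistant/gpt
  :description "GPT-OSS 120B with a plain-prose system message"
  :backend "llama-swap"
  :model 'gpt-oss-120b
  :system "Answer in plain prose. Avoid tables and excessive text styling.")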