config

Personal configuration.
git clone git://code.dwrz.net/config
Log | Files | Refs

sql-indent.org (21952B)


      1 
      2     sql-indent.el -- syntax based indentation for SQL files for GNU Emacs
      3 
      4 *NOTE* This file is formatted as an Emacs Org file.  If you open it in GNU
      5 Emacs, you will be able to open-close sections of the file, which will make
      6 navigation easier.
      7 
      8 * Overview
      9 
     10 sql-indent.el is a GNU Emacs minor mode which adds support for syntax-based
     11 indentation when editing SQL code: TAB indents the current line based on the
     12 syntax of the SQL code on previous lines.  This works like the indentation for
     13 C and C++ code.
     14 
     15 The package also defines align rules so that the ~align~ function works for
     16 SQL statements, see ~sqlind-align-rules~ for which rules are defined.  This
     17 can be used to align multiple lines around equal signs or "as" statements.
     18 Here is an example of alignment rules:
     19 
     20 #+BEGIN_SRC sql
     21   update my_table
     22      set col1_has_a_long_name = value1,
     23          col2_is_short        = value2,
     24          col3_med             = v2,
     25          c4                   = v5
     26    where cond1 is not null;
     27 
     28   select long_colum as lcol,
     29          scol       as short_column,
     30          mcolu      as mcol,
     31     from my_table;
     32 #+END_SRC
     33 
     34 To use this feature, select the region you want to align and type:
     35 
     36 #+BEGIN_SRC text
     37   M-x align RET
     38 #+END_SRC
     39 
     40 ~sqlind-minor-mode~ together with the align rules can assist in writing tidy
     41 SQL code or formatting existing SQL code.  The indentation rules are
     42 customizable, so you can adapt it to match your own indentation preferences.
     43 
     44 * Installation
     45 
     46 To install this package, open the file ~sql-indent.el~ in Emacs and type
     47 
     48 #+BEGIN_SRC text
     49   M-x package-install-from-buffer RET
     50 #+END_SRC
     51 
     52 The syntax-based indentation of SQL code can be turned ON/OFF at any time by
     53 enabling or disabling ~sqlind-minor-mode~:
     54 
     55 #+BEGIN_SRC text
     56   M-x sqlind-minor-mode RET
     57 #+END_SRC
     58 
     59 To enable syntax-based indentation for every SQL buffer, you can add
     60 ~sqlind-minor-mode~ to ~sql-mode-hook~.  First, bring up the customization
     61 buffer using the command:
     62 
     63 #+BEGIN_SRC text
     64   M-x customize-variable RET sql-mode-hook RET
     65 #+END_SRC
     66     
     67 Than, click on the "INS" button to add a new entry and put "sqlind-minor-mode"
     68 in the text field.
     69 
     70 * Customization
     71 
     72 The sql-indent.el package allows customizing the indentation rules to suit
     73 your personal preferences or site specific coding styles.  To create a set of
     74 customized rules, you will need to have basic familiarity with how indentation
     75 works.  See also the "A Simple Example" section below and the
     76 ~sql-indent-left.el~ file, which demonstrate how to set up custom indentation
     77 rules.  The sections below contain the rest of the details.
     78 
     79 The indentation process happens in two separate phases: first syntactic
     80 information is determined about the line, than the line is indented based on
     81 that syntactic information.  The syntactic parse is not expected to change
     82 often, since it deals with the structure of the SQL code, however, indentation
     83 is a personal preference, and can be easily customized.
     84 
     85 Two variables control the way code is indented: ~sqlind-basic-offset~ and
     86 ~sqlind-indentation-offsets-alist~.  To customize these variables, you need to
     87 create a function that sets custom values and add it to ~sql-mode-hook~.
     88 
     89 ** A Simple Example
     90 
     91 The default indentation rules will align to the right all the keywords in a
     92 SELECT statement, like this:
     93 
     94 #+BEGIN_SRC sql
     95   select c1, c2
     96     from t1
     97    where c3 = 2
     98 #+END_SRC
     99 
    100 If you prefer to have them aligned to the left, like this:
    101 
    102 #+BEGIN_SRC sql
    103   select c1, c2
    104   from t1
    105   where c3 = 2
    106 #+END_SRC
    107 
    108 You can add the following code to your init file:
    109 
    110 #+BEGIN_SRC emacs-lisp
    111   (require 'sql-indent)
    112 
    113   ;; Update indentation rules, select, insert, delete and update keywords
    114   ;; are aligned with the clause start
    115 
    116   (defvar my-sql-indentation-offsets-alist
    117     `((select-clause 0)
    118       (insert-clause 0)
    119       (delete-clause 0)
    120       (update-clause 0)
    121       ,@sqlind-default-indentation-offsets-alist))
    122 
    123   (add-hook 'sqlind-minor-mode-hook
    124       (lambda ()
    125          (setq sqlind-indentation-offsets-alist
    126                my-sql-indentation-offsets-alist)))
    127 #+END_SRC
    128 
    129 ** Customization Basics
    130 
    131 To customize indentation, you will need to provide a new value for the
    132 ~sqlind-indentation-offsets-alist~ variable.  The variable is made buffer
    133 local each time it is set, so you need to set it inside the ~sql-mode-hook~.
    134 The variable specifies how each syntactic symbol should be indented.  Since
    135 only a few symbols need to be updated, the usual way to update it is to
    136 "extend" the value of ~sqlind-default-indentation-offsets-alist~, like so:
    137 
    138 #+BEGIN_SRC emacs-lisp
    139   (defvar my-sql-indentation-offsets-alist
    140     `( ;; put new syntactic symbols here, and add the default ones at the end.
    141        ;; If there is no value specified for a syntactic symbol, the default
    142        ;; will be picked up.
    143       ,@sqlind-default-indentation-offsets-alist))
    144 
    145   ;; Arrange for the new indentation offset to be set up for each SQL buffer.
    146   (add-hook 'sqlind-minor-mode-hook
    147             (lambda ()
    148               (setq sqlind-indentation-offsets-alist
    149                     my-sql-indentation-offsets-alist)))
    150 #+END_SRC
    151 
    152 The simplest way to adjust the indentation is to explore the syntactic
    153 information using ~sqlind-show-syntax-of-line~.  To use it, move the cursor to
    154 the line you would like to indent and type:
    155 
    156 #+BEGIN_SRC text
    157 M-x sqlind-show-syntax-of-line RET
    158 #+END_SRC
    159 
    160 A message like the one below will be shown in the messages buffer:
    161 
    162 #+BEGIN_SRC text
    163 ((select-clause . 743) (statement-continuation . 743))
    164 #+END_SRC
    165 
    166 The first symbol displayed is the syntactic symbol used for indentation, in
    167 this case ~select-clause~.  The syntactic symbols are described in a section
    168 below, however, for now, this is the symbol that will need to be updated in
    169 ~sqlind-indentation-offsets-alist~.  The number next to it represents the
    170 anchor, or reference position in the buffer where the current statement
    171 starts.  The anchor and is useful if you need to write your own indentation
    172 functions.
    173 
    174 To customize indentation for this type of statement, add an entry in the
    175 ~sqlind-indentation-offsets-alist~, for the syntactic symbol shown, with
    176 information about how it should be indented.  This information is a list
    177 containing *indentation control items* (these are described below).
    178 
    179 For example, to indent keyword in SELECT clauses at the same level as the
    180 keyword itself, we use a number which is added to the indentation level of the
    181 anchor, in this case, 0:
    182 
    183 #+BEGIN_SRC text
    184 (select-clause 0)
    185 #+END_SRC
    186 
    187 To indent it at ~sqlind-basic-offset~ plus one more space, use:
    188 
    189 #+BEGIN_SRC text
    190 (select-clause + 1)
    191 #+END_SRC
    192 
    193 To right-justify the keyword w.r.t the SELECT keyword, use:
    194 
    195 #+BEGIN_SRC text
    196 (select-clause sqlind-right-justify-clause)
    197 #+END_SRC
    198 
    199 The default value for ~sqlind-indentation-offsets-alist~ contains many
    200 examples for indentation setup rules.
    201 
    202 ** Indentation control items
    203 
    204 ~sqlind-calculate-indentation~ is the function that calculates the indentation
    205 offset to use, based on the contents of ~sqlind-indentation-offsets-alist~.
    206 The indentation offset starts with the indentation column of the ANCHOR point
    207 and it is adjusted based on the following items:
    208 
    209  * a ~NUMBER~ -- the NUMBER will be added to the indentation offset.
    210 
    211  * ~+~ -- the current indentation offset is incremented by
    212    ~sqlind-basic-offset~
    213 
    214  * ~++~ -- the current indentation offset is indentation by ~2 *
    215    sqlind-basic-offset~
    216 
    217  * ~-~ -- the current indentation offset is decremented by
    218    ~sqlind-basic-offset~
    219 
    220  * ~--~ -- the current indentation offset is decremented by 2 *
    221    ~sqlind-basic-offset~
    222 
    223  * a ~FUNCTION~ -- the syntax and current indentation offset is passed to the
    224    function and its result is used as the new indentation offset.  This can be
    225    used to further customize indentation.
    226 
    227 *** Indentation Helper Functions
    228 
    229 The following helper functions are available as part of the package and can be
    230 used as the FUNCTION part in the ~sqlind-indentation-offsets-alist~
    231 
    232 **** sqlind-use-anchor-indentation
    233 
    234 discard the current offset and returns the indentation column of the ANCHOR
    235   
    236 **** sqlind-lineup-to-anchor
    237 
    238 discard the current offset and returns the column of the anchor point, which
    239 may be different than the indentation column of the anchor point.
    240 
    241 **** sqlind-use-previous-line-indentation
    242 
    243 discard the current offset and returns the indentation column of the previous
    244 line
    245 
    246 **** sqlind-lineup-open-paren-to-anchor
    247 
    248 if the line starts with an open parenthesis, discard the current
    249 offset and return the column of the anchor point.
    250 
    251 **** sqlind-lone-semicolon
    252 
    253 if the line contains a single semicolon ';', use the value of
    254 ~sqlind-use-anchor-indentation~
    255 
    256 **** sqlind-adjust-operator
    257 
    258 if the line starts with an arithmetic operator (like ~+~ , ~-~, or ~||~), line
    259 it up so that the right hand operand lines up with the left hand operand of
    260 the previous line.  For example, it will indent the ~||~ operator like this:
    261 
    262 #+BEGIN_SRC sql
    263 select col1, col2
    264           || col3 as composed_column, -- align col3 with col2
    265        col4
    266     || col5 as composed_column2
    267 from   my_table
    268 where  cond1 = 1
    269 and    cond2 = 2;
    270 #+END_SRC
    271 
    272 **** sqlind-left-justify-logical-operator
    273 
    274 If the line starts with a logic operator (AND, OR NOT), line the operator with
    275 the start of the WHERE clause.  This rule should be added to the
    276 ~in-select-clause~ syntax after the ~sqlind-lineup-to-clause-end~ rule.
    277 
    278 **** sqlind-right-justify-logical-operator
    279 
    280 If the line starts with a logic operator (AND, OR NOT), line the operator with
    281 the end of the WHERE clause. This rule should be added to the
    282 ~in-select-clause~ syntax.
    283   
    284 #+BEGIN_SRC sql
    285 select *
    286   from table
    287  where a = b
    288    and c = d; -- AND clause sits under the where clause
    289 #+END_SRC
    290 
    291 **** sqlind-adjust-comma
    292 
    293 if the line starts with a comma, adjust the current offset so that the line is
    294 indented to the first word character.  For example, if added to a
    295 ~select-column~ syntax indentation rule, it will indent as follows:
    296 
    297 #+BEGIN_SRC sql
    298 select col1
    299    ,   col2 -- align "col2" with "col1"
    300 from my_table;
    301 #+END_SRC
    302 
    303 **** sqlind-lineup-into-nested-statement
    304 
    305 discard the current offset and return the column of the first word inside a
    306 nested statement.  This rule should be added to
    307 ~nested-statement-continuation~ syntax indentation rule, and will indent as
    308 follows:
    309 
    310 #+BEGIN_SRC sql
    311 (    a,
    312      b  -- b is aligned with a
    313 )
    314 #+END_SRC
    315 
    316 *** More Indentation Helper Functions
    317 The following function contain indentation code specific to various SQL
    318 statements.  Have a look at their doc strings for what they do:
    319 
    320  * ~sqlind-indent-comment-start~, ~sqlind-indent-comment-continuation~
    321 
    322  * ~sqlind-indent-select-column~
    323 
    324  * ~sqlind-indent-select-table~
    325 
    326  * ~sqlind-lineup-to-clause-end~
    327 
    328  * ~sqlind-right-justify-clause~
    329 
    330  * ~sqlind-lineup-joins-to-anchor~
    331 
    332 ** Syntactic Symbols
    333 
    334 The SQL parsing code returns a syntax definition (either a symbol or a
    335 list) and an anchor point, which is a buffer position.  The syntax symbols can
    336 be used to define how to indent each line.  The following syntax symbols are
    337 defined for SQL code:
    338 
    339  * ~(syntax-error MESSAGE START END)~ -- this is returned when the parse
    340    failed.  MESSAGE is an informative message, START and END are buffer
    341    locations denoting the problematic region.  ANCHOR is undefined for this
    342    syntax info
    343 
    344  * ~in-comment~ -- line is inside a multi line comment, ANCHOR is the start of
    345    the comment.
    346 
    347  * ~comment-start~ -- line starts with a comment.  ANCHOR is the start of the
    348    enclosing block.
    349 
    350  * ~in-string~ -- line is inside a string, ANCHOR denotes the start of the
    351    string.
    352 
    353  * ~toplevel~ -- line is at toplevel (not inside any programming construct).
    354    ANCHOR is usually (point-min).
    355 
    356  * ~(in-block BLOCK-KIND LABEL)~ -- line is inside a block construct.
    357    BLOCK-KIND (a symbol) is the actual block type and can be one of "if",
    358    "case", "exception", "loop" etc.  If the block is labeled, LABEL contains
    359    the label.  ANCHOR is the start of the block.
    360 
    361  * ~(in-begin-block KIND LABEL)~ -- line is inside a block started by a begin
    362    statement.  KIND (a symbol) is "toplevel-block" for a begin at toplevel,
    363    "defun" for a begin that starts the body of a procedure or function,
    364    \"package\" for a begin that starts the body of a package, nil for a begin
    365    that is none of the previous.  For a "defun" or "package", LABEL is the
    366    name of the procedure, function or package, for the other block types LABEL
    367    contains the block label, or the empty string if the block has no label.
    368    ANCHOR is the start of the block.
    369 
    370  * ~(block-start KIND)~ -- line begins with a statement that starts a block.
    371    KIND (a symbol) can be one of "then", "else" or "loop".  ANCHOR is the
    372    reference point for the block start (the corresponding if, case, etc).
    373 
    374  * ~(block-end KIND LABEL)~ -- the line contains an end statement.  KIND (a
    375    symbol) is the type of block we are closing, LABEL (a string) is the block
    376    label (or procedure name for an end defun).
    377 
    378  * ~declare-statement~ -- line is after a declare keyword, but before the
    379    begin.  ANCHOR is the start of the declare statement.
    380 
    381  * ~(package NAME)~ -- line is inside a package definition.  NAME is the name
    382    of the package, ANCHOR is the start of the package.
    383 
    384  * ~(package-body NAME)~ -- line is inside a package body.  NAME is the name
    385    of the package, ANCHOR is the start of the package body.
    386 
    387  * ~(create-statement WHAT NAME)~ -- line is inside a CREATE statement (other
    388    than create procedure or function).  WHAT is the thing being created, NAME
    389    is its name.  ANCHOR is the start of the create statement.
    390 
    391  * ~(defun-start NAME)~ -- line is inside a procedure of function definition
    392    but before the begin block that starts the body.  NAME is the name of the
    393    procedure/function, ANCHOR is the start of the procedure/function
    394    definition.
    395 
    396 The following SYNTAX-es are for SQL statements.  For all of them ANCHOR points
    397 to the start of a statement itself.
    398 
    399  * ~labeled-statement-start~ -- line is just after a label.
    400 
    401  * ~statement-continuation~ -- line is inside a statement which starts on a
    402    previous line.
    403 
    404  * ~nested-statement-open~ -- line is just inside an opening bracket, but the
    405   actual bracket is on a previous line.
    406 
    407  * ~nested-statement-continuation~ -- line is inside an opening bracket, but
    408    not the first element after the bracket.
    409 
    410  * ~nested-statement-close~ line is inside an opening bracket and the line
    411    contains the closing bracket as the first character.
    412 
    413 The following SYNTAX-es are for statements which are SQL code (DML
    414 statements).  They are specializations on the previous statement syntaxes and
    415 for all of them a previous generic statement syntax is present earlier in the
    416 SYNTAX list.  Unless otherwise specified, ANCHOR points to the start of the
    417 clause (select, from, where, etc) in which the current point is.
    418 
    419  * ~with-clause~ -- line is inside a WITH clause, but before the main SELECT
    420    clause.
    421 
    422  * ~with-clause-cte~ -- line is inside a with clause before a CTE (common
    423    table expression) declaration
    424 
    425  * ~with-clause-cte-cont~ -- line is inside a with clause before a CTE
    426    definition
    427 
    428  * ~case-clause~ -- line is on a CASE expression (WHEN or END clauses).
    429    ANCHOR is the start of the CASE expression.
    430 
    431  * ~case-clause-item~ -- line is on a CASE expression (THEN and ELSE clauses).
    432    ANCHOR is the position of the case clause.
    433 
    434  * ~case-clause-item-cont~ -- line is on a CASE expression but not on one of
    435    the CASE sub-keywords.  ANCHOR points to the case keyword that this line is
    436    a continuation of.
    437 
    438  * ~select-clause~ -- line is inside a select statement, right before one of
    439    its clauses (from, where, order by, etc).
    440 
    441  * ~select-column~ -- line is inside the select column section, after a full
    442    column was defined (and a new column definition is about to start).
    443 
    444  * ~select-column-continuation~ -- line is inside the select column section,
    445    but in the middle of a column definition.  The defined column starts on a
    446    previous like.  Note that ANCHOR still points to the start of the select
    447    statement itself.
    448 
    449  * ~select-join-condition~ -- line is right before or just after the ON clause
    450    for an INNER, LEFT or RIGHT join.  ANCHOR points to the join statement for
    451    which the ON is defined.
    452 
    453  * ~select-table~ -- line is inside the from clause, just after a table was
    454    defined and a new one is about to start.
    455 
    456  * ~select-table-continuation~ -- line is inside the from clause, inside a
    457    table definition which starts on a previous line. ANCHOR still points to
    458    the start of the table definition.
    459 
    460  * ~(in-select-clause CLAUSE)~ -- line is inside the select CLAUSE, which can
    461    be "where", "order by", "group by" or "having".  Note that CLAUSE can never
    462    be "select" and "from", because we have special syntaxes inside those
    463    clauses.
    464 
    465  * ~insert-clause~ -- line is inside an insert statement, right before one of
    466    its clauses (values, select).
    467 
    468  * ~(in-insert-clause CLAUSE)~ -- line is inside the insert CLAUSE, which can
    469    be "insert into" or "values".
    470 
    471  * ~delete-clause~ -- line is inside a delete statement right before one of
    472    its clauses.
    473 
    474  * ~(in-delete-clause CLAUSE)~ -- line is inside a delete CLAUSE, which can be
    475    "delete from" or "where".
    476 
    477  * ~update-clause~ -- line is inside an update statement right before one of
    478    its clauses.
    479 
    480  * ~(in-update-clause CLAUSE)~ -- line is inside an update CLAUSE, which can
    481    be "update", "set" or "where"
    482 * Limitations
    483 
    484 The sql-indent package does not contain a full SQL parser, instead it relies
    485 on various heuristics to determine the context of each line in a SQL program.
    486 Relying on heuristics means that sometimes valid SQL code is not detected
    487 correctly, and therefore the indentation will be wrong.
    488 
    489 This section contains some of the known cases that are incorrectly detected,
    490 and provides some workarounds for them.
    491 
    492 ** Parsing expressions
    493 
    494 There is no support for parsing SQL expressions, so if an expression is broken
    495 over several lines, sql-indent.el will consider all lines to be
    496 ~statement-continuation~ lines.  The exception is that bracketed expressions
    497 are identified correctly so they can be used for indentation.
    498 
    499 The examples below summarize what is supported and what is not, as well as the
    500 workarounds:
    501 
    502 #+BEGIN_SRC sql
    503   -- SUPPORTED: case expression immediately after assignment
    504   var := case ind
    505          when 1 then 'Guy'
    506          when 2 then 'Abc'
    507          when 3 then 'Def'
    508          else 'World'
    509          end case;
    510 
    511   -- NOT SUPPORTED: any complex expression involving a case expression.  entire
    512   -- expression is a 'statement-continuation
    513   var := 'abc'
    514     || case ind
    515     when 1 then 'Guy'
    516     when 2 then 'Abc'
    517     when 3 then 'Def'
    518     else 'World'
    519     end case;
    520 
    521   -- WORKAROUND: use brackets instead
    522   var := 'abc'
    523     || (case ind
    524         when 1 then 'Guy'
    525         when 2 then 'Abc'
    526         when 3 then 'Def'
    527         else 'World'
    528         end case);
    529 
    530   -- SUPPORTED: case expression as select column
    531   select col1,
    532          case ind
    533          when 1 then 'Guy'
    534          when 2 then 'Abc'
    535          when 3 then 'Def'
    536          else 'World'
    537          end case,
    538          col2,
    539     from some_table;
    540 
    541   -- NOT SUPPORTED: any complex expression involving a case expression in a
    542   -- select column.  Entire column is a 'select-column-continuation
    543   select col1,
    544          'abc' || case ind
    545            when 1 then 'Guy'
    546            when 2 then 'Abc'
    547            when 3 then 'Def'
    548            else 'World'
    549            end case,
    550          col2,
    551     from some_table;
    552 
    553   -- WORKAROUND: use brackets instead
    554   select col1,
    555          'abc' || (case ind
    556                    when 1 then 'Guy'
    557                    when 2 then 'Abc'
    558                    when 3 then 'Def'
    559                    else 'World'
    560                    end case),
    561          col2,
    562     from some_table;
    563 #+END_SRC
    564 
    565 ** DECLARE statements in Postgres
    566 
    567 The DECLARE statement in Postgres can start a block of declarations or declare
    568 a single item.  Unfortunately, the sql-indent package is not always able to
    569 differentiate between the two.  Curently, DECLARE blocks are only recognized
    570 at the start of another enclosing block, such as $$, BEGIN, THEN or ELSE, but
    571 they can occur on other situations.  The workaround is to enclose the DECLARE
    572 block inside a BEGIN/END block or to use individual DECLARE statements.
    573 
    574 For more info see https://github.com/alex-hhh/emacs-sql-indent/issues/92
    575 
    576 DECLARE blocks are not recognized when the follow normal statements,
    577 sql-indent considers them single statements instead.  In the example below,
    578 DECLARE is considered a statement, and the ~local_b~ declaration is anchored
    579 off the previous BEGIN statement:
    580 
    581 #+BEGIN_SRC sql
    582   -- NOT SUPPORTED: `local_b` is not recognized as a declaration
    583   create function less_than(a text, b text) returns boolean as $$
    584     begin
    585       raise debug 'less_than(%, %)', a, b;
    586       declare
    587         local_a text := a;
    588       local_b text := b;
    589       begin
    590         return local_a < local_b;
    591       end;
    592     end;
    593   end;
    594   $$ language plpgsql;
    595 #+END_SRC sql
    596 
    597 The workaround is to either surround the DECLARE block with a BEGIN/END
    598 statement, or to prefix each variable declaration with DECLARE, as in the two
    599 examples below:
    600 
    601 #+BEGIN_SRC sql
    602   -- WORKAROUND 1: surround the DECLARE/BEGIN/END with another BEGIN/END block
    603   create function less_than(a text, b text) returns boolean as $$
    604     begin
    605       raise debug 'less_than(%, %)', a, b;
    606       begin
    607         declare
    608           local_a text := a;
    609           local_b text := b;
    610         begin
    611           return local_a < local_b;
    612         end;
    613       end;
    614     end;
    615   end;
    616   $$ language plpgsql;
    617 
    618   -- WORKAROUND 2: declare each variable individually
    619   create function less_than(a text, b text) returns boolean as $$
    620     begin
    621       raise debug 'less_than(%, %)', a, b;
    622       declare
    623         local_a text := a;
    624       declare
    625         local_b text := b;
    626       begin
    627         return local_a < local_b;
    628       end;
    629     end;
    630   end;
    631   $$ language plpgsql;
    632 #+END_SRC sql
    633 
    634 .