author Richard M. Stallman <[email protected]> 2005-06-17 13:51:19 +0000
committer Richard M. Stallman <[email protected]> 2005-06-17 13:51:19 +0000
commit 377ddd88f6ad58ccf7072c6bcf57fd76bfbc37c5
tree 6ad4229fab6289c75c312df5ab926a25fe6048bd
parent a99eb78d827dcbef46f2b83ceb3f7f10abaf969c
(Byte Packing): New node.
(Processes): Add it to menu.
-rw-r--r-- lispref/processes.texi 402
1 files changed, 402 insertions, 0 deletions
diff --git a/lispref/processes.texi b/lispref/processes.texi
index 07a7288635..f86a844a87 100644
--- a/lispref/processes.texi
+++ b/lispref/processes.texi
@@ -52,6 +52,7 @@ This function returns @code{t} if @var{object} is a process,
* Datagrams:: UDP network connections.
* Low-Level Network:: Lower-level but more general function
to create connections and servers.
+* Byte Packing:: Using bindat to pack and unpack binary data.
@end menu
@node Subprocess Creation
@@ -2015,6 +2016,407 @@ That particular network option is supported by
@code{make-network-process} and @code{set-network-process-option}.
@end table
+@node Byte Packing
+@section Packing and Unpacking Byte Arrays
+
+ This section describes how to pack and unpack arrays of bytes,
+usually for binary network protocols. These functions convert byte
+arrays to alists, and vice versa. The byte array can be represented as a
+unibyte string or as a vector of integers, while the alist associates
+symbols either with fixed-size objects or with recursive sub-alists.
+
+@cindex serializing
+@cindex deserializing
+@cindex packing
+@cindex unpacking
+ Conversion from byte arrays to nested alists is also known as
+@dfn{deserializing} or @dfn{unpacking}, while going in the opposite
+direction is also known as @dfn{serializing} or @dfn{packing}.
+
+@menu
+* Bindat Spec:: Describing data layout.
+* Bindat Functions:: Doing the unpacking and packing.
+* Bindat Examples:: Samples of what bindat.el can do for you!
+@end menu
+
+@node Bindat Spec
+@subsection Describing Data Layout
+
+ To control unpacking and packing, you write a @dfn{data layout
+specification}, a special nested list describing named and typed
+@dfn{fields}. This specification controls the length of each field to
+be processed, and how to pack or unpack it.
+
+@cindex endianness
+@cindex big endian
+@cindex little endian
+@cindex network byte ordering
+ A field's @dfn{type} describes the size (in bytes) of the object
+that the field represents and, in the case of multibyte fields, how
+the bytes are ordered within the field. The two possible orderings
+are ``big endian'' (also known as ``network byte ordering'') and
+``little endian''. For instance, the number @code{#x23cd} (decimal
+9165) in big endian would be the two bytes @code{#x23} @code{#xcd};
+and in little endian, @code{#xcd} @code{#x23}. Here are the possible
+type values:
+
+@table @code
+@item u8
+@itemx byte
+Unsigned byte, with length 1.
+
+@item u16
+@itemx word
+@itemx short
+Unsigned integer in network byte order, with length 2.
+
+@item u24
+Unsigned integer in network byte order, with length 3.
+
+@item u32
+@itemx dword
+@itemx long
+Unsigned integer in network byte order, with length 4.
+Note: These values may be limited by Emacs' integer implementation limits.
+
+@item u16r
+@itemx u24r
+@itemx u32r
+Unsigned integer in little endian order, with length 2, 3 and 4, respectively.
+
+@item str @var{len}
+String of length @var{len}.
+
+@item strz @var{len}
+Zero-terminated string of length @var{len}.
+
+@item vec @var{len}
+Vector of @var{len} bytes.
+
+@item ip
+Four-byte vector representing an Internet address. For example:
+@code{[127 0 0 1]} for localhost.
+
+@item bits @var{len}
+List of set bits in @var{len} bytes. The bytes are taken in big
+endian order and the bits are numbered starting with @code{8 *
+@var{len} @minus{} 1} and ending with zero. For example: @code{bits
+2} unpacks @code{#x28} @code{#x1c} to @code{(2 3 4 11 13)} and
+@code{#x1c} @code{#x28} to @code{(3 5 10 11 12)}.
+
+@item (eval @var{form})
+@var{form} is a Lisp expression evaluated at the moment the field is
+unpacked or packed. The result of the evaluation should be one of the
+above-listed type specifications.
+@end table
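+
+As a minimal sketch of how the type selects the byte order, the
+following unpacks the same two bytes as @code{u16} and as @code{u16r}.
+It assumes the library has been loaded with @code{(require 'bindat)}
+and uses @code{bindat-unpack}, described in the next subsection; the
+field name @code{n} is made up for illustration:
+
+@lisp
+(bindat-unpack '((n u16)) [#x23 #xcd])
+     @result{} ((n . 9165))
+(bindat-unpack '((n u16r)) [#x23 #xcd])
+     @result{} ((n . 52515))
+@end lisp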
+
+A field specification generally has the form @code{([@var{name}]
+@var{handler})}. The square brackets indicate that @var{name} is
+optional. Don't use names that are symbols meaningful as type
+specifications (above) or handler specifications (below), since that
+would be ambiguous. @var{name} can be a symbol or the expression
+@code{(eval @var{form})}, in which case @var{form} should evaluate to
+a symbol.
+
+@var{handler} describes how to unpack or pack the field and can be one
+of the following:
+
+@table @code
+@item @var{type}
+Unpack/pack this field according to the type specification @var{type}.
+
+@item eval @var{form}
+Evaluate @var{form}, a Lisp expression, for side-effect only. If the
+field name is specified, the value is bound to that field name.
+@var{form} can access and update these dynamically bound variables:
+
+@table @code
+@item raw-data
+The data as a byte array.
+
+@item pos
+Current position of the unpacking or packing operation.
+
+@item struct
+The alist representing the unpacked field data.
+
+@item last
+Value of the last field processed.
+@end table
+
+@item fill @var{len}
+Skip @var{len} bytes. In packing, this leaves them unchanged,
+which normally means they remain zero. In unpacking, this means
+they are ignored.
+
+@item align @var{len}
+Skip to the next multiple of @var{len} bytes.
+
+@item struct @var{spec-name}
+Process @var{spec-name} as a sub-specification. This describes a
+structure nested within another structure.
+
+@item union @var{form} (@var{tag} @var{spec})@dots{}
+@c ??? I don't see how one would actually use this.
+@c ??? what kind of expression would be useful for @var{form}?
+Evaluate @var{form}, a Lisp expression, find the first @var{tag}
+that matches it, and process its associated data layout specification
+@var{spec}. Matching can occur in one of three ways:
+
+@itemize
+@item
+If a @var{tag} has the form @code{(eval @var{expr})}, evaluate
+@var{expr} with the variable @code{tag} dynamically bound to the value
+of @var{form}. A non-@code{nil} result indicates a match.
+
+@item
+@var{tag} matches if it is @code{equal} to the value of @var{form}.
+
+@item
+@var{tag} matches unconditionally if it is @code{t}.
+@end itemize
+
+@item repeat @var{count} @var{field-spec}@dots{}
+Process the @var{field-spec}s in order, @var{count} times overall.
+@var{count} may be an integer, or a list of one element naming a
+previous field. For correct operation, each @var{field-spec} must
+include a name.
+@c ??? What does it MEAN?
+@end table
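+
+As a small sketch of @code{repeat} with a count taken from an earlier
+field, the spec below reads a one-byte count followed by that many
+one-byte values. The names @code{counted-spec}, @code{decoded},
+@code{n}, @code{items} and @code{val} are hypothetical, and
+@code{bindat-unpack} and @code{bindat-get-field} are described in the
+next subsection:
+
+@lisp
+(setq counted-spec
+      '((n u8)
+        (items repeat (n)
+               (val u8))))
+
+(setq decoded (bindat-unpack counted-spec [2 10 20]))
+
+(bindat-get-field decoded 'n)
+     @result{} 2
+(bindat-get-field decoded 'items 1 'val)
+     @result{} 20
+@end lisp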
+
+@node Bindat Functions
+@subsection Functions to Unpack and Pack Bytes
+
+ In the following documentation, @var{spec} refers to a data layout
+specification, @code{raw-data} to a byte array, and @var{struct} to an
+alist representing unpacked field data.
+
+@defun bindat-unpack spec raw-data &optional pos
+This function unpacks data from the byte array @code{raw-data}
+according to @var{spec}. Normally this starts unpacking at the
+beginning of the byte array, but if @var{pos} is non-@code{nil}, it
+specifies a zero-based starting position to use instead.
+
+The value is an alist or nested alist in which each element describes
+one unpacked field.
+@end defun
+
+@defun bindat-get-field struct &rest name
+This function selects a field's data from the nested alist
+@var{struct}. Usually @var{struct} was returned by
+@code{bindat-unpack}. If @var{name} corresponds to just one argument,
+that means to extract a top-level field value. Multiple @var{name}
+arguments specify repeated lookup of sub-structures. An integer name
+acts as an array index.
+
+For example, if @var{name} is @code{(a b 2 c)}, that means to find
+field @code{c} in the third element of subfield @code{b} of field
+@code{a}. (This corresponds to @code{struct.a.b[2].c} in C.)
+@end defun
+
+@defun bindat-length spec struct
+@c ??? I don't understand this at all -- rms
+This function returns the length in bytes of @var{struct}, according
+to @var{spec}.
+@end defun
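+
+For instance, assuming a made-up specification that contains only
+fixed-size fields, the length is simply the sum of the field sizes
+(here 1 + 2 + 4 bytes):
+
+@lisp
+(bindat-length '((a u8) (b u16) (c u32))
+               '((a . 1) (b . 2) (c . 3)))
+     @result{} 7
+@end lisp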
+
+@defun bindat-pack spec struct &optional raw-data pos
+This function returns a byte array packed according to @var{spec} from
+the data in the alist @var{struct}. Normally it creates and fills a
+new byte array starting at the beginning. However, if @var{raw-data}
+is non-@code{nil}, it specifies a pre-allocated string or vector to
+pack into. If @var{pos} is non-@code{nil}, it specifies the starting
+offset for packing into @code{raw-data}.
+
+@c ??? Isn't this a bug? Shouldn't it always be unibyte?
+Note: The result is a multibyte string; use @code{string-make-unibyte}
+on it to make it unibyte if necessary.
+@end defun
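+
+The following sketch packs a value into a pre-allocated vector at
+offset 2 and then unpacks it from the same offset. The specification
+and the variable @code{buf} are illustrative only, and the vector is
+returned explicitly rather than relying on the value of
+@code{bindat-pack}:
+
+@lisp
+(let ((buf (make-vector 4 0)))
+  ;; Fill bytes 2 and 3 of BUF; bytes 0 and 1 stay zero.
+  (bindat-pack '((n u16)) '((n . 9165)) buf 2)
+  (list buf (bindat-unpack '((n u16)) buf 2)))
+     @result{} ([0 0 35 205] ((n . 9165)))
+@end lisp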
+
+@defun bindat-ip-to-string ip
+Convert the Internet address vector @var{ip} to a string in the usual
+dotted notation.
+
+@example
+(bindat-ip-to-string [127 0 0 1])
+ @result{} "127.0.0.1"
+@end example
+@end defun
+
+@node Bindat Examples
+@subsection Examples of Byte Unpacking and Packing
+
+ Here is a complete example of byte unpacking and packing:
+
+@lisp
+(defvar fcookie-index-spec
+ '((:version u32)
+ (:count u32)
+ (:longest u32)
+ (:shortest u32)
+ (:flags u32)
+ (:delim u8)
+ (:ignored fill 3)
+ (:offset repeat (:count)
+ (:foo u32)))
+ "Description of a fortune cookie index file's contents.")
+
+(defun fcookie (cookies &optional index)
+ "Display a random fortune cookie from file COOKIES.
+Optional second arg INDEX specifies the associated index
+filename, which is by default constructed by appending
+\".dat\" to COOKIES. Display cookie text in possibly
+new buffer \"*Fortune Cookie: BASENAME*\" where BASENAME
+is COOKIES without the directory part."
+ (interactive "fCookies file: ")
+ (let* ((info (with-temp-buffer
+ (insert-file-contents-literally
+ (or index (concat cookies ".dat")))
+ (bindat-unpack fcookie-index-spec
+ (buffer-string))))
+ (sel (random (bindat-get-field info :count)))
+ (beg (cdar (bindat-get-field info :offset sel)))
+ (end (or (cdar (bindat-get-field info :offset (1+ sel)))
+ (nth 7 (file-attributes cookies)))))
+ (switch-to-buffer (get-buffer-create
+ (format "*Fortune Cookie: %s*"
+ (file-name-nondirectory cookies))))
+ (erase-buffer)
+ (insert-file-contents-literally cookies nil beg (- end 3))))
+
+(defun fcookie-create-index (cookies &optional index delim)
+ "Scan file COOKIES, and write out its index file.
+Optional second arg INDEX specifies the index filename,
+which is by default constructed by appending \".dat\" to
+COOKIES. Optional third arg DELIM specifies the unibyte
+character which, when found on a line of its own in
+COOKIES, indicates the border between entries."
+ (interactive "fCookies file: ")
+ (setq delim (or delim ?%))
+ (let ((delim-line (format "\n%c\n" delim))
+ (count 0)
+ (max 0)
+ min p q len offsets)
+ (unless (= 3 (string-bytes delim-line))
+ (error "Delimiter cannot be represented in one byte"))
+ (with-temp-buffer
+ (insert-file-contents-literally cookies)
+ (while (and (setq p (point))
+ (search-forward delim-line (point-max) t)
+ (setq len (- (point) 3 p)))
+ (setq count (1+ count)
+ max (max max len)
+ min (min (or min max) len)
+ offsets (cons (1- p) offsets))))
+ (with-temp-buffer
+ (set-buffer-multibyte nil)
+ (insert (string-make-unibyte
+ (bindat-pack
+ fcookie-index-spec
+ `((:version . 2)
+ (:count . ,count)
+ (:longest . ,max)
+ (:shortest . ,min)
+ (:flags . 0)
+ (:delim . ,delim)
+ (:offset . ,(mapcar (lambda (o)
+ (list (cons :foo o)))
+ (nreverse offsets)))))))
+ (let ((coding-system-for-write 'raw-text-unix))
+ (write-file (or index (concat cookies ".dat")))))))
+@end lisp
+
+Following is an example of defining and unpacking a complex structure.
+Consider the following C structures:
+
+@example
+struct header @{
+ unsigned long dest_ip;
+ unsigned long src_ip;
+ unsigned short dest_port;
+ unsigned short src_port;
+@};
+
+struct data @{
+ unsigned char type;
+ unsigned char opcode;
+ unsigned short length; /* In little endian order */
+ unsigned char id[8]; /* nul-terminated string */
+ unsigned char data[/* (length + 3) & ~3 */];
+@};
+
+struct packet @{
+ struct header header;
+ unsigned char items;
+ unsigned char filler[3];
+ struct data item[/* items */];
+
+@};
+@end example
+
+The corresponding data layout specification:
+
+@lisp
+(setq header-spec
+ '((dest-ip ip)
+ (src-ip ip)
+ (dest-port u16)
+ (src-port u16)))
+
+(setq data-spec
+ '((type u8)
+ (opcode u8)
+ (length u16r) ;; little endian order
+ (id strz 8)
+ (data vec (length))
+ (align 4)))
+
+(setq packet-spec
+ '((header struct header-spec)
+ (items u8)
+ (fill 3)
+ (item repeat (items)
+ (struct data-spec))))
+@end lisp
+
+A binary data representation:
+
+@lisp
+(setq binary-data
+ [ 192 168 1 100 192 168 1 101 01 28 21 32 2 0 0 0
+ 2 3 5 0 ?A ?B ?C ?D ?E ?F 0 0 1 2 3 4 5 0 0 0
+ 1 4 7 0 ?B ?C ?D ?E ?F ?G 0 0 6 7 8 9 10 11 12 0 ])
+@end lisp
+
+The corresponding decoded structure:
+
+@lisp
+(setq decoded-structure (bindat-unpack packet-spec binary-data))
+ @result{}
+((header
+ (dest-ip . [192 168 1 100])
+ (src-ip . [192 168 1 101])
+ (dest-port . 284)
+ (src-port . 5408))
+ (items . 2)
+ (item ((data . [1 2 3 4 5])
+ (id . "ABCDEF")
+ (length . 5)
+ (opcode . 3)
+ (type . 2))
+ ((data . [6 7 8 9 10 11 12])
+ (id . "BCDEFG")
+ (length . 7)
+ (opcode . 4)
+ (type . 1))))
+@end lisp
+
+Fetching data from this structure:
+
+@lisp
+(bindat-get-field decoded-structure 'item 1 'id)
+ @result{} "BCDEFG"
+@end lisp
+
@ignore
arch-tag: ba9da253-e65f-4e7f-b727-08fba0a1df7a
@end ignore