For example, if you have a C library that can calculate the value of Pi, and you've always wanted to have your Erlang application use that library, you have a few options:
Scott Lystig Fritchie wrote a paper that describes EDTK in a fair degree of detail. A copy of the paper can be obtained from the Erlang workshop proceedings of the ACM's PLI 2002 conference; that workshop's proceedings are available in the ACM's Digital Library. Non-ACM members can obtain a copy of the paper from the author's personal Web site: http://www.snookles.com/scott/ or locally via pli2002-slf.pdf. Please honor the ACM's copyright and redistribution restrictions.
EDTK's license is Berkeley-style. See the file ../LICENSE for full license details.
Erlang's extensibility mechanism is called a "port". A port behaves like a regular Erlang process: you can send messages to and receive messages from it. However, the messages sent through a port are redirected to a "driver", which executes the code that extends Erlang's functionality.
No, Erlang does not have a "foreign pointer" data type. Some of the detail in the Erlang Driver Details section of this FAQ can give a sense of Erlang history and why Erlang doesn't have such a data type. Another source of detail is Scott Lystig Fritchie's PLI 2002 paper about the EDTK. See the FAQ question "What is EDTK?" for how to obtain a copy of the paper.
It may be time to create such a data type. I (Scott) don't know. The Erlang community has traditionally been reluctant to add new things to the language. I suspect that any debate about adding a foreign pointer data type to Erlang would quickly expand into a wider discussion of how "linked-in" drivers should evolve and whether there should be an Erlang-visible difference between them and "pipe" drivers. Another debate would be whether it would be desirable, much less feasible, for "pipe" drivers to utilize the new data type. (See the FAQ topic "What is the difference between a "pipe driver" and a "linked-in driver"?" for an overview of what a "linked-in" driver is.)
One questioner pointed out that some sort of foreign data type could be quite helpful when tied together with the garbage collector. If a process had allocated memory or opened a file, the memory would be freed or the file closed when the foreign data type's reference count dropped to zero.
I agree: such a tie to GC would be a nifty thing. However, in absence of a new data type, an Erlang programmer using EDTK would simply have to keep in mind that such allocated resources are tied to the port. If the programmer wants to free those resources, he/she should explicitly free them via driver function calls or implicitly free them by closing the port as soon as it is no longer necessary. If the Erlang process never closes the port and never dies, all value mapped resources will be maintained indefinitely by the driver. However, this is not much different from the case when the Erlang process always holds a reference to a foreign data object and, therefore, the GC system will never collect it and free its associated resource(s). (See the FAQ topic "What is a value map? Why would I want to use one?" for more information on value maps.)
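The "close the port as soon as it is no longer necessary" advice can be sketched with a small wrapper. This is only an illustration using standard port BIFs; the module name and with_port/2 are hypothetical helpers, not part of EDTK, and the command passed in is whatever program or driver the caller chooses.

```erlang
-module(port_cleanup_sketch).
-export([with_port/2]).

%% Open a port, run Fun(Port), and always close the port afterwards,
%% even if Fun raises an exception.
with_port(Command, Fun) ->
    Port = open_port({spawn, Command}, [binary]),
    try
        Fun(Port)
    after
        %% Closing the port is what lets a driver release resources tied to it.
        %% The catch covers the case where the port is already closed.
        catch port_close(Port)
    end.
```

On a Unix-like system, something like port_cleanup_sketch:with_port("cat", fun(P) -> is_port(P) end) exercises the wrapper, with "cat" standing in for a real driver.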
A longer explanation of why SWIG wasn't used can be found in Scott Lystig Fritchie's PLI 2002 paper about the EDTK. See the paper for full details.
The short explanation is that, at the start of the project, I (Scott) wasn't certain if SWIG was capable of dealing with all of the additional complications, restrictions, and sometimes wacky things that Erlang requires of its language extension mechanism and that most other programming languages don't. Upon reflection, SWIG probably is capable of supporting the Erlang-unique things that EDTK does. However, SWIG is undergoing an extensive redesign for its 2.0 release; we can wait for SWIG 2.0 before deciding whether to stop development of EDTK in favor of SWIG.
See the file ../examples/README for details of the example drivers that are included in the EDTK source distribution. As of release version 1.0, the following drivers are included:
Due to EDTK's very young age (release version 1.0), I (Scott) am not aware of any other drivers developed using EDTK.
The official location for downloading the EDTK source distribution is http://www.snookles.com/erlang/edtk/. At the time of this writing, there are no unofficial locations. :-)
EDTK's maintainer is Scott Lystig Fritchie. He can be contacted at nospam@snookles.com. Please place the string "EDTK" in the Subject line: otherwise your message may be ignored, thanks to Scott's email spam filter.
I (Scott) would really like to hear about other people using EDTK. If you have a bug report, a bug fix, or a request for new features, please send them to me.
It depends on your perspective.
From the perspective of an Erlang process, there is no difference: they appear exactly the same. Thanks to the miracle of abstraction, the Erlang "port" hides all of the details of how a driver is implemented. To loosely paraphrase the Erlang book by Armstrong et al., a port intentionally makes non-Erlang things look as much like Erlang as possible.
A "pipe driver" is the original Erlang driver implementation. Refer to the figure in port-driver.png.
The Erlang port takes care of all communication between the Erlang virtual machine process (shown in the top half of the figure) and the driver process (shown in the bottom half of the figure). All driver code runs in the driver process. All data sent between the two operating system processes goes through a pair of operating system pipes.
Each of the four arrows in the diagram is numbered. Here is an explanation of what takes place at each step.
The message sent to the port is {self(), {command, InBytes}}. The I/O list bound to InBytes is literally the list of bytes shown in blue. These bytes are formatted according to a protocol that both the Erlang process and the driver process agree upon. In this case, the 2 value means to execute function #2, and the bytes that follow are the single argument for that function.
In later releases of Erlang, BIFs (Built-In Functions) were added to communicate with a port. This example does the same thing as port-driver.png's step #1:
port_command(Port, InBytes)
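Both ways of sending a command to a port can be tried with a rough sketch like the one below. It assumes a Unix-like system with "cat" on the PATH; "cat" merely stands in for a pipe driver program and echoes back whatever it receives. The byte layout (function number 2 followed by one argument) is a made-up protocol for illustration, as is the module name.

```erlang
-module(port_sketch).
-export([roundtrip/0]).

roundtrip() ->
    %% "cat" runs as a separate operating system process connected by pipes.
    Port = open_port({spawn, "cat"}, [binary, stream]),
    %% Hypothetical protocol: first byte is the function number,
    %% the remaining bytes are that function's single argument.
    InBytes = [2, <<"hello">>],
    %% Original message-passing style.
    Port ! {self(), {command, InBytes}},
    Reply1 = collect(Port, 6, <<>>),
    %% Equivalent BIF style.
    true = port_command(Port, InBytes),
    Reply2 = collect(Port, 6, <<>>),
    port_close(Port),
    {Reply1, Reply2}.

%% A stream port may deliver data in pieces, so accumulate until N bytes arrive.
collect(_Port, N, Acc) when byte_size(Acc) >= N ->
    Acc;
collect(Port, N, Acc) ->
    receive
        {Port, {data, Data}} ->
            collect(Port, N, <<Acc/binary, Data/binary>>)
    after 5000 ->
        exit(timeout)
    end.
```

Replies from the port arrive as ordinary messages of the form {Port, {data, Data}}, which is why the calling process can treat the port much like another Erlang process.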
A "linked-in" driver executes within the VM's process space. As a result, there is much lower latency and overhead: no data copying done through two pipes, and no context switching between operating system processes. However, this speed comes at a price: a single divide-by-zero or memory access bug within the driver can crash the entire Erlang virtual machine.
Linked-in drivers use the exact same API, however. The Erlang-side stub and the driver must still send serialized data in both directions. Internally, the driver code receives a contiguous buffer containing the serialized bytes and returns a single contiguous buffer.
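As a rough illustration of what "serialized data" means on the Erlang side, the sketch below packs a function number and a list of integer arguments into one contiguous binary and unpacks it again. The wire format (one command byte followed by 32-bit big-endian signed integers) is an assumption made up for this example; it is not EDTK's actual format.

```erlang
-module(serialize_sketch).
-export([encode_call/2, decode_call/1]).

%% Pack a command number (0..255) and integer arguments into one binary.
encode_call(Cmd, Args) when is_integer(Cmd), Cmd >= 0, Cmd < 256 ->
    list_to_binary([<<Cmd:8>> | [<<A:32/signed-big>> || A <- Args]]).

%% Reverse of encode_call/2: split off the command byte, then read
%% the rest of the buffer four bytes at a time.
decode_call(<<Cmd:8, Rest/binary>>) ->
    {Cmd, [A || <<A:32/signed-big>> <= Rest]}.
```

A driver written in C would perform the matching decode on the buffer it receives and the matching encode on the buffer it returns.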
More recent Erlang releases have given more flexibility to linked-in drivers.
There isn't a "synchronous" driver or "asynchronous" driver per se. Instead, these refer to where a linked-in driver executes the main part of the driver's code.
Before Erlang R7, the BEAM virtual machine utilized only a single thread of execution. Linked-in drivers had no choice: they always executed within the same thread of execution as everything else. Unfortunately, operations performed by a linked-in driver that may block for long periods of time, such as file system access, would block the entire virtual machine's execution.
The Erlang R7 release introduced a separate thread pool for use by linked-in drivers. The virtual machine executes exclusively within the operating system process's main thread, but linked-in drivers may schedule execution of potentially-blocking operations by a thread in the async worker thread pool.
Since the R7 release, the efile driver uses the async worker pool to avoid blocking the VM when making system calls such as open(2) and unlink(2).
EDTK supports both pipe and linked-in drivers.
EDTK creates a single shared library for use by both types. If the Erlang port is opened using the pipe style, the program executed is pipe-main, which loads the shared library, takes care of all of the pipe I/O communication with the Erlang VM process, and calls the appropriate functions within the shared library.
For linked-in driver usage, the EDTK XML specification file can be annotated to describe whether the default function execution behavior should be in the main VM thread or in an async worker thread. Each function in the specification may override the driver's default execution behavior.
By default, EDTK will generate a start/0 function and a start_pipe/0 function. The former is for starting the driver linked-in style, and the latter is for starting the driver as an external process, pipe-style. In both cases, the Erlang code path must include the directory containing the driver's shared library file. In the pipe driver case, the Erlang code path must include the directory where the "pipe-main" executable is stored. See How does the driver's Erlang stub code locate the shared library? for details on manipulating the Erlang code path.
"Linked-in" is a somewhat misleading term, in my (Scott's) opinion. It refers to the fact that the driver's execution takes place within the VM's operating system process context and that the driver uses the linked-in API.
However, a "linked-in" driver may be implemented as a shared library or by statically linking the driver's functions into the BEAM executable file.
Additional detail about Erlang drivers and ports can be found in:
Prentice Hall has permitted distribution of the first half of the book (Chapters 1-9) as well as the appendices, in electronic format. This Adobe Acrobat-formatted file can be found at http://www.erlang.org/download/erlang-book-part1.pdf.
Complete documentation for the Open Source version of Erlang can be found at http://www.erlang.org/doc.html.
The code generator used by EDTK is called GSLgen, an Open Source software tool written by iMatix. The GSLgen source distribution can be found at http://www.imatix.com/html/gslgen/index.htm. Note that GSLgen must be compiled with iMatix's SFL library, which can be obtained at http://www.imatix.com/html/sfl/index.htm.
Several of the drivers in the ../examples directory rely on third-party libraries. The file ../examples/README.3rd-party-files describes where to find their respective source code distributions and any special compilation instructions for use with EDTK.
See the file ../README for installation instructions.
There is no formal installation procedure, yet, for EDTK.
Compilation of all of the example drivers can be done with a simple command "make". Once compiled successfully, the command "make regression" will run the driver's regression test.
All of the driver regression tests are written in Erlang and have filenames ending with the suffix "_test.erl". Erlang's pattern matching facility creates a wonderful way to write regression tests. A test need only be written to cover the expected cases: any error or unexpected result will fail to match the expected pattern, causing the test to fail. See the regression test program for the 'simple1' driver for an example.
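The pattern-matching style of testing can be sketched as follows. The module and the safe_div/2 function under test are hypothetical stand-ins for a driver's stub functions; this is not taken from the 'simple1' test program.

```erlang
-module(match_test_sketch).
-export([regression/0]).

regression() ->
    %% Each match asserts the expected result. Any other return value
    %% raises a badmatch error, which fails the test at that line.
    {ok, 3} = safe_div(6, 2),
    {ok, 0} = safe_div(0, 5),
    {error, badarith} = safe_div(1, 0),
    ok.

%% Hypothetical function under test.
safe_div(_, 0) -> {error, badarith};
safe_div(A, B) -> {ok, A div B}.
```

Only the expected cases are written down; there is no need to enumerate the ways a call might go wrong.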
Note that some drivers, such as libnet and libpcap, require superuser privileges in order to run successfully.
After generating a driver successfully with EDTK and compiling it, there are only two files required at runtime. For the sake of this example, assume that the driver we're working with is "gd1".
These files must be placed somewhere in your Erlang VM's code search path. Any additional shared libraries that the driver depends on must also be present where your system's shared library loader can find them. The gd1_drv.so library, for instance, also relies on the libpng, libjpeg, and libXpm shared libraries. You can use the ldd /path/to/gd1_drv.so command to find all other dependent shared libraries. For example:
% ldd ./gd1_drv.so
./gd1_drv.so:
        /user/fritchie/src/e-d/edtk/examples/3rd-party-files/libgd.so (0x28153000)
        libjpeg.so.9 => /usr/local/lib/libjpeg.so.9 (0x2817f000)
        libpng.so.5 => /usr/local/lib/libpng.so.5 (0x2819c000)
        libm.so.2 => /usr/lib/libm.so.2 (0x281c1000)
        libz.so.2 => /usr/lib/libz.so.2 (0x281de000)
If your driver's Erlang stub code cannot locate its shared library file, you'll see an error such as this one:
Erlang (BEAM) emulator version 5.1.1 [source] [threads:0]
Eshell V5.1.1 (abort with ^G)
1> {ok, Port} = simple1_drv:start().
Error: simple1_drv.so not found in code path
=ERROR REPORT==== 12-Oct-2002::17:53:03 ===
Error in process <0.23.0> with exit value: {{badmatch,{error,enoent}},[{simple1_drv,start,0},{erl_eval,expr,3},{erl_eval,exprs,4},{shell,eval_loop,2}]}
** exited: {{badmatch,{error,enoent}},
            [{simple1_drv,start,0},
             {erl_eval,expr,3},
             {erl_eval,exprs,4},
             {shell,eval_loop,2}]} **
Each Erlang module generated by EDTK contains a function load_path/0 to locate the directory which contains that driver's shared library file. load_path/0 will use the same search path used by the Erlang code server to try to find the shared library.
For example, if your driver is called "simple1_drv", simple1_drv:load_path/0 will search the code server path, as returned by code:get_path/0, until it finds a file called "simple1_drv.so". If it cannot find such a file in any directory in the search path, it will return {error, enoent}.
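The search described here can be approximated with a few lines of Erlang. This is a sketch of the idea only, not the code that EDTK generates: the module and function names are hypothetical, and the {ok, Dir} success value is an assumption (the text above only specifies the {error, enoent} failure value).

```erlang
-module(load_path_sketch).
-export([find_in_code_path/1]).

%% Return {ok, Dir} for the first code path directory that contains
%% FileName, or {error, enoent} if no directory does.
find_in_code_path(FileName) ->
    find(code:get_path(), FileName).

find([], _FileName) ->
    {error, enoent};
find([Dir | Rest], FileName) ->
    case filelib:is_regular(filename:join(Dir, FileName)) of
        true  -> {ok, Dir};
        false -> find(Rest, FileName)
    end.
```

Calling load_path_sketch:find_in_code_path("simple1_drv.so") from the Erlang shell is a quick way to check whether the code path is set up the way you expect before starting a driver.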
Perhaps the easiest way to modify the code server's search path is to use the "-pa" and/or "-pz" flags on the "erl" command line. For example:
erl -pz ../priv -pz /path/to/other/dir
... will add the directories "../priv" and "/path/to/other/dir" to the end of the search path.
See the documentation for "code", the Erlang code server, in the Erlang Kernel Reference Manual for details on examining and manipulating the code search path. For Erlang release R8, that can be found at http://www.erlang.org/doc/r8b/lib/kernel-2.7.3/doc/html/index.html: select the "code" link in the left frame.
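The code path can also be extended from inside a running VM with code:add_patha/1 (front of the path, like "-pa") and code:add_pathz/1 (end of the path, like "-pz"). The helper below is a hypothetical convenience wrapper, shown only to illustrate the return values.

```erlang
-module(code_path_sketch).
-export([append_dirs/1]).

%% Append each directory to the end of the code path and report the
%% result per directory: true on success, {error, bad_directory} otherwise.
append_dirs(Dirs) ->
    [{Dir, code:add_pathz(Dir)} || Dir <- Dirs].
```

For instance, code_path_sketch:append_dirs(["../priv"]) has roughly the same effect as starting the VM with "erl -pz ../priv", provided the directory exists.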
Well, I (Scott) am not a build system guru. What follows here isn't a recommendation as ... er, well, it's a step-by-step of what I've done for most of the example drivers in the ../examples/ directory.
None of the example drivers require manual editing of the GSLgen-generated *_drv.c, *_drv.h, *_drv.erl, or *_drv.hrl files. However, when I was creating the Makefile template, I didn't know that EDTK would be able to generate 100% of all driver glue code.
If this file-saving behavior bothers you, comment out or delete the "mv" lines in each of the 4 Makefile targets for those files.
For now, to make a minimum effort, change the lines marked "CHANGEME". Look at other XML config files in the "examples" directory if you are unsure about how to make these changes.
EDTK specification files use XML syntax. A simple example file can be found in ../examples/simple0/simple0.xml. A more complex example can be found in ../examples/berkeley_db/berkeley_db.xml.
So far, the best documentation describing the XML tree structure and XML entity relationships can be found in:
No, I wasn't using XML simply to be "buzzword-compliant" or because so much of the world thinks that XML is the greatest tool since sliced bread or the ball-point pen. Many of those same people also think that XML is a programming language ... which goes to show that they are idiots.
I chose to use XML for a number of reasons. In no particular order, they include:
A value map, also called a "valmap", provides a way to safely hide things like memory pointers and file descriptors from Erlang. The EDTK-generated Erlang stubs are aware of the formatting of the tuple and use that knowledge to make certain that only a properly-typed value map tuple can be sent as an argument to a valmap-protected function. All other Erlang code must treat a valmap value as an opaque object.
An example from the SWIG documentation is helpful here. This Python code demonstrates a SWIG-generated interface for several standard C library function calls:
def filecopy(source,target):
    f1 = fopen(source, "r")
    f2 = fopen(target, "w")
    buffer = malloc(8192)
    nbytes = fread(buffer,8192,1,f1)
    while (nbytes > 0):
        fwrite(buffer,8192,1,f2)
        nbytes = fread(buffer,8192,1,f1)
    free(buffer)
Assume this example were translated directly into Erlang (ignoring the multiple-assignment usage of nbytes and buffer). If the Erlang process were to crash, it may leak up to two file descriptors, the hunk of memory that buffer points to, and perhaps some additional memory associated with the stdio FILE * pointers. If you want a fault-tolerant application to run non-stop for years, it is important to prevent resource leaks such as these.
The mapping of the fake valmap value to its "real" value is done by the driver. Therefore, when an Erlang process closes the port without calling the necessary clean-up functions, or if it exits unexpectedly, the driver knows exactly what clean-up functions must be called in order to avoid leaking important resources.
An Erlang version of the filecopy function, using EDTK value maps for the memory buffer and both stdio file pointers, might look something like this:
-module(filecopy).
-define(DRV, sample_drv).
-define(BUFSIZ, 8192).
-export([copy/2]).

copy(Src, Dst) ->
    {ok, Port} = ?DRV:start(),
    {ok, SrcF} = ?DRV:fopen(Port, Src, "r"),
    {ok, DstF} = ?DRV:fopen(Port, Dst, "w"),
    {ok, Buf} = ?DRV:malloc(Port, ?BUFSIZ),
    RFun = fun () -> ?DRV:fread(Port, Buf, 1, ?BUFSIZ, SrcF) end,
    WFun = fun (N) -> ?DRV:fwrite(Port, Buf, N, DstF) end,
    Val = copy2(RFun, WFun),
    %% Shutdown will automatically close files and free the buffer.
    ?DRV:shutdown(Port),
    Val.

copy2(RFun, WFun) ->
    copy2(RFun, WFun, RFun()).

copy2(RFun, WFun, {ok, N}) ->
    WFun(N),
    copy2(RFun, WFun);
copy2(RFun, WFun, {error, 0}) ->
    ok;  % End of file
copy2(RFun, WFun, Error) ->
    Error.
While this implementation more-or-less directly mimics what the Python program does, it does not violate Erlang's single-assignment semantics. As Erlang sees it, the value of "Buf" never changes. The memory buffer that is associated with it, via the value map mechanism, does indeed change ... but that buffer is not accessible to Erlang and, therefore, does not violate single-assignment semantics.
Furthermore, the Erlang program can crash at any place in the execution of filecopy:copy/2, and the driver will automatically free any value mapped resources that the process had managed to allocate before its death.
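The bookkeeping behind a value map can be modelled in pure Erlang to make the idea concrete. In the toy model below, an Erlang process plays the role of the driver: it hands out opaque {valmap, Index} handles, keeps the real resources to itself, and runs every outstanding clean-up function when the owning process dies. Everything here (module name, handle format, API) is an assumption for illustration; EDTK's real value maps live in the C driver and are tied to the port.

```erlang
-module(valmap_sketch).
-export([start/0, alloc/3, release/2]).

%% Start a stand-in "driver" process tied to the calling (owner) process.
start() ->
    Owner = self(),
    spawn(fun() ->
                  Ref = erlang:monitor(process, Owner),
                  loop(Ref, 0, [])
          end).

%% Store Resource inside the driver and return an opaque handle.
%% Cleanup is a fun of one argument that releases Resource.
alloc(Drv, Resource, Cleanup) ->
    call(Drv, {alloc, Resource, Cleanup}).

%% Explicitly release the resource behind a handle.
release(Drv, Handle) ->
    call(Drv, {release, Handle}).

call(Drv, Req) ->
    Tag = make_ref(),
    Drv ! {self(), Tag, Req},
    receive {Tag, Reply} -> Reply end.

loop(Ref, Next, Table) ->
    receive
        {From, Tag, {alloc, Resource, Cleanup}} ->
            From ! {Tag, {ok, {valmap, Next}}},
            loop(Ref, Next + 1, [{Next, Resource, Cleanup} | Table]);
        {From, Tag, {release, {valmap, Index}}} ->
            case lists:keytake(Index, 1, Table) of
                {value, {_, Resource, Cleanup}, Rest} ->
                    Cleanup(Resource),
                    From ! {Tag, ok},
                    loop(Ref, Next, Rest);
                false ->
                    From ! {Tag, {error, badarg}},
                    loop(Ref, Next, Table)
            end;
        {'DOWN', Ref, process, _Pid, _Reason} ->
            %% Owner died: release everything it still holds, then stop.
            [Cleanup(Resource) || {_, Resource, Cleanup} <- Table],
            ok
    end.
```

The owner only ever sees the small integer inside the handle, never the resource itself, which is the property that keeps pointers and file descriptors hidden from Erlang code.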
There are three ways to control which Pthread will execute your library's extension function. If it's executed in the same Pthread as the Erlang Virtual Machine, it's called "synchronous" execution. If it's executed in a different Pthread (using a Pthread from the worker thread pool maintained by the VM), it's "asynchronous".
<hack place="post-deserialize" type="verbatim"> do_async_call = 0; /* Set to 1 for async behavior */ </hack>
Synchronous versus asynchronous execution has no meaning for pipe-style drivers: by definition, the execution of your extension function takes place in an external operating system process.
That's a good question. It depends on the error, doesn't it? :-)
If "make" fails, it's probably because you have something bogus in your XML specification file. GSLgen does not provide very helpful diagnostic messages. For example, if I have a typo in one of my
<func>
  <arg name="sx" ctype="int"/>
  <arg name="sy" ctype="int"/>
  <return ctype="gdImagePtr" name="ret_gdImagePtr"
          valmap_name="imageptr" valmap_type="start"
          expect="!= NULL" expect_errval="0"/>
</func>
... you will see an error like:
env EDTK_DIR=../../edtk gslgen -script:../../edtk/c_h_template.gsl gd.xml > gd_drv.h
gslgen - Generalised Schema Language Generator V2.000 Beta 1
Copyright (c) 1996-2000 iMatix - http://www.imatix.com
gslgen I: Processing gd.xml...
make: *** [gd_drv.h] Error 255
You don't know very much, other than the generation of "gd_drv.h" died. If you look at "gd_drv.h", you'll see a single line:
gslgen W: (../../edtk/htab-macros.gsl 22) Undefined identifier: func.name
That's more helpful, but only slightly. The GSLgen template file that complained is called "../../edtk/htab-macros.gsl"; the error occurred at line 22. The reason for the error is "Undefined identifier: func.name".
"Undefined identifier" usually means that the XML specification has omitted (or misspelled) a mandatory attribute name. The "name" attribute of <func> objects is mandatory.
Your best bet is to try to find the <func> that's missing a "name".
There are times where GSLgen has managed to generate part of a file before the error occurs. I don't have any specific advice for you, other than to look at the error message and at the line of the GSLgen schema file where the error happened and try to puzzle it out.
GSLgen's syntax is pretty simple. See the "gslgen20" directory for an easy-to-access copy of the GSLgen 2.0 documentation. (NOTE: The GSLgen documentation is distributed with EDTK for sake of convenience. See the file README-gslgen20-dir for a bit more explanation.)
Scenario #2: C compiler says: too many arguments to function 'blarf'
Your XML specification file has more arguments than the function blarf() really has. Double-check that you have the correct number of <arg> objects for the function. If you really do wish to omit an argument from the Erlang side or the C side, make certain you've used the attribute 'noerlcall="1"' or 'noccall="1"', respectively.
Scenario #3: C compiler says: passing arg 2 of 'flurbl' makes integer from pointer without a cast
Most likely you've got the wrong "ctype" attribute on a <arg> or <return> object. Examine your driver's DriverName_drv.h file to check the data type your "callstate_t" structure is using for the values going into and out of your extension function -- there is a mismatch there, somewhere. Fortunately, they're pretty easy to find and fix in your XML spec.
One more subtle problem that can cause this compiler error is the definition of two different functions that have an argument with the same name but with different C types. EDTK will silently use the first one in the definition of the "callstate_t" structure and ignore subsequent attempts to define it with another type. Solution: Change the "name" attribute of one of the <arg> objects.
Scenario #4: my regression test fails, but I cannot determine where
There are a few things that you can do here:
_outputv: my threadid = 8122000, cmd = 8
The "threadid" number gives you the ID number of the thread that the Erlang VM is using. Subsequent messages from the "invoke_*" functions will show a different ID when executing asynchronously.
The "cmd" value can be cross-referenced with the "Driver<->emulator communication codes" found near the top of "DriverName_drv.h" and "DriverName_drv.hrl". That will give you some idea what the last driver function call was before the test failed and closed the port.
Is there a tutorial document of some kind?
Yes, there is. See the file TUTORIAL for a tutorial that I (Scott) wrote while using EDTK to develop the driver for GD, a library for dynamic creation of PNG and JPEG images.
Is there a comprehensive reference of all the XML objects and attributes that EDTK uses?
I'll assume that you've already looked over all of the existing documentation in the top-level README file, here in the doc directory, and in the "examples" directory. Well, you've found it all. Although it would be wonderful to have a catalog of all of the EDTK XML objects, their attributes, and how they interact with each other, there isn't such a document yet. Sorry!