Recommendation: Basic I/O class types
Version: 1.0
This text contains a recommendation about how to use certain class
types to make libraries for the language Objective Caml more
interoperable. The recommendation is the result of a discussion
between Nicolas Cannasse (extlib), Gerd Stolpmann (ocamlnet), and
Yamagata Yoriyuki (camomile). The libraries mentioned have been or
will be changed to support the recommendation.
The nature of the class system allows us to use objects of a class X
as instances of another class Y when the type of X is a subtype of
Y. It is not required that X and Y have an inheritance relation.
Moreover, X and Y can even be defined in different, separately
compiled libraries that do not know of each other. The latter
observation is the basis for this recommendation, as it becomes
possible to define interoperable class types by
convention. In particular, there is no need for a single formal
definition in the Objective Caml system to which the participating
libraries refer. The formal definition is absent and is informally
replaced by this convention.
From an abstract point of view, we are going to establish a minimum
class type T that contains the I/O methods the authors regard as
relevant for this recommendation. A library will usually have
additional methods in a class type TL. Formally, this is
reflected by the requirement that TL is a subtype of T. The
language then allows instances of TL to be coerced to T.
Assuming we have two libraries L and L' defining I/O class types
TL and TL', respectively, it is now possible to
use instances x of type TL in the context of L' by
coercing them to T.
For example, this allows us to XXX (fill in: use an ocamlnet channel
in the context of camomile - when the libraries are changed).
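The coercion described above can be sketched as follows. This is only an illustration: the class types t and tl, the method name and the function close_quietly are hypothetical stand-ins for T, TL and a function of library L', and do not come from any of the libraries mentioned.

```ocaml
(* Minimal class type T agreed on by convention (hypothetical name). *)
class type t = object
  method close_in : unit -> unit
end

(* Richer class type TL of library L. It is a subtype of [t] although
   no inheritance relation is declared between the two. *)
class type tl = object
  method close_in : unit -> unit
  method name : string
end

(* Stand-in for a function of library L' that only knows about [t]. *)
let close_quietly (ch : t) = ch#close_in ()

(* A value of type [tl], built as an immediate object. *)
let make_tl () : tl =
  object
    val mutable closed = false
    method close_in () = closed <- true
    method name = if closed then "closed" else "open"
  end

let x = make_tl ()

(* Explicit coercion from [tl] to [t]; the additional method [name]
   is hidden from [close_quietly]. *)
let () = close_quietly (x :> t)
```

The coercion has to be written explicitly, because the language does not apply subtyping implicitly.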
I/O of octets
Octet streams are the first kind of I/O classes defined by this
recommendation. They play a special role as all I/O must ultimately be
transformed into operations on octets, because current operating
systems require this.
Formal type definition. The following class types are
recommended. The names of the class types are not normative, but the
names of the methods and their types are.
class type rec_octet_in_channel =
  object
    method input : string -> int -> int -> int
    method close_in : unit -> unit
  end
class type rec_octet_out_channel =
  object
    method output : string -> int -> int -> int
    method flush : unit -> unit
    method close_out : unit -> unit
  end
Meaning. Both input and output channels have two states:
open and closed. It is outside the scope of this
recommendation how a channel is opened. When it is open, however, the
defined methods can be called, and have a defined meaning. When a
channel is closed, invoking the methods is usually regarded as an
error. What happens in this case is not defined, but raising Failure
or a library-specific exception is suggested. The
implementation may even opt to silently ignore such calls.
There are two kinds of semantics: blocking I/O and non-blocking
I/O. An implementation must define which semantics it selects, or
the conditions under which a certain semantic model is effective.
Note that the following definition differs from the Unix API:
the "end of file" condition is indicated by an exception, and
the "operation would block" condition is expressed by the return
value 0.
- method input : string -> int -> int -> int:
- Data is read from the channel and put into the string. The first
int argument is the position in the string where to store
the octets, and the second int argument is the maximum
number of octets to read. The method returns the actual number of
octets read from the channel and put into the string. The
implementation is always free to read fewer octets than requested.
(This is even allowed when reading files from a local disk!) When
the end of the stream is reached, and there are no more octets that
could be read, the method raises the exception End_of_file.
An implementation for blocking I/O must read at least one octet,
or raise End_of_file. An implementation for non-blocking
I/O can return the value 0 to indicate that there are currently
no octets to read.
As a special case, when zero octets are requested from a blocking
implementation, the method must return the number 0.
- method close_in : unit -> unit:
- The input channel is closed.
- method output : string -> int -> int -> int:
- Data is taken from the string and written to the channel. The
first int argument is the position in the string where
the octets are passed, and the second int argument is
the number of octets to write. The method returns the actual
number of octets written to the channel. The implementation is
always free to write fewer octets than requested.
(This is even allowed when writing to files on a local disk!)
An implementation for blocking I/O must write at least one octet.
An implementation for non-blocking I/O can return the value 0 to
indicate that currently no octets can be processed.
As a special case, when zero octets are requested from a blocking
implementation, the method must return the number 0.
- method flush : unit -> unit:
- The implementation may choose to have output write
into a buffer rather than directly to the underlying resource.
In this case, a call to flush writes the contents of the
buffer to the resource. When there is no such buffer, the call does
nothing.
- method close_out : unit -> unit:
- The buffer, if any, is flushed, and the output channel is
closed.
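The method descriptions above can be illustrated with a small in-memory sketch. It rests on several assumptions: the input buffer is typed as bytes instead of string, because current OCaml compilers treat strings as immutable; the classes string_in and buffer_out and the function copy are hypothetical names; and both channels use blocking semantics. It is not taken from extlib, ocamlnet, or camomile.

```ocaml
(* ASSUMPTION: the recommendation types the input buffer as [string];
   [bytes] is used here because strings are immutable in current OCaml. *)
class type rec_octet_in_channel = object
  method input : bytes -> int -> int -> int
  method close_in : unit -> unit
end

class type rec_octet_out_channel = object
  method output : string -> int -> int -> int
  method flush : unit -> unit
  method close_out : unit -> unit
end

(* Hypothetical blocking input channel reading from a string. *)
class string_in (src : string) = object
  val mutable pos = 0
  val mutable closed = false
  method input (buf : bytes) (off : int) (len : int) : int =
    if closed then failwith "string_in: channel is closed";
    if len = 0 then 0                  (* special case: zero requested *)
    else if pos >= String.length src then raise End_of_file
    else begin
      (* Deliver at most 4 octets per call: short reads are allowed. *)
      let n = min (min len 4) (String.length src - pos) in
      Bytes.blit_string src pos buf off n;
      pos <- pos + n;
      n
    end
  method close_in () = closed <- true
end

(* Hypothetical blocking output channel appending to a Buffer.t. *)
class buffer_out (dst : Buffer.t) = object
  val mutable closed = false
  method output (s : string) (off : int) (len : int) : int =
    if closed then failwith "buffer_out: channel is closed";
    Buffer.add_substring dst s off len;
    len
  method flush () = ()                 (* no intermediate buffer *)
  method close_out () = closed <- true
end

(* Copy all octets from [ic] to [oc], coping with short reads and
   short writes. Assumes blocking channels: a return value of 0 from
   a non-blocking channel would make these loops spin. *)
let copy (ic : rec_octet_in_channel) (oc : rec_octet_out_channel) =
  let buf = Bytes.create 16 in
  let rec write_all s off len =
    if len > 0 then begin
      let n = oc#output s off len in
      write_all s (off + n) (len - n)
    end
  in
  try
    while true do
      let n = ic#input buf 0 (Bytes.length buf) in
      write_all (Bytes.sub_string buf 0 n) 0 n
    done
  with End_of_file -> oc#flush ()
```

A caller must always be prepared for fewer octets than requested, which is why copy loops on both the read and the write side.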
Rationale. The authors examined a number of alternatives, and
came to the conclusion that this definition is the best for
general-purpose I/O channels. The reasons:
- The method names for the "close" operation are different for the
input and the output channels. The intention is to allow a class
to implement both types at the same time in order to model
bidirectional channels. For these it is a common requirement that
both directions can be closed independently of each other.
- The method types for input and output follow the
"classic buffer design", i.e. one passes a substring as buffer for the
I/O operation. It was discussed whether simplified types can be used
instead where whole strings are passed to or from the
channel. However, it was feared that the memory management of the
runtime system would be put under too much stress in this case. The
GC works extremely well when cleaning up small blocks, but it is still
expensive to collect large blocks, which are typical of I/O.
- The choice of indicating "end of file" by an exception and the
"operation would block" condition by the return value 0 may seem
surprising as it deviates from the Unix tradition. This tension,
however, runs through the whole standard library of O'Caml, as the
lower-level interfaces follow the Unix API, and the higher-level
interfaces prefer End_of_file. So there must be a certain
layer where the low-level Unix conventions are given up, and the
higher-level conventions are introduced. The authors see this
recommendation as an instance of such a layer, and thus prefer the
higher level of expression. Furthermore, we chose the return value 0
instead of Sys_blocked_io, as the value 0 naturally means
that no data are available.
- It is left open how to handle I/O errors. Implementations may
choose to retry I/O, or to throw exceptions, or to handle these
conditions in very specific ways. The reason for this omission is
simply that a definition of the possible I/O errors cannot be done on
the abstract level, as one would have to dig into the error conditions
of possible I/O resources. It is, however, the intention of this
recommendation to avoid any statement about the underlying
implementation beyond the abstract model.
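The first point of the rationale, that one class may implement both octet class types and close each direction independently, can be sketched with an in-memory loopback. The class loopback is hypothetical, the input buffer is again assumed to be bytes rather than string (strings are immutable in current OCaml), and the input side is given non-blocking semantics so that the return value 0 can be shown.

```ocaml
(* ASSUMPTION: [bytes] replaces [string] as the input buffer type. *)
class type rec_octet_in_channel = object
  method input : bytes -> int -> int -> int
  method close_in : unit -> unit
end

class type rec_octet_out_channel = object
  method output : string -> int -> int -> int
  method flush : unit -> unit
  method close_out : unit -> unit
end

(* Hypothetical loopback: octets written with [output] can be read
   back with [input]. Input is non-blocking: it returns 0 while no
   octets are pending and the output side is still open. *)
class loopback = object
  val pending = Buffer.create 64
  val mutable rpos = 0
  val mutable in_open = true
  val mutable out_open = true
  method input (buf : bytes) (off : int) (len : int) : int =
    if not in_open then failwith "loopback: input side is closed";
    let avail = Buffer.length pending - rpos in
    if avail = 0 && not out_open then raise End_of_file
    else begin
      let n = min len avail in
      Buffer.blit pending rpos buf off n;
      rpos <- rpos + n;
      n
    end
  method close_in () = in_open <- false
  method output (s : string) (off : int) (len : int) : int =
    if not out_open then failwith "loopback: output side is closed";
    Buffer.add_substring pending s off len;
    len
  method flush () = ()
  method close_out () = out_open <- false
end

(* The same object seen through each of the two class types. *)
let ch = new loopback
let reader = (ch :> rec_octet_in_channel)
let writer = (ch :> rec_octet_out_channel)
```

Because the two close methods have different names, closing the output side with close_out leaves the input side usable until the pending octets are drained.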
Polymorphic I/O
Another case is that the I/O class is polymorphic in the elementary
(character-level) type subject to the I/O operations. This case is
treated separately to allow I/O of Unicode characters. We
intentionally define this case on the level of characters and not on
the level of the implied monoid to reduce complexity.
Formal type definition. The following class types are
recommended. The names of the class types are not normative, but the
names of the methods and their types are.
class type ['t] rec_poly_in_channel =
  object
    method get : unit -> 't
    method close_in : unit -> unit
  end
class type ['t] rec_poly_out_channel =
  object
    method put : 't -> unit
    method flush : unit -> unit
    method close_out : unit -> unit
  end
Meaning. As in the octet case, the channels have the two
states open and closed. Again, we do not define how to
open a channel, and we leave it up to the implementation how to handle
the case when methods of a closed channel are called.
Polymorphic channels always perform blocking I/O.
- method get : unit -> 't:
- A character is read from the channel and returned. When there are
no more characters, and the channel is at its end, the exception
End_of_file is raised.
- method close_in : unit -> unit:
- The input channel is closed.
- method put : 't -> unit:
- The character passed as argument is written to the channel.
- method flush : unit -> unit:
- The implementation may choose to have put write
into a buffer rather than directly to the underlying resource.
In this case, a call to flush writes the contents of the
buffer to the resource. When there is no such buffer, the call does
nothing.
- method close_out : unit -> unit:
- The buffer, if any, is flushed, and the output channel is
closed.
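The polymorphic class types can be illustrated with a minimal sketch. The classes list_in and queue_out and the function copy_poly are hypothetical names chosen for this example, and the use of plain integers as Unicode code points is likewise an assumption; a real library would presumably supply its own character type.

```ocaml
class type ['t] rec_poly_in_channel = object
  method get : unit -> 't
  method close_in : unit -> unit
end

class type ['t] rec_poly_out_channel = object
  method put : 't -> unit
  method flush : unit -> unit
  method close_out : unit -> unit
end

(* Hypothetical input channel delivering the elements of a list. *)
class ['t] list_in (items : 't list) = object
  val mutable rest = items
  method get () : 't =
    match rest with
    | [] -> raise End_of_file
    | x :: tl -> rest <- tl; x
  method close_in () = rest <- []
end

(* Hypothetical output channel collecting elements in a queue. *)
class ['t] queue_out (q : 't Queue.t) = object
  method put (x : 't) = Queue.add x q
  method flush () = ()
  method close_out () = ()
end

(* Copy characters until End_of_file is raised; the function is
   independent of the character type. *)
let copy_poly (ic : 'a rec_poly_in_channel) (oc : 'a rec_poly_out_channel) =
  try
    while true do
      oc#put (ic#get ())
    done
  with End_of_file -> oc#flush ()

(* Usage with code points represented as integers (an assumption). *)
let collected : int Queue.t = Queue.create ()
let () = copy_poly (new list_in [0x41; 0x3b1; 0x1F600]) (new queue_out collected)
```

Since polymorphic channels always block, the copy loop needs no special handling for a "would block" condition.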
Rationale. The authors examined a number of alternatives, and
came to the conclusion that this definition is the best for
general-purpose I/O channels. The reasons:
- The method names get and put are intentionally
different from the monomorphic, octet-only case. First, these methods
have different types, and this is a good justification to introduce
new names. Second, these methods work on the level of single characters,
and not on the level of the monoid. Third, the different names allow
an implementation to support all class types at the same time.
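The third point can be sketched with a single class that satisfies both the octet input type and the polymorphic input type instantiated at char. The class string_source is hypothetical, and the octet buffer is assumed to be bytes rather than string because strings are immutable in current OCaml.

```ocaml
(* ASSUMPTION: [bytes] replaces [string] as the input buffer type. *)
class type rec_octet_in_channel = object
  method input : bytes -> int -> int -> int
  method close_in : unit -> unit
end

class type ['t] rec_poly_in_channel = object
  method get : unit -> 't
  method close_in : unit -> unit
end

(* Hypothetical blocking source offering both [input] and [get]. *)
class string_source (src : string) = object (self)
  val mutable pos = 0
  method input (buf : bytes) (off : int) (len : int) : int =
    if len = 0 then 0
    else if pos >= String.length src then raise End_of_file
    else begin
      let n = min len (String.length src - pos) in
      Bytes.blit_string src pos buf off n;
      pos <- pos + n;
      n
    end
  (* [get] is built on [input]: a blocking [input] reads at least one
     octet or raises End_of_file, so the result can be ignored. *)
  method get () : char =
    let one = Bytes.create 1 in
    ignore (self#input one 0 1);
    Bytes.get one 0
  (* This sketch models closing by discarding the remaining data. *)
  method close_in () = pos <- String.length src
end

(* One object, viewed through both class types. *)
let source = new string_source "abc"
let as_octets = (source :> rec_octet_in_channel)
let as_chars = (source :> char rec_poly_in_channel)
```

Both views share the same read position, so octets consumed through one view are no longer available through the other.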
Gerd Stolpmann
Last modified: Fri May 28 09:27:52 CEST 2004