using Programming;

A Blog about some of the intrinsics related to programming and how one can get the best out of various languages.

Constructing an IMAP Server from Scratch - Part 1

Implementing Server-Side RFC 3501 (IMAP) in F# (Part 1)

So it's been a while, I've had little free time over the last couple months (December through February are my extrememly busy period), but I want to start a new project today, inspired by a complete failure of my existing mail provider to work properly.

Start with RTFM (or RFC's, in this case)

Now IMAP, as a protocol, is extraordinarily complicated. It is primarily detailed by RFC 3501, but many other RFC's have come in over the years to update it. As a result, we have 23 RFC's that relate to IMAP directly, and several more indirectly. I'm going to list the most important ones (in numeric order) first, then the extraneious ones.

Primarily, we have 18 RFC's to work with: 1730, 2060, 2061, 2088, 2342, 3501, 3502, 3516, 4466, 4469, 4551, 5032, 5182, 5738, 6237, 6855, 7162, and 7377.

We also have 5 additional RFC's for MIME that are extremely important: 2045. 2046, 2047, 2048, and 2049.

Additionally, there are 7 other RFC's that are directly relevant, though we can ignore for now: 2595, 5068, 5550, 6186, 6858, 7817, and 8314.

Finally, we have 6 other RFC's we'll need to think about: 2822, 5322, 5335, 5336, 6531, and 6854.


I don't expect you to read all of these, but you should know where to find them. We'll actually deal with most of the 36 RFC's detailed here tangentially, and I won't go into much detail on what parts of which RFC's we've implemented. (Though I haven't decided on that, yet.) The RFC rabbit-hole on IMAP is deep, and as a result we will need to be aware of issues we might run into that aren't detailed here. (We may discover new RFC's pertaining to what we're doing, for example.)

What is IMAP?

So let's back our complicated story up a little bit. First, let's talk about what is and isn't IMAP.

IMAP is the Internet Message Access Protocol, a protocol designed to allow a client (such as Outlook, Thunderbird, etc.) to retreive and update Internet Messages (Email's) associated with an account. This means you specify the account and credentials, the client uses those to perform IMAP interactions, and then displays those results to you.

This protocol does not specify how or where those messages must be stored (server or client side), nor how one can send messages to another Email box (which is typically SMTP / relays), it only specifies how a client system should access, modify, and remove Emails that it owns.

The short and simple: this means we can build a generic IMAP access protocol, which can then be used to interact with another service (or .NET library) for storage of the physical message.

Getting Started: Understanding Requirements

We'll begin our adventure down this IMAP server path by reading some of the RFC information. Fortunately, we won't read all of it, but we do have quite a bit of information to digest.

One of the more basic requirements we'll want to facilitate with our design is the transport-layer protocol and port, which happens to be TCP Port 143. This is found in RFC 3501 §2.1.

Another interesting note is the definition of a command / line, in RFC 3501 §2.2. By definition, a line ends in CRLF, and data is either a line, or a known length of octets (8-bit bytes) followed by a line.

Further in this same section, we are told that transferring data from Server to Client can happen on request, or be initiated by the server. We also learn that a client should generate an alpha-numeric "tag" which indicates which command the server is responding to. (There is no fixed syntax for a tag, just that it may only be alpha-numeric.)

Further reading tells us that there are a couple other items and states we'll need to keep. Particularly:

  • For each mailbox we must have a UIDVALIDITY value that is unique to that mailbox;
  • For each message, we must have a UID value that is unique to that message;

The combination of the mailbox name, UIDVALIDITY, and UID must describe a single message in the mail server. Each of these values is a 32-bit integer.


While there are many, many more requirements, I have set a goal to write code in every blog-post in this new series. As a result, I want to dive into writing, at the very least, a single-threading server for clients to connect to.

There are many examples out there of doing OpenSendReceiveClose, but that's not what we want. With IMAP, we need to do an OpenLoop (SendReceive) → Close. This obviously complicates things, especially once we get into mutliple concurrent connections.

So, to start this, we're going to build a demo client/server to get our infrastructure together. We're going to build one that allows us to send messages to the server, and pipe other messages back to the client. (Similar to a chat, but without message sharing.) This will allow us to build and demonstrate all the pieces of the IMAP infrastructure. (Many concurrent, connected clients, etc.) We're going to use F# Async-Computations to do so, which are actually really fun.

Let's start building a server!

Raw TCP/IP is a very difficult task to set out to implement without an understanding of how TCP/IP works. I'm not going to go into any of that detail, as I want to get right into writing some code, but the basics are that a TCP/IP connection is a "Socket", the socket is opened, and remains open until closed by the client or server. TCP/IP sockets are also bidirectional, meaning that at any given point the client and the server both reserve the right to unilaterally send data or close the connection.

We'll start with the TCP Server. This is an easy enough build, and can be abstracted to a pretty simple API. But before we do that, I want to make a few extensions to .NET.

Boilerplate

One of the pain-points (for me) with .NET is that there is no way to read an entire response from a raw TCP socket. For some reason (probably a good one), the designers left that part out. As a result, I always write some modifications to System.Net.Sockets.NetworkStream and System.Net.Sockets.Socket to make them more friendly:

module EBrown.Tcp.NetworkStream
open System.Net.Sockets

/// Add definitions to the `System.Net.Sockets.NetworkStream`
type NetworkStream with
    member stream.AsyncReceive (buffer : byte array, ?offset, ?count) =
        Async.FromBeginEnd(
            buffer, 
            defaultArg offset 0, 
            defaultArg count buffer.Length, 
            stream.BeginRead, 
            stream.EndRead)
    member stream.AsyncReceiveAll () =
        let rec receive buffer =
            async {
                let tempBuffer = 1024 |> Array.zeroCreate
                let! bytesReceived = tempBuffer |> stream.AsyncReceive
                let buffer = [|buffer; tempBuffer.[0..bytesReceived - 1]|] |> Array.concat
                if bytesReceived < tempBuffer.Length then return buffer else return! buffer |> receive }
        [||] |> receive
    member stream.ReadAll () =
        let rec receive buffer =
            let tempBuffer = 1024 |> Array.zeroCreate
            let bytes = (tempBuffer, 0, tempBuffer.Length) |> stream.Read
            let buffer = [|buffer; tempBuffer.[0..bytes - 1]|] |> Array.concat
            if bytes < tempBuffer.Length then buffer else buffer |> receive
        [||] |> receive
    member stream.Write (buffer : byte array) = stream.Write(buffer, 0, buffer.Length)

The first thing you see is AsyncRead, which performs an asynchronous call to BeginRead and EndRead, to do the entire thing in one go. This seems to be the most preferred way of doing .NET async operations at this level. The next line starts some interesting syntax: we define an AsyncReadAll function which will do an AsyncRead call until it has no more data. You might not have seen async { ... } before, or let! or return!, but they're F# Asynchronous Computation Expressions. I won't go into detail (because I don't really understand how they work, just how to use them) but we'll use them all over the place here.

The basics you need to know are that in an async block, you can make calls to other async code, and bind that. The let! (let bang) keyword will call to an async function, and bind the result, ONCE the result is ready. If the result isn't ready, then this function execution is paused and the thread is used elsewhere. The return! (return bang) will return the result of the async expression, after it's ready.

This is important to know because the following two lines are not the same:

let bytesReceived = tempBuffer |> stream.AsyncReceive
let! bytesReceived = tempBuffer |> stream.AsyncReceive

The first line binds an Async<int> to bytesReceived, whereas the second binds an int to bytesReceived. The first value is just a function call, that has not yet been started. The second calls the function and then yields the thread until the result is ready.

You then see that the NetworkStream has a ReadAll, which does a Read until no more data, and a Write, which does a stream.Write with a buffer.

Next we want to extend Socket to make life easier:

module EBrown.Tcp.Socket
open System.Net.Sockets

/// Add definitions to the `System.Net.Sockets.Socket`
type Socket with
    member socket.AsyncAccept () = Async.FromBeginEnd(socket.BeginAccept, socket.EndAccept)
    member socket.AsyncReceive (buffer : byte array, ?offset, ?count) =
        Async.FromBeginEnd(
            buffer, 
            defaultArg offset 0, 
            defaultArg count buffer.Length, 
            (fun (buffer, offset, size, callback, state) -> socket.BeginReceive(buffer, offset, size, SocketFlags.None, callback, state)), 
            socket.EndReceive)
    member socket.AsyncSend (buffer : byte array, ?offset, ?count) =
        Async.FromBeginEnd(
            buffer,
            defaultArg offset 0,
            defaultArg count buffer.Length,
            (fun (buffer, offset, size, callback, state) -> socket.BeginSend(buffer, offset, size, SocketFlags.None, callback, state)),
            socket.EndSend)
    member socket.AsyncReceiveAll () =
        let rec receive buffer =
            async {
                let tempBuffer = 1024 |> Array.zeroCreate
                let! bytesReceived = tempBuffer |> socket.AsyncReceive
                let buffer = [|buffer; tempBuffer.[0..bytesReceived - 1]|] |> Array.concat
                if bytesReceived < tempBuffer.Length then return buffer else return! buffer |> receive }
        [||] |> receive

This is the same as the NetworkStream for the most part, so I won't explain it.

Build a TCP Server

We're going to build two servers here, we'll build a TCP/IP server, and then an IMAP server. We'll do this because the raw TCP/IP server is pretty simple, and can be used for other servers as well.

A TCP/IP server needs two data-points to bind to: an IP Address, and a Port. These two values, when concatenated with a colon (:) make up a "socket". The server has a socket, and the client has a socket.

So one of the first things we need is the server endpoint: let endpoint = (ipAddress, port) |> IPEndPoint

So that's easy. The next thing we need is a server. In .NET we can accomplish a low-level TCP/IP server by using the Socket class (which we extended above), and binding. To start, we define our Socket:

let server = new Socket (AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)

This builds a TCP/IP socket (which we named the server) to the InterNetwork address family (IP), the Stream socket type (raw bytes to and from), and the Tcp protocol (the TCP part of TCP/IP). Thus, we can use a stream to send and receive data.

Next, we bind:

(ipAddress, port) |> IPEndPoint |> server.Bind

Because I'm lazy, I bound and created the endpoint in the same line. This tells the server that it will be dealing specifically on the IP Address and Port we provided.

After binding, we listen:

SocketOptionName.MaxConnections |> int |> server.Listen

This builds a server with the MaxConnections configuration (allowing the maximum number of connections in the backlog) and then starts listening on the socket we asked for.

Once we've done that, we need to do the hard part. We need to do something with inbound connections, because by default the server doesn't actually do anything.

Listening for a new connection can be done with the socket.AsyncAccept function we built earlier, and because it's async we can ignore the threading issues (for the most part) and bind directly to it:

let! socket = () |> server.AsyncAccept

Now we won't get anything into socket until someone tries to connect, so we can then start working with that connection:

try Async.Start(socket |> events.Connect, cancellationToken = cts.Token)
with e -> ()

Now, what I did here is attempt to spawn a new asynchronous computation expression to handle the connection, it's not the job of the TCP/IP server to track that. We also passed a cancellation token in case we want to stop that expression later.

We'll then wrap that in an async { } block, and bind it to a function. If we include some basic logging, we have:

let cts = new CancellationTokenSource ()
let server = new Socket (AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
(ipAddress, port) |> IPEndPoint |> server.Bind
SocketOptionName.MaxConnections |> int |> server.Listen
(() |> ipAddress.ToString, port) ||> printfn "Started listening on %s:%d"

let rec waitForConnection () = 
    async {
        printfn "Waiting for connection..."
        let! socket = () |> server.AsyncAccept
        () |> socket.RemoteEndPoint.ToString |> printfn "Socket connected: %s"
        try Async.Start(socket |> events.Connect, cancellationToken = cts.Token)
        with e -> e.ToString() |> printfn "An error occurred: %s"
        return! () |> waitForConnection }

Good good, progress. Next, we want to unconditionally start the waitForConnection loop:

Async.Start(() |> waitForConnection, cancellationToken = cts.Token)

And finally, we'll build an IDisposable to properly handle closing the server:

{ new IDisposable with
    member this.Dispose () =
        () |> events.Close
        () |> cts.Cancel
        () |> server.Close
        () |> cts.Dispose
        () |> server.Dispose }

One of the things not defined here is events. I built a custom record for events the server can perform, which is extremely simple:

type ServerEvents =
    { Connect : Socket -> Async<unit>
      Close : unit -> unit }

Basically, we have a Connect and Close event.

All-in-all, if we put everything together and build a couple functions in a type, we have:

type Server () =
    static member StartI events port (ipAddress : IPAddress) =
        let cts = new CancellationTokenSource ()
        let server = new Socket (AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
        (ipAddress, port) |> IPEndPoint |> server.Bind
        SocketOptionName.MaxConnections |> int |> server.Listen
        (() |> ipAddress.ToString, port) ||> printfn "Started listening on %s:%d"

        let rec waitForConnection () = 
            async {
                printfn "Waiting for connection..."
                let! socket = () |> server.AsyncAccept
                () |> socket.RemoteEndPoint.ToString |> printfn "Socket connected: %s"
                try Async.Start(socket |> events.Connect, cancellationToken = cts.Token)
                with e -> e.ToString() |> printfn "An error occurred: %s"
                return! () |> waitForConnection }
        Async.Start(() |> waitForConnection, cancellationToken = cts.Token)

        { new IDisposable with
            member this.Dispose () =
                () |> events.Close
                () |> cts.Cancel
                () |> server.Close
                () |> cts.Dispose
                () |> server.Dispose }
    static member Start events port = (events, port, IPAddress.Any) |||> Server.StartI 

Which is a fully-functional TCP/IP server that does nothing, as of yet.

Building the basic IMAP Server

Our basic IMAP server won't comply with any parts of the protocol, but it will function as an end-to-end test of TCP/IP and the server we created. The IMAP server will use a TCP/IP server instance from the previous class to handle the raw TCP/IP portions, then will use a mutable list of connected sockets (including a basic User ID) to manage the connections. It will also send a BYE on close of the server to all connected sockets, thus allowing us to tell clients we're not there anymore, cleanly.

To start this process, we'll actually define the Start function of the server:

member this.Start () = EBrown.Tcp.Server.Start { Connect = this.OnConnect; Close = this.OnClose } 143

This helps push us to the right direction of where to go next. We need to define OnConnect and OnClose, which will be called by the TCP/IP server.

The OnConnect function really only has one job: accept a connected client, spawn the async computation for it, and add it to the list of all sockets. So for that, we'll define a pretty brief function:

member this.OnConnect (socket : Socket) =
    async {
        try
            lock sockets (fun () -> sockets <- (socket, None)::sockets)
            sockets |> List.length |> printfn "Sockets open: %i"
            return! socket |> runClient
        finally
            lock sockets (fun () -> sockets <- sockets |> List.filter (fst >> (<>) socket))
            SocketShutdown.Both |> socket.Shutdown |> socket.Close
            sockets |> List.length |> printfn "Sockets open: %i" }

This function just takes a socket, indicates that it's open, and then runs the async computation for it. When it closes, it removes the socket from the open pool. Pretty simple and basic. What we see here, that we haven't yet written, is runClient.

The runClient function is a little more complex, as it will actually do some things in our IMAP server. For now, all we want it to do is echo back what the user sent over the socket, when the user sends data. For this, we'll use AsyncReceiveAll, and send back with AsyncSend:

let rec runClient (socket : Socket) =
    let printfn = printfn "Socket %s: %s" (socket.RemoteEndPoint.ToString())
    async {
        let! buffer = () |> socket.AsyncReceiveAll
        let str = buffer |> Encoding.ASCII.GetString
        sprintf "Received (%i bytes): %s" buffer.Length str |> printfn
        if str.Length = 3 && str = "BYE" then printfn "Disconnected"
        else
            let! bytesSent = [| "Hello, other world!"B |] |> Array.concat |> socket.AsyncSend
            bytesSent |> sprintf "Sent response (%i bytes)" |> printfn
            return! socket |> runClient }

The goal here, is to loop until we have to kill the socket (for any reason). Currently, the only valid reason is a BYE, but there may be others at some point. (Such as errors, authentication failures, etc.)

Finally, we need the OnClose function, which should kill the server and tell all clients it's gone. This is actually really simple:

member this.OnClose () =
    lock sockets (fun () ->
        sockets
        |> List.toArray
        |> Array.filter (fun (sock, _) -> sock.Connected)
        |> Array.map (fun (sock, user) -> [| "BYE"B |] |> Array.concat |> sock.AsyncSend)
        |> Async.Parallel
        |> Async.RunSynchronously
        |> ignore)

If we put it all together, and add our mutable sockets, we get:

type User = { Id : string }

type Server () =
    let mutable sockets : (Socket * User option) list = []
    let rec runClient (socket : Socket) =
        let printfn = printfn "Socket %s: %s" (socket.RemoteEndPoint.ToString())
        async {
            let! buffer = () |> socket.AsyncReceiveAll
            let str = buffer |> Encoding.ASCII.GetString
            sprintf "Received (%i bytes): %s" buffer.Length str |> printfn
            if str.Length = 3 && str = "BYE" then printfn "Disconnected"
            else
                let! bytesSent = [| "Hello, other world!"B |] |> Array.concat |> socket.AsyncSend
                bytesSent |> sprintf "Sent response (%i bytes)" |> printfn
                return! socket |> runClient }
    member this.OnConnect (socket : Socket) =
        async {
            try
                lock sockets (fun () -> sockets <- (socket, None)::sockets)
                sockets |> List.length |> printfn "Sockets open: %i"
                return! socket |> runClient
            finally
                lock sockets (fun () -> sockets <- sockets |> List.filter (fst >> (<>) socket))
                SocketShutdown.Both |> socket.Shutdown |> socket.Close
                sockets |> List.length |> printfn "Sockets open: %i" }
    member this.OnClose () =
        lock sockets (fun () ->
            sockets
            |> List.toArray
            |> Array.filter (fun (sock, _) -> sock.Connected)
            |> Array.map (fun (sock, user) -> [| "BYE"B |] |> Array.concat |> sock.AsyncSend)
            |> Async.Parallel
            |> Async.RunSynchronously
            |> ignore)
    member this.Start () = EBrown.Tcp.Server.Start { Connect = this.OnConnect; Close = this.OnClose } 143

So now we want to test it.

Testing our Server

To test our server we need to start it, and build a client to connect to it. Both are trivial:

use server = Server().Start()
Console.ReadLine() |> ignore
printfn "Closing..."

This runs the server, forever. It will stay online until the user presses "Enter".

The client is much simpler than the server, and is just a few lines of code:

printfn "Press [Enter] / [Return] to quit, any other character to send data."
let cts = new CancellationTokenSource ()
use client = new TcpClient("127.0.0.1", 143)
printfn "Connected to %s" (client.Client.RemoteEndPoint.ToString())
use stream = client.GetStream()

let sendData = 
    async {
        while (Console.ReadKey().Key <> ConsoleKey.Enter) do
            if not cts.IsCancellationRequested then
                printfn ""
                [| "Hello world!"B |] |> Array.concat |> stream.Write
        [| "BYE"B |] |> Array.concat |> stream.Write
        printfn "Disconnected" }
let receiveData = 
    async {
        let rec loop () =
            async {
                let! bytes = () |> stream.AsyncReadAll
                let str = bytes |> Encoding.ASCII.GetString
                printfn "Received %i bytes: %s" bytes.Length str
                if str.Length = 3 && str = "BYE" then printfn "Disconnected, press any key to exit."; cts.Cancel() else return! () |> loop }
        return! () |> loop }
Async.Start(receiveData, cancellationToken = cts.Token)
try Async.RunSynchronously(sendData, cancellationToken = cts.Token)
with | :? OperationCanceledException -> () | e -> printfn "%s" (e.ToString()); printfn "Press enter to exit..."; Console.ReadLine() |> ignore

We're not going to do client-side work at all, so I won't detail anything about it, but it's really quite trivial.

If we build out two console applications, throw this code in and make it work, we can test it and see the result:

Client:

Press [Enter] / [Return] to quit, any other character to send data.
Connected to 127.0.0.1:143
a
Received 19 bytes: Hello, other world!
Disconnected

Server:

Started listening on 0.0.0.0:143
Waiting for connection...
Socket connected: 127.0.0.1:57309
Waiting for connection...
Sockets open: 1
Socket 127.0.0.1:57309: Received (12 bytes): Hello world!
Socket 127.0.0.1:57309: Sent response (19 bytes)
Socket 127.0.0.1:57309: Received (3 bytes): BYE
Socket 127.0.0.1:57309: Disconnected
Sockets open: 0

And there we have it. We've built bi-directional TCP/IP in a small amount of F# code, and can now begin the adventure of building the actual IMAP server.


If you are lazy, like me, and want this project for free, the entire thing will be available on GitHub. The version for this specific blog post is tree 8d269692ab169806e87b59700d69853b1d2eb1ff.


The blog post today is brought to you buy a significant amount of Coca-Cola and Fritos. I do not recommend either of these products unless you hate yourself (like me) and are trying to gain a lot of unnecessary weight. What I do recommend, however, is that you try to enjoy yourself as much as possible, and ignore what anyone else says about that. (You do you, basically.)

It's been a short while, as I took a longer-than-expected hiatus after the F# Advent of Code, but I'm back and we're going to work on a fun little project that involves a lot of detail and such.

Loading