Erlang Central

Erlang ODBC Tips

From ErlangCentral Wiki

Contents

AUTHOR

Cole Fichter <cole.fichter at gmail.com>

Original text: http://erlang-odbc-tips.colefichter.ca/

Erlang ODBC Tips

A collection of helpful hints for developers using Erlang and ODBC. Using Erlang's ODBC (Open Database Connectivity) support to access a SQL database can be an exercise in frustration, but this guide will get you started in no time.

First Things First

This guide should be accessible to everyone, but we assume you have some experience building Erlang OTP applications. If you come across something you don't recognize, checkout the excellent Learn You Some Erlang on-line book.

If you work with Erlang long enough, you'll find yourself needing to connect to a SQL database to interact with a legacy system or to store data that demands strong consistency. To achieve this, you'll need to use ODBC. Erlang ships with native ODBC support as part of the standard OTP libraries, so it's available in every installation without any extra effort.

Where To Get Help

If you're using Erlang ODBC for the first time, you're going to need some extra help at some point (perhaps that's why you're here). Here are some good resources to check out if you get into a pickle:

Preparing To Connect

An ODBC connection requires a DSN (Data Source Name) to be configured on the system. Unfortunately, Erlang ODBC does not support file DSNs, so the DSN must be configured using the Windows ODBC configuration utilities.

Stata has published an excellent tutorial explaining how to set up a Windows Data Source Name. Be sure to follow the instructions for your version of Windows. Also, an ODBC DSN can be either a User DSN or a System DSN. A User DSN is only available to the user who created it, while a System DSN is available to all users of the local system.

Don't forget to configure a DSN on each environment to which you plan to deploy your compiled Erlang release! The lack of support for file DSNs means you cannot distribute the DSN configuration as part of an Erlang release package. Sadly, you'll need to perform this step manually for each machine on which you wish to run your application.

Connecting

The first step to using ODBC in Erlang is to declare a dependency on the ODBC OTP application and ensure that it has started. If you use a build system like Rebar you'll need to declare a dependency on odbc in the applications list inside your reltool.config file:

reltool.config

...
{rel, "MyApp", "1.0.0",
    [kernel,
     stdlib,
     odbc]
},
...

If you're using a bootstrap script for development, or you are manually starting the applications you need, simply add a line to start the odbc application to your initialization routine:

myapp_bootstrap.erl

-module(myapp_bootstrap).
-export([start/0]).
 
start() ->
    odbc:start(),   
    application:load(myapp),
    application:start(myapp).

A final alternative is to use a command line argument when starting the shell:

erl -s odbc start
You can ensure that the odbc OTP application is running by typing
application:which_applications().
into the Erlang shell and looking for a line like
{odbc,"Erlang ODBC application","2.10.18"}.

At this point, you should have a working environment with all the dependencies that you need to connect to a SQL database. The last step to get up and running is to gather your connection details into a connection string and attempt a simple SQL query.

The connection string requires three pieces of information: the name of the DSN that you configured earlier, the SQL login name, and the SQL password. Normally, we'd store this information in the application configuration file, but for this guide we'll declare it as a string variable for simplicity:
ConnectionString = "DSN=MyAppDSN;UID=me;PWD=secret"

The name of the DSN, in this case MyAppDSN must exactly match the name of the DSN that you created in the Windows ODBC tool. Of course the user name and password must be legitimate credentials for your SQL instance.

With the connection string ready, let's write a function to test our database connection with the simple query SELECT 1;:

myapp_sql_repo .erl

-module(myapp_sql_repo).
-export([test_connection/0]).
 
%%%==========================================
%%% Client API
%%%==========================================
test_connection() ->
    Sql = "SELECT 1;",
    Connection = connect(),
    case odbc:sql_query(Connection, Sql) of
        {selected, Columns, Results} ->
          io:format("Success!~n Columns: ~p~n Results: ~p~n", 
            [Columns, Results]),
          ok;
        {error, Reason} ->
            {error, Reason}                           
    end. 
 
%%%==========================================
%%% Internal functions
%%%==========================================
connect() ->
    ConnectionString = "DSN=MyAppDSN;UID=me;PWD=secret",    
    {ok, Conn} = odbc:connect(ConnectionString, []),
    Conn.

We now have everything we need to test our ODBC connection on the SQL database. Let's give it a try:

$>erl -s odbc start
Eshell V5.10  (abort with ^G)
1> myapp_sql_repo:test_connection().
Success!
 Columns: [[]]
 Results: [{1}]
ok

If something has gone wrong with SQL itself, you should see an {error, Reason} tuple with a relevant error message. Another common problem may occur if something is wrong with your connection string, preventing your program from establishing an ODBC connection. In that case, you'll see a message similar to:

$>erl -s odbc start
Eshell V5.10  (abort with ^G)
1> myapp_sql_repo:test_connection().             
** exception error: no match of right hand side value {error,
                                                       "[Microsoft][ODBC Driver
Manager] Data source name not found and no default driver specified SQLSTATE IS:
 IM002 Connection to database failed."}
     in function  myapp_sql_repo:connect/0 (myapp_sql_repo.erl, line 33)
     in call from myapp_sql_repo:test_connection/0 (myapp_sql_repo.erl, line 9)
4>

The message clearly indicates that my DSN name (in the connection string) is incorrect.

Gotchas (These will bite you in the butt)

There are plenty of wonky things in the Erlang ODBC library that can rear up and bite you without warning. They are often poorly documented and difficult to troubleshoot. This section contains a collection of errors that we've seen in the wild, along with some advice in case you experience them.

64-Bit/32-Bit Architecture Mismatch

Setting up a 32-bit DSN Setting up a 64-bit DSN On a 64-bit Windows machine, it's possible to setup both 32-bit and 64-bit ODBC DSNs in the data sources setup dialog. This can potentially create a problem if you've installed a 32-bit build of Erlang/OTP, but you are trying to connect to a 64-bit ODBC DSN.

The first step to preventing or fixing this issue is to determine what type of Erlang build you're running, as this will determine which type of DSN you need. On Windows, running the werl program will open an Erlang shell including the details of the build:

$>werl
Erlang R16A (erts-5.10) [smp:8:8] [async-threads:10]
 
Eshell V5.10  (abort with ^G)
1> 

Note that the first line contains the Erlang version, R16A, and the ERTS version, 5.10, but there is no indication of the architecture. That means this system is running a 32-bit build of Erlang and will need a 32-bit DSN. A 64-bit Erlang build will be clearly indicated with [64-bit] along with the other details:

>werl
Erlang R16A (erts-5.10) [64-bit] [smp:8:8] [async-threads:10]
 
Eshell V5.10  (abort with ^G)
1> 

Knowing exactly what architecture you need, you can now setup the matching type of DSN. Note that Windows has separate set up utilities for each, so you need to launch the correct one.

  • For a 64-bit DSN (on 64-bit Windows) run
    c:\windows\system32\odbcad32.exe
  • For a 32-bit DSN (on 64-bit Windows) run
    c:\windows\sysWOW64\odbcad32.exe
  • When searching for ODBC in the start menu, both utilities are clearly labeled.

Looking at those paths, it seems counter-intuitive that the 32-bit config tool would be in a folder called sysWOW64, but there you have it.

If you've already set up a DSN, but you've inadvertently used the wrong architecture, you'll see an architecture mismatch error when trying to connect:

$>erl -s odbc start
Eshell V5.10  (abort with ^G)
1> c(myapp_sql_repo).
{ok,myapp_sql_repo}
2> myapp_sql_repo:test_connection().
** exception error: no match of right hand side value {error,
                                                       "[Microsoft][ODBC Driver
Manager] The specified DSN contains an architecture mismatch between the Driver
and Application SQLSTATE IS: IM014 Connection to database failed."}
     in function  myapp_sql_repo:connect/0 (myapp_sql_repo.erl, line 33)
     in call from myapp_sql_repo:test_connection/0 (myapp_sql_repo.erl, line 9)
3>

If you encounter this error, simply replace the DSN with a new one using the correct architecture for your Erlang installation.

Row Counts and Pattern Matching

When you have a stored procedure or batch of SQL statements that include statements which return counts rather than result sets (for example, UPDATE, INSERT, DELETE) you may find that the returned record counts break the pattern matching in your Erlang code.

Suppose an application needs to insert a new employee record into an employee table, and then return the updated list of all employees:

create_employee(Name) ->
    Params = [{{sql_varchar, length(Name)}, [Name]}],
    Sql = "INSERT INTO employee(name) VALUES(?); SELECT * FROM employee;",
    Connection = connect(),
    case odbc:param_query(Connection, Sql, Params) of
        {selected, Columns, Results} ->
          % Results now contains the employee list
          ok;
        {error, Reason} ->
            {error, Reason}                           
    end. 

This code looks great, but when it runs it will produce a bad_match error in the case because the first item returned in the overall result set is the counter indicating that 1 row was inserted before the SELECT statement was executed. To fix the problem, add a NOCOUNT directive to prevent the counter from being returned:

create_employee(Name) ->
    Params = [{{sql_varchar, length(Name)}, [Name]}],
    Sql = "SET NOCOUNT ON; INSERT INTO employee(name) VALUES(?); SELECT * FROM employee;",
    Connection = connect(),
    case odbc:param_query(Connection, Sql, Params) of
        {selected, Columns, Results} ->
          % Results now contains the employee list
          ok;
        {error, Reason} ->
            {error, Reason}                           
    end. 

This issue crops up frequently when using stored procedures, since the whole point of a procedure is to encapsulate several operations into a single, concise command. Fix it in the same way by adding SET NOCOUNT ON inside your stored procedure before you begin running other SQL commands.

Buffer Overflows and String Truncation

One handy feature of Erlang is that it implements strings internally as linked lists of integer character codes, allowing strings of arbitrarily long length. We don't often have to think about string lengths, but there is one case when it is critically import to length check, and truncate, strings: when passing them to a SQL database through ODBC.

Since the Erlang ODBC tool relies on a driver written in unmanaged C code, you absolutely have to be sure of the string length when passing strings as parameters to a query. If you pass a string that is longer than you've indicated as the SQL parameter length, a buffer overflow will occur outside the Erlang VM, causing the entire runtime to block.

The following example query will cause a buffer overflow to crash the runtime because the SQL parameter is declared as VARCHAR(3) but the parameter value 'Canada' is six characters long:

cause_buffer_overflow() -> %Store the full country name into a field expecting an ISO code.
    Params = [{{sql_varchar, 3}, ['Canada']}],
    Sql = "INSERT INTO country(iso_code) VALUES(?);",
    Connection = connect(),
    odbc:param_query(Connection, Sql, Params).

A buffer overflow in the ODBC server. When we run this code, a Windows OS dialog box will alert us of the crash, and the Erlang runtime system will block until the dialog box is dismissed. I've found that on production servers, the Erlang system simply waits until a human logs in to check on things (which could be several days after the error) and then must be restarted anyway to resume normal operation.

To prevent this problem, you must be absolutely sure that you never pass a parameter value whose length exceeds the declared length of the SQL parameter. Often, that means you'll need to truncate strings in your Erlang code, since the SQL type (and maximum length) has already been determined in the database:

cause_buffer_overflow() ->
    Country = truncate('Canada', 3), %Prevents buffer overflow!
    Params = [{{sql_varchar, 3}, [Country]}],
    Sql = "INSERT INTO country(iso_code) VALUES(?);",
    Connection = connect(),
    odbc:param_query(Connection, Sql, Params).
 
truncate(_Any, 0) ->
    [];
truncate(String, MaxLength) when
        is_list(String), 
        is_integer(MaxLength) ->
    case MaxLength < 0 of
        true -> "";
        false ->
            truncate(String, length(String), MaxLength)
    end.
 
truncate(String, Length, MaxLength) when
        is_list(String), 
        is_integer(Length), 
        is_integer(MaxLength) ->
    case Length =< MaxLength of
        true -> String;
        false ->
            string:sub_string(String, 1, MaxLength)
    end.

Unicode Character Encodings

A wise man once wrote that "it does not make sense to have a string without knowing what encoding it uses." This goes doubly when trying to send strings as SQL parameters from your Erlang application to a SQL database. In particular, the default endianness of Unicode characters will likely be different in your Erlang system and your SQL server installation, so you must take care to properly encode Unicode strings before storing them. Compare the following two SQL parameters created in an Erlang program:

Params = [
  %First parameter: URL as a plain ASCII string
  {
    {sql_varchar, ?URL_MAX_LENGTH}, 
    [string_ext:truncate(Url, ?URL_MAX_LENGTH)]
  },
  %Second parameter: Content as a Unicode string
  {
    {sql_wlongvarchar, byte_size(Content)}, 
    [unicode:characters_to_binary(Content, Encoding, {utf16, little})]
  }
 ]

The first parameter, Url, is treated as a plain old ASCII string in both systems, so it doesn't require any special treatment to store the correct value. The second parameter, Content, is a different case. Because Erlang and SQL Server store Unicode values in different binary formats, we need to manually convert the Erlang representation to the correct format for SQL by calling characters_to_binary/3. Note that here, the Encoding variable contains an atom representing the detected encoding of the Content string. For example, if this was part of a web crawling system, you might detect the current encoding from the response headers or the page meta fields.

Support for NVARCHAR(MAX)

It's not immediately obvious from reading the ODBC reference manual how to correctly handle a SQL NVARCHAR(MAX) column in Erlang. To do so is actually fairly easy. First, make sure that your string is not longer than 2^31-1 bytes (2 GB), which is the maximum size allowed in SQL Server. Next, declare your SQL parameter as a sql_wlongvarchar with the length set to the string length. For example, the following Unicode string is correctly stored in SQL Server:

Params = [                
  {
    {sql_wlongvarchar, byte_size(Content)}, 
    [unicode:characters_to_binary(Content, utf8, {utf16, little})]
  }
 ]

Note that using VARCHAR(MAX) rather than NVARCHAR(MAX) is not feasible since the support for this type, though documented, does not actually seem to work in practice. (If you have figured out how to make it work, please let me know!)

Sequentializing IO Calls Anti-Pattern

When starting out with Erlang ODBC it can be tempting to sequentialize calls to the database. Suppose that you have a large group of working processes that all need to store results to the database when they finish computing, and so you decide to implement a repository as a gen_server that will accept messages from the workers and store the results of these computations in the database one at a time.

Doing so will usually lead to the repository's mailbox being flooded with messages, since the database IO is almost certainly a longer operation than the computations. The more worker processes you have, the worse the situation becomes. For example, with 10 workers the database call must take no longer than 1/10th as long as each computation, for your repository to keep up over the long term.

If you find yourself considering this approach, remember that SQL Server is really good at handling concurrent operations. By contrast, Erlang has no built-in mechanism to limit or handle an overflowing process mailbox. By simply making the database call from inside each worker process, you introduce natural rate limiting into the system and avoid swamping any single individual process (you also avoid introducing the complexity of rate limiting, back-off, and other similar schemes into your application).

Help Improve This Guide!

Have you found a typo or something that isn't as clear as it could be? Perhaps you have a tip of your own to add to the guide? We would be most grateful if you would submit a pull request to the github repository for this guide.

Erlang ODBC Tips by Cole Fichter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.