Eagle Pipeline GIS Blog

GUIDs as EventIDs in APDM / PODS ESRI Spatial Geodatabases

Posted by Tracy Thorleifson on Fri, Sep 03, 2010

One of the key goals of APDM 5 was to simplify manual data maintenance. Because PODS ESRI Spatial 5 borrows extensively from the APDM 5 core, many of the activities critical to the maintenance of an APDM 5 geodatabase also pertain to a PODS ESRI Spatial 5 geodatabase. This is the first of a series of practical articles dealing with manually populating an APDM 5 or PODS ESRI Spatial 5 geodatabase. This particular post deals with some necessary background information on key identifiers in the APDM / PODS ESRI Spatial. Future posts will get down to the nitty-gritty of data loading.

In an APDM or PODS ESRI Spatial geodatabase the primary key (or unique identifier) of each object / feature class is a field named EventID. You might ask, why not just use the geodatabase ObjectID? The answer is that ObjectIDs are not immutable; they can change when geodatabases are exported, copied or merged. If an ObjectID changes, all relationships that reference that ObjectID are broken. EventIDs never change, so relationships based on them cannot break.

The EventID field has a data type of GUID. GUID stands for Globally Unique ID (Identifier). Needless to say, in order to guarantee global uniqueness, a GUID cannot just be a simple integer (like an ArcMap ObjectID). Although the actual storage mechanism for GUIDs varies a bit depending on the underlying relational database, a GUID is actually a 16-byte integer value. This is a huge number. Given the randomness inherent in the GUID algorithm and the size of GUIDs, you could theoretically generate a million GUIDs per second for the next 100 years and still have only a one in two chance of generating a single duplicate.

GUIDs have been around and in use for a long time. For instance, every Microsoft COM class ever created is identified by a GUID, as a quick examination of the Windows registry will reveal:

Registry Editor ArcMap Identify Tool CLSID Entry

In ArcMap GUIDs appear as complex, thirty-eight character strings, e.g. {FF87DB81-8620-40D0-80FF-139286392ACD}. The curly brackets and dashes are just for (Microsoft CLSID-style) formatting; the remaining 32 alphanumeric characters are the actual hexadecimal representation of the GUID. (For this reason you'll never see a letter in a GUID higher than 'F.' Hexadecimal digits run from 0-9,A-F.)

On the plus side, GUIDs are guaranteed to be globally unique. This makes it easy to combine or merge different APDM or PODS ESRI Spatial geodatabases without any fear of primary key collisions. A bit more subtly, you don't actually have to request new key values for new records directly from the database; you can do it on the client side. This makes it very easy to create data in disconnected applications (e.g. field data collection) that can be loaded to the geodatabase with minimal massaging.

On the minus side, GUIDs are just a tad cumbersome; some might even say unaesthetic. There is also some underlying performance overhead with GUIDs. Keep in mind that a GUID requires 16 bytes of storage, whereas a long integer requires eight bytes and a regular integer only four bytes. For this reason, database indexes built on GUIDs are more voluminous than indexes built on integers. By extension, queries with joins based on GUIDs tend to be more expensive than queries with joins based on integers. Five to ten years ago when GUIDs first came into use as primary keys, a variety of legitimate performance concerns were raised. These concerns have largely been mitigated by hardware, operating system and database advances (particularly with Microsoft SQL Server). For even the largest pipeline databases, as long as you are running on current hardware and database software, query performance with GUIDs should not be an issue.

It ought to be fairly obvious that just banging out an arbitrary thirty-eight character string in no way guarantees a valid GUID value. So generating GUID values can be a royal pain in the derriere, unless you have the appropriate trick up you sleeve. The next article in the series teaches you that trick. And with that trick, generating GUIDs is a snap!

Topics: APDM, PODS Spatial, ArcGIS Pipeline Data Model, PODS