This chapter describes how the Caché Object and SQL engines make
use of multi-dimensional storage (globals) for storing persistent objects,
relational tables, and indices.
Though the Caché Object and SQL engines automatically provide
and manage data storage structures it can be useful to understand the details
of how this works.
The storage structures used by the object and relational view of data
are identical. For simplicity, this document only describes storage from the
object perspective.
Every persistent class that uses the
%CacheStorage storage
class (the default) has the ability to store instances of itself within the
Caché database using one or more nodes of multi-dimensional storage
(globals).
Every persistent class has a storage definition that defines how its
properties are stored within global nodes. This storage definition (referred
to as
Default Structure) is managed automatically by the Class
Compiler. (You can modify this storage definition or even provide alternate
versions of it if you like. This is not discussed in this document).
The default structure used for storing persistent objects is quite simple:
-
Data is stored in a global whose name starts with the complete
class name (including package name). A
D is appended to form
the name of the data global, while an
I is appended for the
Index global.
-
Data for each instance is stored within a single nodes of
the data global with all non-transient properties placed within a
$List structure.
-
Each node in the data global is subscripted by Object ID value.
By default, Object ID values are integers provided by invoking the
$Increment function on a counter node stored
at the root (with no subscript) of the data global.
For example, suppose we define a simple persistent class,
MyApp.Person,
with two literal properties:
Class MyApp.Person Extends %Persistent [ClassType = persistent]
{
Property Name As %String;
Property Age As %Integer;
}
If we create and save two instances of this class, the resulting global
will be similar to:
^MyApp.PersonD = 2 // counter node
^MyApp.PersonD(1) = $LB("",530,"Abraham")
^MyApp.PersonD(2) = $LB("",680,"Noah")
Note that the first piece of the $List structure stored in each node
is empty; this is reserved for a class name. If we define any subclasses of
this
Person class, this slot will contain the subclass
name. The
%OpenId method (provided by the
%Persistent class)
uses this information to polymorphically open the correct type of object when
multiple objects are stored within the same extent. This slot shows up in
the class storage definition as a property named
%%CLASSNAME.
For more details, refer to the section on
subclasses below.
The IDKEY mechanism allows you to explicitly define the value used as
an object ID. To do this, you simply add an IDKEY index definition to your
class and specify the property or properties that will provide the ID value.
Note that once you save an object its object ID value cannot change. This
means that after you save an object that uses the IDKEY mechanism, you can
no longer modify any of the properties on which the object ID is based.
For example, we can modify the
Person class used
in the previous example to use an IDKEY index:
Class MyApp.Person Extends %Persistent [ClassType = persistent]
{
Index IDKEY On Name [ Idkey ];
Property Name As %String;
Property Age As %Integer;
}
If we create and save two instances of the
Person class,
the resulting global will now be similar to:
^MyApp.PersonD("Abraham") = $LB("",530,"Abraham")
^MyApp.PersonD("Noah") = $LB("",680,"Noah")
Note that there is no longer any counter node defined. Also note that
by basing the object ID on the
Name property, we have
implied that the value of
Name must be unique for each
object.
If the IDKEY index is based on multiple properties, then the main data
nodes will have multiple subscripts. For example:
Class MyApp.Person Extends %Persistent [ClassType = persistent]
{
Index IDKEY On (Name,Age) [ Idkey ];
Property Name As %String;
Property Age As %Integer;
}
In this case, the resulting global will now be similar to:
^MyApp.PersonD("Abraham",530) = $LB("",530,"Abraham")
^MyApp.PersonD("Noah",680) = $LB("",680,"Noah")
By default, any fields introduced by a subclass of a persistent object
are stored in an additional node. The name of the subclass is used as an additional
subscript value.
For example, suppose we define a simple persistent
MyApp.Person class
with two literal properties:
Class MyApp.Person Extends %Persistent [ClassType = persistent]
{
Property Name As %String;
Property Age As %Integer;
}
Now we define a persistent subclass,
MyApp.Student,
that introduces two additional literal properties:
Class MyApp.Student Extends Person [ClassType = persistent]
{
Property Major As %String;
Property GPA As %Float;
}
If we create and save two instances of this
MyApp.Student class,
the resulting global will be similar to:
^MyApp.PersonD = 2 // counter node
^MyApp.PersonD(1) = $LB("Student",19,"Jack")
^MyApp.PersonD(1,"Student") = $LB(3.2,"Physics")
^MyApp.PersonD(2) = $LB("Student",20,"Jill")
^MyApp.PersonD(1,"Student") = $LB(3.8,"Chemistry")
The properties inherited from the
Personclass
are stored in the main node while those introduced by the
Student class
are stored in an additional subnode. This structure ensures that the
Student data
can be used interchangeably as
Person data. For example,
an SQL query listing names of all
Person objects will
correctly pick up both
Person and
Student data.
This structure also makes it easier for the Class Compiler to maintain data
compatibility as properties are added to either the super or sub classes.
Note that the first piece of the main node contains the string
Student identifying
that these nodes contain
Student data.
Parent Children Relationships
Within parent-children relationships, instances of child objects are
stored as subnodes of the parent object to which they belong. This structure
ensures that child instance data is physically clustered along with parent
data.
For example, here is the definition for two related classes,
Invoice:
/// An Invoice class
Class MyApp.Invoice Extends %Persistent [ClassType = persistent]
{
Property CustomerName As %String;
/// an Invoice has CHILDREN that are LineItems
Relationship Items As LineItem [inverse = TheInvoice, cardinality = CHILDREN];
}
/// A LineItem class
Class MyApp.LineItem Extends %Persistent [ClassType = persistent]
{
Property Product As %String;
Property Quantity As %Integer;
/// a LineItem has a PARENT that is an Invoice
Relationship TheInvoice As Invoice [inverse = Items, cardinality = PARENT];
}
If we store several instances of
Invoice object,
each with associated
LineItem objects, the resulting
global will be similar to:
^MyApp.InvoiceD = 2 // invoice counter node
^MyApp.InvoiceD(1) = $LB("","Wiley Coyote")
^MyApp.InvoiceD(1,"Items") = 2 // lineitem counter node
^MyApp.InvoiceD(1,"Items",1) = $LB("","Rocket Roller Skates",2)
^MyApp.InvoiceD(1,"Items",2) = $LB("","Acme Magnet",1)
^MyApp.InvoiceD(2) = $LB("","Road Runner")
^MyApp.InvoiceD(2,"Items") = 1 // lineitem counter node
^MyApp.InvoiceD(2,"Items",1) = $LB("","Birdseed",30)
Note that the name of the Relationship is used as an additional literal
subscript; this allows a class to support multiple relationships without data
conflict. Also note that each instance of
Invoice maintains
its own counter node for assigning ID values for
LineItem objects.
Embedded objects are stored by first converting them to a serialized
state (by default a
$Liststructure containing
the object's properties) and then storing this serial state in the same way
as any other property.
For example, suppose we define a simple serial (embeddable) class with
two literal properties:
Class MyApp.MyAddress Extends %SerialObject [ClassType = serial]
{
Property City As %String;
Property State As %String;
}
We now modify our earlier example to add an embedded
Home address
property:
Class MyApp.MyClass Extends %Persistent [ClassType = persistent]
{
Property Name As %String;
Property Age As %Integer;
Property Home As MyAddress;
}
If we create and save two instances of this class, the resulting global
will be equivalent to:
^MyApp.MyClassD = 2 // counter node
^MyApp.MyClassD(1) = $LB(530,"Abraham",$LB("UR","Mesopotamia"))
^MyApp.MyClassD(2) = $LB(680,"Noah",$LB("Gommorrah","Israel"))
Global streams are stored within globals by splitting their data into
a series of chunks, each smaller than 32K bytes, and writing the chunks into
a series of sequential nodes (File streams are stored in external files).
Persistent classes can define one or more indices; additional data structures
used to make operations (such as sorting or conditional searches) more efficient.
Caché SQL will make use of such indices when executing queries. Caché
Object and SQL automatically maintain the correct values within indices as
insert, update, and delete operations are carried out.
Storage Structure of Standard Indices
A standard index associates an ordered set of one or more property values
with the object ID values of the object containing the properties.
For example, suppose we define a simple persistent
MyApp.Person class
with two literal properties and an index on its
Name property:
Class MyApp.Person Extends %Persistent [ClassType = persistent]
{
Index NameIdx On Name;
Property Name As %String;
Property Age As %Integer;
}
If we create and save several instances of this
Person class,
the resulting data and index globals will be similar to:
// data global
^MyApp.PersonD = 3 // counter node
^MyApp.PersonD(1) = $LB("",34,"Jones")
^MyApp.PersonD(2) = $LB("",22,"Smith")
^MyApp.PersonD(3) = $LB("",45,"Jones")
// index global
^MyApp.PersonI("NameIdx"," JONES",1) = ""
^MyApp.PersonI("NameIdx"," JONES",3) = ""
^MyApp.PersonI("NameIdx"," SMITH",2) = ""
Note the following things about the index global:
-
By default, it is placed in a global whose name is the class
name with an
I (for Index) appended to it.
-
By default, the first subscript is the index name; this allows
multiple indices to be stored in the same global without conflict.
-
The second subscript contains the
collated data
value. In this case, the data is collated using the default SQLUPPER collation
function. This converts all characters to upper case (to sort without regard
to case) and prepends a space character (to force all data to collate as strings).
-
The third subscript contains the Object ID value of the object
that contains the indexed data value.
-
The nodes themselves are empty; all the needed data is held
within the subscripts. Note that if an index definition specifies that data
should be stored along with the index, it will be placed in the nodes of the
index global.
This index contains enough information to satisfy a number of queries,
such as listing all
Person class order by
Name.
A bitmap index is similar to a standard index except that it uses a
series of bit strings to store the set of object ID values that correspond
to the indexed value.
Logical Operation of Bitmap Indices
A bit string is a string containing a set of bits (0 and 1 values) in
a special compressed format. Caché includes a set of functions for
efficiently creating and working with bit strings. These are listed in the
following table:
Bit String Operations
| Function |
Description |
| $Bit |
Set or get a bit within a bit string. |
| $BitCount |
Count the number of bits within a bit string. |
| $BitFind |
Find the next occurrence of a bit within a bit string. |
| $BitLogic |
Perform logical (AND, OR) operations on two or more
bit strings. |
Within a bitmap index, ordinal positions within a bit string correspond
to rows (Object ID number) within the indexed table. A bitmap index maintains,
for a given value, a bit string containing a 1 for each row in which the given
value is present, and 0 for every row in which it is not. Note that bitmap
indices only work for objects that use the default storage structure with
system-assigned, numeric Object ID values.
For example, suppose we have a table similar to the following:
If the
State and
Product columns
have bitmap indices, then they will contain the following values:
A bitmap index on the
State column will contain
the following bitstring values:
Note that for the value,
MA, there is a 1 in the positions
(1, 4, and 5) that correspond to the table rows with
State equal
to
MA.
Simlarly, a bitmap index on the
Product column
will contain the following bitstring values (note that the values are collated
to upper case within the index):
The Caché SQL Engine can execute a number of operations by iterating
over, counting the bits within, or performing logical combinations (AND, OR)
on the bit strings maintained by these indices. For example, to find all rows
that have
State equal to
MA and
Product equal
to
HAT, the SQL Engine can simply AND the appropriate bit strings
together.
In addition to these indices, the system will maintain an additional,
so-called extent index, that contains a 1 for every row that exists and a
0 for rows that do not (such as deleted rows). This is used for certain operations,
such as negation.
Storage Structure of Bitmap Indices
A bitmap index associates an ordered set of one or more property values
with one or more bit strings containing the Object ID values corresponding
to the property values.
For example, suppose we define a simple persistent
MyApp.Person class
with two literal properties and a bitmap index on its
Age property:
Class MyApp.Person Extends %Persistent [ClassType = persistent]
{
Index AgeIdx On Age [Type = bitmap];
Property Name As %String;
Property Age As %Integer;
}
If we create and save several instances of this
Person class,
the resulting data and index globals will be similar to:
// data global
^MyApp.PersonD = 3 // counter node
^MyApp.PersonD(1) = $LB("",34,"Jones")
^MyApp.PersonD(2) = $LB("",34,"Smith")
^MyApp.PersonD(3) = $LB("",45,"Jones")
// index global
^MyApp.PersonI("AgeIdx",34,1) = 110...
^MyApp.PersonI("AgeIdx",45,1) = 001...
// extent index global
^MyApp.PersonI("$Person",1) = 111...
^MyApp.PersonI("$Person",2) = 111...
Note the following things about the index global:
-
By default, it is placed in a global whose name is the class
name with an
I (for Index) appended to it.
-
By default, the first subscript is the index name; this allows
multiple indices to be stored in the same global without conflict.
-
The second subscript contains the
collated data
value. In this case, a collation function is not applied as this is an index
on numeric data.
-
The third subscript contains a so-called
chunk number;
for efficiency, bitmap indices are divided into a series of bit strings each
containing information for about 64000 rows from the table. Each of these
bit strings are referred to as a chunk.
-
The nodes contain the bit strings.
Also note, that because this table has a bitmap index, an extent index
is automatically maintained. This extent index is stored within the index
global and uses the class name, with a
$ character prepended
to it, as its first subscript.
Direct Access of Bitmap Indices
The following example uses a class extent index to compute the total
number of stored object instances (rows). Note that it uses
$Order to
iterate over the chunks of the extent index (each chunk contains information
for about 64000 rows):
/// Return the number of objects for this class.<BR>
/// Equivalent to SELECT COUNT(*) FROM Person
ClassMethod Count() As %Integer
{
New total,chunk,data
Set total = 0
Set chunk = $Order(^MyApp.PersonI("$Person",""),1,data)
While (chunk '= "") {
Set total = total + $bitcount(data,1)
Set chunk = $Order(^MyApp.PersonI("$Person",chunk),1,data)
}
Quit total
}