PickDataStructure

From Pickwiki
Revision as of 22:23, 9 July 2019 by IanMcGowan (talk | contribs)

The Pick Data Structure

Pick organises data as a string of characters drawn from an eight-bit character set whose lower half is seven-bit ASCII. The first 32 characters (0-31) are control characters and are not normally used in data. Characters 32 to 126 are the printable letters, digits, and punctuation of seven-bit ASCII. The three characters 254, 253, and 252 are delimiters called the attribute mark, value mark, and subvalue mark respectively. The original R83 Pick implementation used character 255 as a hidden delimiter between records. One implementation of Pick uses character 128 as a null and character 251 as a further delimiter. The use of these specific high-order characters as delimiters means that Pick is incompatible with character sets that assign other meanings to those codes.
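
As a rough illustration of why the delimiter choice matters, the sketch below builds a two-attribute string in Python and encodes it two ways. Python strings are Unicode, so code points 252 to 254 merely stand in for Pick's byte values here; the names AM, VM and SVM and the sample data are assumptions made for this example.

```python
# Delimiter characters as listed in the text above.
AM = chr(254)   # attribute mark
VM = chr(253)   # value mark
SVM = chr(252)  # subvalue mark

# Hypothetical record: two attributes separated by one attribute mark.
raw = "SMITH" + AM + "LONDON"

# In a single-byte encoding the mark is the one byte 0xFE, as Pick expects.
as_latin1 = raw.encode("latin-1")

# In UTF-8 the same code point becomes two bytes (0xC3 0xBE), so software
# looking for a raw 0xFE byte would no longer find the delimiter.
as_utf8 = raw.encode("utf-8")

print(len(as_latin1), len(as_utf8))
```

The one-byte difference in length is the attribute mark being re-encoded, which is the kind of mismatch that arises when Pick data is passed through a different character set.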

Accounts

An account is a collection of files with a master dictionary (MD) or VOC as the main file and starting point. An account is roughly equivalent to a database in the relational world. Accounts contain files, and the MD or VOC holds pointers to the files in the account. It is also possible to point to a file in a different account; for traditional reasons such a pointer is called a Q-pointer.


Files

Files are equivalent to tables in the relational world. A traditional file is a hashed data structure, divided into a series of buckets (the count of which is called the modulo), where each bucket is a defined size (called the separation). Each file also has a dictionary, which contains ATBs that are used when querying the file. A good analogy for the ATBs is Excel formulas - a simple ATB may just refer to a specific attribute in the current file. A more complex one may add some values together or perform functions on string data. Most complex of all, you can create a Basic ATB that calls some compiled code that can then do basically anything.
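
To make the Excel-formula analogy concrete, here is a minimal Python sketch in which each dictionary entry is a small function over the record's attributes. This is not how any Pick dictionary is actually stored; the DICTIONARY table, the evaluate helper, the field names and the sample order are all hypothetical.

```python
AM = chr(254)  # attribute mark

# Hypothetical dictionary for an imaginary ORDERS file. Each entry plays the
# role of an ATB: it says how to derive a named field from the raw record.
DICTIONARY = {
    "PRODUCT": lambda attrs: attrs[0],                      # attribute 1 as stored
    "NET": lambda attrs: int(attrs[1]),                     # attribute 2 as a number
    "TAX": lambda attrs: int(attrs[2]),                     # attribute 3 as a number
    "GROSS": lambda attrs: int(attrs[1]) + int(attrs[2]),   # computed from two attributes
}

def evaluate(name, raw_record):
    """Apply the named dictionary entry to a raw record string."""
    return DICTIONARY[name](raw_record.split(AM))

order = AM.join(["WIDGET", "1000", "200"])
gross = evaluate("GROSS", order)
print(gross)
```

The record itself holds only the three stored attributes; GROSS exists only in the dictionary, much as a formula cell exists only in the spreadsheet.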


Records

Records store the actual data in the database. In relational terminology a record would be a row, but there are no named columns - data is accessed by attribute position. Nothing in the system enforces that a particular attribute is used in the same way in every record, which makes Pick files somewhat similar to NoSQL/schemaless databases. Records are held in the file's buckets, and a record is located by hashing its item id (primary key) to find the bucket that holds it. HashingFileSystem has more information on how this is done. The choice of IDs, along with the modulo/separation of the file, can have significant performance impacts, and can be difficult to change once the system has been in use for a while.
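
The lookup described above can be sketched with a toy hashed file in Python. The hash function here (sum of character codes modulo the bucket count) is an assumption chosen for readability, not the algorithm of any real Pick implementation, and separation is not modelled at all. The class and function names are hypothetical.

```python
def bucket_for(item_id, modulo):
    """Toy hash: sum of the character codes, reduced modulo the bucket count."""
    return sum(ord(ch) for ch in item_id) % modulo

class HashedFile:
    """A file as a fixed number of buckets; each bucket maps item id -> record."""

    def __init__(self, modulo):
        self.modulo = modulo
        self.buckets = [dict() for _ in range(modulo)]

    def write(self, item_id, record):
        self.buckets[bucket_for(item_id, self.modulo)][item_id] = record

    def read(self, item_id):
        # Only one bucket is searched, however many records the file holds.
        return self.buckets[bucket_for(item_id, self.modulo)].get(item_id)

orders = HashedFile(modulo=7)
for item_id in ("1001", "1002", "1003"):
    orders.write(item_id, "record for " + item_id)

print([bucket_for(i, 7) for i in ("1001", "1002", "1003")])
```

With this toy hash, sequential numeric IDs spread across consecutive buckets, whereas IDs made of the same characters in a different order (such as "AB" and "BA") always collide. That is a simplified picture of why the choice of IDs and modulo affects performance.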


Attributes

The most fundamental thing in Pick is the idea of a record (item) consisting of a string of data split by the attribute mark into a number of fields (attributes). Pick uses a dictionary definition to indicate the nature of these fields by reference to their position. The dictionary definitions are, however, indicative and not prescriptive. This is both a weakness and a strength of the Pick data structure, but one can see how the layout somewhat resembles that used in most SQL DBMSs. I should note that in at least one implementation it is possible to overlay an SQL organisation which enforces prescriptive definitions. It is also true to say that many Pick applications enforce this by various means - most of these depend on maintaining a site discipline.


Values

Any attribute can be set up as a string consisting of repeated fields separated by value marks. These are called multivalues and are one of the most flexible, useful, and controversial aspects of the Pick data structure. The use of multivalues is often regarded by relational data advocates as something akin to a heresy because it appears to violate a 'rule' of atomicity of data. If one establishes the basic concept that Pick is a data structure, and not a database, then the difficulty disappears. One is then left with a very compact method of grouping data that is in the nature of a list. There is an inherent order in the list which may or may not be meaningful. My own view is that it is unwise in practice to assume the order of the list means anything.
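
A multivalued attribute can be addressed by attribute number and value number. The Python sketch below does this with 1-based positions; the extract helper, its convention of returning an empty string for a missing position, and the sample record are assumptions made for this illustration.

```python
AM = chr(254)  # attribute mark
VM = chr(253)  # value mark

def extract(raw, attr_no, value_no=0):
    """Return attribute attr_no, or one value within it (1-based positions).

    value_no=0 means "the whole attribute". Missing positions give ''.
    """
    attributes = raw.split(AM)
    if not 1 <= attr_no <= len(attributes):
        return ""
    attribute = attributes[attr_no - 1]
    if value_no == 0:
        return attribute
    values = attribute.split(VM)
    if not 1 <= value_no <= len(values):
        return ""
    return values[value_no - 1]

# Hypothetical record: a name, then a multivalued list of phone numbers.
raw = "SMITH" + AM + "555-1234" + VM + "555-9876"
second_phone = extract(raw, 2, 2)
print(second_phone)
```

A single-valued attribute is just the case where the list has one entry, so extract(raw, 1) and extract(raw, 1, 1) return the same thing.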

A single-valued attribute is closely comparable to an attribute in XML. One with multiple values in it compares to repeated elements in XML. In the relational world, these correspond somewhat to simple and non-simple domains.


Associated Values

A further subtlety is to link a number of attributes that are multivalued. These are called associated multivalues and are a way to define data that is tabular in nature. In this case the position of each value has meaning, in that the 'nth' value in each attribute is linked with the 'nth' value in the others as part of a meaningful set of data. A good practical example is using associated multivalues to store the lines of an invoice in the same record as the invoice header information. This somewhat resembles object data, but is more usually referred to as 'nested relational'. It compares with an XML element enclosing repeated sets of other elements.
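
The invoice example can be sketched in Python as follows. The record layout (customer name in attribute 1, then product, quantity and price as three associated multivalued attributes) and the invoice_lines helper are hypothetical, chosen only to show how the 'nth' values line up.

```python
AM = chr(254)  # attribute mark
VM = chr(253)  # value mark

# Hypothetical invoice: header in attribute 1, line data in attributes 2 to 4.
raw = AM.join([
    "ACME LTD",                    # attribute 1: customer (header)
    VM.join(["WIDGET", "GADGET"]), # attribute 2: product per line
    VM.join(["3", "1"]),           # attribute 3: quantity per line
    VM.join(["250", "1200"]),      # attribute 4: price per line
])

def invoice_lines(raw_record):
    """Pair up the nth value of each associated attribute into one line."""
    attrs = raw_record.split(AM)
    products = attrs[1].split(VM)
    quantities = attrs[2].split(VM)
    prices = attrs[3].split(VM)
    return list(zip(products, quantities, prices))

lines = invoice_lines(raw)
print(lines)
```

Each tuple produced is one invoice line, so the header and its lines travel together in a single record.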


Subvalues

Pick allows each value to be further divided into subvalues. In my experience, this is not at all common - mainly because the enquiry language has no way to use this data. I have seen it used in defining complex structures used to drive specific '4GL' tools. The only other time I have seen this used (in a way I felt made sense) was in a product called ALL where an attribute could contain an entire table. The value marks were used to separate table rows and the subvalue marks were analogous to attribute marks in the standard Pick data structure. I saw this as being effectively the same thing as associated multivalues. It had the benefit of keeping related data close to each other in the physical sense (though you lose the benefit of keeping similar data close to each other). It also might mean that data could be held more compactly, depending on the existence and nature of 'empty' fields that may occur in the data.
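
The table-in-an-attribute layout described above can be sketched in Python like this. It is only an illustration of the idea, not the format used by ALL; the helper names and sample rows are hypothetical.

```python
VM = chr(253)   # value mark: separates table rows
SVM = chr(252)  # subvalue mark: separates the columns within a row

def table_from_attribute(attribute):
    """Split one attribute into rows (values) of columns (subvalues)."""
    return [row.split(SVM) for row in attribute.split(VM)]

def attribute_from_table(rows):
    """Pack rows of columns back into a single attribute string."""
    return VM.join(SVM.join(row) for row in rows)

rows = [["WIDGET", "3", "250"], ["GADGET", "1", "1200"]]
attribute = attribute_from_table(rows)

# Transposing the rows gives the column-wise layout that associated
# multivalues would use for the same data.
columns = [list(col) for col in zip(*rows)]
print(columns)
```

The row-wise form keeps each line's fields adjacent, while the column-wise form keeps all products together, all quantities together, and so on. That is the physical trade-off mentioned above.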


Foreign Key Fields

The enquiry language in Pick (this goes by various names, but I will here use the R83 name 'Access') allows for a great deal of subtlety and power by means of complex dictionary definitions. Among the most useful mechanisms is the ability to refer to the contents of other files by using attributes or multivalues from one file to refer to keys (or parts of keys) in other files. In Pick terminology, these are referred to as 'T-correlatives'. Dictionary definitions used by Access can contain a number of other correlatives. They can also use other things called 'conversions' (e.g. converting a day number to a date). These mechanisms provide very sophisticated means to extract and format data for reporting purposes. This is a large and somewhat specialised part of the Pick system, so I won't go into any more detail.
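
As a rough sketch of the two mechanisms just mentioned, the Python below looks up an attribute in another file by key and converts a day number to a date. Pick counts days from 31 December 1967 (day 0). The translate and day_number_to_date helpers, the in-memory CUSTOMERS file and the sample invoice are hypothetical, and real correlatives and conversions have many more options than are shown here.

```python
from datetime import date, timedelta

PICK_DAY_ZERO = date(1967, 12, 31)  # internal day number 0

def day_number_to_date(day_number):
    """Conversion sketch: internal day number -> calendar date."""
    return PICK_DAY_ZERO + timedelta(days=day_number)

# Hypothetical CUSTOMERS file: item id -> list of attributes.
CUSTOMERS = {"C100": ["ACME LTD", "LONDON"]}

def translate(file, key, attr_no):
    """Foreign-key sketch: fetch attribute attr_no (1-based) of the keyed record.

    Returns '' when the key or the attribute does not exist.
    """
    record = file.get(key)
    if record is None or not 1 <= attr_no <= len(record):
        return ""
    return record[attr_no - 1]

# Hypothetical invoice: attribute 1 is a customer key, attribute 2 a day number.
invoice = ["C100", "10000"]
customer_name = translate(CUSTOMERS, invoice[0], 1)
invoice_date = day_number_to_date(int(invoice[1]))
print(customer_name, invoice_date.isoformat())
```

The invoice stores only the key and the day number; the customer name and the readable date are derived at query time, which is the role the dictionary definitions play in Access.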