UniVerseDistributedFiles

From Pickwiki

Distributed Files on UniVerse

This document explicitly refers to UniVerse. Similar facilities pertain on UniData (and Pick etc), but this particular facility seems to be unique to UniVerse (and INFORMATION, from which it was copied).

This technique is best used either when there is a lot of data and you are likely to break various file limits such as the two-gig limit, or when data access tends to be very "clumpy" based on key, for example accounting data where you can include the year in @ID.

A distributed file is composed of one or more partfiles. A partfile is (almost) just an ordinary UV file. The difference is just that a partfile contains, in its header, an i-descriptor and a part number. The only part of the data record that the i-descriptor is permitted to reference is its @ID, and if part number and i-descriptor result don't match, a read or write error results. The i-descriptor can't reference the record data, because how would it work out where to read the record from?
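
The part-number check can be pictured with a toy model. This is a minimal Python sketch, not UniVerse code: the PartFile class and the mod3 function are made up for illustration, and the real check is done by UniVerse itself using the i-descriptor in the file header.

```python
# Toy model of a partfile: NOT UniVerse code, just the rule described above.
class PartFile:
    def __init__(self, part_no, algorithm):
        self.part_no = part_no        # part number stored in the header
        self.algorithm = algorithm    # stands in for the i-descriptor; sees only the key
        self.records = {}

    def write(self, record_id, data):
        # The algorithm is applied to the key alone, never to the record data.
        if self.algorithm(record_id) != self.part_no:
            raise ValueError(f"{record_id!r} does not belong in part {self.part_no}")
        self.records[record_id] = data


def mod3(record_id):
    return int(record_id) % 3         # hypothetical algorithm, like MOD(@ID,3)


part1 = PartFile(1, mod3)
part1.write("4", "ok")                # 4 mod 3 == 1, accepted
try:
    part1.write("5", "rejected")      # 5 mod 3 == 2, wrong part
    wrote_wrong_part = True
except ValueError:
    wrote_wrong_part = False
```

Because the algorithm only ever sees the key, the same function can also tell a reader which part to look in before any record has been fetched.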

You should always use the DEFINE.DF command to manipulate distributed files in any way, shape or form. Just like with UV indexes, you can get into trouble if you don't understand what is going on "under the bonnet", and UV keeps track of distributed files using an &PARTFILES& file in the UV account. Unlike with indexes, however, all file-integrity data is kept in the file itself, so you can't actually corrupt a file with external tools.

I've found initially creating the distributed file itself a pain - INFORMATION was far more forgiving/intuitive/do-it-for-you.

DEFINE.DF newfile ADDING partfile partno algorithm

"newfile" will now appear to be a perfectly normal F record in the VOC, except you may have to manually create the 3rd, dict line. Or you may find that DEFINE.DF won't work until you've done a CREATE.FILE DICT to get a blank VOC entry to add to :-(

Once you've got one partfile in a distributed file, you can easily add further partfiles with

DEFINE.DF distributed-file ADDING partfile partno

This will make field 2 of the VOC entry into a multi-valued field with a list of the part files.

Note that you can only have one algorithm per distributed file. Be very careful if you add the same partfile to several distributed files - you can force the algorithm to change, and this will give you some seriously confused distributed files!

Now you have the flexibility to access the entire file, any individual partfile, or any subset of partfiles defined by a VOC entry, and from a user or programming point of view you don't need to know anything about the underlying file (other than that WRITEs can fail if the @ID is illegal for the relevant part you're accessing).


When designing a distributed file, be very careful about your algorithm, especially if the reason you are using a distributed file is to reduce its size to something manageable!

We had an enquiries file which recorded all of the enquiries coming into our customer services department. And we wanted the data online for analysis for whatever ad-hoc reasons our research departments could think up.

What we did was to put the year into the key. The first year was 86, so we subtracted 85 to give us a part number >= 1. As records were added they were given a sequential number as key, so we also had the case that if no year was found the part number was returned as 0.
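
As a rough illustration of that algorithm, here is a Python sketch. The key layout is an assumption (I've used "YY*sequence" for archived records and a bare sequence number for new ones, since the exact format isn't given here), and part_number is a made-up name, not the actual i-descriptor.

```python
def part_number(record_id, first_year=86):
    """Return the part number for a key; 0 means 'no year in the key yet'."""
    # Assumed key layout: "YY*sequence" once archived, bare "sequence" before that.
    year, sep, _ = record_id.partition("*")
    if not sep or not year.isdigit():
        return 0
    return int(year) - (first_year - 1)   # 86 -> 1, 87 -> 2, ...


current = part_number("1001")        # no year in the key: part 0
first = part_number("86*1001")       # first archived year: part 1
later = part_number("90*1001")       # part 5
```

The useful property is that adding a partfile for a new year never changes the answer for any key that already exists.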

Every year, as we wanted to archive the records, we simply added a new partfile for the year just gone, and renamed the year's records to include the year. We could have done a select and rename on the entire distributed file, or selected partfile 0 and then done a rename on the entire file. What I think we actually did was write a BASIC program that selected partfile 0, then wrote the records with their new key to the new partfile before deleting them from the original.
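
The shape of that yearly job, sketched in Python with dictionaries standing in for the partfiles. This is not the original BASIC program; archive_year and the "YY*sequence" key layout are assumptions made for the example.

```python
def archive_year(part0, new_part, year):
    """Move every record out of part 0 into new_part, prefixing the key with the year."""
    # list() takes a snapshot of the keys, like a select list, so we can delete as we go.
    for old_id in list(part0):
        new_id = f"{year}*{old_id}"       # assumed new key layout, not from the source
        new_part[new_id] = part0[old_id]  # write under the new key first...
        del part0[old_id]                 # ...then delete the original


part0 = {"1001": "enquiry A", "1002": "enquiry B"}
part_86 = {}
archive_year(part0, part_86, 86)
```

Writing before deleting means an interrupted run leaves a duplicate rather than a lost record.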


A poster to the Oliver lists had a slightly different problem. He wanted to resize his entire file, and had difficulty finding a suitably large "window of opportunity". It seems he had three partfiles, and was going to split each partfile in half. I will assume here that the files were called ENTIREDF, PART0, PART1 and PART2, and also that the algorithm was "MOD(@ID,3)". He actually found a window that enabled him to do it all in one hit, but my suggestion was that he do it in parts as follows:

CREATE.FILE PART3

Create a NEW-IDESC of "MOD(@ID,6); IF @1 GT 3 THEN @1 - 3 ELSE @1"

SELECT PART0 WITH NEW-IDESC NE 0

COPY FROM PART0 TO PART3 DELETING

DEFINE.DF ENTIREDF ADDING PART3 3 NEW-IDESC FORCE

Note that the new algorithm returns exactly the same result as the old one for every record in PART1 and PART2, and we've moved into PART3 all the records from PART0 whose result will have changed. Then the DEFINE.DF adds the new partfile, at the same time forcibly changing the algorithm for all partfiles. PART1 and PART2 can be dealt with the same way in the next window.
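
That claim is easy to check outside UniVerse. The Python sketch below models the two algorithms on numeric keys, assuming the new expression leaves any result of 3 or less unchanged; old_algorithm and new_algorithm are illustrative names, and the key range is arbitrary.

```python
def old_algorithm(record_id):
    return record_id % 3                 # MOD(@ID,3)


def new_algorithm(record_id):
    r = record_id % 6                    # MOD(@ID,6)
    return r - 3 if r > 3 else r         # IF @1 GT 3 THEN @1 - 3, otherwise unchanged


ids = range(1, 601)
moved = [i for i in ids if old_algorithm(i) != new_algorithm(i)]

# Every key whose result changed lived in PART0 and now belongs in PART3.
assert all(old_algorithm(i) == 0 and new_algorithm(i) == 3 for i in moved)
# Keys in PART1 and PART2 are untouched.
assert all(new_algorithm(i) == old_algorithm(i) for i in ids if old_algorithm(i) != 0)
```

Only the keys where MOD(@ID,6) is 3 move, which is half of PART0 and nothing else, so each window only has to touch one partfile.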


The main point to address here is that if you expect the file to keep growing, you want an algorithm that you can extend to new partfiles without changing the result for existing partfiles. Otherwise, the time window required for any necessary file maintenance is likely to grow, while the modern trend seems to be towards ever-shorter windows in which to actually do anything.