VSAM space allocation Q&A
 
By Dan Janda

How can I predict the number of logical records within a Control Interval (CI)?

First, we have to know what goes into Control Intervals (CIs).

VSAM Data Control Intervals contain:

Logical Records
Logical records are loaded into a CI beginning at its left (or low addressed) end.
Control Information
At the right (or high addressed) end, a four-byte Control Interval Definition Field (CIDF) is inserted. This principally indicates the position and size of the free space, if any, within the CI.

Following the CIDF (immediately to its left, or lower in address) will be the first Record Descriptor Field (RDF). This describes the length of the first (leftmost, lowest addressed) logical record within the CI.

If there is a second logical record within the CI, there will be a second RDF. If the second logical record is the same length as the first, then the second RDF will be a counter indicating that the second record (and third, etc.) is the same length as the preceding record. On the other hand, if the second record is of a different length than the first, a normal RDF will be used specifying the length of the second record. This same decision is made as each additional logical record is added -- either update the counter if the new logical record is the same length, or add a new normal RDF.
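The RDF rule just described can be sketched in a few lines of Python. This is only one reading of the rule, not VSAM's actual code: the function name count_rdfs is made up for illustration, and it assumes that a counter RDF is added at the first repeat of a length and merely updated for further repeats.

```python
def count_rdfs(record_lengths):
    """Count the RDFs needed for the records stored in one CI.

    Assumption: a new length gets a normal RDF, the first repeat of that
    length adds one counter RDF, and later repeats only update the counter.
    """
    rdfs = 0
    previous = None
    counter_exists = False  # True once a counter RDF covers the current run
    for length in record_lengths:
        if length != previous:
            rdfs += 1              # normal RDF holding this record's length
            counter_exists = False
        elif not counter_exists:
            rdfs += 1              # counter RDF added for the first repeat
            counter_exists = True
        # otherwise the existing counter RDF is updated, nothing is added
        previous = length
    return rdfs


print(count_rdfs([200] * 17))        # 2
print(count_rdfs([100, 200, 300]))   # 3
```

Fixed-length records therefore never need more than two RDFs, while records that all differ in length need one RDF each.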

In the case of spanned records, the RDF will refer to the record segment contained within the CI.

Free Space

Finally, there may be free space within the CI, between the right end of the last logical record and the left end of the last RDF. Again, the CIDF indicates the position and length of the free space within the CI.

The minimum amount of free space within a CI can be specified in the DEFINE CLUSTER command's FREESPACE parameter. It indicates the minimum percentage of the CISIZE (rounded up to the next whole byte) that is to be left free during sequential insertion processing (e.g. during file load, extend, or during mass sequential insertions of a series of records).

For example,

CISIZE(4096)
Logical record length = 200 bytes fixed length
FRSPC(10 10)

Then:
There will be one CIDF of 4 bytes, and two RDFs of 3 bytes each, 
for a total of 10 bytes of control information.
There will be 10% of 4096 = 410 bytes (409.6 rounded up) of reserved free space.
There will be 420 bytes of CI space that cannot be loaded with
logical records... and 3676 bytes that can be loaded with logical records.
3676 / 200 = 18 logical records (rounded down) will be inserted in each CI being loaded.
4086 - 3600 = 486 bytes of free space will exist in each CI being loaded.

In the case of variable length records, your ability to predict the number of logical records inserted into a CI depends on your knowledge of the sizes of the records inserted into the specific CI. Even if other records in other CIs happen to have differing lengths, the records being inserted into a particular CI might be of one length, so you can set bounds on the number of RDFs in a CI -- one if only one record is stored in a given CI, two if more than one, and as many as there might be logical records.

In the example case above, suppose the records were of variable length, but averaged 200 bytes in length. Then I could need as many as 20 RDFs (one for each possible record that could be inserted). In this case, the potential 18 additional RDFs would occupy at most 54 additional bytes of control information, and the total number of records inserted and number of additional records that could be directly inserted into the available free space would be unchanged.

No -- I didn't give you a formula to figure this out. I did show you what VSAM inserts and how big it is, so you can figure it out yourself.

Still want a formula? For fixed-length logical records:


NumRecs = Floor((CISize - 10 - Ceil(FRSPC% * CISize)) / RecSize)

Where:
NumRecs == the number of logical records to be inserted at load
CISize  == the defined Control Interval Size
10      == constant for sum of length of CIDF + 2 RDFs
Floor   == function that returns the integer value rounded down
Ceil    == function that returns the integer value rounded up
FRSPC%  == the defined Control Interval Freespace percentage
RecSize == the actual record size
 

This works for all cases when more than one logical record is inserted into a CI. When only one logical record is inserted, the constant 10 above should be changed to 7 (one CIDF plus one RDF).
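For those who would rather run the formula than work it by hand, here is a minimal Python sketch. The function names are illustrative only. It assumes the free space percentage is given as a whole number (10 for 10%) and that the quotient is rounded down to whole records. The second function gives the upper bound on RDFs for variable-length records, on the assumption that every record needs its own 3-byte RDF.

```python
CIDF_BYTES = 4  # Control Interval Definition Field
RDF_BYTES = 3   # each Record Descriptor Field


def ceil_div(numerator, denominator):
    """Integer division rounded up (avoids floating-point rounding)."""
    return -(-numerator // denominator)


def records_per_ci(ci_size, record_size, freespace_pct):
    """Fixed-length records loaded into each CI, per the formula above."""
    reserved = ceil_div(ci_size * freespace_pct, 100)
    two_rdf_overhead = CIDF_BYTES + 2 * RDF_BYTES   # the constant 10
    one_rdf_overhead = CIDF_BYTES + RDF_BYTES       # the constant 7
    count = (ci_size - two_rdf_overhead - reserved) // record_size
    if count >= 2:
        return count
    # At most one record fits, so only one RDF is needed: retry with 7.
    single = (ci_size - one_rdf_overhead - reserved) // record_size
    return max(0, min(1, single))


def max_rdfs(ci_size, avg_record_size):
    """Upper bound on RDFs if every record needs an RDF of its own."""
    return (ci_size - CIDF_BYTES) // (avg_record_size + RDF_BYTES)


print(records_per_ci(8192, 300, 20))   # 21
print(max_rdfs(4096, 200))             # 20
```

With a 4096-byte CI and records averaging 200 bytes, max_rdfs returns 20, which matches the "as many as 20 RDFs" in the variable-length discussion: 18 more RDFs than the fixed-length case, or 54 more bytes of control information.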

It has been my experience that application designers and programmers do not know the actual average logical record length nor the distribution of those lengths unless they actually have built the system and measured the result.

How can I predict the number of Control Intervals within a Control Area (CA)?

First, we have to know what goes into Control Areas (CAs).

VSAM Data Control Areas contain:

Loaded Control Intervals
Control Intervals are loaded with logical records beginning at the CA's left (or low addressed) end.
Free Control Intervals
We can calculate the number of CIs within a CA that will be left free at load time if we know the number of CIs in the CA and the CA Freespace percentage.

NumFree = Ceil(CISinCA * CAFRSPC%)

Where:
NumFree  == Number of CIs not initially loaded
Ceil     == function that returns the integer value rounded up
CISinCA  == Number of CIs in the CA
CAFRSPC% == Defined CA free space percentage
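As a quick sketch of the NumFree calculation in Python (the function names are made up here, and the percentage is assumed to be a whole number such as 10 for 10%). The second function is simply the CIs in the CA minus the free ones, which is the count of CIs that actually receive records at load time.

```python
def free_cis_per_ca(cis_in_ca, ca_freespace_pct):
    """NumFree = Ceil(CISinCA * CAFRSPC%), using integer arithmetic."""
    return -((-cis_in_ca * ca_freespace_pct) // 100)


def loaded_cis_per_ca(cis_in_ca, ca_freespace_pct):
    """CIs in each CA that are loaded with records at load time."""
    return cis_in_ca - free_cis_per_ca(cis_in_ca, ca_freespace_pct)


print(free_cis_per_ca(180, 10))     # 18
print(loaded_cis_per_ca(180, 10))   # 162
```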

To determine the number of CIs in a CA, you first need to know the CA Size.

The CA Size is the smallest of:

  • The primary space allocation amount
  • The secondary space allocation amount
  • The size of one cylinder

If allocations are specified in cylinders, or (in VSE's implementation) if the amount of track or record allocations (both primary and secondary amounts) are at least the size of one cylinder, then the CA size will be one cylinder.

You need to know the physical characteristics of the disk device being used (or simulated, in the case of virtual disks or Multiprise Internal Disk subsystems). For example, a 3390 device contains 15 tracks per cylinder, with room on each track for:

  • 2 26KB CIs
  • 6 8KB CIs
  • 12 4KB CIs
and so on. For details about the number of CIs that can be contained on one cylinder (the maximum CA size), refer to the VSE/VSAM User's Guide manual.
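Putting the CA size rule and the 3390 figures together, a rough Python sketch might look like this. It covers only the three CI sizes listed above, takes allocations in tracks, and assumes the physical record size equals the CI size. The names are illustrative, and any other device or CI size would need figures from the manual.

```python
TRACKS_PER_CYLINDER_3390 = 15

# CIs per 3390 track, from the list above (CI size in bytes -> CIs per track).
CIS_PER_TRACK_3390 = {26 * 1024: 2, 8 * 1024: 6, 4 * 1024: 12}


def ca_size_in_tracks(primary_tracks, secondary_tracks):
    """The CA is the smallest of primary, secondary and one cylinder."""
    return min(primary_tracks, secondary_tracks, TRACKS_PER_CYLINDER_3390)


def cis_per_ca(ci_size, primary_tracks, secondary_tracks):
    """Number of CIs in each CA for one of the listed CI sizes."""
    tracks = ca_size_in_tracks(primary_tracks, secondary_tracks)
    return tracks * CIS_PER_TRACK_3390[ci_size]


print(cis_per_ca(4096, 90, 45))   # 180: a full cylinder of 4KB CIs
print(cis_per_ca(8192, 5, 3))     # 18: the CA is only 3 tracks
```

The second call shows why small secondary allocations matter: a 3-track secondary amount shrinks the CA to 3 tracks.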

How can I predict the amount of space my file will need?

First, please read the two preceding Question and Answer discussions, which cover the number of logical records within a CI and the number of CIs within a CA.

With that background, we can make an accurate estimate of the amount of disk space that VSAM will use when loading a file.

Next, calculate the number of CIs that will need to be loaded to contain the logical records in your file:


NumCIs = Ceil(NumLogRecs / RecsInCI)

Where:
NumCIs      == Number of CIs initially loaded
Ceil        == function that returns the integer value rounded up
NumLogRecs  == Number of Logical Records to be loaded
RecsInCI    == Number of Logical Records loaded into a CI

To determine the number of Logical Records to be loaded into a CI, see the first question above.

Now we know how many CIs will need to be loaded with records. To calculate the number of CAs that will need to be loaded to contain the CIs for your file:


NumCAs = Ceil(NumCIs / LoadCIsInCA)

Where:
NumCAs      == Number of CAs initially loaded
Ceil        == function that returns the integer value rounded up
NumCIs      == Number of CIs initially loaded
LoadCIsInCA == Number of CIs to be loaded in each CA

To determine the number of CIs to be loaded in each CA, see the second question above: subtract the number of free CIs (NumFree) from the number of CIs in the CA.
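The two formulas chain together, so a small Python sketch may help. The function name and the sample figures are made up for illustration; RecsInCI and LoadCIsInCA are assumed to have been worked out beforehand.

```python
def ceil_div(numerator, denominator):
    """Integer division rounded up."""
    return -(-numerator // denominator)


def estimate_loaded_cas(num_log_recs, recs_in_ci, load_cis_in_ca):
    """Return (NumCIs, NumCAs) for an initial load."""
    num_cis = ceil_div(num_log_recs, recs_in_ci)
    num_cas = ceil_div(num_cis, load_cis_in_ca)
    return num_cis, num_cas


# 100,000 records, 21 records per CI, 162 CIs loaded per CA
print(estimate_loaded_cas(100000, 21, 162))   # (4762, 30)
```

If the CA size is one cylinder, the NumCAs result is also the number of cylinders the data component needs for the initial load.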

You can check the LISTCAT output after loading a file to determine if your calculations were correct:

Validate the following fields:

  • Data Component ALLOCATION: HI-USED-RBA
  • Data Component ATTRIBUTES: CISIZE
  • Data Component ATTRIBUTES: CI/CA

Multiply the CI values (CISIZE and CI/CA) to determine the CA SIZE in bytes.

Divide the ALLOCATION HI-USED-RBA by the CA SIZE calculated above to determine the number of CAs initially loaded.
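The LISTCAT check above is a multiplication and a division, sketched here in Python. The function name is illustrative and the sample values are made up, not taken from a real listing. The remainder is returned as well, so you can see whether HI-USED-RBA falls on a CA boundary.

```python
def cas_from_listcat(hi_used_rba, cisize, ci_per_ca):
    """Return (CA size in bytes, whole CAs used, leftover bytes)."""
    ca_size = cisize * ci_per_ca
    whole_cas, leftover = divmod(hi_used_rba, ca_size)
    return ca_size, whole_cas, leftover


# CISIZE 4096, CI/CA 180, HI-USED-RBA 22118400
print(cas_from_listcat(22118400, 4096, 180))   # (737280, 30, 0)
```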

If this is not what you estimated, check to ensure that the following values are as intended:

Data Component ATTRIBUTES: AVGLRECL
This value is only meaningful if you specified space allocation (SPACE-TYPE) in RECORDS.
Data Component ATTRIBUTES: CISIZE
This should be as you specified, unless VSAM was limited because you specified BUFSPACE too small. If this occurred, change your DEFINE parameters for this file to omit the BUFSPACE value or specify a larger BUFSPACE value -- this is particularly useful for VSAM Base Clusters accessed via Alternate Indexes.
Data Component STATISTICS: REC-TOTAL
Were the intended number of records loaded? Often a significant difference can be found when comparing the number of records actually loaded to the design specifications.
Data Component STATISTICS: FREESPACE-%CI
This should be as you specified. It affects the number of logical records loaded per CI -- More FREESPACE means fewer logical records inserted into each CI during initial load.
Data Component STATISTICS: FREESPACE-%CA
This should be as you specified. It affects the number of CIs initially loaded per CA -- More FREESPACE means fewer CIs in each CA will have records inserted during initial load.
Data Component ALLOCATION: SPACE-TYPE
This should be as you specified -- RECORDS, BLOCKS, TRACKS or CYLINDERS. It affects the CA Size.
Data Component ALLOCATION: SPACE-PRI
This should be as you specified. It affects the CA Size and number of allocations.
Data Component ALLOCATION: SPACE-SEC
This should be as you specified. It affects the CA Size and number of allocations.
Data Component VOLUME: VOLSER
Is the volume you intended to use being used?
Data Component VOLUME: DEVTYPE
Is the device type what you intended to use?
Data Component VOLUME: PHYREC-SIZE
Is the physical record size VSAM chose as you expected? Sometimes VSAM will choose a physical record size smaller than your CI size to use track space more completely. This may be an indication that you should review the CI size you chose.
Data Component VOLUME: PHYRECS/TRK
Is the number of physical records per track as expected?
Data Component VOLUME: TRACKS/CA
If your file is of any size, this value should be equal to the number of tracks per cylinder. If this is not the case, it probably means your primary or secondary allocation amount is smaller than one cylinder.
Data Component VOLUME: EXTENT-NUMBER
If the number of extents is large, it may be that the disk space where the file was allocated has been fragmented by other files' DEFINE and DELETE activity. Physical Space defragmentation may be desirable, or the allocation amounts defined for the file may be smaller than optimum.

Extract of sample LISTCAT data

CLUSTER ------- TST.FILE
   . . .
DATA ------- TST.FILE.DATA
   . . .
  ATTRIBUTES
    KEYLEN----------------44  AVGLRECL-------------505  BUFSPACE-----------10240  CISIZE--------------4096
    RKP--------------------0  MAXLRECL-------------505  EXCPEXIT----------(NULL)  CI/CA----------------180
    SHROPTNS(2,3)  NORECOVERY  SUBALLOC  NOERASE  INDEXED  NOWRITECHECK  NOIMBED  NOREPLICAT
    UNORDERED  NOREUSE  NONSPANNED  NONRECVABLE
  STATISTICS
    REC-TOTAL-------------13  SPLITS-CI--------------0  EXCPS-----------------28
    REC-DELETED------------0  SPLITS-CA--------------0  EXTENTS----------------2
    REC-INSERTED-----------0  FREESPACE-%CI----------0  SYSTEM-TIMESTAMP:
    REC-UPDATED------------0  FREESPACE-%CA----------0       88.194 08:38:01
    REC-RETRIEVED----------0  FREESPC-BYTES----------0       X'9ECD8557B1912C10'
  ALLOCATION
    SPACE-TYPE------CYLINDER
    SPACE-PRI--------------6  USECLASS-PRI-----------0  HI-ALLOC-RBA-------94208
    SPACE-SEC--------------3  USECLASS-SEC-----------0  HI-USED-RBA--------94208
  VOLUME
    VOLSER------------SIN238  PHYREC-SIZE---------4096  HI-ALLOC-RBA-------47104  EXTENT-NUMBER----------1
    DEVTYPE-------------3380  PHYRECS/TRK-----------12  HI-USED-RBA--------47104  EXTENT-TYPE--------X'00'
    VOLFLAG------------PRIME  TRACKS/CA-------------15
    LOW-KEY---------------00
    HIGH-KEY--------------3F
    HI-KEY-RBA----------5120
    EXTENTS:
    LOW-CCHH-----X'00000002'  LOW-RBA----------------0  TRACKS-----------------3
    HIGH-CCHH----X'00000004'  HIGH-RBA-----------47103
  VOLUME
   . . .
INDEX ------ TST.FILE.INDEX
   . . .

Why does my catalog keep running out of space?

There are several unrelated conditions that can cause a catalog to become full.

The simple cases occur when there are many datasets defined in a catalog, but the vexing case occurs when there are relatively few (perhaps fewer than a hundred) datasets defined within the catalog.

Note that this may be essentially the same problem as discussed in a later question about why files may take much more space than estimated.

First, let's dispose of the cases from most obvious to least obvious:

  • An additional secondary allocation is needed for the catalog, but no space is available...

    Well, you may need to make more space available -- it may be that the volume on which the catalog was allocated became full of extents belonging to other datasets.

    To correct this, you may be able to use IDCAMS DELETE to remove datasets that are no longer needed, thus freeing space for the catalog itself to grow.

    BUT... it may be that your catalog is very dynamic and uses cluster or component names that are system or application generated, and much more space is needed than the number of datasets would seem to demand. If this is the case, see the third item in this answer.

  • An additional secondary allocation is needed for the catalog, but sixteen extents have already been allocated...

    You'll have to redefine the catalog with larger allocation amounts, since sixteen extents of the size you've allocated is not enough space for the catalog itself.

    Again, it is possible that the additional allocations are needed, not because of the number of files or datasets defined in the catalog, but because the cluster and/or component names are continually increasing in value. If this is the case, you should see the next item in this answer.

  • One of the above cases seems to be indicated, but only a few datasets are defined in this catalog.

    The catalog may have been in use for some time, DEFINEing and DELETEing datasets, and there are never more datasets defined than could easily fit into the catalog space available.

    You've run into the problem that happens because a catalog is, in fact, a VSAM KSDS. When a KSDS is loaded, a range of keys is allocated to each control interval, and that space within each CI will not be used for records with keys outside that range, even if there are no records left in the control interval. Within a catalog, two keyranges are defined.

    • The higher-valued keyrange encompasses keys beginning with x'40' through x'FF', and records with these keys are sometimes referred to as "true name" records. The record key is in fact the name of the VSAM object -- dataset or cluster, component, volume, etc. It is within this keyrange that the usual problems occur.
    • The lower-valued keyrange encompasses keys beginning with x'00' through x'3F', and the records within this range contain the majority of the information maintained by VSAM in a catalog, such as extent information, volume space maps, details of datasets, and so on.

    This problem occurs regularly when you allow the names of a VSAM data set's components (the data and index components) to default to VSAM generated names, or when you specifically create VSAM data set names with similar characteristics. The default names are created from the system's Time-of-Day Clock, and so each time a data set is deleted and defined, new and higher-valued names are generated. The old component records and their names may be deleted, but the space in the control intervals cannot be reused by the new (higher-key-valued) records being added to the catalog. Many automatic processes tend to use some sort of automatic, date and time based name generation, so that the names generated will always be unique.

    One widely used application in VSE environments is the report distribution system, INFOPAC, by Mobius Management Systems, Inc. This application gives you an install time option to generate its names using DATE/TIME or TIME/DATE ordering. By letting the TIME component of its name be the more significant part of the dataset name, you will tend over time to recycle through the key ranges and reclaim space within the catalog. The DATE/TIME ordering will cause the name values to continually increase, and the catalog space used to manage deleted datasets will never be reused.

    I mentioned INFOPAC here only because its developers were aware of this VSAM characteristic and give the user an option to minimize its impact. You can use the same or a similar procedure to allow your dataset names to cycle through the same range of key values by using similar techniques.

    Recently (early 2002) there were several threads on the VSE-L list which discussed aspects of this, including JCL routines to automatically generate dataset names for automatic FTP processes and the like.

    Note that even if you've re-used the same -- identical -- cluster (or dataset) names over and over, if you have not explicitly defined the component names for the cluster, you can encounter the same problem, as VSAM generated names for those components will continually increase and eventually cause the same problem.

    What should you do about this, when you cannot change the way the names are created?

    One thing that's simple to do is to exploit the catalog reorganization function of IDCAMS REPRO. When REPRO's input dataset is a catalog, and its output dataset is a sequential file or ESDS, the catalog's records will be unloaded in key sequence. Then, if that sequential file or ESDS is used as the input file, with the catalog itself as the output dataset, the catalog will be reorganized and the space will be reclaimed and made available for additional catalog entries.

    Catalog REPRO is very quick, as it only processes the catalog records, not the datasets defined by those records. The total amount of data involved is small.

    I recommend that you consider this function, and use it regularly to reorganize your catalogs when you have dataset and/or component names which are automatically generated. You will need to quiesce those subsystems which use datasets within the catalog being reorganized, but this should be of small impact since the process can be so quick.

    You might evaluate how much space a catalog needs to have for a month of activity, and then arrange for a weekly reorganization, thus keeping far from the catalog's capacity limits. Similarly, you might consider providing enough space in the catalog for three months' activity, and reorganize the catalog once a month.
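Returning to the DATE/TIME versus TIME/DATE naming choice discussed above, the effect on key order can be shown with a small Python sketch. The name layout (a prefix, then Dyyddd and Thhmmss qualifiers) is made up for illustration and is not INFOPAC's actual format. Only digits differ between the names compared, so ordinary string comparison gives the same order here as the catalog's EBCDIC key order would.

```python
from datetime import datetime


def date_time_name(prefix, stamp):
    """DATE/TIME ordering: the date is the more significant qualifier."""
    return f"{prefix}.D{stamp:%y%j}.T{stamp:%H%M%S}"


def time_date_name(prefix, stamp):
    """TIME/DATE ordering: the time of day is the more significant qualifier."""
    return f"{prefix}.T{stamp:%H%M%S}.D{stamp:%y%j}"


evening = datetime(2002, 2, 14, 17, 0, 0)
next_morning = datetime(2002, 2, 15, 8, 0, 0)

# DATE/TIME names only ever increase, so new keys always land at the high end.
print(date_time_name("RPT", evening) < date_time_name("RPT", next_morning))   # True

# TIME/DATE names wrap around every day, so new keys fall back into key
# ranges used before and can reuse the space freed by deleted entries.
print(time_date_name("RPT", next_morning) < time_date_name("RPT", evening))   # True
```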

How can I defragment the space owned by VSAM?

VSAM Managed Space can become fragmented.

As files are defined, space is allocated to them. Later, files are deleted and the space they occupy becomes free for use by other file allocation requests. Depending on the size and distribution of the free and used space allocations within the VSAM space, fragmentation may occur.

There is no VSAM defragmentation command. Using IDCAMS commands, however, VSAM space can be defragmented:

  1. Use IDCAMS BACKUP to make a copy of all files with extents on the volume containing the space to be defragmented.
  2. Use IDCAMS DELETE CLUSTER to delete all the files that were backed up. The output of the BACKUP command will contain a list of these files.
  3. Use IDCAMS RESTORE to restore all the files that were backed up and deleted.

IDCAMS RESTORE will delete files as they are being restored, but this will only release the space owned by one file at a time, leaving significant space fragmentation. To allow the space to be defragmented, all the files in the space should be deleted before the RESTORE command is processed.

You may want to consider making a second BACKUP because of the (unlikely) possibility that the first BACKUP output copy is unreadable, and you will be DELETEing the files!

Clearly this is fairly simple, and it could be readily automated using a REXX procedure, among other techniques.
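As a small illustration of the automation idea, sketched in Python rather than REXX, the following builds the IDCAMS DELETE statements for step 2 from a list of cluster names. The list is assumed to have been collected by hand from the BACKUP output, and the cluster names shown are made up. The BACKUP and RESTORE steps, and any catalog or password parameters your site needs, are not shown.

```python
def delete_statements(cluster_names):
    """Build one IDCAMS DELETE statement per cluster name.

    Each statement starts with a blank so that the command does not
    begin in column 1 of the control statement.
    """
    return [f" DELETE ({name}) CLUSTER" for name in cluster_names]


backed_up = ["TST.FILE", "PAYROLL.MASTER"]   # hypothetical names
for statement in delete_statements(backed_up):
    print(statement)
```

Generating the DELETE statements from the same list that was backed up helps ensure that nothing is deleted without a backup copy.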

Why does my file keep running out of space?

There are several unrelated conditions that can cause a file to need more space than estimated.

Let's review the cases from most obvious to least obvious:

  • An additional secondary allocation is needed for the file, but no space is available...

    Well, you may need to make more space available -- it may be that the volume(s) on which the file was allocated became full of extents belonging to this and/or other datasets.

    To correct this, you may be able to use IDCAMS DELETE to remove datasets that are no longer needed, thus freeing space for the file to grow. You may also be able to allocate additional volumes for the file in question -- the IDCAMS ALTER datacomponentname ADDVOLUMES... command function can do this for you.

    BUT... it may be that:

    • Your file is very dynamic and uses keys that are system or application generated in a continually increasing manner
    • Your file has a high percentage of CI and/or CA freespace defined at load time
    • The insert activity for the file causes CI and CA splits, which essentially add additional freespace, but that additional space is not usually needed for additional inserts that would fall into the same range of keys as the CI where the split was triggered.

    Thus there is much more space required than the number of records would seem to demand.

  • An additional secondary allocation is needed for the file, but no more extents can be allocated since 123 extents have already been allocated...

    You'll have to redefine the file with larger allocation amounts, since 123 extents of the size you've allocated is not enough space for the file.

    Again, it is possible that the additional allocations are needed, not because of the number of records in the file, but because the allocation amounts and other parameters that control how VSAM allocates and uses space for files are set improperly.

    The most common case for this seems to occur because a file has grown beyond its initial design. You should ensure that primary and secondary space allocation amounts are appropriate for the actual file's requirements. It is often the case that secondary space allocation amounts are too small. Consider ensuring:

    • Primary and secondary allocation amounts should be at least one cylinder in size for any file larger than a few tracks in size.
    • Primary and secondary allocation amounts might be chosen to be on the order of 1/10th of the total size of the file for files less than a volume in size.
    • Remember that the first allocation for a file on any volume will be the size of the primary space allocation, and any additional allocations will be the size of the secondary space allocation. It's not inappropriate to have the secondary allocation size larger than the primary allocation size if that permits allocations to fill volumes better.
    • RAMAC Virtual Array DASD, and similar log-structured subsystems will not waste real disk space for disk tracks that are left empty. Why not let this technology simplify your allocations if it is available to you?
  • One of the above cases seems to be indicated, but only a few records (relative to the space allocated) are actually loaded in the file.

    The file may have been in use for some time, INSERTing and DELETEing records, and there are never more records present than could easily fit into the space available.

    You've run into the problem that happens because of the index structure of VSAM KSDS. When a KSDS is loaded, a range of keys is allocated to each control interval, and that space within each CI will not be used for records with keys outside that range, even if there are no records left in the control interval.

    This problem occurs regularly when all (or a significant part) of the insert activity occurs at a limited number of places in the file. The typical example is in a branch banking scenario, where each branch has its own range of account numbers (file keys), and each new account opened in a branch has a key value just larger than the previous account opened. (The worst case scenario is where there is essentially only one branch, and all inserts into the file occur at one point.) The insert "hot spot" need not be at the end of the file, but can be located anywhere within the range of keys for the file.

    What should you do about this, when you cannot change the way the keys are created?

    One thing that's simple to do is to reorganize the file. User applications, system utilities, SORTs, and IDCAMS REPRO can be used as a reorganization tool.

    I recommend that you consider reorganization, and use it regularly to reorganize your files when you have keys which have this behavior. Reorganization can be of help, but careful design based on observed use patterns may be better yet. See the question that discusses reorganization.

    You might evaluate how much space -- primary and secondary allocations, CI and CA freespace -- a file needs to have for a month of activity, and then arrange for a weekly reorganization, thus keeping far from the file's capacity limits. Similarly, you might consider providing enough space in the file for three months' activity, and reorganize the file once a month.

    Reorganization should NOT be performed merely because some number of CI or CA splits have occurred. The CI and CA split process creates free space within a file at the point of insertion. If there are one or more "hot spots" for insertion, CI and CA splits can be a very efficient way to get appropriate free space generated within a file. Too-frequent reorganization in these cases would simply eliminate free space that could have been used by additional insert activity.
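The allocation guidelines given earlier in this answer (amounts on the order of one tenth of the file, never less than one cylinder, and a limit of 123 extents) can be turned into a quick sanity check. This Python sketch makes a few assumptions: sizes are in cylinders, the file lives on a single volume, and "on the order of 1/10th" is rounded up to a whole cylinder. The function names are illustrative.

```python
MAX_EXTENTS = 123  # extent limit for a file, as discussed above


def suggested_allocation(total_cylinders):
    """About one tenth of the file size, but at least one cylinder."""
    return max(1, -(-total_cylinders // 10))


def max_capacity(primary, secondary, max_extents=MAX_EXTENTS):
    """Largest size reachable on one volume: one primary extent plus
    secondary extents for all the remaining extents."""
    return primary + (max_extents - 1) * secondary


print(suggested_allocation(250))   # 25
print(max_capacity(6, 3))          # 372
```

A file defined with 6 cylinders primary and 3 secondary can never grow beyond 372 cylinders, no matter how much free space the volume has, so comparing max_capacity with the expected file size shows early whether the secondary amount is too small.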

