Following the CIDF (immediately to its left, or lower in address) will be the
first Record Descriptor Field (RDF). This describes the length of the first (leftmost,
lowest addressed) logical record within the CI.
If there is a second logical record within the CI, there will be a second RDF.
If the second logical record is the same length as the first, then the second RDF
will be a counter indicating that the second record (and third, etc.) are the same
length as the preceding record. On the other hand, if the second record is of a
different length than the first, a normal RDF will be used specifying the length
of the second record. This same decision is made as each additional logical
record is added -- either update the counter if the new logical record is the
same length, or add a new normal RDF.
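The decision described above can be sketched in a few lines of Python. This is only a simplified model of the rule as stated here, not VSAM's actual code; the function name count_rdfs is hypothetical, and spanned records are ignored.

```python
def count_rdfs(record_lengths):
    """Count the RDFs needed for records added to one CI, in order.

    Simplified model: a record whose length differs from the previous
    record gets a new normal RDF; a record of the same length either
    adds a counter RDF or updates the counter that is already there.
    """
    rdfs = 0
    previous_length = None
    last_is_counter = False
    for length in record_lengths:
        if length != previous_length:
            rdfs += 1               # new normal RDF holding this length
            last_is_counter = False
        elif not last_is_counter:
            rdfs += 1               # first repeat: add a counter RDF
            last_is_counter = True
        # otherwise: further repeats only update the existing counter
        previous_length = length
    return rdfs


print(count_rdfs([200]))             # a single record
print(count_rdfs([200] * 17))        # fixed-length records
print(count_rdfs([200, 150, 310]))   # every length differs
```

Under this model a CI never needs more RDFs than it has logical records, and a CI full of equal-length records needs only two.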
In the case of spanned records, the RDF will refer to the record segment contained
within the CI.
Finally, there may be free space within the CI, between the right
end of the last logical record and the left end of the last RDF. Again,
the CIDF indicates the position and length of the free space within the
CI.
The minimum amount of free space within a CI can be specified in the
DEFINE CLUSTER command's FREESPACE parameter. It indicates the minimum percentage of
the CISIZE (rounded up to the next whole byte) that is to be left free during
sequential insertion processing (e.g. during file load, extend, or during
mass sequential insertions of a series of records).
For example,
CISIZE(4096)
Logical record length = 200 bytes fixed length
FRSPC(10 10)
Then:
There will be one CIDF of 4 bytes, and two RDFs of 3 bytes each,
for a total of 10 bytes of control information.
There will be 10% of 4096 = 410 bytes (rounded up) of reserved free space.
There will be 420 bytes of CI space that cannot be loaded with
logical records... and 3676 bytes that can be loaded with logical records.
3676 / 200 = 18 logical records (rounded down) will be inserted in each CI being loaded.
4086 - 3600 = 486 bytes of free space will exist in each CI being loaded.
In the case of variable length records, your ability to predict
the number of logical records inserted into a CI depends on your
knowledge of the sizes of the records inserted into the specific CI.
Even if other records in other CIs happen to have differing lengths,
the records being inserted into a particular CI might be of one length,
so you can set bounds on the number of RDFs in a CI -- one if only one
record is stored in a given CI, two if more than one, and as many as
there might be logical records.
In the example case above, suppose the records were of variable
length, but averaged 200 bytes in length. Then I could need as many
as 20 RDFs (one for each possible record that could be inserted). In
this case, the potential 18 additional RDFs would occupy at most 54
additional bytes of control information, and the total number of
records inserted and number of additional records that could be
directly inserted into the available free space would be unchanged.
No -- I didn't give you a formula to figure this out. I did
show you what VSAM inserts and how big it is, so you can figure
it out yourself.
Still want a formula? For fixed-length logical records:
[NumRecs] = Floor((CISize - 10 - Ceil(FRSPC% * CISize)) / [RecSize])
Where:
NumRecs == the number of logical records to be inserted at load
CISize == the defined Control Interval Size
10 == constant for sum of length of CIDF + 2 RDFs
Floor == function that returns the integer value rounded down
Ceil == function that returns the integer value rounded up
FRSPC% == the defined Control Interval Freespace percentage
RecSize == the actual record size
This works for all cases when more than one logical record is
inserted into a CI. In the cases when only one logical record is
inserted, the factor 10 above should be changed to 7.
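If you would rather let a program do the arithmetic, here is a minimal Python sketch of the fixed-length formula. The function name and the sample values are my own assumptions for illustration; it assumes a whole-number free space percentage and rounds the record count down.

```python
def records_per_ci(ci_size, rec_size, ci_freespace_pct):
    """Fixed-length logical records loaded into one CI at load time."""
    # Ceil(FRSPC% * CISize), done in integer arithmetic.
    reserved = (ci_size * ci_freespace_pct + 99) // 100
    # Usual case: CIDF (4 bytes) + 2 RDFs (3 bytes each) = 10 bytes.
    count = (ci_size - 10 - reserved) // rec_size
    if count <= 1:
        # Only one record can fit: one RDF, so 7 bytes of control info.
        single = (ci_size - 7 - reserved) // rec_size
        count = 1 if single >= 1 else 0
    return count


# Hypothetical file: CISIZE(2048), 100-byte records, 20% CI free space.
print(records_per_ci(2048, 100, 20))
```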
It has been my experience that application designers and
programmers do not know the actual average logical record length
nor the distribution of those lengths unless they actually have
built the system and measured the result.
How can I predict the number of Control Intervals within a Control Area (CA)?
First, we have to know what goes into Control Areas (CAs).
VSAM Data Control Areas contain:
- Loaded Control Intervals
- Logical records are loaded into a CA beginning at its left (or low addressed) end.
- Free Control Intervals
- We can calculate the number of CIs within a CA that will be left free at
load time if we know the number of CIs in the CA and the CA Freespace percentage.
NumFree = Ceil(CISinCA * CAFRSPC%)
Where:
NumFree == Number of CIs not initially loaded
Ceil == function that returns the integer value rounded up
CISinCA == Number of CIs in the CA
CAFRSPC% == Defined CA free space percentage
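The same calculation as a small Python sketch. The function names are hypothetical, and the second function rests on my assumption that every CI not left free is loaded with records.

```python
def free_cis_in_ca(cis_in_ca, ca_freespace_pct):
    """NumFree = Ceil(CISinCA * CAFRSPC%), using integer arithmetic."""
    return (cis_in_ca * ca_freespace_pct + 99) // 100


def loaded_cis_in_ca(cis_in_ca, ca_freespace_pct):
    """Assumption: the CIs that are not left free are the ones loaded."""
    return cis_in_ca - free_cis_in_ca(cis_in_ca, ca_freespace_pct)


# Hypothetical CA of 180 CIs with 10% CA free space.
print(free_cis_in_ca(180, 10), loaded_cis_in_ca(180, 10))
```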
To determine the number of CIs in a CA, you first need
to know the CA Size.
The CA Size is the smallest of:
- The primary space allocation amount
- The secondary space allocation amount
- The size of one cylinder
If allocations are specified in cylinders, or (in VSE's implementation)
if the track or record allocation amounts (both primary and secondary)
are each at least the size of one cylinder, then the CA size will be one cylinder.
You need to know the physical characteristics of the disk device
being used (or simulated, in the case of virtual disks or Multiprise Internal Disk
subsystems). For example, a 3390 device contains 15 tracks per cylinder, with
room on each track for:
- 2 26KB CIs
- 6 8KB CIs
- 12 4KB CIs
and so on. For details about the number of CIs that can be contained
on one cylinder (the maximum CA size), you can refer to the VSE/VSAM User's Guide
manual.
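For a one-cylinder CA, the number of CIs is simply the CIs per track times the tracks per cylinder. The Python sketch below uses only the three 3390 figures quoted above; the names are hypothetical, and any other CI size or device needs values taken from the manual.

```python
TRACKS_PER_CYLINDER_3390 = 15

# CI size in bytes -> CIs per 3390 track (only the sizes listed above).
CIS_PER_TRACK_3390 = {
    26 * 1024: 2,
    8 * 1024: 6,
    4 * 1024: 12,
}


def cis_per_cylinder_3390(ci_size):
    """CIs in a one-cylinder CA on a 3390, for the CI sizes listed above."""
    if ci_size not in CIS_PER_TRACK_3390:
        raise ValueError("no per-track figure recorded for CI size %d" % ci_size)
    return CIS_PER_TRACK_3390[ci_size] * TRACKS_PER_CYLINDER_3390


print(cis_per_cylinder_3390(4096))
```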
How can I predict the amount of space my file will need?
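First, please read the two preceding Question and Answer discussions, which cover predicting the number of logical records within a CI and the number of CIs within a CA.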
With that background, we can make an accurate estimate of the amount of disk space that VSAM will use when loading a file.
Next, calculate the number of CIs that will need to be loaded to contain the logical records in your file:
NumCIs = Ceil(NumLogRecs / RecsInCI)
Where:
NumCIs == Number of CIs initially loaded
Ceil == function that returns the integer value rounded up
NumLogRecs == Number of Logical Records to be loaded
RecsInCI == Number of Logical Records loaded into a CI
To determine the number of Logical Records to be loaded into a CI,
see the discussion of logical records within a CI, above.
Now we know how many CIs will need to be loaded with records.
To calculate the number of CAs that will need to be loaded to contain the CIs for your file:
NumCAs = Ceil(NumCIs / LoadCIsInCA)
Where:
NumCAs == Number of CAs initially loaded
Ceil == function that returns the integer value rounded up
NumCIs == Number of CIs initially loaded
LoadCIsInCA == Number of CIs to be loaded in each CA
To determine the number of CIs to be loaded in each CA,
see the discussion of CIs within a CA, above.
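Putting the two formulas together, a rough Python sketch might look like this. The function names and the sample numbers are hypothetical; RecsInCI and LoadCIsInCA are taken as inputs that you have already worked out for your own file.

```python
def ceil_div(a, b):
    """Integer division rounded up (the Ceil in the formulas above)."""
    return -(-a // b)


def cis_to_load(num_log_recs, recs_in_ci):
    """NumCIs = Ceil(NumLogRecs / RecsInCI)"""
    return ceil_div(num_log_recs, recs_in_ci)


def cas_to_load(num_cis, load_cis_in_ca):
    """NumCAs = Ceil(NumCIs / LoadCIsInCA)"""
    return ceil_div(num_cis, load_cis_in_ca)


# Hypothetical file: 100,000 records, 16 records loaded per CI,
# 162 CIs loaded in each CA.
num_cis = cis_to_load(100000, 16)
num_cas = cas_to_load(num_cis, 162)
print(num_cis, num_cas)
```

With a one-cylinder CA, the number of CAs is also the number of cylinders the data component needs for the initial load.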
You can check the LISTCAT output after loading a file to determine
if your calculations were correct:
Validate the following fields:
- Data Component ALLOCATION: HI-USED-RBA
- Data Component ATTRIBUTES: CISIZE
- Data Component ATTRIBUTES: CI/CA
Multiply the CI values (CISIZE and CI/CA) to determine the CA SIZE in bytes
Divide the ALLOCATION HI-USED-RBA by the CA SIZE calculated above to
determine the number of CAs initially loaded.
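The LISTCAT check can be sketched the same way. The function name and the sample values are hypothetical (they are not taken from any real LISTCAT); you would key in the numbers from your own listing.

```python
def cas_loaded(hi_used_rba, cisize, ci_per_ca):
    """Return (CA size in bytes, whole CAs used, leftover bytes)."""
    ca_size = cisize * ci_per_ca
    whole_cas, leftover_bytes = divmod(hi_used_rba, ca_size)
    return ca_size, whole_cas, leftover_bytes


# Hypothetical values: CISIZE 4096, CI/CA 180, HI-USED-RBA 28753920.
print(cas_loaded(28753920, 4096, 180))
```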
If this is not what you estimated, check to ensure that the following
values are as intended:
- Data Component ATTRIBUTES: AVGLRECL
- This value is only meaningful if you specified space allocation (SPACE-TYPE) in RECORDS.
- Data Component ATTRIBUTES: CISIZE
- This should be as you specified, unless VSAM was limited because you specified BUFSPACE too small.
If this occurred, change your DEFINE parameters for this file to omit the BUFSPACE value or specify
a larger BUFSPACE value -- this is particularly useful for VSAM Base Clusters accessed via Alternate
Indexes.
- Data Component STATISTICS: REC-TOTAL
- Were the intended number of records loaded? Often a significant difference can be found when
comparing the number of records actually loaded to the design specifications.
- Data Component STATISTICS: FREESPACE-%CI
- This should be as you specified. It affects the number of logical records loaded per CI --
More FREESPACE means fewer logical records inserted into each CI during initial load.
- Data Component STATISTICS: FREESPACE-%CA
- This should be as you specified. It affects the number of CIs initially loaded per CA --
More FREESPACE means fewer CIs in each CA will have records inserted during initial load.
- Data Component ALLOCATION: SPACE-TYPE
- This should be as you specified -- RECORDS, BLOCKS, TRACKS or CYLINDERS. It affects the CA Size.
- Data Component ALLOCATION: SPACE-PRI
- This should be as you specified. It affects the CA Size and number of allocations.
- Data Component ALLOCATION: SPACE-SEC
- This should be as you specified. It affects the CA Size and number of allocations.
- Data Component VOLUME: VOLSER
- Is the volume you intended to use being used?
- Data Component VOLUME: DEVTYPE
- Is the device type what you intended to use?
- Data Component VOLUME: PHYREC-SIZE
- Is the physical record size VSAM chose as you expected? Sometimes VSAM will choose a physical record
size smaller than your CI size to use track space more completely. This may be an indication that you
should review the CI size you chose.
- Data Component VOLUME: PHYRECS/TRK
- Is the number of physical records per track as expected?
- Data Component VOLUME: TRACKS/CA
- If your file is of any size, this value should be equal to the number of tracks per cylinder.
If this is not the case, it means your A
- Data Component VOLUME: EXTENT-NUMBER
- If the number of extents is large, it may be that the disk space where the file was allocated
has been fragmented by other files' DEFINE and DELETE activity. Physical Space defragmentation
may be desirable, or the allocation amounts defined for the file may be smaller than optimum.
Extract of sample LISTCAT data
CLUSTER ------- TST.FILE
. . .
DATA ------- TST.FILE.DATA
. . .
ATTRIBUTES
KEYLEN----------------44 AVGLRECL-------------505 BUFSPACE-----------10240 CISIZE--------------4096
RKP--------------------0 MAXLRECL-------------505 EXCPEXIT----------(NULL) CI/CA----------------180
SHROPTNS(2,3) NORECOVERY SUBALLOC NOERASE INDEXED NOWRITECHECK NOIMBED NOREPLICAT
UNORDERED NOREUSE NONSPANNED NONRECVABLE
STATISTICS
REC-TOTAL-------------13 SPLITS-CI--------------0 EXCPS-----------------28
REC-DELETED------------0 SPLITS-CA--------------0 EXTENTS----------------2
REC-INSERTED-----------0 FREESPACE-%CI----------0 SYSTEM-TIMESTAMP:
REC-UPDATED------------0 FREESPACE-%CA----------0 88.194 08:38:01
REC-RETRIEVED----------0 FREESPC-BYTES----------0 X'9ECD8557B1912C10'
ALLOCATION
SPACE-TYPE------CYLINDER
SPACE-PRI--------------6 USECLASS-PRI-----------0 HI-ALLOC-RBA-------94208
SPACE-SEC--------------3 USECLASS-SEC-----------0 HI-USED-RBA--------94208
VOLUME
VOLSER------------SIN238 PHYREC-SIZE---------4096 HI-ALLOC-RBA-------47104 EXTENT-NUMBER----------1
DEVTYPE-------------3380 PHYRECS/TRK-----------12 HI-USED-RBA--------47104 EXTENT-TYPE--------X'00'
VOLFLAG------------PRIME TRACKS/CA-------------15
LOW-KEY---------------00
HIGH-KEY--------------3F
HI-KEY-RBA----------5120
EXTENTS:
LOW-CCHH-----X'00000002' LOW-RBA----------------0 TRACKS-----------------3
HIGH-CCHH----X'00000004' HIGH-RBA-----------47103
VOLUME
. . .
INDEX ------ TST.FILE.INDEX
. . .
Why does my catalog keep running out of space?
There are several unrelated conditions that can cause a catalog to become full.
The simple cases occur when there are many datasets defined in a catalog, but
the vexing case occurs when there are relatively few (perhaps fewer than a hundred)
datasets defined within the catalog.
Note that this may be essentially the same problem as discussed in
the next question about why files may take much more
space than estimated.
First, let's dispose of the cases from most obvious to least obvious:
- An additional secondary allocation is needed for the catalog, but no space
is available...
Well, you may need to make more space available -- it may be that the volume on
which the catalog was allocated became full of extents belonging to other datasets.
To correct this, you may be able to use IDCAMS DELETE to remove datasets that
are no longer needed, thus freeing space for the catalog itself to grow.
BUT... it may be that your catalog is very dynamic and uses cluster or component
names that are system or application generated, and there is much more space
than the number of datasets would seem to demand. If this is the case, see
the third item in this answer.
- An additional secondary allocation is needed for the catalog, but sixteen
extents have already been allocated...
You'll have to redefine the catalog with larger allocation amounts, since
sixteen extents of the size you've allocated is not enough space for the catalog
itself.
Again, it is possible that the additional allocations are needed, not because
of the number of files or datasets defined in the catalog, but because the cluster
and/or component names are continually increasing in value. If this is the case,
you should see the next item in this answer.
- One of the above cases seems to be indicated, but only a few datasets are
defined in this catalog.
The catalog may have been in use for some time,
DEFINEing and DELETEing datasets, and there are never more
datasets defined than could easily fit into the catalog space available.
You've run into the problem that happens because a catalog is, in fact, a
VSAM KSDS. When a KSDS is loaded, a range of keys is allocated to each
control interval, and that space within each CI will not be used for
records with keys outside that range, even if there are no records left
in the control interval.
Within a catalog, two keyranges are defined.
- The higher-valued keyrange encompasses
keys beginning with x'40' through x'FF', and records with these keys are
sometimes referred to as "true name"
records. The record key is in fact the name of the VSAM object -- dataset or cluster,
component, volume, etc. It is within this keyrange that the usual problems occur.
- The lower-valued keyrange encompasses keys beginning with x'00' through x'3F',
and the records within this range contain the majority of the information maintained
by VSAM in a catalog, such as extent information, volume space maps, details of
datasets, and so on.
This problem occurs regularly when you allow VSAM data set components' -- the
data and index components' -- names to default to VSAM generated names, or when
you specifically create VSAM data set names with similar characteristics. The
default names are created from the system's Time-of-Day Clock, and so each time
a data set is deleted and defined, new and higher-valued names are generated. The
old component records and their names may be deleted, but the space in the control
intervals cannot be reused by the new (higher-key-valued) records being added to
the catalog. Many automatic processes tend to use some sort of automatic, date
and time based name generation, so that the names generated will always be unique.
One widely used application in VSE environments is the report distribution system,
INFOPAC, by Mobius Management Systems, Inc. This application gives you an install time option to
generate its names using DATE/TIME or TIME/DATE ordering. By letting the TIME
component of its name be the more significant part of the dataset name, you will
tend over time to recycle through the key ranges and reclaim space within the
catalog. The DATE/TIME ordering will cause the name values to continually
increase, and the catalog space used to manage deleted datasets will never be
reused.
I mentioned INFOPAC here only because its developers were aware of this VSAM
characteristic and give the user an option to minimize its impact. You can use the
same or a similar procedure to allow your dataset names to cycle through the
same range of key values by using similar techniques.
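To illustrate the idea, here is a small Python sketch of the two orderings. The name format (a D qualifier with year and day-of-year, a T qualifier with the time of day) and the function name are purely my own assumptions; they are not INFOPAC's format nor the format of VSAM generated names.

```python
from datetime import datetime


def generated_name(prefix, when, time_first=True):
    """Build a dataset name from a timestamp (hypothetical format)."""
    date_part = when.strftime("D%y%j")    # D + year + day of year
    time_part = when.strftime("T%H%M%S")  # T + hours, minutes, seconds
    if time_first:
        qualifiers = [prefix, time_part, date_part]
    else:
        qualifiers = [prefix, date_part, time_part]
    return ".".join(qualifiers)


monday = datetime(2002, 2, 14, 10, 0, 0)
tuesday = datetime(2002, 2, 15, 9, 0, 0)

# DATE/TIME ordering: the later name always collates higher.
print(generated_name("APP", monday, time_first=False))
print(generated_name("APP", tuesday, time_first=False))

# TIME/DATE ordering: the later name can collate lower, so keys cycle.
print(generated_name("APP", monday))
print(generated_name("APP", tuesday))
```

With TIME/DATE ordering the names wander back and forth over the same range of keys each day, which is what gives the freed catalog space a chance to be reused.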
Recently (early 2002) there were several threads on the VSE-L list which
discussed aspects of this, including JCL routines to automatically generate
dataset names for automatic FTP processes and the like.
Note that even if you've re-used the same -- identical -- cluster (or dataset)
names over and over, if you have not explicitly defined the component names for
the cluster, you can encounter the same problem, as VSAM generated names for
those components will continually increase and eventually cause the same problem.
What should you do about this, when you cannot change the way the names
are created?
One thing that's simple to do is to exploit the catalog reorganization
function of IDCAMS REPRO. When REPRO's input dataset is a catalog, and its
output dataset is a sequential file or ESDS, the catalog's records will be
unloaded in key sequence. Then, if that sequential file or ESDS is used
as the input file, with the catalog itself as the output dataset, the catalog
will be reorganized and the space will be reclaimed and made available for
additional catalog entries.
Catalog REPRO is very quick, as it only processes the catalog records, not
the datasets defined by those records. The total amount of data involved is
small.
I recommend that you consider this function, and use it regularly to
reorganize your catalogs when you have dataset and/or component names which
are automatically generated. You will need to quiesce those subsystems which
use datasets within the catalog being reorganized, but this should be of small
impact since the process can be so quick.
You might evaluate how much space a catalog needs to have for a month of
activity, and then arrange for a weekly reorganization, thus keeping far from
the catalog's capacity limits. Similarly, you might consider providing enough
space in the catalog for three months' activity, and reorganize the catalog once
a month.
How can I defragment the space owned by VSAM?
VSAM Managed Space can become fragmented.
As files are defined, space is allocated to them.
Later, files are deleted and the space they occupy becomes free for use by other file allocation requests. Depending on the size and distribution of the free and used space allocations within the VSAM space, fragmentation may occur.
There is no VSAM defragmentation command. Using IDCAMS commands, however, VSAM space can be defragmented:
- Use IDCAMS BACKUP to make a copy of all files with extents on the volume containing the space to be defragmented.
- Use IDCAMS DELETE CLUSTER to delete all the files that were backed up. The output of the BACKUP command will contain a list of these files.
- Use IDCAMS RESTORE to restore all the files that were backed up and deleted.
IDCAMS RESTORE will delete files as they are being restored, but this will only release the space owned by one file at a time, leaving significant space fragmentation. To allow the space to be defragmented, all the files in the space should be deleted before the RESTORE command is processed.
You may want to consider making a second BACKUP because of the (unlikely) possibility that the first BACKUP output copy is unreadable, and you will be DELETEing the files!
Clearly this is fairly simple, and it could be readily automated using a REXX procedure, among other techniques.
Why does my file keep running out of space?
There are several unrelated conditions that can cause a file to need more space than estimated.
Let's review the cases from most obvious to least obvious:
- An additional secondary allocation is needed for the file, but no space
is available...
Well, you may need to make more space available -- it may be that the volume(s) on
which the file was allocated became full of extents belonging to this and/or other
datasets.
To correct this, you may be able to use IDCAMS DELETE to remove datasets that
are no longer needed, thus freeing space for the file to grow. You may
be able to allocate additional volumes for the file in question -- the IDCAMS
ALTER datacomponentname ADDVOLUMES... command function can do this for you.
BUT... it may be that:
- Your file is very dynamic and uses keys that are
system or application generated in a continually increasing manner
- Your file has a high percentage of CI and/or CA freespace defined
at load time
- The insert activity for the file causes CI and CA splits, which
essentially add additional freespace, but that additional space is
not usually needed for additional inserts that would fall into the
same range of keys as the CI where the split was triggered.
Thus there is much more space required
than the number of records would seem to demand.
- An additional secondary allocation is needed for the file, but no more
extents can be allocated since 123 extents have already been allocated...
You'll have to redefine the file with larger allocation amounts, since
123 extents of the size you've allocated is not enough space for the file.
Again, it is possible that the additional allocations are needed, not because
of the number of records in the file, but because the allocation amounts and
other parameters that control how VSAM allocates and uses space for files are
set improperly.
The most common case for this seems to occur because a file has grown
beyond its initial design. You should ensure that primary and secondary
space allocation amounts are appropriate for the actual file's requirements.
It is often the case that secondary space allocation amounts are too small.
Consider ensuring:
- Primary and secondary allocation amounts should be at least one cylinder
in size for any file larger than a few tracks in size.
- Primary and secondary allocation amounts might be chosen to be on the
order of 1/10th of the total size of the file for files less than a volume
in size.
- Remember that the first allocation for a file on any volume will be the
size of the primary space allocation, and any additional allocations will be
the size of the secondary space allocation. It's not inappropriate to have
the secondary allocation size larger than the primary allocation size if that
permits allocations to fill volumes better.
- RAMAC Virtual Array DASD, and similar log-structured subsystems will not
waste real disk space for disk tracks that are left empty. Why not let this
technology simplify your allocations if it is available to you?
- One of the above cases seems to be indicated, but only a few records
(relative to the space allocated) are actually loaded in the file.
The file may have been in use for some time,
INSERTing and DELETEing records, and there are never more
records present than could easily fit into the space available.
You've run into the problem that happens because of the index structure of
VSAM KSDS. When a KSDS is loaded, a range of keys is allocated to each
control interval, and that space within each CI will not be used for
records with keys outside that range, even if there are no records left
in the control interval.
This problem occurs regularly when all (or a significant share of) insert activity
occurs at a limited number of places in the file. The typical example is
a branch banking scenario, where each branch has its own range of account
numbers (file keys), and each new account opened in a branch has a key value
just larger than the previous account opened. (The worst case scenario is
where there is essentially only one branch, and all inserts into the file
occur at one point.) The insert "hot spot" need not be at the end of the
file, but can be located anywhere within the range of keys for the file.
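A toy model shows why a hot spot eats space. The Python sketch below is a deliberately simplified assumption, not VSAM's actual split logic: each CI holds a fixed number of keys, and a full CI is split half-and-half when another key arrives. The names and the capacity of 10 are hypothetical.

```python
import bisect


def insert_key(cis, key, capacity):
    """Insert key into a list of CIs (sorted lists of keys).

    Toy model: a full CI is split in half, the upper half moving to a
    new CI placed right after it. Real VSAM split behavior may differ.
    """
    # Find the CI owning this key range: last CI whose low key <= key.
    low_keys = [ci[0] for ci in cis]
    i = max(bisect.bisect_right(low_keys, key) - 1, 0)
    if len(cis[i]) >= capacity:
        mid = capacity // 2
        cis.insert(i + 1, cis[i][mid:])   # upper half goes to a new CI
        del cis[i][mid:]
        if key >= cis[i + 1][0]:
            i += 1
    bisect.insort(cis[i], key)


def utilization(cis, capacity):
    """Fraction of the allocated record slots that actually hold keys."""
    return sum(len(ci) for ci in cis) / (len(cis) * capacity)


# One hot spot: every new key is just larger than the previous one.
cis = [[0]]
for key in range(1, 1000):
    insert_key(cis, key, capacity=10)

print(len(cis), round(utilization(cis, 10), 2))
```

In this model every CI behind the hot spot is left about half full, and nothing ever comes back to fill it, so the file needs roughly twice the space its record count would suggest.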
What should you do about this, when you cannot change the way the keys
are created?
One thing that's simple to do is to reorganize the file. User applications,
system utilities, SORTs, and IDCAMS REPRO can be used as a reorganization
tool.
I recommend that you consider reorganization, and use it regularly to
reorganize your files when you have keys which have this behavior.
Reorganization can be of help, but careful design based on observed use
patterns may be better yet. See the question that discusses reorganization.
You might evaluate how much space -- primary and secondary allocations,
CI and CA freespace -- a file needs to have for a month of
activity, and then arrange for a weekly reorganization, thus keeping far from
the file's capacity limits. Similarly, you might consider providing enough
space in the file for three months' activity, and reorganize the file once
a month.
Reorganization should NOT be performed because some number of CI
or CA splits have occurred. The CI and CA split process creates free space
within a file at the point of insertion. If there are one or more
"hot spots" for insertion, CI and CA splits can be a very efficient way to
get appropriate free space generated within a file. Too-frequent reorganization
in these cases would simply eliminate that free space that could have been
used by additional insert activity.