Tiered Storage
Tiered storage is an attempt to allow Traffic Server to take advantage of physical storage with different properties. This design concerns only mechanism. Policies to take advantage of these are outside of the scope of this document. Instead we will presume an oracle which implements this policy and describe the queries that must be answered by the oracle and the effects of the answers.
Beyond avoiding question of tier policy, the design is also intended to be effectively identical to current operations for the case where there is only one tier.
The most common case for tiers is an ordered list of tiers, where higher tiers are presumed faster but more expensive (or more limited in capacity). This is not required. It might be that different tiers are differentiated by other properties (such as expected persistence). The design here is intended to handle both cases.
The design presumes that if a user has multiple tiers of storage and an ordering for those tiers, they will usually want content stored at one tier level to also be stored at every other lower level as well, so that it does not have to be copied if evicted from a higher tier.
Configuration
Each storage unit in storage.config
can be marked with a quality value which is 32 bit number. Storage units that are not marked are all assigned the same value which is guaranteed to be distinct from all explicit values. The quality value is arbitrary from the point of view of this design, serving as a tag rather than a numeric value. The user (via the oracle) can impose what ever additional meaning is useful on this value (rating, bit slicing, etc.).
In such cases, all volumes should be explicitly assigned a value, as the default (unmarked) value is not guaranteed to have any relationship to explicit values. The unmarked value is intended to be useful in situations where the user has no interest in tiered storage and so wants to let Traffic Server automatically handle all volumes as a single tier.
Operations
After a client request is received and processed, volume assignment is done. For each tier, the oracle would return one of four values along with a volume pointer:
READ
The tier appears to have the object and can serve it.
WRITE
The object is not in this tier and should be written to this tier if possible.
RW
Treat as READ
if possible, but if the object turns out to not in the cache treat as WRITE
.
NO_SALE
Do not interact with this tier for this object.
The volume returned for the tier must be a volume with the corresponding tier quality value. In effect, the current style of volume assignment is done for each tier, by assigning one volume out of all of the volumes of the same quality and returning one of RW
or WRITE
, depending on whether the initial volume directory lookup succeeds. Note that as with current volume assignment, it is presumed this can be done from in memory structures (no disk I/O required).
If the oracle returns READ
or RW
for more than one tier, it must also return an ordering for those tiers (it may return an ordering for all tiers, ones that are not readable will be ignored). For each tier, in that order, a read of cache storage is attempted for the object. A successful read locks that tier as the provider of cached content. If no tier has a successful read, or no tier is marked READ
or RW
then it is a cache miss. Any tier marked RW
that fails the read test is demoted to WRITE
.
If the object is cached, every tier that returns WRITE
receives the object to store in the selected volume (this includes RW
returns that are demoted to WRITE
). This is a cache to cache copy, not from the origin server. In this case, tiers marked RW
that are not tested for read will not receive any data and will not be further involved in the request processing.
For a cache miss, all tiers marked WRITE
will receive data from the origin server connection (if successful).
This means, among other things, that if there is a tier with the object all other tiers that are written will get a local copy of the object, and the origin server will not be used. In terms of implementation, currently a cache write to a volume is done via the construction of an instance of CacheVC
which recieves the object stream. For tiered storage, the same thing is done for each target volume.
For cache volume overrides (via hosting.config
) this same process is used except with only the volumes stripes contained within the specified cache volume.
Copying
It may be necessary to provide a mechanism to copy objects between tiers outside of a client originated transaction. In terms of implementation, this is straight forward using HttpTunnel
as if in a transaction, only using a CacheVC
instance for both the producer and consumer. The more difficult question is what event would trigger a possible copy. A signal could be provided whenever a volume directory entry is deleted, although it should be noted that the object in question may have already been evicted when this event happens.
Additional Notes
As an example use, it would be possible to have only one cache volume that uses tiered storage for a particular set of domains using volume tagging. hosting.config
would be used to direct those domains to the selected cache volume. The oracle would check the URL in parallel and return NO_SALE
for the tiers in the target cache volume for other domains. For the other tier (that of the unmarked storage units), the oracle would return RW
for the tier in all cases as that tier would not be queried for the target domains.