sds_bptree_cow_instance(3)
Updated : Fri Feb 26 2021
Source : Version 2.0.3
Note : dirsrv

NAME
sds_bptree_cow_instance

SYNOPSIS
#include <sds.h>

Data Fields
uint32_t checksum
sds_bptree_instance * bi
sds_bptree_transaction * txn
sds_bptree_transaction * tail_txn
uint64_t txn_count
pthread_rwlock_t * read_lock
pthread_mutex_t * write_lock

Detailed Description
The copy on write tree instance. This stores the current active reading transaction, the number of active transactions, and pointers to the root of the b+tree. This instance is the most important part of the tree and provides us with memory safety. Once we have memory safety, individual transactions are free to garbage collect on their own. Garbage collection is coordinated from the instance.

MEMORY SAFETY
Within the instance, we have a small number of important fields. These are:
• read_lock
• write_lock
• tail_txn

Read Transaction Begin
At the beginning of a read transaction, we take a read (ro) lock on read_lock. This ensures that, on our working thread, the correct sequence of barriers is issued for memory consistency of the tree. While holding the read_lock, we use CPU atomics to increment the transaction reference count by 1. We then release the read_lock. In this way, we allow multiple read transactions to be initiated at the same time, with the only serialisation point being the atomic increment. While the read transaction is 'active' (i.e. it is the transaction listed in binst, which new transactions will reference), the reference count never goes below 1.
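As a rough sketch (not the real sds.h implementation), assuming sds_bptree_transaction carries a plain integer ref_count field, the begin sequence looks like this:

    #include <pthread.h>
    #include <sds.h>

    /* Sketch only: ref_count is an assumed field name on
     * sds_bptree_transaction, not the documented layout. */
    sds_bptree_transaction *
    cow_rotxn_begin_sketch(sds_bptree_cow_instance *binst)
    {
        /* The rdlock issues the barriers that make the tree memory
         * consistent on this thread, and protects binst->txn. */
        pthread_rwlock_rdlock(binst->read_lock);
        sds_bptree_transaction *ro_txn = binst->txn;
        /* The atomic increment is the only serialisation point between
         * concurrent read begins; the active txn never drops below 1. */
        __atomic_add_fetch(&ro_txn->ref_count, 1, __ATOMIC_SEQ_CST);
        pthread_rwlock_unlock(binst->read_lock);
        return ro_txn;
    }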

Write Transaction
At the beginning of a write transaction, we take the write_lock mutex. This guarantees we are the only writer in the tree at a point in time. Due to the way mutexes work, this also guarantees the memory barriers for the tree are issued, so we see all other writes that have occurred. During a write, we store a list of all nodes that were cloned: both the nodes we cloned from and the nodes we created.
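A hedged sketch of the begin step only; the working-transaction clone helper and the created/owned list fields are assumptions for illustration, not the sds.h API:

    #include <pthread.h>
    #include <sds.h>

    /* Sketch only: txn_clone_from(), created and owned are assumed names. */
    sds_bptree_transaction *txn_clone_from(sds_bptree_transaction *prev);

    sds_bptree_transaction *
    cow_wrtxn_begin_sketch(sds_bptree_cow_instance *binst)
    {
        /* One writer at a time; acquiring the mutex also issues the
         * barriers that make prior committed writes visible here. */
        pthread_mutex_lock(binst->write_lock);
        sds_bptree_transaction *wr_txn = txn_clone_from(binst->txn);
        wr_txn->created = NULL; /* nodes allocated during this write  */
        wr_txn->owned = NULL;   /* old copies of nodes we cloned from */
        return wr_txn;
    }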

Write Abort
During a write abort, we release the write lock immediately. We then walk the created node list and free all of its contents, as they were never made active.
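Roughly, the abort path amounts to the following; the list-freeing helper is an illustrative assumption:

    #include <stdlib.h>
    #include <pthread.h>
    #include <sds.h>

    /* Sketch only: txn_free_created_nodes() is an assumed helper that
     * walks the created list and frees each node. */
    void txn_free_created_nodes(sds_bptree_transaction *wr_txn);

    void
    cow_wrtxn_abort_sketch(sds_bptree_cow_instance *binst,
                           sds_bptree_transaction *wr_txn)
    {
        /* Release the writer lock first so another write may begin. */
        pthread_mutex_unlock(binst->write_lock);
        /* Nothing we created ever became visible to readers, so it is
         * safe to free it all without further coordination. */
        txn_free_created_nodes(wr_txn);
        free(wr_txn);
    }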

Write Commit
During a commit, we prepare the transaction by changing its state flags and setting its reference count atomically. We relinquish the created list, because we have no need to free the nodes that are about to become part of the tree. However, we update the previous transaction with the 'owned' list, as these are the nodes we cloned from: i.e. nodes that serve no future role in the tree, so the previous transaction is the last point where they are valid and needed - that previous transaction must clean up after itself when it is ready to GC! We set our reference count to 2, to indicate that a following node (the previous active transaction) relies on us, and that we are also active. We now take the write variant of the rw read_lock. This waits for all in-progress read transaction begins to complete, and gives us exclusive access to the lock. We then pivot the older transaction out, replacing it with our transaction. The read_lock is released, and read transactions may now resume. At this point, we release the write_lock, and a new write may begin. We decrement the reference count of the previous transaction, which may cause it to drop to 0. This is the equivalent of a read transaction close. After this point, the previous transaction's reference count will only decrement, as previous holders close their transactions. Increments only occur on the active transaction!
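Putting the commit steps together, a hedged sketch (the reference count field, the owned/created lists and the helpers are assumed names, not the real layout) could look like:

    #include <pthread.h>
    #include <sds.h>

    /* Sketch only: assumed helpers. */
    void txn_give_owned_nodes(sds_bptree_transaction *to,
                              sds_bptree_transaction *from);
    void cow_rotxn_close_sketch(sds_bptree_cow_instance *binst,
                                sds_bptree_transaction *txn); /* see the read close sketch below */

    void
    cow_wrtxn_commit_sketch(sds_bptree_cow_instance *binst,
                            sds_bptree_transaction *wr_txn)
    {
        sds_bptree_transaction *prev_txn = binst->txn;

        /* Prepare: ref count 2 because the previous transaction in the
         * chain relies on us, and we are about to become active. */
        __atomic_store_n(&wr_txn->ref_count, 2, __ATOMIC_SEQ_CST);
        /* Created nodes become part of the live tree: drop that list.
         * Owned nodes (what we cloned from) are handed to the previous
         * transaction, the last point at which they are valid. */
        txn_give_owned_nodes(prev_txn, wr_txn);
        wr_txn->created = NULL;

        /* Exclusive read_lock: wait for in-progress read begins, then
         * pivot the active transaction pointer. */
        pthread_rwlock_wrlock(binst->read_lock);
        binst->txn = wr_txn;
        pthread_rwlock_unlock(binst->read_lock);

        /* A new writer may begin from here. */
        pthread_mutex_unlock(binst->write_lock);

        /* Equivalent of a read transaction close on the previous txn;
         * this may drop its ref count to 0 and trigger cleanup. */
        cow_rotxn_close_sketch(binst, prev_txn);
    }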

Read Transaction Close
When we close a read transaction, we require no locks. We use a CPU atomic to decrement and fetch the reference count. If the reference count remains positive, we are complete, and continue. If the reference count drops to 0, we take a reference to the next transaction in the series and free our node. We then decrement the reference counter of that next transaction and check the result. Either the reference count is > 0, because it is the active transaction or there is a current read transaction on it; or the reference count reaches 0 because there are no active transactions, so we can free this node too. In the same operation we then decrement the reference count of the next node in the series, and we repeat this until we reach a node with a positive reference count. Due to the design of this system, we are guaranteed no thread races, as one and only one thread can ever set the reference count to 0: either the closing read, or the closing previous transaction. This gives us complete thread safety, but without the need for a mutex to ensure it.
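A lock-free sketch of this cascade, again with assumed field names (ref_count, next_txn) and an assumed per-transaction free helper:

    #include <sds.h>

    /* Sketch only: cow_txn_free() is an assumed helper that would vacuum
     * the owned node list, subject to the tail_txn rule described under
     * Field Documentation. */
    void cow_txn_free(sds_bptree_cow_instance *binst,
                      sds_bptree_transaction *txn);

    void
    cow_rotxn_close_sketch(sds_bptree_cow_instance *binst,
                           sds_bptree_transaction *txn)
    {
        /* Only one thread can ever take a given count to 0, so no mutex
         * is needed here. */
        while (txn != NULL &&
               __atomic_sub_fetch(&txn->ref_count, 1, __ATOMIC_SEQ_CST) == 0) {
            sds_bptree_transaction *next = txn->next_txn;
            cow_txn_free(binst, txn);
            /* Freeing this txn releases its reference on the next (newer)
             * txn in the series, so the decrement cascades forward. */
            txn = next;
        }
    }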

Usage of the Memory Safe Properties
Due to the way the barriers are issued, when you have a read transaction or a write transaction, you can guarantee that the tree memory is consistent and correct, and that all values will remain alive until you close the transaction. This means you can use this to allow parallel tasks in complex cases. Directory Server has such a case with the plugin subsystem. Consider that every operation needs a consistent list of plugins. You build a tree of the plugin references (dlopen pointers, reference count, pblock) and store them. When you begin an operation, you take a read transaction on this tree. Every plugin in the tree is then guaranteed to be present and held open for the duration of the operation. During the read, if another thread commits a delete of a plugin, this drops the ref count, but not below 0 - the operation can continue safely. Once the operation is complete, the transaction is closed. At this point, the vacuum runs during close, which triggers the value free function. This is what actually closes the plugin pointers and values. Additionally, correct use of the transaction guarantees the memory content of the plugin is correct, due to the barriers inside the transaction.
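In outline, the plugin pattern is just 'begin, use, close'; the wrapper names below are hypothetical and only illustrate the shape of the usage:

    #include <sds.h>

    /* Hypothetical wrappers around the read transaction begin and close
     * described above; not the literal sds.h API. */
    sds_bptree_transaction *plugin_rotxn_begin(sds_bptree_cow_instance *binst);
    void plugin_rotxn_close(sds_bptree_cow_instance *binst,
                            sds_bptree_transaction *txn);

    void
    plugin_operation_sketch(sds_bptree_cow_instance *plugin_tree)
    {
        sds_bptree_transaction *ro_txn = plugin_rotxn_begin(plugin_tree);

        /* Every plugin visible through ro_txn stays alive and memory
         * consistent for the whole operation, even if another thread
         * commits a delete of a plugin concurrently. */
        /* ... look up plugins in ro_txn and call them ... */

        /* Closing may run the vacuum, which is where a deleted plugin's
         * value free function (dlclose etc.) finally runs. */
        plugin_rotxn_close(plugin_tree, ro_txn);
    }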

Field Documentation

sds_bptree_instance* sds_bptree_cow_instance::bi
Pointer to the current b+tree instance. This contains our function pointers for free and dup of keys and values.

uint32_t sds_bptree_cow_instance::checksum
Checksum of the instance data.

pthread_rwlock_t* sds_bptree_cow_instance::read_lock
The transaction read lock. This is used to prevent corruption of the txn pointer during a read or write transaction acquisition.

sds_bptree_transaction* sds_bptree_cow_instance::tail_txn
The pointer to the current 'oldest' alive transaction. This is needed for correct cleanup without corrupting intermediate trees. Consider that we have a set of transactions, such as A -> B -> C. Between each, a set of changes has occurred. We assume that if we have branches in these, then during a cow operation they can be updated. So we end up in this situation:

      [ A ]           [ B ]           [ C ]
        |               |               |
    [ root a ]      [ root b ]      [ root c ]
      /    \          /    \          /    \
    [ a ]  [ x ]    [ a ]  [ b ]    [ c ]  [ b ]

Now, when we close txn B, because C performed a cow on the leaf [ a ], txn B 'owns' leaf [ a ]. At this point, leaf [ a ] would be freed, which frees it from underneath transaction A. We have corrupted the tree of a previous transaction!
To prevent this, we only truly free a transaction when its ref count is 0 AND it is the last (oldest) transaction in the list of live transactions. This way, even though B's ref count reached 0, it would not be freed until A is also freed.
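A sketch of that rule, with assumed field and helper names, is simply an extra tail check before anything is freed:

    #include <stdlib.h>
    #include <sds.h>

    /* Sketch only: ref_count, next_txn and free_owned_nodes() are assumed. */
    void free_owned_nodes(sds_bptree_transaction *txn);

    static void
    cow_txn_gc_sketch(sds_bptree_cow_instance *binst,
                      sds_bptree_transaction *txn)
    {
        /* Free a txn only when nothing references it AND it is the oldest
         * live txn; otherwise its owned nodes may still be reachable from
         * an older tree (as in the diagram above). */
        while (txn != NULL && txn == binst->tail_txn &&
               __atomic_load_n(&txn->ref_count, __ATOMIC_SEQ_CST) == 0) {
            sds_bptree_transaction *next = txn->next_txn;
            free_owned_nodes(txn);
            binst->tail_txn = next;
            binst->txn_count--;
            free(txn);
            txn = next;
        }
    }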

sds_bptree_transaction* sds_bptree_cow_instance::txn
The pointer to the current 'latest' committed transaction. When a new read transaction is taken, this is the value that is returned (until a new write transaction is committed).

uint64_t sds_bptree_cow_instance::txn_count
The number of active transactions in the system.

pthread_mutex_t* sds_bptree_cow_instance::write_lock
The transaction write lock. During a write transaction, this is held for the duration to allow only a single writer.

Author
Generated automatically by Doxygen for dirsrv from the source code.