# Containers for storing coercion data

This module provides TripleDict and MonoDict. These are structures similar to WeakKeyDictionary in Python’s weakref module, and are optimized for lookup speed. The keys for TripleDict consist of triples (k1,k2,k3) and are looked up by identity rather than equality. The keys are stored by weakrefs if possible. If any one of the components k1, k2, k3 gets garbage collected, then the entry is removed from the TripleDict.

Key components that do not allow for weakrefs are stored via a normal refcounted reference. That means that an entry stored under a triple (k1,k2,k3) in which none of k1, k2, k3 allows a weak reference behaves like an entry in a normal dictionary: its existence in the TripleDict prevents it from being garbage collected.

This container is currently used to store coercion and conversion maps between two parents (trac ticket #715) and to store homsets of pairs of objects of a category (trac ticket #11521). In both cases, it is essential that the parent structures remain garbage collectable, that data access is faster than with a usual WeakKeyDictionary, and that the "unique parent condition" in Sage (parent structures should be identical if they are equal) is enforced.

MonoDict behaves similarly, but it takes a single item as a key. It is used for caching the parents which allow a coercion map into a fixed other parent (trac ticket #12313).

By trac ticket #14159, MonoDict and TripleDict can be optionally used with weak references on the values.

class sage.structure.coerce_dict.MonoDict

Bases: object

This is a hashtable specifically designed for (read) speed in the coercion model.

It differs from a Python WeakKeyDictionary in the following important ways:

• Comparison is done using the ‘is’ rather than ‘==’ operator.
• Only weak references to the keys are stored if at all possible. Keys that do not allow for weak references are stored with a normal refcounted reference.
• The callback of the weak references is safe against recursion, see below.

There are special cdef set/get methods for faster access. It is bare-bones in the sense that not all dictionary methods are implemented.

IMPLEMENTATION:

It is implemented as a list of lists (hereafter called buckets). The bucket is chosen according to a very simple hash based on the object pointer, and each bucket is of the form [id(k1), r1, value1, id(k2), r2, value2, ...], on which a linear search is performed.

If ki supports weak references, then ri is a weak reference to ki with a callback that removes the entry from the dictionary when ki gets garbage collected. If ki does not support weak references, then ri is identical to ki; in that case, the presence of the key in the dictionary prevents it from being garbage collected.
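The bucket layout and linear search just described can be sketched in a few lines of plain Python (strong references only, for simplicity; the real implementation stores weakrefs as described above, and its pointer hash is more refined than a bare modulus):

```python
# Each bucket is a flat list [id(k), ref_or_key, value, ...] that is
# searched linearly; the bucket is chosen by a simple hash of the
# object pointer.
def bucket_insert(buckets, key, value):
    bucket = buckets[id(key) % len(buckets)]
    bucket.extend([id(key), key, value])  # strong ref, for simplicity

def bucket_lookup(buckets, key):
    """Return the value stored for `key`, comparing by identity."""
    bucket = buckets[id(key) % len(buckets)]
    for i in range(0, len(bucket), 3):
        if bucket[i] == id(key):          # 'is'-style comparison via id
            return bucket[i + 2]
    raise KeyError(key)

buckets = [[] for _ in range(31)]
k = object()
bucket_insert(buckets, k, 42)
assert bucket_lookup(buckets, k) == 42
```

Because lookup compares `id()` values rather than calling `__eq__`, it never invokes arbitrary Python code, which is one reason this structure is faster than a general-purpose dictionary for the coercion model.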

INPUT:

• size – an integer, the initial number of buckets. To spread objects evenly, the size should ideally be a prime, and certainly not divisible by 2.
• data – optional iterable defining initial data.
• threshold – optional number, default 0.7. It determines how frequently the dictionary will be resized (large threshold implies rare resizing).
• weak_values – optional bool (default False). If it is true, weak references to the values in this dictionary will be used, when possible.

EXAMPLES:

sage: from sage.structure.coerce_dict import MonoDict
sage: L = MonoDict(31)
sage: a = 'a'; b = 'ab'; c = -15
sage: L[a] = 1
sage: L[b] = 2
sage: L[c] = 3


The key is expected to be a unique object. Hence, the item stored for c cannot be obtained by providing another equal number:

sage: L[a]
1
sage: L[b]
2
sage: L[c]
3
sage: L[-15]
Traceback (most recent call last):
...
KeyError: -15


Not all features of Python dictionaries are available, but iteration over the dictionary items is possible:

sage: # for some reason the following failed in "make ptest"
sage: # on some installations, see #12313 for details
sage: sorted(L.iteritems()) # random layout
[(-15, 3), ('a', 1), ('ab', 2)]
sage: # the following seems to be more consistent
sage: set(L.iteritems())
set([('a', 1), ('ab', 2), (-15, 3)])
sage: del L[c]
sage: sorted(L.iteritems())
[('a', 1), ('ab', 2)]
sage: len(L)
2
sage: L.stats()             # random
(0, 0.06451..., 1)
sage: L.bucket_lens()       # random layout
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
sage: for i in range(1000):
...       L[i] = i
sage: len(L)
1002
sage: L.stats()             # random
(26, 32.32258064516129, 37)
sage: L.bucket_lens()       # random layout
[32, 34, 31, 32, 33, 32, 32, 33, 34, 31, 32, 32, 31, 31, 32, 34, 31, 33, 34, 32, 32, 33, 33, 31, 33, 35, 32, 32, 32, 32, 31]
sage: L = MonoDict(101, L)
sage: L.stats()             # random
(8, 9.92079207920792, 12)
sage: L = MonoDict(3, L)
sage: L.stats()             # random
(0, 334.0, 985)
sage: L['a']
1
sage: L['c']
Traceback (most recent call last):
...
KeyError: 'c'


The following illustrates why even sizes are bad:

sage: L = MonoDict(4, L, threshold=0)
sage: L.stats()
(0, 250.5, 1002)
sage: L.bucket_lens()
[1002, 0, 0, 0]


Note that this kind of dictionary is also used for caching actions and coerce maps. In previous versions of Sage, the cache used strong references and resulted in a memory leak in the following example. This leak was fixed in trac ticket #715 by using weak references:

sage: K = GF(1<<55,'t')
sage: for i in range(50):
...     a = K.random_element()
...     E = EllipticCurve(j=a)
...     P = E.random_point()
...     Q = 2*P
sage: import gc
sage: n = gc.collect()
sage: from sage.schemes.elliptic_curves.ell_finite_field import EllipticCurve_finite_field
sage: LE = [x for x in gc.get_objects() if isinstance(x, EllipticCurve_finite_field)]
sage: len(LE)    # indirect doctest
1


TESTS:

Here, we demonstrate the use of weak values.

sage: M = MonoDict(13)
sage: MW = MonoDict(13, weak_values=True)
sage: class Foo: pass
sage: a = Foo()
sage: b = Foo()
sage: k = 1
sage: M[k] = a
sage: MW[k] = b
sage: M[k] is a
True
sage: MW[k] is b
True
sage: k in M
True
sage: k in MW
True


While M uses a strong reference to a, MW uses a weak reference to b, and after deleting b, the corresponding item of MW will be removed during the next garbage collection:

sage: import gc
sage: del a,b
sage: _ = gc.collect()
sage: k in M
True
sage: k in MW
False
sage: len(MW)
0
sage: len(M)
1


Note that MW also accepts values that do not allow for weak references:

sage: MW[k] = int(5)
sage: MW[k]
5

TESTS:

The following demonstrates that MonoDict is safer than weakref.WeakKeyDictionary against recursions created by nested callbacks; compare trac ticket #15069:

sage: M = MonoDict(11)
sage: class A: pass
sage: a = A()
sage: prev = a
sage: for i in range(1000):
....:     newA = A()
....:     M[prev] = newA
....:     prev = newA
sage: len(M)
1000
sage: del a
sage: len(M)
0

The corresponding example with a Python weakref.WeakKeyDictionary would exceed the maximum recursion depth during deletion of the dictionary items:

sage: import weakref
sage: M = weakref.WeakKeyDictionary()
sage: a = A()
sage: prev = a
sage: for i in range(1000):
....:     newA = A()
....:     M[prev] = newA
....:     prev = newA
sage: len(M)
1000
sage: del a
Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <function remove at ...> ignored
sage: len(M)>0
True

AUTHORS:

- Simon King (2012-01)
- Nils Bruin (2012-08)
- Simon King (2013-02)
bucket_lens()

The distribution of items in buckets.

OUTPUT:

A list of how many items are in each bucket.

EXAMPLES:

sage: from sage.structure.coerce_dict import MonoDict
sage: L = MonoDict(37)
sage: for i in range(100): L[i] = None
sage: L.bucket_lens() # random
[3, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 3, 3, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 3, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4]
sage: sum(L.bucket_lens())
100

sage: L = MonoDict(1,threshold=0)
sage: for i in range(100): L[i] = None
sage: L.bucket_lens()
[100]

eraser
iteritems()

EXAMPLES:

sage: from sage.structure.coerce_dict import MonoDict
sage: L = MonoDict(31)
sage: L[1] = None
sage: L[2] = True
sage: list(sorted(L.iteritems()))
[(1, None), (2, True)]

resize(buckets=0)

Change the number of buckets of self, while preserving the contents.

If the number of buckets is 0 or not given, self is resized to the smallest prime that is at least twice the current number of buckets.
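The default resizing rule (smallest prime at least twice the current size) can be sketched as follows; this tiny trial-division helper is for illustration only, not Sage's actual prime-finding routine:

```python
def next_prime_at_least(n):
    """Smallest prime >= n (naive trial division; illustration only)."""
    def is_prime(m):
        if m < 2:
            return False
        d = 2
        while d * d <= m:
            if m % d == 0:
                return False
            d += 1
        return True
    while not is_prime(n):
        n += 1
    return n

# A dictionary with 7 buckets, resized with no argument, grows to the
# smallest prime >= 2*7 = 14, which is 17 -- matching the doctest below
# where resize() on a 7-bucket dictionary yields 17 buckets.
assert next_prime_at_least(2 * 7) == 17
```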

EXAMPLES:

sage: from sage.structure.coerce_dict import MonoDict
sage: L = MonoDict(8)
sage: for i in range(100): L[i] = None
sage: L.bucket_lens() # random
[50, 0, 0, 0, 50, 0, 0, 0]
sage: L.resize(7)
sage: L.bucket_lens() # random
[15, 14, 14, 14, 14, 15, 14]
sage: L.resize()
sage: len(L.bucket_lens())
17

stats()

The distribution of items in buckets.

OUTPUT:

• (min, avg, max)

EXAMPLES:

sage: from sage.structure.coerce_dict import MonoDict
sage: L = MonoDict(37)
sage: for i in range(100): L[i] = None
sage: L.stats() # random
(2, 2.7027027027027026, 4)

sage: L = MonoDict(3007)
sage: for i in range(100): L[i] = None
sage: L.stats() # random
(0, 0.03325573661456601, 1)

sage: L = MonoDict(1,threshold=0)
sage: for i in range(100): L[i] = None
sage: L.stats()
(100, 100.0, 100)

class sage.structure.coerce_dict.MonoDictEraser

Bases: object

Erase items from a MonoDict when a weak reference becomes invalid.

This is of internal use only. Instances of this class will be passed as a callback function when creating a weak reference.

EXAMPLES:

sage: from sage.structure.coerce_dict import MonoDict
sage: class A: pass
sage: a = A()
sage: M = MonoDict(11)
sage: M[a] = 1
sage: len(M)
1
sage: del a
sage: import gc
sage: n = gc.collect()
sage: len(M)    # indirect doctest
0


TESTS:

The following shows that trac ticket #15069 is fixed. Background: If a MonoDictEraser is called when a weakly referenced key is deleted, then it deletes the corresponding key-value pair. But if the value is also used as a key, then a recursion of calls to the MonoDictEraser may occur. Therefore, we must keep the value-to-be-deleted alive until the callback is completed. This is achieved by letting the callback actually return the value.

sage: M = MonoDict(11)
sage: class A: pass
sage: a = A()
sage: prev = a
sage: for i in range(1000):
....:     newA = A()
....:     M[prev] = newA
....:     prev = newA
sage: len(M)
1000
sage: del a
sage: len(M)
0


AUTHOR:

• Simon King (2012-01)
class sage.structure.coerce_dict.TripleDict

Bases: object

This is a hashtable specifically designed for (read) speed in the coercion model.

It differs from a Python dict in the following important ways:

• All keys must be sequences of exactly three elements. All sequence types (tuple, list, etc.) map to the same item.
• Comparison is done using the ‘is’ rather than ‘==’ operator.

There are special cdef set/get methods for faster access. It is bare-bones in the sense that not all dictionary methods are implemented.

It is implemented as a list of lists (hereafter called buckets). The bucket is chosen according to a very simple hash based on the object pointer, and each bucket is of the form [id(k1), id(k2), id(k3), r1, r2, r3, value, id(k1), id(k2), id(k3), r1, r2, r3, value, ...], on which a linear search is performed. If a key component ki supports weak references then ri is a weak reference to ki; otherwise ri is identical to ki.

INPUT:

• size – an integer, the initial number of buckets. To spread objects evenly, the size should ideally be a prime, and certainly not divisible by 2.
• data – optional iterable defining initial data.
• threshold – optional number, default 0.7. It determines how frequently the dictionary will be resized (large threshold implies rare resizing).
• weak_values – optional bool (default False). If it is true, weak references to the values in this dictionary will be used, when possible.

If any one of the key components k1, k2, k3 gets garbage collected (which can only happen for key components that support weak references), then the entire entry disappears. In that sense, this structure behaves like a nested WeakKeyDictionary.

EXAMPLES:

sage: from sage.structure.coerce_dict import TripleDict
sage: L = TripleDict(31)
sage: a = 'a'; b = 'b'; c = 'c'
sage: L[a,b,c] = 1
sage: L[a,b,c]
1
sage: L[c,b,a] = -1
sage: list(L.iteritems())     # random order of output.
[(('c', 'b', 'a'), -1), (('a', 'b', 'c'), 1)]
sage: del L[a,b,c]
sage: list(L.iteritems())
[(('c', 'b', 'a'), -1)]
sage: len(L)
1
sage: L.stats()             # min, avg, max (bucket length)
(0, 0.03225806451612903, 1)
sage: L.bucket_lens()       # random layout
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
sage: for i in range(1000):
...       L[i,i,i] = i
sage: len(L)
1001
sage: L.stats()             # random
(31, 32.29032258064516, 35)
sage: L.bucket_lens()       # random layout
[33, 34, 32, 32, 35, 32, 31, 33, 34, 31, 32, 34, 32, 31, 31, 32, 32, 31, 31, 33, 32, 32, 32, 33, 33, 33, 31, 33, 33, 32, 31]
sage: L = TripleDict(101, L)
sage: L.stats()             # random
(8, 9.9108910891089117, 11)
sage: L = TripleDict(3, L)
sage: L.stats()             # random
(291, 333.66666666666669, 410)
sage: L[c,b,a]
-1
sage: L[a,b,c]
Traceback (most recent call last):
...
KeyError: ('a', 'b', 'c')
sage: L[a]
Traceback (most recent call last):
...
KeyError: 'a'
sage: L[a] = 1
Traceback (most recent call last):
...
KeyError: 'a'


The following illustrates why even sizes are bad (the threshold is set to zero, so that no beneficial resizing happens):

sage: L = TripleDict(4, L, threshold=0)
sage: L.stats()
(0, 250.25, 1001)
sage: L.bucket_lens()
[1001, 0, 0, 0]


Note that this kind of dictionary is also used for caching actions and coerce maps. In previous versions of Sage, the cache used strong references and resulted in a memory leak in the following example. This leak was fixed in trac ticket #715 by using weak references:

sage: K = GF(1<<55,'t')
sage: for i in range(50):
...     a = K.random_element()
...     E = EllipticCurve(j=a)
...     P = E.random_point()
...     Q = 2*P
sage: import gc
sage: n = gc.collect()
sage: from sage.schemes.elliptic_curves.ell_finite_field import EllipticCurve_finite_field
sage: LE = [x for x in gc.get_objects() if isinstance(x, EllipticCurve_finite_field)]
sage: len(LE)    # indirect doctest
1


TESTS:

Here, we demonstrate the use of weak values.

sage: class Foo: pass
sage: T = TripleDict(13)
sage: TW = TripleDict(13, weak_values=True)
sage: a = Foo()
sage: b = Foo()
sage: k = 1
sage: T[a,k,k]=1
sage: T[k,a,k]=2
sage: T[k,k,a]=3
sage: T[k,k,k]=a
sage: TW[b,k,k]=1
sage: TW[k,b,k]=2
sage: TW[k,k,b]=3
sage: TW[k,k,k]=b
sage: len(T)
4
sage: len(TW)
4
sage: (k,k,k) in T
True
sage: (k,k,k) in TW
True
sage: T[k,k,k] is a
True
sage: TW[k,k,k] is b
True


Now, T holds a strong reference to a, namely in T[k,k,k]. Hence, when we delete a, all items of T survive:

sage: del a
sage: _ = gc.collect()
sage: len(T)
4


Only when we remove the strong reference, the items become collectable:

sage: del T[k,k,k]
sage: _ = gc.collect()
sage: len(T)
0


The situation is different for TW, since it only holds weak references to a. Therefore, all items become collectable after deleting a:

sage: del b
sage: _ = gc.collect()
sage: len(TW)
0


Note

The index h corresponding to the key [k1, k2, k3] is computed as a value of unsigned type size_t as follows:

h = (id(k1) + 13*id(k2)) xor (503*id(k3))

The natural type for this quantity is Py_ssize_t, which is a signed quantity with the same length as size_t. Storing it in a signed way gives the most efficient storage into PyInt, while preserving sign information.

As usual for a hashtable, we take h modulo some integer N to obtain the bucket number into which to store the key/value pair. A problem here is that C mandates sign preservation for the modulo operator "%". We therefore cast to an unsigned type, i.e., we compute (<size_t> h) % N; otherwise we might end up indexing lists with negative indices, which may lead to segfaults when using the non-error-checking Python macros, as happens here.

This has been observed on 32 bits systems, see trac ticket #715 for details.
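The hash and the unsigned-modulus trick from the note can be emulated in plain Python by masking to the machine word size (a 64-bit size_t is assumed here for illustration; the constants 13 and 503 come from the formula above):

```python
# Assume a 64-bit size_t; on a 32-bit system the mask would be 2**32 - 1.
MASK = (1 << 64) - 1

def triple_hash(k1, k2, k3):
    # In C, + and * bind tighter than ^, so the formula parenthesizes as
    # (id(k1) + 13*id(k2)) ^ (503*id(k3)); masking emulates size_t
    # wraparound, i.e. the cast to an unsigned type.
    h = (id(k1) + 13 * id(k2)) ^ (503 * id(k3))
    return h & MASK

def bucket_index(k1, k2, k3, n_buckets):
    # Taking the modulus of the *unsigned* value guarantees a
    # non-negative index -- the point of the (<size_t> h) % N cast.
    return triple_hash(k1, k2, k3) % n_buckets

a, b, c = object(), object(), object()
i = bucket_index(a, b, c, 31)
assert 0 <= i < 31
```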

AUTHORS:

• Robert Bradshaw, 2007-08
• Simon King, 2012-01
• Nils Bruin, 2012-08
• Simon King, 2013-02
bucket_lens()

The distribution of items in buckets.

OUTPUT:

A list of how many items are in each bucket.

EXAMPLES:

sage: from sage.structure.coerce_dict import TripleDict
sage: L = TripleDict(37, threshold=0)
sage: for i in range(100): L[i,i,i] = None
sage: L.bucket_lens() # random
[3, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 3, 3, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 3, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4]
sage: sum(L.bucket_lens())
100


In order to have a test that isn’t random, we use parameters that should not be used in real applications:

sage: L = TripleDict(1, threshold=0)
sage: for i in range(100): L[i,i,i] = None
sage: L.bucket_lens()
[100]

eraser
iteritems()

EXAMPLES:

sage: from sage.structure.coerce_dict import TripleDict
sage: L = TripleDict(31)
sage: L[1,2,3] = None
sage: list(L.iteritems())
[((1, 2, 3), None)]

resize(buckets=0)

Change the number of buckets of self, while preserving the contents.

If the number of buckets is 0 or not given, it resizes self to the smallest prime that is at least twice as large as self.

EXAMPLES:

sage: from sage.structure.coerce_dict import TripleDict
sage: L = TripleDict(8)
sage: for i in range(100): L[i,i,i] = None
sage: L.bucket_lens() # random
[50, 0, 0, 0, 50, 0, 0, 0]
sage: L.resize(7)
sage: L.bucket_lens() # random
[15, 14, 14, 14, 14, 15, 14]
sage: L.resize()
sage: len(L.bucket_lens())
17

stats()

The distribution of items in buckets.

OUTPUT:

• (min, avg, max)

EXAMPLES:

sage: from sage.structure.coerce_dict import TripleDict
sage: L = TripleDict(37)
sage: for i in range(100): L[i,i,i] = None
sage: L.stats() # random
(2, 2.7027027027027026, 4)

sage: L = TripleDict(3007)
sage: for i in range(100): L[i,i,i] = None
sage: L.stats() # random
(0, 0.03325573661456601, 1)


In order to have a test that isn’t random, we use parameters that should not be used in real applications:

sage: L = TripleDict(1, threshold=0)
sage: for i in range(100): L[i,i,i] = None
sage: L.stats()
(100, 100.0, 100)

class sage.structure.coerce_dict.TripleDictEraser

Bases: object

Erases items from a TripleDict when a weak reference becomes invalid.

This is of internal use only. Instances of this class will be passed as a callback function when creating a weak reference.

EXAMPLES:

sage: from sage.structure.coerce_dict import TripleDict
sage: class A: pass
sage: a = A()
sage: T = TripleDict(11)
sage: T[a,ZZ,None] = 1
sage: T[ZZ,a,1] = 2
sage: T[a,a,ZZ] = 3
sage: len(T)
3
sage: del a
sage: import gc
sage: n = gc.collect()
sage: len(T)
0


TESTS:

The following tests against a bugfix from trac ticket #15069:

sage: from sage.structure.coerce_dict import TripleDict
sage: a,b,c = L = [A(), A(), A()]
sage: for x in range(1000):
....:     newA = A()
....:     T[L[0],L[1],L[2]] = newA
....:     tmp = L.pop(0)
....:     L.append(newA)
sage: len(T)
1000
sage: del a,b,c, L
sage: len(T)
0


AUTHOR:

• Simon King (2012-01)
sage.structure.coerce_dict.signed_id(x)

A function like Python’s id() returning signed integers, which are guaranteed to fit in a Py_ssize_t.

Theoretically, there is no guarantee that two different Python objects have different signed_id() values. However, under the mild assumption that a C pointer fits in a Py_ssize_t, this is guaranteed.
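A Python-level analogue of this conversion (not the actual Cython function) folds the non-negative id() into a signed machine word using ctypes, which performs the same modular wraparound:

```python
import ctypes

def signed_id_sketch(x):
    """Illustrative analogue of signed_id(): reinterpret Python's
    non-negative id() as a signed Py_ssize_t-sized integer.
    ctypes integer types wrap without overflow checking."""
    return ctypes.c_ssize_t(id(x)).value

x = object()
s = signed_id_sketch(x)
width = 8 * ctypes.sizeof(ctypes.c_ssize_t)  # pointer width in bits
# s may be negative, but it agrees with id(x) modulo 2**width,
# and it always fits in a signed machine word.
assert (s - id(x)) % (2 ** width) == 0
assert -2 ** (width - 1) <= s < 2 ** (width - 1)
```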

TESTS:

sage: a = 1.23e45  # some object
sage: from sage.structure.coerce_dict import signed_id
sage: s = signed_id(a)
sage: id(a) == s or id(a) == s + 2**32 or id(a) == s + 2**64
True
sage: signed_id(a) <= sys.maxsize
True

