Custom Scala Collections
TODO english docs in few months
A collection library that may be missing in the standard Scala library. Currently under development and will be ready in a few months. At present, it contains the following collections:
-
Mutable:
- VList
- BoundedQueue
- IntMap
- IntSet
- PrefixMap
- PrefixSet
-
Immutable:
- BoundedQueue
- BoundedBucketQueue
- Router
The library is built for versions 2.11.8 and 2.12.1. To connect the library, add the following line to your .sbt file
libraryDependencies += "com.github.a14e" %% "collz" % "0.2.2"
All mutable collections are not thread-safe.
The class is a possible alternative to ListBuffer or ArrayBuffer. Unlike ListBuffer, it provides fast access to elements by index and more efficient memory use, and unlike ArrayBuffer, it allows adding elements to the end without rebuilding the entire array.
Implemented as a VList with one modification: the list of subarrays is replaced with an array buffer, which has somewhat simplified the implementation and added efficiency to iterations in both forward and reverse directions. The complexity of appending to the end is O(log(log(n))), which is approximately ~O(1). The speed of access by index in the worst case (for index 0) is O(log(n)), but in most cases it's O(1), since 50% ... 75% of the indices are contained in the last 2 subarrays.
import a14e.collz.mut.VList
val list = VList[Int](1, 2, 3)
list += 1 // list == VList(1, 2, 3, 1)
list ++= List(2, 3) // list == VList(1, 2, 3, 1, 2, 3)
val x = list(1) // x == 2
var sum = 0
list.foreach(sum += _) // sum == 12
This is a FIFO queue with a fixed size. New elements are added to the end using methods like push, pushAll, :+ and so on. Elements are retrieved and removed from the beginning by calling pull methods. When the size is exceeded, elements from the beginning are removed. It is implemented with quick addition to the end and extraction from the beginning without rebuilding the entire queue. Also, it has very fast access by index and quick iterations through the queue.
Internally, it is implemented using two arrays of the same size as the queue. The first array has pointers to the beginning and the end of the queue, and the second array has only a pointer to the end. (here, a pointer refers to the index number). When the first array is full, the second one starts to fill. When the size is exceeded or elements are extracted from the queue by the pull method, the pointer to the beginning of the queue shifts, and the place of the extracted element is filled with null, to assist the garbage collector. When the second array runs out of space, or the pointer to the beginning in the second array reaches the end, then the second array replaces the first.
import a14e.collz.mut.BoundedQueue
val queue = BoundedQueue[Int](2)
queue += 1 // queue == BoundedQueue(1)
queue += 2 // queue == BoundedQueue(1, 2)
queue += 3 // queue == BoundedQueue(2, 3)
val x1 = queue.pull() // x1 == 2, queue == BoundedQueue(3)
val x2 = queue.pull() // x2 == 3, queue == BoundedQueue()
queue ++= List(1, 2, 3) // queue == BoundedQueue(2, 3)
val x3 = queue(0) // x3 == 2
val x4 = queue(1) // x4 == 3
var sum = 0
queue.foreach(sum += _) // sum === 5
The immutable implementations are done in a similar way. Only instead of arrays, the Vector class is used.
import a14e.collz.immut.BoundedQueue
val queue0 = BoundedQueue[Int](2)
val queue1 = queue0 :+ 1 // queue1 == BoundedQueue(1)
val queue2 = queue1 :+ 2 // queue2 == BoundedQueue(1, 2)
val queue3 = queue1 :+ 3 // queue3 == BoundedQueue(2, 3)
val (queue4, x1) = queue3.pull() // x1 == 2, queue4 == BoundedQueue(3)
val (queue5, x2) = queue4.pull() // x2 == 3, queue5 == BoundedQueue()
val queue6 = queue5 :++ List(1, 2, 3) // BoundedQueue(2, 3)
val x3 = queue6(0) // x3 == 2
val x4 = queue6(1) // x4 == 3
var sum = 0
queue6.foreach(sum += _) // sum === 5
In the immutable version, some elements may secretly remain in the collection after pull(), it's worth remembering this to avoid memory leaks.
The standard Scala library has an efficient immutable IntMap implementation. Here, an efficient mutable implementation is presented.
Implemented as a prefix tree, each node of which has up to 16 branches, the branch number is determined as key & F, as the depth increases, the key is shifted 4 bits to the right. It's very close to the immutable HashMap implementation, but it has a much simpler implementation and slightly better performance when searching for and adding elements.
When searching by element, benchmarks show no significant difference compared to mutable AnyRefMap, HashMap, etc. But the speed of adding and removing elements can be 4-8 times faster. The complexity of adding, removing, and key lookup operations is O(log16(n)).
A disadvantage of this implementation due to the use of 16 arrays is higher memory consumption and slower iterations through the collection.
Keys or null values are not allowed.
import a14e.collz.mut.IntMap
val map = IntMap[Int](1 -> 3, 2 -> 4) // IntMap(1 -> 1, 2 -> 2)
val x1 = map.get(1) // x1 == Some(3)
val x2 = map.get(2) // x2 == Some(4)
val x3 = map.get(0) // x3 == None
val x4 = map.get(3) // x4 == None
map(1) = 5 // map == IntMap(1 -> 5, 2 -> 2)
map(6) = 7 // map == IntMap(1 -> 5, 2 -> 2, 6 -> 7)
val x5 = map.contains(2) // x5 == true
map -= 2 // map == IntMap(1 -> 5, 6 -> 7)
val x6 = map.contains(2) // x6 == false
var sum = 0
map.foreach{ case (key, value) => sum += value } // sum == 12
map.clear() // map = IntMap()
An implementation of mutable.Set[_], optimized for working with Int. It's based on prefix trees. It is a lightweight wrapper over IntMap.
import a14e.collz.mut.IntSet
val set = IntSet(1, 2, 3, 4) // IntSet(1, 2, 3, 4)
val res1 = set.contains(1) //res1 == true
val res2 = set.contains(2) //res2 == true
val res3 = set.contains(6) //res3 == false
set += 6 // set == IntSet(1, 2, 3, 4, 6)
val res4 = set.contains(6) //res4 == true
set -= 1 // set == IntSet(2, 3, 4, 6)
val res5 = set.contains(1) //res5 == false
var sum = 0
set.foreach(sum += _) // sum == 15
An implementation of mutable.Map[_, _], optimized for working with strings. Implemented as prefix trees. For quick element lookup in a node, each node contains an IntMap, which allows the worst possible search time to depend little on the number of elements in the collection, but only on the length of the key. The search time in the worst case is approximately equal to the time of search in the best case in a hash table, and the difference will be greater the longer the key. For long strings like URLs with large common parts, a significant increase in speed can be achieved compared to a hash table. There is also the ability to efficiently find all strings starting with a certain prefix and determine if such strings exist.
Keys or null values are not allowed.
import a14e.collz.mut.PrefixMap
val map = PrefixMap("string" -> 1, "stringWithSuffix" -> 2, "string2" -> 3, "anotherString" -> 4)
val x1 = map.get("string") // s1 == Some(1)
val x2 = map.get("string2") // s1 == Some(3)
val x3 = map.get("string3") // s1 == None
val x4 = map.hasPrefix("stri") // x4 == true
val x5 = map.hasPrefix("unknown") // x5 == false
val x6 = map.findForPrefix("stri").toList
// x6 == List("string" -> 1, "stringWithSuffix" -> 2, "string2" -> 3)
map -= "string"
// map == PrefixMap("stringWithSuffix" -> 2, "string2" -> 3, "anotherString" -> 4)
map("string3") = 5
// map == PrefixMap("stringWithSuffix" -> 2, "string2" -> 3, "anotherString" -> 4, "string3" -> 5)
An implementation of mutable.Set[_], optimized for working with strings. Implemented as a thin wrapper around PrefixMap.
Null values are not allowed.
import a14e.collz.mut.PrefixSet
val set = PrefixSet("string", "stringWithSuffix", "string2", "anotherString" )
val x1 = set.contains("string") // s1 == true
val x2 = set.contains("string2") // s1 == true
val x3 = set.contains("string3") // s1 == false
val x4 = set.hasPrefix("stri") // x4 == true
val x5 = set.hasPrefix("unknown") // x5 == false
val x6 = set.findForPrefix("stri").toList
// x6 == List("string", "stringWithSuffix", "string2")
set -= "string"
// set == PrefixSet("stringWithSuffix", "string2", "anotherString")
set += "string3"
// set == PrefixSet("stringWithSuffix", "string2", "anotherString", "string3")
In general, it's very similar to BoundedQueue, but it does not delete past values, instead, it stores them in a bucket, which is filled according to the stack principle.
Since the collection consists of 2 sub-collections, it doesn't make sense for it to inherit various types of Iterable[_], etc. Therefore, to access the iterations, you first need to call the queue or bucket methods.
import a14e.collz.immut.BoundedBucketQueue
val queue0 = BoundedBucketQueue[Int](2)
val queue1 = queue0 :+ 1 // queue1.queue == BoundedQueue(1) и queue1.bucket == Nil
val queue2 = queue1 :+ 2 // queue2.queue == BoundedQueue(1, 2) и queue1.bucket == Nil
val queue3 = queue1 :+ 3 // queue3.queue == BoundedQueue(2, 3) и queue1.bucket == List(1)
val (queue4, x1) = queue3.pull() // x1 == 2, queue4 == BoundedQueue(3) и queue1.bucket == List(1)
val (queue5, x2) = queue4.pull() // x2 == 3, queue5 == BoundedQueue() и queue1.bucket == List(1)
val queue6 = queue5 :++ List(1, 2, 3) // queue == BoundedQueue(2, 3) и queue1.bucket == List(1, 1)
val x3 = queue6(0) // x3 == 2
val x4 = queue6(1) // x4 == 3
A simple class for routing. It's needed for the random selection of incoming elements based on a certain key. The key can be any descendant of Any. Elements will be selected approximately randomly, but unambiguously for each key. Regardless of the order of addition, routing will be unambiguous. However, even a single-element change can significantly alter the distribution by keys. The algorithm is also known as Ketama.
import a14e.collz.immut.Router
val router = Router("127.0.0.1:2222", "127.0.0.1:2223", "127.0.0.1:2224")
val id1 = 1
val id2 = 2
val id3 = 2
router.route(id1) // 127.0.0.1:2223
router.route(id2) // 127.0.0.1:2224
router.route(id3) // 127.0.0.1:2222
The project is built for Scala 2.11.8 and 2.12.1
To copy the source code:
$ git clone https://github.com/a14e/collz.git
$ cd collz
To build:
$ sbt update
$ sbt +test
$ sbt +package
Take the .jar from the target folder.
- Make proper documentation in English (English docs)
- Add new collections
- Immutable:
a. IntervalMap
- Immutable:
- Add benchmarks to the repository