Commit

Refactor isbits Union arrays to treat the type tag bytes like a proper jl_array_t* with potential a->offset and a->maxsize values. Before, any resizing operations on isbits Union arrays (push, pushfirst, insert, append, deleteat, slice, etc.) incurred extra cost by having to constantly move the type tag bytes to be directly after the last array element. By treating them rather as jl_array_t* objects, sharing the a->offset and a->maxsize fields with the parent array, resizing operations now only require moving bytes around if the parent array's data itself must be moved. Fixes JuliaLang#27825 and JuliaLang#27809.
quinnj committed Jul 22, 2018
1 parent 11cba0e commit 545e1d3
Showing 11 changed files with 537 additions and 105 deletions.
1 change: 1 addition & 0 deletions doc/make.jl
@@ -123,6 +123,7 @@ const PAGES = [
"devdocs/cartesian.md",
"devdocs/meta.md",
"devdocs/subarrays.md",
"devdocs/isbitsunionarrays.md",
"devdocs/sysimg.md",
"devdocs/llvm.md",
"devdocs/stdio.md",
13 changes: 13 additions & 0 deletions doc/src/devdocs/isbitsunionarrays.md
@@ -0,0 +1,13 @@
# isbits Union Optimizations

In Julia, the `Array` type holds both "bits" values and heap-allocated "boxed" values. The distinction is whether the value itself is stored inline (directly in the allocated memory of the array), or whether the memory of the array is simply a collection of pointers to objects allocated elsewhere. In terms of performance, accessing values inline is clearly an advantage over having to follow a pointer to the actual value. "isbits" generally means any Julia type with a fixed, determinate size and no "pointer" fields; see `?isbitstype`.

Julia also supports Union types, quite literally the union of a set of types. Custom Union type definitions can be extremely handy for applications wishing to "cut across" the nominal type system (i.e. explicit subtype relationships) and define methods or functionality on an otherwise unrelated set of types. A compiler challenge, however, is in determining how to treat these Union types. The naive approach (and indeed, what Julia itself did pre-0.7) is to simply make a "box" and then a pointer in the box to the actual value, similar to the previously mentioned "boxed" values. This is unfortunate, however, because many small, primitive "bits" types (think `UInt8`, `Int32`, `Float64`, etc.) would easily fit inline in this "box" without needing any indirection for value access. There are two main ways Julia can take advantage of this optimization as of 0.7: isbits Union fields in types, and isbits Union Arrays.

## isbits Union Structs

Julia now includes an optimization wherein "isbits Union" fields in types (`mutable struct`, `struct`, etc.) will be stored inline. This is accomplished by determining the "inline size" of the Union type (e.g. `Union{UInt8, Int16}` will have a size of two bytes, which is the size needed for the largest Union member, `Int16`), and in addition, allocating an extra "type tag byte" (`UInt8`), whose value signals the type of the actual value stored inline in the "Union bytes". The type tag byte value is the index of the actual value's type in the Union type's order of types. For example, a type tag value of `0x02` for a field with type `Union{Nothing, UInt8, Int16}` would indicate that an `Int16` value is stored in the 16 bits of the field in the structure's memory; a `0x01` value would indicate that a `UInt8` value is stored in the first 8 bits of the 16 bits of the field's memory. Lastly, a value of `0x00` signals that the `nothing` value will be returned for this field, even though, as a singleton type with a single type instance, it technically has a size of 0. The type tag byte for a type's Union field is stored directly after the field's computed Union memory.
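
As a rough illustration of the field layout described above, the following C sketch models a single field of type `Union{Nothing, UInt8, Int16}` as two data bytes (the size of `Int16`, i.e. 16 bits) followed directly by a one-byte type tag. The struct and every function name here (`union_field_t`, `store_int16`, `load_as_long`, and so on) are hypothetical and exist only for this example; this is not how the Julia runtime declares or accesses such fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one field typed Union{Nothing, UInt8, Int16}:
 * two data bytes (the size of the largest member, Int16) followed
 * directly by the one-byte type tag. */
typedef struct {
    uint8_t data[2];
    uint8_t tag;      /* 0x00 = Nothing, 0x01 = UInt8, 0x02 = Int16 */
} union_field_t;

enum { TAG_NOTHING = 0, TAG_UINT8 = 1, TAG_INT16 = 2 };

static void store_uint8(union_field_t *f, uint8_t v)
{
    f->data[0] = v;          /* only the first byte is meaningful */
    f->tag = TAG_UINT8;
}

static void store_int16(union_field_t *f, int16_t v)
{
    memcpy(f->data, &v, sizeof v);   /* uses both data bytes */
    f->tag = TAG_INT16;
}

static void store_nothing(union_field_t *f)
{
    f->tag = TAG_NOTHING;    /* singleton: no data bytes are written */
}

/* Widen the stored value to a long; *is_nothing reports the singleton case. */
static long load_as_long(const union_field_t *f, int *is_nothing)
{
    assert(f->tag <= TAG_INT16);
    *is_nothing = (f->tag == TAG_NOTHING);
    if (f->tag == TAG_UINT8)
        return f->data[0];
    if (f->tag == TAG_INT16) {
        int16_t v;
        memcpy(&v, f->data, sizeof v);
        return v;
    }
    return 0;
}
```

Reading the field always starts from the tag byte: the tag alone decides how many of the data bytes are interpreted, which is why a `nothing` value needs no data bytes at all.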

## isbits Union Arrays

Julia can now also store "isbits Union" values inline in an Array, as opposed to requiring an indirection box. The optimization is accomplished by storing an extra "type tag array" of bytes, one byte per array element, alongside the bytes of the actual array data. This type tag array serves the same function as the type field case: its value signals the type of the actual stored Union value in the array. In terms of layout, a Julia Array can include extra "buffer" space before and after its actual data values, which are tracked in the `a->offset` and `a->maxsize` fields of the `jl_array_t*` type. The "type tag array" is treated exactly as another `jl_array_t*`, but one which shares the same `a->offset`, `a->maxsize`, and `a->len` fields. So the formula to access an isbits Union Array's type tag bytes is `a->data + (a->maxsize - a->offset) * a->elsize + a->offset`; i.e. the Array's `a->data` pointer is already shifted by `a->offset`, so correcting for that, we follow the data all the way to the max of what it can hold, `a->maxsize`, then adjust by `a->offset` more bytes to account for any present "front buffering" the array might be doing. This layout in particular allows for very efficient resizing operations, as the type tag data only ever has to move when the actual array's data has to move.
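
The address formula above can be checked with a small C sketch. `model_array_t`, `model_typetagdata`, and `model_view` are hypothetical stand-ins written for this example; they are not the real `jl_array_t` or `jl_array_typetagdata`. The sketch assumes that `offset` and `maxsize` are counted in elements, and that the buffer holds `maxsize * elsize` data bytes followed by `maxsize` tag bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the jl_array_t fields the formula uses. */
typedef struct {
    char    *data;     /* first live element (already shifted by offset) */
    size_t   len;      /* number of live elements */
    size_t   maxsize;  /* element capacity of the whole buffer */
    uint32_t offset;   /* unused element slots in front of data */
    uint16_t elsize;   /* bytes per element */
} model_array_t;

/* a->data + (a->maxsize - a->offset) * a->elsize + a->offset */
static char *model_typetagdata(const model_array_t *a)
{
    assert(a->offset <= a->maxsize);
    return a->data + (a->maxsize - a->offset) * a->elsize + a->offset;
}

/* Build a view over buf with `offset` free element slots in front. */
static model_array_t model_view(char *buf, size_t maxsize, uint32_t offset,
                                size_t len, uint16_t elsize)
{
    model_array_t a = { buf + (size_t)offset * elsize, len, maxsize, offset, elsize };
    return a;
}
```

With `maxsize = 8`, `elsize = 4`, and `offset = 2`, the tag pointer lands at `buf + 34`: 32 bytes of element storage plus two tag slots reserved for the front buffer. A `pushfirst` that only lowers `offset` to 1 moves the tag pointer back by one byte, so the tags already written stay where they are.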
204 changes: 121 additions & 83 deletions src/array.c

Large diffs are not rendered by default.

37 changes: 34 additions & 3 deletions src/cgutils.cpp
@@ -1664,6 +1664,11 @@ static Value *emit_arraysize(jl_codectx_t &ctx, const jl_cgval_t &tinfo, int dim
return emit_arraysize(ctx, tinfo, ConstantInt::get(T_int32, dim));
}

static Value *emit_vectormaxsize(jl_codectx_t &ctx, const jl_cgval_t &ary)
{
return emit_arraysize(ctx, ary, 2); // maxsize aliases ncols in memory layout for vector
}

static Value *emit_arraylen_prim(jl_codectx_t &ctx, const jl_cgval_t &tinfo)
{
Value *t = boxed(ctx, tinfo);
@@ -1695,7 +1700,7 @@ static Value *emit_arraylen_prim(jl_codectx_t &ctx, const jl_cgval_t &tinfo)
#endif
}

static Value *emit_arraylen(jl_codectx_t &ctx, const jl_cgval_t &tinfo, jl_value_t *ex)
static Value *emit_arraylen(jl_codectx_t &ctx, const jl_cgval_t &tinfo)
{
return emit_arraylen_prim(ctx, tinfo);
}
@@ -1743,6 +1748,15 @@ static Value *emit_arrayflags(jl_codectx_t &ctx, const jl_cgval_t &tinfo)
return tbaa_decorate(tbaa_arrayflags, ctx.builder.CreateLoad(addr));
}

static Value *emit_arrayndims(jl_codectx_t &ctx, const jl_cgval_t &ary)
{
Value *flags = emit_arrayflags(ctx, ary);
cast<LoadInst>(flags)->setMetadata(LLVMContext::MD_invariant_load, MDNode::get(jl_LLVMContext, None));
flags = ctx.builder.CreateLShr(flags, 2);
flags = ctx.builder.CreateAnd(flags, 0x3FF); // (1<<10) - 1
return flags;
}

static Value *emit_arrayelsize(jl_codectx_t &ctx, const jl_cgval_t &tinfo)
{
Value *t = boxed(ctx, tinfo);
@@ -1757,6 +1771,23 @@ static Value *emit_arrayelsize(jl_codectx_t &ctx, const jl_cgval_t &tinfo)
return tbaa_decorate(tbaa_const, ctx.builder.CreateLoad(addr));
}

static Value *emit_arrayoffset(jl_codectx_t &ctx, const jl_cgval_t &tinfo, int nd)
{
if (nd != -1 && nd != 1) // only Vector can have an offset
return ConstantInt::get(T_int32, 0);
Value *t = boxed(ctx, tinfo);
#ifdef STORE_ARRAY_LEN
int offset_field = 4;
#else
int offset_field = 3;
#endif

Value *addr = ctx.builder.CreateStructGEP(jl_array_llvmt,
emit_bitcast(ctx, decay_derived(t), jl_parray_llvmt),
offset_field);
return tbaa_decorate(tbaa_arrayoffset, ctx.builder.CreateLoad(addr));
}

// Returns the size of the array represented by `tinfo` for the given dimension `dim` if
// `dim` is a valid dimension, otherwise returns constant one.
static Value *emit_arraysize_for_unsafe_dim(jl_codectx_t &ctx,
@@ -1810,7 +1841,7 @@ static Value *emit_array_nd_index(
// the last one which we therefore have to do here.
if (nidxs == 1) {
// Linear indexing: Check against the entire linear span of the array
Value *alen = emit_arraylen(ctx, ainfo, ex);
Value *alen = emit_arraylen(ctx, ainfo);
ctx.builder.CreateCondBr(ctx.builder.CreateICmpULT(i, alen), endBB, failBB);
} else if (nidxs >= (size_t)nd){
// No dimensions were omitted; just check the last remaining index
@@ -1844,7 +1875,7 @@ static Value *emit_array_nd_index(
// Remove after 0.7: Ensure no dimensions were 0 and depwarn
ctx.f->getBasicBlockList().push_back(depfailBB);
ctx.builder.SetInsertPoint(depfailBB);
Value *total_length = emit_arraylen(ctx, ainfo, ex);
Value *total_length = emit_arraylen(ctx, ainfo);
ctx.builder.CreateCondBr(ctx.builder.CreateICmpULT(i, total_length), depwarnBB, failBB);

ctx.f->getBasicBlockList().push_back(depwarnBB);
28 changes: 20 additions & 8 deletions src/codegen.cpp
@@ -241,6 +241,7 @@ static MDNode *tbaa_arrayptr; // The pointer inside a jl_array_t
static MDNode *tbaa_arraysize; // A size in a jl_array_t
static MDNode *tbaa_arraylen; // The len in a jl_array_t
static MDNode *tbaa_arrayflags; // The flags in a jl_array_t
static MDNode *tbaa_arrayoffset; // The offset in a jl_array_t
static MDNode *tbaa_arrayselbyte; // a selector byte in a isbits Union jl_array_t
static MDNode *tbaa_const; // Memory that is immutable by the time LLVM can see it

@@ -2511,10 +2512,16 @@ static bool emit_builtin_call(jl_codectx_t &ctx, jl_cgval_t *ret, jl_value_t *f,
else if (!isboxed && jl_is_uniontype(ety)) {
Type *AT = ArrayType::get(IntegerType::get(jl_LLVMContext, 8 * al), (elsz + al - 1) / al);
Value *data = emit_bitcast(ctx, emit_arrayptr(ctx, ary, ary_ex), AT->getPointerTo());
// isbits union selector bytes are stored directly after the last array element
Value *selidx = emit_arraylen_prim(ctx, ary);
// isbits union selector bytes are stored after a->maxsize
Value *ndims = (nd == -1 ? emit_arrayndims(ctx, ary) : ConstantInt::get(T_int16, nd));
Value *is_vector = ctx.builder.CreateICmpEQ(ndims, ConstantInt::get(T_int16, 1));
Value *offset = emit_arrayoffset(ctx, ary, nd);
Value *selidx_v = ctx.builder.CreateSub(emit_vectormaxsize(ctx, ary), ctx.builder.CreateZExt(offset, T_size));
Value *selidx_m = emit_arraylen(ctx, ary);
Value *selidx = ctx.builder.CreateSelect(is_vector, selidx_v, selidx_m);
Value *ptindex = ctx.builder.CreateInBoundsGEP(AT, data, selidx);
ptindex = emit_bitcast(ctx, ptindex, T_pint8);
ptindex = ctx.builder.CreateInBoundsGEP(T_int8, ptindex, offset);
ptindex = ctx.builder.CreateInBoundsGEP(T_int8, ptindex, idx);
Instruction *tindex = tbaa_decorate(tbaa_arrayselbyte, ctx.builder.CreateLoad(T_int8, ptindex));
tindex->setMetadata(LLVMContext::MD_range, MDNode::get(jl_LLVMContext, {
@@ -2606,9 +2613,15 @@ static bool emit_builtin_call(jl_codectx_t &ctx, jl_cgval_t *ret, jl_value_t *f,
jl_cgval_t rhs_union = convert_julia_type(ctx, val, ety);
Value *tindex = compute_tindex_unboxed(ctx, rhs_union, ety);
tindex = ctx.builder.CreateNUWSub(tindex, ConstantInt::get(T_int8, 1));
Value *selidx = emit_arraylen_prim(ctx, ary);
Value *ndims = (nd == -1 ? emit_arrayndims(ctx, ary) : ConstantInt::get(T_int16, nd));
Value *is_vector = ctx.builder.CreateICmpEQ(ndims, ConstantInt::get(T_int16, 1));
Value *offset = emit_arrayoffset(ctx, ary, nd);
Value *selidx_v = ctx.builder.CreateSub(emit_vectormaxsize(ctx, ary), ctx.builder.CreateZExt(offset, T_size));
Value *selidx_m = emit_arraylen(ctx, ary);
Value *selidx = ctx.builder.CreateSelect(is_vector, selidx_v, selidx_m);
Value *ptindex = ctx.builder.CreateInBoundsGEP(AT, data, selidx);
ptindex = emit_bitcast(ctx, ptindex, T_pint8);
ptindex = ctx.builder.CreateInBoundsGEP(T_int8, ptindex, offset);
ptindex = ctx.builder.CreateInBoundsGEP(T_int8, ptindex, idx);
tbaa_decorate(tbaa_arrayselbyte, ctx.builder.CreateStore(tindex, ptindex));
if (jl_is_datatype(val.typ) && jl_datatype_size(val.typ) == 0) {
@@ -2819,8 +2832,7 @@ static bool emit_builtin_call(jl_codectx_t &ctx, jl_cgval_t *ret, jl_value_t *f,
return true;
}
else if (jl_is_datatype(sty) && sty->name == jl_array_typename) {
jl_value_t *ary_ex = jl_exprarg(ex, 1);
auto len = emit_arraylen(ctx, obj, ary_ex);
auto len = emit_arraylen(ctx, obj);
jl_value_t *ety = jl_tparam0(sty);
Value *elsize;
size_t elsz = 0, al = 0;
@@ -6697,6 +6709,7 @@ static void init_julia_llvm_meta(void)
tbaa_arraysize = tbaa_make_child("jtbaa_arraysize", tbaa_array_scalar).first;
tbaa_arraylen = tbaa_make_child("jtbaa_arraylen", tbaa_array_scalar).first;
tbaa_arrayflags = tbaa_make_child("jtbaa_arrayflags", tbaa_array_scalar).first;
tbaa_arrayoffset = tbaa_make_child("jtbaa_arrayoffset", tbaa_array_scalar).first;
tbaa_const = tbaa_make_child("jtbaa_const", nullptr, true).first;
tbaa_arrayselbyte = tbaa_make_child("jtbaa_arrayselbyte", tbaa_array_scalar).first;
tbaa_unionselbyte = tbaa_make_child("jtbaa_unionselbyte", tbaa_data_scalar).first;
@@ -6818,13 +6831,12 @@ static void init_julia_llvm_env(Module *m)
#endif
, T_int16
, T_int16
, T_int32
};
static_assert(sizeof(jl_array_flags_t) == sizeof(int16_t),
"Size of jl_array_flags_t is not the same as int16_t");
jl_array_llvmt =
StructType::create(jl_LLVMContext,
ArrayRef<Type*>(vaelts,sizeof(vaelts)/sizeof(vaelts[0])),
"jl_array_t");
StructType::create(jl_LLVMContext, makeArrayRef(vaelts), "jl_array_t");
jl_parray_llvmt = PointerType::get(jl_array_llvmt, 0);

global_to_llvm("__stack_chk_guard", (void*)&__stack_chk_guard, m);
8 changes: 4 additions & 4 deletions src/dump.c
@@ -649,9 +649,9 @@ static void jl_serialize_value_(jl_serializer_state *s, jl_value_t *v, int as_li
jl_serialize_value(s, jl_box_long(jl_array_dim(ar,i)));
jl_serialize_value(s, jl_typeof(ar));
if (!ar->flags.ptrarray) {
size_t extra = jl_is_uniontype(jl_tparam0(jl_typeof(ar))) ? jl_array_len(ar) : 0;
size_t tot = jl_array_len(ar) * ar->elsize + extra;
ios_write(s->s, (char*)jl_array_data(ar), tot);
ios_write(s->s, (char*)jl_array_data(ar), jl_array_len(ar) * ar->elsize);
if (jl_array_isbitsunion(ar))
ios_write(s->s, jl_array_typetagdata(ar), jl_array_len(ar));
}
else {
for (i = 0; i < jl_array_len(ar); i++) {
@@ -1508,7 +1508,7 @@ static jl_value_t *jl_deserialize_value_array(jl_serializer_state *s, uint8_t ta
jl_value_t *aty = jl_deserialize_value(s, &jl_astaggedvalue(a)->type);
jl_set_typeof(a, aty);
if (!a->flags.ptrarray) {
size_t extra = jl_is_uniontype(jl_tparam0(aty)) ? jl_array_len(a) : 0;
size_t extra = jl_array_isbitsunion(a) ? jl_array_len(a) : 0;
size_t tot = jl_array_len(a) * a->elsize + extra;
ios_read(s->s, (char*)jl_array_data(a), tot);
}
7 changes: 4 additions & 3 deletions src/gc.c
@@ -851,11 +851,12 @@ void jl_gc_reset_alloc_count(void)
static size_t array_nbytes(jl_array_t *a)
{
size_t sz = 0;
if (jl_array_ndims(a)==1)
sz = a->elsize * a->maxsize + (a->elsize == 1 ? 1 : 0);
int isbitsunion = jl_array_isbitsunion(a);
if (jl_array_ndims(a) == 1)
sz = a->elsize * a->maxsize + ((a->elsize == 1 && !isbitsunion) ? 1 : 0);
else
sz = a->elsize * jl_array_len(a);
if (!a->flags.ptrarray && jl_is_uniontype(jl_tparam0(jl_typeof(a))))
if (isbitsunion)
// account for isbits Union array selector bytes
sz += jl_array_len(a);
return sz;
2 changes: 1 addition & 1 deletion src/intrinsics.cpp
@@ -932,7 +932,7 @@ static jl_cgval_t emit_intrinsic(jl_codectx_t &ctx, intrinsic f, jl_value_t **ar

switch (f) {
case arraylen:
return mark_julia_type(ctx, emit_arraylen(ctx, argv[0], args[1]), false, jl_long_type);
return mark_julia_type(ctx, emit_arraylen(ctx, argv[0]), false, jl_long_type);
case pointerref:
return emit_pointerref(ctx, argv);
case pointerset:
3 changes: 3 additions & 0 deletions src/julia.h
@@ -778,6 +778,8 @@ JL_DLLEXPORT size_t jl_array_len_(jl_array_t *a);
#define jl_array_data_owner_offset(ndims) (offsetof(jl_array_t,ncols) + sizeof(size_t)*(1+jl_array_ndimwords(ndims))) // in bytes
#define jl_array_data_owner(a) (*((jl_value_t**)((char*)a + jl_array_data_owner_offset(jl_array_ndims(a)))))

JL_DLLEXPORT char *jl_array_typetagdata(jl_array_t *a);

STATIC_INLINE jl_value_t *jl_array_ptr_ref(void *a, size_t i) JL_NOTSAFEPOINT
{
assert(i < jl_array_len(a));
@@ -964,6 +966,7 @@ static inline int jl_is_layout_opaque(const jl_datatype_layout_t *l) JL_NOTSAFEP
#define jl_is_cpointer(v) jl_is_cpointer_type(jl_typeof(v))
#define jl_is_pointer(v) jl_is_cpointer_type(jl_typeof(v))
#define jl_is_intrinsic(v) jl_typeis(v,jl_intrinsic_type)
#define jl_array_isbitsunion(a) (!(((jl_array_t*)(a))->flags.ptrarray) && jl_is_uniontype(jl_tparam0(jl_typeof(a))))

JL_DLLEXPORT int jl_subtype(jl_value_t *a, jl_value_t *b);

8 changes: 5 additions & 3 deletions src/staticdata.c
@@ -566,8 +566,7 @@ static void jl_write_values(jl_serializer_state *s)
// make some header modifications in-place
jl_array_t *newa = (jl_array_t*)&s->s->buf[reloc_offset];
size_t alen = jl_array_len(ar);
size_t extra = (!ar->flags.ptrarray && jl_is_uniontype(jl_tparam0(jl_typeof(ar)))) ? alen : 0;
size_t tot = alen * ar->elsize + extra;
size_t tot = alen * ar->elsize;
if (newa->flags.ndims == 1)
newa->maxsize = alen;
newa->offset = 0;
@@ -586,9 +585,12 @@ static void jl_write_values(jl_serializer_state *s)
assert(data < ((uintptr_t)1 << RELOC_TAG_OFFSET) && "offset to constant data too large");
arraylist_push(&s->relocs_list, (void*)(reloc_offset + offsetof(jl_array_t, data))); // relocation location
arraylist_push(&s->relocs_list, (void*)(((uintptr_t)ConstDataRef << RELOC_TAG_OFFSET) + data)); // relocation target
if (ar->elsize == 1)
int isbitsunion = jl_array_isbitsunion(ar);
if (ar->elsize == 1 && !isbitsunion)
tot += 1;
ios_write(s->const_data, (char*)jl_array_data(ar), tot);
if (isbitsunion)
ios_write(s->const_data, jl_array_typetagdata(ar), alen);
}
else {
newa->data = (void*)tsz; // relocation offset