[RUNTIME][uTVM] AutoTVM + uTVM for Cortex-M7 #5417

Merged
merged 11 commits into from
Apr 30, 2020
address logan's comments
areusch committed Apr 27, 2020
commit d351b9c0097cb8a6d8461adae41bebdb09bab773
4 changes: 2 additions & 2 deletions src/runtime/micro/openocd_low_level_device.cc
@@ -207,8 +207,8 @@ class OpenOCDLowLevelDevice final : public LowLevelDevice {

/*! \brief number of bytes in a word on the target device (64-bit) */
static const constexpr ssize_t kWordSize = 8;
-// NOTE: OpenOCD will call any request larger than this constant an "absurd
-// request".
+// NOTE: The OS pipe buffer must be able to handle a line long enough to
+// print this transfer request.
/*! \brief maximum number of bytes allowed in a single memory transfer */
static const constexpr ssize_t kMemTransferLimit = 8000;
Contributor

I'm curious what OpenOCD version you're running, because the threshold for an "absurd request" seems to be 64k (line 4274).

Contributor Author

I can't remember exactly where this limit is hit anymore, but IIRC it's due to macOS pipe buffering. I think the cause is that we read the pipe line by line on the TVM side: if you issue a memory transfer that prints more than ~24k characters, the OS pipe buffer fills up before the newline is sent, and we deadlock. Updated the comment.
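Taking the explanation above at face value (one transfer is printed as a single line, and that whole line, newline included, has to fit in the OS pipe buffer before the line-by-line reader sees it), the largest safe transfer is simple arithmetic. A minimal Python sketch; the helper name `max_safe_transfer_bytes` is hypothetical, and the buffer size and characters-per-byte inputs are assumptions that have not been checked against OpenOCD's output format or the macOS pipe implementation.

```python
def max_safe_transfer_bytes(pipe_capacity_chars: int,
                            chars_per_byte: int,
                            overhead_chars: int = 1) -> int:
    """Largest transfer, in bytes, whose printed line still fits in the pipe.

    All parameters are caller-supplied assumptions:
      pipe_capacity_chars: size of the OS pipe buffer, in characters.
      chars_per_byte: characters printed per transferred byte.
      overhead_chars: fixed characters per line; the default of 1 accounts
        only for the trailing newline.
    """
    if chars_per_byte <= 0:
        raise ValueError("chars_per_byte must be positive")
    usable = pipe_capacity_chars - overhead_chars
    if usable <= 0:
        return 0  # not even the fixed per-line overhead fits
    return usable // chars_per_byte
```

For example, assuming roughly three printed characters per byte and a buffer of about 24,000 characters, `max_safe_transfer_bytes(24000, 3)` returns 7999, close to the 8000 used for `kMemTransferLimit`.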

Contributor

Okay. We might want some preprocessor magic to detect whether the platform is Linux or macOS and set this constant accordingly, because leaving it at 8k means Linux issues 8 times more requests than it needs to.
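In the C++ source this suggestion would be a compile-time choice, using predefined macros such as `__APPLE__` and `__linux__` in an `#if`/`#elif` chain. A runtime Python analogue of the same selection, using `sys.platform`, is sketched below together with the request-count arithmetic behind "8 times more requests". The names are hypothetical, and the Linux value assumes both that the 64k threshold cited above means 64 * 1024 bytes and that the pipe-buffer deadlock described earlier does not occur on Linux, which this thread does not establish.

```python
import sys

# Assumed limits (hypothetical constants):
#   8000 bytes keeps one printed transfer within the macOS pipe buffer, per the thread.
#   64 KiB is the OpenOCD "absurd request" threshold, assumed to be 64 * 1024 bytes.
MACOS_MEM_TRANSFER_LIMIT = 8000
LINUX_MEM_TRANSFER_LIMIT = 64 * 1024


def mem_transfer_limit(platform: str = sys.platform) -> int:
    """Pick a per-platform transfer limit (runtime analogue of #if/#elif)."""
    if platform.startswith("linux"):
        return LINUX_MEM_TRANSFER_LIMIT
    # macOS ("darwin") and any unrecognized platform get the conservative value.
    return MACOS_MEM_TRANSFER_LIMIT


def num_requests(total_bytes: int, limit: int) -> int:
    """Number of memory transfers needed to move total_bytes in chunks of limit."""
    return -(-total_bytes // limit)  # ceiling division
```

For a 1 MiB transfer, `num_requests(1 << 20, 8000)` is 132 while `num_requests(1 << 20, 64 * 1024)` is 16, roughly the 8x difference mentioned above.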

/*! \brief number of milliseconds to wait for function execution to halt */
2 changes: 1 addition & 1 deletion topi/python/topi/arm_cpu/conv2d_spatial_pack.py
@@ -173,7 +173,7 @@ def schedule_conv2d_spatial_pack_nchw(cfg, s, data_vec, kernel_vec,
axis_lens=[cfg['tile_oh'].size[-1],
cfg['tile_ow'].size[-1],
cfg['tile_co'].size[-1]],
-max_unroll=None,
+max_unroll=16,
cfg=cfg)
s[conv].compute_at(s[last], ow)

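The `max_unroll` argument touched in this hunk bounds unrolling for the `ann_spatial` knob. Our understanding, which should be checked against AutoTVM's `AnnotateEntity.apply` in the TVM version this PR targets, is that an axis annotated `'unroll'` whose length exceeds `max_unroll` is reported as an invalid config, while `None` disables the check. A minimal sketch of that rule; `check_unroll` is a hypothetical helper, not a TVM function.

```python
from typing import Optional, Sequence


def check_unroll(axis_lens: Sequence[int], anns: Sequence[str],
                 max_unroll: Optional[int]) -> None:
    """Hypothetical check: reject unrolled axes longer than max_unroll.

    axis_lens[i] is the extent of axis i and anns[i] its annotation
    ('unroll', 'vec', 'none', ...). max_unroll=None disables the cap.
    """
    if max_unroll is None:
        return
    for length, ann in zip(axis_lens, anns):
        if ann == "unroll" and length > max_unroll:
            raise ValueError(
                f"unroll factor {length} exceeds max_unroll={max_unroll}")
```

Under this reading, a cap of 16 filters out configs that fully unroll long innermost tiles, whereas `None` accepts them.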
6 changes: 3 additions & 3 deletions topi/python/topi/arm_cpu/cortex_m7/micro_kernel/gemm.py
@@ -34,7 +34,7 @@ def intrin_gemm_MxKxN(M, K, N, in_dtype, out_dtype):
# collisions in the generated source (e.g., if there are multiple operators
# in the same module that use the same intrinsic)
#
-# TODO to cut down on memory usage, we should cache each intrinsic
+# TODO(areusch): to cut down on memory usage, we should cache each intrinsic
# instantiation and include it only once, eliminating the need for unique
# IDs
UNIQ_ID_LEN = 8
@@ -47,7 +47,7 @@ def intrin_gemm_MxKxN(M, K, N, in_dtype, out_dtype):
if isinstance(N, tvm.tir.IntImm):
N = N.value
assert K % 4 == 0
-# TODO support more dtypes?
+# TODO(areusch): support more dtypes?
assert in_dtype == 'int8'
assert out_dtype == 'int32'
A = te.placeholder((M, K), name='a', dtype=in_dtype)
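The asserts in this hunk restrict the intrinsic to `int8` inputs, an `int32` output, and a `K` that is a multiple of 4. A pure-Python reference of the computation can help when checking generated kernels against expected results. It is a sketch under assumptions: `gemm_ref` is a hypothetical name, and `B` is taken to be stored as N x K (so `C[i][j] = sum_k A[i][k] * B[j][k]`), which the `bb_pad_size = N * K` line in `gemm_MxKxN_impl` suggests but does not confirm.

```python
from typing import List

Matrix = List[List[int]]


def gemm_ref(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical reference for the int8 x int8 -> int32 GEMM intrinsic.

    a is M x K, b is N x K (assumed transposed layout); the result is M x N
    with result[i][j] = sum_k a[i][k] * b[j][k].
    """
    k = len(a[0])
    # Mirror the intrinsic's preconditions: K divisible by 4, int8 inputs.
    if k % 4 != 0:
        raise ValueError("K must be a multiple of 4")
    if any(len(row) != k for row in a + b):
        raise ValueError("all rows of a and b must have length K")
    if any(not -128 <= v <= 127 for row in a + b for v in row):
        raise ValueError("inputs must fit in int8")
    # Python ints are unbounded; a real int32 accumulator only overflows for very large K.
    return [[sum(a[i][x] * b[j][x] for x in range(k)) for j in range(len(b))]
            for i in range(len(a))]
```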
@@ -124,7 +124,7 @@ def _body():

def gemm_MxKxN_impl(M, K, N, uniq_id):
"""Emit C code for gemm impl."""
-# TODO are there any SIMD tricks to zero out arrays quickly?
+# TODO(areusch): are there any SIMD tricks to zero out arrays quickly?
aa_pad_size = M * K
bb_pad_size = N * K
# code reference: CMSIS-NN paper (https://arxiv.org/abs/1801.06601)
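The TODO next to `UNIQ_ID_LEN` proposes caching each intrinsic instantiation and including it only once, rather than giving every instantiation a unique ID to avoid symbol collisions. A minimal sketch of that idea, assuming instantiations can be keyed by `(M, K, N, in_dtype, out_dtype)` and that `emit_c` is some caller-supplied function producing the C source; the class, the callback, and the `gemm_{M}x{K}x{N}_...` symbol naming are all hypothetical, not TVM code.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int, int, str, str]


class GemmIntrinsicCache:
    """Hypothetical cache: one C implementation per distinct GEMM shape/dtype."""

    def __init__(self, emit_c: Callable[[str, Key], str]):
        self._emit_c = emit_c              # generates C source for a symbol name
        self._names: Dict[Key, str] = {}   # key -> C symbol already emitted
        self.sources: List[str] = []       # C snippets to include, once each

    def get(self, M: int, K: int, N: int,
            in_dtype: str = "int8", out_dtype: str = "int32") -> str:
        key = (M, K, N, in_dtype, out_dtype)
        if key not in self._names:
            # Name derived deterministically from the key, so repeated
            # instantiations map to the same symbol.
            name = f"gemm_{M}x{K}x{N}_{in_dtype}_{out_dtype}"
            self._names[key] = name
            self.sources.append(self._emit_c(name, key))
        return self._names[key]
```

Because the symbol name is derived from the key, two operators with the same GEMM shape share one C function, and no per-instantiation unique ID is needed.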