Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RUNTIME][uTVM] AutoTVM + uTVM for Cortex-M7 #5417

Merged
merged 11 commits into from
Apr 30, 2020

Conversation

areusch
Copy link
Contributor

@areusch areusch commented Apr 23, 2020

This PR contains changes that implement the prototype of microTVM according to the RFC. It can be tested using the scripts in this demo repository. Additional documentation and changes will follow with finer granularity, but would like to get feedback on the implementation thus far.

@tqchen
Copy link
Member

tqchen commented Apr 23, 2020

cc @tmoreau89 @liangfu @weberlo @u99127

@tqchen tqchen changed the title micro TVM prototype [RUNTIME][uTVM] Arm Cortext-M support Apr 23, 2020
@tqchen tqchen changed the title [RUNTIME][uTVM] Arm Cortext-M support [RUNTIME][uTVM] AutoTVM + uTVM Apr 23, 2020
@tqchen tqchen changed the title [RUNTIME][uTVM] AutoTVM + uTVM [RUNTIME][uTVM] AutoTVM + uTVM for Cortext-M Apr 23, 2020
@tmoreau89
Copy link
Contributor

cc @u99127

@areusch areusch force-pushed the areusch/utvm-merge branch 2 times, most recently from 54b9410 to c4dfa29 Compare April 23, 2020 23:13
@areusch areusch changed the title [RUNTIME][uTVM] AutoTVM + uTVM for Cortext-M [RUNTIME][uTVM] AutoTVM + uTVM for Cortex-M7 Apr 23, 2020
python/tvm/autotvm/tuner/tuner.py Outdated Show resolved Hide resolved
Comment on lines 244 to 255
gdb_init_dir = os.environ.get('MICRO_GDB_INIT_DIR')
if gdb_init_dir is not None:
gdb_init_path = f'{gdb_init_dir}/.gdbinit'
with open(gdb_init_path, 'r') as f:
gdbinit_contents = f.read().split('\n')
new_contents = []
for line in gdbinit_contents:
new_contents.append(line)
if line.startswith('target'):
new_contents.append(f'add-symbol-file {rel_obj_path}')
with open(gdb_init_path, 'w') as f:
f.write('\n'.join(new_contents))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth splitting these lines into a separate µTVM debugging tools PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's also going to change soon, so would prefer to fix then

python/tvm/contrib/debugger/debug_runtime.py Outdated Show resolved Hide resolved
Comment on lines 155 to 176
def _calc_max_workspace_usage(src):
# TODO factor in alignment to the calculation (alloc sizes will be aligned up to the word size)
alloc_re = re.compile(
r'.*\* ?(.+) = (\(.+\))? TVMBackendAllocWorkspace\(.+, .+, \(uint64_t\)(.+), .+, .+\).*')
free_re = re.compile(r'.*if \(TVMBackendFreeWorkspace\(.+, .+, (\(void\*\))? (.+)\) != 0\) {.*')
max_usage = 0
alloc_map = {}
for line in src.split('\n'):
if line.strip().startswith('//'):
continue
match = alloc_re.match(line)
if match is not None:
alloc_map[match.group(1)] = int(match.group(3))
max_usage = max(max_usage, sum(alloc_map.values()))
else:
match = free_re.match(line)
if match is not None:
print(alloc_map)
del alloc_map[match.group(2)]
return max_usage


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for sure a hacky way to calculate the memory footprint of workspace allocations.
in a followup PR, we should move this calculation further upstream and instead use a visitor to find workspace allocs in the AST.
in the meantime, let's just make sure it doesn't ever crash when src doesn't match the format expected by the regexes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah more robust memory analysis will come in a follow-on. can you explain what you mean by crash if it doesn't match the format given? I don't know it will crash if it finds 0 allocs, it will just not catch anything

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for posterity, i think it would crash if free_re matches, but alloc_re doesn't, so it would attempt to delete an entry in the alloc_map that isn't there. C that's coming from codegen should never have this problem, but I remember running into it when writing wrappers for CMSIS-NN by hand.

else:
options = list(options)
# Cannot increase optimization level on host due to code loading method.
options.append('-O0')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-Os (and maybe -O1) work. it's just -O2 that's been causing problems on the host

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just leave as is for now--the runtime will change a lot soon

Comment on lines +632 to +644
// TODO(weberlo): add a `clear_batch_timer` func
} else if (name == "get_last_batch_time") {
return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {
*rv = this->GetLastBatchTime();
});
// TODO(weberlo): remove this func
} else if (name == "get_last_batch_cycles") {
return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {
*rv = this->GetLastBatchCycles();
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should rename these functions to GetLastBatchHostTime and GetLastBatchDevTime. I think having both would be of use, for example, if a user wants to verify their device timer impl with host timings, or if a device doesn't have a timer (i think this case is rare tho).

we may also want to rethink the timing API, because resetting the batch time to 0 when GetLastBatchTime is called isn't very user-friendly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now it returns either last host or last device time based on use_device_timer. I agree we should rethink the timing API, but perhaps we can move it to next PR, when I would want to implement an API between the host and device, and we can better define the concept of batch time (and the units it is logged in) there?

@@ -210,9 +210,9 @@ class OpenOCDLowLevelDevice final : public LowLevelDevice {
// NOTE: OpenOCD will call any request larger than this constant an "absurd
// request".
/*! \brief maximum number of bytes allowed in a single memory transfer */
static const constexpr ssize_t kMemTransferLimit = 64000;
static const constexpr ssize_t kMemTransferLimit = 8000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm curious what openocd version you're running, because it seems like the standard for an "absurd request" is 64k (line 4274)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember anymore exactly where this limit hits, but iirc it's due to mac os x pipe buffering. I think it's because we are reading the pipe line by line on the TVM side, but if you issue a memory transfer that prints more than ~24k of characters, the os pipe buffer fills up before the newline char is sent and we deadlock. updated comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay. we might want some preprocessor magic to detect if the platform is linux or mac and set this constant accordingly, because leaving it at 8k means linux is issuing 8 times more requests than needs to.

Comment on lines +28 to +31
# # Use the host emulated micro device.
DEV_CONFIG_A = micro.device.host.generate_config()
DEV_CONFIG_B = micro.device.host.generate_config()
TARGET = 'c -device=micro_dev'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these can be re-collapsed into a single DEV_CONFIG. I separated them for prototyping, because you need separate server ports for physical devices

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still have test_interleave_sessions for now though?

topi/python/topi/arm_cpu/conv2d_spatial_pack.py Outdated Show resolved Hide resolved
topi/python/topi/arm_cpu/cortex_m7/micro_kernel/gemm.py Outdated Show resolved Hide resolved
Copy link
Member

@liangfu liangfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! @areusch Thank you very much for your contribution. I left some review comments, mostly regard to the coding styles.

Makefile Outdated Show resolved Hide resolved
'-DARM_MATH_CM7',
'-D__FPU_PRESENT=1U',
'-DARM_MATH_DSP',
'-Wno-unused-variable',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary? Why can't we remove the unused variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are in the generated GEMM code. for example:

/var/folders/9y/3j808g591ln3kys4qpyl3qmc0000gn/T/tmpb4sk1ylf/temp.c:104:11: error: unused variable \'arg1_code\' [-Werror=unused-variable]

it would be nice if we did not need to include this, but i'm hoping we can merge this PR as-is and incrementally improve things like this (especially since I didn't author most of the code generation stuff and will need a bit of time to understand how to improve it to remove errors like this). the next PR will look at the generated code more in detail, so if it's a quick fix, I can fix it then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm.. it seems to be a problem in codegen, please feel free to leave this line as-is.

"-nostdlib",
"-fdata-sections",
"-ffunction-sections",
f'{toolchain_prefix}gcc',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo the changes in converting double quotes to single quotes, and please undo other similar changes as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undid all the quote conversions, except those that made sense (i.e. to avoid escapes, as in f'unknown variable "{var}"'). can you point out any other changes you want me to revert? I didn't originally author most of this code so I don't have all the context in my head.

python/tvm/micro/device/base.py Outdated Show resolved Hide resolved
python/tvm/micro/device/host.py Outdated Show resolved Hide resolved
src/runtime/micro/micro_common.h Outdated Show resolved Hide resolved
src/runtime/micro/micro_common.h Outdated Show resolved Hide resolved
src/target/source/codegen_c_host.cc Outdated Show resolved Hide resolved
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# pylint: disable=invalid-name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put this at the specific line of code, instead of leaving this on the top.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this placement seems to be pretty conventional for topi, since most of the variable names are too short to be considered valid?

topi/python/topi/arm_cpu/cortex_m7/conv2d/direct.py Outdated Show resolved Hide resolved
Copy link
Contributor

@tmoreau89 tmoreau89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @areusch for the great work, that is quite a large PR! I made a few comments that are mostly costmetic. I suggest that we keep changes to autoTVM minimal, and that instead we make the non uTVM changes as part of an isolated RFC+PR combo if needed.

Copy link
Contributor

@u99127 u99127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @areusch for the great work, that is quite a large PR! I made a few comments that are mostly costmetic. I suggest that we keep changes to autoTVM minimal, and that instead we make the non uTVM changes as part of an isolated RFC+PR combo if needed.

Yes please.

I've only done a quick scan and I promise a more detailed review when this gets split up further.

Ramana

python/tvm/autotvm/measure/local_executor.py Outdated Show resolved Hide resolved
python/tvm/micro/device/arm/stm32f746xx.py Outdated Show resolved Hide resolved
@tmoreau89
Copy link
Contributor

@u99127 the pieces that changed the behavior of autoTVM are a small fraction of the overall PR; so post-changes, the PR will remain significantly large / unchanged.

Let us know if you'd like to be able to do a review before it gets merged!

@tqchen
Copy link
Member

tqchen commented Apr 29, 2020

Copy link
Member

@liangfu liangfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the missing comments, LGTM.

lib_headers: TODO
e.g., `['cmsis_gcc.h', 'arm_math.h']`

lib_include_paths: TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a meaningful comment here.

@@ -36,23 +55,40 @@ def create_micro_lib(obj_path, src_path, lib_type, options=None):

options : Optional[List[str]]
additional options to pass to GCC

lib_src_paths : Optional[List[str]]
TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put a meaningful comment here as well.

@@ -62,56 +78,31 @@ def default_config(base_addr, server_addr, server_port):
server_port : int
port of OpenOCD server to connect to

TODO correct type annotation?
section_constraints: Optional[Dict[str, Tuple[Number, MemConstraint]]]
TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave a meaningful comment

Copy link
Contributor

@tmoreau89 tmoreau89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @areusch for addressing the comments/reviews. the PR is approved

@tmoreau89 tmoreau89 merged commit 8d72496 into apache:master Apr 30, 2020
@tmoreau89
Copy link
Contributor

Thanks @areusch , @liangfu @weberlo @u99127 the PR has been merged

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 9, 2020
* Prototype for micro TVM.

* Cleanup and sync micro tvm prototype.

* Use /std:c++14 with MSVC.

 * Per tqchen: project has already moved to C++14
 * Presubmit failed for code that built locally on gcc.

* fix ASF lint, and fix add_asf_header too

* Compiles with USE_MICRO=OFF.

* Cleanup TargetPtr and word size representations.

* fix compile warning

* address logan's comments

* address logan and liangfu comments

* address thierry's comments

* address u99127, liangfu, tmoreau89 comments

Co-authored-by: Logan Weber <[email protected]>
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 18, 2020
* Prototype for micro TVM.

* Cleanup and sync micro tvm prototype.

* Use /std:c++14 with MSVC.

 * Per tqchen: project has already moved to C++14
 * Presubmit failed for code that built locally on gcc.

* fix ASF lint, and fix add_asf_header too

* Compiles with USE_MICRO=OFF.

* Cleanup TargetPtr and word size representations.

* fix compile warning

* address logan's comments

* address logan and liangfu comments

* address thierry's comments

* address u99127, liangfu, tmoreau89 comments

Co-authored-by: Logan Weber <[email protected]>
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 18, 2020
* Prototype for micro TVM.

* Cleanup and sync micro tvm prototype.

* Use /std:c++14 with MSVC.

 * Per tqchen: project has already moved to C++14
 * Presubmit failed for code that built locally on gcc.

* fix ASF lint, and fix add_asf_header too

* Compiles with USE_MICRO=OFF.

* Cleanup TargetPtr and word size representations.

* fix compile warning

* address logan's comments

* address logan and liangfu comments

* address thierry's comments

* address u99127, liangfu, tmoreau89 comments

Co-authored-by: Logan Weber <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants