How to apply the "3-step flow" using Questa #899

Open
jfrensch opened this issue Feb 16, 2023 · 66 comments

Comments

@jfrensch

Hi all,

My understanding is that the VUnit framework (always?) uses the "2-step flow" (-> vcom & vsim) of Questa, where the 'vopt' step is applied automatically during 'vsim'.
Unfortunately, I need an intermediate result of the vopt step to be able to use the 'Visualizer' to analyze the results of a simulation:

  "vopt -debug -designfile design.bin -o tb_opt tb" 

not only provides an optimized design 'tb_opt' of the original testbench 'tb' for faster simulation, but also a database (-> design.bin) required by the 'Visualizer' to correlate the simulation results from 'tb_opt' to the design being simulated.

How can/should I apply that 3rd step (vopt) within the VUnit framework?

Many thanks for advice
Jochen

@tasgomes

tasgomes commented Aug 2, 2023

This 3-step flow would also solve this issue #877

@LarsAsplund
Collaborator

The good news today is that Siemens has decided to support VUnit with Questa licenses so that issues like this can be solved.

LarsAsplund added a commit that referenced this issue Jun 27, 2024
Known limitation is that parallel threads aren't supported.
@LarsAsplund
Collaborator

I started to prototype this, and there is a first iteration to try out where I simply run vopt before calling vsim. I'm only using a few options in the vopt call; the most important is probably -floatgenerics, which is applied recursively on the simulation top-level. The runner_cfg generic needs to be floating since it has no default value. Is there a need to control that more selectively to enable optimization for any custom generics added to the testbench?

In this iteration, optimization is not a proper step before the simulation step. This means that vopt is called before starting the simulation of each test in a testbench. This is a problem as noted in #877.

# ** Note: (vsim-3812) Design is being optimized...
# ** Warning: (vopt-6) -- Waiting for lock by "larsa@LAPTOP-B6KUINVE, pid = 25780
# ". Lockfile is "C:/github/vunit/examples/vhdl/three_step_flow/vunit_out/modelsim/libraries/lib/_lock".
# ** Error: (vopt-2261) 'lib.opt_libtbexampletb' is already an optimized design.
# Optimization failed

This error is not suppressible, so the following doesn't help (I've added support for setting vopt options, both in batch mode and in GUI mode):

vu.set_sim_option("modelsim.vopt_flags", ["-suppress", "2261"])

If I run the tests after one another this doesn't seem to be a concern. vopt is simply skipped if the design is already optimized:

# Incremental compilation check found no design-units have changed.

Making optimization a proper step executed before starting the simulation step is probably the solution to this.

Until then, please give it a try for your other use cases.

@LarsAsplund
Collaborator

Btw, there is a small example that you can start playing with in https://github.com/VUnit/vunit/tree/three-step-flow/examples/vhdl/three_step_flow

@xkvkraoSICKAG

xkvkraoSICKAG commented Aug 16, 2024

I tried the three-step-flow branch and it causes errors when running with multiple threads. The vopt call needs to be thread-safe; apparently you cannot run vopt in parallel with the same output destination.

It can be solved either by:

  1. Adding a thread unique suffix to each vopt output destination
  2. Calling vopt directly from Python, protected by a mutex/lock. There would be a dictionary of locks with the top level name as the key so that vopt of a given top level test bench is only done once and not in parallel with simulation.

The benefit of 2 is that it only runs vopt once for a top level and not for every simulation which could save time.
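
A minimal Python sketch of option 2, under stated assumptions: `OptimizeOnce` and the `optimize` callable are hypothetical names, not part of VUnit, and the callable stands in for whatever actually invokes vopt. It only illustrates the dictionary-of-locks idea, where each top level is optimized once and other threads for the same top level block until that is done.

```python
import threading
from collections import defaultdict


class OptimizeOnce:
    """Run an optimization callable at most once per top-level name."""

    def __init__(self, optimize):
        self._optimize = optimize  # hypothetical callable wrapping the vopt call
        self._guard = threading.Lock()  # protects the lock dictionary and the done set
        self._locks = defaultdict(threading.Lock)  # top-level name -> lock
        self._done = set()

    def ensure_optimized(self, top_level):
        with self._guard:
            lock = self._locks[top_level]
        with lock:  # other threads asking for the same top level block here
            with self._guard:
                done = top_level in self._done
            if not done:
                self._optimize(top_level)
                with self._guard:
                    self._done.add(top_level)
```

A simulation thread would call `ensure_optimized(top_level)` before starting vsim. If the callable raises, the top level is not marked as done, so the next caller tries again.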

@xkvkraoSICKAG

xkvkraoSICKAG commented Aug 16, 2024

I have made a prototype of solution 1 described above, and it works without problems with multiple threads. This is really the simplest way to get it working, as separating the vopt step and protecting it with a Python lock would require a lot of restructuring of modelsim.py vs vsim_simulator_mixin.py.

PS: Apparently it can still fail when using multi-threading. It seems that even with unique vopt artifacts per thread, the common files in the library are also mutated by Questa.

@xkvkraoSICKAG

Note also that I found problems with the floatgenerics argument. It caused vopt to hang indefinitely on some of my test benches. After removing the trailing dot, changing -floatgenerics+top. to -floatgenerics+top, it no longer hung.

My understanding from the manual is that the trailing dot causes the generics of all lower instances to also be floating. For the purpose of VUnit it should be enough that the top level test bench generics are floating as changing the generic of a deeper instance is not required or supported. I would assume a floating top level generic coupled to a lower instance generic would also cause it to be floating anyway without the trailing dot.

@xkvkraoSICKAG

xkvkraoSICKAG commented Aug 16, 2024

It seems running vopt will mutate the common files in the library folder even if multiple threads use different vopt output targets. I have verified this by diffing the md5sums of all files in the library folder before and after running vopt. However, just running vsim on an already created vopt folder does not seem to change any md5sum at all.
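
The check described above could be reproduced roughly like this. It is a sketch, not the script that was actually used; the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_snapshot(folder):
    """Map each file path (relative to folder) to the MD5 hex digest of its content."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): hashlib.md5(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """Return the sorted names of files that were added, removed, or modified."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

Taking one snapshot of the library folder before the vopt call and one after, `changed_files(before, after)` lists what the call touched.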

This makes me think a solution needs to ensure that all vopt calls for a single library happen before any simulation starts.
It seems that even between two test benches within the same library, the vopt calls cannot run in parallel.

PS: Another alternative would be to just duplicate the library folders with one copy per thread. That would avoid any potential concurrency problem within the simulator itself.

@LarsAsplund
Collaborator

@xkvkraoSICKAG Thanks for trying this out.

Yes, adding a thread suffix to the name of the optimized design would be a nice solution. It works in my simple example but I also see that the library files are modified.

As I see it, duplication is the only option. I will give it a try. I will also consult Siemens to get these observations verified.

The reason for using floating generics on all levels is that there are use cases where the runner_cfg is passed to a lower-level entity. However, that is a less common use case that we can ignore in the first iterations.

@LarsAsplund
Collaborator

@xkvkraoSICKAG Different directories for each thread is something that was already implemented in another feature branch so there is code to reuse from that branch (which was dropped when we realized that there were other ways to solve that feature).

Library duplication will have to be added and I've a discussion ongoing with Siemens to figure out to what extent that is needed. That will be an overhead that we obviously want to minimize so it will probably only be activated with the 3-step flow. The 2-step flow will work as before.

@xkvkraoSICKAG

xkvkraoSICKAG commented Aug 20, 2024

I have another observation to report. I tried changing the library folder format from vlib -type flat, which is the default, to vlib -type directory. Annoyingly, this almost became thread-safe, but unfortunately Questa still makes a tiny modification to the _info file in the library folder when running vopt. It does seem to greatly reduce the probability of a simulation error due to parallel file modifications in the library, though.

Regarding library duplication. To reduce the overhead it could maybe use a smart approach based on https://docs.python.org/3/library/filecmp.html to only copy over what has changed.
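
As a rough illustration of that idea, here is a sketch that copies only new or changed files. `sync_library` is a hypothetical helper, not VUnit code. It uses `filecmp.cmp` with `shallow=False` so that file contents are compared rather than only the os.stat() signature.

```python
import filecmp
import shutil
from pathlib import Path


def sync_library(src, dst):
    """Copy only new or changed files from src to dst; return the relative paths copied."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.rglob("*")):
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif not target.exists() or not filecmp.cmp(path, target, shallow=False):
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy content and metadata
            copied.append(path.relative_to(src).as_posix())
    return copied
```

Note that this sketch never deletes anything: files removed from the source library stay in the copy, which a real implementation would probably have to handle.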

Another approach would be to run vopt as part of the compile step of VUnit which is already running in single threaded mode.
This would involve refactoring to also extract more information from the dependency scanner to be able to incrementally know which top level test benches need to be re-optimized.

@LarsAsplund
Collaborator

I also tried with -type directory and found that the libs are modified if I'm running tests on the same testbench that differ in the top-level generics. It looks like there are modifications to take into account beyond the small _info change. I will start with the assumption that everything needs to be copied. That can always be improved if Siemens can provide some valuable insights into the details of the vopt behavior.

xkvkraoSICKAG pushed a commit to xkvkraoSICKAG/vunit that referenced this issue Aug 21, 2024
Known limitation is that parallel threads aren't supported.
@xkvkraoSICKAG

I pushed the local changes I used for testing to a fork:
https://github.com/xkvkraoSICKAG/vunit/commits/three-step-flow2/

I think this commit may be of interest, it ensures the library mapping arguments are deterministic between calls to vcom/vlog and vopt. Before this change they were subject to the random iteration order of dictionary keys:
xkvkraoSICKAG@be87271

@LarsAsplund
Collaborator

@xkvkraoSICKAG Can you make a pull request to the VUnit repo?

@xkvkraoSICKAG

@LarsAsplund Yes if you can rebase the three-step-flow branch on VUnit master I can make a PR to three-step-flow.

@LarsAsplund
Collaborator

I started to build on a solution where the first test running a testbench performs the optimization, which is then used by the other tests. The optimization is still part of the simulation step, so the second test has to wait for both the optimization and simulation of the first test to complete before it can proceed. That will be fixed later, but I did see something that needs more investigation.

Below is a debug log from a testbench with a single test that has 5 different configurations. I'm using two threads for my test run.

The first test run gets to optimize the testbench:

2024-08-31 19:20:04,426 - DEBUG - (lib.tb_example.0.test) Optimizing lib.tb_example(tb)

Since the second test starts simultaneously in another thread, it blocks while waiting for the first test:

2024-08-31 19:20:04,426 - DEBUG - (lib.tb_example.1.test) Waiting for lib.tb_example(tb) to be optimized.
2024-08-31 19:20:04,429 - DEBUG - Starting lib.tb_example.0.test simulation
2024-08-31 19:20:07,808 - DEBUG - lib.tb_example.0.test simulation completed
2024-08-31 19:20:07,808 - DEBUG - lib.tb_example(tb) optimization completed

Now the second test case can proceed:

2024-08-31 19:20:07,811 - DEBUG - Starting lib.tb_example.1.test simulation

With the first test completed, there is one simulation thread available and the third test
can start. At this point there is no need to wait for the optimized testbench:

2024-08-31 19:20:07,826 - DEBUG - (lib.tb_example.2.test) Reusing optimized lib.tb_example(tb)
2024-08-31 19:20:07,829 - DEBUG - Starting lib.tb_example.2.test simulation

For every test completed, a new one can start:

2024-08-31 19:20:10,670 - DEBUG - lib.tb_example.1.test simulation completed
2024-08-31 19:20:10,687 - DEBUG - (lib.tb_example.3.test) Reusing optimized lib.tb_example(tb)
2024-08-31 19:20:10,691 - DEBUG - Starting lib.tb_example.3.test simulation
2024-08-31 19:20:13,107 - DEBUG - lib.tb_example.3.test simulation completed
2024-08-31 19:20:13,127 - DEBUG - (lib.tb_example.4.test) Reusing optimized lib.tb_example(tb)
2024-08-31 19:20:13,130 - DEBUG - Starting lib.tb_example.4.test simulation
2024-08-31 19:20:15,640 - DEBUG - lib.tb_example.4.test simulation completed

But what happened to the third test case? It takes forever to complete and is overtaken by the tests starting after it.

2024-08-31 19:20:27,080 - DEBUG - lib.tb_example.2.test simulation completed

Regardless how many configurations I create of the test, there is always one which completes much
later than the others. That is something I have to investigate further.

@LarsAsplund
Collaborator

It should be said that a single-thread test run works as expected. The first test takes a bit longer to run since it's doing the optimization:

image

Some test runs with two threads work well. In this case, the second test also takes some extra time since it's waiting for the first to complete. After that everything runs smoothly:

image

This is what a bad run with two threads looks like:

image

@LarsAsplund
Collaborator

What I see is that it is the simulation process that takes time and the problem is intermittent. This is the execution time for 500 configurations of the same test:

image

Considering that I once got this message, I'm suspecting this has to do with the license server. In my case it sits on my computer so there is no network delay.

image

After trying for 30 seconds it simply fails. I will clean up my code so that you can test on your computers.

@xkvkraoSICKAG

> I started to build on a solution where the first test running a testbench performs the optimization which is then used by the other tests. The optimization is still part of the simulation step so the second test has to wait for both the optimization and simulation of the first test to complete before it can proceed.

Based on my investigations running vopt on any design in a top level will mutate the library_folder/_info file. So running vopt in a library has to lock the entire library. Thus care has to be taken if there are several test benches in the same library, they cannot have vopt run in parallel.

@LarsAsplund
Collaborator

Agree, there will be multiple conditions for when vsim and vopt can be run. vsim waits for the testbench to be optimized if it hasn't already and vopt waits for the lib to be available. I hope that a second vopt on a library doesn't invalidate previous vopts on that library just because of the altered _info file.
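
These two conditions could be sketched as follows. This is only an illustration and not how the three-step-flow branch implements it: `OptimizationScheduler`, `prepare` and the `optimize` callable are hypothetical names, and the callable stands in for the real vopt invocation.

```python
import threading
from collections import defaultdict


class OptimizationScheduler:
    """Serialize optimization per library; let simulations wait per testbench."""

    def __init__(self, optimize):
        self._optimize = optimize  # hypothetical callable(library, testbench)
        self._guard = threading.Lock()
        self._library_locks = defaultdict(threading.Lock)  # library -> lock
        self._events = {}  # (library, testbench) -> threading.Event

    def prepare(self, library, testbench):
        """Return "optimized" if this call ran the optimization, else "reused"."""
        key = (library, testbench)
        with self._guard:
            event = self._events.get(key)
            first = event is None
            if first:
                event = self._events[key] = threading.Event()
            library_lock = self._library_locks[library]
        if first:
            try:
                with library_lock:  # only one optimization per library at a time
                    self._optimize(library, testbench)
            finally:
                event.set()  # release waiting simulations even on failure
            return "optimized"
        event.wait()  # simulation waits for the testbench to be optimized
        return "reused"
```

Error handling is deliberately left out: if the optimization fails, waiting threads are released anyway, and a real implementation would have to propagate that failure to them.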

@xkvkraoSICKAG

> Agree, there will be multiple conditions for when vsim and vopt can be run. vsim waits for the testbench to be optimized if it hasn't already and vopt waits for the lib to be available. I hope that a second vopt on a library doesn't invalidate previous vopts on that library just because of the altered _info file.

Unfortunately I think the altered _info file does cause problems. I am running our company internal simulations on my three-step-flow branch. On this branch only the _info file is mutated during vopt and still it causes test cases to fail with a low probability. To mitigate this every test bench in the same library must be sequentially vopt:ed which kind of defeats the common vunit style of having multiple test benches in the same library.

@LarsAsplund
Collaborator

I was thinking about the case where you vopt every testbench sequentially. If you vopt A and then B, will you then have to vopt A again before running just because _info changed? Even if the designs didn't change?

The vopt lock would be a problem if you have testbenches without test cases. They would run in series. If you have test cases, only the first test's vopt will run on its own. All the vsims will run in parallel. If scheduling is optimised, the vopt for the next testbench could run in parallel with the vsims of the previous.

@xkvkraoSICKAG

> I was thinking about the case where you vopt every testbench sequentially.

Yes the problem is the _info mutation forces you to run vopt sequentially for all testbenches within a library before starting any simulation. This becomes a problem if you have a lot of test benches within the same library.

@LarsAsplund
Collaborator

@xkvkraoSICKAG Ok, so what I have now is a prototype that manages locks for the libraries as well. There are corner cases which I have yet to handle, but it should be useful for testing in some different projects. I've tested it myself with dummy testbenches and two threads, and also with a client that has a single license. What I found was that using two threads improved performance even if there was only one license. Not sure why, but maybe vopt is allowed to run concurrently with vsim on a single license.

Running vopt on one testbench in one thread while running vsim on a testbench already optimized in another thread hasn't caused any problems for me. This is when I run with two licenses. Running two vopts at the same time on different libraries also works.

If you have a library with 10 testbenches and simulation time is much longer than optimization time you will eventually have 10 simulations running concurrently, provided you have that many licenses. The problem is if simulation time is relatively small compared to the optimization time. But is optimization needed in those cases?
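
A back-of-the-envelope model of that trade-off, under simplifying assumptions that are mine and not measured: one test per testbench, enough threads and licenses, vopt calls strictly one after another within the library, and each simulation starting as soon as its own vopt has finished. Testbench number i then finishes at i * t_opt + t_sim, so the whole run takes N * t_opt + t_sim, compared with N * (t_opt + t_sim) when everything runs in series.

```python
def pipelined_makespan(n_testbenches, t_opt, t_sim):
    """Total time when optimizations are serialized but simulations overlap."""
    return n_testbenches * t_opt + t_sim


def serial_makespan(n_testbenches, t_opt, t_sim):
    """Total time when every optimization and simulation runs one after another."""
    return n_testbenches * (t_opt + t_sim)


def speedup(n_testbenches, t_opt, t_sim):
    """Ratio of the serial time to the pipelined time."""
    return serial_makespan(n_testbenches, t_opt, t_sim) / pipelined_makespan(
        n_testbenches, t_opt, t_sim
    )
```

With 10 testbenches, `speedup(10, 1, 100)` is about 9.2 when simulation dominates, while `speedup(10, 100, 1)` is about 1.01 when optimization dominates, which is the problematic case described above.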

I will push what I have tomorrow so you can try it.

@tasgomes

@LarsAsplund

A good run for your reference:
good_5.log

A bad run with -novopt:
error_5a.log

A bad run without -novopt:
error_5b.log

@LarsAsplund
Collaborator

@tasgomes Ok, now I see it. There is a bug in my code which causes a race condition. Please try the latest push.

@LarsAsplund
Collaborator

I also started to remove the things I added lately before finding what I think was the real bug. I suggest testing the last three commits one at a time to see if removing everything was too optimistic.

@LarsAsplund
Collaborator

@tasgomes @xkvkraoSICKAG Have any of you had the chance to test the latest commit with your projects?

@tasgomes

@LarsAsplund I am out of office this week. I can retry this again next Monday.

@tasgomes

@LarsAsplund I am back. Below you find the last three commits one at a time. I ran each several times without problems, except for the last one. This one has an issue that occurs only sometimes.

00_3a976cd.log
01_28743da.log
02_7032b5e.log

@LarsAsplund
Collaborator

@tasgomes Thanks. I suspect there can be a slight delay before files owned by one vopt call (lock files or other files) are properly released on the file system. A second vopt call, made after the first one returns, may run into that.

However, in this case vopt fails on the first call on lib2. The only previous call was on lib1. There should not be any prior activity on lib2 files unless:

  1. You had a previous run that crashed somehow such that some files were left open and/or lock files were not deleted
  2. Our assumption that vopt on one library doesn't affect other libraries was wrong. In this case the examples are standalone, but what would happen if a testbench in lib1 uses a component from lib2?

I'll reach out to Siemens for some more support. As long as we are guessing how it works, we cannot be sure we have a stable solution.

@tasgomes

@LarsAsplund I restarted my laptop to make sure everything is clean. Then I executed the test twice. The first time was successful, but the second time fails:

03_7032b5e.log

Could it also be that the previous run did not close or delete the lock files properly?

@SzymonHitachi

SzymonHitachi commented Sep 25, 2024

Is there any PR for those changes to look at? Especially given that QuestaSim introduced the qrun command, which wraps all the other commands in one step instead of the 3-step build: https://www.linkedin.com/pulse/improve-your-compilation-flow-questasim-mikael-andersson/

And a second question: what about Questa Visualizer offline debug support? Did anyone try running that with VUnit?

@LarsAsplund
Collaborator

@SzymonHitachi You can find the work in the https://github.com/VUnit/vunit/tree/three-step-flow branch. There is a simple testbench https://github.com/VUnit/vunit/tree/three-step-flow/examples/vhdl/three_step_flow which we use to get a first proof of concept. It works for me but not for @tasgomes and I'm assuming we have some race conditions.

I'm aware of qrun but initially I want to follow a flow that is more generic and applies to all simulators we support. That makes our code cleaner. It should be possible to do what is done in qrun though. Apart from making Questa run in a multi-threaded setup, we are also planning to add compile groups which allow for compiling files in groups. I know that Mikael prefers that approach to compilation but there are more use cases we need to support so we need to keep our design flexible. I got an email from Mikael today (no coincidence I assume 😄). I'm hoping he will join the discussion here as he knows Questa under the hood.

@LarsAsplund
Collaborator

What I think we need from Siemens at this point is:

  1. Under what circumstances can two vopt runs be considered independent such that they can run concurrently.
  2. At what point can VUnit assume that vopt has "released" all files.

@SzymonHitachi

SzymonHitachi commented Sep 25, 2024

> @SzymonHitachi You can find the work in the https://github.com/VUnit/vunit/tree/three-step-flow branch. There is a simple testbench https://github.com/VUnit/vunit/tree/three-step-flow/examples/vhdl/three_step_flow which we use to get a first proof of concept. It works for me but not for @tasgomes and I'm assuming we have some race conditions.
>
> I'm aware of qrun but initially I want to follow a flow that is more generic and applies to all simulators we support. That makes our code cleaner. It should be possible to do what is done in qrun though. Apart from making Questa run in a multi-threaded setup, we are also planning to add compile groups which allow for compiling files in groups. I know that Mikael prefers that approach to compilation but there are more use cases we need to support so we need to keep our design flexible. I got an email from Mikael today (no coincidence I assume 😄). I'm hoping he will join the discussion here as he knows Questa under the hood.

Thanks for the links. It seems it still needs modelsim defined as the simulator, so I guess it requires either ModelSim or QuestaSim to be installed, or some ENV var defined to select one or the other?

@tasgomes

@SzymonHitachi Currently, VUnit does not distinguish Modelsim from Questa. To use Questa you need to define modelsim as simulator and then provide the path to the Questa installation folder. For instance:

from os import environ

# Set these before the VUnit object is created in the run script.
environ["VUNIT_SIMULATOR"] = "modelsim"
environ["VUNIT_MODELSIM_PATH"] = "C:/intelFPGA_pro/21.2/questa_fe/win64"

@outdoorsweden

outdoorsweden commented Oct 4, 2024

Mikael Andersson, Siemens EDA here!
I have not read the whole thread. I've just browsed through it.
I'd recommend to use the three step flow (compile, optimize, simulate).

Here you need to make some choices when it comes to optimizations. The alternatives are:

NOTE! I've tried the first alternative and that is not working. So it is the second alternative that is the best option.

  • Optimize each test. Pro: Best performance on each simulation. Drawback: You might get issues with locks when multiple vopt are started at the same time. But this can be handled by using the undocumented switch "-nolock" to vopt and by making sure that the output from vopt has a unique name. So for each test it would run:
>vopt tb -nolock -g <test generics> -o tb_<testname>_opt -debug -designfile design_<test>.bin
>vsim tb_<testname>_opt -qwave=+signal+msg+assertion=pass ....

  • Create one common optimized version. Pro: Just one optimization. Drawback: Could affect performance significantly if you apply floatgenerics too aggressively. Also, you will need to use a hybrid elaboration flow for Visualizer (not hard).
>vcom .... 
>vopt tb -floatgenerics runner_cfg -g <default test generics> -o tb_dbg_opt -debug -designfile design.bin

And then foreach test:

>vsim tb_dbg_opt -g <test generics> -qwave=+signal+msg+assertion=pass ....

You can also generate different optimizations with vopt. Like in the second case, you could generate one for performance in regression only and one for debug:

>vcom .... 
>vopt tb -floatgenerics runner_cfg -g <default test generics> -o tb_dbg_opt -debug -designfile design.bin
>vopt tb -floatgenerics runner_cfg -g <default test generics> -o tb_opt 

And if you want best performance, you use the tb_opt version without logging:
foreach test:

>vsim tb_opt -g <test generics> ....

Hope this is helpful!
BR
Mikael
Siemens EDA
Siemens Digital Industries Software

Evenemangsgatan 21
SE-169 03 Solna, Sweden
Mobile: +46 70 932 9516
www.sw.siemens.com

@outdoorsweden

Mikael Andersson, Siemens EDA here again!
I have now used our tools to show the effect of using qrun + the flow "Create one common optimized version" described above.

I have used the axi_dma example. The only change is that the "Random AXI configuration" test has been extended to run a bit longer. I chose to measure the effect on a "clean" start, the way you would in a Continuous Integration environment.

This picture is a screenshot from Questa Run Manager where I defined a flow that is suitable for regression of VUnit-based testbenches:
image
Notice that I compile and optimize for each testbench. Since the compile step is using qrun, it is so fast that it really does not matter that we do three parallel compiles. The fact that we use a library location for each testbench also allows optimization in parallel without any lock files.
The actual compile and optimize command I used was this:

qrun -64 -f .../qrun.files -work axi_dma_lib -optimize -top tb_axi_dma  -snapshot tb_axi_dma_opt 
-vopt.options  -floatgenerics+runner_cfg  +cover=bcesf+axi_dma. -end -outdir .../VRMDATA/my_run/testbench~tb_axi_dma/compile_and_optimize/qrun.out 

or from the Makefile:
image

And in simulations I used:

qrun -64 -work axi_dma_lib -simulate -snapshot tb_axi_dma_opt -vsim.options -f .../tests/Perform_split_transfers.vunit.args 
-coverage -end -onfinish stop -do "coverage save Perform_split_transfers.ucdb -onexit;run -all;exit -f" -t ps -error vsim-3040

or from the Makefile:
image

So how does the current VUnit flow compare with the flow above in performance? Blue is the first run, orange is the second run.
image

The main reason that multiple CPUs do not make a bigger difference for VUnit is that each simulation does optimization and creates a lock file, and the next simulation needs to wait until it is removed.

A comparison of only the compile time, qrun vs VUnit:
image

Hope this is helpful!
BR
Mikael

@LarsAsplund
Collaborator

Hi @outdoorsweden,

> The main reason that multiple cpu's does not make a bigger difference for VUnit is that each simulation does optimization and creates a lock file which next simulation needs to wait until it is removed.

VUnit currently runs vopt as part of the internal "simulation" step as it is a simpler first modification of the current VUnit structure. It is still reusing previous vopt runs though. If we have a testbench running 5 times (with different generics), there will only be one vopt run. That is visible in the debug logs of @tasgomes tests.

Currently, I wait for the lock file to be removed before releasing the thread so that a new simulation can begin. That can be improved by checking for lock files before beginning a simulation instead. If the next simulation is towards another library, there will be no wait time at all. Is that what you mean by:

> The fact that we use a library location for each testbench also allows optimization in parallel without any lock files.

Or are you actively using the -nolock feature?

Regarding the difference between the 1 CPU run and the 5 CPU run: if I interpret your measurements correctly, there is no difference between qrun and VUnit for the simulation runs. In both cases, the orange bars become 24 time units faster in the 5 CPU case. In the best of worlds, the 5 CPU run would be 5x faster, but in short tests like these, the simulator startup time becomes dominant.

I think you've confirmed that simultaneous vopts on the same library aren't possible, but how deep does an optimization go? If testbench A and B are optimized towards different libraries but they both use module C, will they both try to optimize C? Or is vopt limited to the top-level, so that whatever C design already exists (optimized or not) is the one being used?

@outdoorsweden

> VUnit currently runs vopt as part of the internal "simulation" step as it is a simpler first modification of the current VUnit structure. It is still reusing previous vopt runs though. If we have a testbench running 5 times (with different generics), there will only be one vopt run. That is visible in the debug logs of @tasgomes tests.

This does indeed look like a race condition. I have never seen anything like it. How do you check that vopt has finished before you start vsim? Because what you normally would do is:

  1. compile
  2. optimize
  3. Launch simulations in parallel when optimize has finished

If you can recreate the problem, file an SR on https://support.sw.siemens.com/en-US/ and provide the test case.

> Or are you actively using the -nolock feature?

No, it is undocumented and does not work the way I expected.

> The fact that we use a library location for each testbench also allows optimization in parallel without any lock files.

So this is the directory structure that I get with Questa Run Manager (I have filtered away some stuff):
image

So each testbench has a qrun.out directory that contains all the libraries. And this is why I can run all the optimizations in parallel.

> I think you've confirmed that simultaneous vopts on the same library isn't possible but how deep does an optimization go? If testbench A and B are optimized towards different libraries but they both use module C, will they both try to optimize C? Or is vopt limited to the top-level and whatever C design already existing (optimized or not optimized) is the one being used?

No, all the generated machine code ends up in the optimized version. So if you want to optimize to different libraries, this might work (I have not tested whether it generates a lock file in the design lib or not):

image

I will update my Questa vrun application so that it tests this concept.

BR
Mikael

@LarsAsplund
Collaborator

@outdoorsweden Just to be clear. There were race conditions that we fixed and none of us experience any problems at this point. However, since we weren't sure about the inner workings it was hard to be fully confident that it would work for everyone.

Initially I used OS synchronization mechanisms to make sure that a vopt operation performed by one thread returns before another thread starts doing a new vopt in the same lib. A mistake in that code was the cause of one race condition.

That approach was not enough since the lock file may remain on the file system after the vopt call has returned, probably due to delays in the file system. A second vopt to the same lib may see that lock file before it is deleted and then it fails. That was the cause of the other race condition we've seen. It was fixed by also checking for the presence of the lock file and waiting for it to disappear before letting the next thread run.

I don't think checking the lock file alone is enough. I've observed that vopt can create and delete a lock file several times during its execution. For that reason, VUnit will use an OS lock to prevent other threads from starting a vopt on a lib already used by another vopt. That OS lock is removed when the first vopt call returns and there are no remaining lock files.
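
Roughly, that combination could look like the sketch below. It is not the code in the branch: the function names, the 30 second default timeout and the polling interval are assumptions, a `threading.Lock` stands in for the OS lock, and `run_vopt` is a hypothetical callable wrapping the real vopt invocation. The `_lock` file name is taken from the log output earlier in this thread.

```python
import time
from pathlib import Path


def wait_for_lock_file_removed(library_dir, timeout=30.0, poll_interval=0.05):
    """Return True once library_dir/_lock is gone, False if the timeout expires first."""
    lock_file = Path(library_dir) / "_lock"
    deadline = time.monotonic() + timeout
    while lock_file.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


def run_vopt_exclusively(library_lock, library_dir, run_vopt):
    """Hold library_lock until run_vopt() has returned and the lock file is gone."""
    with library_lock:
        result = run_vopt()  # hypothetical callable wrapping the vopt call
        if not wait_for_lock_file_removed(library_dir):
            raise TimeoutError(f"lock file still present in {library_dir}")
        return result
```

Because the lock is held for the whole call, it does not matter that vopt may create and delete its lock file several times while running; only the state after vopt has returned is checked.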

This means that we do follow your suggestion of 1. compile, 2. optimize, and 3. simulate for each testbench. However, steps 2 and 3 are performed in several concurrent threads. No constraints are placed on what can be simulated in parallel. The only constraint now is that no concurrent vopt is allowed on the same lib.

We discussed compiling all libraries into multiple directories or simply making multiple copies of the original set of compiled libs. However, that doesn't feel like a nice solution considering we can have hundreds of testbenches that each need their own library copy. I would much rather give up the idea of concurrent vopts and have a single library set.

Optimizing each design to a separate lib sounds more interesting as it doesn't involve multiple copies. I will try that and see what lock files are being generated.

Final question about vopt. I feel I don't quite understand the basics. What is being optimized? From what I understand we only call vopt on the testbench (test_counter in your example) but never on the design being tested (the counter).

@LarsAsplund
Collaborator

@outdoorsweden I made a quick test where the design is optimized into another directory. It looks like the lock files only appear in the new directory. I think it would work as a workaround if the current solution, which seems to work right now, proves to have some yet-to-be-seen issues.
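A rough sketch of what "one optimized library per testbench" could look like as a command builder. It assumes that vopt's `-work` option selects the library that receives the optimized design and that this library has already been created. The function name, the directory naming scheme, and the `_opt` suffix are hypothetical.

```python
from pathlib import Path


def vopt_command(design_lib, top, output_root, optimized_name=None):
    """Build a vopt command that writes the optimized design to its own library.

    Assumption: "-work <path>" makes vopt place its output (and its lock
    files) in that library instead of in the shared design library.
    """
    optimized_name = optimized_name or f"{top}_opt"
    # One target library per testbench, so concurrent vopt calls never share one.
    target_lib = Path(output_root) / f"{design_lib}_{top}_opt"
    return [
        "vopt",
        f"{design_lib}.{top}",
        "-work", str(target_lib),
        "-o", optimized_name,
    ]
```

With such a layout, two threads optimizing `lib.tb_a` and `lib.tb_b` would write to different directories, so the per-library serialization would no longer block them.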

One issue that remains though is the issue with the slow test as discussed earlier in this thread:

image

Sometimes a random test case takes much longer than it should, but sometimes all tests run as expected. This problem is present with multiple threads even if I completely disable optimization. I looked briefly at it before and concluded that it is vsim that adds the extra time. Do you have any idea why that is? Currently all threads running vsim are doing so from the same working directory. I recall we have had discussions about that in the past. Is that a problem? Should all threads run in separate directories?

@outdoorsweden

Final question about vopt. I feel I don't quite understand the basics. What is being optimized? From what I understand we only call vopt on the testbench (test_counter in your example) but never on the design being tested (the counter).

vopt will optimize the testbench and everything in it, including the design.

@outdoorsweden

@LarsAsplund

Sometimes a random test case takes much longer than it should, but sometimes all tests run as expected. This problem is present with multiple threads even if I completely disable optimization. I looked briefly at it before and concluded that it is vsim that adds the extra time. Do you have any idea why that is? Currently all threads running vsim are doing so from the same working directory. I recall we have had discussions about that in the past. Is that a problem? Should all threads run in separate directories?
Do you have the example so I can play with it?

@LarsAsplund
Collaborator

@outdoorsweden This happens on the dummy testbench used in this thread https://github.com/VUnit/vunit/blob/three-step-flow/examples/vhdl/three_step_flow/tb_example.vhd which we run with 5 different settings for the value generic:

for value in range(5):
    test.add_config(name=f"{value}", generics=dict(value=value))

I don't think the test itself is significant so you could also test with the DMA example you had. Run all simulations in parallel in the same directory but remove optimization so that it doesn't play a role.

@tasgomes hasn't experienced this so you may not experience anything either. That would suggest that there is another timing-dependent conflict over shared resources that has to be taken into account. modelsim.ini?

@LarsAsplund
Collaborator

@outdoorsweden I forgot to "check the plug". Initially I got two one-month eval licenses from Innofour, which were later "converted" to one-year licenses... I thought. It looks like my license file only contains a single license, so the delay I see is simply one vsim call getting stuck while waiting for a license. I will get in touch with Innofour and see if I can get this fixed.

@LarsAsplund
Collaborator

I now got another license and the problems I saw disappeared as expected. I think our concept is good to go. I will review the work and take it from prototype to release quality. If nothing new shows up I will release it.

@outdoorsweden

Sounds good! Will you include support for Visualizer as well?

@LarsAsplund
Collaborator

@outdoorsweden Eventually we should have Visualizer fully supported but I will release this feature first.

LarsAsplund added a commit that referenced this issue Nov 16, 2024
Known limitation is that parallel threads aren't supported.
@LarsAsplund
Collaborator

@outdoorsweden I tried to invoke Visualizer post-simulation. This works if I add the following vopt and vsim flags to the run script such that the input files needed by Visualizer are generated:

vu.set_sim_option("modelsim.vsim_flags", ["-qwavedb=+signal"])
vu.set_sim_option("modelsim.vopt_flags", ["-debug", "-designfile", "design.bin"])
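If the two options should be switched on and off together, they could be grouped in a small helper. This is only a sketch: the option names and flag values are the ones quoted above (with `modelsim.vopt_flags` as it exists on the three-step-flow branch), while the helper names and the `design_file` parameter are hypothetical.

```python
def visualizer_options(design_file="design.bin"):
    """Return the sim options that make vopt/vsim emit Visualizer input files."""
    return {
        # vopt writes the design database that Visualizer reads.
        "modelsim.vopt_flags": ["-debug", "-designfile", design_file],
        # vsim logs signal data for post-simulation viewing.
        "modelsim.vsim_flags": ["-qwavedb=+signal"],
    }


def enable_visualizer(vu, design_file="design.bin"):
    """Apply the options to `vu`, any object with set_sim_option(name, value)."""
    for name, flags in visualizer_options(design_file).items():
        vu.set_sim_option(name, flags)
```

In a run script this would be called as `enable_visualizer(vu)` after the libraries and source files have been added.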

When trying the live-simulation mode, I run into some problems. Our normal approach to GUI-based simulations is to call vsim like this:

vsim -gui -l path_to_transcript_file -do source path_to_a_do_file 

The do file contains the actual vsim call with the design to simulate. One of the reasons for this approach is that the user has the option to define do/tcl files to be sourced before the simulation starts, i.e. such files are sourced before the vsim call in the do file.
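To make the two-level structure concrete, here is a simplified sketch of how such a do file and the outer vsim call could be assembled. It is not VUnit's actual implementation. The function names and arguments are made up for illustration, and the `-do` argument is assumed to be passed as a single quoted `source ...` string.

```python
def build_do_file(vsim_call, init_files=()):
    """Return do-file text: user files are sourced first, then vsim is called."""
    lines = [f'source "{path}"' for path in init_files]
    lines.append(vsim_call)  # the call that names the design to simulate
    return "\n".join(lines) + "\n"


def outer_vsim_command(do_file, transcript, mode="-gui"):
    """Outer call that only starts the simulator and sources the do file.

    mode is "-gui" for interactive runs or "-c" for batch mode.
    """
    return ["vsim", mode, "-l", transcript, "-do", f'source "{do_file}"']
```

The outer command never names a design unit; only the do file does.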

If I try running Visualizer by replacing -gui with -visualizer, I get the following error:

An error occured while processing the vsim arguments, a design unit was not specified.

Looks like the -visualizer option needs the design to be specified on the same command line and not be "hidden" in a do file.

If I keep -gui and move -visualizer to the vsim call within the do file, there is no complaint but there is no Visualizer GUI popping up (only the VSIM GUI).

We use the same do file approach when running batch mode simulations (-gui is replaced by -c). I tested that with the same result. Visualizer doesn't start but the simulation completes as normal.

Any idea how this can be fixed?

@LarsAsplund
Collaborator

I updated the example to emulate what embedded post-simulation Visualizer support would look like. If you run the run script with the --gui option, every simulation will start Visualizer after completion. This works without problems when running multiple threads in parallel.
