XLA compilation error of a TARF payoff with MC #73
Hi @arthurpham, thanks for reaching out! The issue you are referring to is a TensorFlow (TF) issue, as you implicitly have a while loop inside a vectorized map. I just checked internally and it seems that someone is on it, but feel free to file the bug with the TF team directly. Could you please give me edit rights for the colab? In the meantime, I have fiddled a bit with your my_function, adding a batched version of the calculation. This makes the code a bit less efficient on a CPU device but clearly improves GPU performance. It is also XLA-compatible now (runs in < 70 ms for me on a T4 device).

def my_function_new(paths):
  # Shape [num_timesteps, num_samples]
  paths = tf.transpose(paths)
  cur_spot = paths[0]
  total = tf.zeros_like(cur_spot)
  discounted_payoff = tf.zeros_like(cur_spot)
  df = tf.constant(1.0, dtype=tf.float64)
  is_active = tf.ones([num_samples], dtype=tf.bool)
  cashflow = tf.zeros_like(cur_spot)
  i = tf.constant(0, dtype=tf.int32)

  # Explicitly define the while_loop.
  def cond(i, is_active, cashflow, total, discounted_payoff):
    return i < num_timesteps

  def body(i, is_active, cashflow, total, discounted_payoff):
    # Here Tensors are of shape `[num_samples]`.
    cur_spot = paths[i]
    new_is_active = K_knockout > cur_spot
    # A cashflow is fixed whenever the spot leaves [K_lower, K_upper).
    add_cashflow = tf.logical_or(K_upper <= cur_spot, cur_spot < K_lower)
    new_cashflow = tf.where(
        K_upper <= cur_spot,
        cur_spot - strike,
        cashflow)
    new_cashflow = tf.where(
        cur_spot < K_lower,
        step_up_ratio * (cur_spot - strike),
        new_cashflow)
    new_is_active = tf.where(
        add_cashflow,
        tf.where(total + new_cashflow >= tarf_target,
                 False, new_is_active),
        new_is_active)
    new_cashflow = tf.where(
        add_cashflow,
        tf.where(total + new_cashflow >= tarf_target,
                 tarf_target - total, new_cashflow),
        new_cashflow)
    new_total = tf.where(add_cashflow, total + new_cashflow, total)
    new_discounted_payoff = tf.where(
        add_cashflow,
        discounted_payoff + df * new_cashflow,
        discounted_payoff)
    # Update values only if still active.
    new_cashflow = tf.where(is_active, new_cashflow, cashflow)
    new_total = tf.where(is_active, new_total, total)
    new_discounted_payoff = tf.where(is_active,
                                     new_discounted_payoff,
                                     discounted_payoff)
    new_is_active = tf.where(is_active, new_is_active, is_active)
    return (i + 1, new_is_active, new_cashflow,
            new_total, new_discounted_payoff)

  _, is_active, cashflow, total, discounted_payoff = tf.while_loop(
      cond, body, (i, is_active, cashflow, total, discounted_payoff))
  return discounted_payoff

Then there is no need for the vectorized map; just call:

payoffs = my_function_new(reshaped_paths)

Please let me know if that makes sense. Would you be interested in cleaning up the code and contributing it to the library? I have not checked the maths, so testing is needed.
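Since the maths has not been checked, one way to test the batched TF version is to mirror its logic sequentially in NumPy and compare outputs on the same paths. The sketch below is not from the thread; its parameters mirror the globals used in the TF code above (strike, K_lower, K_upper, K_knockout, step_up_ratio, tarf_target), which is an assumption about their intended meaning:

```python
import numpy as np

def tarf_payoff_np(paths, strike, K_lower, K_upper, K_knockout,
                   step_up_ratio, tarf_target, df=1.0):
    """Sequential NumPy mirror of my_function_new, for cross-checking.

    paths: array of shape [num_samples, num_timesteps].
    Returns the discounted payoff per path.
    """
    num_samples, num_timesteps = paths.shape
    total = np.zeros(num_samples)
    payoff = np.zeros(num_samples)
    cashflow = np.zeros(num_samples)
    is_active = np.ones(num_samples, dtype=bool)
    for i in range(num_timesteps):
        s = paths[:, i]
        new_active = K_knockout > s
        # A cashflow is fixed whenever the spot leaves [K_lower, K_upper).
        fixing = (K_upper <= s) | (s < K_lower)
        cf = np.where(K_upper <= s, s - strike, cashflow)
        cf = np.where(s < K_lower, step_up_ratio * (s - strike), cf)
        # Cap the running total at the TARF target and deactivate.
        hit_target = total + cf >= tarf_target
        new_active = np.where(fixing & hit_target, False, new_active)
        cf = np.where(fixing & hit_target, tarf_target - total, cf)
        new_total = np.where(fixing, total + cf, total)
        new_payoff = np.where(fixing, payoff + df * cf, payoff)
        # Freeze paths that were already knocked out.
        cashflow = np.where(is_active, cf, cashflow)
        total = np.where(is_active, new_total, total)
        payoff = np.where(is_active, new_payoff, payoff)
        is_active = np.where(is_active, new_active, is_active)
    return payoff
```

Feeding both functions the same simulated paths (with the same globals) should produce matching payoffs up to floating-point noise.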
Thank you for the help. I reused your payoff and that worked for the price.

@tf.function(jit_compile=True,
             input_signature=[tf.TensorSpec([], dtype=tf.float64),
                              tf.TensorSpec([], dtype=tf.float64),
                              tf.TensorSpec([], dtype=tf.float64)])
def delta_fn_xla(strikes, spot, sigma):
  fn = lambda spot: price_eu_options_xla(strikes, spot, sigma)
  return tff.math.fwd_gradient(fn, spot, use_gradient_tape=True)

This fails with:

InternalError: Propagate: Cannot find body function while_body_5075_grad_16381_grad_16947_const_0 for While node gradients/PartitionedCall_2/gradients/gradients/PartitionedCall_grad/PartitionedCall_15_grad/PartitionedCall/gradients/gradients/while_grad/while_grad_grad/gradients/while_grad/while_grad_grad [Op:__inference_delta_fn_xla_17607]

Also, for the non-XLA version, is there anything I can do to improve the performance?

t = time.time()
tarf_price = price_eu_options_xla(strikes, spot, sigma)
tarf_delta = delta_fn(strikes, spot, sigma)
tarf_vega = vega_fn(strikes, spot, sigma)
time_tqf = time.time() - t

With fwd_gradients:

TQF gpu TARF price+delta+vega
wall time + tracing: 31.278723001480103
options per second + tracing: 0.03197061465561366
wall time: 7.026975631713867
options per second: 0.142308733146425
------------------------
price tf.Tensor(-246.927357070608, shape=(), dtype=float64)
delta tf.Tensor(21.37942877385511, shape=(), dtype=float64)
vega tf.Tensor(-331.98229464223164, shape=(), dtype=float64)

With gradients:

TQF gpu TARF price+delta+vega
wall time + tracing: 15.258108139038086
options per second + tracing: 0.0655389246745136
wall time: 5.396306037902832
options per second: 0.1853119509857581
------------------------
price tf.Tensor(-246.927357070608, shape=(), dtype=float64)
delta tf.Tensor(21.37942877385511, shape=(), dtype=float64)
vega tf.Tensor(-331.98229464223175, shape=(), dtype=float64)

For reference, here is the same TARF payoff implemented in C++, using Antoine Savine's library from the book "Modern Computational Finance: AAD and Parallel Simulations".
Thank you, Arthur! The issue you are having is nested jit_compilation: a jit-compiled function is calling another jit-compiled function. Here you'd need to make a few changes:
As for the performance, just for reference, here is the GPU performance I get. We can try to improve performance separately. Does the 1020 ms include the greek computation? If so, what is the pricing speed separately? Could you please point me to the source code?
Also, do you need Euler sampling for the geometric Brownian motion? This can be sampled more efficiently with the designated sampler.
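A note on why a dedicated sampler helps: geometric Brownian motion has a closed-form solution, S_t = S_0 · exp((mu − sigma²/2)·t + sigma·W_t), so the fixing dates can be sampled exactly with one normal draw per date instead of many small Euler steps. A minimal NumPy sketch of the idea (illustrative only, not the library's implementation):

```python
import numpy as np

def sample_gbm_exact(s0, mu, sigma, times, normals):
    """Exact GBM sampling at the given times.

    Uses S_t = s0 * exp((mu - sigma^2/2) * t + sigma * W_t), so one
    normal draw per fixing date suffices -- no Euler sub-stepping.

    normals: standard normal draws, shape [num_samples, len(times)].
    Returns paths of shape [num_samples, len(times)].
    """
    times = np.asarray(times, dtype=float)
    dt = np.diff(np.concatenate([[0.0], times]))
    # Brownian increments with the correct variance per step.
    dw = normals * np.sqrt(dt)
    log_paths = np.cumsum((mu - 0.5 * sigma ** 2) * dt + sigma * dw,
                          axis=1)
    return s0 * np.exp(log_paths)
```

With 53 weekly fixings this needs exactly 53 draws per path, and the sampling error comes only from the Monte Carlo noise, not from time discretization.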
The 1020 ms includes the price and the greeks (first derivatives with respect to the spot, vol, risk-free rate, and dividend yield), computed serially (no parallel computation). I will apply your suggestions and measure the timing again.
Running on Google Colab Pro with a Tesla T4 GPU, I get 1.27 s. With a Tesla P100 GPU, I get 780 ms. What kind of GPU do you use to get 130 ms?
I am using a Tesla T4. So, given that it is working, I looked a bit into the performance details. I would still recommend using the geometric Brownian motion model directly, as it has a different implementation than GenericItoProcess. The latter relies on TensorArrays (basically a list) to store location values, so differentiating through it is a bit slow. Nevertheless,
This improves things quite a bit. On the public colab with a Tesla T4, here is what I get. If needed, I can try to further optimize CPU performance.
Also, could you please report the random number generation speed? I assume you'd need [200_000, 53] samples from Sobol that you then convert to normals. Could you please let me know how long that takes?
I have also changed the Sobol sequence to use
On a Tesla T4, here is what I seem to compute. The colab CPUs are a bit weak. On my 2020 Intel MacBook Pro (with enforced single-threading), I get:
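For anyone wanting to reproduce the Sobol-to-normals measurement outside TF, here is a rough equivalent using SciPy's QMC module (an assumption on my part: scipy >= 1.7 with scipy.stats.qmc available; the thread itself uses the library's own Sobol implementation):

```python
import time

import numpy as np
from scipy.special import ndtri
from scipy.stats import qmc

def sobol_normals(num_samples, dim, seed=42):
    """Scrambled Sobol points mapped to standard normals via the
    inverse normal CDF -- the usual quasi-Monte Carlo recipe."""
    sobol = qmc.Sobol(d=dim, scramble=True, seed=seed)
    u = sobol.random(num_samples)  # uniforms in (0, 1)
    return ndtri(u)

t0 = time.time()
z = sobol_normals(200_000, 53)
print(f"[200000, 53] Sobol -> normals took {time.time() - t0:.3f}s")
```

Scrambling keeps the uniforms strictly inside (0, 1), so the inverse CDF never produces infinities (the first point of an unscrambled Sobol sequence is exactly 0).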
Ok, that's surprising to see that.

Another question: https://colab.research.google.com/github/arthurpham/google_colab/blob/f02d64e65e9ba59ca9ce33048f3459560cef7fa5/TARF_MC_Performance_TQF.ipynb

Let's assume I'm trying to determine a realistic latency and cost (assuming a $/sec rate for GPU) of a pricing service relying on TQF. Say I receive a new trade with different inputs (the requests arrive at different times, so no batching is possible). I should be able to reuse the optimized function (maybe the input_signature of the tf.function is not configured properly?), but I don't observe that. So, is it possible to avoid retracing with XLA when changing the inputs (assuming no change in shapes)?

for spot in [15.0, 18.2, 20.0, 25.0]:
  spot = tf.constant(spot, dtype=dtype)
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf0 = time.time() - t
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf = time.time() - t

When I call
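For context on why retracing happens at all: tf.function bakes Python numbers into the trace as constants, so every new float value can trigger a retrace, while tensor arguments are matched on shape and dtype only. A toy model of that cache rule (an illustration of the concept, not TF's actual implementation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tensor:
    """Stand-in for a TF tensor: a value plus a shape/dtype spec."""
    value: float
    shape: tuple = ()
    dtype: str = "float64"

def trace_key(arg):
    # Python scalars are baked into the trace as constants, so the
    # cache key includes their value; tensors are keyed by spec only.
    if isinstance(arg, Tensor):
        return ("Tensor", arg.shape, arg.dtype)
    return ("const", arg)

class ToyFunction:
    """Minimal model of tf.function's trace cache."""
    def __init__(self, fn):
        self.fn = fn
        self.traces = {}

    def __call__(self, *args):
        key = tuple(trace_key(a) for a in args)
        if key not in self.traces:
            self.traces[key] = self.fn  # "compile" once per key
        return self.traces[key](
            *[a.value if isinstance(a, Tensor) else a for a in args])

price = ToyFunction(lambda spot: spot * 2.0)

for s in [15.0, 18.2, 20.0]:
    price(s)            # Python floats: one trace per distinct value
assert len(price.traces) == 3

price.traces.clear()
for s in [15.0, 18.2, 20.0]:
    price(Tensor(s))    # tensors: a single shared trace
assert len(price.traces) == 1
```

This is why converting the spot with tf.constant (or tf.convert_to_tensor) before the call, as in the loop above, is the usual way to hit the same trace for every request.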
The library that I forked to add the payoff is not mine; it comes from a book and the code is shared on GitHub. I don't have an easy way to measure what you are asking, but if I comment out the path generation and the payoff, I think the random number generation takes roughly 40% of the total time in serial mode:

auto cRng = rng.clone();
cRng->init(cMdl->simDim());
// Iterate through paths
for (size_t i = 0; i < nPath; i++)
{
    // Next Gaussian vector, dimension D
    cRng->nextG(gaussVec);
    // Generate path, consume Gaussian vector
    ////cMdl->generatePath(gaussVec, path);
    // Compute result
    ////prd.payoffs(path, results[i]);
}
Hi Arthur, for TF I checked that the random numbers also take roughly 40% of the compute time.
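A back-of-the-envelope way to measure that kind of split outside TF is to time the normal generation and the rest of the pipeline separately (pure NumPy here, so the absolute numbers will differ from the GPU figures in the thread):

```python
import time

import numpy as np

def time_rng_fraction(num_paths=100_000, num_steps=53, seed=0):
    """Rough split of serial Monte Carlo cost between drawing normals
    and everything else (path construction + payoff)."""
    rng = np.random.default_rng(seed)

    t0 = time.perf_counter()
    z = rng.standard_normal((num_paths, num_steps))
    t_rng = time.perf_counter() - t0

    t0 = time.perf_counter()
    # Toy GBM paths and a vanilla payoff, standing in for "the rest".
    dt, sigma, mu, s0, strike = 1.0 / num_steps, 0.2, 0.05, 100.0, 100.0
    log_paths = np.cumsum((mu - 0.5 * sigma ** 2) * dt
                          + sigma * np.sqrt(dt) * z, axis=1)
    payoff = np.maximum(s0 * np.exp(log_paths[:, -1]) - strike, 0.0)
    t_rest = time.perf_counter() - t0

    return t_rng / (t_rng + t_rest), payoff.mean()

frac, price = time_rng_fraction()
print(f"RNG share of total compute: {frac:.0%}")
```

The exact fraction depends heavily on the payoff complexity and the hardware; the point is only that RNG is a non-trivial fixed cost in any MC pricer.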
I've pushed a change so that
In my example above, the spot was converted with:

for spot in [tf.convert_to_tensor(x, tf.float64) for x in [15.0, 18.2, 20.0, 25.0]]:
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf0 = time.time() - t
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf = time.time() - t

Thank you for the quick fix; it seems to make a difference from a quick test.
You need to remove it. Just in case, here is a version of the pricer incorporating all the suggestions above:
Now I can define
and try running for different spot values:
This works as expected for me.
Yes, that was the problem. Thanks a lot for being patient.
I'm trying to replicate the TARF payoff from this paper.
So far, I've been able to write a payoff that can be evaluated with MC and GPU, but I get an error with the XLA compilation.
https://colab.research.google.com/github/arthurpham/google_colab/blob/7d5715c2b0f988c8d7d544fea0ff741e63272056/TARF_MC_Performance_TQF.ipynb
I understand that the payoff code style might not be the best, but I wasn't able to find an example of a path-dependent payoff with MC.
Any pointer would be helpful.