Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

state machine : Dynamic passing of bucket and key to distributed map #29409

Open
2 tasks done
AmylDV opened this issue Mar 8, 2024 · 5 comments · May be fixed by #30836
Open
2 tasks done

state machine : Dynamic passing of bucket and key to distributed map #29409

AmylDV opened this issue Mar 8, 2024 · 5 comments · May be fixed by #30836
Assignees
Labels
@aws-cdk/aws-s3 Related to Amazon S3 effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p2

Comments

@AmylDV
Copy link

AmylDV commented Mar 8, 2024

Describe the feature

Currently there is no means of passing a bucket and key dynamically to a distributed Map state using CDK. This functionality is available in the states language using JSONPath along the lines of

"ItemReader": {
"Resource": "arn:aws:states:::s3:getObject",
"ReaderConfig": {
"InputType": "JSON"
},
"Parameters": {
"Bucket.$": "$.Payload.bucket",
"Key.$": "$.Payload.key"
}
}

Use Case

I want to define my state machine using CDK rather than the states language

Proposed Solution

Create an IItemReader that can take a string in place of a bucket.

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

2.128.0

Environment details (OS name and version, etc.)

MacOs Sonoma

@AmylDV AmylDV added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Mar 8, 2024
@github-actions github-actions bot added the @aws-cdk/aws-s3 Related to Amazon S3 label Mar 8, 2024
@pahud
Copy link
Contributor

pahud commented Mar 8, 2024

Thank you for the feature request. We welcome and appreciate your PR for this.

@pahud pahud added p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Mar 8, 2024
@JannikWempe
Copy link

I faced the same issue and still wanted to use the DistributedMap class instead of using CustomState.

I came up with a dirty, hacky workaround and extended the S3CsvItemReader.

export interface ExtendedS3CsvItemReaderProps extends sfn.S3CsvItemReaderProps {
	readonly "key.$"?: string;
}

class ExtendedS3CsvItemReader extends sfn.S3CsvItemReader {
	readonly dynamicKey: string | undefined;

	constructor(props: ExtendedS3CsvItemReaderProps) {
		super(props);
		this.dynamicKey = props["key.$"];
	}

	public render() {
		const rendered = super.render();

		if (this.dynamicKey) {
			// ignores the provided `key` and uses `key.$` instead
			rendered.Parameters = {
				Bucket: rendered.Parameters.Bucket,
				"Key.$": this.dynamicKey,
			};
		}

		return rendered;
	}
}

You can use it just like the existing S3CsvItemReader but pass in a "key.$" prop (you still have to pass a key but that will be ignored). You could do the same with Bucket.$.

I may dig deeper and provide a PR (with a polished implementation of course 😜).

ChakshuGupta13 pushed a commit to ChakshuGupta13/aws-cdk that referenced this issue Jul 12, 2024
**Why are these changes required?**
- For `DistributedMap` state of StepFunctions, `IItemReader` currently only allows S3 bucket as input source to be declared statically in CDK.
- In other words, current CDK implementation only caters to static use-case where we know either `bucket` or `bucketName` (from which we can create `IBucket`) and pass it to `IItemReader`.
- Whereas via AWS Console, if we create `DistributedMap` manually, then we can also convey S3 source dynamically using State Input / JsonPath.
- In other words, for dynamic use-case, we will neither have `bucket` nor `bucketName` i.e. we only know state input variable which will convey `bucketName` e.g. `$.bucketName`.
- So, if we want to use `IItemReader` for dynamic use case also, then we will:
  - (1) need to make `bucket: IBucket` optional (which will be breaking - how? e.g. if some dev is currently accessing `bucket` via `reader.bucket` then dev now needs to add check for `undefined`)
  - (2) then add another optional fields to convey state input variable names (e.g. $.bucketName $.key $.prefix)
- Therefore, to avoid introducing breaking change, we can follow `*Path` convention of StepFunctions and introduce `IItemReaderPath` for dynamic use-case. (closes aws#29409)

**What changes are being made?**
- Add `IItemReaderPath` interface (and its pros interface)
- Add `S3ObjectsItemReaderPath` as one of many concrete classes of `IItemReaderPath` - this class also helps with unit-testing and integration-testing.
- Modify `index` to export new constructs
- Modify `DistributedMap` (and its props) to allow `itemReaderPath?` (which will be mutually exclusive with `itemReader?` and `itemsPath?`) and utilise it for render

**How are these changes tested?**
- Via new unit-tests for `DistributedMap` (via `yarn build+test`)
- Via new integration test for `S3ObjectsItemReaderPath` (with snapshot created)
  - Via `yarn build --directory test/aws-stepfunctions/test && yarn integ test/aws-stepfunctions/test/integ.item-reader-path-s3-object.js && yarn integ-runner --update-on-failed`
  - Verified expected step function execution result during snapshot creation
@ChakshuGupta13 ChakshuGupta13 linked a pull request Jul 12, 2024 that will close this issue
1 task
@ChakshuGupta13
Copy link

Hi @pahud, can you please assign this issue to me?
(Discussed one possible solution with @GavinZZ and raised above PR for proposed initial changes.)

@GavinZZ
Copy link
Contributor

GavinZZ commented Jul 23, 2024

Assigned this issue to @ChakshuGupta13, I will try to review your PR by end of this month. If I don't, feel free to ping me directly.

ChakshuGupta13 pushed a commit to ChakshuGupta13/aws-cdk that referenced this issue Jul 25, 2024
**Why are these changes required?**
- For `DistributedMap` state of StepFunctions, `IItemReader` currently only allows S3 bucket as input source to be declared statically in CDK.
- In other words, current CDK implementation only caters to static use-case where we know either `bucket` or `bucketName` (from which we can create `IBucket`) and pass it to `IItemReader`.
- Whereas via AWS Console, if we create `DistributedMap` manually, then we can also convey S3 source dynamically using State Input / JsonPath.
- In other words, for dynamic use-case, we will neither have `bucket` nor `bucketName` i.e. we only know state input variable which will convey `bucketName` e.g. `$.bucketName`.
- So, if we want to use `IItemReader` for dynamic use case also, then we will:
  - (1) need to make `bucket: IBucket` optional (which will be breaking - how? e.g. if some dev is currently accessing `bucket` via `reader.bucket` then dev now needs to add check for `undefined`)
  - (2) then add another optional fields to convey state input variable names (e.g. $.bucketName $.key $.prefix)
- Therefore, to avoid introducing breaking change, we can follow `*Path` convention of StepFunctions and introduce `IItemReaderPath` for dynamic use-case. (closes aws#29409)

**What changes are being made?**
- Add `IItemReaderPath` interface (and its pros interface)
- Add `S3ObjectsItemReaderPath` as one of many concrete classes of `IItemReaderPath` - this class also helps with unit-testing and integration-testing.
- Modify `index` to export new constructs
- Modify `DistributedMap` (and its props) to allow `itemReaderPath?` (which will be mutually exclusive with `itemReader?` and `itemsPath?`) and utilise it for render

**How are these changes tested?**
- Via new unit-tests for `DistributedMap` (via `yarn build+test`)
- Via new integration test for `S3ObjectsItemReaderPath` (with snapshot created)
  - Via `yarn build --directory test/aws-stepfunctions/test && yarn integ test/aws-stepfunctions/test/integ.item-reader-path-s3-object.js && yarn integ-runner --update-on-failed`
  - Verified expected step function execution result during snapshot creation
ChakshuGupta13 pushed a commit to ChakshuGupta13/aws-cdk that referenced this issue Jul 25, 2024
**Why are these changes required?**
- For `DistributedMap` state of StepFunctions, `IItemReader` currently only allows S3 bucket as input source to be declared statically in CDK.
- In other words, current CDK implementation only caters to static use-case where we know either `bucket` or `bucketName` (from which we can create `IBucket`) and pass it to `IItemReader`.
- Whereas via AWS Console, if we create `DistributedMap` manually, then we can also convey S3 source dynamically using State Input / JsonPath.
- In other words, for dynamic use-case, we will neither have `bucket` nor `bucketName` i.e. we only know state input variable which will convey `bucketName` e.g. `$.bucketName`.
- So, if we want to use `IItemReader` for dynamic use case also, then we will:
  - (1) need to make `bucket: IBucket` optional (which will be breaking - how? e.g. if some dev is currently accessing `bucket` via `reader.bucket` then dev now needs to add check for `undefined`)
  - (2) then add another optional fields to convey state input variable names (e.g. $.bucketName $.key $.prefix)
- Therefore, to avoid introducing breaking change, we can follow `*Path` convention of StepFunctions and introduce `IItemReaderPath` for dynamic use-case. (closes aws#29409)

**What changes are being made?**
- Add `IItemReaderPath` interface (and its pros interface)
- Add `S3ObjectsItemReaderPath` as one of many concrete classes of `IItemReaderPath` - this class also helps with unit-testing and integration-testing.
- Modify `index` to export new constructs
- Modify `DistributedMap` (and its props) to allow `itemReaderPath?` (which will be mutually exclusive with `itemReader?` and `itemsPath?`) and utilise it for render

**How are these changes tested?**
- Via new unit-tests for `DistributedMap` (via `yarn build+test`)
- Via new integration test for `S3ObjectsItemReaderPath` (with snapshot created)
  - Via `yarn build --directory test/aws-stepfunctions/test && yarn integ test/aws-stepfunctions/test/integ.item-reader-path-s3-object.js && yarn integ-runner --update-on-failed`
  - Verified expected step function execution result during snapshot creation
@ChakshuGupta13
Copy link

ChakshuGupta13 commented Jul 29, 2024

FYI: We can still currently utilise dynamic parameters except bucket:

const myBucket = new Bucket(stack, 'MyBucket', ...);
const distributedMap = new sfn.DistributedMap(this, 'DistributedMap', {
    itemReader: new sfn.S3ObjectsItemReader({
      bucket: myBucket,
      prefix: JsonPath.stringAt('$.prefix'),
    }),
  });
  distributedMap.itemProcessor(new sfn.Pass(this, 'Pass'));

This code snippet produces following cdk.out template:

...
"Parameters\":{\"Bucket\":\"",
       {
        "Ref": "MyBucket..."
       },
       "\",\"Prefix.$\":\"$.prefix\"}}}}}"
      ]
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-s3 Related to Amazon S3 effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p2
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants