enhancement request: regex filters #432
Comments
@pkoppstein Agreed! It should be pretty simple to write jq-coded builtins to do this. |
To clarify:
|
So, I don't think that |
@wtlangford You can't modify strings no matter what, and yes, you can slice strings (with
EDIT: removed "g" from call to match (which was wrong anyways since I'd used a comma instead of a semi-colon). |
And:
|
|
Oh. I don't know where I got no slicing from. |
EDIT: fix cut-n-paste error. |
Please bang on these, let me know if they work for you! |
@nicowilliams wrote:
That description just covers the basic case. Here's a fuller description (taken from ruby docs) that also covers the case when the regex contains groups:
|
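The group-aware behaviour described above can be illustrated with a small sketch. Python is used here only as a neutral analogue; this is not the jq code discussed in the thread, and the helper name `scan` is hypothetical. Without capture groups each result is the matched text; with groups each result is the list of captured substrings.

```python
import re
from typing import Iterator, List, Union


def scan(pattern: str, text: str) -> Iterator[Union[str, List[str]]]:
    """Yield one result per non-overlapping match of pattern in text.

    No capture groups: yield the matched substring.
    One or more groups: yield the list of captured substrings instead.
    """
    regex = re.compile(pattern)
    for m in regex.finditer(text):
        if regex.groups == 0:
            yield m.group(0)
        else:
            yield list(m.groups())


print(list(scan(r"\w+", "cruel world")))       # ['cruel', 'world']
print(list(scan(r"(..)(..)", "cruel world")))  # [['cr', 'ue'], ['l ', 'wo']]
```

Because the helper is a generator, the caller decides whether to wrap the results in a list, which is the same choice debated later in this thread.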
@pkoppstein Ah, OK. Can I leave this to you and split to you or @wtlangford ? |
Can you do better than this? It doesn't seem as elegant as it could be... My
|
@nicowilliams, @wtlangford -- I just noticed that "offset" is not being correctly set (according to my understanding) as illustrated by:
|
@nicowilliams - This is a slightly revised/simplified version of what you wrote that behaves as I'd expect:
Examples:
|
Apropos
I have a version ready for testing once |
@pkoppstein Try s/,/;/ :)
Yeah, the need to use semi-colons to separate arguments lists kills me too sometimes...
jq does the right thing with commas for arrays. It should have as well for arguments lists. It's too late to fix it. |
@pkoppstein Actually, my
I just hadn't tested that nor thought it through. Those empty strings aren't wrong either. They're quite correct I'd say. Any reason not to call |
Alright, I have implementations of all four utility functions requested. Plus I just need to write docs and tests and I'll push. |
@nicowilliams wrote:
Indeed. Thanks. In my defense (?), it was late :-( Here's my now-tested candidate for
Mainly because
|
p.s. Please don't forget split/0, e.g.
|
@pkoppstein @wtlangford Ooops, the new I'm not going to break compatibility. And thinking about it I think I'd like to have a split builtin that doesn't use regexps anyways: for performance reasons. I'm leaning towards renaming the RE-based split to something else. Suggestions? |
As to I'm rather bummed out about |
@nicowilliams wrote:
There must be a lesson here somewhere :-) I don't understand why you think that string#split/1 returning an array is a mistake -- that's what every other comparable split/1 does; also .[] is easy enough to read and write, and [ .... ] is often quite awkward to read. But I suppose that goes to show my jq enlightenment score is very low. However, if split/1 is to change, then one way out would be to take the view that the introduction of regex is a big enough leap to warrant a jump to the next "major version" (on the theory that "breaking backwards compatibility on minor versions is evil" but not otherwise). Would that require jumping to 2.0? Using the name "split" instead of nwise/take is fine by me. An alternative to consider would be group(n). |
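To make the nwise/group(n) idea concrete, here is a minimal Python sketch. It assumes that "nwise" means cutting the input into consecutive chunks of at most n items; that reading, and the function name, are assumptions for illustration rather than the definition used in the thread.

```python
from typing import Iterator, Sequence


def nwise(seq: Sequence, n: int) -> Iterator[Sequence]:
    """Yield consecutive slices of seq, each of length n except possibly the last."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    for start in range(0, len(seq), n):
        yield seq[start:start + n]


print(list(nwise("abcdefg", 3)))        # ['abc', 'def', 'g']
print(list(nwise([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```

Slicing works the same way on strings and lists, so one helper covers both cases.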
@pkoppstein jq is a language where expressions generate results. like Python iterators with |
@nicowilliams -- Maybe I'd be more easily persuaded if jq had a stream-oriented version of For example, if we have a long pipeline, S, that produces a stream of JSON objects and wish to add these objects using +, we have either to use the form Here's an example motivated by "named captures":
Example:
|
By the way, I'd like to propose adding capture/1 as a builtin, so here's the "reduce S" version suitable for builtin.c:
Example: |
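For readers unfamiliar with named captures, the idea behind the proposed capture/1 is to turn a match into an object keyed by group name. A rough Python analogue follows. It is not the builtin.c definition referred to above, and the function name `capture` and the sample pattern are assumptions chosen for illustration. Note that Python spells named groups `(?P<name>...)`, which differs from the regex syntax jq uses.

```python
import re
from typing import Dict, Optional


def capture(pattern: str, text: str) -> Optional[Dict[str, Optional[str]]]:
    """Return a dict of named groups for the first match, or None if no match."""
    m = re.search(pattern, text)
    return m.groupdict() if m else None


print(capture(r"(?P<name>[a-z]+)-(?P<n>[0-9]+)", "xyzzy-14"))
# {'name': 'xyzzy', 'n': '14'}
```

A named group that does not participate in the match maps to `None`, so every name in the pattern is always present as a key.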
@pkoppstein I agree that we need better/more iterators. That's no excuse to sully other parts of jq. |
@nicowilliams - I'm not sure what might have elicited your remark about sullying jq. I was merely saying that, given jq as-it-is, the case for breaking backwards-compatibility in order to change string#split seems less than compelling to me. I hope you were not suggesting that I was proposing some kind of hack along the lines of " ... | It does, however, seem to me that the key to providing a syntactic alternative to "[S] | ...." is to introduce a new kind of "pipe". Let us therefore assume for the moment that we could use the symbol "|>" for a pipe that slurps its input. This would, for example, allow us to write:
instead of:
Would that work? |
@pkoppstein I meant that, given support for generators, builtins should preferably generate their results rather than produce a single array output. It's easier to collect generated results than to iterate arrays -- in particular there's no way to process arrays in an online manner, but streams can be processed in an online manner. |
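The trade-off described here has a direct analogue in Python generators. The sketch below is an illustration under that analogy, not jq code: `matches` is a hypothetical helper that yields results lazily, so a caller can either stop early or collect everything into a list.

```python
import re
from itertools import islice
from typing import Iterator


def matches(pattern: str, text: str) -> Iterator[str]:
    """Lazily yield each non-overlapping match of pattern in text."""
    for m in re.finditer(pattern, text):
        yield m.group(0)


text = "a1 b22 c333 d4444"

# Collecting a stream into an array is a one-step wrap...
all_numbers = list(matches(r"\d+", text))

# ...whereas a consumer that only needs the first two results
# never forces the remaining matches to be computed.
first_two = list(islice(matches(r"\d+", text), 2))

print(all_numbers)  # ['1', '22', '333', '4444']
print(first_two)    # ['1', '22']
```

Going the other way, from an already-built array back to incremental processing, cannot recover the work that was done to build the array, which is the asymmetry the comment points at.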
@nicowilliams - This set of extra builtins (#432) also seems to have fallen off your radar screen. I believe the only real sticking point was that the existing split/1 returns an array. You were contemplating changing the existing definition so that it would instead be a generator. In the interests of moving forward, I would propose one of the following alternatives:
|
@nicowilliams - One other question, and one concern, as I put all the pieces together: Are you ok with "nwise" by that name as a new public builtin? It could be just a helper subfunction, but it is generic enough. What about "cleave"? There are two considerations which make me concerned about the regex filters blindly producing streams rather than arrays. In a nutshell:
|
@pkoppstein Way, will you be sending a PR? If so I'll wait until you send it. As to the boundary problem:
As you can see, it's easy to work around. |
Here's what I had before, rebased to master: https://github.com/nicowilliams/jq/compare/scan?expand=1 Please review. |
That is, I've not done much more than rebase. |
@nicowilliams wrote:
Thanks. One of the things that I've done is ensure that sub and gsub handle named captures properly (i.e. in the "tostring"). Things are working nicely, e.g.
Yes, I know; that's why I said "So one typically has to wrap things up in an array anyway." Of course I understand that streams have their place. It's a Very Good Thing that jq can handle a stream of JSON objects, and I like generators (especially those that go from null to N and beyond). However, for the regex filters, I seem only to be able to see the downsides of the stream-oriented approach. After all, map/1 could have been implemented to produce a stream of mapped items, but it doesn't. In short, since jq is JSON-oriented, and given the ease and (so far as I can tell) the speed of [...] and .[], I still don't see the great harm in string-to-array filters, but I'd be happy to be enlightened on this as on many other topics. And I apologize if I've failed to understand the significance of a point you've already made on this topic. |
@pkoppstein I'm unmoved on this. I made a mistake with Suppose Oniguruma gave us a streaming interface (maybe it does already; idk). Would it be possible to apply a regex to very large (streaming!) inputs and process the outputs in a streaming fashion? Wouldn't you want to? But if we make the interface always collect, we lose this ability. The principle is: stream first, we can always collect when you want an array. |
@pkoppstein Can you review the https://github.com/nicowilliams/jq/compare/scan?expand=1 |
Also, can you add examples for the manual? |
@pkoppstein Actually, I'm concerned about Something like:
This points out that This sort of thing has been bothering me about this commit. I'm thinking: leave out Also, |
@nicowilliams wrote:
They're all documented in the same commit in a long addition that starts at scan(regex). Perhaps you missed this addition because they're in the regex section? (One advantage of SourceTree is that you can't miss the changes.) I believe the additions are adequate -- I didn't think it was necessary (or desirable) to go into exhaustive detail, partly because that's not the style of manual.yml, but mainly because there's ample relevant documentation about sub and gsub (with captures) out there. I'll certainly take a look again at these filters in light of your comments, but they are the way they are because of the captures. Actually, I think to write all this in most other languages would be either very long or very convoluted. |
Oh, I guess I fell behind your branch. |
@nicowilliams wrote:
The relevant commit has no "take".
This has test cases, too (largely borrowed from the ones you had written). |
My |
Hi, @nicowilliams. I only mentioned I think the best resolution of the issue is, as you suggested, to make it private. That way, when a suitably brilliant implementation of nwise or take becomes available, it will be trivial to incorporate it into builtin. As I've said before, since you're the Decider and since you know what you want better than I do, it would probably be simplest at this point if you make any changes that you believe are appropriate. I am of course speaking of #522 On the other hand, if you want me to submit another PR, I'd need to know which changes you'd specifically like. Thanks. |
I pushed your commit too soon. There were problems with the docs. Also, these two tests fail: [.[] | scan(", ")] [.[] | split(", ")] I'll push a fix soon. |
BTW, please always run make check before submitting a PR. Of course, I should have as well. I always do, but slipped this time. |
@nicowilliams wrote:
Of course I always run "make" but, when I ran "make check" once I got this message:
So I've assumed that it's not worthwhile. Also, I don't have the ruby environment necessary to "make" the documents. For a previous commit, I believe I made it explicit that I was not vouching for anything other than the correctness of the changes to builtin.c but I should have been explicit on this occasion as well. Sorry about that. But thank you for installing the updates and attending to the imperfections. |
I want to validate a file name. If the file name contains any of \ / : * ? " < > | I should get false as the result. I tried many things but was unable to make it work. Could you please help me with this? How can I validate it? |
@guruprasannaps - One answer:
In short, the regex must be a JSON string. In future, please ask usage questions at stackoverflow.com with the jq tag: https://stackoverflow.com/questions/tagged/jq Thank you. |
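The check being asked for amounts to a single character-class test. The following Python sketch shows the logic only; it is not the jq answer given above, and the names `FORBIDDEN` and `is_valid_filename` are hypothetical. In jq the same character class would additionally have to be written as a JSON string, so the backslash and the double quote would each need JSON escaping.

```python
import re

# Characters that make a file name invalid, per the question above:
# \ / : * ? " < > |
FORBIDDEN = re.compile(r'[\\/:*?"<>|]')


def is_valid_filename(name: str) -> bool:
    """Return False if name contains any forbidden character, else True."""
    return FORBIDDEN.search(name) is None


print(is_valid_filename("report.txt"))  # True
print(is_valid_filename("a:b.txt"))     # False
```

Inside a character class the characters `*`, `?` and `|` lose their special meaning, so only the backslash needs escaping at the regex level.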
jq now has basic regex support in the form of match and test (see #431 and #164) but I believe still lacks "builtin" support for the following: