regex - How to do a complex negative lookbehind to split tokens in Java?




I have a number of lines in EDIFACT format that need to be tokenized on +. However, according to the EDIFACT spec, characters can be escaped with ?. For example: ?? stands for ?, ?+ stands for +, and ?: stands for :. An escaped ?+ is part of the field and should hence not be considered a delimiter.

I used a negative lookbehind to deal with +'s that are preceded by a ?:

String delimiter = "\\+";
String[] tokens = data.split("(?<!\\?)" + delimiter);

This will split up

a+b+c into a, b and c

a?+b+c into a?+b and c

However, it fails when the ?? escape sequence is involved:

a??+b+c yields 2 tokens: a??+b and c

whereas it should be 3 tokens: a?? (that is, a? once unescaped), b and c

On the other hand, a???+b+c should yield 2 tokens: a???+b and c

Is there a way to accomplish this using a negative lookbehind?

Here's a runnable test to play around with if you wish.

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        assertTokens("a+b+c", "a", "b", "c");
        assertTokens("a?+b+c", "a?+b", "c");
        assertTokens("a??+b+c", "a??", "b", "c");
        assertTokens("a???+b+c", "a???+b", "c");
    }

    private static void assertTokens(String data, String... expectedTokens) {
        String delimiter = "\\+";
        // Split on every + that is not directly preceded by a single ?
        String[] tokens = data.split("(?<!\\?)" + delimiter);
        if (!Arrays.deepEquals(tokens, expectedTokens)) {
            throw new IllegalStateException("not equals " + data);
        }
    }
}

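Rather than splitting, tokenization is easier using matching. In your case, for split to work you'd have to use an unbounded variable-length lookbehind (to count how many ? characters precede the +), which Java has historically not supported.

Try this regex: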

(?:[^+:?]++|\?.)+


(I've used a possessive quantifier (++) purely as an optimization, to avoid useless backtracking.)
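As a minimal sketch of how the matching approach could be wired up in Java, the regex above can be run through Matcher.find() to collect tokens. The class name EdifactTokenizer and the method name tokenize are hypothetical, chosen only for illustration. Note that this pattern also treats : as a delimiter, and that it returns the raw tokens without unescaping them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original post.
public class EdifactTokenizer {
    // A token is a run of ordinary characters and escape pairs ("?" plus one character).
    private static final Pattern TOKEN = Pattern.compile("(?:[^+:?]++|\\?.)+");

    public static List<String> tokenize(String data) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(data);
        // Each successful find() is one token; delimiters are simply skipped over.
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a+b+c"));     // [a, b, c]
        System.out.println(tokenize("a?+b+c"));    // [a?+b, c]
        System.out.println(tokenize("a??+b+c"));   // [a??, b, c]
        System.out.println(tokenize("a???+b+c"));  // [a???+b, c]
    }
}
```

Because each ? always consumes the character after it, a run of question marks is paired off from left to right, which is what the fixed-width lookbehind could not do.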

If you also want to match empty tokens (a++b yielding a, an empty string and b), the regex gets more complicated:

(?:[^+:?\r\n]++|\?.)+|(?<=[+:]|^)(?=[+:]|$)


Which means:

either match the same as above (I've added \r\n to the character class so that newlines don't match), or match an empty string that is preceded by a token delimiter or the start of a line, and followed by a token delimiter or the end of a line.

This alternative needs the m (multiline) flag to work, meaning ^ and $ match the start and end of each line.
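The empty-token variant might be used from Java roughly as follows, with Pattern.MULTILINE standing in for the m flag. This is a sketch under the same assumptions as before: the class name EdifactEmptyTokenizer and the method name tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original post.
public class EdifactEmptyTokenizer {
    // First alternative: a non-empty token. Second alternative: an empty token
    // sitting between two delimiters, or between a delimiter and a line boundary.
    private static final Pattern TOKEN = Pattern.compile(
            "(?:[^+:?\\r\\n]++|\\?.)+|(?<=[+:]|^)(?=[+:]|$)",
            Pattern.MULTILINE); // ^ and $ match at the start and end of each line

    public static List<String> tokenize(String data) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(data);
        while (matcher.find()) {
            tokens.add(matcher.group()); // group() is "" for an empty token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a++b"));  // [a, , b]
        System.out.println(tokenize("+a"));    // [, a]
    }
}
```

One caveat when tracing this pattern by hand: the lookbehind only checks the single preceding character, so an escaped delimiter that is directly followed by a real one (as in a?++b) also satisfies it, and an extra empty token appears after a?+. If such input can occur, a hand-written character scan may be the safer choice.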

java regex tokenize regex-negation
