# regular expression example

We have many sql alike code to patch the system. The purpose is to clean the code to do two things: 1) to find what variables are used; 2) to find the value used in the code. That is, transfer to key-value form so that it can be easily analyzed in python dataFrame.

Some examples are:

The input string is s = 'segment == 5 && age > 30', the output will be (segment, == 5) and (age, > 30)

The input string is s = 'imatches("^(US|)$", ctry_cd) && (age >= 10)', the output will be (ctry_cd, ^(US|)$) and (age, >= 10)

Let's start from an example:

cond = ['$seg == 5 ', ' imatches("^228$",$gls) ', '$ipage > 30 ', ' $fpage < 2 ', '$dayob > 200 ',
' $ccage > 30 ', '$val > 99 ', ' $day7GLRepeatCount > 0 ', '$opvCustomerGCOrderCount90Days == 0 ',
' imatches("US",$ictry_cd) ', ' imatches("US",$bctry_cd) ', ' imatches("^(US|)$",$cctry_cd) ', ' $sqPercentile > 0.7'] def replaceMultiple(s, rep = {'(':'', ')':''}): ''' re multile pattern in s ''' rep = dict((re.escape(k), v) for k, v in rep.items()) pattern = re.compile("|".join(rep.keys())) #print(pattern) return pattern.sub(lambda m: rep[re.escape(m.group(0))], s) def getVarFromCondition2(string): ''' return the variables and its value in the rule # rep1 -- lookforward anything followed by pattern ==,<=,>=,<,> but pattern not included in output, 也就是pattern ==,<=,>=,<,> 的左边 # rep2 -- lookbehind anything before pattern ==,<=,>=,<,> but patten not included in output, 也就是 pattern ==,<=,>=,<,> 的右边 ''' res = [] rep1 = r'.*(?===)|.*(?=<=)|.*(?=>=)|.*(?=<)|.*(?=>)' rep2 = r'(?<===).*|(?<=<=).*|(?<=>=).*|(?<=<).*|(?<=>).*' rep3 = r'==.+ | <.+ | >.* | <=.+ | >=.+' #rep1 = r'[^ -][^==]*$'
for s in string:
# remove blanks
s = "".join(s.split())
print(s)
if 'match' in s.lower():
res.append(replaceMultiple(re.findall(r'$.+$', s)[0]).split(',')[::-1])
else:
res.append((re.findall(rep1, s)[0], re.findall(rep2, s)[0]))
return res

getVarFromCondition2(cond)

Out[38]:
[('$seg', '5'), ['$gls', '"^228$"'], ('$ipage', '30'),
('$fpage', '2'), ('$dayob', '200'),
('$ccage', '30'), ('$val', '99'),
('$day7GLRepeatCount', '0'), ('$opvCustomerGCOrderCount90Days', '0'),
['$ictry_cd', '"US"'], ['$bctry_cd', '"US"'],
['$cctry_cd', '"^US|$"'],
('\$sqPercentile', '0.7')]


## Reference

1. How to replace multiple substrings of a string?
2. Regex get text before and after a hyphen
3. How to strip all whitespace from string
4. https://www.regular-expressions.info/lookaround.html