Regular expression example
We have a lot of SQL-like code that is used to patch the system. The goal is to parse this code to do two things: 1) find which variables are used; 2) find the value used for each variable. In other words, convert each condition into key-value form so that it can be analyzed easily in a pandas DataFrame in Python.
Some examples:
For the input string s = 'segment == 5 && age > 30', the output should be (segment, == 5) and (age, > 30).
For the input string s = 'imatches("^(US|)\(", ctry_cd) && (age >= 10)', the output should be (ctry_cd, ^(US|)\() and (age, >= 10).
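The examples above start from a whole rule and keep the operator together with the value. A minimal sketch of that, assuming the rule is a plain list of terms joined by &&, that each term is either "var OP value" (possibly wrapped in one pair of parentheses) or imatches("pattern", var), and that no && appears inside a pattern. The names parse_rule, COMPARISON and IMATCHES are hypothetical and not part of the original code.

```python
import re

# Assumed grammar (hypothetical, for illustration only):
#   term := var OP value  |  (var OP value)  |  imatches("pattern", var)
#   OP   := == | <= | >= | < | >   (two-character operators listed first)
COMPARISON = re.compile(r'^\(?\s*([\w$]+)\s*(==|<=|>=|<|>)\s*(.+?)\s*\)?$')
IMATCHES = re.compile(r'^imatches\(\s*"(.*)"\s*,\s*([\w$]+)\s*\)$', re.IGNORECASE)


def parse_rule(rule):
    """Split a rule on && and return a list of (variable, condition) pairs."""
    pairs = []
    for term in rule.split('&&'):  # assumes && never occurs inside a pattern
        term = term.strip()
        m = IMATCHES.match(term)
        if m:
            # keep the regex exactly as written, inner parentheses included
            pairs.append((m.group(2), m.group(1)))
            continue
        m = COMPARISON.match(term)
        if m:
            # keep the operator with the value, e.g. "== 5"
            pairs.append((m.group(1), f'{m.group(2)} {m.group(3)}'))
            continue
        raise ValueError(f'unrecognised term: {term!r}')
    return pairs


print(parse_rule('segment == 5 && age > 30'))
# [('segment', '== 5'), ('age', '> 30')]
print(parse_rule('imatches("^(US|)$", ctry_cd) && (age >= 10)'))
# [('ctry_cd', '^(US|)$'), ('age', '>= 10')]
```

Listing == , <= and >= before < and > in the alternation matters: otherwise "age >= 10" could be read as operator > with value "= 10".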
Let's start with an example, a list of individual conditions:
```python
cond = ['$seg == 5 ', ' imatches("^228$",$gls) ', ' $ipage > 30 ', ' $fpage < 2 ', ' $dayob > 200 ',
        ' $ccage > 30 ', ' $val > 99 ', ' $day7GLRepeatCount > 0 ', ' $opvCustomerGCOrderCount90Days == 0 ',
        ' imatches("US",$ictry_cd) ', ' imatches("US",$bctry_cd) ', ' imatches("^(US|)$",$cctry_cd) ', ' $sqPercentile > 0.7']
```
```python
import re


def replaceMultiple(s, rep={'(': '', ')': ''}):
    '''
    Replace several substrings of s in a single pass.
    rep maps each substring to its replacement; by default, all parentheses are removed.
    '''
    # escape the keys so characters such as ( and ) are matched literally
    rep = {re.escape(k): v for k, v in rep.items()}
    pattern = re.compile("|".join(rep.keys()))
    return pattern.sub(lambda m: rep[re.escape(m.group(0))], s)


def getVarFromCondition2(conditions):
    '''
    Return the variable and its value for each condition in the rule.
    rep1 -- lookahead: match everything followed by one of the operators ==, <=, >=, <, >,
            without including the operator, i.e. the left-hand side of the operator.
    rep2 -- lookbehind: match everything preceded by one of the operators ==, <=, >=, <, >,
            without including the operator, i.e. the right-hand side of the operator.
    '''
    res = []
    rep1 = r'.*(?===)|.*(?=<=)|.*(?=>=)|.*(?=<)|.*(?=>)'
    # (?!=) stops a bare < or > from matching just before the = of <= or >=
    rep2 = r'(?<===).*|(?<=<=).*|(?<=>=).*|(?<=<)(?!=).*|(?<=>)(?!=).*'
    for s in conditions:
        # remove all whitespace
        s = "".join(s.split())
        if 'match' in s.lower():
            # imatches("pattern",$var): take the parenthesised arguments, strip every
            # parenthesis (including any inside the pattern), split on the comma
            # and reverse to get [var, pattern]
            res.append(replaceMultiple(re.findall(r'\(.+\)', s)[0]).split(',')[::-1])
        else:
            res.append((re.findall(rep1, s)[0], re.findall(rep2, s)[0]))
    return res
```
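rep1 and rep2 rely on lookaround: a lookahead (?=...) or a lookbehind (?<=...) checks for the operator without consuming it, so the operator ends up in neither half. One subtlety: re.findall returns the leftmost match first, and a bare (?<=>) alternative already succeeds right after the > of >=, so the right-hand side of "$x>=5" comes back as "=5". The sketch below compares plain single-character lookbehinds with a version that adds a negative lookahead (?!=); LHS is the same pattern as rep1, while RHS_NAIVE, RHS_FIXED and split_condition are hypothetical names, and the condition is assumed to have had its whitespace removed already.

```python
import re

# Left-hand side: everything followed by an operator (lookahead, operator not consumed).
LHS = r'.*(?===)|.*(?=<=)|.*(?=>=)|.*(?=<)|.*(?=>)'
# Right-hand side: everything preceded by an operator (lookbehind, operator not consumed).
RHS_NAIVE = r'(?<===).*|(?<=<=).*|(?<=>=).*|(?<=<).*|(?<=>).*'
# Same, but a bare < or > must not be followed by '=' (negative lookahead).
RHS_FIXED = r'(?<===).*|(?<=<=).*|(?<=>=).*|(?<=<)(?!=).*|(?<=>)(?!=).*'


def split_condition(cond, rhs=RHS_FIXED):
    """Return (left-hand side, right-hand side) of a whitespace-free condition."""
    return re.findall(LHS, cond)[0], re.findall(rhs, cond)[0]


print(split_condition('$ipage>30'))             # ('$ipage', '30')
print(split_condition('$x>=5', rhs=RHS_NAIVE))  # ('$x', '=5')
print(split_condition('$x>=5'))                 # ('$x', '5')
```

The example list cond happens to contain no <= or >=, so this only shows up on other rules.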
Running getVarFromCondition2 on the example list:

```python
getVarFromCondition2(cond)
```

which returns:

```
[('$seg', '5'),
 ['$gls', '"^228$"'],
 ('$ipage', '30'),
 ('$fpage', '2'),
 ('$dayob', '200'),
 ('$ccage', '30'),
 ('$val', '99'),
 ('$day7GLRepeatCount', '0'),
 ('$opvCustomerGCOrderCount90Days', '0'),
 ['$ictry_cd', '"US"'],
 ['$bctry_cd', '"US"'],
 ['$cctry_cd', '"^US|$"'],
 ('$sqPercentile', '0.7')]
```
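In the $cctry_cd result, removing every parenthesis turned the pattern "^(US|)$" into "^US|$", which is a different regex: the alternation now splits the whole pattern into ^US and $ instead of grouping US with the empty string. Splitting on every comma would likewise break a pattern that contains a comma. A sketch that avoids both problems, assuming the variable is always the last argument of imatches: strip only the outermost pair of parentheses and split at the last comma. parse_imatches is a hypothetical helper, not part of the original code.

```python
import re


def parse_imatches(cond):
    """Return [variable, pattern] for a whitespace-free imatches(...) condition.

    Only the outermost parentheses are removed, so parentheses inside the
    regex survive; splitting at the last comma keeps commas inside the
    pattern intact (assumes the variable is the last argument).
    """
    inner = re.search(r'\((.+)\)', cond).group(1)  # greedy: first '(' to last ')'
    pattern, var = inner.rsplit(',', 1)
    return [var, pattern]


print(parse_imatches('imatches("^(US|)$",$cctry_cd)'))  # ['$cctry_cd', '"^(US|)$"']
print(parse_imatches('imatches("^228$",$gls)'))         # ['$gls', '"^228$"']
```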
References
- How to replace multiple substrings of a string?
- Regex get text before and after a hyphen
- How to strip all whitespace from string
- https://www.regular-expressions.info/lookaround.html