Regular expression example
We have a lot of SQL-like code that is used to patch the system. The goal is to clean this code up in order to do two things: 1) find which variables are used; 2) find the values those variables are compared against. In other words, convert each rule to key-value form so that it can be analyzed easily in a Python (pandas) DataFrame.
Some examples:
For the input string s = 'segment == 5 && age > 30', the output is (segment, == 5) and (age, > 30).
For the input string s = 'imatches("^(US|)$", ctry_cd) && (age >= 10)', the output is (ctry_cd, ^(US|)$) and (age, >= 10).
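The examples above join several conditions with `&&`, while the code that follows starts from a list that has already been split. A minimal sketch of that splitting step, assuming `&&` never occurs inside a quoted pattern; `split_conditions` is a hypothetical helper, not part of the original code:

```python
import re

def split_conditions(expr):
    """Split a compound rule on '&&' and trim whitespace around each part."""
    # Assumption: '&&' never appears inside a quoted regex pattern.
    return [part.strip() for part in re.split(r'\s*&&\s*', expr) if part.strip()]

print(split_conditions('segment == 5 && age > 30'))
# ['segment == 5', 'age > 30']
```

Each element of the returned list is a single condition that can then be parsed on its own.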
Let's start with an example list of conditions:
cond = ['$seg == 5 ', ' imatches("^228$",$gls) ', ' $ipage > 30 ', ' $fpage < 2 ', ' $dayob > 200 ', 
        ' $ccage > 30 ', ' $val > 99 ', ' $day7GLRepeatCount > 0 ', ' $opvCustomerGCOrderCount90Days == 0 ', 
        ' imatches("US",$ictry_cd) ', ' imatches("US",$bctry_cd) ', ' imatches("^(US|)$",$cctry_cd) ', ' $sqPercentile > 0.7']
import re

def replaceMultiple(s, rep = {'(':'', ')':''}):
    '''
    Replace every key of rep found in s with its value (by default, strip parentheses).
    '''
    # escape the keys so that characters such as '(' are matched literally
    rep = dict((re.escape(k), v) for k, v in rep.items())
    pattern = re.compile("|".join(rep.keys()))
    return pattern.sub(lambda m: rep[re.escape(m.group(0))], s)
def getVarFromCondition2(string):
    '''
    Return the variable and its value for each rule in the list.
    rep1 -- lookahead: everything followed by one of ==, <=, >=, <, >, operator not included (the left-hand side)
    rep2 -- lookbehind: everything preceded by one of ==, <=, >=, <, >, operator not included (the right-hand side)
    '''
    res = []
    rep1 = r'.*(?===)|.*(?=<=)|.*(?=>=)|.*(?=<)|.*(?=>)'
    # (?!=) stops a bare < or > from matching inside <= or >=, which would leave '=' in the value
    rep2 = r'(?<===).*|(?<=<=).*|(?<=>=).*|(?<=<)(?!=).*|(?<=>)(?!=).*'
    for s in string:
        # remove all whitespace
        s = "".join(s.split())
        if 'match' in s.lower():
            # imatches("pattern",$var): keep the argument list, strip parentheses, reverse to [var, pattern]
            # note: parentheses inside the pattern itself are removed as well
            res.append(replaceMultiple(re.findall(r'\(.+\)', s)[0]).split(',')[::-1])
        else:
            res.append((re.findall(rep1, s)[0], re.findall(rep2, s)[0]))
    return res
getVarFromCondition2(cond)
    Out[38]:
    [('$seg', '5'),
     ['$gls', '"^228$"'],
     ('$ipage', '30'),
     ('$fpage', '2'),
     ('$dayob', '200'),
     ('$ccage', '30'),
     ('$val', '99'),
     ('$day7GLRepeatCount', '0'),
     ('$opvCustomerGCOrderCount90Days', '0'),
     ['$ictry_cd', '"US"'],
     ['$bctry_cd', '"US"'],
     ['$cctry_cd', '"^US|$"'],
     ('$sqPercentile', '0.7')]
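The output above drops the comparison operator, and stripping parentheses also changes the regex `"^(US|)$"` into `"^US|$"`, which is a different pattern. A hedged alternative sketch using capturing groups keeps the operator and leaves the pattern untouched. `parse_condition`, the two compiled patterns, and the `'imatches'` operator label are illustrative assumptions, not the original method:

```python
import re

# Longer operators come first so that '>=' is not read as '>' followed by '=10'.
COMPARE = re.compile(r'^\(?\s*(\$?\w+)\s*(==|<=|>=|<|>)\s*([^)\s]+)\s*\)?$')
# imatches("<pattern>", <variable>): the pattern is whatever sits between the quotes.
MATCHES = re.compile(r'^i?matches\(\s*"(.*)"\s*,\s*(\$?\w+)\s*\)$', re.IGNORECASE)

def parse_condition(cond):
    """Return (variable, operator, value) for one condition, or None if unrecognized."""
    cond = cond.strip()
    m = MATCHES.match(cond)
    if m:
        # Assumption: label pattern matches with the pseudo-operator 'imatches'.
        return (m.group(2), 'imatches', m.group(1))
    m = COMPARE.match(cond)
    if m:
        return m.groups()
    return None

for rule in ['$seg == 5 ', ' imatches("^(US|)$",$cctry_cd) ', '(age >= 10)']:
    print(parse_condition(rule))
# ('$seg', '==', '5')
# ('$cctry_cd', 'imatches', '^(US|)$')
# ('age', '>=', '10')
```

Returning three fields per rule maps directly onto DataFrame columns (variable, operator, value), and unrecognized rules come back as `None` so they can be reviewed by hand.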
References
- How to replace multiple substrings of a string?
- Regex get text before and after a hyphen
- How to strip all whitespace from string
- https://www.regular-expressions.info/lookaround.html