valid numbers
Table of Contents
ValidNumber is an interview style question on leetcode. It's marked as hard with only a 19% acceptance rate. 19% is pretty low so it must be impossible to solve because of some weird corner case or bad instructions. It turns out the problem is much easier to solve than you would expect if you use the right approach.
1. Problem Statement
A valid number can be split up into these components (in order): A decimal number or an integer. (Optional) An 'e' or 'E', followed by an integer. A decimal number can be split up into these components (in order): (Optional) A sign character (either '+' or '-'). One of the following formats: One or more digits, followed by a dot '.'. One or more digits, followed by a dot '.', followed by one or more digits. A dot '.', followed by one or more digits. An integer can be split up into these components (in order): (Optional) A sign character (either '+' or '-'). One or more digits. For example, all the following are valid numbers: ["2", "0089", "-0.1", "+3.14", "4.", "-.9", "2e10", "-90E3", "3e+7", "+6e-1", "53.5e93", "-123.456e789"], while the following are not valid numbers: ["abc", "1a", "1e", "e3", "99e2.5", "--6", "-+3", "95a54e53"]. Given a string s, return true if s is a valid number.
2. Approach
The problem statement takes a few minutes to wrap your head around. It's easier to understand if you look at it as a context free grammar:
# Solution validNumber = decimal | integer | exponent # Non-terminals exponent = integer exp integer | decimal exp integer decimal = sign? (digit+ dot | digit+ dot digit+ | dot digit+) integer = sign? digit+ # Terminals digit = 0-9 sign = + | - exp = e | E dot = .
Regular expressions are a quick and easy way to implement this.
Another approach that would be even cleaner is to use parser combinators. Parser combinators add more safety to the solution because it would allow you to utilize a compiler to check if your grammar is valid. Check out the bonus section at the bottom for an example. Also, if your interested in grammars, check out Backus Naur Form. It's a formal way to define grammars.
3. Solution
digits = "\d+" dot = "\." sign = "[+-]" exp = "[eE]" integer = f"({sign}?{digits})" decimal_1 = f"{digits}{dot}" decimal_2 = f"{digits}{dot}{digits}" decimal_3 = f"{dot}{digits}" decimal = f"({sign}?({decimal_1}|{decimal_2}|{decimal_3}))" exponent_1 = f"{integer}{exp}{integer}" exponent_2 = f"{decimal}{exp}{integer}" exponent = f"({exponent_1}|{exponent_2})" valid_number_re = f"^{decimal}$|^{integer}$|^{exponent}$" def main(s): return bool(re.match(valid_number_re, s))
One of my favorite things about this solution is that it maps closely to the psuedocode for the grammar above which makes it easy to reason about. A couple of notes:
- Don't forget to wrap combinations of statements into groups using ().
- It's important to use ^ and $ to mark the beginning and end of a whole regex, otherwise re.match will return true for partial matches.
Related: If you like this approach you might also like instaparse, a cool clojure library for writing grammars.
4. Bonus: Janet Solution
Janet is a compiled lisp aimed at being a scripting language for systems programming. It has a built in feature for parsing grammars called PEG.
(def expression-peg ~{:main (choice :decimal :integer :exponent) :exponent (sequence (choice :integer :decimal) :exp :integer) :decimal1 (sequence :digits :dot) :decimal2 (sequence :digits :dot :digits) :decimal3 (sequence :dot :digits) :decimal (sequence (opt :sign) (choice :decimal1 :decimal2 :decimal3)) :integer (sequence (opt :sign) :digits) :digits (some :d) :dot "." :sign (set "+-") :exp (set "eE")}) (defn solution [input] (peg/match expression-peg input))