Text Processing
===============

Regex
-----

https://docs.python.org/3/library/re.html#regular-expression-syntax

``re.findall()`` returns every non-overlapping match of a pattern. Here it
collects all adverbs (words ending in ``ly``):

.. code-block:: python

    import re

    # note the trailing space, so the two literals do not fuse into "thatprogramming"
    s = "I am fully, and totally confident that " \
        "programming and developing software is completely my thing"

    adverbs = re.findall(r"\w{2,}ly", s)
    print(adverbs)

``re.search()`` scans the string and returns only the first match of the
pattern, so the second email address in the string is never found:

.. code-block:: python

    import re

    s = "my work email is mk@plataux.com, and that is my work email, mk.mahfouz@gmail.com"
    m = re.search(r"(\w+)@(\w+\.\w+)", s)

    # the two groups captured by the parentheses
    print("sub-groups: ", m.groups())
    print("email: ", m.group())
    print("user: ", m.group(1))
    print("domain: ", m.group(2))

``re.match()`` only matches at the beginning of the string. Several capturing
groups can be used in one pattern.

``??`` is the non-greedy (lazy) form of ``?``: the optional separator is
consumed only when the rest of the pattern needs it, so the space after the
last digit group is left out of the match.

.. code-block:: python

    import re

    s = "781-521-4520 is my phone number"
    m = re.match(r"^(\d{3,4}[-\s]??){3}", s)
    print(m.group())

    # The digits can also be captured in three groups; the separator is
    # captured as group 2 and reused through the \2 backreference
    m = re.match(r"^(\d{3})([-\s])(\d{3})\2(\d{4})", s)
    print(m.groups())

Capturing groups can also grab the keys and values from a single-level JSON
string. The optional quote captured in group 2 tells strings apart from
numbers:

.. code-block:: python

    import re

    s = """{ "a": "apples", "b": "berries","c": "cherries", "pi": 3.14, "x": "Xenon" }"""

    # groups: (key, opening quote or empty string, value)
    m = re.findall(r'"(\w+)":\s*("?)([\w.]+)\2', s)

    # quoted values stay strings, unquoted values are converted to float
    d = {mx[0]: mx[2] if mx[1] else float(mx[2]) for mx in m}
    print(d)

Text Formatting
---------------

Format Specification Mini-Language

https://docs.python.org/3/library/string.html#formatstrings

.. code-block:: text

    format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.precision][type]
    fill            ::=  <any character>
    align           ::=  "<" | ">" | "=" | "^"
    sign            ::=  "+" | "-" | " "
    width           ::=  digit+
    grouping_option ::=  "_" | ","
    precision       ::=  digit+
    type            ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"

Text formatting can be done with several constructs:

* ``format(value, fmt_str)`` the builtin function. It takes a single value and formats it
* ``"{}, {}, {}".format(1, 2, 3)`` the ``str.format()`` method, which can format multiple arguments
* ``f"{x}, {y}, {z}"`` modern f-strings
* ``"%d, %d, %d" % (2, 4, 8)`` old-style (C-style) format strings, using the ``%`` (modulo) operator with a tuple of values

Floating Number Formatting
**************************

.. code-block:: python

    print(f"{2345.491:_^-20,.2f}")

    # or
    print(format(2345.491, "_^-20,.2f"))

Breakdown of this format spec:

* ``_`` the fill character, so the padding is made of underscores
* ``^`` centers the value within the given width
* ``-`` means a sign only appears for negative numbers (the default)
* ``20`` the minimum total width of the resulting string
* ``,`` uses a comma as the thousands separator
* ``.2`` the precision: two digits after the decimal point
* ``f`` the presentation type: fixed-point notation

.. code-block:: python

    # the speed of light, roughly 3 * 10**8 m/s
    print(format(3 * 10**8, ".3E"))

Breakdown of this format spec:

* ``.3`` three digits after the decimal point
* ``E`` scientific notation with a capital E

Integer Formatting
******************

.. code-block:: python

    print(format(1020, "b"))
    print(format(1020, "X"))

* ``b`` for binary representation
* ``X`` for hexadecimal representation, with capital letters

Datetime Formatting
*******************

https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

.. code-block:: python

    import datetime as dtx

    now = dtx.datetime.now()
    print(format(now, '%A'))
    print(f"{now:%A %d-%m-%Y}")

    # astimezone() with no argument attaches the system's local time zone;
    # ZoneInfo("localtime") only resolves where the tz database provides that key
    now = dtx.datetime.now().astimezone()
    print("right now it is weekday {0:%A} and "
          "in two days it will be weekday "
          "{1:%A} timezone {0:%Z}".format(now, now + dtx.timedelta(days=2)))
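
The datetime format codes are not specific to one formatting construct:
``format()``, ``str.format()`` and f-strings all pass the spec to
``datetime.__format__()``, which behaves like ``strftime()``. The following is
a minimal sketch of that equivalence, plus the reverse direction with
``strptime()``. The fixed timestamp ``moment`` and the ``fmt`` pattern are
hypothetical values chosen here so the output is reproducible; they are not
part of the notes above.

.. code-block:: python

    import datetime as dtx

    # Hypothetical fixed timestamp (an assumption, used for a stable output)
    moment = dtx.datetime(2024, 2, 29, 14, 5, 9)
    fmt = "%Y-%m-%d %H:%M:%S"

    # Four ways to apply the same strftime-style format codes
    via_strftime = moment.strftime(fmt)
    via_format = format(moment, fmt)
    via_fstring = f"{moment:%Y-%m-%d %H:%M:%S}"
    via_str_format = "{:%Y-%m-%d %H:%M:%S}".format(moment)

    print(via_strftime)  # 2024-02-29 14:05:09
    print(via_strftime == via_format == via_fstring == via_str_format)  # True

    # strptime() goes the other way: it parses text back into a datetime
    parsed = dtx.datetime.strptime(via_strftime, fmt)
    print(parsed == moment)  # True

Numeric codes such as ``%Y`` and ``%d`` are used here on purpose: names like
``%A`` depend on the active locale, so their output can differ between systems.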