Text Processing
Regex
https://docs.python.org/3/library/re.html#regular-expression-syntax
re.findall() returns every non-overlapping match of a pattern, here all adverbs ending in "ly"
import re
s = "I am fully, and totally confident that " \
"programming and developing software is completely my thing"
adverbs = re.findall(r"\w{2,}ly", s)
print(adverbs)
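The pattern above has no word boundaries, so it can also stop in the middle of a longer word. A minimal sketch of the difference, assuming whole words are wanted; the sample sentence is made up for illustration:

```python
import re

text = "He quickly analyzed the polymer and replied politely."

# Without boundaries, "analyzed" and "polymer" contribute partial matches.
loose = re.findall(r"\w{2,}ly", text)

# \b on both sides keeps only whole words that end in "ly".
strict = re.findall(r"\b\w{2,}ly\b", text)

# finditer yields match objects, which also carry the position of each hit.
positions = [(m.group(), m.start()) for m in re.finditer(r"\b\w{2,}ly\b", text)]

print(loose)   # ['quickly', 'analy', 'poly', 'politely']
print(strict)  # ['quickly', 'politely']
print(positions)
```

Even with boundaries this is a spelling heuristic: words such as "family" or "only" end in "ly" too.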
re.search() scans the string and returns the first occurrence of a pattern
Only the first match is returned, so the second email below is never found
Appending ? to a quantifier makes it non-greedy; ?? is the non-greedy form of ? (used in the re.match() example)
import re
s = "my work email is mk@plataux.com, and that is my work email, mk.mahfouz@gmail.com"
m = re.search(r"(\w+)@(\w+\.\w+)", s)
# the two matched groups in parentheses
print("sub-groups: ", m.groups())
print("email: ", m.group())
print("user: ", m.group(1))
print("domain: ", m.group(2))
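To reach the second email as well, re.finditer() can be used in place of re.search(). The sketch below reuses the string and pattern from above; the second pattern, which allows dots in the user name, is an assumed variant and not part of the original example:

```python
import re

s = "my work email is mk@plataux.com, and that is my work email, mk.mahfouz@gmail.com"

# Original pattern: \w+ does not match ".", so a dotted user name is cut short.
simple = [m.groups() for m in re.finditer(r"(\w+)@(\w+\.\w+)", s)]

# Assumed variant: allow dots inside the user name.
dotted = [m.groups() for m in re.finditer(r"([\w.]+)@(\w+\.\w+)", s)]

print(simple)  # [('mk', 'plataux.com'), ('mahfouz', 'gmail.com')]
print(dotted)  # [('mk', 'plataux.com'), ('mk.mahfouz', 'gmail.com')]
```

Neither pattern is a full email validator; both only illustrate how groups behave across several matches.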
re.match() matches the pattern only at the beginning of the string
Several capture groups can be used in one pattern
s = "781-521-4520 is my phone number"
m = re.match(r"^(\d{3,4}[-\s]??){3}", s)
print(m.group())
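# We can also capture the three digit groups separately; \2 is a backreference
# that requires the second separator to be the same as the first (also returned by groups())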
m = re.match(r"^(\d{3})([-\s])(\d{3})\2(\d{4})", s)
print(m.groups())
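The difference between re.match(), re.search() and re.fullmatch(), and between greedy and non-greedy quantifiers, is easiest to see side by side. A small sketch; the sample strings and the simplified phone pattern are chosen for illustration only:

```python
import re

s = "call me at 781-521-4520"
phone = r"\d{3}[-\s]\d{3}[-\s]\d{4}"

# match() anchors at position 0, so a number later in the string is not found.
at_start = re.match(phone, s)
# search() scans forward and finds the same pattern anywhere in the string.
anywhere = re.search(phone, s)
# fullmatch() requires the whole string to match the pattern.
whole = re.fullmatch(phone, "781-521-4520")

# A trailing "?" makes a quantifier match as little as possible.
greedy = re.match(r"<.*>", "<a><b>").group()
lazy = re.match(r"<.*?>", "<a><b>").group()

# "??" is the lazy form of "?": it prefers to match zero times.
optional_greedy = re.match(r"\d?", "5").group()
optional_lazy = re.match(r"\d??", "5").group()

print(at_start, anywhere.group(), whole.group())
print(greedy, lazy, repr(optional_greedy), repr(optional_lazy))
```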
Use capture groups to grab keys and values from a single-level JSON string
import re
s = """{
"a": "apples",
"b": "berries","c": "cherries",
"pi": 3.14,
"x": "Xenon"
}"""
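# groups: (1) key, (2) optional opening quote, (3) value; \2 requires the matching closing quote
m = re.findall(r'"(\w+)":\s*("?)([\w.]+)\2', s)
# quoted values stay strings, unquoted values are parsed as floats
d = {key: value if quote else float(value) for key, quote, value in m}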
print(d)
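The regex approach is a good exercise in capture groups, but it only covers flat objects with simple values. For real JSON the standard library's json module is the usual tool; it is swapped in here for comparison and is not part of the original example. A short sketch that checks both give the same result for this input:

```python
import json
import re

s = """{
"a": "apples",
"b": "berries","c": "cherries",
"pi": 3.14,
"x": "Xenon"
}"""

# Regex approach from the text: fine for flat objects with simple values.
pairs = re.findall(r'"(\w+)":\s*("?)([\w.]+)\2', s)
by_regex = {key: value if quote else float(value) for key, quote, value in pairs}

# json.loads also handles nesting, escapes, booleans and null.
by_json = json.loads(s)

print(by_regex == by_json)
```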
Text Formatting
Format Specification Mini-Language https://docs.python.org/3/library/string.html#formatstrings
format_spec ::= [[fill]align][sign][#][0][width][grouping_option][.precision][type]
fill ::= <any character>
align ::= "<" | ">" | "=" | "^"
sign ::= "+" | "-" | " "
width ::= digit+
grouping_option ::= "_" | ","
precision ::= digit+
type ::= "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
Text formatting can be done with several constructs:
format(value, fmt_str): the built-in function, which formats a single value
"{}, {}, {}".format(1, 2, 3): the str.format(*args) method, which can format multiple arguments
f"{x}, {y}, {z}": modern f-strings
"%d, %d, %d" % (2, 4, 8): old-style (C-style) format strings, which take a tuple of values after the % operator
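The four constructs can produce the same text. A minimal sketch with made-up values; since format() handles one value at a time, it is combined with str.join here:

```python
x, y, z = 2, 4, 8

# format() takes a single value, so the pieces are joined manually.
with_builtin = ", ".join(format(v, "d") for v in (x, y, z))
with_method = "{}, {}, {}".format(x, y, z)
with_fstring = f"{x}, {y}, {z}"
with_percent = "%d, %d, %d" % (x, y, z)

print(with_builtin)  # 2, 4, 8
print(with_builtin == with_method == with_fstring == with_percent)
```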
Floating Number Formatting
print(f"{2345.491:_^-20,.2f}")
# or
print(format(2345.491,"_^-20,.2f"))
Breakdown of this format spec:
"_": pad with underscores
"^": center the value within the given width
"-": show a sign only for negative numbers
"20": the total width of the string
",": use a comma as the thousands separator
".2": the precision
"f": the type of the value
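Changing one field at a time shows what each part of the spec contributes. A short sketch; the extra values and specs are chosen only for illustration:

```python
value = 2345.491

centered = format(value, "_^-20,.2f")      # spec from the text
right_signed = format(value, "*>+15,.1f")  # right-aligned, sign always shown
zero_padded = format(-42, "0=8d")          # "=" puts the padding after the sign
grouped = format(1234567, "_d")            # "_" as the grouping option

print(centered)      # ______2,345.49______
print(right_signed)  # *******+2,345.5
print(zero_padded)   # -0000042
print(grouped)       # 1_234_567
```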
# the speed of light in m/sec (approximately)
format(3 * 10**8, ".3E")
Breakdown of this format spec:
".3": three decimal places of precision
"E": scientific notation with a capital E
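The other floating-point types in the grammar follow the same pattern. A brief sketch comparing a few of them; the sample numbers are illustrative:

```python
upper = format(3 * 10**8, ".3E")      # spec from the text
lower = format(299792458, ".3e")      # lowercase exponent marker
general_small = format(0.00001234, ".3g")  # "g" switches to exponent form for small values
general_large = format(1234.5678, ".3g")   # and when the exponent reaches the precision
percent = format(0.256, ".1%")        # "%" multiplies by 100 and appends a percent sign

print(upper)          # 3.000E+08
print(lower)          # 2.998e+08
print(general_small)  # 1.23e-05
print(general_large)  # 1.23e+03
print(percent)        # 25.6%
```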
Integer Formatting
format(1020, "b")
format(1020, "X")
"b": binary representation
"X": hexadecimal representation with capital letters
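The "#", "0", width and grouping fields combine with the integer types as well. A small sketch building on the value 1020 from above:

```python
n = 1020

prefixed_bin = format(n, "#b")   # "#" adds the 0b prefix
prefixed_hex = format(n, "#X")   # with "X" the prefix is upper-case too
padded_hex = format(n, "08X")    # zero-padded to width 8
grouped_bin = format(n, "_b")    # "_" groups binary digits in fours
octal = format(n, "o")

# int() with an explicit base reverses the conversion.
round_trip = int(format(n, "X"), 16)

print(prefixed_bin)  # 0b1111111100
print(prefixed_hex)  # 0X3FC
print(padded_hex)    # 000003FC
print(grouped_bin)   # 11_1111_1100
print(octal)         # 1774
```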
Datetime Formatting
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
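import datetime as dtx
now = dtx.datetime.now()
print(format(now, '%A'))
print(f"{now:%A %d-%m-%Y}")
# astimezone() with no argument attaches the system's local timezone;
# ZoneInfo("localtime") only works where the tz database ships a "localtime" entry
now = dtx.datetime.now().astimezone()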
print("right now it is weekday {0:%A} and "
"in two days it will be weekday "
"{1:%A} timezone {0:%Z}".format(now, now + dtx.timedelta(days=2)))