Python String Utils: Overview
This is a handy library to validate, manipulate and generate strings, which is:
Simple and “pythonic”
Fully documented and with examples!
100% code coverage!
Tested against all officially supported Python versions: 3.5, 3.6, 3.7, 3.8.
Fast (mostly based on compiled regex)
Free from external dependencies
PEP8 compliant
Installing
pip install python-string-utils
Checking installed version
>>> import string_utils
>>> string_utils.__version__
'1.0.0' # (if 1.0.0 is the installed version)
Library structure
The library consists of the Python package string_utils, which contains the following modules:
validation.py (contains string check api)
manipulation.py (contains string transformation api)
generation.py (contains string generation api)
errors.py (contains library-specific errors)
_regex.py (contains compiled regex FOR INTERNAL USAGE ONLY)
There is also a secondary package, tests, which contains one submodule per test suite, each named after the API it tests (e.g. the tests for is_ip() are in test_is_ip.py, and so on). All the public APIs can be imported directly from the main package string_utils, so this:
>>> from string_utils.validation import is_ip
can be simplified as:
>>> from string_utils import is_ip
Modules
String Validation
string_utils.validation.is_string(obj: Any) → bool
Checks if an object is a string.
Example:
>>> is_string('foo') # returns True
>>> is_string(b'foo') # returns False
- Parameters
obj – Object to test.
- Returns
True if string, false otherwise.
string_utils.validation.is_full_string(input_string: Any) → bool
Checks if a string is not empty (it must contain at least one non-space character).
Examples:
>>> is_full_string(None) # returns False
>>> is_full_string('') # returns False
>>> is_full_string(' ') # returns False
>>> is_full_string('hello') # returns True
- Parameters
input_string (str) – String to check.
- Returns
True if not empty, false otherwise.
string_utils.validation.is_number(input_string: str) → bool
Checks if a string is a valid number.
The number can be a signed (e.g. +1, -2, -3.3) or unsigned (e.g. 1, 2, 3.3) integer or double, or use scientific notation (e.g. 1e5).
Examples:
>>> is_number('42') # returns True
>>> is_number('19.99') # returns True
>>> is_number('-9.12') # returns True
>>> is_number('1e3') # returns True
>>> is_number('1 2 3') # returns False
- Parameters
input_string (str) – String to check
- Returns
True if the string represents a number, false otherwise
string_utils.validation.is_integer(input_string: str) → bool
Checks whether the given string represents an integer or not.
An integer may be signed or unsigned or use scientific notation.
Examples:
>>> is_integer('42') # returns True
>>> is_integer('42.0') # returns False
- Parameters
input_string (str) – String to check
- Returns
True if integer, false otherwise
string_utils.validation.is_decimal(input_string: str) → bool
Checks whether the given string represents a decimal or not.
A decimal may be signed or unsigned or use scientific notation.
Examples:
>>> is_decimal('42.0') # returns True
>>> is_decimal('42') # returns False
- Parameters
input_string (str) – String to check
- Returns
True if decimal, false otherwise
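The three numeric checks above can be approximated in a few lines. This is a minimal sketch, not the library's implementation: the regex `_NUMBER_RE` and the `sketch_*` function names are hypothetical, and it assumes (as the examples suggest) that an integer is a number written without a decimal point and a decimal is one written with it.

```python
import re

# Hypothetical pattern, not the library's internal regex: optional sign, digits with an
# optional fractional part (or a bare fractional part), then an optional exponent.
_NUMBER_RE = re.compile(r'[+-]?(?:\d+(?:\.\d+)?|\.\d+)(?:[eE][+-]?\d+)?')


def sketch_is_number(input_string: str) -> bool:
    """True if the whole string looks like a signed or unsigned number."""
    return isinstance(input_string, str) and _NUMBER_RE.fullmatch(input_string) is not None


def sketch_is_integer(input_string: str) -> bool:
    # Assumption: an integer is a number written without a decimal point.
    return sketch_is_number(input_string) and '.' not in input_string


def sketch_is_decimal(input_string: str) -> bool:
    # Assumption: a decimal is a number written with a decimal point.
    return sketch_is_number(input_string) and '.' in input_string
```

With these definitions '1e3' counts as an integer, in line with the note that integers may use scientific notation.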
string_utils.validation.is_url(input_string: Any, allowed_schemes: Optional[List[str]] = None) → bool
Checks if a string is a valid URL.
Examples:
>>> is_url('http://www.mysite.com') # returns True
>>> is_url('https://mysite.com') # returns True
>>> is_url('.mysite.com') # returns False
- Parameters
input_string (str) – String to check.
allowed_schemes (Optional[List[str]]) – List of valid schemes (‘http’, ‘https’, ‘ftp’…). Default to None (any scheme is valid).
- Returns
True if url, false otherwise
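As a rough illustration of the allowed_schemes semantics, the hypothetical `sketch_is_url()` below relies on the standard `urllib.parse.urlparse()` instead of the library's compiled regex. It only requires a scheme and a network location, so it is far more permissive than a real validator (`http://x` passes); treat it as a sketch, not a drop-in replacement.

```python
from typing import List, Optional
from urllib.parse import urlparse


def sketch_is_url(input_string: str, allowed_schemes: Optional[List[str]] = None) -> bool:
    """Loose URL check: a scheme and a network location must both be present."""
    if not isinstance(input_string, str) or not input_string.strip():
        return False
    try:
        parts = urlparse(input_string)
    except ValueError:  # e.g. an unbalanced IPv6 bracket in the netloc
        return False
    if not parts.scheme or not parts.netloc:
        return False
    # None means any scheme is accepted, as documented for allowed_schemes.
    return allowed_schemes is None or parts.scheme in allowed_schemes
```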
string_utils.validation.is_email(input_string: Any) → bool
Checks if a string is a valid email.
Reference: https://tools.ietf.org/html/rfc3696#section-3
Examples:
>>> is_email('my.email@the-provider.com') # returns True
>>> is_email('@gmail.com') # returns False
- Parameters
input_string (str) – String to check.
- Returns
True if email, false otherwise.
string_utils.validation.is_credit_card(input_string: Any, card_type: str = None) → bool
Checks if a string is a valid credit card number. If a card type is provided, the string is checked against that specific type only; otherwise any known credit card number is accepted.
Supported card types are the following:
VISA
MASTERCARD
AMERICAN_EXPRESS
DINERS_CLUB
DISCOVER
JCB
- Parameters
input_string (str) – String to check.
card_type (str) – Card type. Default to None (any card).
- Returns
True if credit card, false otherwise.
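A minimal sketch of per-type matching, assuming each card type is recognized by a prefix/length regex. The patterns in `_CARD_PATTERNS` are commonly cited ones and may differ from the library's: they are simplified (for example, the newer Mastercard 2-series range is not covered) and only check the format, not a check digit. `sketch_is_credit_card()` and its `ValueError` for unknown types are hypothetical.

```python
import re
from typing import Optional

# Commonly cited prefix/length patterns (assumed, possibly not the library's own).
_CARD_PATTERNS = {
    'VISA': re.compile(r'4\d{12}(?:\d{3})?'),
    'MASTERCARD': re.compile(r'5[1-5]\d{14}'),
    'AMERICAN_EXPRESS': re.compile(r'3[47]\d{13}'),
    'DINERS_CLUB': re.compile(r'3(?:0[0-5]|[68]\d)\d{11}'),
    'DISCOVER': re.compile(r'6(?:011|5\d{2})\d{12}'),
    'JCB': re.compile(r'(?:2131|1800|35\d{3})\d{11}'),
}


def sketch_is_credit_card(input_string: str, card_type: Optional[str] = None) -> bool:
    if not isinstance(input_string, str):
        return False
    if card_type is not None:
        if card_type not in _CARD_PATTERNS:
            raise ValueError('unsupported card type: {}'.format(card_type))
        return _CARD_PATTERNS[card_type].fullmatch(input_string) is not None
    # No type given: any known pattern is enough.
    return any(p.fullmatch(input_string) for p in _CARD_PATTERNS.values())
```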
string_utils.validation.is_camel_case(input_string: Any) → bool
Checks if a string is formatted as camel case.
A string is considered camel case when:
it is composed only of letters ([a-zA-Z]) and optionally digits ([0-9])
it contains both lowercase and uppercase letters
it does not start with a digit
Examples:
>>> is_camel_case('MyString') # returns True
>>> is_camel_case('mystring') # returns False
- Parameters
input_string (str) – String to test.
- Returns
True for a camel case string, false otherwise.
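The three camel case rules listed above translate almost one-to-one into code. The hypothetical `sketch_is_camel_case()` below checks them in order, ASCII letters only; the library relies mostly on compiled regexes, so its edge-case behavior may differ.

```python
import re

# Rules 1 and 3: only ASCII letters/digits, and the first char must be a letter.
_ALNUM_START_LETTER = re.compile(r'[a-zA-Z][a-zA-Z0-9]*')


def sketch_is_camel_case(input_string: str) -> bool:
    if not isinstance(input_string, str) or not _ALNUM_START_LETTER.fullmatch(input_string):
        return False
    # Rule 2: both a lowercase and an uppercase letter must be present.
    has_lower = any('a' <= c <= 'z' for c in input_string)
    has_upper = any('A' <= c <= 'Z' for c in input_string)
    return has_lower and has_upper
```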
string_utils.validation.is_snake_case(input_string: Any, separator: str = '_') → bool
Checks if a string is formatted as snake case.
A string is considered snake case when:
it is composed only of lowercase/uppercase letters, digits and the separator
it contains at least one underscore (or the provided separator)
it does not start with a digit
Examples:
>>> is_snake_case('foo_bar_baz') # returns True
>>> is_snake_case('foo') # returns False
- Parameters
input_string (str) – String to test.
separator (str) – String to use as separator.
- Returns
True for a snake case string, false otherwise.
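Likewise for snake case, here is a sketch of the rules above with a configurable separator. The function name is hypothetical, and one rule is an added assumption: empty tokens are rejected, so leading, trailing or doubled separators (`_foo`, `foo__bar`) fail.

```python
import re

_ALNUM = re.compile(r'[a-zA-Z0-9]+')


def sketch_is_snake_case(input_string: str, separator: str = '_') -> bool:
    if not isinstance(input_string, str) or not separator:
        return False
    parts = input_string.split(separator)
    # At least one separator means at least two tokens.
    if len(parts) < 2:
        return False
    # Assumption: every token is non-empty and alphanumeric.
    if not all(_ALNUM.fullmatch(part) for part in parts):
        return False
    # Must not start with a digit.
    return not input_string[0].isdigit()
```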
string_utils.validation.is_json(input_string: Any) → bool
Checks if a string is valid JSON.
Examples:
>>> is_json('{"name": "Peter"}') # returns True
>>> is_json('[1, 2, 3]') # returns True
>>> is_json('{nope}') # returns False
- Parameters
input_string (str) – String to check.
- Returns
True if json, false otherwise
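is_json() can be sketched with the standard `json` module. One assumption goes beyond the text above: only objects and arrays are accepted, so a bare scalar such as '42' is rejected. The hypothetical `sketch_is_json()` makes that choice explicit.

```python
import json


def sketch_is_json(input_string: str) -> bool:
    if not isinstance(input_string, str):
        return False
    try:
        value = json.loads(input_string)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    # Assumption: only JSON objects and arrays count as "json" here.
    return isinstance(value, (dict, list))
```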
string_utils.validation.is_uuid(input_string: Any, allow_hex: bool = False) → bool
Checks if a string is a valid UUID.
Examples:
>>> is_uuid('6f8aa2f9-686c-4ac3-8766-5712354a04cf') # returns True
>>> is_uuid('6f8aa2f9686c4ac387665712354a04cf') # returns False
>>> is_uuid('6f8aa2f9686c4ac387665712354a04cf', allow_hex=True) # returns True
- Parameters
input_string (str) – String to check.
allow_hex (bool) – True to allow UUID hex representation as valid, false otherwise (default)
- Returns
True if UUID, false otherwise
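A shape-only sketch of the two accepted UUID forms, assuming canonical 8-4-4-4-12 groups and the 32-char hex form produced by `uuid.UUID(...).hex`. The regexes and `sketch_is_uuid()` are hypothetical; version and variant bits are not checked.

```python
import re

# Canonical dashed form and bare hex form (hypothetical patterns).
_UUID_RE = re.compile(r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}', re.IGNORECASE)
_UUID_HEX_RE = re.compile(r'[0-9a-f]{32}', re.IGNORECASE)


def sketch_is_uuid(input_string: str, allow_hex: bool = False) -> bool:
    if not isinstance(input_string, str):
        return False
    if _UUID_RE.fullmatch(input_string):
        return True
    # The 32-char hex form only passes when explicitly allowed.
    return allow_hex and _UUID_HEX_RE.fullmatch(input_string) is not None
```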
string_utils.validation.is_ip_v4(input_string: Any) → bool
Checks if a string is a valid IPv4 address.
Examples:
>>> is_ip_v4('255.200.100.75') # returns True
>>> is_ip_v4('nope') # returns False (not an IP)
>>> is_ip_v4('255.200.100.999') # returns False (999 is out of range)
- Parameters
input_string (str) – String to check.
- Returns
True if an ip v4, false otherwise.
string_utils.validation.is_ip_v6(input_string: Any) → bool
Checks if a string is a valid IPv6 address.
Examples:
>>> is_ip_v6('2001:db8:85a3:0000:0000:8a2e:370:7334') # returns True
>>> is_ip_v6('2001:db8:85a3:0000:0000:8a2e:370:?') # returns False (invalid "?")
- Parameters
input_string (str) – String to check.
- Returns
True if a v6 ip, false otherwise.
string_utils.validation.is_ip(input_string: Any) → bool
Checks if a string is a valid IP address (either v4 or v6).
Examples:
>>> is_ip('255.200.100.75') # returns True
>>> is_ip('2001:db8:85a3:0000:0000:8a2e:370:7334') # returns True
>>> is_ip('1.2.3') # returns False
- Parameters
input_string (str) – String to check.
- Returns
True if an ip, false otherwise.
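The three IP checks map naturally onto the standard `ipaddress` module. This is an alternative sketch with hypothetical names, not the library's regex-based approach, so edge cases differ (for instance, recent Python versions reject IPv4 octets with leading zeros and accept IPv6 scope IDs such as `fe80::1%eth0`).

```python
import ipaddress


def sketch_is_ip_v4(input_string: str) -> bool:
    if not isinstance(input_string, str):
        return False
    try:
        ipaddress.IPv4Address(input_string)  # raises on bad octets such as '999'
    except ValueError:  # AddressValueError subclasses ValueError
        return False
    return True


def sketch_is_ip_v6(input_string: str) -> bool:
    if not isinstance(input_string, str):
        return False
    try:
        ipaddress.IPv6Address(input_string)
    except ValueError:
        return False
    return True


def sketch_is_ip(input_string: str) -> bool:
    # Either family is accepted.
    return sketch_is_ip_v4(input_string) or sketch_is_ip_v6(input_string)
```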
string_utils.validation.is_isbn_10(input_string: str, normalize: bool = True) → bool
Checks if the given string represents a valid ISBN-10 (International Standard Book Number). By default, hyphens in the string are ignored, so the digits can be grouped in different ways. When this function is called with normalize=False, only digit-only strings pass the validation.
Examples:
>>> is_isbn_10('1506715214') # returns True
>>> is_isbn_10('150-6715214') # returns True
>>> is_isbn_10('150-6715214', normalize=False) # returns False
- Parameters
input_string – String to check.
normalize – True to ignore hyphens (“-”) in the string (default), False otherwise.
- Returns
True if valid ISBN 10, false otherwise.
string_utils.validation.is_isbn_13(input_string: str, normalize: bool = True) → bool
Checks if the given string represents a valid ISBN-13 (International Standard Book Number). By default, hyphens in the string are ignored, so the digits can be grouped in different ways. When this function is called with normalize=False, only digit-only strings pass the validation.
Examples:
>>> is_isbn_13('9780312498580') # returns True
>>> is_isbn_13('978-0312498580') # returns True
>>> is_isbn_13('978-0312498580', normalize=False) # returns False
- Parameters
input_string – String to check.
normalize – True to ignore hyphens (“-”) in the string (default), False otherwise.
- Returns
True if valid ISBN 13, false otherwise.
string_utils.validation.is_isbn(input_string: str, normalize: bool = True) → bool
Checks if the given string represents a valid ISBN (International Standard Book Number), either ISBN-10 or ISBN-13. By default, hyphens in the string are ignored, so the digits can be grouped in different ways. When this function is called with normalize=False, only digit-only strings pass the validation.
Examples:
>>> is_isbn('9780312498580') # returns True
>>> is_isbn('1506715214') # returns True
- Parameters
input_string – String to check.
normalize – True to ignore hyphens (“-”) in the string (default), False otherwise.
- Returns
True if valid ISBN (10 or 13), false otherwise.
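The section above does not say how ISBN validity is decided; a common approach, assumed here, is to verify the standard check digit. In the sketch below (hypothetical `sketch_*` names), ISBN-10 uses weights 1 to 10, which is equivalent modulo 11 to the usual 10-to-1 weighting, and ISBN-13 alternates weights 1 and 3. The `X` check character allowed by the ISBN-10 standard is deliberately not handled, matching the digit-only wording above.

```python
import re

_ISBN10_RE = re.compile(r'[0-9]{10}')
_ISBN13_RE = re.compile(r'[0-9]{13}')


def _sketch_clean(input_string: str, normalize: bool) -> str:
    # normalize=True ignores hyphens, mirroring the documented default.
    return input_string.replace('-', '') if normalize else input_string


def sketch_is_isbn_10(input_string: str, normalize: bool = True) -> bool:
    s = _sketch_clean(input_string, normalize)
    if not _ISBN10_RE.fullmatch(s):
        return False
    # Weighted sum with weights 1..10 must be divisible by 11.
    return sum(w * int(c) for w, c in enumerate(s, start=1)) % 11 == 0


def sketch_is_isbn_13(input_string: str, normalize: bool = True) -> bool:
    s = _sketch_clean(input_string, normalize)
    if not _ISBN13_RE.fullmatch(s):
        return False
    # Weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10.
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0


def sketch_is_isbn(input_string: str, normalize: bool = True) -> bool:
    return sketch_is_isbn_10(input_string, normalize) or sketch_is_isbn_13(input_string, normalize)
```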
string_utils.validation.is_palindrome(input_string: Any, ignore_spaces: bool = False, ignore_case: bool = False) → bool
Checks if the string is a palindrome (https://en.wikipedia.org/wiki/Palindrome).
Examples:
>>> is_palindrome('LOL') # returns True
>>> is_palindrome('Lol') # returns False
>>> is_palindrome('Lol', ignore_case=True) # returns True
>>> is_palindrome('ROTFL') # returns False
- Parameters
input_string (str) – String to check.
ignore_spaces (bool) – False if white spaces matter (default), true otherwise.
ignore_case (bool) – False if char case matters (default), true otherwise.
- Returns
True if the string is a palindrome (like “otto”, or “i topi non avevano nipoti” if ignore_spaces=True), False otherwise.
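A minimal sketch of the palindrome check with both flags, assuming that ignore_spaces drops all whitespace and ignore_case lowercases the string before comparing it with its reverse. Returning False for an empty string is an assumption, and the name `sketch_is_palindrome()` is hypothetical.

```python
def sketch_is_palindrome(input_string: str, ignore_spaces: bool = False, ignore_case: bool = False) -> bool:
    if not isinstance(input_string, str) or not input_string:
        return False  # assumption: empty or non-string input is not a palindrome
    s = input_string
    if ignore_spaces:
        # str.split() with no argument splits on any whitespace run.
        s = ''.join(s.split())
    if ignore_case:
        s = s.lower()
    return s == s[::-1]
```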
string_utils.validation.is_pangram(input_string: Any) → bool
Checks if the string is a pangram (https://en.wikipedia.org/wiki/Pangram).
Examples:
>>> is_pangram('The quick brown fox jumps over the lazy dog') # returns True
>>> is_pangram('hello world') # returns False
- Parameters
input_string (str) – String to check.
- Returns
True if the string is a pangram, False otherwise.
string_utils.validation.is_isogram(input_string: Any) → bool
Checks if the string is an isogram (https://en.wikipedia.org/wiki/Isogram).
Examples:
>>> is_isogram('dermatoglyphics') # returns True
>>> is_isogram('hello') # returns False
- Parameters
input_string (str) – String to check.
- Returns
True if isogram, false otherwise.
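Pangram and isogram checks both reduce to set operations. In this sketch (hypothetical names), the pangram test is case-insensitive over the English alphabet, while the isogram test assumes every character counts and case matters; the library may normalize differently.

```python
import string


def sketch_is_pangram(input_string: str) -> bool:
    # Every letter a-z must appear at least once, ignoring case.
    return isinstance(input_string, str) and set(string.ascii_lowercase) <= set(input_string.lower())


def sketch_is_isogram(input_string: str) -> bool:
    if not isinstance(input_string, str) or not input_string:
        return False
    # Assumption: no character may repeat (case-sensitive, all chars counted).
    return len(set(input_string)) == len(input_string)
```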
string_utils.validation.is_slug(input_string: Any, separator: str = '-') → bool
Checks if a given string is a slug (as created by slugify()).
Examples:
>>> is_slug('my-blog-post-title') # returns True
>>> is_slug('My blog post title') # returns False
- Parameters
input_string (str) – String to check.
separator (str) – Join sign used by the slug.
- Returns
True if slug, false otherwise.
string_utils.validation.contains_html(input_string: str) → bool
Checks if the given string contains HTML/XML tags.
By design, this function matches ANY type of tag, so don’t expect to use it as an HTML validator: its goal is to detect “malicious” or undesired tags in the text.
Examples:
>>> contains_html('my string is <strong>bold</strong>') # returns True
>>> contains_html('my string is not bold') # returns False
- Parameters
input_string (str) – Text to check
- Returns
True if string contains html, false otherwise.
string_utils.validation.words_count(input_string: str) → int
Returns the number of words contained in the given string.
This method only considers sequences of one or more letters and/or digits as “words”, so a string like “! @ # % … []” returns zero. It is also aware of punctuation, so the count for a string like “one,two,three.stop” is 4, not 1 (even though there are no spaces in the string).
Examples:
>>> words_count('hello world') # returns 2
>>> words_count('one,two,three.stop') # returns 4
- Parameters
input_string (str) – String to check.
- Returns
Number of words.
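The counting rule described for words_count() can be expressed as a single regex matching runs of letters and/or digits; `[^\W_]` is the usual way to say "word characters minus underscore". The helper name and the TypeError for non-string input are assumptions, not the library's behavior.

```python
import re

# A "word" is a run of letters and/or digits (underscore excluded).
_WORD_RE = re.compile(r'[^\W_]+')


def sketch_words_count(input_string: str) -> int:
    if not isinstance(input_string, str):
        raise TypeError('expected a str')
    return len(_WORD_RE.findall(input_string))
```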
String Manipulation
string_utils.manipulation.camel_case_to_snake(input_string, separator='_')
Converts a camel case string into a snake case one. (The original string is returned if it is not a valid camel case string.)
Example:
>>> camel_case_to_snake('ThisIsACamelStringTest') # returns 'this_is_a_camel_string_test'
- Parameters
input_string (str) – String to convert.
separator (str) – Sign to use as separator.
- Returns
Converted string.
string_utils.manipulation.snake_case_to_camel(input_string: str, upper_case_first: bool = True, separator: str = '_') → str
Converts a snake case string into a camel case one. (The original string is returned if it is not a valid snake case string.)
Example:
>>> snake_case_to_camel('the_snake_is_green') # returns 'TheSnakeIsGreen'
- Parameters
input_string (str) – String to convert.
upper_case_first (bool) – True to turn the first letter into uppercase (default).
separator (str) – Sign to use as separator (default to “_”).
- Returns
Converted string
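A sketch of both conversions. Camel-to-snake uses two common regex substitutions (first split before a capitalized word, then between a lowercase letter or digit and a capital); this is a well-known technique rather than the library's code. The function names are hypothetical, and the validity checks mirror the rules listed under is_camel_case() and is_snake_case().

```python
import re

_ALNUM_TOKEN = re.compile(r'[a-zA-Z0-9]+')
_CAMEL = re.compile(r'[a-zA-Z][a-zA-Z0-9]*')


def _looks_camel(s: str) -> bool:
    # Alphanumeric, starts with a letter, contains both cases.
    return bool(_CAMEL.fullmatch(s)) and s != s.lower() and s != s.upper()


def sketch_camel_case_to_snake(input_string: str, separator: str = '_') -> str:
    if not isinstance(input_string, str) or not _looks_camel(input_string):
        return input_string  # not camel case: returned unchanged

    def join(match):
        return match.group(1) + separator + match.group(2)

    # 1) 'ACamel' -> 'A_Camel': split before a capitalized word.
    s = re.sub(r'(.)([A-Z][a-z]+)', join, input_string)
    # 2) 'sA' -> 's_A': split a lowercase letter or digit from a following capital.
    s = re.sub(r'([a-z0-9])([A-Z])', join, s)
    return s.lower()


def sketch_snake_case_to_camel(input_string: str, upper_case_first: bool = True, separator: str = '_') -> str:
    tokens = input_string.split(separator) if isinstance(input_string, str) else []
    # Assumed validity rules: one or more separators, alphanumeric tokens, no leading digit.
    if len(tokens) < 2 or not all(_ALNUM_TOKEN.fullmatch(t) for t in tokens) or input_string[0].isdigit():
        return input_string
    out = ''.join(t[0].upper() + t[1:] for t in tokens)
    return out if upper_case_first else out[0].lower() + out[1:]
```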
string_utils.manipulation.reverse(input_string: str) → str
Returns the string with its chars reversed.
Example:
>>> reverse('hello') # returns 'olleh'
- Parameters
input_string (str) – String to reverse.
- Returns
Reversed string.
string_utils.manipulation.shuffle(input_string: str) → str
Returns a new string containing the same chars as the given one, but in a randomized order.
Example:
>>> shuffle('hello world') # possible output: 'l wodheorll'
- Parameters
input_string (str) – String to shuffle
- Returns
Shuffled string
string_utils.manipulation.strip_html(input_string: str, keep_tag_content: bool = False) → str
Removes the HTML code contained in the given string.
Examples:
>>> strip_html('test: <a href="foo/bar">click here</a>') # returns 'test: '
>>> strip_html('test: <a href="foo/bar">click here</a>', keep_tag_content=True) # returns 'test: click here'
- Parameters
input_string (str) – String to manipulate.
keep_tag_content (bool) – True to preserve tag content, False to remove tag and its content too (default).
- Returns
String with html removed.
string_utils.manipulation.prettify(input_string: str) → str
Reformats a string by applying the following basic grammar and formatting rules:
String cannot start or end with spaces
The first letter in the string and the ones after a dot, an exclamation or a question mark must be uppercase
String cannot have multiple sequential spaces, empty lines or punctuation (except for “?”, “!” and “.”)
Arithmetic operators (+, -, /, *, =) must have exactly one space before and after them
Exactly one space must follow a dot, a comma, an exclamation mark or a question mark
Text inside double quotes cannot start or end with spaces, but exactly one space must come before and after the quotes (foo" bar"baz -> foo "bar" baz)
Text inside round brackets cannot start or end with spaces, but exactly one space must come before and after the brackets (“foo(bar )baz” -> “foo (bar) baz”)
Percentage sign (“%”) cannot be preceded by a space if there is a number before it (“100 %” -> “100%”)
Saxon genitive is fixed (“Dave’ s dog” -> “Dave’s dog”)
Examples:
>>> prettify(' unprettified string ,, like this one,will be"prettified" .it\' s awesome! ')
>>> # returns 'Unprettified string, like this one, will be "prettified". It\'s awesome!'
- Parameters
input_string – String to manipulate
- Returns
Prettified string.
string_utils.manipulation.asciify(input_string: str) → str
Forces the string content to be ASCII-only by translating all non-ASCII chars into their closest possible representation (e.g. ó -> o, Ë -> E, ç -> c…).
Bear in mind: some chars may be lost if they cannot be translated.
Example:
>>> asciify('èéùúòóäåëýñÅÀÁÇÌÍÑÓË') # returns 'eeuuooaaeynAAACIINOE'
- Parameters
input_string – String to convert
- Returns
Ascii utf-8 string
string_utils.manipulation.slugify(input_string: str, separator: str = '-') → str
Converts a string into a “slug” using the provided separator. The returned string has the following properties:
it has no spaces
all letters are in lower case
all punctuation signs and non alphanumeric chars are removed
words are divided using provided separator
all chars are encoded as ascii (by using asciify())
is safe for URL
Examples:
>>> slugify('Top 10 Reasons To Love Dogs!!!') # returns 'top-10-reasons-to-love-dogs'
>>> slugify('Mönstér Mägnët') # returns 'monster-magnet'
- Parameters
input_string (str) – String to convert.
separator (str) – Sign used to join string tokens (default to “-“).
- Returns
Slug string
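asciify() and slugify() can be sketched with the standard `unicodedata` module: NFKD normalization splits accented letters into a base letter plus combining marks, and encoding to ASCII with errors='ignore' drops the marks. That same step silently drops chars with no ASCII decomposition, which matches the "some chars may be lost" caveat. The `sketch_*` names are hypothetical.

```python
import re
import unicodedata


def sketch_asciify(input_string: str) -> str:
    # 'é' -> 'e' + U+0301 under NFKD; the combining mark is then dropped by the ASCII encode.
    normalized = unicodedata.normalize('NFKD', input_string)
    return normalized.encode('ascii', 'ignore').decode('ascii')


def sketch_slugify(input_string: str, separator: str = '-') -> str:
    # ASCII-fold, lowercase, keep alphanumeric runs and join them with the separator.
    tokens = re.findall(r'[a-z0-9]+', sketch_asciify(input_string).lower())
    return separator.join(tokens)
```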
string_utils.manipulation.booleanize(input_string: str) → bool
Turns a string into a boolean based on its content (CASE INSENSITIVE).
A positive boolean (True) is returned if the string value is one of the following:
“true”
“1”
“yes”
“y”
Otherwise False is returned.
Examples:
>>> booleanize('true') # returns True
>>> booleanize('YES') # returns True
>>> booleanize('nope') # returns False
- Parameters
input_string (str) – String to convert
- Returns
True if the string contains a boolean-like positive value, false otherwise
string_utils.manipulation.strip_margin(input_string: str) → str
Removes the indentation (left margin) from multi-line strings (inspired by the analogous Scala function).
Example:
>>> strip_margin('''
...                 line 1
...                 line 2
...                 line 3
... ''')
>>> # returns '\nline 1\nline 2\nline 3\n'
- Parameters
input_string (str) – String to format
- Returns
A string without left margins
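Assuming the margin is any run of spaces or tabs at the start of each line, strip_margin() amounts to one multiline regex substitution. Unlike `textwrap.dedent()`, which removes only the whitespace common to all lines, this sketch (hypothetical `sketch_strip_margin()`) strips each line's indentation completely.

```python
import re

# Leading horizontal whitespace (spaces/tabs, never line breaks) at the start of every line.
_MARGIN_RE = re.compile(r'^[^\S\r\n]+', re.MULTILINE)


def sketch_strip_margin(input_string: str) -> str:
    return _MARGIN_RE.sub('', input_string)
```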
string_utils.manipulation.compress(input_string: str, encoding: str = 'utf-8', compression_level: int = 9) → str
Compresses the given string, returning a shorter one that can be safely used in any context (like a URL) and restored to its original state using decompress().
Bear in mind: Besides the provided compression_level, the compression result (how much the string is actually compressed by resulting into a shorter string) depends on 2 factors:
The amount of data (string size): short strings might not provide a significant compression result or even be longer than the given input string (this is due to the fact that some bytes have to be embedded into the compressed string in order to be able to restore it later on)
The content type: random sequences of chars are very unlikely to be successfully compressed, while the best compression result is obtained when the string contains several recurring char sequences (like in the example).
Behind the scenes, this method uses Python’s standard zlib and base64 modules.
Examples:
>>> # "original" will be a string with 169 chars:
>>> original = ' '.join(['word n{}'.format(n) for n in range(20)])
>>> # "compressed" will be a string of 88 chars
>>> compressed = compress(original)
- Parameters
input_string (str) – String to compress (must not be empty, or a ValueError will be raised).
encoding (str) – String encoding (default to “utf-8”).
compression_level (int) – A value between 0 (no compression) and 9 (best compression), default to 9.
- Returns
Compressed string.
string_utils.manipulation.decompress(input_string: str, encoding: str = 'utf-8') → str
Restores a previously compressed string (obtained using compress()) back to its original state.
- Parameters
input_string (str) – String to restore.
encoding (str) – Original string encoding.
- Returns
Decompressed string.
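Since compress() is documented as using zlib and base64, a round trip can be sketched as follows. The URL-safe base64 alphabet is an assumption suggested by the "safe in any context (like a URL)" wording; the library's exact encoding may differ, so output from this sketch should not be fed to the library's decompress(). The function names are hypothetical.

```python
import base64
import zlib


def sketch_compress(input_string: str, encoding: str = 'utf-8', compression_level: int = 9) -> str:
    if not input_string:
        raise ValueError('input_string must not be empty')
    raw = zlib.compress(input_string.encode(encoding), compression_level)
    # URL-safe base64 uses '-' and '_' instead of '+' and '/'.
    return base64.urlsafe_b64encode(raw).decode('ascii')


def sketch_decompress(input_string: str, encoding: str = 'utf-8') -> str:
    raw = base64.urlsafe_b64decode(input_string.encode('ascii'))
    return zlib.decompress(raw).decode(encoding)
```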
string_utils.manipulation.roman_encode(input_number: Union[str, int]) → str
Converts the given number/string into a roman number.
The passed input must represent a positive integer in the range 1-3999 (inclusive).
Why this limit? You may be wondering:
zero is forbidden since there is no related representation in roman numbers
the upper bound 3999 is due to the limitation of the ascii charset (the highest-value symbol displayable in ascii is “M”, which is equal to 1000, so according to roman number rules we can use M three times to reach 3000, but we can’t go any further in thousands without special “boxed chars”).
Examples:
>>> roman_encode(37) # returns 'XXXVII'
>>> roman_encode('2020') # returns 'MMXX'
- Parameters
input_number (Union[str, int]) – An integer or a string to be converted.
- Returns
Roman number string.
string_utils.manipulation.roman_decode(input_string: str) → int
Decodes a roman number string into an integer if the provided string is valid.
Example:
>>> roman_decode('VII') # returns 7
- Parameters
input_string (str) – (Assumed) Roman number
- Returns
Integer value
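A sketch of both conversions under the 1-3999 limit described above. Encoding is the classic greedy algorithm over single symbols and subtractive pairs; decoding subtracts any symbol that precedes a larger one, then re-encodes the result to reject malformed numerals like IIII. The names and the ValueError behavior are hypothetical.

```python
_ROMAN_SYMBOLS = [
    (1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
    (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
    (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I'),
]
_ROMAN_VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}


def sketch_roman_encode(input_number) -> str:
    value = int(input_number)  # accepts ints and numeric strings such as '2020'
    if not 1 <= value <= 3999:
        raise ValueError('input must be in the range 1-3999')
    parts = []
    # Greedy: take each symbol (or subtractive pair) as many times as it fits.
    for amount, symbol in _ROMAN_SYMBOLS:
        count, value = divmod(value, amount)
        parts.append(symbol * count)
    return ''.join(parts)


def sketch_roman_decode(input_string: str) -> int:
    s = input_string.strip().upper()
    if not s or any(c not in _ROMAN_VALUES for c in s):
        raise ValueError('invalid roman number: {!r}'.format(input_string))
    total = 0
    for i, char in enumerate(s):
        value = _ROMAN_VALUES[char]
        # A symbol smaller than its right neighbour is subtracted (IV = 5 - 1).
        if i + 1 < len(s) and value < _ROMAN_VALUES[s[i + 1]]:
            total -= value
        else:
            total += value
    # Round-trip check rejects non-canonical forms such as 'IIII' or 'VX'.
    if not 1 <= total <= 3999 or sketch_roman_encode(total) != s:
        raise ValueError('invalid roman number: {!r}'.format(input_string))
    return total
```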
String Generation
string_utils.generation.uuid(as_hex: bool = False) → str
Generates a UUID string (using uuid.uuid4()).
Examples:
>>> uuid() # possible output: '97e3a716-6b33-4ab9-9bb1-8128cb24d76b'
>>> uuid(as_hex=True) # possible output: '97e3a7166b334ab99bb18128cb24d76b'
- Parameters
as_hex – True to return the hex value of the UUID, False to get its default representation (default).
- Returns
uuid string.
string_utils.generation.random_string(size: int) → str
Returns a string of the specified size containing random characters (uppercase/lowercase ascii letters and digits).
Example:
>>> random_string(9) # possible output: "cx3QQbzYg"
- Parameters
size (int) – Desired string size
- Returns
Random string
string_utils.generation.secure_random_hex(byte_count: int) → str
Generates a random string using a secure low-level random generator (os.urandom).
Bear in mind: due to the hex conversion, the returned string will be exactly twice as long as byte_count.
Example:
>>> secure_random_hex(9) # possible output: 'aac4cf1d1d87bd5036'
- Parameters
byte_count (int) – Number of random bytes to generate
- Returns
Hexadecimal string representation of generated random bytes
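The doubling mentioned above is simply hex encoding: each byte (0-255) becomes exactly two hex digits. A minimal sketch with the standard library; the name and the error handling are hypothetical.

```python
import os


def sketch_secure_random_hex(byte_count: int) -> str:
    if not isinstance(byte_count, int) or byte_count < 1:
        raise ValueError('byte_count must be a positive int')
    # os.urandom() reads from the OS CSPRNG; bytes.hex() maps each byte to two hex digits.
    return os.urandom(byte_count).hex()
```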
string_utils.generation.roman_range(stop: int, start: int = 1, step: int = 1) → Generator
Similarly to Python’s native range(), returns a Generator object which yields a new roman number on each iteration instead of an integer.
Example:
>>> for n in roman_range(7): print(n)
>>> # prints: I, II, III, IV, V, VI, VII
>>> for n in roman_range(start=7, stop=1, step=-1): print(n)
>>> # prints: VII, VI, V, IV, III, II, I
- Parameters
stop – Number at which the generation must stop (must be <= 3999).
start – Number at which the generation must start (must be >= 1).
step – Increment of each generation step (default to 1).
- Returns
Generator of roman numbers.
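roman_range() can be sketched as a generator over integers that encodes each value on the fly. Two assumptions apply: stop is included when it is reached exactly (as the examples show), and the sequence simply ends before overshooting stop, whereas the library may reject such step configurations instead. Arguments are validated eagerly, before the generator is created. The names are hypothetical.

```python
from typing import Iterator

_SYMBOLS = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'), (90, 'XC'),
            (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]


def _to_roman(value: int) -> str:
    out = []
    for amount, symbol in _SYMBOLS:
        count, value = divmod(value, amount)
        out.append(symbol * count)
    return ''.join(out)


def sketch_roman_range(stop: int, start: int = 1, step: int = 1) -> Iterator[str]:
    # Validate at call time so bad arguments fail before iteration starts.
    if step == 0:
        raise ValueError('step must not be zero')
    for value in (start, stop):
        if not 1 <= value <= 3999:
            raise ValueError('start and stop must be in the range 1-3999')

    def generate() -> Iterator[str]:
        current = start
        # stop is inclusive; iteration ends before stepping past it.
        while (step > 0 and current <= stop) or (step < 0 and current >= stop):
            yield _to_roman(current)
            current += step

    return generate()
```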