Clarity Python SDK



Intuli Clarity Python SDK

Contents
Clarity
Getting Started
Credentials
StringFunctions Module
  after
  before
  best_match
  clean_html
  html_decode
  html_encode
  lev_distance
  md5
  regex_chunk
  regex_count
  regex_escape
  regex_extract
  regex_extract_all
  regex_extract_each
  regex_extract_int
  regex_is_match
  regex_replace
  remove_diacritics
  remove_duplicate_whitespace_chars
  remove_extra_whitespace
  remove_whitespace
  sha
NameCleaning Module
  Description
  clean_name
Objects
  NameEntity

Clarity
Clarity is a serverless, cloud-hosted collection of data cleaning and extraction tools.
In addition to the developer portal documentation, we provide Clarity SDK documentation for Python, .NET, and JavaScript. The SDKs make API calls easier and simplify integration with your internal systems.

Getting Started
The Python SDK is hosted on PyPI as Clarity-SDK. To start using Clarity, install the package from the command line:
pip install Clarity-SDK
The SDK is composed of the StringFunctions module and the NameCleaning module.

Credentials
Summary:
The global API_KEY string value must be set to your registered API key. You must set it each time the Python Clarity SDK is imported.
Example:
# Import the SDK and set the API key
import clarity_sdk
clarity_sdk.API_KEY = "Your API key here"

# Access the StringFunctions module
from clarity_sdk import StringFunctions

# Access the NameCleaning module
from clarity_sdk import NameCleaning

StringFunctions Module

after
Summary:
The after function returns all text in a string that appears after the supplied regular expression pattern.
Parameters:
source_string – The string to extract from.
regex_pattern – The pattern that marks the position where the new string begins.
Optional: return_all_on_no_match – If true, the entire string is returned when the pattern does not match. By default, nothing is returned when there is no match.
Examples:
StringFunctions.after("This is an example", "\\san\\s")
StringFunctions.after("This is an example", "\\san\\s", True)

before
Summary:
The before function returns all text in a string that appears before the supplied regular expression pattern.
Parameters:
source_string – The string to extract from.
regex_pattern – The pattern that marks the position where the new string ends.
Optional: return_all_on_no_match – If true, the entire string is returned when the pattern does not match. By default, nothing is returned when there is no match.
Examples:
StringFunctions.before("This is an example", "\\san\\s")
StringFunctions.before("This is an example", "\\san\\s", True)

best_match
Summary:
The best_match function returns the string from a list of strings that most closely matches the match string, using a Levenshtein distance calculation to measure the similarity of each string.
Parameters:
match_string – The string to find the closest match for.
string_list – The list of strings to compare to match_string.
Examples:
StringFunctions.best_match("I like pizza.", ["I like pizzas.", "I like ice cream."])

clean_html
Summary:
The clean_html function returns a string with all HTML encodings and tags removed. It can be used to convert an HTML document into plain text that shows only the display text, and it attempts to preserve spacing. Special characters encoded in the HTML (Ex. &gt;, &amp;, &#146;, etc.) are converted to their Unicode equivalents.
Parameters:
html_data – The HTML-formatted string to convert.
Examples:
StringFunctions.clean_html("This removes <sup>superscripts</sup> &amp; other HTML content")

html_decode
Summary:
The html_decode function returns a string in which all HTML-encoded special characters are replaced with their Unicode equivalents.
Parameters:
text – The string to decode.
Examples:
StringFunctions.html_decode("2 &gt; 2 = 4")

html_encode
Summary:
The html_encode function returns a string in which all reserved HTML characters and extended characters are replaced with their HTML-encoded values (Ex. > is converted to &gt;).
Parameters:
text – The string to encode.
Examples:
StringFunctions.html_encode("2 > 1")

lev_distance
Summary:
The lev_distance function returns the Levenshtein distance between two strings.
If either string is longer than 1000 characters, only its first 1000 characters are evaluated.
Parameters:
source_string – The first comparison string.
compare_string – The second comparison string.
Example:
StringFunctions.lev_distance("The car is green.", "The car is red.")

md5
Summary:
The md5 function returns a string containing the MD5 hash of the supplied string. This can be used to create a compact identifier for a string value, because it is very unlikely that two different strings will produce the same MD5 hash by chance. MD5 is not collision-resistant, so it should not be relied on for security purposes.
Parameters:
source_string – The string to be hashed.
Examples:
StringFunctions.md5("An example article title")

regex_chunk
Summary:
The regex_chunk function returns an array of strings created by splitting the source string into chunks that alternate between matches of the regular expression pattern and the text between those matches.
Parameters:
source_string – The string to be broken into chunks.
regex_pattern – The regular expression pattern used to identify the break positions of the chunks.
Optional: case_sensitive – If true, the pattern is case sensitive.
Optional: remove_empty – If true, any empty values are removed from the array.
Examples:
StringFunctions.regex_chunk("This will return all words, separated by space", "\\s")
StringFunctions.regex_chunk("This will return all words, separated by space", "\\s", True)
StringFunctions.regex_chunk("This will return all words, separated by space", "\\s", True, False)

regex_count
Summary:
The regex_count function returns an integer indicating the number of times the regular expression pattern matches the source string.
Parameters:
source_string – The string to be matched.
regex_pattern – The regular expression pattern to match.
Optional: case_sensitive – If true, the pattern is case sensitive.
Examples:
StringFunctions.regex_count("This will return the count of all regex matches", " \\w+")
StringFunctions.regex_count("This will return the count of all regex matches", " \\w+", True)

regex_escape
Summary:
The regex_escape function returns a string in which any regular expression special characters are escaped. The result is a regular expression pattern that matches the supplied string exactly and will not cause pattern errors.
Parameters:
source_string – The text to be escaped for use as a regular expression pattern.
Examples:
StringFunctions.regex_escape("This will escape any regex characters. Like ? and \\")

regex_extract
Summary:
The regex_extract function returns a string containing the capture from the first instance in the source string that matches the regular expression pattern. If no capture groups are defined in the pattern, the entire match is returned as capture group one.
Parameters:
source_string – The string to be matched.
regex_pattern – The regular expression pattern to match.
Optional: output_string – If specified, the output is this string, with \<number> treated as a back reference to the corresponding capture group. (Ex. "Test \1" would output "Test " followed by whatever was captured by the first capture group of the pattern.)
Optional: case_sensitive – If true, the pattern is case sensitive.
Optional: reverse_search – If true, the string is searched for matches starting at the end of the string and working backward.
Optional: return_nothing_on_pattern_fail – If true, None is returned when there is no match. By default, an empty string is returned when there is no match.
Examples:
StringFunctions.regex_extract("This will return the captured string", "this(.*?)string")
StringFunctions.regex_extract("This will return the captured string", "this(.*?)string", "\\1")
StringFunctions.regex_extract("This will return the captured string", "this(.*?)string", "\\1", False)
StringFunctions.regex_extract("This will return the captured string", "this(.*?)string", "\\1", False, True)
StringFunctions.regex_extract("This will return the captured string", "this(.*?)string", "\\1", False, True, True)

regex_extract_all
Summary:
The regex_extract_all function returns an array of strings containing the capture from each instance in the source string that matches the regular expression pattern.
Parameters:
source_string – The string to be matched.
regex_pattern – The regular expression pattern to match.
Optional: case_sensitive – If true, the pattern is case sensitive.
Examples:
StringFunctions.regex_extract_all("This will return all regex matches", "(\\w+)")
StringFunctions.regex_extract_all("This will return all regex matches", "(\\w+)", True)

regex_extract_each
Summary:
The regex_extract_each function returns an array of strings containing the capture from each capture group of the first match in the source string.
Parameters:
source_string – The string to be matched.
regex_pattern – The regular expression pattern to match.
Optional: case_sensitive – If true, the pattern is case sensitive.
Optional: reverse_search – If true, the string is searched for matches starting at the end of the string and working backward.
Optional: return_nothing_on_pattern_fail – If true, None is returned when there is no match. By default, an empty string is returned when there is no match.
Examples:
StringFunctions.regex_extract_each("Position 1 here", "Position\\s(\\d)\\s+(\\w+)")
StringFunctions.regex_extract_each("Position 1 here", "Position\\s(\\d)\\s+(\\w+)", True)
StringFunctions.regex_extract_each("Position 1 here", "Position\\s(\\d)\\s+(\\w+)", True, False)
StringFunctions.regex_extract_each("Position 1 here", "Position\\s(\\d)\\s+(\\w+)", True, False, False)

regex_extract_int
Summary:
The regex_extract_int function returns an integer containing the capture from the first instance in the source string that matches the regular expression pattern. If the captured data is not numeric, -1 is returned.
If no capture groups are defined in the pattern, the entire match is returned as capture group one.
Parameters:
source_string – The string to be matched.
regex_pattern – The regular expression pattern to match.
Optional: case_sensitive – If true, the pattern is case sensitive.
Examples:
StringFunctions.regex_extract_int("This will return the integer(1) match", "\\((\\d+)\\)")
StringFunctions.regex_extract_int("This will return the integer(1) match", "\\((\\d+)\\)", False)

regex_is_match
Summary:
The regex_is_match function returns a Boolean indicating whether the regular expression pattern matches any part of the source string.
Parameters:
source_string – The string to be matched.
regex_pattern – The regular expression pattern to match.
Optional: case_sensitive – If true, the pattern is case sensitive.
Examples:
StringFunctions.regex_is_match("This will return true if there is a match", "true")
StringFunctions.regex_is_match("This will return true if there is a match", "true", False)

regex_replace
Summary:
The regex_replace function returns a string in which every match of regex_pattern is replaced with replace_string.
The replacement string supports backslash-delimited back references, so you can include capture group 1 by writing \1 in the string.
Parameters:
source_string – The string to be matched.
regex_pattern – The regular expression pattern to match.
replace_string – The string to replace each pattern match with.
Optional: case_sensitive – If true, the pattern is case sensitive.
Examples:
StringFunctions.regex_replace("This will replace this true, with false", "true", "false")
StringFunctions.regex_replace("This will replace this true, with false", "true", "false", True)

remove_diacritics
Summary:
The remove_diacritics function replaces characters that have diacritics with the closest ASCII character(s).
Parameters:
text – The text from which to remove diacritics.
Examples:
StringFunctions.remove_diacritics("W?é?eeňe\\eee?!")

remove_duplicate_whitespace_chars
Summary:
The remove_duplicate_whitespace_chars function returns a string in which duplicate whitespace has been removed. Newlines, tabs, and spaces are preserved, but each appears only once per run of whitespace. As a result, large gaps are collapsed into single whitespace characters.
Parameters:
text – The string to be cleaned.
Examples:
StringFunctions.remove_duplicate_whitespace_chars("This will remove extra spacing")

remove_extra_whitespace
Summary:
The remove_extra_whitespace function returns a string in which every run of whitespace is replaced with a single space.
This removes whitespace gaps, newlines, and tabs from the string while keeping a single space in place of each whitespace run.
Parameters:
text – The string to be cleaned.
Examples:
StringFunctions.remove_extra_whitespace("This will remove extra spacing")

remove_whitespace
Summary:
The remove_whitespace function returns a string with all whitespace removed.
Parameters:
text – The string to be cleaned.
Examples:
StringFunctions.remove_whitespace("This removes all whitespace extra spacing")

sha
Summary:
The sha function returns a string containing the SHA-512 hash of the input string. This can be used as an identifier for a string, because two different strings are extremely unlikely to have matching SHA-512 hash values.
Parameters:
string_to_Convert – The string to hash.
Examples:
StringFunctions.sha("An example article title")

NameCleaning Module
Description
The NameCleaning module provides functionality for breaking names apart into their component parts. The module is composed of the clean_name function and the NameEntity object, which are used for extracting and storing data about names.

clean_name
Summary:
The clean_name function takes a string containing a person's name and converts it into a NameEntity object, whose properties give access to the individual name parts. It identifies the parts of the name even when the name is written in different formats.
Parameters:
name – A string containing the full name to be cleaned.
Optional: exist_prefixes – If false, it is assumed that there are no prefixes (Ex. Dr, Mr, Mrs, etc.).
Optional: exist_suffixes – If false, it is assumed that there are no suffixes (Ex. Jr, Sr, III, etc.).
Optional: exist_middle_name – If false, it is assumed that there are no middle names, only compound first names or compound last names.
Examples:
NameCleaning.clean_name("Dr. Alfred James Von Schmidt III")
NameCleaning.clean_name("Dr. Alfred James Von Schmidt III", True)
NameCleaning.clean_name("Dr. Alfred James Von Schmidt III", True, False)
NameCleaning.clean_name("Dr. Alfred James Von Schmidt III", True, False, True)
print(NameCleaning.clean_name("Dr. Alfred James Von Schmidt III").first_name)

Objects

NameEntity
Description:
The NameEntity object is a strictly property-based object that holds the data for a name parsed with the clean_name function in the NameCleaning module.
Properties:
first_name – Returns the first name of the input name.
first_initial – Returns the first initial of the input name.
last_name – Returns the last name of the input name.
last_initial – Returns the last initial of the input name.
middle_name – Returns the middle name of the input name.
middle_initial – Returns the middle initial of the input name.
name_string – Returns the full name string.
prefix – Returns the name prefix.
suffix – Returns the name suffix.
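Code that consumes clean_name results can be unit-tested without calling the hosted API by substituting a stand-in object that exposes the same property names as NameEntity. The sketch below is a hypothetical LocalNameEntity dataclass and a hypothetical formal_salutation consumer; neither is part of clarity_sdk. It assumes that an initial is the first character of a name part and that name_string joins the non-empty parts with single spaces. This documentation does not confirm either behavior for the real NameEntity.

```python
from dataclasses import dataclass


@dataclass
class LocalNameEntity:
    """Hypothetical test double mirroring the documented NameEntity properties.
    Not part of clarity_sdk; the derived values below are assumptions."""
    first_name: str = ""
    middle_name: str = ""
    last_name: str = ""
    prefix: str = ""
    suffix: str = ""

    @property
    def first_initial(self) -> str:
        # Assumption: an initial is the first character of the name part ("" if empty)
        return self.first_name[:1]

    @property
    def middle_initial(self) -> str:
        return self.middle_name[:1]

    @property
    def last_initial(self) -> str:
        return self.last_name[:1]

    @property
    def name_string(self) -> str:
        # Assumption: the full name joins the non-empty parts with single spaces
        parts = [self.prefix, self.first_name, self.middle_name, self.last_name, self.suffix]
        return " ".join(part for part in parts if part)


def formal_salutation(name) -> str:
    """Hypothetical consumer: relies only on the prefix and last_name properties,
    so it accepts a real NameEntity or the stand-in above."""
    parts = ["Dear", name.prefix, name.last_name]
    return " ".join(part for part in parts if part)


entity = LocalNameEntity(first_name="Alfred", middle_name="James",
                         last_name="Von Schmidt", prefix="Dr.", suffix="III")
print(formal_salutation(entity))  # Dear Dr. Von Schmidt
```

Because formal_salutation only reads properties, the same function can be passed the object returned by NameCleaning.clean_name in production and the stand-in in tests.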
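The lev_distance and best_match functions in the StringFunctions module run on the hosted service, and their exact implementation is not documented here. For offline testing, or to understand what the returned distances mean, a minimal local sketch of the classic Levenshtein (edit) distance and a best-match helper built on it might look like the following. The names local_lev_distance and local_best_match are hypothetical and are not part of clarity_sdk. The 1000-character truncation mirrors the limit stated for lev_distance, and the tie-breaking rule in local_best_match is an assumption, so results may differ from the hosted functions in edge cases.

```python
from typing import Sequence


def local_lev_distance(source_string: str, compare_string: str, limit: int = 1000) -> int:
    """Hypothetical local helper: classic Levenshtein distance, i.e. the minimum
    number of single-character insertions, deletions, and substitutions needed
    to turn one string into the other. Truncating to `limit` characters mirrors
    the 1000-character limit documented for lev_distance."""
    a = source_string[:limit]
    b = compare_string[:limit]
    # previous[j] = distance between the prefix of a processed so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]


def local_best_match(match_string: str, string_list: Sequence[str]) -> str:
    """Hypothetical local helper: return the candidate closest to match_string.
    Ties go to the earliest candidate (an assumption; Clarity's rule is not documented)."""
    if not string_list:
        raise ValueError("string_list must not be empty")
    return min(string_list, key=lambda candidate: local_lev_distance(match_string, candidate))


print(local_lev_distance("The car is green.", "The car is red."))  # 3
print(local_best_match("I like pizza.", ["I like pizzas.", "I like ice cream."]))  # I like pizzas.
```

The two-row dynamic-programming table keeps memory proportional to the length of the second string, which matters little at the 1000-character limit but keeps the sketch simple.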

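The md5 and sha functions in the StringFunctions module are computed by the hosted service. When comparable identifiers are needed locally, Python's standard hashlib module provides MD5 and SHA-512. The sketch below assumes the input is encoded as UTF-8 and that digests are represented as lowercase hexadecimal strings. This documentation does not specify the encoding or output format the hosted functions use, so check a few known values before mixing locally computed and Clarity-computed hashes. The helper names local_md5 and local_sha512 are hypothetical.

```python
import hashlib


def local_md5(source_string: str) -> str:
    """Hypothetical local helper: MD5 digest as lowercase hex (assumes UTF-8 input encoding)."""
    return hashlib.md5(source_string.encode("utf-8")).hexdigest()


def local_sha512(source_string: str) -> str:
    """Hypothetical local helper: SHA-512 digest as lowercase hex (assumes UTF-8 input encoding)."""
    return hashlib.sha512(source_string.encode("utf-8")).hexdigest()


title = "An example article title"
print(local_md5(title))     # 32 hex characters
print(local_sha512(title))  # 128 hex characters
```

For deduplication keys, the longer SHA-512 digest is the safer choice of the two, since MD5 collisions can be constructed deliberately.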
