As a working data professional, one of my main responsibilities is to ensure the accuracy and consistency of the data that I work with. One of the challenges I face is dealing with different variations of company names that may exist in different datasets. In the video above, I walk you through a fuzzy match join that I recently performed using Tableau Prep, a data preparation tool, to reconcile these variations.

First, I want to introduce you to the concept of fuzzy matching. It’s a technique used to match data when there are slight differences in how the data is presented (most likely as a result of bad data governance). For example, if you have two data sets with company names, one may list a company as “Apple Inc.” while the other may list the same company as “Apple Incorporated.” Fuzzy matching would help you match these two records, even though the names are slightly different.

Unfortunately, fuzzy matching in Tableau Prep is not as straightforward as it is in Excel or in other tools like Power BI or Alteryx. However, we can use a workaround to achieve a somewhat similar result. Tableau Prep allows you to automatically group values together using fuzzy-match algorithms that find similar values.

To get started, let’s say we have two lists of companies. One is a master list of companies that we want to use as our reference, or “golden” copy. The other is a list of companies that may be misspelled or unstandardized. We want to join these two lists together using fuzzy matching (although technically we employ fuzzy grouping options to enable traditional matching).

Within Tableau Prep I start with two groups of data: a clean “golden copy” of company names and a list of company names with poor data quality (i.e., user-entered data, which may contain misspelled or unstandardized company names). Once we have imported both data sets, we can append them using a union step. The union will combine the two lists of company names into one.

Next, we need to apply a fuzzy algorithm to the list of company names. Remember that both the golden copy and the misspelled names are stacked into one column. In Tableau Prep, we’ll use the fuzzy grouping capability to group together similar values, even if they’re not exact matches. We’ll use this technique to group the misspelled company names with their correct counterparts in the “golden” copy.
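For readers who want to see the union-then-group idea outside of Tableau Prep, here is a minimal Python sketch. It is not Tableau Prep's algorithm: it uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure, and the function names (`normalize`, `similarity`, `fuzzy_group`), the sample company names, and the 0.6 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop periods and commas, and collapse extra whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a 0..1 similarity score between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_group(golden, messy, threshold=0.6):
    """Map every name to its closest golden name, or None if nothing is close.

    The threshold is an arbitrary illustrative value, not a Tableau Prep setting.
    """
    # Step 1 (union): stack both lists into one column, tagging the source.
    stacked = [(name, "golden") for name in golden]
    stacked += [(name, "messy") for name in messy]

    # Step 2 (group): assign each value to the most similar golden name.
    groups = {}
    for name, source in stacked:
        if source == "golden":
            groups[name] = name  # golden names are their own group
            continue
        best = max(golden, key=lambda g: similarity(name, g))
        groups[name] = best if similarity(name, best) >= threshold else None
    return groups


# Hypothetical sample data for illustration only.
golden = ["Apple Inc.", "Microsoft Corporation"]
messy = ["Apple Incorporated", "Microsfot Corp", "Zzyzx Holdings"]

groups = fuzzy_group(golden, messy)
for original, matched in groups.items():
    print(f"{original!r} -> {matched!r}")
```

Once every name carries its group label, a traditional exact join on that label does the rest, which mirrors the workaround described in this post. In practice the threshold needs tuning against real data, and the matches are worth reviewing by hand.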