{"id":12966,"date":"2020-12-31T09:00:00","date_gmt":"2020-12-31T08:00:00","guid":{"rendered":"https:\/\/www.codemotion.com\/magazine\/?p=12966"},"modified":"2022-01-05T20:03:25","modified_gmt":"2022-01-05T19:03:25","slug":"data-cleaning","status":"publish","type":"post","link":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/","title":{"rendered":"8 Techniques for Efficient Data Cleaning"},"content":{"rendered":"\t\t\t\t<div class=\"wp-block-uagb-table-of-contents uagb-toc__align-left uagb-toc__columns-1  uagb-block-a982d31a      \"\n\t\t\t\t\tdata-scroll= \"1\"\n\t\t\t\t\tdata-offset= \"30\"\n\t\t\t\t\tstyle=\"\"\n\t\t\t\t>\n\t\t\t\t<div class=\"uagb-toc__wrap\">\n\t\t\t\t\t\t<div class=\"uagb-toc__title\">\n\t\t\t\t\t\t\tTable Of Contents\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"uagb-toc__list-wrap \">\n\t\t\t\t\t\t<ol class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#what-is-data-cleaning\" class=\"uagb-toc-link__trigger\">What is Data Cleaning?<\/a><li class=\"uagb-toc__list\"><a href=\"#is-data-cleaning-essential\" class=\"uagb-toc-link__trigger\">Is Data Cleaning Essential?<\/a><li class=\"uagb-toc__list\"><a href=\"#what-are-the-benefits-of-data-cleaning\" class=\"uagb-toc-link__trigger\">What are the Benefits of Data Cleaning?\u00a0<\/a><li class=\"uagb-toc__list\"><a href=\"#remove-unwanted-observations\" class=\"uagb-toc-link__trigger\">Remove Unwanted Observations<\/a><li class=\"uagb-toc__list\"><a href=\"#filter-unwanted-outliers\" class=\"uagb-toc-link__trigger\">Filter Unwanted Outliers<\/a><li class=\"uagb-toc__list\"><a href=\"#avoid-errors-like-typos\" class=\"uagb-toc-link__trigger\">Avoid Errors Like Typos<\/a><li class=\"uagb-toc__list\"><a href=\"#convert-numbers-stored-as-text-into-numbers\" class=\"uagb-toc-link__trigger\">Convert Numbers Stored as Text Into Numbers<\/a><li class=\"uagb-toc__list\"><a href=\"#deal-with-missing-values\" 
class=\"uagb-toc-link__trigger\">Deal with Missing Values<\/a><li class=\"uagb-toc__list\"><a href=\"#convert-data-types\" class=\"uagb-toc-link__trigger\">Convert Data Types<\/a><li class=\"uagb-toc__list\"><a href=\"#get-rid-of-extra-spaces\" class=\"uagb-toc-link__trigger\">Get Rid of Extra Spaces<\/a><li class=\"uagb-toc__list\"><a href=\"#delete-all-formatting\" class=\"uagb-toc-link__trigger\">Delete All Formatting\u00a0<\/a><li class=\"uagb-toc__list\"><a href=\"#data-cleaning-recap\" class=\"uagb-toc-link__trigger\">Data cleaning: recap<\/a><\/ol>\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\r\n\r\n\r\n<p>Data is an essential part of <a class=\"ek-link\" href=\"https:\/\/www.codemotion.com\/magazine\/dev-hub\/big-data-analyst\/data-analyst-career\/\">data analytics<\/a>, <a class=\"ek-link\" href=\"https:\/\/www.codemotion.com\/magazine\/dev-hub\/blockchain-dev\/blockchain-data-security\/\">data security<\/a>, and <a class=\"ek-link\" href=\"https:\/\/www.codemotion.com\/magazine\/Glossary\/data-scientist\/\">data science<\/a>. That\u2019s obvious.\u00a0Sometimes, however, that <span id=\"urn:enhancement-181e6a99\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data\">data<\/span> can get a little dirty. No, not like in a gangster film. More like where suddenly we are having to deal with \u2018dirty data\u2019 after a hold up at a data centre. When there is a mistake in the spelling, arrangement, formatting, or construction which has made that data unclear. 
For these reasons, every so often you need to apply <strong>data cleaning<\/strong>.<\/p>\r\n\r\n\r\n\r\n<div class=\"wp-block-image\">\r\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"538\" class=\"wp-image-12976\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image6.jpg\" alt=\"\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image6.jpg 1000w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image6-300x161.jpg 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image6-768x413.jpg 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/figure>\r\n<\/div>\r\n\r\n\r\n\r\n<p>Data cleaning may seem like an alien concept to some. But actually, it\u2019s a vital part of data science. Using different techniques to clean data will help with the <strong>data analysis process<\/strong>. It also helps <a class=\"ek-link\" href=\"https:\/\/www.davincivirtual.com\/blog\/4723\/tips-to-improve-communication-within-your-remote-team\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">improve communication<\/a> with your teams and with end-users. As well as preventing any further IT issues along the line.<\/p>\r\n\r\n\r\n\r\n<p>Unfortunately, data cleaning can take up a huge chunk of time for data scientists. Yet, as having poor or wrong data can be detrimental to a task, it\u2019s an important thing to do. It\u2019s not all bad, though. High-quality data that has been cleaned can make your job so much easier.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>So, professionals must know techniques to perform it properly and efficiently. Then you can get on with other work. Like developing a <a class=\"ek-link\" href=\"https:\/\/www.ringcentral.com\/cloud-contact-center.html\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">cloud contact center<\/a>. 
Or making amazing AI.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>Of course, <strong>different types of data require different types of cleaning<\/strong>. But there are general approaches that make a good starting point. Here are eight techniques for essential data cleaning.\u00a0<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\" id=\"h-what-is-data-cleaning\">What is Data Cleaning?<\/h2>\r\n\r\n\r\n\r\n<p>Before we jump in, it\u2019s important to know what data cleaning actually is. It\u2019s the process of <strong><span id=\"urn:enhancement-43a8eed8\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/personally_identifiable_information\">identifying<\/span> and removing or fixing \u2018bad\u2019 data<\/strong>. This is usually inaccurate, unreliable, or unfinished data from databases or tables.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>The data then needs restoring, removing, or remodelling. Sometimes, if the data is dirty or crude, it needs removing completely.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>For example, say it is your job to handle the data on <a class=\"ek-link\" href=\"https:\/\/www.bigcommerce.com\/blog\/ecommerce-platforms\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">platforms for eCommerce<\/a> sites. If the data you put out is bad, this can create problems on the site that lead to a loss in profit and reputation. Such as the wrong items being advertised next to the wrong description.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>Data cleaning can be done either interactively with data cleansing tools or as <span id=\"urn:enhancement-bd42fd64\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/batch_processing\">batch<\/span> processing through scripting. 
After it has been cleaned, the data needs to match up with other related <span id=\"urn:enhancement-e1d61b94\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_set\">datasets<\/span> in operation.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\" id=\"h-is-data-cleaning-essential\">Is Data Cleaning Essential?<\/h2>\r\n\r\n\r\n\r\n<p>Although it isn\u2019t spoken about as often as it should be, <strong>data cleaning is an essential part of a data scientist\u2019s job<\/strong>. Especially as more industries than ever are adopting some sort of cloud <span id=\"urn:enhancement-1aaba947\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_storage_device\">storage<\/span>. As the use of data storage grows, the more likely there is to be a problem.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>For example, say a company uses a <a class=\"ek-link\" href=\"https:\/\/www.ringcentral.com\/predictive-dialer.html\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">hosted predictive dialer<\/a> to contact clients. They will have a large volume of customer information stored as data. If the data stored is not clean &#8211; i.e., the wrong name is next to the wrong number &#8211; agents run the risk of making mistakes when contacting clients. Which can lead to a few disgruntled customers, to say the least.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>This means that as a professional in the IT industry, it\u2019s your job to make sure things run smoothly in this area. And a huge part of that involves data cleaning.\u00a0<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\" id=\"h-what-are-the-benefits-of-data-cleaning\">What are the Benefits of Data Cleaning?\u00a0<\/h2>\r\n\r\n\r\n\r\n<p>As well as helping other companies, data cleaning makes your job as a data professional easier too. 
Whether you are working on <a class=\"ek-link\" href=\"https:\/\/www.codemotion.com\/magazine\/articles\/news\/using-deep-learning-to-control-the-unconsciousness-level-of-patients-in-an-anesthetic-state\/\">deep learning<\/a> or developing a site, these are just a few ways in which it will help you in your work:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li><strong><span id=\"urn:enhancement-78c53858\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/efficiency\">Efficiency<\/span><\/strong> &#8211; Cleaning data helps you perform your <span id=\"urn:enhancement-1e4cdfa7\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/analysis\">analysis<\/span> faster. This is because having clean data means you avoid multiple <span id=\"urn:enhancement-bc5d87cd\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/errors_and_residuals_in_statistics\">errors<\/span>, and your results will be more accurate. Therefore, you won\u2019t have to re-do the whole task due to false results.<\/li>\r\n<li><strong><span id=\"urn:enhancement-78108681\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/error\">Error<\/span> Margin<\/strong> \u2013 Although you may be very eager to get results, if the data isn\u2019t clean, the results won\u2019t be accurate. That means when you present the work, the outcome may not be true. Therefore, getting used to cleaning data means that you adopt the practice of slowing down and fixing data before presenting it. 
Leaving less room for mistakes.\u00a0<\/li>\r\n<li><strong><span id=\"urn:enhancement-a96bbd82\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/accuracy_and_precision\">Accuracy<\/span><\/strong> \u2013 As data cleaning takes up so much time, you will soon learn to be more accurate with the data entered in the first place. Of course, data cleaning will still be needed for other reasons, but doing it gets you used to being more precise in the first place.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<div class=\"wp-block-image\">\r\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"999\" height=\"441\" class=\"wp-image-12973\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image3-2.jpg\" alt=\"why to use data cleaning\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image3-2.jpg 999w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image3-2-300x132.jpg 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image3-2-768x339.jpg 768w\" sizes=\"auto, (max-width: 999px) 100vw, 999px\" \/><\/figure>\r\n<\/div>\r\n\r\n\r\n\r\n<p>Now that we have gone into a little extra detail about how important data cleaning is, let\u2019s take a look at the actual techniques.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\" id=\"h-remove-unwanted-observations\">Remove Unwanted Observations<\/h2>\r\n\r\n\r\n\r\n<p>The first thing you need to do in setting up data cleaning is to <strong>remove unwanted observations<\/strong>. This includes removing duplicate or irrelevant observations.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>Duplicate observations will most likely arise during <strong>data collection<\/strong>. They usually happen when you scrape data or combine datasets from multiple places. They can also occur when you receive data from clients or other departments. 
For instance, a user may have accidentally entered their details twice. Duplicates will only increase the amount of data you have and can end up wasting time.\u00a0<\/p>\r\n\r\n\r\n\r\n<p><strong>Irrelevant observations<\/strong> are ones that don\u2019t fit with the issues you are trying to solve. For example, say you are building a <a class=\"ek-link\" href=\"https:\/\/www.ringcentral.com\/virtual-phone-service.html\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">virtual office phone service<\/a>. You will want anything to do with phone numbers in there. But you won\u2019t want anything to do with social <span id=\"urn:enhancement-12605bcd\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/media_communication\">media<\/span>. Focussing on this point first will prevent any problems that may pop up down the line.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>Make sure the data definitely is irrelevant and that you won\u2019t need it further down the line, say for something like correlated values. Once you are sure of that, get rid of it!<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Filter Unwanted Outliers<\/h2>\r\n\r\n\r\n\r\n<p>It\u2019s important to get rid of unwanted outliers because they can cause problems with certain models. Linear regression models, for example, are less robust to outliers than decision tree models.\u00a0<\/p>\r\n\r\n\r\n\r\n<p><strong>Removing outliers will help with the model\u2019s performance<\/strong>. But, there does have to be a legitimate reason to remove them.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>Say, for example, you are creating a database of information connected to a <a class=\"ek-link\" href=\"https:\/\/flippingbook.com\/blog\/how-to-create-an-employee-handbook\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">digital handbook maker<\/a> and there are lots of facts and figures in there. 
Just because a number may be a big number to input, it doesn\u2019t make it an outlier. That large number may at some point become informative to your model.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>However, if there is a legitimate reason that it seems like the outlier should be removed, then it is important to do so. This could be something like a suspicious measurement that is likely not to be real. Like, if someone has entered their phone number as 012873839283228343273, you know it isn\u2019t a true value and is an outlier you can get rid of.<\/p>\r\n\r\n\r\n\r\n<div class=\"wp-block-image\">\r\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"638\" height=\"359\" class=\"wp-image-12975\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image5-5.png\" alt=\"data cleaning: types of outliers\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image5-5.png 638w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image5-5-300x169.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image5-5-400x225.png 400w\" sizes=\"auto, (max-width: 638px) 100vw, 638px\" \/><\/figure>\r\n<\/div>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\" id=\"h-avoid-errors-like-typos\">Avoid Errors Like Typos<\/h2>\r\n\r\n\r\n\r\n<p>Typos are easy mistakes to make. And without something like spellcheck, they can often go unnoticed. However, spelling is essential to fix, as models treat different values differently. Strings, for example, rely a lot on spelling and letter cases.<\/p>\r\n\r\n\r\n\r\n<p>Several pieces of<a class=\"ek-link\" href=\"https:\/\/www.novatech.co.uk\/blog\/5-technologies-to-bring-offices-into-future\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\"> essential technology<\/a> use algorithms and techniques that do <strong>fix typos<\/strong>. 
Mistakes can be mapped and converted into the correct spelling.<\/p>\r\n\r\n\r\n\r\n<p>Although a small spelling difference may not seem like a big deal to a human, a computer treats the two strings as different values. For example, there is a difference between putting in <em>Robert<\/em> and <em>robert<\/em>. The capitalization can have a significant impact.<\/p>\r\n\r\n\r\n\r\n<p>Another example is using the US spelling \u2018optimize\u2019 and the British spelling \u2018optimise\u2019. They are the same word but spelt differently.<\/p>\r\n\r\n\r\n\r\n<p>One more example is a mistake like spelling \u2018Mike\u2019 as \u2018Mice\u2019. They have the same number of letters but are spelt differently.<\/p>\r\n\r\n\r\n\r\n<p>You might also need to consider string length. You may have to pad or shorten strings to make sure they are all kept in the same format.<\/p>\r\n\r\n\r\n\r\n<p>It might be that your dataset requires every value to have exactly five digits. So, if you have a number like 3332, you will have to put a zero in front to make it 03332. This keeps your data uniform. You will also need to remove stray spaces for the same reason. 
Removing them from strings keeps them consistent.<\/p>\r\n\r\n\r\n\r\n<div class=\"wp-block-image\">\r\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"263\" height=\"203\" class=\"wp-image-12974\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image4-8.png\" alt=\"\" \/><\/figure>\r\n<\/div>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\" id=\"h-convert-numbers-stored-as-text-into-numbers\">Convert Numbers Stored as Text Into Numbers<\/h2>\r\n\r\n\r\n\r\n<p>It doesn\u2019t matter if it\u2019s a <a class=\"ek-link\" href=\"https:\/\/blog.3dcart.com\/link-building-strategies\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">link building<\/a> article or a customer phone number, mistakes are often made when numbers are entered.<\/p>\r\n\r\n\r\n\r\n<p>For example, an address may be entered as:\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-center\"><em>12 3 House Street, New York<\/em><\/p>\r\n\r\n\r\n\r\n<p>or,<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-center\"><em>1A23 House Street, New York\u00a0<\/em><\/p>\r\n\r\n\r\n\r\n<p>or even,<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-center\"><em>%123 House Street, New York<\/em><\/p>\r\n\r\n\r\n\r\n<p>This makes \u2018unclean\u2019 data, and needs to be sorted out on the back end to keep things smooth.<\/p>\r\n\r\n\r\n\r\n<p>If there are any mistakes with numbers being entered, they need to be changed to actual readable data. All of the data here will need to be converted so the numbers are readable.<\/p>\r\n\r\n\r\n\r\n<p>To <strong>convert the numbers<\/strong>, you will need to go to the formatting box and type in \u201cgeneral\u201d. 
Alternatively, in a spreadsheet you can copy a cell containing the number 1, select the affected cells, and use the Paste Special dialogue box with the Multiply option to turn the text into numbers.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Deal with Missing Values<\/h2>\r\n\r\n\r\n\r\n<p><strong>Missing values can\u2019t be ignored<\/strong>. Knowing how to handle them will keep the data clean. You may even find that a column has too many missing values. If this happens, there may not be enough data to work with, so it might just be easier to delete the column.<\/p>\r\n\r\n\r\n\r\n<p>However tempting it might be, missing values should not be ignored. If you want to <a class=\"ek-link\" href=\"https:\/\/blog.filestack.com\/thoughts-and-knowledge\/keep-end-users-engaged-software\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">keep end-users engaged<\/a>, then every step needs to be done properly. This means putting in the extra effort and doing your best to get accurate results with all data, which includes <strong>dealing with missing values.<\/strong><\/p>\r\n\r\n\r\n\r\n<p>Otherwise, there are ways to impute missing values. This is done by estimating what the missing data might be. Linear regression or the median of the column can be used to calculate the estimate. However, the estimate is not the real value, so it still won\u2019t be fully accurate.<\/p>\r\n\r\n\r\n\r\n<p>Another method is to copy data from a similar dataset, but this might also produce <strong>inaccurate results<\/strong>. Alternatively, you can always inform the algorithm that the data is unavailable or \u2018missing\u2019. You may have to fill in \u20180\u2019 in some cases.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Convert Data Types<\/h2>\r\n\r\n\r\n\r\n<p>Data types must be consistent across the board. A numeric value shouldn\u2019t be stored as a Boolean, and a number shouldn\u2019t be stored as a string.<\/p>\r\n\r\n\r\n\r\n<p>When converting data types, numeric values need to be kept as numeric values. 
Numerics shouldn\u2019t be entered as strings, and data that can\u2019t be converted should be entered as N\/A. Don\u2019t forget to add a warning that flags the value as invalid.<br \/>Making sure that all data is converted helps anyone in the company who has to deal with the data. It even helps people like <a class=\"ek-link\" href=\"https:\/\/www.codemotion.com\/magazine\/articles\/stories\/cybersecurity-hiring-team\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">cybersecurity experts<\/a> encrypt and protect data properly, because there are fewer mistakes for hackers to exploit.<\/p>\r\n\r\n\r\n\r\n<div class=\"wp-block-image\">\r\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"584\" class=\"wp-image-12971\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image1-7-1024x584.png\" alt=\"\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image1-7-1024x584.png 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image1-7-300x171.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image1-7-768x438.png 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image1-7.png 1299w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\r\n<\/div>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\" id=\"h-get-rid-of-extra-spaces\">Get Rid of Extra Spaces<\/h2>\r\n\r\n\r\n\r\n<p>Extra spaces creep into data more often than you may think. They often come from colleagues who send over work from text files imported from a database.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>Say, for example, a teammate is trying to set up a <a class=\"ek-link\" href=\"https:\/\/www.ringcentral.com\/office\/features\/call-forwarding\/overview.html\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">call forwarding<\/a> service. 
They want you to collect and analyze the data. However, it looks like this:\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-left\"><em>Please call me on 0844123123<\/em><\/p>\r\n\r\n\r\n\r\n<p>Or,<\/p>\r\n\r\n\r\n\r\n<p><em>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0Please call me on 0844123123<\/em><\/p>\r\n\r\n\r\n\r\n<p>Or even,\u00a0<\/p>\r\n\r\n\r\n\r\n<p><em>Please call me\u00a0\u00a0\u00a0\u00a0\u00a0<br \/>on 0844123123<\/em><\/p>\r\n\r\n\r\n\r\n<p>To deal with these issues, you will need to use the \u201ctrim\u201d function.\u00a0<\/p>\r\n\r\n\r\n\r\n<p><code>Syntax: =TRIM(Text)<\/code><\/p>\r\n\r\n\r\n\r\n<p>This function removes leading and trailing spaces and reduces repeated spaces between words to a single space, which corrects the problem.\u00a0<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Delete All Formatting\u00a0<\/h2>\r\n\r\n\r\n\r\n<p>Often, computers tend to <strong>automatically format<\/strong> written information. But for the sake of keeping things uniform with data, you need to clear all the formatting.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>This is particularly useful when dealing with something like a <a class=\"ek-link\" href=\"https:\/\/www.ringcentral.com\/lp\/vanitynumber.html\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">vanity number<\/a> for customers who want specialist contact information. This is because they don\u2019t use generic numbers, so although the computer may format their information, it will need to be unformatted to keep their information separate and special.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>To delete all formatting, first select all the data. 
Then go to <em>Home \u2013&gt; Clear \u2013&gt; Clear Formats<\/em>.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\" id=\"h-data-cleaning-recap\">Data cleaning: recap<\/h2>\r\n\r\n\r\n\r\n<div class=\"wp-block-image\">\r\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"989\" height=\"1024\" class=\"wp-image-12972\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image2-8-989x1024.png\" alt=\"data cleaning checklist\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image2-8-989x1024.png 989w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image2-8-290x300.png 290w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image2-8-768x795.png 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/image2-8.png 1280w\" sizes=\"auto, (max-width: 989px) 100vw, 989px\" \/><\/figure>\r\n<\/div>\r\n\r\n\r\n\r\n<p>Knowing how to do data cleaning properly is all part of being a great data scientist. Getting data cleaning correct prevents any issues from occurring in the future. Data cleaning helps you to do your job properly and, in turn, allows you to do the best job you can to help companies move forward with their goals.<\/p>\r\n\r\n\r\n","protected":false},"excerpt":{"rendered":"<p>Data is an essential part of data analytics, data security, and data science. That\u2019s obvious.\u00a0Sometimes, however, that data can get a little dirty. No, not like in a gangster film. More like where suddenly we are having to deal with \u2018dirty data\u2019 after a hold up at a data centre. 
When there is a mistake&#8230; <a class=\"more-link\" href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/\">Read more<\/a><\/p>\n","protected":false},"author":115,"featured_media":12967,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":9,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[16],"tags":[5571,4446,7384],"collections":[],"class_list":{"0":"post-12966","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-big-data","8":"tag-big-data","9":"tag-data-analysis","10":"tag-data-mining","11":"entry"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v26.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>8 Techniques for Efficient Data Cleaning - Codemotion Magazine<\/title>\n<meta name=\"description\" content=\"In this article we&#039;ll learn what is data cleaning and why it represents a crucial part of any data scientist job.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"8 Techniques for Efficient Data Cleaning\" \/>\n<meta property=\"og:description\" content=\"In this article we&#039;ll learn what is data cleaning and why it represents a crucial part of any 
data scientist job.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/\" \/>\n<meta property=\"og:site_name\" content=\"Codemotion Magazine\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Codemotion.Italy\/\" \/>\n<meta property=\"article:published_time\" content=\"2020-12-31T08:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-01-05T19:03:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"675\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"John Allen\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:site\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"John Allen\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/\"},\"author\":{\"name\":\"John Allen\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/f4690aee511518093a50a9626a5d2051\"},\"headline\":\"8 Techniques for Efficient Data Cleaning\",\"datePublished\":\"2020-12-31T08:00:00+00:00\",\"dateModified\":\"2022-01-05T19:03:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/\"},\"wordCount\":2138,\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg\",\"keywords\":[\"Big Data\",\"Data Analysis\",\"Data Mining\"],\"articleSection\":[\"Big Data\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/\",\"name\":\"8 Techniques for Efficient Data Cleaning - Codemotion 
Magazine\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg\",\"datePublished\":\"2020-12-31T08:00:00+00:00\",\"dateModified\":\"2022-01-05T19:03:25+00:00\",\"description\":\"In this article we'll learn what is data cleaning and why it represents a crucial part of any data scientist job.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#primaryimage\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg\",\"width\":1200,\"height\":675},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI\/ML\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Big Data\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"8 Techniques for Efficient Data 
Cleaning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"name\":\"Codemotion Magazine\",\"description\":\"We code the future. Together\",\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\",\"name\":\"Codemotion\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"width\":225,\"height\":225,\"caption\":\"Codemotion\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/Codemotion.Italy\/\",\"https:\/\/x.com\/CodemotionIT\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/f4690aee511518093a50a9626a5d2051\",\"name\":\"John Allen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/a8648030260b2144249a568d2dd0a03a52098d197c00a3b3bdd978d53e68b077?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/a8648030260b2144249a568d2dd0a03a52098d197c00a3b3bdd978d53e68b077?s=96&d=mm&r=g\",\"caption\":\"John 
Allen\"},\"description\":\"John Allen is the \u201cBillion Dollar SEO,\u201d known for effectively scaling enterprise SEO teams. With over 14 years of experience and an extensive background in building and optimizing digital marketing programs he currently directs all SEO activity for RingCentral, a global UCaaS, VoIP, and call center solutions provider. He has written for websites such as Hubspot and Toolbox.\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/author\/john-allen\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"8 Techniques for Efficient Data Cleaning - Codemotion Magazine","description":"In this article we'll learn what is data cleaning and why it represents a crucial part of any data scientist job.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/","og_locale":"en_US","og_type":"article","og_title":"8 Techniques for Efficient Data Cleaning","og_description":"In this article we'll learn what is data cleaning and why it represents a crucial part of any data scientist job.","og_url":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/","og_site_name":"Codemotion Magazine","article_publisher":"https:\/\/www.facebook.com\/Codemotion.Italy\/","article_published_time":"2020-12-31T08:00:00+00:00","article_modified_time":"2022-01-05T19:03:25+00:00","og_image":[{"width":1200,"height":675,"url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg","type":"image\/jpeg"}],"author":"John Allen","twitter_card":"summary_large_image","twitter_creator":"@CodemotionIT","twitter_site":"@CodemotionIT","twitter_misc":{"Written by":"John Allen","Est. 
reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#article","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/"},"author":{"name":"John Allen","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/f4690aee511518093a50a9626a5d2051"},"headline":"8 Techniques for Efficient Data Cleaning","datePublished":"2020-12-31T08:00:00+00:00","dateModified":"2022-01-05T19:03:25+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/"},"wordCount":2138,"publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg","keywords":["Big Data","Data Analysis","Data Mining"],"articleSection":["Big Data"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/","url":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/","name":"8 Techniques for Efficient Data Cleaning - Codemotion Magazine","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#primaryimage"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg","datePublished":"2020-12-31T08:00:00+00:00","dateModified":"2022-01-05T19:03:25+00:00","description":"In this article we'll learn what is data cleaning and why it represents a crucial part of any data scientist 
job.","breadcrumb":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#primaryimage","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg","width":1200,"height":675},{"@type":"BreadcrumbList","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/data-cleaning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.codemotion.com\/magazine\/"},{"@type":"ListItem","position":2,"name":"AI\/ML","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/"},{"@type":"ListItem","position":3,"name":"Big Data","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/"},{"@type":"ListItem","position":4,"name":"8 Techniques for Efficient Data Cleaning"}]},{"@type":"WebSite","@id":"https:\/\/www.codemotion.com\/magazine\/#website","url":"https:\/\/www.codemotion.com\/magazine\/","name":"Codemotion Magazine","description":"We code the future. 
Together","publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.codemotion.com\/magazine\/#organization","name":"Codemotion","url":"https:\/\/www.codemotion.com\/magazine\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","width":225,"height":225,"caption":"Codemotion"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Codemotion.Italy\/","https:\/\/x.com\/CodemotionIT"]},{"@type":"Person","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/f4690aee511518093a50a9626a5d2051","name":"John Allen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/a8648030260b2144249a568d2dd0a03a52098d197c00a3b3bdd978d53e68b077?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a8648030260b2144249a568d2dd0a03a52098d197c00a3b3bdd978d53e68b077?s=96&d=mm&r=g","caption":"John Allen"},"description":"John Allen is the \u201cBillion Dollar SEO,\u201d known for effectively scaling enterprise SEO teams. With over 14 years of experience and an extensive background in building and optimizing digital marketing programs he currently directs all SEO activity for RingCentral, a global UCaaS, VoIP, and call center solutions provider. 
He has written for websites such as Hubspot and Toolbox.","url":"https:\/\/www.codemotion.com\/magazine\/author\/john-allen\/"}]}},"featured_image_src":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash-600x400.jpg","featured_image_src_square":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash-600x600.jpg","author_info":{"display_name":"John Allen","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/john-allen\/"},"uagb_featured_image_src":{"full":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg",1200,675,false],"thumbnail":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash-150x150.jpg",150,150,true],"medium":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash-300x169.jpg",300,169,true],"medium_large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash-768x432.jpg",768,432,true],"large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash-1024x576.jpg",1024,576,true],"1536x1536":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg",1200,675,false],"2048x2048":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg",1200,675,false],"small-home-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash.jpg",100,56,false],"sidebar-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash-180x128.jpg",180,128,true],"genesis-singular-images":["https:\/\/www.codemotion.com\/magazine\/wp-content\/u
ploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash-896x504.jpg",896,504,true],"archive-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash-400x225.jpg",400,225,true],"gb-block-post-grid-landscape":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash-600x400.jpg",600,400,true],"gb-block-post-grid-square":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/12\/campaign-creators-774sCXD0dDU-unsplash-600x600.jpg",600,600,true]},"uagb_author_info":{"display_name":"John Allen","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/john-allen\/"},"uagb_comment_info":0,"uagb_excerpt":"Data is an essential part of data analytics, data security, and data science. That\u2019s obvious.\u00a0Sometimes, however, that data can get a little dirty. No, not like in a gangster film. More like where suddenly we are having to deal with \u2018dirty data\u2019 after a hold up at a data centre. 
When there is a mistake&#8230;&hellip;","lang":"en","_links":{"self":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/12966","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/users\/115"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/comments?post=12966"}],"version-history":[{"count":7,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/12966\/revisions"}],"predecessor-version":[{"id":13181,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/12966\/revisions\/13181"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media\/12967"}],"wp:attachment":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media?parent=12966"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/categories?post=12966"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/tags?post=12966"},{"taxonomy":"collections","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/collections?post=12966"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}