This week, we’ll be going over a quick way to determine what the most popular tags are in a series of tagged entries.
Files and Data Required for This Exercise
Files
- popular_tags.php — This will contain the dummy code and the function to process it
Data
- Tags — An array of comma-delimited strings containing tags
- Parsing Function — Function to break apart the array and determine which tags are the most popular
The Tag Sets
For this exercise, we’ll be using a comma-delimited string (i.e. “tag 1, tag 2, tag 3, etc.”) as a set, and we’ll assume that there are multiple entries, each with a set of tags generated by the user.
Pretty much every blog, bookmarking resource, and app these days features tagging in some capacity. Tags help users quickly identify what an entry is about, or what category an item belongs in.
Another benefit of using tags is that they allow us, as developers, to identify trends in entries. By determining what tags occur most frequently, we can let site visitors know what they’re most likely to find within a blog or collection of entries.
The Dummy Tag Sets for This Exercise
Normally, the tags would be stored in a database. However, for the sake of an easy-to-understand exercise, we’ll be using a plain array (the array will function just like the result set returned from a database query). In our new file for testing, called popular_tags.php, add our array, which should contain the following tag sets:
$tag_array = array(
    'php, regular expressions, search',
    'php, arrays, javascript',
    'php, search, javascript',
    'php, arrays',
    'arrays, mysql'
);
These are tag sets that might exist on this blog. In this case, they’re separated by commas, but they could be separated by anything, really.
The Parsing Function
This function is simple, but it has to do a lot. Let’s define the steps that need to be followed by this function, which we’ll call popularTags():
- Accept the array of tag sets as an argument
- Create a new array that will contain the processed tag information
- Loop through each set of tags
- Break each tag set apart at the commas and remove extra whitespace before and after the tags
- Loop through each separated tag individually
- Check if the current tag is already in the final array, and if so, increment the value of that tag by one; if not, create the tag in the array with a value of one
- Sort the array by occurrence, from highest to lowest, and return it
Step 1: Accept the Array of Tag Sets
To start, we need to declare our function and accept the array of tag sets as an argument. In popular_tags.php, let’s add the following:
function popularTags($tag_array) {
    // Process $tag_array
}
Step 2: Create a New Array to Store Processed Tag Information
To avoid a notice about undeclared variables, we need to instantiate the variable that will contain the new array of processed tags. We’ll call this array $popular_tags, and it will be initialized as an empty array.
This is going to be our return value. When the function finishes, we want this array to use each tag that occurred in the tag sets as a key, with a value that represents how many times that tag occurred in the supplied array. For example, if “tag1” occurs twice and “tag2” occurs once, $popular_tags would look like this:
Array
(
    [tag1] => 2
    [tag2] => 1
)
Update popularTags() to contain the following:
function popularTags($tag_array) {
    // Instantiate the final tag array
    $popular_tags = array();

    // Process $tag_array

    return $popular_tags;
}
Steps 3 & 4: Loop Through Each Tag Set and Separate by Commas
With our function declared and accepting an array, and a new array instantiated to contain our processed tags, we need to start processing the tags themselves.
To start, we need to access each tag set individually. The easiest way to do this is to run a foreach loop, giving us each element of the array individually:
function popularTags($tag_array) {
    // Instantiate the final tag array
    $popular_tags = array();

    // Loop through each set of tags
    foreach($tag_array as $tags) {
        // Process each tag set
    }

    // Return the array
    return $popular_tags;
}
Next, we need to separate each tag set into an array of the tags contained within it. Because we know the tag sets are comma delimited, we can use the handy explode() function to break the string into an array, using the comma as the breaking point. Then, to make sure all the extra leading and trailing whitespace is eliminated from each tag, we use array_map() to call the trim() function on each element we just created from the string.
We can do all of this in one line by inserting the following just inside our loop:
function popularTags($tag_array) {
    // Instantiate the final tag array
    $popular_tags = array();

    // Loop through each set of tags
    foreach($tag_array as $tags) {
        /*
         * Separate at the commas to get individual tags and
         * trim the whitespace from each tag
         */
        $tags_arr = array_map('trim', explode(',', $tags));

        // Process the tags in $tags_arr
    }

    // Return the array
    return $popular_tags;
}
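To see what that one line does in isolation, here is a minimal sketch that runs explode() and trim() as two separate steps on a single tag set. The uneven spacing in the sample string is an assumption added for illustration; it is not part of the dummy data above.

```php
<?php
// A single tag set with deliberately uneven spacing (illustrative only)
$tags = 'php,  regular expressions ,search';

// explode() splits at each comma but leaves the surrounding spaces in place
$raw = explode(',', $tags);
// $raw is now: ['php', '  regular expressions ', 'search']

// array_map() calls trim() on every element, removing the
// leading and trailing whitespace from each tag
$tags_arr = array_map('trim', $raw);
// $tags_arr is now: ['php', 'regular expressions', 'search']

print_r($tags_arr);
```

Note that trim() only touches the ends of each string, so the space inside “regular expressions” is preserved.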
Steps 5 & 6: Loop Through Each Tag and Check Against the Final Array
Now that we have each tag stored in an individual array element, we can start checking which tags are the most popular. To do this, we’re going to loop through each tag in the $tags_arr array. If the tag we’re currently dealing with has already been added to the $popular_tags array, we increment its value by one. Otherwise, we add a new array element to the $popular_tags array and set its value to 1. In other words, the first occurrence of a tag sets that tag’s array element to one, and each subsequent occurrence increments the value, effectively giving us a tag count.
function popularTags($tag_array) {
    // Instantiate the final tag array
    $popular_tags = array();

    // Loop through each set of tags
    foreach($tag_array as $tags) {
        /*
         * Separate at the commas to get individual tags and
         * trim the whitespace from each tag
         */
        $tags_arr = array_map('trim', explode(',', $tags));

        // Loop through each tag
        foreach($tags_arr as $tag) {
            /*
             * If the tag has already been added to the
             * $popular_tags array, increment its value by 1
             */
            if(array_key_exists($tag, $popular_tags)) {
                $popular_tags[$tag] += 1;
            }

            /*
             * Otherwise, add the tag to the array and
             * set its value to 1
             */
            else {
                $popular_tags[$tag] = 1;
            }
        }
    }

    // Return the array
    return $popular_tags;
}
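At this stage the counts are correct but not yet ordered. As a quick check, the following sketch runs the same counting logic on the dummy tag sets outside of the function and prints the intermediate result; the tags appear in the order they were first encountered, not by popularity.

```php
<?php
// The dummy tag sets from earlier in the exercise
$tag_array = array(
    'php, regular expressions, search',
    'php, arrays, javascript',
    'php, search, javascript',
    'php, arrays',
    'arrays, mysql'
);

$popular_tags = array();

foreach ($tag_array as $tags) {
    foreach (array_map('trim', explode(',', $tags)) as $tag) {
        if (array_key_exists($tag, $popular_tags)) {
            // Tag seen before: bump its count
            $popular_tags[$tag] += 1;
        } else {
            // First occurrence: start the count at 1
            $popular_tags[$tag] = 1;
        }
    }
}

// Unsorted counts, in first-seen order:
// php => 4, regular expressions => 1, search => 2,
// arrays => 3, javascript => 2, mysql => 1
print_r($popular_tags);
```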
Step 7: Sort the Array and Return It
Finally, we need to sort our array in descending order by occurrence, so that the tags that occur most often are at the top, while the less popular tags are at the bottom. There’s a very handy array-handling function called arsort() that does exactly what we need: it sorts an array by value in descending order while keeping each key attached to its value.
It’s worth noting that arsort() does not return the sorted array. Rather, it sorts the array in place and returns a boolean. Because of this, we cannot directly call return arsort($popular_tags);, as the function would then return TRUE (displayed as 1) instead of the expected array.
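A small sketch makes the difference visible. The three tag counts here are made-up values used only for illustration.

```php
<?php
// Made-up counts for illustration
$popular_tags = array('tag1' => 1, 'tag2' => 3, 'tag3' => 2);

// arsort() reorders $popular_tags itself and hands back a boolean
$result = arsort($popular_tags);

var_dump($result);      // bool(true), not the sorted array
print_r($popular_tags); // tag2 => 3, tag3 => 2, tag1 => 1
```

This is why the sort and the return need to be two separate statements.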
To sort our array, update popularTags() by adding the call to arsort() just above the return statement:
function popularTags($tag_array) {
    // Instantiate the final tag array
    $popular_tags = array();

    // Loop through each set of tags
    foreach($tag_array as $tags) {
        /*
         * Separate at the commas to get individual tags and
         * trim the whitespace from each tag
         */
        $tags_arr = array_map('trim', explode(',', $tags));

        // Loop through each tag
        foreach($tags_arr as $tag) {
            /*
             * If the tag has already been added to the
             * $popular_tags array, increment its value by 1
             */
            if(array_key_exists($tag, $popular_tags)) {
                $popular_tags[$tag] += 1;
            }

            /*
             * Otherwise, add the tag to the array and
             * set its value to 1
             */
            else {
                $popular_tags[$tag] = 1;
            }
        }
    }

    // Sort the tags in the array in descending order
    arsort($popular_tags);

    // Return the array
    return $popular_tags;
}
The Output
When we pass the dummy tag sets to popularTags() and display the result using print_r(), we’ll see the following (the order of tags with equal counts, such as javascript and search, can vary between PHP versions):
Array
(
    [php] => 4
    [arrays] => 3
    [javascript] => 2
    [search] => 2
    [mysql] => 1
    [regular expressions] => 1
)
To use this information in a script, we might want to display each category with its popularity in parentheses to give our users some idea of what the entries on the site usually pertain to.
Add the following output script to popular_tags.php:
foreach(popularTags($tag_array) as $tag=>$num) {
    echo $tag, " (", $num, ")<br />";
}
To see the result, open the file in a browser. We should see the following:
php (4)
arrays (3)
javascript (2)
search (2)
mysql (1)
regular expressions (1)
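On a site with many tags, we may only want to show the few most popular ones. One possible approach, sketched below, is to cut the sorted array down with array_slice(). The $limit variable is a hypothetical name introduced for this example, and the function is repeated in compact form so the snippet runs on its own.

```php
<?php
// Compact copy of popularTags() so this sketch is self-contained
function popularTags($tag_array) {
    $popular_tags = array();
    foreach ($tag_array as $tags) {
        foreach (array_map('trim', explode(',', $tags)) as $tag) {
            $popular_tags[$tag] = array_key_exists($tag, $popular_tags)
                ? $popular_tags[$tag] + 1
                : 1;
        }
    }
    arsort($popular_tags);
    return $popular_tags;
}

$tag_array = array(
    'php, regular expressions, search',
    'php, arrays, javascript',
    'php, search, javascript',
    'php, arrays',
    'arrays, mysql'
);

// Hypothetical setting: how many tags to display
$limit = 2;

// Keep only the first $limit entries of the sorted array;
// the fourth argument asks array_slice() to preserve the keys
$top_tags = array_slice(popularTags($tag_array), 0, $limit, true);

foreach ($top_tags as $tag => $num) {
    echo $tag, " (", $num, ")<br />";
}
// Displays "php (4)" and "arrays (3)"
```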
The Condensed Function
When the comments are removed and a couple of things are combined to save space, we can take popularTags() down to just 11 lines of code:
function popularTags($tag_array) {
    $p = array();
    foreach($tag_array as $tags) {
        $tags_arr = array_map('trim', explode(',', $tags));
        foreach($tags_arr as $tag) {
            $p[$tag] = array_key_exists($tag, $p) ? $p[$tag]+1 : 1;
        }
    }
    arsort($p);
    return $p;
}
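For comparison, here is one way the function might be shortened further by swapping the manual counting loop for PHP’s built-in array_count_values(). This is a different technique from the one built up in this exercise, and the name popularTagsShort() is made up for this sketch.

```php
<?php
// Alternative sketch: let array_count_values() do the tallying
function popularTagsShort($tag_array) {
    // Join every tag set into one long comma-delimited string,
    // then split it once and trim each tag
    $all_tags = array_map('trim', explode(',', implode(',', $tag_array)));

    // Count how many times each distinct tag appears
    $counts = array_count_values($all_tags);

    // Sort by count, highest first, keeping tags attached to their counts
    arsort($counts);
    return $counts;
}

$tag_array = array(
    'php, regular expressions, search',
    'php, arrays, javascript',
    'php, search, javascript',
    'php, arrays',
    'arrays, mysql'
);

$result = popularTagsShort($tag_array);
print_r($result);
```

One difference to be aware of: if $tag_array is empty, this version returns a single empty-string tag with a count of 1, whereas popularTags() returns an empty array.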
Summary
In this exercise, we learned how to break strings apart into arrays in order to determine the frequency with which certain tags or words are used.
Do you have a way to further compress this function? Let me know in the comments!