How to resolve the algorithm Anagrams step by step in the C programming language

Problem Statement
Step by Step Solution
Sourcecode

Problem Statement

When two or more words are composed of the same characters, but in a different order, they are called anagrams. Using the word list at http://wiki.puzzlers.org/pub/wordlists/unixdict.txt, find the sets of words that share the same characters that contain the most words in them.

Let's start with the solution:

Step by Step solution about How to resolve the algorithm Anagrams step by step in the C programming language

The first snippet of code is written in C and it implements a function that takes a word and returns the word sorted alphabetically. The function sortedWord receives a word and a buffer wbuf and it returns the sorted word. First it copies the word into the buffer and then it gets the end of the word. Then it iterates over the word and swaps the characters that are not in alphabetical order. The code is not efficient because it iterates over the word multiple times.

The second snippet of code implements a hash function that takes a key and returns an integer. The hash function Str_Hash receives a key and a maximum index and it returns an integer. First it iterates over the key and gets the hash value for each character. Then it computes the hash value for the key and returns it.

The third snippet of code implements a data structure that represents a dictionary. The data structure DictWord represents a word in the dictionary and it contains the word and a pointer to the next word in the dictionary. The data structure HashEntry represents an entry in the hash table and it contains the key, a pointer to the next entry in the hash table, a pointer to the first word in the dictionary, a pointer to the next entry in the linked list of entries with the same key, and the number of words in the dictionary.

The fourth snippet of code implements a function that builds the anagrams for a dictionary. The function buildAnagrams receives a file and it returns the maximum number of anagrams for a word in the dictionary. First it iterates over the words in the file and for each word it computes the sorted word. Then it gets the hash value for the sorted word and gets the entry in the hash table for that hash value. If the entry does not exist, it creates a new entry and adds it to the hash table. Then it adds the word to the dictionary for that entry and increases the number of words in the dictionary for that entry. If the number of words in the dictionary for that entry is greater than the maximum number of anagrams for a word in the dictionary, it updates the maximum number of anagrams and sets the entry as the most permuted entry. Finally it returns the maximum number of anagrams for a word in the dictionary.

The fifth snippet of code implements a main function that calls the function buildAnagrams and prints the most permuted words in the dictionary. The function main receives no arguments and it returns 0. First it calls the function buildAnagrams and gets the maximum number of anagrams for a word in the dictionary. Then it opens a file for writing and for each entry in the linked list of most permuted entries, it prints the number of words in the dictionary for that entry and the words in the dictionary for that entry. Finally it closes the file and returns 0.

The sixth and seventh snippets of code implement two different approaches to find the longest list of anagrams in a dictionary. The first approach uses a hash table to store the anagrams and then sorts the hash table by the number of anagrams. The second approach uses a bubble sort to sort the anagrams. Both approaches are not efficient because they iterate over the dictionary multiple times.

Explanation: The provided code is written in C programming language and it performs the following tasks:

Word Sorting Function (sortedWord):
- This function takes a word as input and sorts its characters in ascending order.
- It returns the sorted word in a character buffer (wbuf).
Hash Function (Str_Hash):
- This function takes a string (key) and an index maximum value (ix_max) as input.
- It uses a custom hash function (based on a character mapping array cxmap) to convert the string to a unique integer hash value.
- This hash value is used to determine the index position in a hash table.
Data Structures:
- DictWord: Represents a word and its next pointer in a linked list.
- HashEntry: Represents an entry in the hash table, containing a key, next pointer, linked list of words (words), and a word count.
Hash Table:
- An array hashTable of size 8192 is used as a hash table.
- Each entry in the hash table (HashEntry) is a linked list of words that have the same sorted form.
buildAnagrams Function:
- This function takes an input file containing words.
- It reads the words from the file and processes each word as follows:
  - Sorts the word using the sortedWord function.
  - Calculates the hash value for the sorted word.
  - Finds the corresponding entry in the hash table (he) by iterating through the linked list of entries.
  - If he does not exist, it creates a new entry with the sorted word as the key and a linked list for words.
  - Adds the word to the linked list of words in the entry.
  - Updates the word count in the entry.
  - Tracks the entry with the maximum number of words (mostPerms).
Main Function (main):
- Calls the buildAnagrams function to build the hash table of anagrams.
- Iterates through the entries in the hash table that have the maximum number of words.
- For each entry, prints the sorted word and its corresponding words in a text file.

Alternative Solution: The code also includes an alternative solution using the following additional functions:

lst_cmp: Compares two kw_t structures based on their key.
sort_letters: Sorts the letters in a character array.

This alternative solution reads the words from the file and processes them as follows:

Sorts the letters in each word.
Creates an array of structures (list) containing the sorted word as the key and the original word.
Sorts the array of structures based on the sorted word.
Iterates through the sorted array and counts the number of repetitions for each sorted word.
Prints the longest sequence of words with the same sorted form.

Source code in the c programming language

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <time.h>

char *sortedWord(const char *word, char *wbuf)
{
    char *p1, *p2, *endwrd;
    char t;
    int swaps;

    strcpy(wbuf, word);
    endwrd = wbuf+strlen(wbuf);
    do {
       swaps = 0;
       p1 = wbuf; p2 = endwrd-1;
       while (p1<p2) {
          if (*p2 > *p1) {
             t = *p2; *p2 = *p1; *p1 = t;
             swaps = 1;
          }
          p1++; p2--;
       }
       p1 = wbuf; p2 = p1+1;
       while(p2 < endwrd) {
           if (*p2 > *p1) {
             t = *p2; *p2 = *p1; *p1 = t;
             swaps = 1;
           }
           p1++; p2++;
       }
    } while (swaps);
    return wbuf;
}

static
short cxmap[] = {
    0x06, 0x1f, 0x4d, 0x0c, 0x5c, 0x28, 0x5d, 0x0e, 0x09, 0x33, 0x31, 0x56,
    0x52, 0x19, 0x29, 0x53, 0x32, 0x48, 0x35, 0x55, 0x5e, 0x14, 0x27, 0x24,
    0x02, 0x3e, 0x18, 0x4a, 0x3f, 0x4c, 0x45, 0x30, 0x08, 0x2c, 0x1a, 0x03,
    0x0b, 0x0d, 0x4f, 0x07, 0x20, 0x1d, 0x51, 0x3b, 0x11, 0x58, 0x00, 0x49,
    0x15, 0x2d, 0x41, 0x17, 0x5f, 0x39, 0x16, 0x42, 0x37, 0x22, 0x1c, 0x0f,
    0x43, 0x5b, 0x46, 0x4b, 0x0a, 0x26, 0x2e, 0x40, 0x12, 0x21, 0x3c, 0x36,
    0x38, 0x1e, 0x01, 0x1b, 0x05, 0x4e, 0x44, 0x3d, 0x04, 0x10, 0x5a, 0x2a,
    0x23, 0x34, 0x25, 0x2f, 0x2b, 0x50, 0x3a, 0x54, 0x47, 0x59, 0x13, 0x57,
   };
#define CXMAP_SIZE (sizeof(cxmap)/sizeof(short))


int Str_Hash( const char *key, int ix_max )
{
   const char *cp;
   short mash;
   int  hash = 33501551;
   for (cp = key; *cp; cp++) {
      mash = cxmap[*cp % CXMAP_SIZE];
      hash = (hash >>4) ^ 0x5C5CF5C ^ ((hash<<1) + (mash<<5));
      hash &= 0x3FFFFFFF;
      }
   return  hash % ix_max;
}

typedef struct sDictWord  *DictWord;
struct sDictWord {
    const char *word;
    DictWord next;
};

typedef struct sHashEntry *HashEntry;
struct sHashEntry {
    const char *key;
    HashEntry next;
    DictWord  words;
    HashEntry link;
    short wordCount;
};

#define HT_SIZE 8192

HashEntry hashTable[HT_SIZE];

HashEntry mostPerms = NULL;

int buildAnagrams( FILE *fin )
{
    char buffer[40];
    char bufr2[40];
    char *hkey;
    int hix;
    HashEntry he, *hep;
    DictWord  we;
    int  maxPC = 2;
    int numWords = 0;
    
    while ( fgets(buffer, 40, fin)) {
        for(hkey = buffer; *hkey && (*hkey!='\n'); hkey++);
        *hkey = 0;
        hkey = sortedWord(buffer, bufr2);
        hix = Str_Hash(hkey, HT_SIZE);
        he = hashTable[hix]; hep = &hashTable[hix];
        while( he && strcmp(he->key , hkey) ) {
            hep = &he->next;
            he = he->next;
        }
        if ( ! he ) {
            he = malloc(sizeof(struct sHashEntry));
            he->next = NULL;
            he->key = strdup(hkey);
            he->wordCount = 0;
            he->words = NULL;
            he->link = NULL;
            *hep = he;
        }
        we = malloc(sizeof(struct sDictWord));
        we->word = strdup(buffer);
        we->next = he->words;
        he->words = we;
        he->wordCount++;
        if ( maxPC < he->wordCount) {
            maxPC = he->wordCount;
            mostPerms = he;
            he->link = NULL;
        }
        else if (maxPC == he->wordCount) {
            he->link = mostPerms;
            mostPerms = he;
        }
         
        numWords++;
    }
    printf("%d words in dictionary max ana=%d\n", numWords, maxPC);
    return maxPC;
}


int main( ) 
{
    HashEntry he;
    DictWord  we;
    FILE *f1;
    
    f1 = fopen("unixdict.txt","r");
    buildAnagrams(f1);
    fclose(f1);
    
    f1 = fopen("anaout.txt","w");
//    f1 = stdout;

    for (he = mostPerms; he; he = he->link) {
        fprintf(f1,"%d:", he->wordCount);
        for(we = he->words; we; we = we->next) {
            fprintf(f1,"%s, ", we->word);
        }
        fprintf(f1, "\n");
    }

    fclose(f1);
    return 0;
}


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <string.h>

typedef struct { const char *key, *word; int cnt; } kw_t;

int lst_cmp(const void *a, const void *b)
{
	return strcmp(((const kw_t*)a)->key, ((const kw_t*)b)->key);
}

/* Bubble sort.  Faster than stock qsort(), believe it or not */
void sort_letters(char *s)
{
	int i, j;
	char t;
	for (i = 0; s[i] != '\0'; i++) {
		for (j = i + 1; s[j] != '\0'; j++)
			if (s[j] < s[i]) {
				t = s[j]; s[j] = s[i]; s[i] = t;
			}
	}
}

int main()
{
	struct stat s;
	char *words, *keys;
	size_t i, j, k, longest, offset;
	int n_word = 0;
	kw_t *list;

	int fd = open("unixdict.txt", O_RDONLY);
	if (fd == -1) return 1;
	fstat(fd, &s);
	words = malloc(s.st_size * 2);
	keys  = words + s.st_size;

	read(fd, words, s.st_size);
	memcpy(keys, words, s.st_size);

	/* change newline to null for easy use; sort letters in keys */
	for (i = j = 0; i < s.st_size; i++) {
		if (words[i] == '\n') {
			words[i] = keys[i] = '\0';
			sort_letters(keys + j);
			j = i + 1;
			n_word ++;
		}
	}

	list = calloc(n_word, sizeof(kw_t));

	/* make key/word pointer pairs for sorting */
	for (i = j = k = 0; i < s.st_size; i++) {
		if (words[i] == '\0') {
			list[j].key = keys + k;
			list[j].word = words + k;
			k = i + 1;
			j++;
		}
	}

	qsort(list, n_word, sizeof(kw_t), lst_cmp);

	/* count each key's repetition */
	for (i = j = k = offset = longest = 0; i < n_word; i++) {
		if (!strcmp(list[i].key, list[j].key)) {
			++k;
			continue;
		}

		/* move current longest to begining of array */
		if (k < longest) {
			k = 0;
			j = i;
			continue;
		}

		if (k > longest) offset = 0;

		while (j < i) list[offset++] = list[j++];
		longest = k;
		k = 0;
	}

	/* show the longest */
	for (i = 0; i < offset; i++) {
		printf("%s ", list[i].word);
		if (i < n_word - 1 && strcmp(list[i].key, list[i+1].key))
			printf("\n");
	}

	/* free(list); free(words); */
	close(fd);
	return 0;
}

You may also check:How to resolve the algorithm Cyclotomic polynomial step by step in the Kotlin programming language
You may also check:How to resolve the algorithm Euler's constant 0.5772... step by step in the Phix programming language
You may also check:How to resolve the algorithm Take notes on the command line step by step in the Clojure programming language
You may also check:How to resolve the algorithm Top rank per group step by step in the E programming language
You may also check:How to resolve the algorithm Singly-linked list/Element definition step by step in the Tcl programming language

How to resolve the algorithm Anagrams step by step in the C programming language

Table of Contents

Problem Statement

Step by Step solution about How to resolve the algorithm Anagrams step by step in the C programming language

Source code in the c programming language