Sparse Matrix


A sparse matrix is a special type of matrix in which most of the elements are zero or have some default value. In contrast, a dense matrix is one in which most of the elements are non-zero. Sparse matrices are used in various fields, including computer science, numerical analysis, and scientific computing, when dealing with large datasets or systems where efficiency and memory usage are crucial. As a rule of thumb, if 2/3 of the total elements in a matrix are zeros, it can be called a sparse matrix. Using a sparse matrix representation — where only the non-zero values are stored — the space used for representing data and the time for scanning the matrix are reduced significantly.

Here's a detailed description of sparse matrices, including their representation and common operations:

Example 

Let's take the example of a movie recommendation system. There are millions of users and thousands of movies, so it's not possible for users to watch and rate all movies. This data can be represented as a matrix where the rows are the users, and the columns are the movies.

Most of the matrix elements will be empty, where the missing values will be replaced with zeros. Since a small percentage of the matrix has non-zero values, this matrix can be considered a sparse matrix. A small portion of the data is given below:

The sparsity of a matrix refers to the proportion of its elements that are zero or some other default value (i.e., the number of zero elements divided by the total number of elements in the matrix). It is often expressed as a percentage or a fraction. To compute the sparsity of a matrix, you can follow these steps:

Count the number of zero elements in the matrix.
Determine the total number of elements in the matrix.
Calculate the sparsity using the formula:

Sparsity (%) = (Number of Zero Elements / Total Number of Elements) * 100

In the above example sparsity=26/35=0.742

Note that most of the matrix elements will be zero since only non-zero values are important for data analysis. In addition to a large amount of space used, there will be a computational time problem because all elements will be scanned to access non-zero elements. This process yields a computational complexity problem.

To overcome these problems, we can use different data structures to represent a sparse matrix. the following is a 3 tuple representation.The first row of sparse matrix always specifies the number of rows, number of columns and number of non zero elements in the matrix


This format is particularly useful when you want a simple and compact representation of a sparse matrix. However, for more efficient operations like matrix-vector multiplication, you might want to convert it to other formats like CSR or CSC.

1. Representation of Sparse Matrices:

There are several ways to represent sparse matrices, with the choice depending on the specific application and the operations you want to perform efficiently. The most common representations include:

a. Compressed Sparse Row (CSR) Format: In this format, you store three arrays: - Data Array (A): Contains the non-zero elements of the matrix row by row. - Column Indices (JA): Stores the column index of each non-zero element corresponding to the data array. - Row Pointers (IA): Indicates the starting index in the data and column arrays for each row.

b. Compressed Sparse Column (CSC) Format: Similar to CSR, but it organizes the data by columns instead of rows.

c. List of Lists (LIL) Format: In this format, you store a list of lists, where each list corresponds to a row in the matrix. Each inner list contains the non-zero elements and their column indices.

d. Dictionary of Keys (DOK) Format: This format uses a dictionary-like structure, where the keys are pairs of row and column indices, and the values are the non-zero elements.

e. Coordinate List (COO) Format: In this format, you store three arrays: one for row indices, one for column indices, and one for non-zero values. Each entry in these arrays represents a non-zero element in the matrix.

CSR is often preferred for efficient matrix-vector multiplication, making it popular in numerical linear algebra and iterative solvers. CSC is similar to CSR but is used when column-wise operations are more common. It is also prevalent in numerical computing.COO (Coordinate List)  is a straightforward representation that is useful when the structure of the matrix is irregular or when you need to create or modify sparse matrices efficiently. It's a versatile format that's easy to work with for a wide range of operations.

2. Advantages of Sparse Matrices:

a. Reduced Memory Usage: Sparse matrices require significantly less memory compared to dense matrices since they only store non-zero elements.

b. Efficient for Matrix-Vector Multiplication: Many numerical algorithms, such as iterative solvers and eigenvalue calculations, involve matrix-vector multiplication. Sparse matrices can be multiplied with vectors more efficiently because you can skip the multiplication by zero elements.

3. Common Operations on Sparse Matrices:

a. Matrix-Vector Multiplication: This operation is crucial in many numerical algorithms, and sparse matrices excel in this regard.

b. Matrix-Matrix Multiplication: Sparse matrices can also be efficiently multiplied with other matrices, taking advantage of their sparsity patterns.

c. Matrix Transposition: Transposing a sparse matrix retains its sparsity structure, and there are efficient algorithms for transposing sparse matrices.

d. Matrix Factorization: Sparse matrices are used in various factorization techniques, such as LU decomposition, QR decomposition, and singular value decomposition (SVD).

e. Solving Linear Systems: Sparse matrices are often used to solve large systems of linear equations efficiently.

f. Eigenvalue and Eigenvector Calculations: Sparse matrices are involved in eigenvalue problems, where the eigenvalues and eigenvectors of a matrix are computed.

4. Applications of Sparse Matrices:

a. Scientific Computing: Sparse matrices are prevalent in numerical simulations, finite element analysis, and computational science.

b. Machine Learning: Sparse matrices are used in various machine learning algorithms, including sparse coding, dimensionality reduction, and some types of neural networks.

c. Network Analysis: Sparse matrices can represent adjacency matrices in graph theory and are used in network analysis and social network analysis.

d. Natural Language Processing: Sparse matrices can represent term-document matrices in text analysis tasks like document clustering and topic modeling.

e.Recommendation Systems: A sparse matrix can be employed to represent whether any particular user has watched any movie.

f.Market Basket Analysis: Since the number of purchased items is tiny compared to the number of non-purchased items, a sparse matrix is used to represent all products and customers.

In summary, sparse matrices are a valuable tool for efficiently handling large datasets and performing various mathematical operations, making them indispensable in a wide range of scientific and computational applications. The choice of representation and algorithm depends on the specific problem you are trying to solve and the characteristics of your data.

/*************************************************************/
Algorithm- 3 tuple COO Format Representation


Input:
matrix: The input dense matrix of size m rows by n columns.

Output:
coo_representation: A list of tuples representing the non-zero elements of the matrix in the format (row_index, column_index, value).

Algorithm:

1.Initialize an empty list coo_representation to store the three-tuple representation of the matrix.

2.Loop through each element in the input matrix:
For each element at row i and column j, check if it's non-zero (i.e., matrix[i][j] != 0).
If the element is non-zero, create a tuple (i, j, matrix[i][j]) and append it to coo_representation.

3.After processing all elements, coo_representation contains all non-zero elements in tuple form, which represents the sparse matrix.

4.Return coo_representation as the three-tuple representation of the matrix.

C Program- Sparse Matrix to 3 Tuple representation
/**********************************************************************************/
#include<stdio.h>
void read_sparsemat(int mat_normal[100][100],int r,int c)
{
    int i=0,j=0;
    for(i=0;i<r;i++)
    {
        for(j=0;j<c;j++)
        {
            scanf("%d",&mat_normal[i][j]);
        }
    }
}
void print_sparsemat(int mat_normal[100][100],int r,int c)
{
    int i=0,j=0;
    for(i=0;i<r;i++)
    {
        printf("\n");
        for(j=0;j<c;j++)
        {
            printf("%d  ",mat_normal[i][j]);
        }
    }
}
void conv_tuple(int mat_normal[100][100],int r,int c)
{
    //sparse  to Tuple Form Conversion
    int mat_tup[100][3];
    int i,j,cnz=0,tr=0,tc=0;
    for(i=0;i<r;i++)
    {
        for(j=0;j<c;j++)
        {
            if(mat_normal[i][j]!=0)
            {
                cnz++;
                mat_tup[cnz][0]=i;
                mat_tup[cnz][1]=j;
                mat_tup[cnz][2]=mat_normal[i][j];
            }
         }
    }
    mat_tup[0][0]=r;
    mat_tup[0][1]=c;
    mat_tup[0][2]=cnz;
    tr=cnz+1;
    tc=3;
    printf("\nrows columns values");
    for(i=0;i<tr;i++)
    {
        printf("\n");
        for(j=0;j<tc;j++)
        {
            printf("%d  ",mat_tup[i][j]);
        }
    }
  }
void main()
{
    int mat1[100][100],r1,c1,mat2[100][100],r2,c2;
    printf("\nFirst matrix:");
    printf("\nEnter the no. of rows:");
    scanf("%d",&r1);
    printf("\nEnter the no. of columns:");
    scanf("%d",&c1);
    printf("Enter the elements\n");
    read_sparsemat(mat1,r1,c1);
    printf("\nSecond matrix:");
    printf("\nEnter the no. of rows:");
    scanf("%d",&r2);
    printf("\nEnter the no. of columns:");
    scanf("%d",&c2);
    printf("Enter the elements\n");
    read_sparsemat(mat2,r2,c2);
    printf("\nThe first matrix is :");
    print_sparsemat(mat1,r1,c1);
    printf("\nThe second matrix is :");
    print_sparsemat(mat2,r2,c2);
    printf("\nFirst Matrix in tuple form:\n");
    conv_tuple(mat1,r1,c1);
    printf("\nSecond Matrix in tuple form:\n");
    conv_tuple(mat2,r2,c2);
 }

/************************************************/
Algorithm- Addition of sparse matrix

Input:
coo_A and coo_B: Sparse matrices represented in COO format.
num_rows and num_cols: Number of rows and columns in the matrices.

Output:
result: The resulting sparse matrix in COO format.

Algorithm:

1.Initialize an empty list result to store the sum of coo_A and coo_B.

2.Initialize variables i and j to 0. These variables will keep track of the current position in the coo_A and coo_B lists, respectively.

3.Loop while i is less than the length of coo_A and j is less than the length of coo_B.

4.Inside the loop, compare the row and column indices at positions i and j in coo_A and coo_B:

If the row and column indices are the same, it means both matrices have non-zero elements at that position. Add the values at those positions and store the result in the result list as a tuple (row_index, column_index, sum), where sum is the sum of the values. Increment both i and j.

If the row and column indices are different, determine which tuple has a lower row index and add that tuple to the result list. Increment the corresponding i or j variable.

5.Continue this loop until either i or j exceeds the length of the respective lists.

6.After the loop, if there are any remaining tuples in coo_A, append them to the result list.

7.Similarly, if there are any remaining tuples in coo_B, append them to the result list.

8.Return the result list as the sum of coo_A and coo_B in COO format.

Function to add two sparse matrix in 3tuple form
/***************************************************************/
void sum(int mat1[100][3],int mat2[100][3])
{
    int tr1,tr2,n=0,mat[100][3],i,j,k;
    
    if((mat1[0][0]==mat2[0][0]) && (mat1[0][1]==mat2[0][1]))
    {
        
        tr1=mat1[0][2];
        tr2=mat2[0][2];
        i=1;
        j=1;
        k=1;
        n=tr1+tr2;
        while((i<=tr1) &&(j<=tr2))
        {
            if((mat1[i][0]==mat2[j][0]) &&(mat1[i][1]==mat2[j][1]))
            {
                mat[k][0]=mat1[i][0];
                mat[k][1]=mat1[i][1];
                mat[k][2]=mat1[i][2]+mat2[j][2];
                i++;j++;k++;
            }
            
               else if(mat1[i][0]<mat2[j][0])
                {
                    mat[k][0]=mat1[i][0];
                mat[k][1]=mat1[i][1];
                mat[k][2]=mat1[i][2];
                i++;k++;
                }
                else
                {
                     mat[k][0]=mat2[j][0];
                mat[k][1]=mat2[j][1];
                mat[k][2]=mat2[j][2];
                j++;k++;
                }
                       
                   }
        while(i<=tr1)
        {
           mat[k][0]=mat1[i][0];
                mat[k][1]=mat1[i][1];
                mat[k][2]=mat1[i][2];
                i++;k++; 
        }
        while(j<=tr2)
        {
             mat[k][0]=mat2[j][0];
                mat[k][1]=mat2[j][1];
                mat[k][2]=mat2[j][2];
                j++;k++;
        }
        mat[0][0]=mat1[0][0];
        mat[0][1]=mat1[0][1];
        mat[0][2]=k-1;
        printf("\nThe sum is:\n");
         for(i=0;i<k;i++)
    {
        printf("\n");
        for(j=0;j<3;j++)
        {
            
            printf("%d  ",mat[i][j]);
        }
    }
    }
    else
    {
        printf("The matrix is incompatible for addition\n");
    }
}

/*************************************************************************/
Algorithm - Transpose

Input:
coo_matrix: The sparse matrix in COO format.
num_rows and num_cols: The number of rows and columns in the original matrix.

Output:
transpose_coo_matrix: The transpose of the input sparse matrix in COO format.

Algorithm:

1.Initialize an empty list transpose_coo_matrix to store the transpose of the input matrix.

2.Iterate through each element (row, col, value) in the coo_matrix:
    Swap the row and col values to transpose the element.
    Append the transposed element (col, row, value) to the transpose_coo_matrix.

3.After processing all elements, set the dimensions of the transpose_coo_matrix:
    The number of rows becomes num_cols.
    The number of columns becomes num_rows.
    The number of non-zero entries remains the same.

4.Return the transpose_coo_matrix as the transpose of the input matrix.

Function to find the Transpose using 3 tuple
/*****************************************************************/
void transpose(int mat_tup[100][3])
    {
    // Transpose
    int temp[100],i,j;
    int tr=mat_tup[0][2];
    for(i=0;i<tr;i++)
    {
        temp[i]=mat_tup[i][0];
    }
    for(i=0;i<tr;i++) // exchanging row and column indexes
    {
        mat_tup[i][0]= mat_tup[i][1];
         mat_tup[i][1]=temp[i];
    }
    
    printf("\nThe Transpose is\n");
    for(i=0;i<tr+1;i++)
    {
        printf("\n");
        for(j=0;j<3;j++)
        {
            printf("%d  ",mat_tup[i][j]);
        }
    }
 }

Comments

Popular posts from this blog

Data Structures CST 201 KTU Third Semester Syllabus Notes and Solved Questions- Dr Binu V P 984739060

Depth First Search DFS

Binary Search Tree ( BST) and operations