Home > . > enz_cent_RCMLib_dir_quatexelero.m

enz_cent_RCMLib_dir_quatexelero

PURPOSE ^

Builds Directed Enzyme-Enzyme Networks Removing Currency Metabolites (based-on a Library file), which could be used as an input for QuateXelero Algorithm.

SYNOPSIS ^

function [Output] = enz_cent_RCMLib_dir_quatexelero(fileName1,fileName2)

DESCRIPTION ^

 Builds Directed Enzyme-Enzyme Networks Removing Currency Metabolites (based-on a Library file), which could be used as an input for QuateXelero Algorithm.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 The function reads a Metabolic Network SBML file, 
 and builds Directed Enzyme-Enzyme Networks which is compatible with QuateXelero Algorithm.
 The QuateXelero is one of the best motif finding algorithms which is recently developed by Kavosh developer team.
 http://lbb.ut.ac.ir/Download/LBBsoft/QuateXelero
 So, one could easily use this algorithm in order to find motifs in different sizes for the metabolic network.
 The Remove Currency Metabolites based-on Library (RCMLib) algorithm removes currency metabolites 
 in the metabolic network automatically IF AND ONLY IF the currency metabolits exists in the Library file.
 Note: COBRA Toolbox must be installed in MATLAB before running this function

 [Output] = enz_cent_RCMLib_dir_quatexelero(fileName1,fileName2)

INPUTS
 fileName1                                   The Library file includes pre-defined currency metabolites (in .txt format)
 Note: Library text file must include one metabolites per line (all in one column) 
 fileName2                                   The metabolic Network in the SBML format
 
OUTPUTS
 *_Enzyme_Cent_RCMLib_Dir_Index.dat          Matrix Indeces of Enzyme-Enzyme Connections 
 *_Enzyme_Cent_RCMLib_Dir_QuateXelero.dat    Directed-Enzyme-Enzyme Network - QuateXelero Compatible
 
 Yazdan Asgari 12/07/2012                 http://lbb.ut.ac.ir
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [Output] = enz_cent_RCMLib_dir_quatexelero(fileName1,fileName2)
0002 % Builds Directed Enzyme-Enzyme Networks Removing Currency Metabolites (based-on a Library file), which could be used as an input for QuateXelero Algorithm.
0003 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0004 % The function reads a Metabolic Network SBML file,
0005 % and builds Directed Enzyme-Enzyme Networks which is compatible with QuateXelero Algorithm.
0006 % The QuateXelero is one of the best motif finding algorithms which is recently developed by Kavosh developer team.
0007 % http://lbb.ut.ac.ir/Download/LBBsoft/QuateXelero
0008 % So, one could easily use this algorithm in order to find motifs in different sizes for the metabolic network.
0009 % The Remove Currency Metabolites based-on Library (RCMLib) algorithm removes currency metabolites
0010 % in the metabolic network automatically IF AND ONLY IF the currency metabolits exists in the Library file.
0011 % Note: COBRA Toolbox must be installed in MATLAB before running this function
0012 %
0013 % [Output] = enz_cent_RCMLib_dir_quatexelero(fileName1,fileName2)
0014 %
0015 %INPUTS
0016 % fileName1                                   The Library file includes pre-defined currency metabolites (in .txt format)
0017 % Note: Library text file must include one metabolites per line (all in one column)
0018 % fileName2                                   The metabolic Network in the SBML format
0019 %
0020 %OUTPUTS
0021 % *_Enzyme_Cent_RCMLib_Dir_Index.dat          Matrix Indeces of Enzyme-Enzyme Connections
0022 % *_Enzyme_Cent_RCMLib_Dir_QuateXelero.dat    Directed-Enzyme-Enzyme Network - QuateXelero Compatible
0023 %
0024 % Yazdan Asgari 12/07/2012                 http://lbb.ut.ac.ir
0025 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0026 
0027 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0028 % check validity of input files format
0029 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0030 check1=regexp(fileName1,'.txt');
0031 assert(~isempty(check1),'Error in the first input: The fileName1 must contain .txt at its end')
0032 check2=regexp(fileName2,'.xml');
0033 assert(~isempty(check2),'Error in the second input: The fileName2 must contain .xml at its end')
0034 
0035 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0036 % start time evaluation of program
0037 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0038 tic;
0039 
0040 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0041 % reading the Library text file and construct array of currency metabolites
0042 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0043 fid = fopen(fileName1);
0044 tline = fgetl(fid);
0045 i=1;
0046 Curr_met={};
0047 while ischar(tline)
0048     Curr_met{i,1}=tline;
0049     tline = fgetl(fid);
0050     i=i+1;
0051 end
0052 fclose(fid);
0053 [h,g]=size(Curr_met);
0054 
0055 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0056 % reading the SBML file using COBRA Toolbox Command, and sets size of the S matrix
0057 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0058 model=readCbModel(fileName2);
0059 [m,n]=size(model.S);
0060 
0061 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0062 % calculate summation of each columns (i.e. How many metabolites each enzyme correlates)
0063 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0064 S_bin=zeros(size(model.S));
0065 S_bin(find(model.S))=1;
0066 CB=sum(S_bin,1);
0067 A=zeros(m,n);
0068 B=zeros(m,1);
0069 N3=zeros(m,1);
0070 N_curr=zeros(m,1);
0071 
0072 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0073 % reading the Metabolites array and check their availability in the library text file
0074 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0075 for q=1:m
0076     for i=1:h
0077         if strcmp(model.metNames{q},Curr_met{i,1})==1
0078             N_curr(q,1)=N_curr(q,1)+1;
0079         end
0080     end
0081 end
0082 
0083 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0084 % for each binary S-matrix element, subtracts its value from the column summation and put the result in the A matrix.
0085 % A(q) means the metabolite q connects to how many other metabolites through the enzyme i.
0086 % for each row, sums the binary S-matrix over all columns.
0087 % B(q) means how many enzymes the metabolite q correlates.
0088 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0089 for q=1:m
0090     for i=1:n
0091         if S_bin(q,i)~=0
0092             A(q,i)=CB(1,i)-S_bin(q,i);
0093         end
0094         B(q,1)=B(q,1)+S_bin(q,i);
0095     end
0096 end
0097 
0098 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0099 % Assumption: Generally, every metabolite is connected to the other one through a specific enzyme.
0100 % If a metabolite connects to more than one metabolite through an enzyme, this will be considered as a suspicious case.
0101 % Therefore, every N3(q) value equal to 3 will be marked for further analysis.
0102 % In addition, the availability of the metabolite in the library file will be checked.
0103 % So, the metabolites which do not exist in the library file, will not select for further analysis.
0104 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0105 for q=1:m
0106     for i=1:n
0107         if A(q,i)==3 && N_curr(q,1)~=0
0108             N3(q,1)=N3(q,1)+1;
0109         end
0110     end
0111 end
0112 
0113 s=0;
0114 for q=1:m
0115     if N3(q,1)~=0
0116         s=1;
0117     end
0118 end
0119 
0120 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0121 % If there is any value for N3 array, the RCM algorithm will be done.
0122 % This algorithm will be deleted the most probable metabolite among all (i.e. the one with the maximum value of N3 and C)
0123 % The selected metabolite will be deleted in the binary S-Matrix, and the "WHILE LOOP" repeated.
0124 % The algorithm is ended if there is not any suspicious case.
0125 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0126 while s==1
0127     C=zeros(m,1);
0128     max1=max(N3,[],1);
0129     for q=1:m
0130         if N3(q,1)==max1
0131             C(q,1)=B(q,1);
0132         else
0133             C(q,1)=0;
0134         end
0135     end
0136     max2=max(C,[],1);
0137     for q=1:m
0138         if ( (N3(q,1)==max1) && (C(q,1)==max2) )
0139             for i=1:n
0140                 S_bin(q,i)=0;
0141                 model.S(q,i)=0;
0142             end
0143         end
0144     end
0145     
0146     CB=sum(S_bin,1);
0147     A=zeros(m,n);
0148     B=zeros(m,1);
0149     N3=zeros(m,1);
0150     for q=1:m
0151         for i=1:n
0152             if S_bin(q,i)~=0
0153                 A(q,i)=CB(1,i)-S_bin(q,i);
0154             end
0155             B(q,1)=B(q,1)+S_bin(q,i);
0156         end
0157     end
0158     for q=1:m
0159         for i=1:n
0160             if A(q,i)==3 && N_curr(q,1)~=0
0161                 N3(q,1)=N3(q,1)+1;
0162             end
0163         end
0164     end
0165     s=0;
0166     for q=1:m
0167         if N3(q,1)~=0
0168             s=1;
0169         end
0170     end
0171 end
0172 
0173 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0174 % building the output file name
0175 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0176 outname1=strrep(fileName2,'.xml','_Enzyme_Cent_RCMLib_Dir_Index.dat')
0177 fout1 = fopen(outname1, 'w+');
0178 
0179 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0180 % finds non-zero elements of the S-matrix (in order to make the algorithm faster),
0181 % parses through each row, and considers an edge for every unlike-signs,
0182 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0183 num=zeros(size(model.rxns));
0184 for j=1:m
0185     indices=find(model.S(j,:));
0186     [a,b]=size(indices);
0187     r=0;
0188     if b~=0
0189         r=1;
0190     end 
0191     while r<b
0192         i=1;
0193         while i<(b-r+1)
0194             if model.S(j,indices(1,r))<0 && model.S(j,indices(1,r+i))>0
0195                 fprintf(fout1,'%d\t%d\n',indices(1,r),indices(1,r+i));
0196                 num(1,indices(1,r))=1;
0197                 num(1,indices(1,r+i))=1;
0198             elseif model.S(j,indices(1,r))>0 && model.S(j,indices(1,r+i))<0
0199                 fprintf(fout1,'%d\t%d\n',indices(1,r+i),indices(1,r));
0200                 num(1,indices(1,r))=1;
0201                 num(1,indices(1,r+i))=1;
0202             end
0203             i=i+1;
0204         end
0205         r=r+1;
0206     end
0207 end
0208 
0209 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0210 % considering nodes which do not contain any edges
0211 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0212 for k=1:n
0213     if num(k,1)==0
0214         fprintf(fout1,'%d\n',k);
0215     end
0216 end
0217 fclose(fout1);
0218 
0219 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0220 % building the output file name
0221 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0222 outname2=strrep(fileName2,'.xml','_Enzyme_Cent_RCMLib_Dir_QuateXelero.dat')    
0223 fout2=fopen(outname2,'w+');
0224 
0225 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0226 % reading the constructed Enzyme-Enzyme network file and re-format it to a Kavosh-compatible file.
0227 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0228 fid = fopen(outname1);
0229 fgetl(fid);
0230 C=fscanf(fid,'%d');
0231 i=1;
0232 while isinteger(fid)
0233     C(i)=fscanf(fid,'%d',C);
0234     i=i+1;
0235 end
0236 g=size(C);
0237 A=size(unique(C));
0238 if g~=0
0239     n=1;
0240 else
0241     disp('Error in reading the file, No Edge detected')
0242 end
0243 k=1;
0244 j=2;
0245 last=g/2;
0246 fprintf(fout2,'%d\n',A(1,1));   % total number of uniques nodes in the network (needed for Kavosh Algorithm)
0247 for i=1:last
0248     fprintf(fout2,'%d\t%d\n ',C(k),C(j));
0249     k=k+2;
0250     j=j+2;
0251 end
0252 fclose(fid);
0253 fclose(fout2);
0254 
0255 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0256 % End of time evaluation of program
0257 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0258 toc;

Generated on Thu 13-Dec-2012 14:17:37 by m2html © 2005