Builds Directed Enzyme-Enzyme Networks Removing Currency Metabolites which could be used as an input for QuateXelero Algorithm. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% The function reads a Metabolic Network SBML file, and builds Directed Enzyme-Enzyme Networks which is compatible with QuateXelero Algorithm. The QuateXelero is one of the best motif finding algorithms which is recently developed by Kavosh developer team. http://lbb.ut.ac.ir/Download/LBBsoft/QuateXelero So, one could easily use this algorithm in order to find motifs in different sizes for the metabolic network. The Remove Currency Metabolites (RCM) algorithm removes currency metabolites in the metabolic network automatically. Note: COBRA Toolbox must be installed in MATLAB before running this function [Output] = enz_cent_RCM_dir_quatexelero(fileName) INPUTS fileName The metabolic Network in the SBML format OUTPUTS *_Enzyme_Cent_RCM_Dir_Index.dat Matrix Indeces of Enzyme-Enzyme Connections *_Enzyme_Cent_RCM_Dir_QuateXelero.dat Directed-Enzyme-Enzyme Network - QuateXelero Compatible Yazdan Asgari 12/07/2012 http://lbb.ut.ac.ir %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0001 function [Output] = enz_cent_RCM_dir_quatexelero(fileName) 0002 % Builds Directed Enzyme-Enzyme Networks Removing Currency Metabolites which could be used as an input for QuateXelero Algorithm. 0003 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0004 % The function reads a Metabolic Network SBML file, 0005 % and builds Directed Enzyme-Enzyme Networks which is compatible with QuateXelero Algorithm. 0006 % The QuateXelero is one of the best motif finding algorithms which is recently developed by Kavosh developer team. 0007 % http://lbb.ut.ac.ir/Download/LBBsoft/QuateXelero 0008 % So, one could easily use this algorithm in order to find motifs in different sizes for the metabolic network. 0009 % The Remove Currency Metabolites (RCM) algorithm removes currency metabolites in the metabolic network automatically. 0010 % Note: COBRA Toolbox must be installed in MATLAB before running this function 0011 % 0012 % [Output] = enz_cent_RCM_dir_quatexelero(fileName) 0013 % 0014 %INPUTS 0015 % fileName The metabolic Network in the SBML format 0016 % 0017 %OUTPUTS 0018 % *_Enzyme_Cent_RCM_Dir_Index.dat Matrix Indeces of Enzyme-Enzyme Connections 0019 % *_Enzyme_Cent_RCM_Dir_QuateXelero.dat Directed-Enzyme-Enzyme Network - QuateXelero Compatible 0020 % 0021 % Yazdan Asgari 12/07/2012 http://lbb.ut.ac.ir 0022 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0023 0024 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0025 % check validity of input file format 0026 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0027 check=regexp(fileName,'.xml'); 0028 assert(~isempty(check),'The SBML fileName must contain .xml at its end') 0029 0030 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0031 % start time evaluation of program 0032 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0033 tic; 0034 0035 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0036 % reading the SBML file using COBRA Toolbox Command, and sets size of the S matrix 0037 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0038 model=readCbModel(fileName); 0039 [m,n]=size(model.S); 0040 0041 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0042 % calculate summation of each columns (i.e. How many metabolites each enzyme correlates) 0043 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0044 S_bin=zeros(size(model.S)); 0045 S_bin(find(model.S))=1; 0046 CB=sum(S_bin,1); 0047 A=zeros(m,n); 0048 B=zeros(m,1); 0049 N3=zeros(m,1); 0050 0051 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0052 % for each binary S-matrix element, subtracts its value from the column summation and put the result in the A matrix. 0053 % A(q) means the metabolite q connects to how many other metabolites through the enzyme i. 0054 % for each row, sums the binary S-matrix over all columns. 0055 % B(q) means how many enzymes the metabolite q correlates. 0056 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0057 for q=1:m 0058 for i=1:n 0059 if S_bin(q,i)~=0 0060 A(q,i)=CB(1,i)-S_bin(q,i); 0061 end 0062 B(q,1)=B(q,1)+S_bin(q,i); 0063 end 0064 end 0065 0066 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0067 % Assumption: Generally, every metabolite is connected to the other one through a specific enzyme. 0068 % If a metabolite connects to more than one metabolite through an enzyme, this will be considered as a suspicious case. 0069 % Therefore, every N3(q) value equal to 3 will be marked for further analysis. 0070 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0071 for q=1:m 0072 for i=1:n 0073 if A(q,i)==3 0074 N3(q,1)=N3(q,1)+1; 0075 end 0076 end 0077 end 0078 0079 s=0; 0080 for q=1:m 0081 if N3(q,1)~=0 0082 s=1; 0083 end 0084 end 0085 0086 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0087 % If there is any value for N3 array, the RCM algorithm will be done. 0088 % This algorithm will be deleted the most probable metabolite among all (i.e. the one with the maximum value of N3 and C) 0089 % The selected metabolite will be deleted in the binary S-Matrix, and the "WHILE LOOP" repeated. 0090 % The algorithm is ended if there is not any suspicious case. 0091 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0092 while s==1 0093 C=zeros(m,1); 0094 max1=max(N3,[],1); 0095 for q=1:m 0096 if N3(q,1)==max1 0097 C(q,1)=B(q,1); 0098 else 0099 C(q,1)=0; 0100 end 0101 end 0102 max2=max(C,[],1); 0103 for q=1:m 0104 if ( (N3(q,1)==max1) && (C(q,1)==max2) ) 0105 for i=1:n 0106 S_bin(q,i)=0; 0107 model.S(q,i)=0; 0108 end 0109 end 0110 end 0111 CB=sum(S_bin,1); 0112 A=zeros(m,n); 0113 B=zeros(m,1); 0114 N3=zeros(m,1); 0115 for q=1:m 0116 for i=1:n 0117 if S_bin(q,i)~=0 0118 A(q,i)=CB(1,i)-S_bin(q,i); 0119 end 0120 B(q,1)=B(q,1)+S_bin(q,i); 0121 end 0122 end 0123 for q=1:m 0124 for i=1:n 0125 if A(q,i)==3 0126 N3(q,1)=N3(q,1)+1; 0127 end 0128 end 0129 end 0130 s=0; 0131 for q=1:m 0132 if N3(q,1)~=0 0133 s=1; 0134 end 0135 end 0136 end 0137 0138 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0139 % building the output file name 0140 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0141 outname1=strrep(fileName,'.xml','_Enzyme_Cent_RCM_Dir_Index.dat') 0142 fout1 = fopen(outname1, 'w+'); 0143 0144 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0145 % finds non-zero elements of the S-matrix (in order to make the algorithm faster), 0146 % parses through each row, and considers an edge for every unlike-signs, 0147 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0148 num=zeros(size(model.rxns)); 0149 for j=1:m 0150 indices=find(model.S(j,:)); 0151 [a,b]=size(indices); 0152 r=0; 0153 if b~=0 0154 r=1; 0155 end 0156 while r<b 0157 i=1; 0158 while i<(b-r+1) 0159 if model.S(j,indices(1,r))<0 && model.S(j,indices(1,r+i))>0 0160 fprintf(fout1,'%d\t%d\n',indices(1,r),indices(1,r+i)); 0161 num(1,indices(1,r))=1; 0162 num(1,indices(1,r+i))=1; 0163 elseif model.S(j,indices(1,r))>0 && model.S(j,indices(1,r+i))<0 0164 fprintf(fout1,'%d\t%d\n',indices(1,r+i),indices(1,r)); 0165 num(1,indices(1,r))=1; 0166 num(1,indices(1,r+i))=1; 0167 end 0168 i=i+1; 0169 end 0170 r=r+1; 0171 end 0172 end 0173 0174 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0175 % considering nodes which do not contain any edges 0176 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0177 for k=1:n 0178 if num(k,1)==0 0179 fprintf(fout1,'%d\n',k); 0180 end 0181 end 0182 fclose(fout1); 0183 0184 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0185 % building the output file name 0186 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0187 outname2=strrep(fileName,'.xml','_Enzyme_Cent_RCM_Dir_QuateXelero.dat') 0188 fout2=fopen(outname2,'w+'); 0189 0190 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0191 % reading the constructed Enzyme-Enzyme network file and re-format it to a Kavosh-compatible file. 0192 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0193 fid = fopen(outname1); 0194 fgetl(fid); 0195 C=fscanf(fid,'%d'); 0196 i=1; 0197 while isinteger(fid) 0198 C(i)=fscanf(fid,'%d',C); 0199 i=i+1; 0200 end 0201 g=size(C); 0202 A=size(unique(C)); 0203 if g~=0 0204 n=1; 0205 else 0206 disp('Error in reading the file, No Edge detected') 0207 end 0208 k=1; 0209 j=2; 0210 last=g/2; 0211 fprintf(fout2,'%d\n',A(1,1)); % total number of uniques nodes in the network (needed for Kavosh Algorithm) 0212 for i=1:last 0213 fprintf(fout2,'%d\t%d\n ',C(k),C(j)); 0214 k=k+2; 0215 j=j+2; 0216 end 0217 fclose(fid); 0218 fclose(fout2); 0219 0220 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0221 % End of time evaluation of program 0222 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0223 toc;