Builds Directed Enzyme-Enzyme Networks Removing Currency Metabolites (based-on a Library file), which could be used as an input for Kavosh Algorithm. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% The function reads a Metabolic Network SBML file, and builds Directed Enzyme-Enzyme Networks which is compatible with Kavosh Algorithm. The Kavosh is one of the best motif finding algorithms. Its Cytoscape plugins is also called CytoKavosh. http://lbb.ut.ac.ir/Download/LBBsoft/Kavosh/ & http://www.ncbi.nlm.nih.gov/pubmed/19799800 http://lbb.ut.ac.ir/Download/LBBsoft/CytoKavosh/CytoKavosh-Manual/cytoKavoshTutorial.html So, one could easily use this algorithm in order to find motifs in different sizes for the metabolic network. The Remove Currency Metabolites based-on Library (RCMLib) algorithm removes currency metabolites in the metabolic network automatically IF AND ONLY IF the currency metabolits exists in the Library file. Note: COBRA Toolbox must be installed in MATLAB before running this function [Output] = enz_cent_RCMLib_dir_kavosh(fileName1,fileName2) INPUTS fileName1 The Library file includes pre-defined currency metabolites (in .txt format) Note: Library text file must include one metabolites per line (all in one column) fileName2 The metabolic Network in the SBML format OUTPUTS *_Enzyme_Cent_RCMLib_Dir_Index.dat Matrix Indeces of Enzyme-Enzyme Connections *_Enzyme_Cent_RCMLib_Dir_Kavosh.dat Directed-Enzyme-Enzyme Network - Kavosh Compatible Yazdan Asgari 12/07/2012 http://lbb.ut.ac.ir %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0001 function [Output] = enz_cent_RCMLib_dir_kavosh(fileName1,fileName2) 0002 % Builds Directed Enzyme-Enzyme Networks Removing Currency Metabolites (based-on a Library file), which could be used as an input for Kavosh Algorithm. 0003 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0004 % The function reads a Metabolic Network SBML file, 0005 % and builds Directed Enzyme-Enzyme Networks which is compatible with Kavosh Algorithm. 0006 % The Kavosh is one of the best motif finding algorithms. Its Cytoscape plugins is also called CytoKavosh. 0007 % http://lbb.ut.ac.ir/Download/LBBsoft/Kavosh/ & http://www.ncbi.nlm.nih.gov/pubmed/19799800 0008 % http://lbb.ut.ac.ir/Download/LBBsoft/CytoKavosh/CytoKavosh-Manual/cytoKavoshTutorial.html 0009 % So, one could easily use this algorithm in order to find motifs in different sizes for the metabolic network. 0010 % The Remove Currency Metabolites based-on Library (RCMLib) algorithm removes currency metabolites 0011 % in the metabolic network automatically IF AND ONLY IF the currency metabolits exists in the Library file. 0012 % Note: COBRA Toolbox must be installed in MATLAB before running this function 0013 % 0014 % [Output] = enz_cent_RCMLib_dir_kavosh(fileName1,fileName2) 0015 % 0016 %INPUTS 0017 % fileName1 The Library file includes pre-defined currency metabolites (in .txt format) 0018 % Note: Library text file must include one metabolites per line (all in one column) 0019 % fileName2 The metabolic Network in the SBML format 0020 % 0021 %OUTPUTS 0022 % *_Enzyme_Cent_RCMLib_Dir_Index.dat Matrix Indeces of Enzyme-Enzyme Connections 0023 % *_Enzyme_Cent_RCMLib_Dir_Kavosh.dat Directed-Enzyme-Enzyme Network - Kavosh Compatible 0024 % 0025 % Yazdan Asgari 12/07/2012 http://lbb.ut.ac.ir 0026 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0027 0028 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0029 % check validity of input files format 0030 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0031 check1=regexp(fileName1,'.txt'); 0032 assert(~isempty(check1),'Error in the first input: The fileName1 must contain .txt at its end') 0033 check2=regexp(fileName2,'.xml'); 0034 assert(~isempty(check2),'Error in the second input: The fileName2 must contain .xml at its end') 0035 0036 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0037 % start time evaluation of program 0038 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0039 tic; 0040 0041 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0042 % reading the Library text file and construct array of currency metabolites 0043 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0044 fid = fopen(fileName1); 0045 tline = fgetl(fid); 0046 i=1; 0047 Curr_met={}; 0048 while ischar(tline) 0049 Curr_met{i,1}=tline; 0050 tline = fgetl(fid); 0051 i=i+1; 0052 end 0053 fclose(fid); 0054 [h,g]=size(Curr_met); 0055 0056 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0057 % reading the SBML file using COBRA Toolbox Command, and sets size of the S matrix 0058 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0059 model=readCbModel(fileName2); 0060 [m,n]=size(model.S); 0061 0062 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0063 % calculate summation of each columns (i.e. How many metabolites each enzyme correlates) 0064 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0065 S_bin=zeros(size(model.S)); 0066 S_bin(find(model.S))=1; 0067 CB=sum(S_bin,1); 0068 A=zeros(m,n); 0069 B=zeros(m,1); 0070 N3=zeros(m,1); 0071 N_curr=zeros(m,1); 0072 0073 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0074 % reading the Metabolites array and check their availability in the library text file 0075 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0076 for q=1:m 0077 for i=1:h 0078 if strcmp(model.metNames{q},Curr_met{i,1})==1 0079 N_curr(q,1)=N_curr(q,1)+1; 0080 end 0081 end 0082 end 0083 0084 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0085 % for each binary S-matrix element, subtracts its value from the column summation and put the result in the A matrix. 0086 % A(q) means the metabolite q connects to how many other metabolites through the enzyme i. 0087 % for each row, sums the binary S-matrix over all columns. 0088 % B(q) means how many enzymes the metabolite q correlates. 0089 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0090 for q=1:m 0091 for i=1:n 0092 if S_bin(q,i)~=0 0093 A(q,i)=CB(1,i)-S_bin(q,i); 0094 end 0095 B(q,1)=B(q,1)+S_bin(q,i); 0096 end 0097 end 0098 0099 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0100 % Assumption: Generally, every metabolite is connected to the other one through a specific enzyme. 0101 % If a metabolite connects to more than one metabolite through an enzyme, this will be considered as a suspicious case. 0102 % Therefore, every N3(q) value equal to 3 will be marked for further analysis. 0103 % In addition, the availability of the metabolite in the library file will be checked. 0104 % So, the metabolites which do not exist in the library file, will not select for further analysis. 0105 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0106 for q=1:m 0107 for i=1:n 0108 if A(q,i)==3 && N_curr(q,1)~=0 0109 N3(q,1)=N3(q,1)+1; 0110 end 0111 end 0112 end 0113 0114 s=0; 0115 for q=1:m 0116 if N3(q,1)~=0 0117 s=1; 0118 end 0119 end 0120 0121 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0122 % If there is any value for N3 array, the RCM algorithm will be done. 0123 % This algorithm will be deleted the most probable metabolite among all (i.e. the one with the maximum value of N3 and C) 0124 % The selected metabolite will be deleted in the binary S-Matrix, and the "WHILE LOOP" repeated. 0125 % The algorithm is ended if there is not any suspicious case. 0126 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0127 while s==1 0128 C=zeros(m,1); 0129 max1=max(N3,[],1); 0130 for q=1:m 0131 if N3(q,1)==max1 0132 C(q,1)=B(q,1); 0133 else 0134 C(q,1)=0; 0135 end 0136 end 0137 max2=max(C,[],1); 0138 for q=1:m 0139 if ( (N3(q,1)==max1) && (C(q,1)==max2) ) 0140 for i=1:n 0141 S_bin(q,i)=0; 0142 model.S(q,i)=0; 0143 end 0144 end 0145 end 0146 0147 CB=sum(S_bin,1); 0148 A=zeros(m,n); 0149 B=zeros(m,1); 0150 N3=zeros(m,1); 0151 for q=1:m 0152 for i=1:n 0153 if S_bin(q,i)~=0 0154 A(q,i)=CB(1,i)-S_bin(q,i); 0155 end 0156 B(q,1)=B(q,1)+S_bin(q,i); 0157 end 0158 end 0159 for q=1:m 0160 for i=1:n 0161 if A(q,i)==3 && N_curr(q,1)~=0 0162 N3(q,1)=N3(q,1)+1; 0163 end 0164 end 0165 end 0166 s=0; 0167 for q=1:m 0168 if N3(q,1)~=0 0169 s=1; 0170 end 0171 end 0172 end 0173 0174 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0175 % building the output file name 0176 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0177 outname1=strrep(fileName2,'.xml','_Enzyme_Cent_RCMLib_Dir_Index.dat') 0178 fout1 = fopen(outname1, 'w+'); 0179 0180 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0181 % finds non-zero elements of the S-matrix (in order to make the algorithm faster), 0182 % parses through each row, and considers an edge for every unlike-signs, 0183 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0184 num=zeros(size(model.rxns)); 0185 for j=1:m 0186 indices=find(model.S(j,:)); 0187 [a,b]=size(indices); 0188 r=0; 0189 if b~=0 0190 r=1; 0191 end 0192 while r<b 0193 i=1; 0194 while i<(b-r+1) 0195 if model.S(j,indices(1,r))<0 && model.S(j,indices(1,r+i))>0 0196 fprintf(fout1,'%d\t%d\n',indices(1,r),indices(1,r+i)); 0197 num(1,indices(1,r))=1; 0198 num(1,indices(1,r+i))=1; 0199 elseif model.S(j,indices(1,r))>0 && model.S(j,indices(1,r+i))<0 0200 fprintf(fout1,'%d\t%d\n',indices(1,r+i),indices(1,r)); 0201 num(1,indices(1,r))=1; 0202 num(1,indices(1,r+i))=1; 0203 end 0204 i=i+1; 0205 end 0206 r=r+1; 0207 end 0208 end 0209 0210 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0211 % considering nodes which do not contain any edges 0212 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0213 for k=1:n 0214 if num(k,1)==0 0215 fprintf(fout1,'%d\n',k); 0216 end 0217 end 0218 fclose(fout1); 0219 0220 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0221 % building the output file name 0222 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0223 outname2=strrep(fileName2,'.xml','_Enzyme_Cent_RCMLib_Dir_Kavosh.dat') 0224 fout2=fopen(outname2,'w+'); 0225 0226 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0227 % reading the constructed Enzyme-Enzyme network file and re-format it to a Kavosh-compatible file. 0228 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0229 fid = fopen(outname1); 0230 fgetl(fid); 0231 C=fscanf(fid,'%d'); 0232 i=1; 0233 while isinteger(fid) 0234 C(i)=fscanf(fid,'%d',C); 0235 i=i+1; 0236 end 0237 g=size(C); 0238 A=size(unique(C)); 0239 if g~=0 0240 n=1; 0241 else 0242 disp('Error in reading the file, No Edge detected') 0243 end 0244 k=1; 0245 j=2; 0246 last=g/2; 0247 fprintf(fout2,'%d\n',A(1,1)); % total number of uniques nodes in the network (needed for Kavosh Algorithm) 0248 for i=1:last 0249 fprintf(fout2,'%d\t%d\n ',C(k),C(j)); 0250 k=k+2; 0251 j=j+2; 0252 end 0253 fclose(fid); 0254 fclose(fout2); 0255 0256 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0257 % End of time evaluation of program 0258 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0259 toc;