Builds Directed Enzyme-Enzyme Networks Removing Currency Metabolites which could be used as an input for QuateXelero Algorithm. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% The function reads a Metabolic Network SBML file, and builds Directed Enzyme-Enzyme Networks which is compatible with QuateXelero Algorithm. The QuateXelero is one of the best motif finding algorithms which is recently developed by Kavosh developer team. http://lbb.ut.ac.ir/Download/LBBsoft/QuateXelero So, one could easily use this algorithm in order to find motifs in different sizes for the metabolic network. The Remove Currency Metabolites (RCM) algorithm removes currency metabolites in the metabolic network automatically. Note: COBRA Toolbox must be installed in MATLAB before running this function [Output] = enz_cent_RCM_dir_quatexelero(fileName) INPUTS fileName The metabolic Network in the SBML format OUTPUTS *_Enzyme_Cent_RCM_Dir_Index.dat Matrix Indeces of Enzyme-Enzyme Connections *_Enzyme_Cent_RCM_Dir_QuateXelero.dat Directed-Enzyme-Enzyme Network - QuateXelero Compatible Yazdan Asgari 07/16/2016 http://yazdan59.ir/scan %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0001 function [Output] = enz_cent_RCM_dir_quatexelero(fileName) 0002 % Builds Directed Enzyme-Enzyme Networks Removing Currency Metabolites which could be used as an input for QuateXelero Algorithm. 0003 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0004 % The function reads a Metabolic Network SBML file, 0005 % and builds Directed Enzyme-Enzyme Networks which is compatible with QuateXelero Algorithm. 0006 % The QuateXelero is one of the best motif finding algorithms which is recently developed by Kavosh developer team. 0007 % http://lbb.ut.ac.ir/Download/LBBsoft/QuateXelero 0008 % So, one could easily use this algorithm in order to find motifs in different sizes for the metabolic network. 0009 % The Remove Currency Metabolites (RCM) algorithm removes currency metabolites in the metabolic network automatically. 0010 % Note: COBRA Toolbox must be installed in MATLAB before running this function 0011 % 0012 % [Output] = enz_cent_RCM_dir_quatexelero(fileName) 0013 % 0014 %INPUTS 0015 % fileName The metabolic Network in the SBML format 0016 % 0017 %OUTPUTS 0018 % *_Enzyme_Cent_RCM_Dir_Index.dat Matrix Indeces of Enzyme-Enzyme Connections 0019 % *_Enzyme_Cent_RCM_Dir_QuateXelero.dat Directed-Enzyme-Enzyme Network - QuateXelero Compatible 0020 % 0021 % Yazdan Asgari 07/16/2016 http://yazdan59.ir/scan 0022 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0023 0024 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0025 % check validity of input file format 0026 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0027 check=regexp(fileName,'.xml'); 0028 assert(~isempty(check),'The SBML fileName must contain .xml at its end') 0029 0030 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0031 % start time evaluation of program 0032 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0033 tic; 0034 0035 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0036 % reading the SBML file using COBRA Toolbox Command, and sets size of the S matrix 0037 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0038 model=readCbModel(fileName); 0039 [m,n]=size(model.S); 0040 0041 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0042 % calculate summation of each columns (i.e. How many metabolites each enzyme correlates) 0043 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0044 S_bin=zeros(size(model.S)); 0045 S_bin(find(model.S))=1; 0046 CB=sum(S_bin,1); 0047 A=zeros(m,n); 0048 B=zeros(m,1); 0049 N3=zeros(m,1); 0050 0051 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0052 % for each binary S-matrix element, subtracts its value from the column summation and put the result in the A matrix. 0053 % A(q) means the metabolite q connects to how many other metabolites through the enzyme i. 0054 % for each row, sums the binary S-matrix over all columns. 0055 % B(q) means how many enzymes the metabolite q correlates. 0056 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0057 for q=1:m 0058 for i=1:n 0059 if S_bin(q,i)~=0 0060 A(q,i)=CB(1,i)-S_bin(q,i); 0061 end 0062 B(q,1)=B(q,1)+S_bin(q,i); 0063 end 0064 end 0065 0066 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0067 % Assumption: Generally, every metabolite is connected to the other one through a specific enzyme. 0068 % If a metabolite connects to more than one metabolite through an enzyme, this will be considered as a suspicious case. 0069 % Therefore, every A(q) value equal to 3 will be marked for further analysis. 0070 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0071 for q=1:m 0072 for i=1:n 0073 if A(q,i)==3 0074 N3(q,1)=N3(q,1)+1; 0075 end 0076 end 0077 end 0078 0079 s=0; 0080 for q=1:m 0081 if N3(q,1)~=0 0082 s=1; 0083 end 0084 end 0085 0086 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0087 % If there is any value for N3 array, the RCM algorithm will be done. 0088 % This algorithm will be deleted the most probable metabolite among all (i.e. the one with the maximum value of N3 and C) 0089 % The selected metabolite will be deleted in the binary S-Matrix, and the "WHILE LOOP" repeated. 0090 % The algorithm is ended if there is not any suspicious case. 0091 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0092 while s==1 0093 C=zeros(m,1); 0094 max1=max(N3,[],1); 0095 for q=1:m 0096 if N3(q,1)==max1 0097 C(q,1)=B(q,1); 0098 else 0099 C(q,1)=0; 0100 end 0101 end 0102 max2=max(C,[],1); 0103 for q=1:m 0104 if ( (N3(q,1)==max1) && (C(q,1)==max2) ) 0105 for i=1:n 0106 S_bin(q,i)=0; 0107 model.S(q,i)=0; 0108 end 0109 end 0110 end 0111 CB=sum(S_bin,1); 0112 A=zeros(m,n); 0113 B=zeros(m,1); 0114 N3=zeros(m,1); 0115 for q=1:m 0116 for i=1:n 0117 if S_bin(q,i)~=0 0118 A(q,i)=CB(1,i)-S_bin(q,i); 0119 end 0120 B(q,1)=B(q,1)+S_bin(q,i); 0121 end 0122 end 0123 for q=1:m 0124 for i=1:n 0125 if A(q,i)==3 0126 N3(q,1)=N3(q,1)+1; 0127 end 0128 end 0129 end 0130 s=0; 0131 for q=1:m 0132 if N3(q,1)~=0 0133 s=1; 0134 end 0135 end 0136 end 0137 0138 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0139 % building the output file name 0140 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0141 outname1=strrep(fileName,'.xml','_Enzyme_Cent_RCM_Dir_Index.dat') 0142 fout1 = fopen(outname1, 'w+'); 0143 0144 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0145 % finds non-zero elements of the S-matrix (in order to make the algorithm faster), 0146 % parses through each row, and considers an edge for every unlike-signs. 0147 % It also consider Reversibility for Enzyme-Enzyme network. 0148 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0149 num=zeros(size(model.rxns)); 0150 for j=1:m 0151 indices=find(model.S(j,:)); 0152 [a,b]=size(indices); 0153 r=0; 0154 if b~=0 0155 r=1; 0156 end 0157 while r<b 0158 i=1; 0159 while i<(b-r+1) 0160 if model.S(j,indices(1,r))>0 && model.S(j,indices(1,r+i))<0 0161 if model.rev(indices(r+i))==1 && model.rev(indices(r))==1 0162 fprintf(fout1,'%d\t%d\n',indices(1,r),indices(1,r+i)); 0163 fprintf(fout1,'%d\t%d\n',indices(1,r+i),indices(1,r)); 0164 num(1,indices(1,r))=1; 0165 num(1,indices(1,r+i))=1; 0166 else 0167 fprintf(fout1,'%d\t%d\n',indices(1,r),indices(1,r+i)); 0168 num(1,indices(1,r))=1; 0169 num(1,indices(1,r+i))=1; 0170 end 0171 elseif model.S(j,indices(1,r))<0 && model.S(j,indices(1,r+i))>0 0172 if model.rev(indices(r+i))==1 && model.rev(indices(r))==1 0173 fprintf(fout1,'%d\t%d\n',indices(1,r+i),indices(1,r)); 0174 fprintf(fout1,'%d\t%d\n',indices(1,r),indices(1,r+i)); 0175 num(1,indices(1,r))=1; 0176 num(1,indices(1,r+i))=1; 0177 else 0178 fprintf(fout1,'%d\t%d\n',indices(1,r+i),indices(1,r)); 0179 num(1,indices(1,r))=1; 0180 num(1,indices(1,r+i))=1; 0181 end 0182 end 0183 i=i+1; 0184 end 0185 r=r+1; 0186 end 0187 end 0188 0189 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0190 % building the output file name 0191 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0192 outname2=strrep(fileName,'.xml','_Enzyme_Cent_RCM_Dir_QuateXelero.dat') 0193 fout2=fopen(outname2,'w+'); 0194 0195 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0196 % reading the constructed Enzyme-Enzyme network file and re-format it to a Kavosh-compatible file. 0197 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0198 fid = fopen(outname1); 0199 C=fscanf(fid,'%d'); 0200 i=1; 0201 while isinteger(fid) 0202 C(i)=fscanf(fid,'%d',C); 0203 i=i+1; 0204 end 0205 g=size(C); 0206 A=size(unique(C)); 0207 if g~=0 0208 n=1; 0209 else 0210 disp('Error in reading the file, No Edge detected') 0211 end 0212 k=1; 0213 j=2; 0214 last=g/2; 0215 fprintf(fout2,'%d\n',A(1,1)); % total number of uniques nodes in the network (needed for Kavosh Algorithm) 0216 for i=1:last 0217 fprintf(fout2,'%d\t%d\n ',C(k),C(j)); 0218 k=k+2; 0219 j=j+2; 0220 end 0221 fclose(fid); 0222 fclose(fout2); 0223 0224 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0225 % End of time evaluation of program 0226 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0227 toc;