diff --git a/.gitignore b/.gitignore index 7e2147a..64ad9fc 100644 --- a/.gitignore +++ b/.gitignore @@ -26,3 +26,69 @@ octave-workspace # .DS_store files .DS_Store + +*.sublime-project +*.sublime-workspace + +*.txt +*.todo + +# Image files +*.png +*.tiff +*.eps +*.jpg +*.svg +*.fig +# data files +*.gc +*.xlsx +*.csv + +# movie files +*.mp4 + +youngha_data_conversion/ +html/ + +# Compiled source # +################### +*.com +*.class +*.dll +*.exe +*.o +*.so + +# Packages # +############ +# it's better to unpack these files and commit the raw source +# git has its own built in compression methods +*.7z +*.dmg +*.gz +*.iso +*.jar +*.rar +*.tar +*.zip + +# Logs and databases # +###################### +*.log +*.sql +*.sqlite + +# OS generated files # +###################### +.DS_Store +.DS_Store? +._* +.Spotlight-V100 +.Trashes +ehthumbs.db +Thumbs.db + +# MATLAB Crash Dumps # +###################### +Crash_logs_*/ diff --git a/README b/README deleted file mode 100644 index 0588c08..0000000 --- a/README +++ /dev/null @@ -1,23 +0,0 @@ -This is the main directory for the TX-TL cell-free expression toolbox. -The subdiretories here contain models and code that can be used in -working with the TX-TL system. - -Contents: - - ChangeLog - list of changes that have been made - Makefile - makefile for creating distributions - Pending - (obsolete) list of pending tasks. See wiki version - aux/ - auxiliary function - components/ - component models - core/ - core functionality - data/ - default directory for wetlab experiment data - doc/ - toolbox documentation - examples/ - examples that use the toolbox - models/ - (obsolete) old directory structure. To be deleted. - modules/ - additional modules to the toolbox (e.g. 
parameter estimation) - tests/ - unit tests - tmp/ - temporarily stored filled (mostly script generated) - txtl_init.m - script to initalize toolbox (mainly setting up the path) - unused/ - old files that are no longer used and scheduled to be deleted - - diff --git a/README.md b/README.md new file mode 100644 index 0000000..c6fa4b3 --- /dev/null +++ b/README.md @@ -0,0 +1,45 @@ +# Introduction + +This package contains two related toolboxes that work with MATLAB SimBiology models: txtlsim and mcmc_simbio. + +The first toolbox is called txtlsim (ref), and can be used to simulate the chemical reactions that occur in the Transcription-Translation (TX-TL) cell-free gene expression system developed at the University of Minnesota (Vincent Noireaux) and Caltech (Richard Murray) (refs). The main features of this toolbox are its ability to track the loading of enzymatic machinery, the consumption of resources, the ease of setting up models, the automatic accounting of retroactivity effects, and the extensibility of the generated reaction networks. + +The second toolbox is called mcmc_simbio. This is a concurrent Bayesian parameter inference toolbox for MATLAB SimBiology models. The Bayesian parameter inference is performed using a modification of Aslak Grinsted's MATLAB implementation of the affine invariant ensemble Metropolis-Hastings MCMC sampler (ref). We have added support for what we call 'concurrent' parameter inference, which refers to the capability to estimate a common set of parameters that get used simultaneously and in arbitrary combinations in multiple experiments/models. More information can be found below. + +# The Toolboxes + +## Getting the toolbox and running some simple examples + +Clone the repository using + +``` +git clone https://github.com/BuildACell/txtlsim.git +``` + +into a directory you wish to put the toolbox in. 
Alternatively, if you do not plan to version control the toolbox, you can simply download it as a zip file using the green button on the [main page](https://github.com/BuildACell/txtlsim). + +Let's call the directory where you cloned or downloaded the repository `trunk`, i.e., this is where directories like `core`, `components`, `examples` and `mcmc_simbio` are. Open MATLAB, and set the current working directory to `trunk`. Type `txtl_init` and `mcmc_init` into the MATLAB command line. This initializes the txtlsim and mcmc_simbio toolboxes. + +Check if both the toolboxes are installed properly by running the following examples: + +### txtlsim examples +Start by typing `geneexpr` into the command window. This should run a constitutive gene expression example in the toolbox, and you should see a plot with three subplots (protein; mRNA and DNA; and resource usage) appear. There should not be any error messages in the command window. + +Next, run the `negautoreg` example in the command line. Again, you should see a plot of the species in the system, and no errors. This example simulates the negative autoregulation circuit. + +Type `edit TXTL_tutorial` into the command line to open the tutorial file. You can also use the MATLAB publisher button to publish this file, and look at it in the MATLAB help file markup. We recommend running through the examples in this file, and exploring the reactions, species, etc. set up in this file. Familiarity with the MATLAB SimBiology command line is helpful here. To learn more about SimBiology, go to the [Getting Started Using the SimBiology Command Line](https://www.mathworks.com/help/simbio/gs/simbiology-command-line-tutorial.html) page. + +### mcmc_simbio examples + +Next, open and explore the mcmc_simbio estimation examples given in the files `proj_mcmc_tutorial`, `proj_mcmc_tutorial_II`, and `proj_mcmc_tutorial_III` in the `trunk\mcmc_simbio\proj\` directory. 
We strongly recommend you skim through the `mcmc_info.m` and `data_info.m` files (in `trunk\mcmc_simbio\models_and_supporting_files\`, or type `help mcmc_info` and `help data_info` into the MATLAB command line) to gain an understanding of some of the key functionalities of the parameter inference toolbox. Along with the three tutorial files, the `mcmc_info.m` and the `data_info.m` files provide an initial idea of the capabilities of the toolbox. + +## References + +More information can be found in the following references: + +Z. A. Tuza, V. Singhal, J. Kim and R. M. Murray, "An in silico modeling toolbox for rapid prototyping of circuits in a biomolecular “breadboard” system," 52nd IEEE Conference on Decision and Control, Firenze, 2013, pp. 1404-1410. +doi: 10.1109/CDC.2013.6760079 + + +Vipul Singhal, 2018 +California Institute of Technology \ No newline at end of file diff --git a/auxiliary/boundedline/.gitignore b/auxiliary/boundedline/.gitignore new file mode 100755 index 0000000..c2bfeef --- /dev/null +++ b/auxiliary/boundedline/.gitignore @@ -0,0 +1,45 @@ +# MATLAB # +########## +# Editor autosave files +*~ +*.asv +# Compiled MEX binaries (all platforms) +*.mex* + +# Compiled source # +################### +*.com +*.class +*.dll +*.exe +*.o +*.so + +# Packages # +############ +# it's better to unpack these files and commit the raw source +# git has its own built in compression methods +*.7z +*.dmg +*.gz +*.iso +*.jar +*.rar +*.tar +*.zip + +# Logs and databases # +###################### +*.log +*.sql +*.sqlite + +# OS generated files # +###################### +.DS_Store +.DS_Store? 
+._* +.Spotlight-V100 +.Trashes +ehthumbs.db +Thumbs.db \ No newline at end of file diff --git a/auxiliary/boundedline/Inpaint_nans/.gitignore b/auxiliary/boundedline/Inpaint_nans/.gitignore new file mode 100755 index 0000000..e492200 --- /dev/null +++ b/auxiliary/boundedline/Inpaint_nans/.gitignore @@ -0,0 +1,37 @@ +# Compiled source # +################### +*.com +*.class +*.dll +*.exe +*.o +*.so + +# Packages # +############ +# it's better to unpack these files and commit the raw source +# git has its own built in compression methods +*.7z +*.dmg +*.gz +*.iso +*.jar +*.rar +*.tar +*.zip + +# Logs and databases # +###################### +*.log +*.sql +*.sqlite + +# OS generated files # +###################### +.DS_Store +.DS_Store? +._* +.Spotlight-V100 +.Trashes +ehthumbs.db +Thumbs.db \ No newline at end of file diff --git a/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo.html b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo.html new file mode 100755 index 0000000..de6698a --- /dev/null +++ b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo.html @@ -0,0 +1,161 @@ + + + + + + inpaint_nans_demo + + + +
% Surface fit artifact removal
+[x,y] = meshgrid(0:.01:1);
+z0 = exp(x+y);
+
+znan = z0;
+znan(20:50,40:70) = NaN;
+znan(30:90,5:10) = NaN;
+znan(70:75,40:90) = NaN;
+
+z = inpaint_nans(znan,3);
+
+% Comparison to griddata
+k = isnan(znan);
+zk = griddata(x(~k),y(~k),z(~k),x(k),y(k));
+zg = znan;
+zg(k) = zk;
+
+close all
+figure
+surf(z0)
+title 'Original surface'
+
+figure
+surf(znan)
+title 'Artifacts (large holes) in surface'
+
+figure
+surf(zg)
+title(['Griddata inpainting (',num2str(sum(isnan(zg(:)))),' NaNs remain)'])
+
+figure
+surf(z)
+title 'Inpainted surface'
+
+figure
+surf(zg-z0)
+title 'Griddata error surface'
+
+figure
+surf(z-z0)
+title 'Inpainting error surface (Note z-axis scale)'
+
+ + + \ No newline at end of file diff --git a/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo.png b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo.png new file mode 100755 index 0000000..954751b Binary files /dev/null and b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo.png differ diff --git a/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_01.png b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_01.png new file mode 100755 index 0000000..ad81599 Binary files /dev/null and b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_01.png differ diff --git a/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_02.png b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_02.png new file mode 100755 index 0000000..a810f13 Binary files /dev/null and b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_02.png differ diff --git a/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_03.png b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_03.png new file mode 100755 index 0000000..023f803 Binary files /dev/null and b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_03.png differ diff --git a/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_04.png b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_04.png new file mode 100755 index 0000000..12fcd8c Binary files /dev/null and b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_04.png differ diff --git a/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_05.png b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_05.png new file mode 100755 index 0000000..fa40492 Binary files /dev/null and b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_05.png differ diff --git a/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_06.png 
b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_06.png new file mode 100755 index 0000000..5465081 Binary files /dev/null and b/auxiliary/boundedline/Inpaint_nans/demo/html/inpaint_nans_demo_06.png differ diff --git a/auxiliary/boundedline/Inpaint_nans/demo/inpaint_nans_demo_old.m b/auxiliary/boundedline/Inpaint_nans/demo/inpaint_nans_demo_old.m new file mode 100755 index 0000000..f478bbf --- /dev/null +++ b/auxiliary/boundedline/Inpaint_nans/demo/inpaint_nans_demo_old.m @@ -0,0 +1,42 @@ +% Surface fit artifact removal +[x,y] = meshgrid(0:.01:1); +z0 = exp(x+y); + +znan = z0; +znan(20:50,40:70) = NaN; +znan(30:90,5:10) = NaN; +znan(70:75,40:90) = NaN; + +z = inpaint_nans(znan,3); + +% Comparison to griddata +k = isnan(znan); +zk = griddata(x(~k),y(~k),z(~k),x(k),y(k)); +zg = znan; +zg(k) = zk; + +close all +figure +surf(z0) +title 'Original surface' + +figure +surf(znan) +title 'Artifacts (large holes) in surface' + +figure +surf(zg) +title(['Griddata inpainting (',num2str(sum(isnan(zg(:)))),' NaNs remain)']) + +figure +surf(z) +title 'Inpainted surface' + +figure +surf(zg-z0) +title 'Griddata error surface' + +figure +surf(z-z0) +title 'Inpainting error surface (Note z-axis scale)' + diff --git a/auxiliary/boundedline/Inpaint_nans/doc/Nomination comments.rtf b/auxiliary/boundedline/Inpaint_nans/doc/Nomination comments.rtf new file mode 100755 index 0000000..4173630 --- /dev/null +++ b/auxiliary/boundedline/Inpaint_nans/doc/Nomination comments.rtf @@ -0,0 +1,39 @@ +{\rtf1\mac\ansicpg10000\cocoartf102 +{\fonttbl\f0\fswiss\fcharset77 Helvetica;} +{\colortbl;\red255\green255\blue255;} +\margl1440\margr1440\vieww10780\viewh13720\viewkind0 +\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural + +\f0\fs24 \cf0 Nomination comments:\ +\ +Inpaint_nans fills a hole in matlab. (Yes, the pun was intentional.) 
But there\ +is indeed a niche that inpaint_nans falls into.\ +\ +The alternative to inpaint_nans is griddata (interp1 can be used for the 1-d \ +problems) but griddata fails to extrapolate well. Griddata also has serious\ +problems when its data already lies on a grid, due to its use of a Delaunay \ +triangulation. The other serious problem with the use of griddata is the\ +triangulation itself. The shape of the hole to be filled can sometimes result\ +in triangles with a poor aspect ratio (long, thin triangles) which are in turn\ +poor for interpolation. In fact, Griddata can even leave interior points\ +uninterpolated (see the tests.)\ +\ +A future plan for inpaint_nans is to add an option that will use a locally\ +anisotropic membrane model. This will allow better modeling for certain\ +classes of wavy surfaces. I'm also highly tempted to remove method 5.\ +I've never really liked it, having put it in at the request of one user. It has\ +no valid theory behind it in the context of inpaint_nans.\ +\ +In the interest of openness, I'll also say what inpaint_nans does not do. It\ +does not handle non-uniform grids. It is limited by the amount of memory \ +in the size of the arrays it can handle, although some of the methods were\ +explicitly provided to be more memory efficient than others. Inpaint_nans\ +also makes heavy use of sparse matrices, so surprisingly large problems\ +are accessible.\ +\ +Finally, while inpaint_nans does work for 1-d problems, they are not my\ +target. 
Interp1 (with 'spline' as the method) is as accurate, and should be\ +faster in general.\ +\ +John\ +} \ No newline at end of file diff --git a/auxiliary/boundedline/Inpaint_nans/doc/methods_of_inpaint_nans.m b/auxiliary/boundedline/Inpaint_nans/doc/methods_of_inpaint_nans.m new file mode 100755 index 0000000..db956cb --- /dev/null +++ b/auxiliary/boundedline/Inpaint_nans/doc/methods_of_inpaint_nans.m @@ -0,0 +1,187 @@ +%{ + +The methods of inpaint_nans + +Digital inpainting is the craft of replacing missing elements in an +"image" array. A Google search on the words "digital inpainting" will turn +up many hits. I just tried this search and found 18300 hits. + +If you wish to do inpainting in matlab, one place to start is with my +inpaint_nans code. Inpaint_nans is on the file exchange: + +http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=4551&objectType=file + +It looks for NaN elements in an array (or vector) and attempts to interpolate +(or extrapolate) smoothly to replace those elements. + +The name "inpainting" itself comes from the world of art restoration. +Damaged paintings are restored by an artist/craftsman skilled in matching +the style of the original artist to fill in any holes in the painting. + +In digital inpainting, the goal is to interpolate in from the boundaries +of a hole to smoothly replace an artifact. Obviously, where the hole is +large the digitally inpainted repair may not be an accurate approximation +to the original. + +Inpaint_nans itself is really only a boundary value solver. The basic idea +is to formulate a partial differential equation (PDE) that is assumed to +apply in the domain of the artifact to be inpainted. The perimeter of the +hole supplies boundary values for the PDE. 
Then the PDE is approximated +using finite difference methods (the array elements are assumed to be +equally spaced in each dimension) and then a large (and very sparse) linear +system of equations is solved for the NaN elements in the array. + +I've chosen a variety of simple differential equation models the user can +specify to be solved. All the methods currently use a basically elliptic +PDE. This means that the resulting linear system will generally be well +conditioned. It does mean that the solution will generally be fairly smooth, +and over large holes, it will tend towards an average of the boundary +elements. These are characteristics of the elliptic PDEs chosen. (My hope +is to expand these options in the future.) + +%} + +%% + +% Let's formulate a simple problem, and see how we could solve it using +% some of these ideas. +A = [0 0 0 0;1 NaN NaN 4;2 3 5 8]; + +% Although we can't plot this matrix using the functions surf or mesh, +% surely we can visualize what the fundamental shape is. + +% There are only two unknown elements, the artifacts that inpaint_nans +% would fill in: A(2,2) and A(2,3). + +% For an equally spaced grid, the Laplace equation applies (or Poisson's +% equation of heat conduction at steady state, if you prefer; or, for the +% fickle, Fick's law of diffusion). All of these result in the PDE +% +% u_xx + u_yy = 0 +% +% where u_xx is the second partial derivative of u with respect to x, +% and u_yy is the second partial with respect to y. +% +% Approximating this PDE using finite differences for the partial +% derivatives implies that at any node in the grid, we could replace +% it by the average of its 4 neighbors. Thus the two NaN elements +% generate two linear equations: +% +% A(2,2) = (A(1,2) + A(3,2) + A(2,1) + A(2,3)) / 4 +% A(2,3) = (A(1,3) + A(3,3) + A(2,2) + A(2,4)) / 4 +% +% Since we know all the parameters but A(2,2) and A(2,3), substitute their +% known values. 
+% +% A(2,2) = (0 + 3 + 1 + A(2,3)) / 4 +% A(2,3) = (0 + 5 + A(2,2) + 4) / 4 +% +% Or, +% +% 4*A(2,2) - A(2,3) = 4 +% -A(2,2) + 4*A(2,3) = 9 +% +% We can solve for the unknowns now using +u = [4 -1;-1 4]\[4;9] + +A(2,2) = u(1); +A(2,3) = u(2); + +% and finally plot the surface +close +surf(A) +title 'A simply inpainted surface' + +% Neat huh? For an arbitrary number of NaN elements in an array, +% the above scheme is all there is to method 2 of inpaint_nans, +% together with a very slick application of sparse linear algebra +% in Matlab. + +% Method 0 is very similar, but I've optimized it to build as +% small a linear system as possible for those cases where an array +% has only a few NaN elements. + +% Method 1 is another subtle variation on this scheme, but it +% tries to be slightly smoother at some cost of efficiency, while +% still not modifying the known (non-NaN) elements of the array. + +% Method 5 of inpaint_nans is also very similar to method 2, except +% that it uses a simple average of all 8 neighbors of an element. +% It's not actually an approximation to our PDE. + +% Method 3 is yet another variation on this theme, except the PDE +% model used is one more suited to a model of a thin plate than for +% heat diffusion. Here the governing PDE is: +% +% u_xxxx + 2*u_xxyy + u_yyyy = 0 +% +% again discretized into a linear system of equations. + +%% + +% Finally, method 4 of inpaint_nans has a different underlying +% model. Pretend that each element in the array was connected to +% its immediate neighbors to the left, right, up, and down by +% "springs". They are also connected to their neighbors at 45 +% degree angles by springs with a weaker spring constant. Since +% the potential energy stored in a spring is proportional to the +% square of its extension, minimizing the total energy can again be +% formulated as a linear (least squares) system of equations to be solved. 
For the example above, we would generate +% the set of equations: + +% A(2,2) - A(1,2) = 0 +% A(2,2) - A(2,1) = 0 +% A(2,2) - A(3,2) = 0 +% A(2,2) - A(2,3) = 0 +% (A(2,2) - A(1,1))/sqrt(2) = 0 +% (A(2,2) - A(1,3))/sqrt(2) = 0 +% (A(2,2) - A(3,1))/sqrt(2) = 0 +% (A(2,2) - A(3,3))/sqrt(2) = 0 +% A(2,3) - A(1,3) = 0 +% A(2,3) - A(2,2) = 0 +% A(2,3) - A(3,3) = 0 +% A(2,3) - A(2,4) = 0 +% (A(2,3) - A(1,2))/sqrt(2) = 0 +% (A(2,3) - A(1,4))/sqrt(2) = 0 +% (A(2,3) - A(3,2))/sqrt(2) = 0 +% (A(2,3) - A(3,4))/sqrt(2) = 0 + +% Substitute for the known elements to get + +% A(2,2) - 0 = 0 +% A(2,2) - 1 = 0 +% A(2,2) - 3 = 0 +% A(2,2) - A(2,3) = 0 +% (A(2,2) - 0)/sqrt(2) = 0 +% (A(2,2) - 0)/sqrt(2) = 0 +% (A(2,2) - 2)/sqrt(2) = 0 +% (A(2,2) - 5)/sqrt(2) = 0 +% A(2,3) - 0 = 0 +% A(2,3) - A(2,2) = 0 +% A(2,3) - 5 = 0 +% A(2,3) - 4 = 0 +% (A(2,3) - 0)/sqrt(2) = 0 +% (A(2,3) - 0)/sqrt(2) = 0 +% (A(2,3) - 3)/sqrt(2) = 0 +% (A(2,3) - 8)/sqrt(2) = 0 + +% This system is also solvable now: +r2 = 1/sqrt(2); +M=[1 0;1 0;1 0;1 -1;r2 0;r2 0;r2 0;r2 0;0 1;-1 1;0 1;0 1;0 r2;0 r2;0 r2;0 r2]; +v = M\[0 1 3 0 0 0 2*r2 5*r2 0 0 5 4 0 0 3*r2 8*r2]' + +A(2,2) = v(1); +A(2,3) = v(2); + +% and finally plot the surface +surf(A) +title 'A simply inpainted surface using a spring model' + +%% + +% Why did I provide this approach, based on a spring metaphor? +% As you should have observed, methods 2 and 4 are really quite close +% in what they do for internal NaN elements. Its on the perimeter that +% they differ significantly. The diffusion/Laplacian model will +% extrapolate smoothly, and as linearly as possible. The spring model +% will tend to extrapolate as a constant function. 
\ No newline at end of file diff --git a/auxiliary/boundedline/Inpaint_nans/garden50.jpg b/auxiliary/boundedline/Inpaint_nans/garden50.jpg new file mode 100755 index 0000000..37e1e99 Binary files /dev/null and b/auxiliary/boundedline/Inpaint_nans/garden50.jpg differ diff --git a/auxiliary/boundedline/Inpaint_nans/inpaint_nans.m b/auxiliary/boundedline/Inpaint_nans/inpaint_nans.m new file mode 100755 index 0000000..2460b51 --- /dev/null +++ b/auxiliary/boundedline/Inpaint_nans/inpaint_nans.m @@ -0,0 +1 @@ +function B=inpaint_nans(A,method) % INPAINT_NANS: in-paints over nans in an array % usage: B=INPAINT_NANS(A) % default method % usage: B=INPAINT_NANS(A,method) % specify method used % % Solves approximation to one of several pdes to % interpolate and extrapolate holes in an array % % arguments (input): % A - nxm array with some NaNs to be filled in % % method - (OPTIONAL) scalar numeric flag - specifies % which approach (or physical metaphor to use % for the interpolation.) All methods are capable % of extrapolation, some are better than others. % There are also speed differences, as well as % accuracy differences for smooth surfaces. % % methods {0,1,2} use a simple plate metaphor. % method 3 uses a better plate equation, % but may be much slower and uses % more memory. % method 4 uses a spring metaphor. % method 5 is an 8 neighbor average, with no % rationale behind it compared to the % other methods. I do not recommend % its use. % % method == 0 --> (DEFAULT) see method 1, but % this method does not build as large of a % linear system in the case of only a few % NaNs in a large array. % Extrapolation behavior is linear. % % method == 1 --> simple approach, applies del^2 % over the entire array, then drops those parts % of the array which do not have any contact with % NaNs. Uses a least squares approach, but it % does not modify known values. % In the case of small arrays, this method is % quite fast as it does very little extra work. 
% Extrapolation behavior is linear. % % method == 2 --> uses del^2, but solving a direct % linear system of equations for nan elements. % This method will be the fastest possible for % large systems since it uses the sparsest % possible system of equations. Not a least % squares approach, so it may be least robust % to noise on the boundaries of any holes. % This method will also be least able to % interpolate accurately for smooth surfaces. % Extrapolation behavior is linear. % % Note: method 2 has problems in 1-d, so this % method is disabled for vector inputs. % % method == 3 --+ See method 0, but uses del^4 for % the interpolating operator. This may result % in more accurate interpolations, at some cost % in speed. % % method == 4 --+ Uses a spring metaphor. Assumes % springs (with a nominal length of zero) % connect each node with every neighbor % (horizontally, vertically and diagonally) % Since each node tries to be like its neighbors, % extrapolation is as a constant function where % this is consistent with the neighboring nodes. % % method == 5 --+ See method 2, but use an average % of the 8 nearest neighbors to any element. % This method is NOT recommended for use. 
% % % arguments (output): % B - nxm array with NaNs replaced % % % Example: % [x,y] = meshgrid(0:.01:1); % z0 = exp(x+y); % znan = z0; % znan(20:50,40:70) = NaN; % znan(30:90,5:10) = NaN; % znan(70:75,40:90) = NaN; % % z = inpaint_nans(znan); % % % See also: griddata, interp1 % % Author: John D'Errico % e-mail address: woodchips@rochester.rr.com % Release: 2 % Release date: 4/15/06 % I always need to know which elements are NaN, % and what size the array is for any method [n,m]=size(A); A=A(:); nm=n*m; k=isnan(A(:)); % list the nodes which are known, and which will % be interpolated nan_list=find(k); known_list=find(~k); % how many nans overall nan_count=length(nan_list); % convert NaN indices to (r,c) form % nan_list==find(k) are the unrolled (linear) indices % (row,column) form [nr,nc]=ind2sub([n,m],nan_list); % both forms of index in one array: % column 1 == unrolled index % column 2 == row index % column 3 == column index nan_list=[nan_list,nr,nc]; % supply default method if (nargin<2) || isempty(method) method = 0; elseif ~ismember(method,0:5) error 'If supplied, method must be one of: {0,1,2,3,4,5}.' end % for different methods switch method case 0 % The same as method == 1, except only work on those % elements which are NaN, or at least touch a NaN. % is it 1-d or 2-d? if (m == 1) || (n == 1) % really a 1-d case work_list = nan_list(:,1); work_list = unique([work_list;work_list - 1;work_list + 1]); work_list(work_list <= 1) = []; work_list(work_list >= nm) = []; nw = numel(work_list); u = (1:nw)'; fda = sparse(repmat(u,1,3),bsxfun(@plus,work_list,-1:1), ... 
repmat([1 -2 1],nw,1),nw,nm); else % a 2-d case % horizontal and vertical neighbors only talks_to = [-1 0;0 -1;1 0;0 1]; neighbors_list=identify_neighbors(n,m,nan_list,talks_to); % list of all nodes we have identified all_list=[nan_list;neighbors_list]; % generate sparse array with second partials on row % variable for each element in either list, but only % for those nodes which have a row index > 1 or < n L = find((all_list(:,2) > 1) & (all_list(:,2) < n)); nl=length(L); if nl>0 fda=sparse(repmat(all_list(L,1),1,3), ... repmat(all_list(L,1),1,3)+repmat([-1 0 1],nl,1), ... repmat([1 -2 1],nl,1),nm,nm); else fda=spalloc(n*m,n*m,size(all_list,1)*5); end % 2nd partials on column index L = find((all_list(:,3) > 1) & (all_list(:,3) < m)); nl=length(L); if nl>0 fda=fda+sparse(repmat(all_list(L,1),1,3), ... repmat(all_list(L,1),1,3)+repmat([-n 0 n],nl,1), ... repmat([1 -2 1],nl,1),nm,nm); end end % eliminate knowns rhs=-fda(:,known_list)*A(known_list); k=find(any(fda(:,nan_list(:,1)),2)); % and solve... B=A; B(nan_list(:,1))=fda(k,nan_list(:,1))\rhs(k); case 1 % least squares approach with del^2. Build system % for every array element as an unknown, and then % eliminate those which are knowns. % Build sparse matrix approximating del^2 for % every element in A. % is it 1-d or 2-d? if (m == 1) || (n == 1) % a 1-d case u = (1:(nm-2))'; fda = sparse(repmat(u,1,3),bsxfun(@plus,u,0:2), ... repmat([1 -2 1],nm-2,1),nm-2,nm); else % a 2-d case % Compute finite difference for second partials % on row variable first [i,j]=ndgrid(2:(n-1),1:m); ind=i(:)+(j(:)-1)*n; np=(n-2)*m; fda=sparse(repmat(ind,1,3),[ind-1,ind,ind+1], ... repmat([1 -2 1],np,1),n*m,n*m); % now second partials on column variable [i,j]=ndgrid(1:n,2:(m-1)); ind=i(:)+(j(:)-1)*n; np=n*(m-2); fda=fda+sparse(repmat(ind,1,3),[ind-n,ind,ind+n], ... repmat([1 -2 1],np,1),nm,nm); end % eliminate knowns rhs=-fda(:,known_list)*A(known_list); k=find(any(fda(:,nan_list),2)); % and solve... 
B=A; B(nan_list(:,1))=fda(k,nan_list(:,1))\rhs(k); case 2 % Direct solve for del^2 BVP across holes % generate sparse array with second partials on row % variable for each nan element, only for those nodes % which have a row index > 1 or < n % is it 1-d or 2-d? if (m == 1) || (n == 1) % really just a 1-d case error('Method 2 has problems for vector input. Please use another method.') else % a 2-d case L = find((nan_list(:,2) > 1) & (nan_list(:,2) < n)); nl=length(L); if nl>0 fda=sparse(repmat(nan_list(L,1),1,3), ... repmat(nan_list(L,1),1,3)+repmat([-1 0 1],nl,1), ... repmat([1 -2 1],nl,1),n*m,n*m); else fda=spalloc(n*m,n*m,size(nan_list,1)*5); end % 2nd partials on column index L = find((nan_list(:,3) > 1) & (nan_list(:,3) < m)); nl=length(L); if nl>0 fda=fda+sparse(repmat(nan_list(L,1),1,3), ... repmat(nan_list(L,1),1,3)+repmat([-n 0 n],nl,1), ... repmat([1 -2 1],nl,1),n*m,n*m); end % fix boundary conditions at extreme corners % of the array in case there were nans there if ismember(1,nan_list(:,1)) fda(1,[1 2 n+1])=[-2 1 1]; end if ismember(n,nan_list(:,1)) fda(n,[n, n-1,n+n])=[-2 1 1]; end if ismember(nm-n+1,nan_list(:,1)) fda(nm-n+1,[nm-n+1,nm-n+2,nm-n])=[-2 1 1]; end if ismember(nm,nan_list(:,1)) fda(nm,[nm,nm-1,nm-n])=[-2 1 1]; end % eliminate knowns rhs=-fda(:,known_list)*A(known_list); % and solve... B=A; k=nan_list(:,1); B(k)=fda(k,k)\rhs(k); end case 3 % The same as method == 0, except uses del^4 as the % interpolating operator. % del^4 template of neighbors talks_to = [-2 0;-1 -1;-1 0;-1 1;0 -2;0 -1; ... 0 1;0 2;1 -1;1 0;1 1;2 0]; neighbors_list=identify_neighbors(n,m,nan_list,talks_to); % list of all nodes we have identified all_list=[nan_list;neighbors_list]; % generate sparse array with del^4, but only % for those nodes which have a row & column index % >= 3 or <= n-2 L = find( (all_list(:,2) >= 3) & ... (all_list(:,2) <= (n-2)) & ... (all_list(:,3) >= 3) & ... 
(all_list(:,3) <= (m-2))); nl=length(L); if nl>0 % do the entire template at once fda=sparse(repmat(all_list(L,1),1,13), ... repmat(all_list(L,1),1,13) + ... repmat([-2*n,-n-1,-n,-n+1,-2,-1,0,1,2,n-1,n,n+1,2*n],nl,1), ... repmat([1 2 -8 2 1 -8 20 -8 1 2 -8 2 1],nl,1),nm,nm); else fda=spalloc(n*m,n*m,size(all_list,1)*5); end % on the boundaries, reduce the order around the edges L = find((((all_list(:,2) == 2) | ... (all_list(:,2) == (n-1))) & ... (all_list(:,3) >= 2) & ... (all_list(:,3) <= (m-1))) | ... (((all_list(:,3) == 2) | ... (all_list(:,3) == (m-1))) & ... (all_list(:,2) >= 2) & ... (all_list(:,2) <= (n-1)))); nl=length(L); if nl>0 fda=fda+sparse(repmat(all_list(L,1),1,5), ... repmat(all_list(L,1),1,5) + ... repmat([-n,-1,0,+1,n],nl,1), ... repmat([1 1 -4 1 1],nl,1),nm,nm); end L = find( ((all_list(:,2) == 1) | ... (all_list(:,2) == n)) & ... (all_list(:,3) >= 2) & ... (all_list(:,3) <= (m-1))); nl=length(L); if nl>0 fda=fda+sparse(repmat(all_list(L,1),1,3), ... repmat(all_list(L,1),1,3) + ... repmat([-n,0,n],nl,1), ... repmat([1 -2 1],nl,1),nm,nm); end L = find( ((all_list(:,3) == 1) | ... (all_list(:,3) == m)) & ... (all_list(:,2) >= 2) & ... (all_list(:,2) <= (n-1))); nl=length(L); if nl>0 fda=fda+sparse(repmat(all_list(L,1),1,3), ... repmat(all_list(L,1),1,3) + ... repmat([-1,0,1],nl,1), ... repmat([1 -2 1],nl,1),nm,nm); end % eliminate knowns rhs=-fda(:,known_list)*A(known_list); k=find(any(fda(:,nan_list(:,1)),2)); % and solve... B=A; B(nan_list(:,1))=fda(k,nan_list(:,1))\rhs(k); case 4 % Spring analogy % interpolating operator. 
% list of all springs between a node and a horizontal % or vertical neighbor hv_list=[-1 -1 0;1 1 0;-n 0 -1;n 0 1]; hv_springs=[]; for i=1:4 hvs=nan_list+repmat(hv_list(i,:),nan_count,1); k=(hvs(:,2)>=1) & (hvs(:,2)<=n) & (hvs(:,3)>=1) & (hvs(:,3)<=m); hv_springs=[hv_springs;[nan_list(k,1),hvs(k,1)]]; end % delete replicate springs hv_springs=unique(sort(hv_springs,2),'rows'); % build sparse matrix of connections, springs % connecting diagonal neighbors are weaker than % the horizontal and vertical springs nhv=size(hv_springs,1); springs=sparse(repmat((1:nhv)',1,2),hv_springs, ... repmat([1 -1],nhv,1),nhv,nm); % eliminate knowns rhs=-springs(:,known_list)*A(known_list); % and solve... B=A; B(nan_list(:,1))=springs(:,nan_list(:,1))\rhs; case 5 % Average of 8 nearest neighbors % generate sparse array to average 8 nearest neighbors % for each nan element, be careful around edges fda=spalloc(n*m,n*m,size(nan_list,1)*9); % -1,-1 L = find((nan_list(:,2) > 1) & (nan_list(:,3) > 1)); nl=length(L); if nl>0 fda=fda+sparse(repmat(nan_list(L,1),1,2), ... repmat(nan_list(L,1),1,2)+repmat([-n-1, 0],nl,1), ... repmat([1 -1],nl,1),n*m,n*m); end % 0,-1 L = find(nan_list(:,3) > 1); nl=length(L); if nl>0 fda=fda+sparse(repmat(nan_list(L,1),1,2), ... repmat(nan_list(L,1),1,2)+repmat([-n, 0],nl,1), ... repmat([1 -1],nl,1),n*m,n*m); end % +1,-1 L = find((nan_list(:,2) < n) & (nan_list(:,3) > 1)); nl=length(L); if nl>0 fda=fda+sparse(repmat(nan_list(L,1),1,2), ... repmat(nan_list(L,1),1,2)+repmat([-n+1, 0],nl,1), ... repmat([1 -1],nl,1),n*m,n*m); end % -1,0 L = find(nan_list(:,2) > 1); nl=length(L); if nl>0 fda=fda+sparse(repmat(nan_list(L,1),1,2), ... repmat(nan_list(L,1),1,2)+repmat([-1, 0],nl,1), ... repmat([1 -1],nl,1),n*m,n*m); end % +1,0 L = find(nan_list(:,2) < n); nl=length(L); if nl>0 fda=fda+sparse(repmat(nan_list(L,1),1,2), ... repmat(nan_list(L,1),1,2)+repmat([1, 0],nl,1), ... 
repmat([1 -1],nl,1),n*m,n*m); end % -1,+1 L = find((nan_list(:,2) > 1) & (nan_list(:,3) < m)); nl=length(L); if nl>0 fda=fda+sparse(repmat(nan_list(L,1),1,2), ... repmat(nan_list(L,1),1,2)+repmat([n-1, 0],nl,1), ... repmat([1 -1],nl,1),n*m,n*m); end % 0,+1 L = find(nan_list(:,3) < m); nl=length(L); if nl>0 fda=fda+sparse(repmat(nan_list(L,1),1,2), ... repmat(nan_list(L,1),1,2)+repmat([n, 0],nl,1), ... repmat([1 -1],nl,1),n*m,n*m); end % +1,+1 L = find((nan_list(:,2) < n) & (nan_list(:,3) < m)); nl=length(L); if nl>0 fda=fda+sparse(repmat(nan_list(L,1),1,2), ... repmat(nan_list(L,1),1,2)+repmat([n+1, 0],nl,1), ... repmat([1 -1],nl,1),n*m,n*m); end % eliminate knowns rhs=-fda(:,known_list)*A(known_list); % and solve... B=A; k=nan_list(:,1); B(k)=fda(k,k)\rhs(k); end % all done, make sure that B is the same shape as % A was when we came in. B=reshape(B,n,m); % ==================================================== % end of main function % ==================================================== % ==================================================== % begin subfunctions % ==================================================== function neighbors_list=identify_neighbors(n,m,nan_list,talks_to) % identify_neighbors: identifies all the neighbors of % those nodes in nan_list, not including the nans % themselves % % arguments (input): % n,m - scalar - [n,m]=size(A), where A is the % array to be interpolated % nan_list - array - list of every nan element in A % nan_list(i,1) == linear index of i'th nan element % nan_list(i,2) == row index of i'th nan element % nan_list(i,3) == column index of i'th nan element % talks_to - px2 array - defines which nodes communicate % with each other, i.e., which nodes are neighbors. 
% % talks_to(i,1) - defines the offset in the row % dimension of a neighbor % talks_to(i,2) - defines the offset in the column % dimension of a neighbor % % For example, talks_to = [-1 0;0 -1;1 0;0 1] % means that each node talks only to its immediate % neighbors horizontally and vertically. % % arguments(output): % neighbors_list - array - list of all neighbors of % all the nodes in nan_list if ~isempty(nan_list) % use the definition of a neighbor in talks_to nan_count=size(nan_list,1); talk_count=size(talks_to,1); nn=zeros(nan_count*talk_count,2); j=[1,nan_count]; for i=1:talk_count nn(j(1):j(2),:)=nan_list(:,2:3) + ... repmat(talks_to(i,:),nan_count,1); j=j+nan_count; end % drop those nodes which fall outside the bounds of the % original array L = (nn(:,1)<1)|(nn(:,1)>n)|(nn(:,2)<1)|(nn(:,2)>m); nn(L,:)=[]; % form the same format 3 column array as nan_list neighbors_list=[sub2ind([n,m],nn(:,1),nn(:,2)),nn]; % delete replicates in the neighbors list neighbors_list=unique(neighbors_list,'rows'); % and delete those which are also in the list of NaNs. neighbors_list=setdiff(neighbors_list,nan_list,'rows'); else neighbors_list=[]; end \ No newline at end of file diff --git a/auxiliary/boundedline/Inpaint_nans/inpaint_nans_bc.m b/auxiliary/boundedline/Inpaint_nans/inpaint_nans_bc.m new file mode 100755 index 0000000..d6ae57e --- /dev/null +++ b/auxiliary/boundedline/Inpaint_nans/inpaint_nans_bc.m @@ -0,0 +1 @@ +function B=inpaint_nans_bc(A,method,bcclass) % INPAINT_NANS_BC: in-paints over nans in an array, with spherical or toroidal boundary conditions % usage: B=inpaint_nans_bc(A) % default method % usage: B=inpaint_nans_bc(A,method) % specify method used % usage: B=inpaint_nans_bc(A,method,bcclass) % specify class of boundary conditions applied % % Solves approximation to one of several pdes to % interpolate and extrapolate holes in an array.
% Depending upon the boundary conditions specified, % the array will effectively be treated as if it lies % on either the surface of a sphere or a toroid. % % arguments (input): % A - nxm array with some NaNs to be filled in % % method - (OPTIONAL) scalar numeric flag - specifies % which approach (or physical metaphor to use % for the interpolation.) All methods are capable % of extrapolation, some are better than others. % There are also speed differences, as well as % accuracy differences for smooth surfaces. % % The methods employed here are a subset of the % methods of the original inpaint_nans. % % methods {0,1} use a simple plate metaphor. % method 4 uses a spring metaphor. % % method == 0 --> (DEFAULT) see method 1, but % this method does not build as large of a % linear system in the case of only a few % NaNs in a large array. % Extrapolation behavior is linear. % % method == 1 --> simple approach, applies del^2 % over the entire array, then drops those parts % of the array which do not have any contact with % NaNs. Uses a least squares approach, but it % does not modify known values. % In the case of small arrays, this method is % quite fast as it does very little extra work. % Extrapolation behavior is linear. % % method == 4 --> Uses a spring metaphor. Assumes % springs (with a nominal length of zero) % connect each node with every neighbor % (horizontally, vertically and diagonally) % Since each node tries to be like its neighbors, % extrapolation is as a constant function where % this is consistent with the neighboring nodes. % % DEFAULT: 0 % % bcclass - (OPTIONAL) character flag, indicating how % the array boundaries will be treated in the % inpainting operation. bcclass may be either % 'sphere' or 'toroid', or any simple contraction % of these words. % % bcclass = 'sphere' --> The first and last rows % of the array will be treated as if they are % at the North and South poles of a sphere. 
% Adjacent to those rows will be singular % phantom nodes at each pole. % % bcclass = 'toroid' --> The first and last rows % of the array will be treated as if they are % adjacent to each other. As well, the first and % last columns will be adjacent to each other. % % DEFAULT: 'sphere' % % arguments (output): % B - nxm array with NaNs replaced % % % Example: % [x,y] = meshgrid(0:.01:1); % z0 = exp(x+y); % znan = z0; % znan(20:50,40:70) = NaN; % znan(30:90,5:10) = NaN; % znan(70:75,40:90) = NaN; % % z = inpaint_nans(znan); % % % See also: griddata, interp1 % % Author: John D'Errico % e-mail address: woodchips@rochester.rr.com % Release: 2 % Release date: 4/15/06 % I always need to know which elements are NaN, % and what size the array is for any method [n,m]=size(A); A=A(:); nm=n*m; k=isnan(A(:)); % list those nodes which are known, and which will % be interpolated nan_list=find(k); known_list=find(~k); % how many nans overall nan_count=length(nan_list); % convert NaN indices to (r,c) form % nan_list==find(k) are the unrolled (linear) indices % (row,column) form [nr,nc]=ind2sub([n,m],nan_list); % both forms of index in one array: % column 1 == unrolled index % column 2 == row index % column 3 == column index nan_list=[nan_list,nr,nc]; % supply default method if (nargin<2) || isempty(method) method = 0; elseif ~ismember(method,[0 1 4]) error('INPAINT_NANS_BC:improperargument', ... 'If supplied, method must be one of: {0,1,4}.') end % supply default value for bcclass if (nargin < 3) || isempty(bcclass) bcclass = 'sphere'; elseif ~ischar(bcclass) error('INPAINT_NANS_BC:improperargument', ... 'If supplied, bcclass must be ''sphere'' or ''toroid''') else % it was a character string valid = {'sphere' 'toroid'}; % check to see if it is valid (pass bcclass itself; no variable named arg exists in this scope) [bcclass,errorclass] = validstring(bcclass,valid); if ~isempty(errorclass) error('INPAINT_NANS_BC:improperargument', ...
'If supplied, bcclass must be ''sphere'' or ''toroid''') end end % choice of methods switch method case 0 % The same as method == 1, except only work on those % elements which are NaN, or at least touch a NaN. % horizontal and vertical neighbors only talks_to = [-1 0;0 -1;1 0;0 1]; neighbors_list=identify_neighbors(n,m,nan_list,talks_to); % list of all nodes we have identified all_list=[nan_list;neighbors_list]; % generate sparse array with second partials on row % variable for each element in either list, but only % for those nodes which have a row index > 1 or < n L = find((all_list(:,2) > 1) & (all_list(:,2) < n)); nl=length(L); if nl>0 fda=sparse(repmat(all_list(L,1),1,3), ... repmat(all_list(L,1),1,3)+repmat([-1 0 1],nl,1), ... repmat([1 -2 1],nl,1),nm,nm); else fda=spalloc(n*m,n*m,size(all_list,1)*5); end % 2nd partials on column index L = find((all_list(:,3) > 1) & (all_list(:,3) < m)); nl=length(L); if nl>0 fda=fda+sparse(repmat(all_list(L,1),1,3), ... repmat(all_list(L,1),1,3)+repmat([-n 0 n],nl,1), ... repmat([1 -2 1],nl,1),nm,nm); end % eliminate knowns rhs=-fda(:,known_list)*A(known_list); k=find(any(fda(:,nan_list(:,1)),2)); % and solve... B=A; B(nan_list(:,1))=fda(k,nan_list(:,1))\rhs(k); case 1 % least squares approach with del^2. Build system % for every array element as an unknown, and then % eliminate those which are knowns. % Build sparse matrix approximating del^2 for % every element in A. % Compute finite difference for second partials % on row variable first [i,j]=ndgrid(1:n,1:m); ind=i(:)+(j(:)-1)*n; np=n*m; switch bcclass case 'sphere' % we need to have two phantom nodes at the poles np = np + 2; end fda=sparse(repmat(ind,1,3),[ind-1,ind,ind+1], ... repmat([1 -2 1],np,1),n*m,n*m); % now second partials on column variable [i,j]=ndgrid(1:n,2:(m-1)); ind=i(:)+(j(:)-1)*n; np=n*(m-2); fda=fda+sparse(repmat(ind,1,3),[ind-n,ind,ind+n], ... 
repmat([1 -2 1],np,1),nm,nm); % eliminate knowns rhs=-fda(:,known_list)*A(known_list); k=find(any(fda(:,nan_list),2)); % and solve... B=A; B(nan_list(:,1))=fda(k,nan_list(:,1))\rhs(k); case 4 % Spring analogy % interpolating operator. % list of all springs between a node and a horizontal % or vertical neighbor hv_list=[-1 -1 0;1 1 0;-n 0 -1;n 0 1]; hv_springs=[]; for i=1:4 hvs=nan_list+repmat(hv_list(i,:),nan_count,1); k=(hvs(:,2)>=1) & (hvs(:,2)<=n) & (hvs(:,3)>=1) & (hvs(:,3)<=m); hv_springs=[hv_springs;[nan_list(k,1),hvs(k,1)]]; end % delete replicate springs hv_springs=unique(sort(hv_springs,2),'rows'); % build sparse matrix of connections, springs % connecting diagonal neighbors are weaker than % the horizontal and vertical springs nhv=size(hv_springs,1); springs=sparse(repmat((1:nhv)',1,2),hv_springs, ... repmat([1 -1],nhv,1),nhv,nm); % eliminate knowns rhs=-springs(:,known_list)*A(known_list); % and solve... B=A; B(nan_list(:,1))=springs(:,nan_list(:,1))\rhs; end % all done, make sure that B is the same shape as % A was when we came in. B=reshape(B,n,m); end % mainline % ==================================================== % end of main function % ==================================================== % ==================================================== % begin subfunctions % ==================================================== function neighbors_list=identify_neighbors(n,m,nan_list,talks_to) % identify_neighbors: identifies all the neighbors of % those nodes in nan_list, not including the nans % themselves % % arguments (input): % n,m - scalar - [n,m]=size(A), where A is the % array to be interpolated % nan_list - array - list of every nan element in A % nan_list(i,1) == linear index of i'th nan element % nan_list(i,2) == row index of i'th nan element % nan_list(i,3) == column index of i'th nan element % talks_to - px2 array - defines which nodes communicate % with each other, i.e., which nodes are neighbors. 
% % talks_to(i,1) - defines the offset in the row % dimension of a neighbor % talks_to(i,2) - defines the offset in the column % dimension of a neighbor % % For example, talks_to = [-1 0;0 -1;1 0;0 1] % means that each node talks only to its immediate % neighbors horizontally and vertically. % % arguments(output): % neighbors_list - array - list of all neighbors of % all the nodes in nan_list if ~isempty(nan_list) % use the definition of a neighbor in talks_to nan_count=size(nan_list,1); talk_count=size(talks_to,1); nn=zeros(nan_count*talk_count,2); j=[1,nan_count]; for i=1:talk_count nn(j(1):j(2),:)=nan_list(:,2:3) + ... repmat(talks_to(i,:),nan_count,1); j=j+nan_count; end % form the same format 3 column array as nan_list neighbors_list=[sub2ind([n,m],nn(:,1),nn(:,2)),nn]; % delete replicates in the neighbors list neighbors_list=unique(neighbors_list,'rows'); % and delete those which are also in the list of NaNs. neighbors_list=setdiff(neighbors_list,nan_list,'rows'); else neighbors_list=[]; end end % function identify_neighbors function [str,errorclass] = validstring(arg,valid) % validstring: compares a string against a set of valid options % usage: [str,errorclass] = validstring(arg,valid) % % If a direct hit, or any unambiguous shortening is found, that % string is returned. Capitalization is ignored. % % arguments: (input) % arg - character string, to be tested against a list % of valid choices. Capitalization is ignored. % % valid - cellstring array of alternative choices % % Arguments: (output) % str - string - resulting choice resolved from the % list of valid arguments. If no unambiguous % choice can be resolved, then str will be empty. % % errorclass - string - A string argument that explains % the error. It will be one of the following % possibilities: % % '' --> No error. An unambiguous match for arg % was found among the choices. % % 'No match found' --> No match was found among % the choices provided in valid. 
% % 'Ambiguous argument' --> At least two ambiguous % matches were found among those provided % in valid. % % % Example: % valid = {'off' 'on' 'The sky is falling'} % % % See also: parse_pv_pairs, strmatch, strcmpi % % Author: John D'Errico % e-mail: woodchips@rochester.rr.com % Release: 1.0 % Release date: 3/25/2010 ind = strmatch(lower(arg),lower(valid)); if isempty(ind) % No hit found errorclass = 'No match found'; str = ''; elseif (length(ind) > 1) % Ambiguous arg, hitting more than one of the valid options errorclass = 'Ambiguous argument'; str = ''; return else errorclass = ''; str = valid{ind}; end end % function validstring \ No newline at end of file diff --git a/auxiliary/boundedline/Inpaint_nans/inpaint_nans_demo.m b/auxiliary/boundedline/Inpaint_nans/inpaint_nans_demo.m new file mode 100755 index 0000000..2163cc0 --- /dev/null +++ b/auxiliary/boundedline/Inpaint_nans/inpaint_nans_demo.m @@ -0,0 +1,43 @@ +%% Surface Fit Artifact Removal + +%% Construct the Surface +[x,y] = meshgrid(0:.01:1); +z0 = exp(x+y); + +close all +figure +surf(z0) +title 'Original surface' + +znan = z0; +znan(20:50,40:70) = NaN; +znan(30:90,5:10) = NaN; +znan(70:75,40:90) = NaN; + +figure +surf(znan) +title 'Artifacts (large holes) in surface' + +%% In-paint Over NaNs +z = inpaint_nans(znan,3); +figure +surf(z) +title 'Inpainted surface' + +figure +surf(z-z0) +title 'Inpainting error surface (Note z-axis scale)' + +%% Compare to GRIDDATA +k = isnan(znan); +zk = griddata(x(~k),y(~k),z(~k),x(k),y(k)); +zg = znan; +zg(k) = zk; + +figure +surf(zg) +title(['Griddata inpainting (',num2str(sum(isnan(zg(:)))),' NaNs remain)']) + +figure +surf(zg-z0) +title 'Griddata error surface' diff --git a/auxiliary/boundedline/Inpaint_nans/monet_adresse.jpg b/auxiliary/boundedline/Inpaint_nans/monet_adresse.jpg new file mode 100755 index 0000000..bceff7e Binary files /dev/null and b/auxiliary/boundedline/Inpaint_nans/monet_adresse.jpg differ diff --git 
a/auxiliary/boundedline/Inpaint_nans/test/test_main.m b/auxiliary/boundedline/Inpaint_nans/test/test_main.m new file mode 100755 index 0000000..1dde28e --- /dev/null +++ b/auxiliary/boundedline/Inpaint_nans/test/test_main.m @@ -0,0 +1,178 @@ +%% Repair to an image with 50% random artifacts + +% Garden at Sainte-Adresse (Monet, 1867) +garden = imread('monet_adresse.jpg'); +G = double(garden); +G(rand(size(G))<0.50) = NaN; +Gnan = G; + +G(:,:,1) = inpaint_nans(G(:,:,1),2); +G(:,:,2) = inpaint_nans(G(:,:,2),2); +G(:,:,3) = inpaint_nans(G(:,:,3),2); + +figure +subplot(1,3,1) +image(garden) +title 'Garden at Sainte-Adresse (Monet)' + +subplot(1,3,2) +image(uint8(Gnan)) +title 'Corrupted - 50%' + +subplot(1,3,3) +image(uint8(G)) +title 'Inpainted Garden' + +%% Surface fit artifact removal + +[x,y] = meshgrid(0:.01:1); +z0 = exp(x+y); + +znan = z0; +znan(20:50,40:70) = NaN; +znan(30:90,5:10) = NaN; +znan(70:75,40:90) = NaN; + +tic,z = inpaint_nans(znan,3);toc + +tic +k = isnan(znan); +zk = griddata(x(~k),y(~k),z(~k),x(k),y(k)); +zg = znan; +zg(k) = zk; +toc + +figure +surf(z0) +title 'Original surface' + +figure +surf(znan) +title 'Artifacts (large holes) in surface' + +figure +surf(zg) +title(['Griddata inpainting (',num2str(sum(isnan(zg(:)))),' NaNs remain)']) + +figure +surf(z) +title 'Inpainted surface' + +figure +surf(zg-z0) +title 'Griddata error surface' + +figure +surf(z-z0) +title 'Inpainting error surface (Note z-axis scale)' + +%% Comparison of methods + +[x,y] = meshgrid(-1:.02:1); +r = sqrt(x.^2 + y.^2); +z = exp(-(x.^2+ y.^2)); + +z(r>=0.9) = NaN; + +z((r<=.5) & (x<0)) = NaN; + +figure +pcolor(z); +title 'Surface provided to inpaint_nans' + +% Method 0 +tic,z0 = inpaint_nans(z,0);toc + +% Method 1 +tic,z1 = inpaint_nans(z,1);toc + +% Method 2 +tic,z2 = inpaint_nans(z,2);toc + +% Method 3 +tic,z3 = inpaint_nans(z,3);toc + +% Method 4 +tic,z4 = inpaint_nans(z,4);toc + +% Method 5 +tic,z5 = inpaint_nans(z,5);toc + +figure +surf(z0) +colormap copper +hold on +h 
= surf(z); +set(h,'facecolor','r') +hold off +title 'Method 0 (Red was provided)' + +figure +surf(z1) +hold on +h = surf(z); +set(h,'facecolor','r') +hold off +title 'Method 1 (Red was provided)' + +figure +surf(z2) +hold on +h = surf(z); +set(h,'facecolor','r') +hold off +title 'Method 2 (Red was provided) - least accurate, but fastest' + +figure +surf(z3) +hold on +h = surf(z); +set(h,'facecolor','r') +hold off +title 'Method 3 (Red was provided) - Slow, but accurate' + +figure +surf(z4) +hold on +h = surf(z); +set(h,'facecolor','r') +hold off +title 'Method 4 (Red was provided) - designed for constant extrapolation!' + +figure +h = surf(z5); +set(h,'facecolor','y') +hold on +h = surf(z); +set(h,'facecolor','r') +hold off +title 'Method 5 (Red was provided)' + + +%% 1-d "inpainting" using interp1 + +x = linspace(0,3*pi,100); +y0 = sin(x); +y = y0; +% Drop out 2/3 of the data +y(1:3:end) = NaN; +y(2:3:end) = NaN; + +% inpaint_nans +y_inpaint = inpaint_nans(y,1); + +% interpolate using interp1 +k = isnan(y); +y_interp1c = y; +y_interp1s = y; +y_interp1c(k) = interp1(x(~k),y(~k),x(k),'cubic'); +y_interp1s(k) = interp1(x(~k),y(~k),x(k),'spline'); + +figure +plot(x,y,'ro',x,y_inpaint,'b+') +legend('sin(x), missing 2/3 points','inpaint-nans','Location','North') + +figure +plot(x,y0-y_inpaint,'r-',x,y0-y_interp1c,'b--',x,y0-y_interp1s,'g--') +title 'Inpainting residuals' +legend('Inpaint-nans','Pchip','Spline','Location','North') diff --git a/auxiliary/boundedline/README.md b/auxiliary/boundedline/README.md new file mode 100755 index 0000000..4adc2a7 --- /dev/null +++ b/auxiliary/boundedline/README.md @@ -0,0 +1,88 @@ +## boundedline.m Documentation + +The boundedline function is a MATLAB utility to plot error bounds, confidence intervals, etc. for a line or lines. Advantages include: +- allows x-y input similar to plot, where one call can create multiple lines at once, either by listing consecutive x-y pairs or by using matrices for x and/or y. 
+- can add bounds in either the x- or y-direction, leading to support of plots where the x axis represents the dependent variable +- can render the shaded bounds either with transparency or as a lighter opaque patch, allowing flexibility with different renderers (helpful when OpenGL acts up, as it often does on my own computer). +- Can use linespec definitions, a colormap, or the default color order, as well as varying color intensity for the shaded bounds, for flexible color of lines and bounds +- returns handles of lines and patches for future modification if necessary + +### Syntax + +``` +[hl, hp] = boundedline(x, y, b) +[hl, hp] = boundedline(x, y, b, linespec) +[hl, hp] = boundedline(x1, y1, b1, linespec1, x2, y2, b2, linespec2) +[hl, hp] = boundedline(..., 'alpha') +[hl, hp] = boundedline(..., ax) +[hl, hp] = boundedline(..., 'transparency', trans) +[hl, hp] = boundedline(..., 'orientation', orient) +[hl, hp] = boundedline(..., 'cmap', cmap) +``` + +See function help for description of input and output variables. + +### Examples + +Plot with opaque bounds. In this example, the bounds on the first line +vary over x, while the bounds on the second line are constant for all x. +An outline is added to the bounds so the overlapping region can be seen +more clearly. + +```matlab +x = linspace(0, 2*pi, 50); +y1 = sin(x); +y2 = cos(x); +e1 = rand(size(y1))*.5+.5; +e2 = [.25 .5]; + +ax(1) = subplot(2,2,1); +[l,p] = boundedline(x, y1, e1, '-b*', x, y2, e2, '--ro'); +outlinebounds(l,p); +title('Opaque bounds, with outline'); +``` +![boundedline1](boundedline_readme_01.png) + + +For our second axis, we use the same 2 lines, and this time assign +x-varying bounds to both lines. Rather than using the LineSpec syntax, +this example uses the default color order to assign the colors of the +lines and patches. 
+ +```matlab +ax(2) = subplot(2,2,2); +boundedline(x, [y1;y2], rand(length(y1),2,2)*.5+.5, 'alpha'); +title('Transparent bounds'); +``` + +![boundedline2](boundedline_readme_02.png) + +The bounds can also be assigned to a horizontal orientation, for a case +where the x-axis represents the dependent variable. In this case, the +scalar error bound value applies to both lines and both sides of the +lines. + +```matlab +ax(3) = subplot(2,2,3); +boundedline([y1;y2], x, e1(1), 'orientation', 'horiz') +title('Horizontal bounds'); +``` + +![boundedline3](boundedline_readme_03.png) + + Rather than use a LineSpec or the default color order, a colormap array + can be used to assign colors. In this case, increasingly-narrower bounds + are added on top of the same line. + +```matlab +ax(4) = subplot(2,2,4); +boundedline(x, repmat(y1, 4,1), permute(0.5:-0.1:0.2, [3 1 2]), ... + 'cmap', cool(4), ... + 'transparency', 0.5); +title('Multiple bounds using colormap'); + +set(ax([1 2 4]), 'xlim', [0 2*pi]); +set(ax(3), 'ylim', [0 2*pi]); +``` + +![boundedline4](boundedline_readme_04.png) diff --git a/auxiliary/boundedline/boundedline/.gitignore b/auxiliary/boundedline/boundedline/.gitignore new file mode 100755 index 0000000..e492200 --- /dev/null +++ b/auxiliary/boundedline/boundedline/.gitignore @@ -0,0 +1,37 @@ +# Compiled source # +################### +*.com +*.class +*.dll +*.exe +*.o +*.so + +# Packages # +############ +# it's better to unpack these files and commit the raw source +# git has its own built in compression methods +*.7z +*.dmg +*.gz +*.iso +*.jar +*.rar +*.tar +*.zip + +# Logs and databases # +###################### +*.log +*.sql +*.sqlite + +# OS generated files # +###################### +.DS_Store +.DS_Store? 
+._* +.Spotlight-V100 +.Trashes +ehthumbs.db +Thumbs.db \ No newline at end of file diff --git a/auxiliary/boundedline/boundedline/boundedline.m b/auxiliary/boundedline/boundedline/boundedline.m new file mode 100755 index 0000000..c94b597 --- /dev/null +++ b/auxiliary/boundedline/boundedline/boundedline.m @@ -0,0 +1,466 @@ +function varargout = boundedline(varargin) +%BOUNDEDLINE Plot a line with shaded error/confidence bounds +% +% [hl, hp] = boundedline(x, y, b) +% [hl, hp] = boundedline(x, y, b, linespec) +% [hl, hp] = boundedline(x1, y1, b1, linespec1, x2, y2, b2, linespec2) +% [hl, hp] = boundedline(..., 'alpha') +% [hl, hp] = boundedline(..., ax) +% [hl, hp] = boundedline(..., 'transparency', trans) +% [hl, hp] = boundedline(..., 'orientation', orient) +% [hl, hp] = boundedline(..., 'nan', nanflag) +% [hl, hp] = boundedline(..., 'cmap', cmap) +% +% Input variables: +% +% x, y: x and y values, either vectors of the same length, matrices +% of the same size, or vector/matrix pair where the row or +% column size of the array matches the length of the vector +% (same requirements as for plot function). +% +% b: npoint x nside x nline array. Distance from line to +% boundary, for each point along the line (dimension 1), for +% each side of the line (lower/upper or left/right, depending +% on orientation) (dimension 2), and for each plotted line +% described by the preceding x-y values (dimension 3). If +% size(b,1) == 1, the bounds will be the same for all points +% along the line. If size(b,2) == 1, the bounds will be +% symmetrical on both sides of the lines. If size(b,3) == 1, +% the same bounds will be applied to all lines described by +% the preceding x-y arrays (only applicable when either x or +% y is an array). Bounds cannot include Inf, -Inf, or NaN, +% +% linespec: line specification that determines line type, marker +% symbol, and color of the plotted lines for the preceding +% x-y values. 
+% +% 'alpha': if included, the bounded area will be rendered with a +% partially-transparent patch the same color as the +% corresponding line(s). If not included, the bounded area +% will be an opaque patch with a lighter shade of the +% corresponding line color. +% +% ax: handle of axis where lines will be plotted. If not +% included, the current axis will be used. +% +% trans: Scalar between 0 and 1 indicating the transparency or +% intensity of color of the bounded area patch. Default is +% 0.2. +% +% orient: direction to add bounds +% 'vert': add bounds in vertical (y) direction (default) +% 'horiz': add bounds in horizontal (x) direction +% +% nanflag: Sets how NaNs in the boundedline patch should be handled +% 'fill': fill the value based on neighboring values, +% smoothing over the gap (default) +% 'gap': leave a blank space over/below the line +% 'remove': drop NaNs from patches, creating a linear +% interpolation over the gap. Note that this +% applies only to the bounds; NaNs in the line will +% remain. +% +% cmap: n x 3 colormap array. If included, lines will be colored +% (in order of plotting) according to this colormap, +% overriding any linespec or default colors. +% +% Output variables: +% +% hl: handles to line objects +% +% hp: handles to patch objects +% +% Example: +% +% x = linspace(0, 2*pi, 50); +% y1 = sin(x); +% y2 = cos(x); +% e1 = rand(size(y1))*.5+.5; +% e2 = [.25 .5]; +% +% ax(1) = subplot(2,2,1); +% [l,p] = boundedline(x, y1, e1, '-b*', x, y2, e2, '--ro'); +% outlinebounds(l,p); +% title('Opaque bounds, with outline'); +% +% ax(2) = subplot(2,2,2); +% boundedline(x, [y1;y2], rand(length(y1),2,2)*.5+.5, 'alpha'); +% title('Transparent bounds'); +% +% ax(3) = subplot(2,2,3); +% boundedline([y1;y2], x, e1(1), 'orientation', 'horiz') +% title('Horizontal bounds'); +% +% ax(4) = subplot(2,2,4); +% boundedline(x, repmat(y1, 4,1), permute(0.5:-0.1:0.2, [3 1 2]), ... 
+% 'cmap', cool(4), 'transparency', 0.5); +% title('Multiple bounds using colormap'); + + +% Copyright 2010 Kelly Kearney + +%-------------------- +% Parse input +%-------------------- + +% Alpha flag + +isalpha = cellfun(@(x) ischar(x) && strcmp(x, 'alpha'), varargin); +if any(isalpha) + usealpha = true; + varargin = varargin(~isalpha); +else + usealpha = false; +end + +% Axis + +isax = cellfun(@(x) isscalar(x) && ishandle(x) && strcmp('axes', get(x,'type')), varargin); +if any(isax) + hax = varargin{isax}; + varargin = varargin(~isax); +else + hax = gca; +end + +% Transparency + +[found, trans, varargin] = parseparam(varargin, 'transparency'); + +if ~found + trans = 0.2; +end + +if ~isscalar(trans) || trans < 0 || trans > 1 + error('Transparency must be scalar between 0 and 1'); +end + +% Orientation + +[found, orient, varargin] = parseparam(varargin, 'orientation'); + +if ~found + orient = 'vert'; +end + +if strcmp(orient, 'vert') + isvert = true; +elseif strcmp(orient, 'horiz') + isvert = false; +else + error('Orientation must be ''vert'' or ''horiz'''); +end + + +% Colormap + +[hascmap, cmap, varargin] = parseparam(varargin, 'cmap'); + + +% NaN flag + +[found, nanflag, varargin] = parseparam(varargin, 'nan'); +if ~found + nanflag = 'fill'; +end +if ~ismember(nanflag, {'fill', 'gap', 'remove'}) + error('Nan flag must be ''fill'', ''gap'', or ''remove'''); +end + +% X, Y, E triplets, and linespec + +[x,y,err,linespec] = deal(cell(0)); +while ~isempty(varargin) + if length(varargin) < 3 + error('Unexpected input: should be x, y, bounds triplets'); + end + if all(cellfun(@isnumeric, varargin(1:3))) + x = [x varargin(1)]; + y = [y varargin(2)]; + err = [err varargin(3)]; + varargin(1:3) = []; + else + error('Unexpected input: should be x, y, bounds triplets'); + end + if ~isempty(varargin) && ischar(varargin{1}) + linespec = [linespec varargin(1)]; + varargin(1) = []; + else + linespec = [linespec {[]}]; + end +end + +%-------------------- +% Reformat x and y +% 
for line and patch +% plotting +%-------------------- + +% Calculate y values for bounding lines + +plotdata = cell(0,7); + +htemp = figure('visible', 'off'); +for ix = 1:length(x) + + % Get full x, y, and linespec data for each line (easier to let plot + % check for properly-sized x and y and expand values than to try to do + % it myself) + + try + if isempty(linespec{ix}) + hltemp = plot(x{ix}, y{ix}); + else + hltemp = plot(x{ix}, y{ix}, linespec{ix}); + end + catch + close(htemp); + error('X and Y matrices and/or linespec not appropriate for line plot'); + end + + linedata = get(hltemp, {'xdata', 'ydata', 'marker', 'linestyle', 'color'}); + + nline = size(linedata,1); + + % Expand bounds matrix if necessary + + if nline > 1 + if ndims(err{ix}) == 3 + err2 = squeeze(num2cell(err{ix},[1 2])); + else + err2 = repmat(err(ix),nline,1); + end + else + err2 = err(ix); + end + + % Figure out upper and lower bounds + + [lo, hi] = deal(cell(nline,1)); + for iln = 1:nline + + x2 = linedata{iln,1}; + y2 = linedata{iln,2}; + nx = length(x2); + + if isvert + lineval = y2; + else + lineval = x2; + end + + sz = size(err2{iln}); + + if isequal(sz, [nx 2]) + lo{iln} = lineval - err2{iln}(:,1)'; + hi{iln} = lineval + err2{iln}(:,2)'; + elseif isequal(sz, [nx 1]) + lo{iln} = lineval - err2{iln}'; + hi{iln} = lineval + err2{iln}'; + elseif isequal(sz, [1 2]) + lo{iln} = lineval - err2{iln}(1); + hi{iln} = lineval + err2{iln}(2); + elseif isequal(sz, [1 1]) + lo{iln} = lineval - err2{iln}; + hi{iln} = lineval + err2{iln}; + elseif isequal(sz, [2 nx]) % not documented, but accepted anyways + lo{iln} = lineval - err2{iln}(:,1); + hi{iln} = lineval + err2{iln}(:,2); + elseif isequal(sz, [1 nx]) % not documented, but accepted anyways + lo{iln} = lineval - err2{iln}; + hi{iln} = lineval + err2{iln}; + elseif isequal(sz, [2 1]) % not documented, but accepted anyways + lo{iln} = lineval - err2{iln}(1); + hi{iln} = lineval + err2{iln}(2); + else + error('Error bounds must be npt x nside x 
nline array'); + end + + end + + % Combine all data (xline, yline, marker, linestyle, color, lower bound + % (x or y), upper bound (x or y) + + plotdata = [plotdata; linedata lo hi]; + +end +close(htemp); + +% Override colormap + +if hascmap + nd = size(plotdata,1); + cmap = repmat(cmap, ceil(nd/size(cmap,1)), 1); + cmap = cmap(1:nd,:); + plotdata(:,5) = num2cell(cmap,2); +end + + +%-------------------- +% Plot +%-------------------- + +% Setup of x and y, plus line and patch properties + +nline = size(plotdata,1); +[xl, yl, xp, yp, marker, lnsty, lncol, ptchcol, alpha] = deal(cell(nline,1)); + +for iln = 1:nline + xl{iln} = plotdata{iln,1}; + yl{iln} = plotdata{iln,2}; +% if isvert +% xp{iln} = [plotdata{iln,1} fliplr(plotdata{iln,1})]; +% yp{iln} = [plotdata{iln,6} fliplr(plotdata{iln,7})]; +% else +% xp{iln} = [plotdata{iln,6} fliplr(plotdata{iln,7})]; +% yp{iln} = [plotdata{iln,2} fliplr(plotdata{iln,2})]; +% end + + [xp{iln}, yp{iln}] = calcpatch(plotdata{iln,1}, plotdata{iln,2}, isvert, plotdata{iln,6}, plotdata{iln,7}, nanflag); + + marker{iln} = plotdata{iln,3}; + lnsty{iln} = plotdata{iln,4}; + + if usealpha + lncol{iln} = plotdata{iln,5}; + ptchcol{iln} = plotdata{iln,5}; + alpha{iln} = trans; + else + lncol{iln} = plotdata{iln,5}; + ptchcol{iln} = interp1([0 1], [1 1 1; lncol{iln}], trans); + alpha{iln} = 1; + end +end + +% Plot patches and lines + +if verLessThan('matlab', '8.4.0') + [hp,hl] = deal(zeros(nline,1)); +else + [hp,hl] = deal(gobjects(nline,1)); +end + + +for iln = 1:nline + hp(iln) = patch(xp{iln}, yp{iln}, ptchcol{iln}, 'facealpha', alpha{iln}, 'edgecolor', 'none', 'parent', hax); +end + +for iln = 1:nline + hl(iln) = line(xl{iln}, yl{iln}, 'marker', marker{iln}, 'linestyle', lnsty{iln}, 'color', lncol{iln}, 'parent', hax); +end + +%-------------------- +% Assign output +%-------------------- + +nargoutchk(0,2); + +if nargout >= 1 + varargout{1} = hl; +end + +if nargout == 2 + varargout{2} = hp; +end + +%-------------------- +% Parse 
optional +% parameters +%-------------------- + +function [found, val, vars] = parseparam(vars, param) + +isvar = cellfun(@(x) ischar(x) && strcmpi(x, param), vars); + +if sum(isvar) > 1 + error('Parameters can only be passed once'); +end + +if any(isvar) + found = true; + idx = find(isvar); + val = vars{idx+1}; + vars([idx idx+1]) = []; +else + found = false; + val = []; +end + +%---------------------------- +% Calculate patch coordinates +%---------------------------- + +function [xp, yp] = calcpatch(xl, yl, isvert, lo, hi, nanflag) + +ismissing = isnan([xl;yl;lo;hi]); + +% If gap method, split + +if any(ismissing(:)) && strcmp(nanflag, 'gap') + + tmp = [xl;yl;lo;hi]; + + idx = find(any(ismissing,1)); + n = diff([0 idx length(xl)]); + + tmp = mat2cell(tmp, 4, n); + isemp = cellfun('isempty', tmp); + tmp = tmp(~isemp); + + tmp = cellfun(@(a) a(:,~any(isnan(a),1)), tmp, 'uni', 0); + isemp = cellfun('isempty', tmp); + tmp = tmp(~isemp); + + xl = cellfun(@(a) a(1,:), tmp, 'uni', 0); + yl = cellfun(@(a) a(2,:), tmp, 'uni', 0); + lo = cellfun(@(a) a(3,:), tmp, 'uni', 0); + hi = cellfun(@(a) a(4,:), tmp, 'uni', 0); +else + xl = {xl}; + yl = {yl}; + lo = {lo}; + hi = {hi}; +end + +[xp, yp] = deal(cell(size(xl))); + +for ii = 1:length(xl) + + iseq = ~verLessThan('matlab', '8.4.0') && isequal(lo{ii}, hi{ii}); % deal with zero-width bug in R2014b/R2015a + + if isvert + if iseq + xp{ii} = [xl{ii} nan(size(xl{ii}))]; + yp{ii} = [lo{ii} fliplr(hi{ii})]; + else + xp{ii} = [xl{ii} fliplr(xl{ii})]; + yp{ii} = [lo{ii} fliplr(hi{ii})]; + end + else + if iseq + xp{ii} = [lo{ii} fliplr(hi{ii})]; + yp{ii} = [yl{ii} nan(size(yl{ii}))]; + else + xp{ii} = [lo{ii} fliplr(hi{ii})]; + yp{ii} = [yl{ii} fliplr(yl{ii})]; + end + end + + if strcmp(nanflag, 'fill') + xp{ii} = inpaint_nans(xp{ii}', 4); + yp{ii} = inpaint_nans(yp{ii}', 4); + elseif strcmp(nanflag, 'remove') + isn = isnan(xp{ii}) | isnan(yp{ii}); + xp{ii} = xp{ii}(~isn); + yp{ii} = yp{ii}(~isn); + end + +end + +if strcmp(nanflag, 
'gap') + [xp, yp] = singlepatch(xp, yp); +else + xp = xp{1}; + yp = yp{1}; +end + diff --git a/auxiliary/boundedline/boundedline/outlinebounds.m b/auxiliary/boundedline/boundedline/outlinebounds.m new file mode 100755 index 0000000..36724bd --- /dev/null +++ b/auxiliary/boundedline/boundedline/outlinebounds.m @@ -0,0 +1,43 @@ +function hnew = outlinebounds(hl, hp) +%OUTLINEBOUNDS Outline the patch of a boundedline +% +% hnew = outlinebounds(hl, hp) +% +% This function adds an outline to the patch objects created by +% boundedline, matching the color of the central line associated with each +% patch. +% +% Input variables: +% +% hl: handles to line objects from boundedline +% +% hp: handles to patch objects from boundedline +% +% Output variables: +% +% hnew: handle to new line objects + +% Copyright 2012 Kelly Kearney + + +hnew = zeros(size(hl)); +for il = 1:numel(hp) + col = get(hl(il), 'color'); + xy = get(hp(il), {'xdata','ydata'}); + ax = ancestor(hl(il), 'axes'); + + nline = size(xy{1},2); + if mod(size(xy{1}, 1), 2) == 0 + % Insert a NaN between upper and lower lines, so they're disconnected + L = size(xy{1}, 1) / 2; + xy{1} = [xy{1}(1:L, :); nan(1, nline); xy{1}(L+1:end, :)]; + xy{2} = [xy{2}(1:L, :); nan(1, nline); xy{2}(L+1:end, :)]; + end + if nline > 1 + xy{1} = reshape([xy{1}; nan(1,nline)], [], 1); + xy{2} = reshape([xy{2}; nan(1,nline)], [], 1); + end + hnew(il) = line(xy{1}, xy{2}, 'parent', ax, 'linestyle', '-', 'color', col); + +end + diff --git a/auxiliary/boundedline/boundedline_readme.png b/auxiliary/boundedline/boundedline_readme.png new file mode 100755 index 0000000..c125d08 Binary files /dev/null and b/auxiliary/boundedline/boundedline_readme.png differ diff --git a/auxiliary/boundedline/boundedline_readme_01.png b/auxiliary/boundedline/boundedline_readme_01.png new file mode 100755 index 0000000..60c15b1 Binary files /dev/null and b/auxiliary/boundedline/boundedline_readme_01.png differ diff --git 
a/auxiliary/boundedline/boundedline_readme_02.png b/auxiliary/boundedline/boundedline_readme_02.png new file mode 100755 index 0000000..d955b68 Binary files /dev/null and b/auxiliary/boundedline/boundedline_readme_02.png differ diff --git a/auxiliary/boundedline/boundedline_readme_03.png b/auxiliary/boundedline/boundedline_readme_03.png new file mode 100755 index 0000000..588a2fa Binary files /dev/null and b/auxiliary/boundedline/boundedline_readme_03.png differ diff --git a/auxiliary/boundedline/boundedline_readme_04.png b/auxiliary/boundedline/boundedline_readme_04.png new file mode 100755 index 0000000..9808fb8 Binary files /dev/null and b/auxiliary/boundedline/boundedline_readme_04.png differ diff --git a/auxiliary/boundedline/catuneven/.gitignore b/auxiliary/boundedline/catuneven/.gitignore new file mode 100755 index 0000000..e492200 --- /dev/null +++ b/auxiliary/boundedline/catuneven/.gitignore @@ -0,0 +1,37 @@ +# Compiled source # +################### +*.com +*.class +*.dll +*.exe +*.o +*.so + +# Packages # +############ +# it's better to unpack these files and commit the raw source +# git has its own built in compression methods +*.7z +*.dmg +*.gz +*.iso +*.jar +*.rar +*.tar +*.zip + +# Logs and databases # +###################### +*.log +*.sql +*.sqlite + +# OS generated files # +###################### +.DS_Store +.DS_Store? +._* +.Spotlight-V100 +.Trashes +ehthumbs.db +Thumbs.db \ No newline at end of file diff --git a/auxiliary/boundedline/catuneven/catuneven.m b/auxiliary/boundedline/catuneven/catuneven.m new file mode 100755 index 0000000..c63439a --- /dev/null +++ b/auxiliary/boundedline/catuneven/catuneven.m @@ -0,0 +1,53 @@ +function b = catuneven(dim, padval, varargin) +%CATUNEVEN Concatenate unequally-sized arrays, padding with a value +% +% This function is similar to cat, except it does not require the arrays to +% be equally-sized along non-concatenated dimensions. Instead, all arrays +% are padded to be equally-sized using the value specified. 
+% +% b = catuneven(dim, padval, a1, a2, ...) +% +% Input variables: +% +% dim: dimension along which to concatenate +% +% padval: value used as placeholder when arrays are expanded +% +% a#: arrays to be concatenated, numerical +% +% Output variables: +% +% b: concatenated array + +% Copyright 2013 Kelly Kearney + +ndim = max(cellfun(@ndims, varargin)); +ndim = max(ndim, dim); + +for ii = 1:ndim + sz(:,ii) = cellfun(@(x) size(x, ii), varargin); +end +maxsz = max(sz, [], 1); + +nv = length(varargin); +val = cell(size(varargin)); +for ii = 1:nv + sztmp = maxsz; + sztmp(dim) = sz(ii,dim); + + idx = cell(ndim,1); + [idx{:}] = ind2sub(sz(ii,:), 1:numel(varargin{ii})); + + idxnew = sub2ind(sztmp, idx{:}); + + val{ii} = ones(sztmp) * padval; + val{ii}(idxnew) = varargin{ii}; + +end + +b = cat(dim, val{:}); + + + + + diff --git a/auxiliary/boundedline/singlepatch/.gitignore b/auxiliary/boundedline/singlepatch/.gitignore new file mode 100755 index 0000000..e492200 --- /dev/null +++ b/auxiliary/boundedline/singlepatch/.gitignore @@ -0,0 +1,37 @@ +# Compiled source # +################### +*.com +*.class +*.dll +*.exe +*.o +*.so + +# Packages # +############ +# it's better to unpack these files and commit the raw source +# git has its own built in compression methods +*.7z +*.dmg +*.gz +*.iso +*.jar +*.rar +*.tar +*.zip + +# Logs and databases # +###################### +*.log +*.sql +*.sqlite + +# OS generated files # +###################### +.DS_Store +.DS_Store? +._* +.Spotlight-V100 +.Trashes +ehthumbs.db +Thumbs.db \ No newline at end of file diff --git a/auxiliary/boundedline/singlepatch/singlepatch.m b/auxiliary/boundedline/singlepatch/singlepatch.m new file mode 100755 index 0000000..d7258e3 --- /dev/null +++ b/auxiliary/boundedline/singlepatch/singlepatch.m @@ -0,0 +1,65 @@ +function varargout = singlepatch(varargin) +%SINGLEPATCH Concatenate patches to be plotted as one +% +% [xp, yp, zp, ...] = singlepatch(x, y, z, ...) 
+% +% Concatenates uneven vectors of x and y coordinates by replicating the +% last point in each polygon. This allows patches with different numbers +% of vertices to be plotted as one, which is often much, much more +% efficient than plotting lots of individual patches. +% +% Input variables: +% +% x: cell array, with each cell holding a vector of coordinates +% associates with a single patch. The input variables must all be +% of the same size, and usually will correspond to x, y, z, and c +% data for the patches. +% +% Output variables: +% +% xp: m x n array of coordinates, where m is the maximum length of the +% vectors in x and n is numel(x). + +% Copyright 2015 Kelly Kearney + +if nargin ~= nargout + error('Must supply the same number of input variables as output variables'); +end + +nv = nargin; +vars = varargin; + +sz = cellfun(@size, vars{1}(:), 'uni', 0); +sz = cat(1, sz{:}); + +if all(sz(:,1) == 1) + for ii = 1:nv + vars{ii} = catuneven(1, NaN, vars{ii}{:})'; + end +% x = catuneven(1, NaN, x{:})'; +% y = catuneven(1, NaN, y{:})'; +elseif all(sz(:,2) == 1) + for ii = 1:nv + vars{ii} = catuneven(2, NaN, vars{ii}{:}); + end +% x = catuneven(2, NaN, x{:}); +% y = catuneven(2, NaN, y{:}); +else + error('Inputs must be cell arrays of vectors'); +end + +[ii,jj] = find(isnan(vars{1})); + +ind = accumarray(jj, ii, [size(vars{1},2) 1], @min); +ij1 = [ii jj]; +ij2 = [ind(jj)-1 jj]; +idx1 = sub2ind(size(vars{1}), ij1(:,1), ij1(:,2)); +idx2 = sub2ind(size(vars{1}), ij2(:,1), ij2(:,2)); + +for ii = 1:nv + vars{ii}(idx1) = vars{ii}(idx2); +end + +varargout = vars; + + diff --git a/auxiliary/globalize_params.m b/auxiliary/globalize_params.m new file mode 100644 index 0000000..772df50 --- /dev/null +++ b/auxiliary/globalize_params.m @@ -0,0 +1,75 @@ +function [m] = globalize_params(m) +% globalize_params change the scope of all the parameters from the reaction +% to the model. +% delete the locally scoped parameters and add parameters at the model +% object level. 
+% Vipul Singhal, Nov 2017 + +% main loop over all the reactions +R = length(m.reactions); +for r = 1:R + + rx = m.reactions(r); % reaction class object + + % check if the reaction has an internal parameter object. + if ~isempty(rx.KineticLaw.Parameters) + % nothing to do if there are no reaction scoped parameters. + + % get the parameter names + % parnames = get(rx.kineticlaw.getparameters, 'Name'); % don't need to use this. + % use this instead: + parnames = rx.KineticLaw.ParameterVariableNames; + + % for each parameter name, get its value, then delete it. + P = length(parnames); + pvals = zeros(1, P); + for p = 1:P + % get the parameter values (to copy over to the global params) + pvals(p) = rx.KineticLaw.Parameters(p).Value; + % alt: get(rx.kineticlaw.getparameters, 'Value'), this returns a + % cell array of numbers, also put this outside the P loop. + end + + for p = 1:P + % delete the parameter objects + pTarget = sbioselect(rx, 'type', 'parameter', 'Name', parnames{p}); + pTarget.delete; + end + % Note that deleting the parameter objects does not remove the + % parameter variable name property of the KineticLaw object. + + % now if reversible, setup the Kd parameter and the rule that Kd=kr/kf + if rx.Reversible + parname_base = parnames{1}(1:end-2); + % assume a form like TXTL_UTR_UTR1_F, and so we want to remove the '_F' at the end. + %todo: check if this is the format of all the parameters.
+ if isempty(sbioselect(m,'Type','Parameter', 'Name', [parname_base '_Kd'])) + addparameter(m, [parname_base '_Kd'], pvals(2)/pvals(1)) ; + end + + if isempty(sbioselect(m,'Type','Parameter', 'Name', parnames{1})) + addparameter(m, parnames{1}, pvals(1)); + end + + if isempty(sbioselect(m,'Type','Parameter', 'Name', parnames{2})) + addparameter(m, parnames{2}, pvals(2)) ; + end + + ruleStr = [parnames{2} ' = ' parname_base '_Kd*' parnames{1}]; + if isempty(sbioselect(m,'Type','Rule', 'Rule', ruleStr)) + addrule(m, ruleStr, 'initialAssignment'); + end + else + % irreversible reaction, just add the parameter to the model scope. + % + if isempty(sbioselect(m,'Type','Parameter', 'Name', parnames{1})) + addparameter(m, parnames{1}, pvals(1)); + end + + end + end + +end + +end + diff --git a/auxiliary/listmodelparts.m b/auxiliary/listmodelparts.m new file mode 100644 index 0000000..01886e1 --- /dev/null +++ b/auxiliary/listmodelparts.m @@ -0,0 +1,15 @@ +function listmodelparts(m) +%listmodelparts list the insides of a model +% list the species, the reactions, the global parameters, the parameters +% and which reactions they are tied to, the rules and the events. +% input: model object, m. 
+ +m.species +m.Reactions +m.Parameters +gp = getparam(m); +gp(:, [1, 3]) +m.Rules +m.Events +end + diff --git a/auxiliary/plotCustomSpecies2.m b/auxiliary/plotCustomSpecies2.m index 3db2005..8d7bfde 100644 --- a/auxiliary/plotCustomSpecies2.m +++ b/auxiliary/plotCustomSpecies2.m @@ -1,12 +1,23 @@ function varargout = plotCustomSpecies2(mobj, x_ode, t_ode, cellofspecies, varargin) -% plotting routine: input list of species, mobj cell array, data cell array, title containing what is being plotted and what speie is being varied, and cell array of parameter values +% plotting routine: input list of species, mobj cell array, data cell array, +% title containing what is being plotted and what speie is being varied, +% and cell array of parameter values numvarargs = length(varargin); -optargs = {{}, [],[],[]}; +optargs = {{}, [],[],[], []}; optargs(1:numvarargs) = varargin; -[legendList, saveflag, folderName, figureName] = optargs{:}; +[legendList, saveflag, folderName, figureName, fighandle] = optargs{:}; scrsz = get(0,'screensize'); - figure('position',[50 50 scrsz(3)/1.1 scrsz(4)/1.3])%,'name','simulation plot window','numbertitle','off') + if ~isempty(fighandle) + figure(fighandle )%,'name','simulation plot window','numbertitle','off') + set(fighandle, 'position',[50 50 scrsz(3)/1.1 scrsz(4)/1.3]); + + else + figure( 'position',[50 50 scrsz(3)/1.1 scrsz(4)/1.3]) + %,'name','simulation plot window','numbertitle','off') + end + + if iscell(mobj) colororder1 = lines; @@ -69,7 +80,8 @@ else - warning('txtltoolbox:plotcustomspecies','mobj and other things must be cell arrays, noting will be plotted') + warning('txtltoolbox:plotcustomspecies',... 
+ 'mobj and other things must be cell arrays, nothing will be plotted') end diff --git a/auxiliary/twofactors.m b/auxiliary/twofactors.m new file mode 100644 index 0000000..079d40f --- /dev/null +++ b/auxiliary/twofactors.m @@ -0,0 +1,11 @@ +function [num1, num2] = twofactors(n) +%twofactors decompose a number into approximately equal factors +% if prime, just return 1 and number itself +% Vipul singhal +pf = factor(n); +numpf = length(pf); +mid = ceil(numpf/2); +num1 = prod(pf(1:mid)); +num2 = prod(pf(mid+1:end)); +end + diff --git a/components/txtl_prom_p70.m b/components/txtl_prom_p70.m index 551e2c0..4d406a4 100644 --- a/components/txtl_prom_p70.m +++ b/components/txtl_prom_p70.m @@ -69,8 +69,8 @@ % empty cellarray for amount => zero amount txtl_addspecies(tube, coreSpecies, cell(1,size(coreSpecies,2)), 'Internal'); - - txtl_transcription(mode, tube, dna, rna, RNAP, RNAPbound); + + txtl_transcription(mode, tube, dna, rna, RNAP, RNAPbound); %%%%%%%%%%%%%%%%%%% DRIVER MODE: Setup Reactions %%%%%%%%%%%%%%%%%%%%%%%%%% elseif strcmp(mode.add_dna_driver, 'Setup Reactions') @@ -82,6 +82,9 @@ error('the number of argument should be 5 or 8, not %d',nargin); end + + + % Parameters that describe this promoter parameters = {'TXTL_P70_RNAPbound_F',paramObj.RNAPbound_Forward;... 
'TXTL_P70_RNAPbound_R',paramObj.RNAPbound_Reverse}; diff --git a/config/E30VNPRL_config.csv b/config/E30VNPRL_config.csv index e156030..4810912 100644 --- a/config/E30VNPRL_config.csv +++ b/config/E30VNPRL_config.csv @@ -1,4 +1,4 @@ -NTPmodel,Numeric,2, NTP model in use !TODO: document, AAmodel,Numeric,2, AA model in use !TODO: document, Transcription_Rate, Expression, 1.5/(RNA_Length), 1 NTP/second VN PRL, Translation_Rate, Expression, 4/(Protein_Length), >4 AA/second VN PRL, ,,,, DNA_RecBCD_Forward, Numeric,0.4, !TODO: document, DNA_RecBCD_Reverse, Numeric,0.1, !TODO: document, DNA_RecBCD_complex_deg, Numeric,0.5, !TODO: document, ,,,, Protein_ClpXP_Forward, Numeric,0.5, !TODO: document, Protein_ClpXP_Reverse, Numeric,0.0001, !TODO: document, Protein_ClpXP_complex_deg, Numeric,0.1, !TODO: document, ,,,, RNAP_S70_F, Numeric,100, really fast, RNAP_S70_R, Numeric,0.01,, ,,,, GamS_RecBCD_F, Numeric,0.001, ! !TODO document, GamS_RecBCD_R, Numeric,0.09, !TODO: document, ,,,, TL_AA_Forward,Expression,300, !TODO: document, TL_AA_Reverse,Expression,.10, !TODO: document, +NTPmodel,Numeric,2, NTP model in use !TODO: document, AAmodel,Numeric,2, AA model in use !TODO: document, Transcription_Rate, Expression, 7.5, 1.5 NTP/second VN PRL, Translation_Rate, Expression, 4, >4 AA/second VN PRL, ,,,, DNA_RecBCD_Forward, Numeric,0.4, !TODO: document, DNA_RecBCD_Reverse, Numeric,0.1, !TODO: document, DNA_RecBCD_complex_deg, Numeric,0.5, !TODO: document, ,,,, Protein_ClpXP_Forward, Numeric,0.5, !TODO: document, Protein_ClpXP_Reverse, Numeric,0.0001, !TODO: document, Protein_ClpXP_complex_deg, Numeric,0.1, !TODO: document, ,,,, RNAP_S70_F, Numeric,100, really fast, RNAP_S70_R, Numeric,0.01,, ,,,, GamS_RecBCD_F, Numeric,0.001, ! 
!TODO document, GamS_RecBCD_R, Numeric,0.09, !TODO: document, ,,,, TL_AA_Forward,Expression,300, !TODO: document, TL_AA_Reverse,Expression,.10, !TODO: document, TL_AGTP_Forward,Expression,30, !TODO: document, TL_AGTP_Reverse,Expression,10, !TODO: document, ,,,, RNA_deg,Expression,1/360, s^-1 12 min half life (VN PRL) , RNase_F,Expression,10, nM^-1 s^-1 see derivation on wiki, RNase_R,Expression,2000, s^-1 see derivation on wiki, ,,,, Ribosome_Binding_F, Expression,.0022, TODO, Ribosome_Binding_R, Expression,400, TODO, ,,,, NTP_Forward_1,Expression,1, !TODO: document, NTP_Reverse_1,Expression,1.20E+05, !TODO: document , NTP_Forward_2,Expression,1000, !TODO: document, NTP_Reverse_2,Expression,1.20E+8, !TODO: document, ,,,, RNAPbound_termination_rate, Numeric,0.05, !TODO: document, Ribobound_termination_rate, Numeric,50, !TODO: document, @@ -7,6 +7,7 @@ RNAP_ic, Numeric,100, !TODO: carry out MCMC and check lit, Ribo_ic, Numeric,30, !TODO: carry out MCMC and check lit, RNase_ic, Numeric,100, !TODO: carry out MCMC and check lit, ,,,, -ATP_degradation_rate, Numeric,0.0002, !TODO: document, -ATP_degradation_start_time, Numeric,5400, in seconds !TODO: document, +ATP_degradation_rate, Numeric,0.002, !TODO: document, +ATP_degradation_start_time, Numeric,10800, in seconds !TODO: document, +ATP_regeneration_rate, Numeric,0.002, !TODO: document, ,,,, AGTP_Concentration, Expression,3000000, ATP and GTP are at 1.5mM each (JoVE), CUTP_Concentration, Expression,1800000, CTP and UTP are at 0.9mM each (JovE), AA_Concentration, Expression,30000000, 20 tyes of AAs, $20% usability$ 30mM JoVE \ No newline at end of file diff --git a/config/E30_config.csv b/config/E30_config.csv index 040aeaf..647c360 100644 --- a/config/E30_config.csv +++ b/config/E30_config.csv @@ -1,7 +1,7 @@ NTPmodel,Numeric,2, NTP model in use, see documentation AAmodel,Numeric,2, AA model in use, see documentation -Transcription_Rate, Expression, 1/(RNA_Length), 1 NTP/second VN PRL -Translation_Rate, Expression, 
4/(Protein_Length), >4 AA/second VN PRL +Transcription_Rate, Expression, 1, 1 NTP/second VN PRL +Translation_Rate, Expression, 4, >4 AA/second VN PRL ,,,, DNA_RecBCD_Forward, Numeric, 0.4, TODO DNA_RecBCD_Reverse, Numeric, 0.1, TODO diff --git a/config/E32_config.csv b/config/E32_config.csv index 5a23f7d..d8a547a 100644 --- a/config/E32_config.csv +++ b/config/E32_config.csv @@ -1,4 +1,4 @@ -NTPmodel,Numeric,2, NTP model in use !TODO: document, AAmodel,Numeric,2, AA model in use !TODO: document, Transcription_Rate, Expression, 4/(RNA_Length), 1 NTP/second VN PRL, Translation_Rate, Expression, 500/(Protein_Length), >4 AA/second VN PRL, ,,,, DNA_RecBCD_Forward, Numeric,0.4, !TODO: document, DNA_RecBCD_Reverse, Numeric,0.1, !TODO: document, DNA_RecBCD_complex_deg, Numeric,0.5, !TODO: document, ,,,, Protein_ClpXP_Forward, Numeric,0.5, !TODO: document, Protein_ClpXP_Reverse, Numeric,0.0001, !TODO: document, Protein_ClpXP_complex_deg, Numeric,0.1, !TODO: document, ,,,, RNAP_S70_F, Numeric,100, really fast, RNAP_S70_R, Numeric,0.01,, ,,,, GamS_RecBCD_F, Numeric,0.001, ! !TODO document, GamS_RecBCD_R, Numeric,0.09, !TODO: document, ,,,, TL_AA_Forward,Expression,300, !TODO: document, TL_AA_Reverse,Expression,.010, !TODO: document, +NTPmodel,Numeric,2, NTP model in use !TODO: document, AAmodel,Numeric,2, AA model in use !TODO: document, Transcription_Rate, Expression, 4, 1 NTP/second VN PRL, Translation_Rate, Expression, 50, >4 AA/second VN PRL, ,,,, DNA_RecBCD_Forward, Numeric,0.4, !TODO: document, DNA_RecBCD_Reverse, Numeric,0.1, !TODO: document, DNA_RecBCD_complex_deg, Numeric,0.5, !TODO: document, ,,,, Protein_ClpXP_Forward, Numeric,0.5, !TODO: document, Protein_ClpXP_Reverse, Numeric,0.0001, !TODO: document, Protein_ClpXP_complex_deg, Numeric,0.1, !TODO: document, ,,,, RNAP_S70_F, Numeric,100, really fast, RNAP_S70_R, Numeric,0.01,, ,,,, GamS_RecBCD_F, Numeric,0.001, ! 
!TODO document, GamS_RecBCD_R, Numeric,0.09, !TODO: document, ,,,, TL_AA_Forward,Expression,300, !TODO: document, TL_AA_Reverse,Expression,.010, !TODO: document, TL_AGTP_Forward,Expression,30, !TODO: document, TL_AGTP_Reverse,Expression,1, !TODO: document, ,,,, RNA_deg,Expression,0.5/360, s^-1 12 min half life (VN PRL) , RNase_F,Expression,.1, nM^-1 s^-1 see derivation on wiki, RNase_R,Expression,2, s^-1 see derivation on wiki, ,,,, Ribosome_Binding_F, Expression,.2, TODO, Ribosome_Binding_R, Expression,4, TODO, ,,,, NTP_Forward_1,Expression,0.1, !TODO: document, NTP_Reverse_1,Expression,1.20E+05, !TODO: document , NTP_Forward_2,Expression,100, !TODO: document, NTP_Reverse_2,Expression,1.20E+8, !TODO: document, ,,,, RNAPbound_termination_rate, Numeric,0.05, !TODO: document, Ribobound_termination_rate, Numeric,0.005, !TODO: document, ,,,, RecBCD_ic, Numeric,5, !TODO: carry out MCMC and check lit, diff --git a/config/E9_config.csv b/config/E9_config.csv index 087beb6..5ba9187 100644 --- a/config/E9_config.csv +++ b/config/E9_config.csv @@ -1,7 +1,7 @@ NTPmodel,Numeric,2, NTP model in use, see documentation AAmodel,Numeric,2, AA model in use, see documentation -Transcription_Rate, Expression, log(2)/(RNA_Length/50), 50 NTP/second transcription -Translation_Rate, Expression, log(2)/(Protein_Length/0.64), 1.5 AA/second translation +Transcription_Rate, Expression, log(2), +Translation_Rate, Expression, log(2), DNA_RecBCD_Forward, Numeric, 0.4, DNA_RecBCD_Reverse, Numeric, 0.1, diff --git a/config/Emcmc2017_config.csv b/config/Emcmc2017_config.csv new file mode 100644 index 0000000..7d0a381 --- /dev/null +++ b/config/Emcmc2017_config.csv @@ -0,0 +1,13 @@ +NTPmodel,Numeric,2, NTP model in use !TODO: document, AAmodel,Numeric,2, AA model in use !TODO: document, Transcription_Rate, Expression, 1.5, 1.5 NTP/second VN PRL, Translation_Rate, Expression, 4, >4 AA/second VN PRL, ,,,, DNA_RecBCD_Forward, Numeric,0.4, !TODO: document, DNA_RecBCD_Reverse, Numeric,0.1, !TODO: 
document, DNA_RecBCD_complex_deg, Numeric,0.5, !TODO: document, ,,,, Protein_ClpXP_Forward, Numeric,0.5, !TODO: document, Protein_ClpXP_Reverse, Numeric,0.0001, !TODO: document, Protein_ClpXP_complex_deg, Numeric,0.1, !TODO: document, ,,,, RNAP_S70_F, Numeric,1, really fast, RNAP_S70_R, Numeric,0.01,, ,,,, GamS_RecBCD_F, Numeric,0.001, ! !TODO document, GamS_RecBCD_R, Numeric,0.09, !TODO: document, ,,,, TL_AA_Forward,Expression,1, !TODO: document, TL_AA_Reverse,Expression,.10, !TODO: document, +TL_AGTP_Forward,Expression,1, !TODO: document, TL_AGTP_Reverse,Expression,10, !TODO: document, ,,,, RNA_deg,Expression,1/360, s^-1 12 min half life (VN PRL) , RNase_F,Expression,1, nM^-1 s^-1 see derivation on wiki, RNase_R,Expression,2000, s^-1 see derivation on wiki, ,,,, Ribosome_Binding_F, Expression,.0022, TODO, Ribosome_Binding_R, Expression,4, TODO, ,,,, NTP_Forward_1,Expression,1, !TODO: document, NTP_Reverse_1,Expression,10, !TODO: document , +NTP_Forward_2,Expression,10, !TODO: document, NTP_Reverse_2,Expression,100, !TODO: document, ,,,, RNAPbound_termination_rate, Numeric,0.05, !TODO: document, +Ribobound_termination_rate, Numeric,50, !TODO: document, +,,,, RecBCD_ic, Numeric,5, !TODO: carry out MCMC and check lit, +RNAP_ic, Numeric,100, !TODO: carry out MCMC and check lit, +Ribo_ic, Numeric,30, !TODO: carry out MCMC and check lit, +RNase_ic, Numeric,100, !TODO: carry out MCMC and check lit, +,,,, +ATP_degradation_rate, Numeric,0.0002, !TODO: document, +ATP_degradation_start_time, Numeric,10800, in seconds !TODO: document, +ATP_regeneration_rate, Numeric,0.002, !TODO: document, +,,,, AGTP_Concentration, Expression,3000000, ATP and GTP are at 1.5mM each (JoVE), CUTP_Concentration, Expression,1800000, CTP and UTP are at 0.9mM each (JovE), AA_Concentration, Expression,30000000, 20 tyes of AAs, $20% usability$ 30mM JoVE \ No newline at end of file diff --git a/config/Emcmc2018_config.csv b/config/Emcmc2018_config.csv new file mode 100644 index 0000000..980725a --- 
/dev/null +++ b/config/Emcmc2018_config.csv @@ -0,0 +1,13 @@ +NTPmodel,Numeric,2, NTP model in use !TODO: document, AAmodel,Numeric,2, AA model in use !TODO: document, Transcription_Rate, Expression, 10.5, need to get about 400nM of RNA, Translation_Rate, Expression, 20, >4 AA/second VN PRL, ,,,, DNA_RecBCD_Forward, Numeric,0.4, !TODO: document, DNA_RecBCD_Reverse, Numeric,0.1, !TODO: document, DNA_RecBCD_complex_deg, Numeric,0.5, !TODO: document, ,,,, Protein_ClpXP_Forward, Numeric,0.5, !TODO: document, Protein_ClpXP_Reverse, Numeric,0.0001, !TODO: document, Protein_ClpXP_complex_deg, Numeric,0.1, !TODO: document, ,,,, RNAP_S70_F, Numeric,1, really fast, RNAP_S70_R, Numeric,0.01,, ,,,, GamS_RecBCD_F, Numeric,0.001, ! !TODO document, GamS_RecBCD_R, Numeric,0.09, !TODO: document, ,,,, TL_AA_Forward,Expression,.001, !TODO: document TL_AA_Reverse,Expression,100, !TODO: document, +TL_AGTP_Forward,Expression,.00001, !TODO: document, TL_AGTP_Reverse,Expression,1, !TODO: document, ,,,, RNA_deg,Expression,1/360, s^-1 12 min half life (VN PRL) , RNase_F,Expression,.01, nM^-1 s^-1 see derivation on wiki, RNase_R,Expression,2, s^-1 see derivation on wiki, ,,,, Ribosome_Binding_F, Expression,.0022, TODO, Ribosome_Binding_R, Expression,4, TODO, ,,,, NTP_Forward_1,Expression,.0001, !TODO: document, NTP_Reverse_1,Expression,10, !TODO: document , +NTP_Forward_2,Expression,0.00001, !TODO: document, NTP_Reverse_2,Expression,10, !TODO: document, ,,,, RNAPbound_termination_rate, Numeric,.150, !TODO: document, +Ribobound_termination_rate, Numeric,40, !TODO: document, +,,,, RecBCD_ic, Numeric,5, !TODO: carry out MCMC and check lit, +RNAP_ic, Numeric,100, !TODO: carry out MCMC and check lit, +Ribo_ic, Numeric,30, !TODO: carry out MCMC and check lit, +RNase_ic, Numeric,100, !TODO: carry out MCMC and check lit, +,,,, +ATP_degradation_rate, Numeric,0.0002, !TODO: document, +ATP_degradation_start_time, Numeric,7200, in seconds !TODO: document, +ATP_regeneration_rate, Numeric,0.02, !TODO: 
document, +,,,, AGTP_Concentration, Expression,3000000, ATP and GTP are at 1.5mM each (JoVE), CUTP_Concentration, Expression,1800000, CTP and UTP are at 0.9mM each (JovE), AA_Concentration, Expression,30000000, 20 tyes of AAs, $20% usability$ 30mM JoVE \ No newline at end of file diff --git a/core/txtl_addreaction.m b/core/txtl_addreaction.m index 881d569..6265416 100644 --- a/core/txtl_addreaction.m +++ b/core/txtl_addreaction.m @@ -27,7 +27,6 @@ function txtl_addreaction(tube,reactionEq,kineticLaw,parameters,varargin) reactionEq = strjoin(mergeStr,''); end - end %%% Vesicule mode %%% diff --git a/core/txtl_enzyme_resource_degradation.m b/core/txtl_enzyme_resource_degradation.m index 9c8b88a..366172d 100644 --- a/core/txtl_enzyme_resource_degradation.m +++ b/core/txtl_enzyme_resource_degradation.m @@ -1,15 +1,59 @@ function txtl_enzyme_resource_degradation(modelObj) +% -atp_deg_rate = modelObj.UserData.ReactionConfig.ATP_degradation_rate; -atp_deg_time = modelObj.UserData.ReactionConfig.ATP_degradation_start_time; -% After some time, ATP regeneration stops, leading to an overall decrease in -% ATP concentrations. c.f. V Noireaux 2003. 
-parameterObj = addparameter(modelObj, 'AGTPdeg_F', 0, 'ConstantValue', false); -evt2 = addevent(modelObj, ['time <= ' num2str(atp_deg_time)] , 'AGTPdeg_F = 0'); -evt3 = addevent(modelObj, ['time > ' num2str(atp_deg_time)], ['AGTPdeg_F = ' num2str(atp_deg_rate)]); -reactionObj = addreaction(modelObj,'AGTP -> AGTP_USED'); -kineticlawObj = addkineticlaw(reactionObj, 'MassAction'); -set(kineticlawObj, 'ParameterVariableName', 'AGTPdeg_F'); +if isfield(modelObj.UserData, 'energymode') && strcmp(modelObj.UserData.energymode, 'regeneration') + atp_deg_rate = modelObj.UserData.ReactionConfig.ATP_degradation_rate; + atp_regen_time = modelObj.UserData.ReactionConfig.ATP_degradation_start_time; + atp_reg_rate = modelObj.UserData.ReactionConfig.ATP_regeneration_rate; + % After some time, ATP regeneration stops, leading to an overall decrease in + % ATP concentrations. c.f. V Noireaux 2003 + parameterObj = addparameter(modelObj, 'AGTPreg_varying', atp_reg_rate, 'ConstantValue', false); + + % time of the regenration system turn off + parameterObj = addparameter(modelObj, 'AGTPdeg_time', atp_regen_time, 'ConstantValue', true); + + parameterObj = addparameter(modelObj, 'AGTPreg_ON', atp_reg_rate, 'ConstantValue', true); + parameterObj = addparameter(modelObj, 'AGTPdeg_rate', atp_deg_rate, 'ConstantValue', true); + + + evt2 = addevent(modelObj, 'time <= AGTPdeg_time' , 'AGTPreg_varying = AGTPreg_ON'); + + evt3 = addevent(modelObj, 'time > AGTPdeg_time',... + 'AGTPreg_varying = 0'); + + % constantly active first order degradation. 
+ reactionObj = addreaction(modelObj,'AGTP -> AGMP'); + kineticlawObj = addkineticlaw(reactionObj, 'MassAction'); + set(kineticlawObj, 'ParameterVariableName', 'AGTPdeg_rate'); + + % regeneration system + reactionObj = addreaction(modelObj,'AGMP -> AGTP'); + kineticlawObj = addkineticlaw(reactionObj, 'MassAction'); + set(kineticlawObj, 'ParameterVariableName', 'AGTPreg_varying'); + +else + + atp_deg_rate = modelObj.UserData.ReactionConfig.ATP_degradation_rate; + atp_deg_time = modelObj.UserData.ReactionConfig.ATP_degradation_start_time; + + % After some time, ATP regeneration stops, leading to an overall decrease in + % ATP concentrations. c.f. V Noireaux 2003. + parameterObj = addparameter(modelObj, 'AGTPdeg_F', 0, 'ConstantValue', false); + + parameterObj = addparameter(modelObj, 'AGTPdeg_time', atp_deg_time, 'ConstantValue', true); + + parameterObj = addparameter(modelObj, 'AGTPdeg_rate', atp_deg_rate, 'ConstantValue', true); + + evt2 = addevent(modelObj, 'time <= AGTPdeg_time' , 'AGTPdeg_F = 0'); + + evt3 = addevent(modelObj, 'time > AGTPdeg_time',... + ['AGTPdeg_F = AGTPdeg_rate']);% '=' num2str(atp_deg_rate)] + + reactionObj = addreaction(modelObj,'AGTP -> AGTP_USED'); + + kineticlawObj = addkineticlaw(reactionObj, 'MassAction'); + set(kineticlawObj, 'ParameterVariableName', 'AGTPdeg_F'); +end end \ No newline at end of file diff --git a/core/txtl_mrna_degradation.m b/core/txtl_mrna_degradation.m index 3fa4192..beb8898 100644 --- a/core/txtl_mrna_degradation.m +++ b/core/txtl_mrna_degradation.m @@ -33,7 +33,7 @@ function txtl_mrna_degradation(mode, tube, dna, rna, rbs_spec) complexF = degRate/10; complexR = degRate/40; end - + length_over_2 = round(rna.UserData/2); % Setup RNA degradation reactions, by searching for strings with rna in % them, and then degrading those strings. 
listOfSpecies = get(tube.species, 'name'); @@ -57,17 +57,25 @@ function txtl_mrna_degradation(mode, tube, dna, rna, rbs_spec) productspecies = []; for j = 1:length(nonRNAlist) - if strcmp(nonRNAlist{j}, '2AGTP') - nonRNAlist{j} = '2 AGTP'; - elseif strcmp(nonRNAlist{j}, 'term_Ribo') + % if in the complex we have term_Ribo, then we need to + % return Ribo, not create nonsensical species term_Ribo. + if strcmp(nonRNAlist{j}, 'term_Ribo') nonRNAlist{j} = 'Ribo'; end - productspecies = [productspecies ' + ' nonRNAlist{j}]; end - txtl_addreaction(tube,[RNAcomplexes{i} ':RNase -> RNase' productspecies],... - 'MassAction',{'TXTL_RNAdeg_F',degRate}); + if isfield(tube.UserData, 'energymode') && strcmp(tube.UserData.energymode, 'regeneration') + + txtl_addreaction(tube,[RNAcomplexes{i} ':RNase -> RNase + '... + num2str(length_over_2) ' AGMP + ' num2str(length_over_2) ' CUMP '... + productspecies],... + 'MassAction',{'TXTL_RNAdeg_kc',degRate}); + + else + txtl_addreaction(tube,[RNAcomplexes{i} ':RNase -> RNase ' productspecies],... + 'MassAction',{'TXTL_RNAdeg_kc',degRate}); + end end end diff --git a/core/txtl_plot_gui.fig b/core/txtl_plot_gui.fig deleted file mode 100644 index 56dd2a5..0000000 Binary files a/core/txtl_plot_gui.fig and /dev/null differ diff --git a/core/txtl_reaction_config.m b/core/txtl_reaction_config.m index 7ec24d7..7f45cd4 100644 --- a/core/txtl_reaction_config.m +++ b/core/txtl_reaction_config.m @@ -40,6 +40,7 @@ Ribo_ic; RNase_ic; ATP_degradation_rate; + ATP_regeneration_rate; ATP_degradation_start_time end diff --git a/core/txtl_runsim.m b/core/txtl_runsim.m index e38cfd1..095f7a8 100644 --- a/core/txtl_runsim.m +++ b/core/txtl_runsim.m @@ -1,6 +1,6 @@ -% Written by Zoltan A Tuza, Sep 2012 +% Written by Zoltan A Tuza and Vipul Singhal, Sep 2012 % % Copyright (c) 2012 by California Institute of Technology % All rights reserved. 
diff --git a/core/txtl_transcription.m b/core/txtl_transcription.m index 229172d..fce7347 100644 --- a/core/txtl_transcription.m +++ b/core/txtl_transcription.m @@ -7,7 +7,7 @@ % Written by Richard Murray, 9 Sep 2012 % Edited by Vipul Singhal, 2012 - 2017 -% +% % Copyright (c) 2012 by California Institute of Technology % All rights reserved. % @@ -65,66 +65,144 @@ function txtl_transcription(mode, varargin) % calculate the transcription rate from information in the config file % and the length of the gene to be transcribed - ktxExpression = strrep(tube.Userdata.ReactionConfig.Transcription_Rate,... - 'RNA_Length','rna.UserData'); - ktx = eval(ktxExpression); %kt/rna_length = 1.5(ntps^-1) / rnalength(ntp) - - % compute the consumption reaction rate - ntpcnt = round(rna.UserData/2); - % ntpcnt = rna.length/2. That way we can keep AGTP - %= atp + gtp. - NTPConsumptionRate = {'TXTL_NTP_consumption',(ntpcnt-1)*ktx}; - - % write down the string for the transcription equation - RNAPbound_term = ['term_' RNAPbound]; - transcriptionEq = ... - ['[CUTP:AGTP:' RNAPbound '] -> ' RNAPbound_term ' + ' rna.Name]; - % add the consumption and termination reactions. + % add global elongation parameter, + txglob = sbioselect(tube, 'Name', 'TX_elong_glob', 'Type', 'Parameter'); + if isempty(txglob) + addparameter(tube, 'TX_elong_glob',tube.Userdata.ReactionConfig.Transcription_Rate); + end + + + % then add the dependent parameters for both the tx and the consumption + % reactions, also at the global scope, and tie the three parameters via + % a couple of initial assignment rules. 
+ + % start with grabbing the name strings for the RNA + temp = regexp(rna.name, 'RNA (\w*)--(\w*)', 'tokens'); + rnaspec = [temp{1}{1} '_' temp{1}{2}]; + + % use this sting to name the transcription and consumption reactions + txparamname = ['TX_transcription_' rnaspec]; + ntpconsname = ['TX_NTPcons_' rnaspec]; + + % get the RNA length, which decides what the actual mRNA production + % rate is + RNAlength = rna.UserData; + % add the transcription parameter in the model scope, with the length + % adjusted value. + addparameter(tube, txparamname,tube.Userdata.ReactionConfig.Transcription_Rate/RNAlength); + + % compute the consumption reaction rate as follows + % ntpcnt = rna.length/4. + ntpcnt = round(RNAlength/4); % + % add the ntp consumption parameter in the global scope. + addparameter(tube, ntpconsname,tube.Userdata.ReactionConfig.Transcription_Rate/RNAlength*(ntpcnt-1)); + + % how to think about this: 1 nM of AGTP is 0.5nM of ATP, 0.5nM of GTP. + % Since the reaction rnap:dna:agtp:cutp -> rnap:dna_term + mrna uses + % 2nM of NTPs to make 1nM of mRNA, if the mrna is 1000ntp long, then we + % still need to consume 998nM of ntp. The rate at which the above + % reaction took place is kt/1000. + % + % so now we consider the ntp consumption reaction. + % rnap:dna:agtp:cutp -> rnap:dna + % each time this reaction fires, it uses 2nM of ntp. so it needs to + % fire 998/2 = 499 times for each time the earlier reaction fires. Now + % 499 is 1000/2 - 1, thus we define ntpcnt = rnalength/2, and the + % reaction of the consumption reaction is ktx/rnalength*(ntpcnt-1) + + % now we actually add the rule that sets the transcription rate and the + % ntp consumption rate. + ruleStr = [txparamname ' = TX_elong_glob/' num2str(RNAlength)]; + if isempty(sbioselect(tube,'Type','Rule', 'Rule', ruleStr)) + addrule(tube, ruleStr, 'initialAssignment'); + end + + ruleStr = [ntpconsname ... 
+ ' = TX_elong_glob/' num2str(RNAlength) '*(' num2str(ntpcnt) '-1)']; + if isempty(sbioselect(tube,'Type','Rule', 'Rule', ruleStr)) + addrule(tube, ruleStr, 'initialAssignment'); + end + + % write down the string for the transcription equation depending on the + % mode of transcription. ie, if we have the energy regeneration mode, + % then we model AGMP, otherwise we do the usual... except, with and + % without energy mode ends up being exactly the same for + % teranscription. ... + if isfield(tube.UserData, 'energymode') && strcmp(tube.UserData.energymode, 'regeneration') + + RNAPbound_term = ['term_' RNAPbound]; + transcriptionEq = ... + ['[CUTP:AGTP:' RNAPbound '] -> ' RNAPbound_term ' + '... + rna.Name]; + + else + RNAPbound_term = ['term_' RNAPbound]; + transcriptionEq = ... + ['[CUTP:AGTP:' RNAPbound '] -> ' RNAPbound_term ' + ' rna.Name]; + + end + + % add the actual transcription reaction. Note that we use addreaction ( + % a simbiology function) as opposed to the txtl toolbox wrapper + % txtl_addreaction, because we want to specify a model scoped parameter + % as a parmeter, and not create a reaction scoped parameter. + reactionObj = addreaction(tube,transcriptionEq); + addkineticlaw (reactionObj, 'MassAction'); + reactionObj.KineticLaw.ParameterVariableNames = txparamname; + + + % add the consumption reactions + reactionObj = addreaction(tube,['[CUTP:AGTP:' RNAPbound '] -> ' RNAPbound]); + addkineticlaw (reactionObj, 'MassAction'); + reactionObj.KineticLaw.ParameterVariableNames = ntpconsname; + + + + % add the termination reactions if nargin < 6 error('the number of argument should be at least 6, not %d',nargin); elseif nargin > 6 extraSpecies = varargin{6}; - % processing the extraSpecies + % processing the extraSpecies, like activators, inducers etc. 
extraStr = extraSpecies{1}; for k=2:size(extraSpecies,2) extraStr = [extraStr ' + ' extraSpecies{k}]; end - % consumption reaction in the extra species case - txtl_addreaction(tube,['[CUTP:AGTP:' RNAPbound '] -> ' RNAPbound],... - 'MassAction',NTPConsumptionRate); - - txtl_addreaction(tube,['[' RNAPbound_term '] -> ' RNAP ' + ' dna.Name ' + ' extraStr],... - 'MassAction',{'TXTL_RNAPBOUND_TERMINATION_RATE', tube.UserData.ReactionConfig.RNAPbound_termination_rate}); + + % the termination reaction parameter is reaction scoped, and can be + % globalized with the globalize_params function. + txtl_addreaction(tube,['[' RNAPbound_term '] -> '... + RNAP ' + ' dna.Name ' + ' extraStr],... + 'MassAction',... + {'TXTL_RNAPBOUND_TERMINATION_RATE', ... + tube.UserData.ReactionConfig.RNAPbound_termination_rate}); else - % consumption reaction - txtl_addreaction(tube,['[CUTP:AGTP:' RNAPbound '] -> ' RNAPbound],... - 'MassAction',NTPConsumptionRate); - %termination reaction txtl_addreaction(tube,['[' RNAPbound_term '] -> ' RNAP ' + ' dna.Name],... - 'MassAction',{'TXTL_RNAPBOUND_TERMINATION_RATE', tube.UserData.ReactionConfig.RNAPbound_termination_rate}); + 'MassAction',... + {'TXTL_RNAPBOUND_TERMINATION_RATE', ... 
+ tube.UserData.ReactionConfig.RNAPbound_termination_rate}); end % define the nucleotide binding parameters - NTPparameters = {'TXTL_NTP_RNAP_F', tube.UserData.ReactionConfig.NTP_Forward_1; - 'TXTL_NTP_RNAP_R', tube.UserData.ReactionConfig.NTP_Reverse_1}; - NTPparameters_fast = {'TXTL_NTP_RNAP_F', tube.UserData.ReactionConfig.NTP_Forward_2; - 'TXTL_NTP_RNAP_R', tube.UserData.ReactionConfig.NTP_Reverse_2}; + NTPparameters_step1 = {'TXTL_NTP_RNAP_1_F', tube.UserData.ReactionConfig.NTP_Forward_1; + 'TXTL_NTP_RNAP_1_R', tube.UserData.ReactionConfig.NTP_Reverse_1}; + NTPparameters_step2 = {'TXTL_NTP_RNAP_2_F', tube.UserData.ReactionConfig.NTP_Forward_2; + 'TXTL_NTP_RNAP_2_R', tube.UserData.ReactionConfig.NTP_Reverse_2}; % add the nucleotide binding reaction txtl_addreaction(tube,['[' RNAPbound '] + AGTP <-> [AGTP:' RNAPbound ']'],... - 'MassAction',NTPparameters_fast); + 'MassAction',NTPparameters_step1); txtl_addreaction(tube,['[' RNAPbound '] + CUTP <-> [CUTP:' RNAPbound ']'],... - 'MassAction',NTPparameters_fast); + 'MassAction',NTPparameters_step1); txtl_addreaction(tube,['[AGTP:' RNAPbound '] + CUTP <-> [CUTP:AGTP:' RNAPbound ']'],... - 'MassAction',NTPparameters); + 'MassAction',NTPparameters_step2); txtl_addreaction(tube,['[CUTP:' RNAPbound '] + AGTP <-> [CUTP:AGTP:' RNAPbound ']'],... - 'MassAction',NTPparameters); + 'MassAction',NTPparameters_step2); + - % add the actual transcription reaction - txtl_addreaction(tube,transcriptionEq,'MassAction',{'TXTL_transcription_rate1',ktx}); %%%%%%%%%%%%%%%%%%% DRIVER MODE: error handling %%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/core/txtl_translation.m b/core/txtl_translation.m index 3c3a4a6..6373f6c 100644 --- a/core/txtl_translation.m +++ b/core/txtl_translation.m @@ -4,17 +4,17 @@ % Redistribution and use in source and binary forms, with or without % modification, are permitted provided that the following conditions are % met: -% +% % 1. 
Redistributions of source code must retain the above copyright % notice, this list of conditions and the following disclaimer. % -% 2. Redistributions in binary form must reproduce the above copyright -% notice, this list of conditions and the following disclaimer in the +% 2. Redistributions in binary form must reproduce the above copyright +% notice, this list of conditions and the following disclaimer in the % documentation and/or other materials provided with the distribution. % -% 3. The name of the author may not be used to endorse or promote products +% 3. The name of the author may not be used to endorse or promote products % derived from this software without specific prior written permission. -% +% % THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR % IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED % WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE @@ -35,80 +35,116 @@ function txtl_translation(mode, tube, dna, rna, protein, Ribobound) %%%%%%%%%%%%%%%%%%% DRIVER MODE: Setup Species %%%%%%%%%%%%%%%%%%%%%%%%%%%% if strcmp(mode.add_dna_driver, 'Setup Species') - % Set up the species for translation - Ribobound_term = ['term_' Ribobound.Name ]; + % Set up the species for translation + Ribobound_term = ['term_' Ribobound.Name ]; coreSpecies = {'AA',['AA:2AGTP:' Ribobound.Name],Ribobound_term, 'Ribo'}; % empty cellarray for amount => zero amount txtl_addspecies(tube, coreSpecies, cell(1,size(coreSpecies,2)), 'Internal'); -%%%%%%%%%%%%%%%%%%% DRIVER MODE: Setup Reactions %%%%%%%%%%%%%%%%%%%%%%%%%% + %%%%%%%%%%%%%%%%%%% DRIVER MODE: Setup Reactions %%%%%%%%%%%%%%%%%%%%%%%%%% elseif strcmp(mode.add_dna_driver, 'Setup Reactions') + %% Resource binding AAparameters = {'TL_AA_F',tube.UserData.ReactionConfig.TL_AA_Forward; - 'TL_AA_R',tube.UserData.ReactionConfig.TL_AA_Reverse}; + 'TL_AA_R',tube.UserData.ReactionConfig.TL_AA_Reverse}; AGTPparameters = {'TL_AGTP_F',tube.UserData.ReactionConfig.TL_AGTP_Forward; - 
'TL_AGTP_R',tube.UserData.ReactionConfig.TL_AGTP_Reverse}; - - % translation rate - ktlExpression = strrep(tube.UserData.ReactionConfig.Translation_Rate,... - 'Protein_Length','protein.UserData'); - ktl_rbs = eval(ktlExpression); - - % define termination complex. - Ribobound_term = ['term_' Ribobound.Name ]; + 'TL_AGTP_R',tube.UserData.ReactionConfig.TL_AGTP_Reverse}; + + % resource binding + txtl_addreaction(tube, ... + ['[' Ribobound.Name '] + AA <-> [AA:' Ribobound.Name ']'],... + 'MassAction',AAparameters); + txtl_addreaction(tube, ... + ['[AA:' Ribobound.Name '] + AGTP <-> [AA:AGTP:' Ribobound.Name ']'],... + 'MassAction',AGTPparameters); + + + %% create the rules for the global parameters for the TL reaction and the consumption reaction + + % add global elongation parameter + tlglob = sbioselect(tube, 'Name', 'TL_elong_glob', 'Type', 'Parameter'); + if isempty(tlglob) + addparameter(tube, 'TL_elong_glob',tube.UserData.ReactionConfig.Translation_Rate); + end + + % grab the name strings for the protein + temp = regexp(protein.name, 'protein (\w*)', 'tokens'); + protspec = [temp{1}{1}]; + + % use this sting to name the translation and consumption reactions + tlparamname = ['TL_translation_' protspec]; + resourceconsname = ['TL_REScons_' protspec]; + + % get the protein length, which decides what the actual protein production + % rate is + proteinlength = round(protein.UserData); % this is in amino acids not nucleotides. + + % add the transcription parameter in the model scope, with the length + % adjusted value. The value specified here actually does not matter + % because we will use a rule to set it. + addparameter(tube, tlparamname,0); - % AA consumption models - if tube.UserData.ReactionConfig.AAmodel == 1 - % multimolecular binding, bad idea - aacnt = floor(protein.UserData/100); % get number of K amino acids - if (aacnt == 0) - aastr = ''; - else - aastr = int2str(aacnt); - end + % add the aa consumption parameter in the global scope. 
the value it is + % initialized to does not matter, since a rule will set it. + addparameter(tube, resourceconsname, 0); + + % now we actually add the rule that sets the translation rate and the + % aa consumption rate. + ruleStr = [tlparamname ' = TL_elong_glob/' num2str(proteinlength)]; + if isempty(sbioselect(tube,'Type','Rule', 'Rule', ruleStr)) + addrule(tube, ruleStr, 'initialAssignment'); + end + + % do the same for the consumption reactions + ruleStr = [resourceconsname ... + ' = TL_elong_glob/' num2str(proteinlength) '*(' num2str(proteinlength) '-1)']; + if isempty(sbioselect(tube,'Type','Rule', 'Rule', ruleStr)) + addrule(tube, ruleStr, 'initialAssignment'); + end + + + Ribobound_term = ['term_' Ribobound.Name ]; + % add the reaction + if isfield(tube.UserData, 'energymode') && strcmp(tube.UserData.energymode, 'regeneration') + + reactionObj = addreaction(tube, ... + ['[AA:AGTP:' Ribobound.Name '] -> ' Ribobound_term ' + ' protein.Name ' + AGMP' ]); + addkineticlaw (reactionObj, 'MassAction'); + reactionObj.KineticLaw.ParameterVariableNames = tlparamname; + + % add the consumption reactions. + reactionObj = addreaction(tube, ... + ['[AA:AGTP:' Ribobound.Name '] -> ' Ribobound_term ' + AGMP']); + addkineticlaw (reactionObj, 'MassAction'); + reactionObj.KineticLaw.ParameterVariableNames = resourceconsname; + - txtl_addreaction(tube,... - ['[' Ribobound.Name '] + ' aastr ' AA <-> [AA:' Ribobound.Name ']'],... - 'MassAction',AAparameters); else - % consumption reaction usage, a much better method. - % resource binding - txtl_addreaction(tube, ... - ['[' Ribobound.Name '] + AA <-> [AA:' Ribobound.Name ']'],... - 'MassAction',AAparameters); - txtl_addreaction(tube, ... - ['[AA:' Ribobound.Name '] + 2 AGTP <-> [AA:2AGTP:' Ribobound.Name ']'],... - 'MassAction',AGTPparameters); + reactionObj = addreaction(tube, ... 
+ ['[AA:AGTP:' Ribobound.Name '] -> ' Ribobound_term ' + ' protein.Name ]); + addkineticlaw (reactionObj, 'MassAction'); + reactionObj.KineticLaw.ParameterVariableNames = tlparamname; - % consumption reaction - aacnt = floor(protein.UserData/100); - aa_consump_rate = (aacnt-1)*ktl_rbs; - txtl_addreaction(tube, ... - ['[AA:2AGTP:' Ribobound.Name '] -> ' Ribobound_term],... - 'MassAction',{'TXTL_TL_AA_consumption',aa_consump_rate}); + % add the consumption reactions. + reactionObj = addreaction(tube, ... + ['[AA:AGTP:' Ribobound.Name '] -> ' Ribobound_term]); + addkineticlaw (reactionObj, 'MassAction'); + reactionObj.KineticLaw.ParameterVariableNames = resourceconsname; end - % Translation (creation of protein and termination complex) - txtl_addreaction(tube, ... - ['[AA:2AGTP:' Ribobound.Name '] -> ' Ribobound_term ' + ' protein.Name ],... - 'MassAction',{'TXTL_TL_rate',ktl_rbs}); - % translation termination reaction - txtl_addreaction(tube,['[' Ribobound_term '] -> ' rna.Name ' + Ribo'],... - 'MassAction',{'TXTL_RIBOBOUND_TERMINATION_RATE', tube.UserData.ReactionConfig.Ribobound_termination_rate}); - % !TODO add these parameters to the config files and the parameter class + %%%%% - % old translation -% txtl_addreaction(tube, ... -% ['[AA:AGTP:' Ribobound.Name '] -> ' rna.Name ' + ' protein.Name ' + Ribo'],... -% 'MassAction',{'TXTL_TL_rate',ktl_rbs}); -%%%%%%%%%%%%%%%%%%% DRIVER MODE: error handling %%%%%%%%%%%%%%%%%%%%%%%%%%% + % translation termination reaction + txtl_addreaction(tube,['[' Ribobound_term '] -> ' rna.Name ' + Ribo'],... + 'MassAction',{'TXTL_RIBOBOUND_TERMINATION_RATE', tube.UserData.ReactionConfig.Ribobound_termination_rate}); + %%%%%%%%%%%%%%%%%%% DRIVER MODE: error handling %%%%%%%%%%%%%%%%%%%%%%%%%%% else error('txtltoolbox:txtl_translation:undefinedmode', ... 
- 'The possible modes are ''Setup Species'' and ''Setup Reactions''.'); -end - + 'The possible modes are ''Setup Species'' and ''Setup Reactions''.'); +end + end \ No newline at end of file diff --git a/cov_extract2.jpg b/cov_extract2.jpg new file mode 100644 index 0000000..e4af601 Binary files /dev/null and b/cov_extract2.jpg differ diff --git a/doc/extractTODOlist b/doc/extractTODOlist deleted file mode 100755 index f6c1b10..0000000 --- a/doc/extractTODOlist +++ /dev/null @@ -1,53 +0,0 @@ -#!/bin/sh -# m http://www.linuxweblog.com/bash-argument-numbers-check -EXPECTED_ARGS=1 -E_BADARGS=65 -if [ $# -gt $EXPECTED_ARGS ] -then - echo "Usage: ./extract [starting_directory]" >&2 - exit $E_BADARGS -fi - -# By default, start in the current working directory, but if they provide -# an argument, use that instead. -if [ $# -eq $EXPECTED_ARGS ] -then - startingDir=$1 -else - startingDir="." -fi - -# Start creating the HTML document -echo "" -echo "" -echo "" - -# The output of the find command will look like -# ./Telephone.java:20: // todo: Document - -find $startingDir -name "*.m" -exec grep -Hin TODO {} + | -# Allows the script to read in piped in arguments -while read data; do - - # The location of the file is the first argument - fileLoc=`echo "$data" | cut -d ":" -f 1` - fileName=`basename $fileLoc` - - # the line number is the second - lineNumber=`echo "$data" | cut -d ":" -f 2` - - # all arguments after the second colon are the comment. Eliminate the TODO - # text with a simple find and replace. - # Note: only handles todo and TODO, would need some more logic to handle other cases - comment=`echo "$data" | cut -d ":" -f 3- | sed -e 's/^[ ]*//' -e 's/[\/*]*[ ]*//' -e 's/TODO/todo/' -e 's/todo[:]*[ ]*//'` - echo "" - echo " " - echo " " - echo "" -done - -# Finish off the HTML document -echo "
LocationComment
$fileName ($lineNumber)$comment
" -echo "" - -exit 0 diff --git a/doc/txtl_template.m b/doc/txtl_template.m deleted file mode 100644 index 909085c..0000000 --- a/doc/txtl_template.m +++ /dev/null @@ -1,44 +0,0 @@ -% txtl_runsim_events.m - template file for MATLAB functions - -% -% This file is a template that includes the boilerplate for creating a -% MATLAB function with all of the BSD licensing information at the top. - - -% -% Copyright (c) 2012 by California Institute of Technology -% All rights reserved. -% -% Redistribution and use in source and binary forms, with or without -% modification, are permitted provided that the following conditions are -% met: -% -% 1. Redistributions of source code must retain the above copyright -% notice, this list of conditions and the following disclaimer. -% -% 2. Redistributions in binary form must reproduce the above copyright -% notice, this list of conditions and the following disclaimer in the -% documentation and/or other materials provided with the distribution. -% -% 3. The name of the author may not be used to endorse or promote products -% derived from this software without specific prior written permission. -% -% THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR -% IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED -% WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -% DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, -% INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES -% (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -% SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) -% HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, -% STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING -% IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -% POSSIBILITY OF SUCH DAMAGE. 
- - - - -% Automatically use MATLAB mode in Emacs (keep at end of file) -% Local variables: -% mode: matlab -% End: diff --git a/examples/geneexpr.m b/examples/geneexpr.m index 609bf46..c038cd9 100644 --- a/examples/geneexpr.m +++ b/examples/geneexpr.m @@ -14,10 +14,7 @@ tube3 = txtl_newtube('gene_expression'); % Define the DNA strands (defines TX-TL species + reactions) -dna_deGFP = txtl_add_dna(tube3, ... - 'p70(50)', 'utr1(20)', 'deGFP(1000)', ... % promoter, rbs, gene - 30, ... % concentration (nM) - 'plasmid'); % type +dna_deGFP = txtl_add_dna(tube3, 'p70(50)', 'utr1(20)', 'deGFP(1000)', 30, 'plasmid'); % type % Mix the contents of the individual tubes Mobj = txtl_combine([tube1, tube2, tube3]); diff --git a/examples/geneexpr_test_regen_mode.m b/examples/geneexpr_test_regen_mode.m new file mode 100644 index 0000000..2e90577 --- /dev/null +++ b/examples/geneexpr_test_regen_mode.m @@ -0,0 +1,42 @@ +% geneexpr.m - basic gene expression reaction +% R. M. Murray, 9 Sep 2012 +% +% This file contains a simple example of setting up a TXTL simulation +% for gene expression using the standard TXTL control plasmid. +% +% Set up the standard TXTL tubes +% These load up the RNAP, Ribosome and degradation enzyme concentrations +close all +clear all + +tube1 = txtl_extract('Emcmc2018'); +tube2 = txtl_buffer('Emcmc2018'); +% Now set up a tube that will contain our DNA +tube3 = txtl_newtube('gene_expression'); +% Define the DNA strands (defines TX-TL species + reactions) +dna_deGFP = txtl_add_dna(tube3, 'p70(50)', 'utr1(20)', 'deGFP(1000)', 30,... + 'plasmid'); % type +% Mix the contents of the individual tubes +Mobj = txtl_combine([tube1, tube2, tube3]); +% create a regeneration mode field +Mobj.UserData.energymode = 'regeneration'; +% +% Run a simulaton +% +% At this point, the entire experiment is set up and loaded into 'Mobj'. +% So now we just use standard Simbiology and MATLAB commands to run +% and plot our results! 
+% +% tau = sbioselect(Mobj, 'Name', 'AGTPdeg_time', 'Type', 'Parameter') ; +% tau.Value = 3600*4; +% Mobj.UserData.energymode = 'regeneration'; +tic +[simData] = txtl_runsim(Mobj,14*60*60); +toc +% plot the result +txtl_plot(simData,Mobj); + +% Automatically use matlab mode in emacs (keep at end of file) +% Local variables: +% mode: matlab +% End: diff --git a/html/txtl_tutorial.html b/html/txtl_tutorial.html index 21f0cf2..61af71c 100644 --- a/html/txtl_tutorial.html +++ b/html/txtl_tutorial.html @@ -6,7 +6,7 @@ TXTL Tutorial

TXTL Tutorial

txtl_tutorial.m - basic usage of the TXTL modeling toolbox Vipul Singhal, 28 July 2017

This file contains a simple tutorial of the TXTL modeling toolbox. You will learn about setting up a negative autoregulation circuit, simulating it, plotting the results, creating variations of the circuit, and understanding the object structure of the models.

Contents

Initializing the toolbox

Run this command to add the required subdirectories to your MATLAB path. Run it each time you begin a new TXTL toolbox session.

txtl_init;
-

Negative Autoregulation - A simple example

Here we demonstrate the setup of a genetic circuit where a transcription factor represses its own expression.

Set up the standard TXTL tubes. These load the RNAP, ribosome and degradation enzyme concentrations. ``E30VNPRL'' refers to a configuration file.

tube1 = txtl_extract('E30VNPRL');
+  

TXTL Tutorial

txtl_tutorial.m - basic usage of the TXTL modeling toolbox

Vipul Singhal, 28 July 2017

This file contains a simple tutorial of the TXTL modeling toolbox. You will learn about setting up a negative autoregulation circuit, simulating it, plotting the results, creating variations of the circuit, and understanding the object structure of the models.

Contents

Initializing the toolbox

Run this command to add the required subdirectories to your MATLAB path. Run it each time you begin a new TXTL toolbox session.

txtl_init;
+

Negative Autoregulation - A simple example

Here we demonstrate the setup of a genetic circuit where a transcription factor represses its own expression.

Set up the standard TXTL tubes. These load the RNAP, ribosome and degradation enzyme concentrations. ``E30VNPRL'' refers to a configuration file.

tube1 = txtl_extract('E30VNPRL');
 tube2 = txtl_buffer('E30VNPRL');
 
 % Now set up a tube that will contain our DNA
 tube3 = txtl_newtube('gene_expression');
 
 % Define the DNA strands, and all the relevant reactions
-txtl_add_dna(tube3, 'ptet(50)', 'rbs(20)', 'tetR(1200)', 1, 'plasmid');
-txtl_add_dna(tube3, 'ptet(50)', 'rbs(20)', 'deGFP(1000)', 1, 'plasmid');
+txtl_add_dna(tube3, 'ptet(50)', 'utr1(20)', 'tetR(1200)', 1, 'plasmid');
+txtl_add_dna(tube3, 'ptet(50)', 'utr1(20)', 'deGFP(1000)', 1, 'plasmid');
 
 % Mix the contents of the individual tubes
 Mobj = txtl_combine([tube1, tube2, tube3]);
@@ -93,90 +93,28 @@
 toc
 t_ode = simData.Time;
 x_ode = simData.Data;
-
Elapsed time is 1.134961 seconds.
-

plot the result

The following function plots the proteins, RNA and resources in the toolbox. In the next section we delve deeper into the object oriented structure of the model, and how to plot arbitrary species in the model.

txtl_plot(simData,Mobj);
-
Current plot held
-

Model Structure

The model is organized as a model object, with sub objects specifying Parameters, Reactions, Species, etc. Type in

Mobj
-
-   SimBiology Model - mix_of_E30VNPRL_E30VNPRL_gene_expression 
-
-   Model Components:
-     Compartments:      1
-     Events:            2
-     Parameters:        73
-     Reactions:         47
-     Rules:             0
-     Species:           43
+
Error using SimBiology.Model/addparameter
+NAME TX_elong_glob is being used by another SimBiology parameter object. Specify a different NAME. Type 'help SimBiology.Model.addparameter' for more information.
 
-

There is one compartment, 2 events, 73 parameters, 47 reactions, no rules and 43 species in the model. We can explore further by typing, for example,

Mobj.Species
-
-   SimBiology Species Array
-
-   Index:    Compartment:    Name:                                     InitialAmount:    InitialAmountUnits:
-   1         contents        RNAP                                      100               
-   2         contents        protein sigma70                           35                
-   3         contents        protein sigma28                           20                
-   4         contents        Ribo                                      50                
-   5         contents        RNAP70                                    0                 
-   6         contents        RNase                                     100               
-   7         contents        AGTP                                      3.18005e+06       
-   8         contents        CUTP                                      1.90803e+06       
-   9         contents        AA                                        3.18005e+07       
-   10        contents        protein tetR                              0                 
-   11        contents        aTc                                       0                 
-   12        contents        protein tetRdimer                         0                 
-   13        contents        RNA rbs--tetR                             0                 
-   14        contents        Ribo:RNA rbs--tetR                        0                 
-   15        contents        DNA ptet--rbs--tetR                       1                 
-   16        contents        RNAP70:DNA ptet--rbs--tetR                0                 
-   17        contents        CUTP:AGTP:RNAP70:DNA ptet--rbs--tetR      0                 
-   18        contents        term_RNAP70:DNA ptet--rbs--tetR           0                 
-   19        contents        AA:AGTP:Ribo:RNA rbs--tetR                0                 
-   20        contents        protein deGFP                             0                 
-   21        contents        protein deGFP*                            0                 
-   22        contents        RNA rbs--deGFP                            0                 
-   23        contents        Ribo:RNA rbs--deGFP                       0                 
-   24        contents        DNA ptet--rbs--deGFP                      1                 
-   25        contents        RNAP70:DNA ptet--rbs--deGFP               0                 
-   26        contents        CUTP:AGTP:RNAP70:DNA ptet--rbs--deGFP     0                 
-   27        contents        term_RNAP70:DNA ptet--rbs--deGFP          0                 
-   28        contents        AA:AGTP:Ribo:RNA rbs--deGFP               0                 
-   29        contents        RNAP28                                    0                 
-   30        contents        2 aTc:protein tetRdimer                   0                 
-   31        contents        AGTP:RNAP70:DNA ptet--rbs--tetR           0                 
-   32        contents        CUTP:RNAP70:DNA ptet--rbs--tetR           0                 
-   33        contents        DNA ptet--rbs--tetR:protein tetRdimer     0                 
-   34        contents        RNA rbs--tetR:RNase                       0                 
-   35        contents        AA:AGTP:Ribo:RNA rbs--tetR:RNase          0                 
-   36        contents        Ribo:RNA rbs--tetR:RNase                  0                 
-   37        contents        AGTP:RNAP70:DNA ptet--rbs--deGFP          0                 
-   38        contents        CUTP:RNAP70:DNA ptet--rbs--deGFP          0                 
-   39        contents        DNA ptet--rbs--deGFP:protein tetRdimer    0                 
-   40        contents        RNA rbs--deGFP:RNase                      0                 
-   41        contents        AA:AGTP:Ribo:RNA rbs--deGFP:RNase         0                 
-   42        contents        Ribo:RNA rbs--deGFP:RNase                 0                 
-   43        contents        AGTP_UNUSE                                0                 
+Error in txtl_transcription (line 70)
+    addparameter(tube, 'TX_elong_glob',tube.Userdata.ReactionConfig.Transcription_Rate);
 
-

We see that there are 43 species in the model, and they have somewhat different syntax for specification. Proteins, RNA and DNA generally follow the convention protein CDS, RNA 5'UTR--CDS, DNA promoter--5' UTR--CDS, with variations possible. There are also simply named `core' species like RNAP, Ribo, RNase, etc. Finally we denote bound complexes with a colon, for example, Species 1:Species 2.

We also see that each of them has certain other associated properties. You can explore further by accessing individual species using their index, and using the `get' and `set' commands to get and set the properties of the species. For example, try typing

Mobj.Species(1)
-
-   SimBiology Species Array
+Error in txtl_prom_ptet (line 92)
+    txtl_transcription(mode, tube, dna, rna, RNAP, RNAPbound);
 
-   Index:    Compartment:    Name:    InitialAmount:    InitialAmountUnits:
-   1         contents        RNAP     100               
+Error in txtl_add_dna (line 203)
+        eval(['txtl_prom_' promoterName '(mode, tube, dna, rna, listOfSpecies,prom_spec, utr_spec, cds_spec)']);
 
-

This gives you the first species in the model. You can find out what properties are associated with this species by typing in

get(Mobj.Species(1))
-
            Annotation: ''
-     BoundaryCondition: 0
-        ConstantAmount: 0
-         InitialAmount: 100.0001
-    InitialAmountUnits: ''
-                  Name: 'RNAP'
-                 Notes: ''
-                Parent: [1x1 SimBiology.Compartment]
-                   Tag: ''
-                  Type: 'species'
-              UserData: []
+Error in txtl_runsim (line 161)
+        txtl_add_dna(modelObj, m.DNAinfo{i}{1}, m.DNAinfo{i}{2}, ...
 
+Error in txtl_tutorial (line 45)
+[simData] = txtl_runsim(Mobj,14*60*60);
+

plot the result

The following function plots the proteins, RNA and resources in the toolbox. In the next section we delve deeper into the object oriented structure of the model, and how to plot arbitrary species in the model.

txtl_plot(simData,Mobj);
+

Model Structure

The model is organized as a model object, with sub objects specifying Parameters, Reactions, Species, etc. Type in

Mobj
+

There is one compartment, 2 events, 73 parameters, 47 reactions, no rules and 43 species in the model. We can explore further by typing, for example,

Mobj.Species
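
The component counts quoted above can be read off the model object directly. The following is a minimal sketch on a small stand-in model, not the tutorial's Mobj: the model name `demo`, the parameter `kf` and the Ribo amount are assumptions made for illustration, and SimBiology is assumed to be installed.

```matlab
% Stand-in model for illustration only; the tutorial's Mobj is much larger.
m = sbiomodel('demo');
c = addcompartment(m, 'contents');
addspecies(c, 'RNAP', 100);
addspecies(c, 'Ribo', 50);      % amount chosen arbitrarily for this sketch
addparameter(m, 'kf', 1);       % hypothetical parameter

% Each component type is exposed as an array property of the model object.
nComp  = numel(m.Compartments);
nSpec  = numel(m.Species);
nParam = numel(m.Parameters);
nRxn   = numel(m.Reactions);
fprintf('%d compartment(s), %d species, %d parameter(s), %d reaction(s)\n', ...
    nComp, nSpec, nParam, nRxn);
```

The same `numel` calls applied to Mobj should give the numbers shown in the model summary.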
+

We see that there are 43 species in the model, and they have somewhat different syntax for specification. Proteins, RNA and DNA generally follow the convention protein CDS, RNA 5'UTR--CDS, DNA promoter--5' UTR--CDS, with variations possible. There are also simply named `core' species like RNAP, Ribo, RNase, etc. Finally we denote bound complexes with a colon, for example, Species 1:Species 2.
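
The naming convention above can be illustrated with plain string operations. This is a rough sketch that assumes the prefixes `protein `, `RNA ` and `DNA ` and the colon separator are used exactly as described; the classification labels (`core`, `complex`) are names made up for this example, not part of the toolbox.

```matlab
% Example names following the convention described above.
names = {'RNAP', 'protein tetRdimer', 'DNA p70--utr1--deGFP', 'Ribo:RNA utr1--deGFP'};

kinds = cell(size(names));
for i = 1:numel(names)
    % A colon separates the members of a bound complex.
    members = strsplit(names{i}, ':');
    if numel(members) > 1
        kinds{i} = 'complex';
    elseif strncmp(names{i}, 'protein ', 8)
        kinds{i} = 'protein';
    elseif strncmp(names{i}, 'RNA ', 4)
        kinds{i} = 'RNA';
    elseif strncmp(names{i}, 'DNA ', 4)
        kinds{i} = 'DNA';
    else
        kinds{i} = 'core';   % simply named species such as RNAP, Ribo, RNase
    end
    fprintf('%-24s -> %s\n', names{i}, kinds{i});
end

% DNA parts are joined by a double dash: promoter--5'UTR--CDS.
dnaParts = strsplit('p70--utr1--deGFP', '--');
```

Here `dnaParts` holds the promoter, UTR and coding sequence names as separate strings.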

We also see that each of them has certain other associated properties. You can explore further by accessing individual species using their index, and using the `get' and `set' commands to get and set the properties of the species. For example, try typing

Mobj.Species(1)
+

This gives you the first species in the model. You can find out what properties are associated with this species by typing in

get(Mobj.Species(1))
 

and then using the set command to set its initial concentration to 50 units:

set(Mobj.Species(1), 'InitialAmount', 50)
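
To check that the change took effect, you can read the property back. The sketch below does this on a standalone species rather than on Mobj; the model name `demo` is an assumption for illustration, and SimBiology is assumed to be installed.

```matlab
% Standalone species for illustration; in the tutorial this would be Mobj.Species(1).
m = sbiomodel('demo');
c = addcompartment(m, 'contents');
s = addspecies(c, 'RNAP', 100);

before = get(s, 'InitialAmount');
set(s, 'InitialAmount', 50);
after = get(s, 'InitialAmount');
fprintf('InitialAmount: %g -> %g\n', before, after);

% Dot notation is equivalent to the get/set calls above.
s.InitialAmount = 50;
```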
 

Learn more about the get and set commands by typing in

help get
 help set
@@ -186,7 +124,7 @@
 Mobj.Reactions(1).ReactionRate
 Mobj.Reactions(1).KineticLaw
 get(Mobj.Reactions(1).KineticLaw)
-

and so on.

Plotting individual species

You can also plot the trajectories of any of the species in the model. Use the function findspecies to get the index of the species object of interest. For example, if you want to plot the trajectory of the dimerized tetR protein, you could type in

tetRindex = findspecies(Mobj, 'protein tetRdimer');
+

and so on.

Plotting individual species

You can also plot the trajectories of any of the species in the model. Use the function findspecies to get the index of the species object of interest. For example, if you want to plot the trajectory of the dimerized tetR protein, you could type in

tetRindex = findspecies(Mobj, 'protein tetRdimer');
 figure
 plot(simData.Time/3600, simData.data(:,tetRindex));
 title('Dimerized tetR concentration')
@@ -195,10 +133,11 @@
 curraxis = axis;
 axis([curraxis(1:2) 0 curraxis(4)])
 %
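
As an alternative to looking up a column index with findspecies, a trajectory can be pulled out of the simulation data by name. The sketch below is self-contained and does not use the tutorial's simData: it rebuilds a small constitutive expression model along the lines of model_protein3 from this changeset and assumes the SimData method `selectbyname` is available in your SimBiology version.

```matlab
% Small constitutive expression model, mirroring model_protein3.
mobj = sbiomodel('expression');

r1 = addreaction(mobj, 'dG + pol <-> dG_pol');
k1 = addkineticlaw(r1, 'MassAction');
k1.ParameterVariableNames = {'kfdG', 'krdG'};
addparameter(mobj, 'kfdG', 10);
addparameter(mobj, 'krdG', 600);

r2 = addreaction(mobj, 'dG_pol -> dG + pol + pG');
k2 = addkineticlaw(r2, 'MassAction');
k2.ParameterVariableNames = {'kcp'};
addparameter(mobj, 'kcp', 0.012);

% Initial amounts; species not set here keep their default of 0.
dG = sbioselect(mobj, 'Name', 'dG');
dG.InitialAmount = 30;
pol = sbioselect(mobj, 'Name', 'pol');
pol.InitialAmount = 100;

cs = getconfigset(mobj, 'active');
set(cs, 'StopTime', 2*3600);
sd = sbiosimulate(mobj);

% Select the protein trajectory by species name instead of by column index.
[t, x] = selectbyname(sd, 'pG');
figure
plot(t/3600, x);
xlabel('Time [hours]');
ylabel('pG');
```

Selecting by name avoids relying on the column order of the data matrix, which can change when species are added to the model.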
-

EMACS editor support (ignore)

Automatically use matlab mode in emacs (keep at end of file) Local variables: mode: matlab End:

+% master vector -- paramMaps --> +% full parameter vector for each topo-geom pair -- orderingIx --> +% reordered vector for exported model simulation. +% +% param ranges: reduced param ranges matrix (by sematicGroups) +% compute initial parameter distributons +% then expand in the same way as above. +% once the parameters have been estimated, there is no need to +% reorder them, since the master vector was never reordered. +% can use the master_info.estNames for the names and +% master_info.mastervector(~master_info.fixedParams) for the +% parameter values. +% sematic + + +mcmc_info = struct('runsim_info', runsim_info, ... + 'model_info', model_info,... + 'master_info', master_info); + +end \ No newline at end of file diff --git a/mcmc_simbio/models_and_supporting_files/mcmc_info_vnprl2011_mrna.m b/mcmc_simbio/models_and_supporting_files/mcmc_info_vnprl2011_mrna.m new file mode 100644 index 0000000..af78ea8 --- /dev/null +++ b/mcmc_simbio/models_and_supporting_files/mcmc_info_vnprl2011_mrna.m @@ -0,0 +1,250 @@ +function mcmc_info = mcmc_info_vnprl2011_mrna(modelObj) + % version2 mcmc_info struct. Compatible with multimodel parameter inference. + % This mcmc info has two linked estimation problems: + % 1) transcription estimation + % 2) RNA degradation + % + % There is a second file in this series, mcmc_info_vnprl2011_protein, + % that tries to fit the protein data, with the mRNA parameters fixed to + % a few values found in this estimation. + % + % Finally there is a third file in this series that starts from all the + % parameters estimated in both the first and second estimations, and + % tries to fit all the parameters to all the data simultaneously within + % a relatively narrow range around the pre-fit parameters. 
+ % + % Copyright (c) 2018, Vipul Singhal, Caltech + % Permission is hereby granted, free of charge, to any person obtaining a copy + % of this software and associated documentation files (the "Software"), to deal + % in the Software without restriction, including without limitation the rights + % to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + % copies of the Software, and to permit persons to whom the Software is + % furnished to do so, subject to the following conditions: + + % The above copyright notice and this permission notice shall be included in all + % copies or substantial portions of the Software. + + % THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + % IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + % FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + % AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + % LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + % OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + % SOFTWARE. + + % User readable description of the circuit. Will be used in the log file generated + % from the MCMC inference procedure. + circuitInfo1 = ... + ['This is a simple constitutive gene expression model \n'... + 'built using the TXTL modeling toolbox. It models DNA binding \n'... + 'to RNAP and nucleotides, followed by transcription. The resulting\n'... + 'mRNA can degrade and participate in translation. The former is \n'... + 'modeled as a enzymatic reaction involving every complex containing \n'... + 'mRNA. The latter involves binding to Ribosomes, followed by amino acids \n'... + 'and ATP, and finally elongation and termination resulting in protein.']; + + circuitInfo2 = ... + ['This is simple enzymatic rna degradation. Note that here mRNA can \n'... + 'to ribosomes and other species, but these offer no protection. '... + 'from rna degradation. 
\n ']; + + + + %{ + % activeNames has the mRNA parameters and the protein parameters. + % first half (up to RNase) are TX and the rest are TL. + % TX params are fixed from previous sims. + % + % ordering requirements: + % ensure that the following two orderings match up: + % activeNames(orderingIX) == masterVector(paramMaps(orderingIX)) + % + % ie, activeNames == masterVector(paramMaps) + % + % This gets satisfied when two conditions hold: + % + % The fixed parameters in the master vector must be arranged to that + % for every paramMap and every corresponding activeNames list, the + % fixed params subset of the elements gets mapped correctly. + % + % for the estimaed parameters, again, the estimated parameters need to + % populate the master vector in a way such that the condition + % activeNames == masterVector(paramMaps) holds for all the activeNames + % arrays (each topology will have one), and for each paramMap column + % (geometry) for each topology. + % + % + % of course, the masterVector is built as follows: + % masterVector(estparamIX) == logp + % masterVector(fixedParams) == [marray(:, end-2); marray(:, end-1); marray(:, end);] + % + % So this is all a bit complicated... + % Basically we need to make sure that after we build master vector from + % the fixed parameters (from the previous simulations), when we access + % them using paramMaps, we get the ones corresponding to the names in + % activeNames. + %} + + % names of the parameters and species to set allow for setting in the + % exported model + %% + activeNames1 = {... 
+ 'TX_elong_glob' 1 [0.5 10] + 'AGTPdeg_time' 7200 [1800 18000] + 'AGTPdeg_rate' 0.0002 [1e-5 1e-1] + 'TXTL_P70_RNAPbound_Kd' 12 [0.1 1000] + 'TXTL_P70_RNAPbound_F' 17 [0.1 100] + 'TXTL_RNAPBOUND_TERMINATION_RATE' 0.07 [1e-4 10] + 'TXTL_NTP_RNAP_1_Kd' 2 [0.1 1000] + 'TXTL_NTP_RNAP_1_F' 10 [0.1 100] + 'TXTL_NTP_RNAP_2_Kd' 10 [0.1 1000] + 'TXTL_NTP_RNAP_2_F' 1 [0.1 100] + 'TXTL_RNAdeg_Kd' 2000 [1 5000] + 'TXTL_RNAdeg_F' 1 [0.1 1000] + 'TXTL_RNAdeg_kc' 0.001 [1e-4 1] + 'RNAP' 30 [5 500] + 'RNase' 200 [10 1000]}; + + activeNames2 = {... + 'TXTL_RNAdeg_Kd' 2000 [1 5000] + 'TXTL_RNAdeg_F' 1 [0.1 1000] + 'TXTL_RNAdeg_kc' 0.001 [1e-4 1] + 'RNase' 200 [10 1000]}; + %% + % Names of parameters and species to actually estimate. + estParams = activeNames1(:,1); + % fixedParams vector + fixedParams = []; % none + masterVector = zeros(length(activeNames1(:,1)), 1); % log transformed. + + % paramMap is a matrix mapping the parameters in the master vector to the + % (unordered) list of parameters in the model. (obvioulsy within the code + % these parameters get ordered before they are used in the exported model) + % More precisely, Let pp = paramMap(:, 1); then masterVector(pp) is the list + % of parameters for the first geometry within that topology. + % One such matrix exists for each topology. It has dimnesions + % length(model_info(i).namesUnord) x number of geometries associated with that topo. + paramMap1 = [1:length(activeNames1(:,1))]'; + paramMap2 = [11 12 13 15]'; + + % parameter ranges (for the to-be-estimated parameters in the master + % vector) + paramRanges = log(cell2mat(activeNames1(:,3))); + + +%% next we define the dosing strategy. + dosedNames1 = {'DNA p70--utr1--deGFP'}; + dosedVals1 = [0.5 2 5 20]; + + dosedNames2 = {'RNA utr1--deGFP'}; + dosedVals2 = [37.5 75 150 200 600 700 800 900 1000]; +%% create the measured species cell array +measuredSpecies = {{'[RNA utr1--deGFP]',... + '[Ribo:RNA utr1--deGFP]',... + '[AA:2AGTP:Ribo:RNA utr1--deGFP]', ... 
+ '[term_Ribo:RNA utr1--deGFP]',... + '[AA:Ribo:RNA utr1--deGFP]'... + '[RNA utr1--deGFP:RNase]',... + '[Ribo:RNA utr1--deGFP:RNase]',... + '[AA:2AGTP:Ribo:RNA utr1--deGFP:RNase]', ... + '[term_Ribo:RNA utr1--deGFP:RNase]',... + '[AA:Ribo:RNA utr1--deGFP:RNase]'}}; +msIx = 1; % this is the index of the measured species in the data array +% from data_dsg2014. There are two species: 1: mRNA and 2: GFP. + + +%% setup the MCMC simulation parameters +stdev = 1; % i have no idea what a good value is +tightening = 1; % i have no idea what a good value is +nW = 400; % actual: 200 - 600 ish +stepsize = 1.5; % actual: 1.1 to 4 ish +niter = 20; % actual: 2 - 30 ish, +npoints = 4e4; % actual: 2e4 to 2e5 ish (or even 1e6 of the number of +% params is small) +thinning = 10; % actual: 10 to 40 ish + +%% pull all this together into an output struct. + +runsim_info = struct('stdev', {stdev}, ... + 'tightening', {tightening}, ... + 'nW', {nW}, ... + 'stepSize', {stepsize}, ... + 'nIter', {niter}, ... + 'nPoints', {npoints}, ... + 'thinning', {thinning}, ... + 'parallel', true); + +model_info = struct(... + 'circuitInfo',{circuitInfo1, circuitInfo2},... + 'modelObj', {modelObj,modelObj},... % array of model objects (different topologies) + 'modelName', modelObj.name,...; % model names. + 'namesUnord', {activeNames1(:,1),activeNames2(:,1)}, ... % names of parameters per model, unordered. + 'paramMaps', {paramMap1, paramMap2}, ... % paramMap is a matrix mapping models to master vector. + 'dosedNames', {dosedNames1, dosedNames2},... % cell arrays of species. cell array corresponds + ... % to a model. + 'dosedVals', {dosedVals1, dosedVals2},... % matrices of dose vals + 'measuredSpecies', {measuredSpecies, measuredSpecies}, ... % cell array of cell arrays of + ... % species names. the elements of the inner + ... % cell array get summed. + 'measuredSpeciesIndex', msIx,... 
% maps measuredSpecies to the species in data array + 'dataToMapTo', {1, 3}); % each dataToMapTo property within an element of the + % model_info array is a vector of length # of geometries. + % data indices tell us which data set to use for each topology (model) - geometry pair + % from the data_info struct array. + + +semanticGroups = num2cell((1:length(estParams))'); +%arrayfun(@num2str, 1:10, 'UniformOutput', false); +estParamsIx = setdiff((1:length(masterVector))', fixedParams); + +%% master parameter vector, param ranges, +master_info = struct(... + 'estNames', {estParams},... + 'masterVector', {masterVector},... + 'paramRanges', {paramRanges},... % + 'fixedParams', {fixedParams},... % indexes of the fixed params (withing master vector) + 'semanticGroups', {semanticGroups});% EITHER EMPTY OR + % a cell array of vectors specifying parameter + % groupings. + % The vectors contain indices to the + % parameters in (non fixed subset of) the master + % vector that need to be grouped. + % I.e., They contain indexes of the subvector + % logp = + % master_info.mastervector(~master_info.fixedParams) + % and to the rows of the paramRanges matrix and the + % estNames cell array of strings. + % + % parameter grouping so that these parameters + % get INITIALIZED to the same values. + % + % every parameter index must show up in at least + % one group, even if that is the only parameter in + % that group. If the semanticGroups field is empty, + % then all parameters are assumed to be in their + % distinct groups. + + +% how the parameter distribution flow works: +% WALKER INITIALIZATION +% reduced master vector -- semanticGroups --> +% master vector -- paramMaps --> +% full parameter vector for each topo-geom pair -- orderingIx --> +% reordered vector for exported model simulation. +% +% param ranges: reduced param ranges matrix (by sematicGroups) +% compute initial parameter distributons +% then expand in the same way as above. 
+% once the parameters have been estimated, there is no need to +% reorder them, since the master vector was never reordered. +% can use the master_info.estNames for the names and +% master_info.mastervector(~master_info.fixedParams) for the +% parameter values. +% sematic + + +mcmc_info = struct('runsim_info', runsim_info, ... + 'model_info', model_info,... + 'master_info', master_info); + +end \ No newline at end of file diff --git a/mcmc_simbio/models_and_supporting_files/model_aTc_induc1.m b/mcmc_simbio/models_and_supporting_files/model_aTc_induc1.m new file mode 100644 index 0000000..327045d --- /dev/null +++ b/mcmc_simbio/models_and_supporting_files/model_aTc_induc1.m @@ -0,0 +1,141 @@ +function mobj = model_aTc_induc1 +% aTc derepression (induction) of a circuit involving +% repression with enzymatic one step protein production +% The 1 at the end of the aTc_induc1 is used because this uses +% a transcriptional model of type 1. +% +% ~~~ MODEL ~~~ +% D_T + P <-> D_T:P -> D_T + P + T +% D_G + P <-> D_G:P -> D_G + P + G +% 2 T <-> T2 +% D_G + T2 <-> D_G:T2 +% 2 aTc <-> aTc2 +% aTc2 + T2 <-> aTc2:T2 + +% Copyright (c) 2018, Vipul Singhal, Caltech +% Permission is hereby granted, free of charge, to any person obtaining a copy +% of this software and associated documentation files (the "Software"), to deal +% in the Software without restriction, including without limitation the rights +% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +% copies of the Software, and to permit persons to whom the Software is +% furnished to do so, subject to the following conditions: +% +% The above copyright notice and this permission notice shall be included in all +% copies or substantial portions of the Software. +% +% THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +% AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +% LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +% OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +% SOFTWARE. + +% D_T + P <-> D_T:P -> D_T + P + T +% D_G + P <-> D_G:P -> D_G + P + G +% 2 T <-> T2 +% D_G + T2 <-> D_G:T2 +% 2 aTc <-> aTc2 +% aTc2 + T2 <-> aTc2:T2 +% + +p = inputParser; +addParameter(p, 'simtime', 8*3600); +parse(p); +p = p.Results; +%% setup model +mobj = sbiomodel('aTcInduc1'); +%% setup model reactions + +rxn = addreaction(mobj,'dT + pol <-> dT_pol'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kfdT','krdT'}; +addparameter(mobj, 'kfdT', 1); +addparameter(mobj, 'krdT', 60); + +rxn = addreaction(mobj,'dT_pol -> dT + pol + pT'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kcp'}; +addparameter(mobj, 'kcp', 0.12); + +rxn = addreaction(mobj,'dG + pol <-> dG_pol'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kfdG','krdG'}; +addparameter(mobj, 'kfdG', 0.5); +addparameter(mobj, 'krdG', 30); + +rxn = addreaction(mobj,'dG_pol -> dG + pol + pG'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kcp'}; +% already added to model. 
+ +rxn = addreaction(mobj,'2 pT <-> pT2'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kfdimTet','krdimTet'}; +addparameter(mobj, 'kfdimTet', .2); +addparameter(mobj, 'krdimTet', 4); + +rxn = addreaction(mobj,'dG + pT2 <-> dG_pT2'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kfseqTet','krseqTet'}; +addparameter(mobj, 'kfseqTet', .2); +addparameter(mobj, 'krseqTet', 4); + +% rxn = addreaction(mobj,'2 aTc <-> aTc2'); +% Kobj = addkineticlaw(rxn,'MassAction'); +% Kobj.ParameterVariableNames = {'kfdimaTc','krdimaTc'}; +% addparameter(mobj, 'kfdimaTc', .2); +% addparameter(mobj, 'krdimaTc', 4); + +rxn = addreaction(mobj,'aTc2 + pT2 <-> aTc2_pT2'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kfseqaTc','krseqaTc'}; +addparameter(mobj, 'kfseqaTc', .2); +addparameter(mobj, 'krseqaTc', 4); + +% setup model species initial concentrations. +specie = sbioselect(mobj, 'name', 'dT'); +specie.InitialAmount = 0.4; + +specie = sbioselect(mobj, 'name', 'dG'); +specie.InitialAmount = 4; + +specie = sbioselect(mobj, 'name', 'pT'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'pG'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'pol'); +specie.InitialAmount = 100; + +specie = sbioselect(mobj, 'name', 'dT_pol'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'dG_pol'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'pT2'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'dG_pT2'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'aTc2'); +specie.InitialAmount = 0; + +% specie = sbioselect(mobj, 'name', 'aTc'); +% specie.InitialAmount = 1000; + +specie = sbioselect(mobj, 'name', 'aTc2_pT2'); +specie.InitialAmount = 0; + +%% Run the model + +cs = getconfigset(mobj, 'active'); +set(cs, 'StopTime', p.simtime); + +sd = sbiosimulate(mobj); + +end + diff --git 
a/mcmc_simbio/models_and_supporting_files/model_protein3.m b/mcmc_simbio/models_and_supporting_files/model_protein3.m new file mode 100644 index 0000000..d090a3c --- /dev/null +++ b/mcmc_simbio/models_and_supporting_files/model_protein3.m @@ -0,0 +1,53 @@ +function mobj = model_protein3(varargin) +% model_protein3 Constitutive gene expression model using a single +% enzymatic step. +% +% ~~~ MODEL ~~~ +% D + pol <-> D__pol (k_f, k_r ) +% D__pol -> D + pol + protien (kc) +% + + +%% set input defaults +p = inputParser ; +addParameter(p, 'simtime', 2*3600); +parse(p); +p = p.Results; + +%% setup model +mobj = sbiomodel('expression'); + +%% setup model reactions +r1 = addreaction(mobj,'dG + pol <-> dG_pol'); +Kobj = addkineticlaw(r1,'MassAction'); +Kobj.ParameterVariableNames = {'kfdG','krdG'}; +addparameter(mobj, 'kfdG', 10); +addparameter(mobj, 'krdG', 600); + +r2 = addreaction(mobj,'dG_pol -> dG + pol + pG'); +Kobj = addkineticlaw(r2,'MassAction'); +Kobj.ParameterVariableNames = {'kcp'}; +addparameter(mobj, 'kcp', 0.012); + +% setup model species initial concentrations. 
+P = sbioselect(mobj, 'name', 'dG'); +P.InitialAmount = 30; + +C = sbioselect(mobj, 'name', 'pol'); +C.InitialAmount = 100; + +E = sbioselect(mobj, 'name', 'dG_pol'); +E.InitialAmount = 0; + +S = sbioselect(mobj, 'name', 'pG'); +S.InitialAmount = 0; + +%% Run the model + +cs = getconfigset(mobj, 'active'); +set(cs, 'StopTime', p.simtime); + +sd = sbiosimulate(mobj); + +end + diff --git a/mcmc_simbio/models_and_supporting_files/model_protein5.m b/mcmc_simbio/models_and_supporting_files/model_protein5.m new file mode 100644 index 0000000..ecd2613 --- /dev/null +++ b/mcmc_simbio/models_and_supporting_files/model_protein5.m @@ -0,0 +1,116 @@ +function mobj = model_protein5 +% enzymatic mrna and protein production and first order mrna degradation +% +% ~~~ MODEL ~~~ +% D + pol <-> D__pol (k_fd, k_rd) +% D__pol -> D + pol + mrna (kcm) +% +% mrna + ribo <-> mrna__ribo (k_fm, k_rm) +% mrna__ribo <-> mrna + ribo + protein (kcp) +% +% mrna -> null (kcx) +% +% Copyright (c) 2018, Vipul Singhal, Caltech +% Permission is hereby granted, free of charge, to any person obtaining a copy +% of this software and associated documentation files (the "Software"), to deal +% in the Software without restriction, including without limitation the rights +% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +% copies of the Software, and to permit persons to whom the Software is +% furnished to do so, subject to the following conditions: +% +% The above copyright notice and this permission notice shall be included in all +% copies or substantial portions of the Software. +% +% THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +% AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +% LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +% OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +% SOFTWARE. + +p = inputParser ; +addParameter(p, 'simtime', 2*3600); +parse(p); +p = p.Results; + + +% Model Parameter values +cpol = 100; % nM +cribo = 50; %nM + +rkfdG = 10; % nM-1s-1 +rkrdG = 600; % s-1 +rkcm = 0.001; %s-1 + +rkfpG = 10; % nM-1s-1 +rkrpG = 300; % s-1 +rkcp = 1/36; + +rdel_m = log(2)/720; % 12 min half life of mrna + +%% setup model reactions + +% setup model +mobj = sbiomodel('expression'); +% GFP TXTL +rxn = addreaction(mobj,'dG + pol <-> dG_pol'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kfdG','krdG'}; +addparameter(mobj, 'kfdG', rkfdG); +addparameter(mobj, 'krdG', rkrdG); + +rxn = addreaction(mobj,'dG_pol -> dG + pol + mG'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kcm'}; +addparameter(mobj, 'kcm', rkcm); + +rxn = addreaction(mobj,'mG + ribo <-> mG_ribo'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kfpG','krpG'}; +addparameter(mobj, 'kfpG', rkfpG); +addparameter(mobj, 'krpG', rkrpG); + +rxn = addreaction(mobj,'mG_ribo -> mG + ribo + pG'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kcp'}; +addparameter(mobj, 'kcp', rkcp); + +rxn = addreaction(mobj,'mG -> null'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'del_m'}; +addparameter(mobj, 'del_m', rdel_m); + +% setup model species initial concentrations. 
+ +specie = sbioselect(mobj, 'name', 'dG'); +specie.InitialAmount = 30; + +specie = sbioselect(mobj, 'name', 'pG'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'mG'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'pol'); +specie.InitialAmount = cpol; + +specie = sbioselect(mobj, 'name', 'ribo'); +specie.InitialAmount = cribo; + +specie = sbioselect(mobj, 'name', 'dG_pol'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'mG_ribo'); +specie.InitialAmount = 0; + + +%% Run the model + +cs = getconfigset(mobj, 'active'); +set(cs, 'StopTime', p.simtime); + +sd = sbiosimulate(mobj); + +end + diff --git a/mcmc_simbio/models_and_supporting_files/model_tetR_repression1.m b/mcmc_simbio/models_and_supporting_files/model_tetR_repression1.m new file mode 100644 index 0000000..6f0ce0e --- /dev/null +++ b/mcmc_simbio/models_and_supporting_files/model_tetR_repression1.m @@ -0,0 +1,113 @@ +function mobj = model_tetR_repression1 +% repression with enzymatic one step protein production +% +% ~~~ MODEL ~~~ +% D_T + P <-> D_T:P -> D_T + P + T +% D_G + P <-> D_G:P -> D_G + P + G +% 2 T <-> T2 +% D_G + T2 <-> D_G:T2 + +% Copyright (c) 2018, Vipul Singhal, Caltech +% Permission is hereby granted, free of charge, to any person obtaining a copy +% of this software and associated documentation files (the "Software"), to deal +% in the Software without restriction, including without limitation the rights +% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +% copies of the Software, and to permit persons to whom the Software is +% furnished to do so, subject to the following conditions: +% +% The above copyright notice and this permission notice shall be included in all +% copies or substantial portions of the Software. 
+% +% THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +% AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +% LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +% OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +% SOFTWARE. + +% D_T + P <-> D_T:P -> D_T + P + T +% D_G + P <-> D_G:P -> D_G + P + G +% 2 T <-> T2 +% D_G + T2 <-> D_G:T2 +% +p = inputParser ; +addParameter(p, 'simtime', 2*3600); +parse(p); +p = p.Results; +%% setup model +mobj = sbiomodel('tetRepression1'); +%% setup model reactions + +rxn = addreaction(mobj,'dT + pol <-> dT_pol'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kfdT','krdT'}; +addparameter(mobj, 'kfdT', 10); +addparameter(mobj, 'krdT', 600); + +rxn = addreaction(mobj,'dT_pol -> dT + pol + pT'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kcp'}; +addparameter(mobj, 'kcp', 0.012); + +rxn = addreaction(mobj,'dG + pol <-> dG_pol'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kfdG','krdG'}; +addparameter(mobj, 'kfdG', 10); +addparameter(mobj, 'krdG', 600); + +rxn = addreaction(mobj,'dG_pol -> dG + pol + pG'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kcp'}; +% already added to model. + +rxn = addreaction(mobj,'2 pT <-> pT2'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kfdimTet','krdimTet'}; +addparameter(mobj, 'kfdimTet', 20); +addparameter(mobj, 'krdimTet', 40); + +rxn = addreaction(mobj,'dG + pT2 <-> dG_pT2'); +Kobj = addkineticlaw(rxn,'MassAction'); +Kobj.ParameterVariableNames = {'kfseqTet','krseqTet'}; +addparameter(mobj, 'kfseqTet', 20); +addparameter(mobj, 'krseqTet', 40); + +% setup model species initial concentrations. 
+% setup model species initial concentrations. +specie = sbioselect(mobj, 'name', 'dT'); +specie.InitialAmount = 0.5; + +specie = sbioselect(mobj, 'name', 'dG'); +specie.InitialAmount = 30; + +specie = sbioselect(mobj, 'name', 'pT'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'pG'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'pol'); +specie.InitialAmount = 100; + +specie = sbioselect(mobj, 'name', 'dT_pol'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'dG_pol'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'pT2'); +specie.InitialAmount = 0; + +specie = sbioselect(mobj, 'name', 'dG_pT2'); +specie.InitialAmount = 0; + +%% Run the model + +cs = getconfigset(mobj, 'active'); +set(cs, 'StopTime', p.simtime); + +sd = sbiosimulate(mobj); + +end + diff --git a/mcmc_simbio/projects/.gitignore b/mcmc_simbio/projects/.gitignore new file mode 100644 index 0000000..bf7e0ef --- /dev/null +++ b/mcmc_simbio/projects/.gitignore @@ -0,0 +1,5 @@ +# git ignore file for the projects directory +/proj_*/ +/html/ + +/explore_*/ \ No newline at end of file diff --git a/mcmc_simbio/projects/proj_VNPRL_mrna.m b/mcmc_simbio/projects/proj_VNPRL_mrna.m new file mode 100644 index 0000000..474f3ef --- /dev/null +++ b/mcmc_simbio/projects/proj_VNPRL_mrna.m @@ -0,0 +1,150 @@ +%% proj_VNPRL_mrna.m +% Fitting +% +% We set up the estimation of the data from the PRL paper: +% Karzbrun, Eyal, Jonghyeon Shin, Roy H. Bar-Ziv, and Vincent Noireaux. +% "Coarse-Grained Dynamics of Protein Synthesis in a Cell-Free System." +% Physical Review Letters 106, no. 4 (January 24, 2011): 48104. +% https://doi.org/10.1103/PhysRevLett.106.048104. 
+% +% Some of the main conclusions of that paper were: +% - Transcriptional Elongation rate: 1 ntp/s +% - Translational Elongation rate: >4 aa/s +% - mRNA exponential decay, even when purified RNA is in excess of 200nM +% - mRNA degradation half life: 10 - 14 min +% - 30nM RNAP conc +% - 1.5nM RNAP - promoter Kd +% - Protein production linear in mRNA (TL machinery not saturated) +% - 1uM protein in 1h, And anywhere between 3 to 10 uM by the end (5 ish +% hours) +% - dp/dt|max is about 30 to 40 nM / min for proteins +% - For 30 nM of DNA, mRNA steady state is 20 - 30 nM. +% +% This mcmc has two linked estimation problems: +% 1) transcription estimation +% 2) RNA degradation +% +% We rescale the mRNA data from the paper titled: +% Gene Circuit Performance Characterization and Resource Usage in a Cell-Free “Breadboard” +% ACS Synth. Biol., 2014, 3 (6), pp 416–425, DOI: 10.1021/sb400203p, +% to make it compatible with the conclusions of the PRL paper (30nM peak +% mRNA expression), and use this rescaled data as the data to fit our models to. +% In this sense, the ACS paper serves to give a "typical" shape of mRNA expression +% and acts as a rough guide to estimate parameters for TXTL. +% +% Vipul Singhal, California Institute of Technology +% 2018 + + + +%% initialize the directory where things are stored. +[tstamp, projdir, st] = project_init; +% data_init + +%% data_info struct. +di = data_VNPRL2011; +da1 = squeeze(di(1).dataArray); +da2 = squeeze(di(2).dataArray); +ntrej = 30; +figure + +subplot(2, 2,2) % MGa +tv1hrs = di(1).timeVector(1:end-ntrej)/3600; +mga1 = squeeze(da1(1:end-ntrej,1,:)); +plot(tv1hrs, mga1); +xlabel('hours') +ylabel('MGa, nM') +title('MG aptamer/10, ACS DSG 2014') + +subplot(2, 2,1) % dgfp post interp +dgfp1 = diff(squeeze(da1(1:end-ntrej+1,2,:)))... + ./... 
+ diff(di(1).timeVector(1:end-ntrej+1)); +plot(tv1hrs,dgfp1); +xlabel('hours') +ylabel('dgfp/dt, nM/s') +title('dgfp/dt, post interpolation') + +subplot(2, 2,3) % dgfp pre interp +tv2hrs = di(2).timeVector(1:end-ntrej)/3600; +dgfp2 = da2((1:end-ntrej),:); + +plot(tv2hrs,dgfp2); +xlabel('hours') +ylabel('dgfp/dt, nM/s') +title('dgfp/dt, pre interpolation') + + +subplot(2, 2,4) % gfp +gfp = squeeze(da1(1:end-ntrej,2,:)); +plot(tv2hrs, gfp); +xlabel('hours') +ylabel('GFP, nM') +title('GFP/1.8, ACS DSG 2014') + +close all +%% repopulate the data info structs' data array and time vector with these +% truncated-in-time data sets. (truncated to show only the data before 10 +% hours) +da1 = di(1).dataArray; +da2 = di(2).dataArray; +tv1 = di(1).timeVector; +tv2 = di(2).timeVector; + +ix1 = tv1<10*3600; +tv1 = tv1(ix1); +da1 = da1(ix1, :, :, :); +di(1).dataArray = da1; +di(1).timeVector = tv1; + +ix2 = tv2<10*3600; +tv2 = tv2(ix2); +da2 = da2(ix2, :, :, :); +di(2).dataArray = da2; +di(2).timeVector = tv2; + + +%% construct simbiology model object, and simulate with parameters +% to bring it close to what is expected from the PRL paper. + +% define a parameter info struct + + + + + +mobj = model_dsg2014; % use the same model as dsg2014. (this is just +% the basic constitutive production model) + + + + +%% %% setup the mcmc_info struct - capture all the mcmc information +% except the data and the model. +mcmc_info = mcmc_info_vnprl2011_mrna(mobj); +ri = mcmc_info.runsim_info; +mai = mcmc_info.master_info; + +%% set up the MCMC estimation +% mcmc20180225_220513_ID3 + marray = mcmc_get_walkers({'20180225_220513'}, {3}, projdir); +% marray_cut = mcmc_cut(marray, (1:10), flipud((mai.paramRanges)')); +% if size(marray_cut, 2) < ri.nW +% error('too few initial points'); +% elseif size(marray_cut, 2) > ri.nW +% marray_cut = marray_cut(:,1:ri.nW, :); +% end +%% + + +mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... 
+ 'UserInitialize', marray(:,:,end)); +% 'UserInitialize', marray_cut(:,:,end) +%'InitialDistribution', 'gaussian' 'UserInitialize', marray_cut(:,:,end)); + + + + + + + diff --git a/mcmc_simbio/projects/proj_acs_dsg2014_regen_A.m b/mcmc_simbio/projects/proj_acs_dsg2014_regen_A.m new file mode 100644 index 0000000..5c3ba93 --- /dev/null +++ b/mcmc_simbio/projects/proj_acs_dsg2014_regen_A.m @@ -0,0 +1,134 @@ +function [mi,mai, ri, tstamp, projdir, di] = proj_acs_dsg2014_regen_A(varargin) +% notes: +% - the first run on this is on lambda, with timestamp simdata_20180419_212516 +% step size is 2, and stdev is 5. +% - and in parallel there is a nessa run 20180419_213851 step size is 1.05, +% and stdev is 2. +% so these two runs are a bit different. +% April 19 - 20: once these are done, compare the cornerplots to each other +% to see if the results you get are the same. This will give stationarity. +% Otherwise, we need to run longer. this is a 20 dimensional search, so not +% super trivial. need to probably do a lot of iterations. + +%% MCMC toolbox fits using ACS DSG data, with VNPRL 2011 for some extra info, +% and the new regeneration system for ATP management. There are 4 parts to +% this fitting. +% A: Fitting the mRNA degradation and mRNA transcription data +% B: Fit the protein data with the TX and degradation parameters set to 10 +% random points from A, B. +% C: Fit the major parameters from rna deg, tx and tl, (but not stuff like +% NTP binding rates, AA binding rates, AGTP deg rate etc. 
+ +% Vipul Singhal, +% California Institute of Technology +% 2018 + +p = inputParser; +p.addParameter('prevtstamp', []); +p.addParameter('stepSize', 1.4); +p.addParameter('nW', 1000); +p.addParameter('nPoints', 4e4); +p.addParameter('thinning', 4); +p.addParameter('nIter', 3); +p.addParameter('parallel', true); +p.addParameter('stdev', 1); +p.addParameter('poolsize', []); +p.addParameter('multiplier', 2); +p.parse(varargin{:}); +p = p.Results; +%% initialize the directory where things are stored. +[tstamp, projdir, st] = project_init; +% data_init +% proj_acs_dsg2014_regen_A('nW', 6400, 'nPoints', 6400*10*20, 'nIter', 20, 'poolsize', 36, 'multiplier', 3, 'thinning', 10) +%% construct simbiology model object(s) +mobj = model_dsg2014_regen; + +%% setup the mcmc_info struct +mcmc_info = mcmc_info_dsg2014_regen_A(mobj); + +mi = mcmc_info.model_info; + +%% setup the data_info struct +di = data_dsg2014_full +% modify di to only contain the mRNA data. +% di.dataArray = di.dataArray(:, 1, :, :); % pick out only the mrna +% di.measuredNames = di.measuredNames(1); +% di.dataUnits = di.dataUnits(1); +% di.dataInfo = ['Modified to only have mRNA data. \n',... 
+% di.dataInfo]; + +if ~isempty(p.poolsize) + delete(gcp('nocreate')) + parpool(p.poolsize) +end + +% Run the MCMC +if ~isempty(p.stepSize) + mcmc_info.runsim_info.stepSize = p.stepSize; +end + +if ~isempty(p.nW) + mcmc_info.runsim_info.nW = p.nW; +end + +if ~isempty(p.nPoints) + mcmc_info.runsim_info.nPoints = p.nPoints; +end + +if ~isempty(p.thinning) + mcmc_info.runsim_info.thinning = p.thinning; +end + +if ~isempty(p.nIter) + mcmc_info.runsim_info.nIter = p.nIter; +end + +if ~isempty(p.parallel) + mcmc_info.runsim_info.parallel = p.parallel; +end + +if ~isempty(p.stdev) + mcmc_info.runsim_info.stdev = p.stdev; +end +ri = mcmc_info.runsim_info; +mai = mcmc_info.master_info; +%% run the mcmc simulations +% prevtstamp = {'20180120_172922'}; +% simID = {'1'}; +% marray = mcmc_get_walkers(prevtstamp, {simID}, projdir); +% mtemp = marray(:,:); +if isempty(p.prevtstamp) +mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... + 'InitialDistribution', 'LHS', 'multiplier', p.multiplier); +else + + specificprojdir = [projdir '/simdata_' p.prevtstamp]; + + % load mcmc_info and the updated model_info + SS = load([specificprojdir '/full_variable_set_' p.prevtstamp], 'mcmc_info'); + + marray = mcmc_get_walkers({p.prevtstamp}, {SS.mcmc_info.runsim_info.nIter},... + projdir); + % assume the projdir where this data is stored is the same one as the + % one created at the start of this file + + + pID = 1:length(mai.estNames); + marray_cut = mcmc_cut(marray, pID, flipud((mai.paramRanges)')); + if size(marray_cut, 2) < ri.nW + error('too few initial points'); + elseif size(marray_cut, 2) > ri.nW + marray_cut = marray_cut(:,1:ri.nW, :); + end + + mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... + 'UserInitialize', marray_cut(:,:,end), 'multiplier', p.multiplier); + + +% +% [mi,mai, ri, tstamp, projdir, di] = proj_acs_dsg2014_regen_A(... +% 'stepSize', 1.05, 'nW', 400, 'nPoints', 4e4, 'thinning', 20,... 
+% 'nIter', 80, 'parallel', true, 'multiplier', 2, 'stdev', 2); + +end + diff --git a/mcmc_simbio/projects/proj_mcmc_tutorial.m b/mcmc_simbio/projects/proj_mcmc_tutorial.m new file mode 100644 index 0000000..103c425 --- /dev/null +++ b/mcmc_simbio/projects/proj_mcmc_tutorial.m @@ -0,0 +1,121 @@ +%% First tutorial file for the mcmc_simbio package +% proj_mcmc_tutorial.m - Basic tutorial of the mcmc_simbio package +% demonstrating the estimation of parameters for a constitutive gene +% expression circuit modeled as a single enzymatic reaction. + +%% Initializing the toolbox +% If you have not already initialized the txtlsim and mcmc_simbio +% toolboxes, initialize them by running the txtl_init and mcmc_init +% commands in the command line. You need your working directory to be the +% main directory where the txtlsim toolbox is stored (i.e., the directory +% in which directories like core, components, mcmc_simbio etc are stored). + +%% Run project_init +% This creates a directory within the projects directory where +% the results of the simulation will be stored. The name of the directory +% will be the same as the name of this file (proj_mcmc_tutorial, in this +% case). It also creates a timestamped subdirectory within this directory, +% where the actual results are stored. If the top level directory +% (proj_mcmc_tutorial) already exists, then only the subdirectory is +% created. +[tstamp, projdir, st] = project_init; + +%% Define the MATLAB Simbiology model +% We use the file model_protein3.m to define a constitutive gene expression +% model using a single enzymatic step. The reactions and species that it +% sets up are: +% +% dG + pol <-> dG_pol (k_f, k_r) +% dG_pol -> dG + pol + pG (kc) +mobj = model_protein3; + +% The species of the model can be visualized as follows: +mobj.species + +% The reactions may be visualized as +mobj.reactions +% For more on MATLAB Simbiology, see the Simbiology +% page. + +%% Defining the experiment / model arrangement. 
+% We can define the experimental setup and how it is related to data, the +% Simbiology model and the estimation problem using what we call an +% mcmc_info struct. For this example, we will be using an +% mcmc_info_constgfp3i.m file to generate the mcmc_info struct that we need +% to define our parameter inference problem. +% Please enter 'help mcmc_info' into the command window prompt to read more +% about this struct. Also, open the mcmc_info_constgfp3i file (edit +% mcmc_info_constgfp3i) to view how it is set up. + +mcmc_info = mcmc_info_constgfp3i(mobj); + +%% Creating artificial data to fit the model to. +% Instead of using real data, we will create artificial data for +% demonstration purposes. We will use the data_artificial_v2 function to do +% this. + +% Get the model_info struct needed to generate the artificial data +mi = mcmc_info.model_info; + +% A list of nominal parameter values to use to generate the data. +rkfdG = 5; % nM-1s-1 +rkrdG = 300; % s-1 +rkcp1 = 0.012; %s-1 +cpol1 = 100; % nM + +% Arrange the parameters in a log transformed vector. +masterVector = log([rkfdG + rkrdG + rkcp1 + cpol1]); + +% Supply the experimental setup information to the data_artificial_v2 +% function so that it can generate the data_info struct that contains the +% artificial data. +% Type 'help data_artificial_v2' into the command window prompt to read +% more about this function. For our purposes we simply note that we need to +% specify our Simbiology model object, a set of timepoints to report the +% output trajectories for, the list of measured species' names for our +% model, the list of dosed species' names, the matrix of dosed values, the +% names of the species and parameters to set values for in the model +% (namesUnord), and the non-log-transformed values as a vector. All of +% these arguments must be encapsulated in cells. + +di = data_artificial_v2({mobj}, {0:180:7200}, {mi.measuredSpecies}, ... + {mi.dosedNames}, {mi.dosedVals}, {mi.namesUnord},... 
+ {exp(masterVector), [exp(masterVector(1:end-2)); 0.024; 200]}); + +da_extract1 = di(1).dataArray; +tv = di(1).timeVector; + +%% Plot the artificial data +% we can plot the data using the mcmc_trajectories function. See its help +% file for usage information. +mcmc_trajectories([], di, [], [], [], [], 'just_data_info', true); + +%% Run the MCMC +ri = mcmc_info.runsim_info; + +mai = mcmc_info.master_info; + +mi1 = mcmc_runsim_v2(tstamp, projdir, di(1), mcmc_info,... + 'InitialDistribution', 'LHS', 'multiplier', 2); % 'InitialDistribution', 'gaussian' + +%% plot stuff +tstamptouse = tstamp; +marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); +mcmc_plot(marray, mai.estNames, 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse); +titls = {'dna 10'; 'dna 30';'dna 60'}; +lgds = {}; +mvarray1 = masterVecArray(marray, mai); +marrayOrd = mvarray1(mi1.paramMaps(mi1.orderingIx, 1),:,:); +fhandle = mcmc_trajectories(mi1.emo, di(1), mi1, marrayOrd, titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse); + + +% Vipul Singhal, +% California Institute of Technology +% 2018 diff --git a/mcmc_simbio/projects/proj_mcmc_tutorial_II.m b/mcmc_simbio/projects/proj_mcmc_tutorial_II.m new file mode 100644 index 0000000..d323d68 --- /dev/null +++ b/mcmc_simbio/projects/proj_mcmc_tutorial_II.m @@ -0,0 +1,191 @@ +%% Second tutorial for the mcmc_simbio package +% proj_mcmc_tutorial_II.m - tutorial of the mcmc_simbio package +% demonstrating the estimation of parameters for a constitutive gene +% expression circuit modeled as a single enzymatic reaction. In this +% example, we demonstrate the concurrent parameter +% estimation capability, where we have the constitutive expression circuit +% in two environments, with each environment having its own environment +% specific parameters (ESPs), while the circuit having a single set of +% circuit specific parameters. 
+ +%% Initializing the toolbox +% If you have not already initialized the txtlsim and mcmc_simbio +% toolboxes, initialize them by running the txtl_init and mcmc_init +% commands in the command line. You need your working directory to be the +% main directory where the txtlsim toolbox is stored (i.e., the directory +% in which directories like core, components, mcmc_simbio etc are stored). + +%% Run project_init +% This creates a directory within the projects directory where +% the results of the simulation will be stored. The name of the directory +% will be the same as the name of this file (proj_mcmc_tutorial_II, in this +% case). It also creates a timestamped subdirectory within this directory, +% where the actual results are stored. If the top level directory +% (proj_mcmc_tutorial_II) already exists, then only the subdirectory is +% created. +delete(gcp('nocreate')) +parpool(48) +[tstamp, projdir, st] = project_init; + +%% Define the MATLAB Simbiology model +% We use the file model_protein3.m to define a constitutive gene expression +% model using a single enzymatic step. The reactions and species that it +% sets up are: +% +% dG + pol <-> dG_pol (k_f, k_r) +% dG_pol -> dG + pol + pG (kc) + +mobj = model_protein3; + +% The species of the model can be visualized as follows: +mobj.species + +% The reactions may be visualized as +mobj.reactions +% For more on MATLAB Simbiology, see the Simbiology +% page. + +%% Defining the experiment / model arrangement. +% We can define the experimental setup and how it is related to data, the +% Simbiology model and the estimation problem using what we call an +% mcmc_info struct. For this example, we will be using an +% mcmc_info_constgfp3ii.m file to generate the mcmc_info struct that we need +% to define our parameter inference problem. +% Please enter 'help mcmc_info' into the command window prompt to read more +% about this struct. 
Also, open the mcmc_info_constgfp3ii file (enter 'edit +% mcmc_info_constgfp3ii' into the command prompt) to learn how it is set up. + +mcmc_info = mcmc_info_constgfp3ii(mobj); + +%% Creating artificial data to fit the model to. +% Instead of using real data, we will create artificial data for +% demonstration purposes. We will use the data_artificial_v2 function to do +% this. + +% Get the model_info struct needed to generate the artificial data +mi = mcmc_info.model_info; + +% A list of nominal parameter values to use to generate the data. +rkfdG = 5; % nM-1s-1 +rkrdG = 300; % s-1 +rkcp1 = 0.012; %s-1 +rkcp2 = 0.024; %s-1 +cpol1 = 100; % nM +cpol2 = 200; % nM + +% Arrange the parameters in a log transformed vector. +masterVector = log([... +rkfdG +rkrdG +rkcp1 +rkcp2 +cpol1 +cpol2]); + +% Supply the experimental setup information to the data_artificial_v2 +% function so that it can generate the data_info struct that contains the +% artificial data. +% Type 'help data_artificial_v2' into the command window prompt to read +% more about this function. For our purposes we simply note that we need to +% specify our Simbiology model object, a set of timepoints to report the +% output trajectories for, the list of measured species' names for our +% model, the list of dosed species' names, the matrix of dosed values, the +% names of the species and parameters to set values for in the model +% (namesUnord), and the non-log-transformed values as a vector. All of +% these arguments must be encapsulated in cells. + +di = data_artificial_v2({mobj}, {0:180:7200}, {mi.measuredSpecies},... + {mi.dosedNames}, {mi.dosedVals}, {mi.namesUnord},... + {exp(masterVector([1:2 3 5])), exp(masterVector([1:2 4 6]))}); + + +da_extract1 = di(1).dataArray; +da_extract2 = di(2).dataArray; +tv = di(1).timeVector; + +%% Plot the artificial data +% we can plot the data using the mcmc_trajectories function. See its help +% file for usage information. 
+%mcmc_trajectories([], di, [], [], [], [], 'just_data_info', true); + +%% Run the MCMC +ri = mcmc_info.runsim_info; + +mai = mcmc_info.master_info; + + + + specificprojdir = [projdir '/simdata_' '20190131_050421']; + + % load mcmc_info and the updated model_info + SS = load([specificprojdir '/full_variable_set_20190131_050421'], 'mcmc_info'); + + marray = mcmc_get_walkers({'20190131_050421'}, {9},... + projdir); + % assume the projdir where this data is stored is the same one as the + % one created at the start of this file + + + pID = 1:length(mai.estNames); + marray_cut = mcmc_cut(marray, pID, flipud((mai.paramRanges)')); + if size(marray_cut, 2) < ri.nW + error('too few initial points'); + elseif size(marray_cut, 2) > ri.nW + marray_cut = marray_cut(:,1:ri.nW, :); + end + + +% now run the simulation. +mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... + 'UserInitialize', marray_cut(:,:,end), 'multiplier', 2,... + 'pausemode', false); + + + + +mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... + 'InitialDistribution', 'LHS', 'multiplier', 2,... + 'pausemode', false); +% 'InitialDistribution', 'gaussian' +% 'UserInitialize', marray_cut(:,:,end) + +%% plot stuff + + +% load([pwd '/mcmc_simbio/projects/proj_mcmc_tutorial_II/'... +% 'simdata_20180802_221756/full_variable_set_20180802_221756']) +%% +% di = data_info +% tstamptouse = tstamp; +% marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); +% mcmc_plot(marray([1 2 4], :,:), mai.estNames([1 2 4]),... +% 'savematlabfig', true, 'savejpeg', true,... +% 'projdir', projdir, 'tstamp', tstamptouse,... +% 'extrafignamestring', '_extract1'); +% % +% figure +% mcmc_plot(marray([1 3 5], :,:), mai.estNames([1 3 5]),... +% 'savematlabfig', true, 'savejpeg', true,... +% 'projdir', projdir, 'tstamp', tstamptouse,... 
+% 'extrafignamestring', '_extract2'); +% titls = {'E1 dG 10';'E1 dG 30';'E1 dG 60';}; +% lgds = {}; +% mvarray = masterVecArray(marray, mai); +% marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 1),:,:); +% fhandle = mcmc_trajectories(mi(1).emo, di(1), mi(1), marrayOrd,... +% titls, lgds,... +% 'SimMode', 'meanstd', 'savematlabfig', true, 'savejpeg', true,... +% 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... +% '_extract1'); +% marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 2),:,:); +% titls = {'E2 dG 10';'E2 dG 30';'E2 dG 60';}; +% fhandle = mcmc_trajectories(mi(1).emo, di(2), mi(1), marrayOrd,... +% titls, lgds,... +% 'SimMode', 'meanstd', 'savematlabfig', true, 'savejpeg', true,... +% 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... +% '_extract2'); + +% Vipul Singhal, +% California Institute of Technology +% 2018 diff --git a/mcmc_simbio/projects/proj_mcmc_tutorial_III.m b/mcmc_simbio/projects/proj_mcmc_tutorial_III.m new file mode 100644 index 0000000..abe192d --- /dev/null +++ b/mcmc_simbio/projects/proj_mcmc_tutorial_III.m @@ -0,0 +1,217 @@ +%% Third tutorial for the mcmc_simbio package +% proj_mcmc_tutorial_III.m - tutorial of the mcmc_simbio package +% demonstrating the estimation of parameters shared between two different +% circuits. In the language of the concurrent parameter inference +% problem (type 'help mcmc_info' into the command prompt window to +% read more), we say that there are two +% network topologies, with each topology having one geometry associated +% with it. +% +% This example demonstrates a slightly more complex example of the +% concurrence feature of the mcmc_simbio Bayesian parameter inference +% toolbox. 
Here, we have two circuits: the constitutive gene expression +% circuit with model +% +% D + pol <-> D__pol (kfdG, krdG) +% D__pol -> D + pol + protein (kcp) +% +% and the tetR repression circuit with model +% +% D_T + P <-> D_T:P -> D_T + P + T (kfdT, krdT; kcp) +% D_G + P <-> D_G:P -> D_G + P + G (kfdG, krdG; kcp) +% 2 T <-> T2 (kfdimTet, krdimTet) +% D_G + T2 <-> D_G:T2 (kfseqTet, krseqTet) +% +% Here each model is a different topology with only one +% geometry associated with it. The parameters that are shared +% between the two topologies are: kfdG, krdG, kcp, pol. The remaining +% parameters are specific to the topology-geometry pair they appear in (in +% this case all the remaining parameters appear in the tetR repression +% circuit topology). Furthermore, we set the forward rate parameters in +% all the reversible reactions to be fixed parameters, and therefore only +% estimate the reverse rate parameters. + +%% Initializing the toolbox +% If you have not already initialized the txtlsim and mcmc_simbio +% toolboxes, initialize them by running the txtl_init and mcmc_init +% commands in the command line. You need your working directory to be the +% main directory where the txtlsim toolbox is stored (i.e., the directory +% in which directories like core, components, mcmc_simbio etc are stored). + +%% Run project_init +% This creates a directory within the projects directory where +% the results of the simulation will be stored. The name of the directory +% will be the same as the name of this file (proj_mcmc_tutorial_III, in this +% case). It also creates a timestamped subdirectory within this directory, +% where the actual results are stored. If the top level directory +% (proj_mcmc_tutorial_III) already exists, then only the subdirectory is +% created. 
+delete(gcp('nocreate')); +parpool(48); +[tstamp, projdir, st] = project_init; + +%% Define the MATLAB Simbiology model +% We use the file model_protein3.m to define a constitutive gene expression +% model using a single enzymatic step. +m_constgfp = model_protein3; +m_tetRrep = model_tetR_repression1; + +%% Defining the experiment / model arrangement. +% We can define the experimental setup and how it is related to data, the +% Simbiology model and the estimation problem using what we call an +% mcmc_info struct. For this example, we will be using an +% mcmc_info_constgfp3tetR1.m file to generate the mcmc_info struct that we need +% to define our parameter inference problem. +% Please enter 'help mcmc_info' into the command window prompt to read more +% about this struct. Also, open the mcmc_info_constgfp3tetR1 file (enter 'edit +% mcmc_info_constgfp3tetR1' into the command prompt) to learn how it is set up. +mcmc_info = mcmc_info_constgfp3tetR1(m_constgfp, m_tetRrep); + +%% Creating artificial data to fit the model to. +% Instead of using real data, we will create artificial data for +% demonstration purposes. We will use the data_artificial_v2 function to do +% this. + +% Get the model_info struct needed to generate the artificial data +mi = mcmc_info.model_info; + +% A list of nominal parameter values to use to generate the data. + +cpol = 100; % nM +rkfdG = 5; % nM-1s-1 +rkrdG = 300; % s-1 +rkfdT = 5; +rkrdT = 300; +rkcp = 0.012; %s-1 +rkfdimTet = 20; % nM-1s-1 +rkrdimTet = 10; % s-1 +rkfseqTet = 20; % nM-1s-1 +rkrseqTet = 10; % s-1 + +% Arrange the parameters in a log transformed vector. +masterVector = log([... +rkfdG +rkrdG +rkfdT +rkrdT +rkfdimTet +rkrdimTet +rkfseqTet +rkrseqTet +rkcp +cpol]); + +% Supply the experimental setup information to the data_artificial_v2 +% function so that it can generate the data_info struct that contains the +% artificial data. 
+% type 'help data_artificial_v2' into the command window prompt to read +% more about this function. For our purposes we simply note that we need to +% specify our Simbiology model object, a set of timepoints to report the +% output trajectories for, the list of measured species' names for our +% model, the list of dosed species' names, the matrix of dosed values, the +% names of the species and parameters to set values for in the model +% (namesUnord), and the non-log-transformed values as a vector. All of +% these arguments must be encapsulated in cells. + +di = data_artificial_v2(... + {m_constgfp, m_tetRrep},... % the two model objects + {0:180:7200, 0:180:7200},... % time vectors for the two data sets + {mi(1).measuredSpecies, mi(2).measuredSpecies},... + ... % measured species setup in mcmc_info.model_info + {mi(1).dosedNames, mi(2).dosedNames},... % dosed species + {mi(1).dosedVals, mi(2).dosedVals,},... % dosing values + {mi(1).namesUnord, mi(2).namesUnord},... + ... % names of species and parameters to set in each model + {exp(masterVector([1 2 9 10])), exp(masterVector)}); + % values to use for the names in namesUnord. + + +da_constgfp = di(1).dataArray; +da_tetRrep = di(2).dataArray; +tv = di(1).timeVector; + +%% Plot the artificial data +% we can plot the data using the mcmc_trajectories function. See its help +% file for usage information. +%mcmc_trajectories([], di, [], [], [], [], 'just_data_info', true); + +%% Run the MCMC +ri = mcmc_info.runsim_info; + +mai = mcmc_info.master_info; + + + +% get info from a previous run to initialize. + + + + specificprojdir = [projdir '/simdata_' '20190131_064508']; + + % load mcmc_info and the updated model_info + SS = load([specificprojdir '/full_variable_set_20190131_064508'], 'mcmc_info'); + + marray = mcmc_get_walkers({'20190131_064508'}, {SS.mcmc_info.runsim_info.nIter},... 
+ projdir); + % assume the projdir where this data is stored is the same one as the + % one created at the start of this file + + + pID = 1:length(mai.estNames); + marray_cut = mcmc_cut(marray, pID, flipud((mai.paramRanges)')); + if size(marray_cut, 2) < ri.nW + error('too few initial points'); + elseif size(marray_cut, 2) > ri.nW + marray_cut = marray_cut(:,1:ri.nW, :); + end + + +% now run the simulation. +mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... + 'UserInitialize', marray_cut(:,:,end), 'multiplier', 2,... + 'pausemode', false); +% +% mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... +% 'InitialDistribution','UserInitialize', marray_cut(:,:,end), 2,... +% 'pausemode', false); +% 'InitialDistribution', 'gaussian' +% +%% plot stuff +% +% These functions simply generate some standard plots from the data that is +% saved in the timestamped subdirectory of the directory specified in +% projdir. You can open that directory to view the results, including a log +% file. + +tstamptouse = tstamp; +marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); + +% plot parameter distribution corner plot, and markov chains. +mcmc_plot(marray, mai.estNames,... + 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse,... + 'extrafignamestring', '_tutorialIII'); + +% plot individual trajectories of the data and the model fits for both +% models. +titls = {'dG 10';'dG 30';'dG 60';}; +lgds = {}; +mvarray = masterVecArray(marray, mai); +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 1),:,:); +fhandle = mcmc_trajectories(mi(1).emo, di(1), mi(1), marrayOrd,... + titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... + '_contgfp'); +marrayOrd = mvarray(mi(2).paramMaps(mi(2).orderingIx, 1),:,:); +titls = {'dG 10 dT 0.1';'dG 30 dT 0.1';'dG 10 dT 2';'dG 30 dT 2';... 
+ 'dG 10 dT 8';'dG 30 dT 8';}; +fhandle = mcmc_trajectories(mi(2).emo, di(2), mi(2), marrayOrd,... + titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... + '_tetRrep'); + +% Vipul Singhal, +% California Institute of Technology +% 2018 diff --git a/mcmc_simbio/projects/proj_protein_constgfp3i.m b/mcmc_simbio/projects/proj_protein_constgfp3i.m new file mode 100644 index 0000000..6ea5532 --- /dev/null +++ b/mcmc_simbio/projects/proj_protein_constgfp3i.m @@ -0,0 +1,134 @@ +%% MCMC toolbox demo - proj_protein_constgfp3i.m +% +% const gfp 3, artificial data, separate, 2 extracts. Check if the CSPs line up exactly. +% kf fixed +% Vipul Singhal, +% California Institute of Technology +% 2018 + +%% initialize the directory where things are stored. +% close all +% clear all +% clc +[tstamp1, projdir, st] = project_init; + +%% We first define the model, mcmc_info struct, and the data_info struct. + +mobj = model_protein3; + +mcmc_info = mcmc_info_constgfp3i(mobj); + +mi = mcmc_info.model_info; + +rkfdG = 5; % nM-1s-1 +rkrdG = 300; % s-1 +rkcp1 = 0.012; %s-1 +cpol1 = 100; % nM + +masterVector = log([rkfdG + rkrdG + rkcp1 + cpol1]); + +di = data_artificial_v2({mobj}, {0:180:7200}, {mi.measuredSpecies}, ... + {mi.dosedNames}, {mi.dosedVals}, {mi.namesUnord},... 
+ {exp(masterVector), [exp(masterVector(1:end-2)); 0.024; 200]}); + +da_extract1 = di(1).dataArray; +da_extract2 = di(2).dataArray; +tv = di(1).timeVector; + +mcmc_trajectories([], di, [], [], [], [], 'just_data_info', true); + + +% Run the MCMC +ri = mcmc_info.runsim_info; +mai = mcmc_info.master_info; + +% marray = mcmc_get_walkers({'20180311_223247'}, {10}, projdir); +% marray_cut = mcmc_cut(marray, (1:10), flipud((mai.paramRanges)')); +% if size(marray_cut, 2) < ri.nW +% error('too few initial points'); +% elseif size(marray_cut, 2) > ri.nW +% marray_cut = marray_cut(:,1:ri.nW, :); +% end +%% + +mi1 = mcmc_runsim_v2(tstamp1, projdir, di(1), mcmc_info,... + 'InitialDistribution', 'LHS'); % 'InitialDistribution', 'gaussian' +% 'UserInitialize', marray_cut(:,:,end) + +%% plot stuff +tstamptouse = tstamp1; %'20180311_223247'; +marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); +mcmc_plot(marray, mai.estNames, 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse); +% mcmc_plot(marray, mi1.namesUnord,'ks', true, 'scatter', false); +% mcmc_plot(marray, mi1.namesUnord,'transparency', 0.05); +titls = {'dna 10'; 'dna 30';'dna 60'}; +lgds = {}; +mvarray1 = masterVecArray(marray, mai); +marrayOrd = mvarray1(mi1.paramMaps(mi1.orderingIx, 1),:,:); +fhandle = mcmc_trajectories(mi1.emo, di(1), mi1, marrayOrd, titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... 
+ 'projdir', projdir, 'tstamp', tstamptouse); +% %% 3D plot +% pToPlot = [2 3 1]; +% labellist = mai.estNames; +% for plotID = 1:size(pToPlot, 1) +% mstacked = marray1(:,:)'; +% figure +% XX = mstacked(1:2:end, [pToPlot(plotID,1)]); +% YY = mstacked(1:2:end, [pToPlot(plotID,2)]); +% ZZ = mstacked(1:2:end, [pToPlot(plotID,3)]); +% scatter3(XX,YY,ZZ) +% xlabel(labellist{pToPlot(plotID,1)}, 'FontSize', 20) +% ylabel(labellist{pToPlot(plotID,2)}, 'FontSize', 20) +% zlabel(labellist{pToPlot(plotID,3)}, 'FontSize', 20) +% title('covariation in Extract 1', 'FontSize', 20) +% saveas(gcf, [projdir '/simdata_' tstamp1 '/3dfig_ext1_' num2str(plotID) '_' tstamp1]); +% end +% +% +% mi2 = mcmc_runsim_v2(tstamp2, projdir, di(2), mcmc_info,... +% 'InitialDistribution', 'LHS'); % 'InitialDistribution', 'gaussian' +% % 'UserInitialize', marray_cut(:,:,end) +% +% %% plot stuff +% tstamptouse = tstamp2; %'20180311_223247'; +% marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); +% mcmc_plot(marray, mai.estNames, 'savematlabfig', true, 'savejpeg', true,... +% 'projdir', projdir, 'tstamp', tstamptouse); +% % mcmc_plot(marray, mi1.namesUnord,'ks', true, 'scatter', false); +% % mcmc_plot(marray, mi1.namesUnord,'transparency', 0.05); +% titls = {'dna 10'; 'dna 30';'dna 60'}; +% lgds = {}; +% mvarray1 = masterVecArray(marray, mai); +% marrayOrd = mvarray2(mi1.paramMaps(mi1.orderingIx, 1),:,:); +% fhandle = mcmc_trajectories(mi2.emo, di(2), mi2, marrayOrd, titls, lgds,... +% 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... 
+% 'projdir', projdir, 'tstamp', tstamptouse); +% %% 3D plot +% pToPlot = [2 3 1]; +% labellist = mai.estNames; +% for plotID = 1:size(pToPlot, 1) +% mstacked = marray2(:,:)'; +% figure +% XX = mstacked(1:2:end, [pToPlot(plotID,1)]); +% YY = mstacked(1:2:end, [pToPlot(plotID,2)]); +% ZZ = mstacked(1:2:end, [pToPlot(plotID,3)]); +% scatter3(XX,YY,ZZ) +% xlabel(labellist{pToPlot(plotID,1)}, 'FontSize', 20) +% ylabel(labellist{pToPlot(plotID,2)}, 'FontSize', 20) +% zlabel(labellist{pToPlot(plotID,3)}, 'FontSize', 20) +% title('covariation in Extract 2', 'FontSize', 20) +% saveas(gcf, [projdir '/simdata_' tstamptouse '/3dfig_ext2_' num2str(plotID) '_' tstamptouse]); +% end + +% marray = mcmc_get_walkers({'20180311_224651'}, {10}, projdir); +% marray_cut = mcmc_cut(marray, (1:10), flipud((mai.paramRanges)')); +% if size(marray_cut, 2) < ri.nW +% error('too few initial points'); +% elseif size(marray_cut, 2) > ri.nW +% marray_cut = marray_cut(:,1:ri.nW, :); +% end diff --git a/mcmc_simbio/projects/proj_protein_constgfp3ii.m b/mcmc_simbio/projects/proj_protein_constgfp3ii.m new file mode 100644 index 0000000..2af6ff2 --- /dev/null +++ b/mcmc_simbio/projects/proj_protein_constgfp3ii.m @@ -0,0 +1,150 @@ +%% MCMC toolbox demo - proj_protein_constgfp3i.m +% +% const gfp 3, artificial data, separate, 2 extracts. Check if the +% CSPs line up exactly. +% +% Vipul Singhal, +% California Institute of Technology +% 2018 + +%% initialize the directory where things are stored. +% close all +% clear all +% clc +[tstamp, projdir, st] = project_init; + +%% We first define the model, mcmc_info struct, and the data_info +% struct. + +mobj = model_protein3; + +mcmc_info = mcmc_info_constgfp3ii(mobj); + +mi = mcmc_info.model_info; + + +rkfdG = 5; % nM-1s-1 +rkrdG = 300; % s-1 +rkcp1 = 0.012; %s-1 +rkcp2 = 0.024; %s-1 +cpol1 = 100; % nM +cpol2 = 200; % nM + + +masterVector = log([... 
+rkfdG +rkrdG +rkcp1 +rkcp2 +cpol1 +cpol2]); + +% supply parameter vectors to this function to generate simulated +% data. +di = data_artificial_v2({mobj}, {0:180:7200}, {mi.measuredSpecies},... + {mi.dosedNames}, {mi.dosedVals}, {mi.namesUnord},... + {exp(masterVector([1:2 3 5])), exp(masterVector([1:2 4 6]))}); + +da_extract1 = di(1).dataArray; +da_extract2 = di(2).dataArray; +tv = di(1).timeVector; + +mcmc_trajectories([], di, [], [], [], [], 'just_data_info', true); + +% Run the MCMC +ri = mcmc_info.runsim_info; +mai = mcmc_info.master_info; +%% +mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... + 'InitialDistribution', 'LHS'); + +%% plot stuff +tstamptouse = tstamp; +marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); +mcmc_plot(marray([1 2 4], :,:), mai.estNames([1 2 4]),... + 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse,... + 'extrafignamestring', '_extract1'); +figure +mcmc_plot(marray([1 3 5], :,:), mai.estNames([1 3 5]),... + 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse,... + 'extrafignamestring', '_extract2'); +titls = {'E1 dG 10';'E1 dG 30';'E1 dG 60';}; +lgds = {}; +mvarray = masterVecArray(marray, mai); +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 1),:,:); +fhandle = mcmc_trajectories(mi(1).emo, di(1), mi(1), marrayOrd,... + titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... + '_extract1'); +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 2),:,:); +titls = {'E2 dG 10';'E2 dG 30';'E2 dG 60';}; +fhandle = mcmc_trajectories(mi(1).emo, di(2), mi(1), marrayOrd,... + titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... 
+ '_extract2'); + +mstacked = marray(:,:)'; + + + +pToPlot = [ 2 4 1 ;]; +% CSP on the vertical axis to conform to schematics in presentations. + +labellist = mai.estNames; +for plotID = 1:size(pToPlot, 1) + figure + XX = mstacked(1:end, [pToPlot(plotID,1)]); + YY = mstacked(1:end, [pToPlot(plotID,2)]); + ZZ = mstacked(1:end, [pToPlot(plotID,3)]); + scatter3(XX,YY,ZZ) + xlabel(labellist{pToPlot(plotID,1)}, 'FontSize', 20) + ylabel(labellist{pToPlot(plotID,2)}, 'FontSize', 20) + zlabel(labellist{pToPlot(plotID,3)}, 'FontSize', 20) + title('covariation in Extract 1', 'FontSize', 20) + saveas(gcf, [projdir '/simdata_' tstamptouse '/3dfig_ext1_'... + num2str(plotID) '_' tstamptouse]); +end +% +pToPlot = [3 5 1]; +for plotID = 1:size(pToPlot, 1) + figure + XX = mstacked(1:end, [pToPlot(plotID,1)]); + YY = mstacked(1:end, [pToPlot(plotID,2)]); + ZZ = mstacked(1:end, [pToPlot(plotID,3)]); + scatter3(XX,YY,ZZ) + xlabel(labellist{pToPlot(plotID,1)}, 'FontSize', 20) + ylabel(labellist{pToPlot(plotID,2)}, 'FontSize', 20) + zlabel(labellist{pToPlot(plotID,3)}, 'FontSize', 20) + title('covariation in Extract 2', 'FontSize', 20) + saveas(gcf, [projdir '/simdata_' tstamptouse '/3dfig_ext2_'... + num2str(plotID) '_' tstamptouse]); +end + + + + + + + + +titls = {'dna 1'; 'dna 2';'dna 5'}; +lgds = {}; +mvarray = masterVecArray(marray, mai); +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 1),:,:); +fhandle = mcmc_trajectories(mi(1).emo, di(1), mi(1), marrayOrd,... + titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... + '_extract1'); +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 2),:,:); +fhandle = mcmc_trajectories(mi(1).emo, di(2), mi(1), marrayOrd,... + titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... 
+ '_extract2'); + +%% \ No newline at end of file diff --git a/mcmc_simbio/projects/proj_protein_constgfp3ii_function.m b/mcmc_simbio/projects/proj_protein_constgfp3ii_function.m new file mode 100644 index 0000000..9db43ac --- /dev/null +++ b/mcmc_simbio/projects/proj_protein_constgfp3ii_function.m @@ -0,0 +1,132 @@ +function [mi,mai, ri, tstamp, projdir, di] = proj_protein_constgfp3ii_linux(varargin) + +%% MCMC toolbox demo - proj_protein_constgfp3i.m +% +% const gfp 3, artificial data, separate, 2 extracts. Check if the CSPs line up exactly. +% +% Vipul Singhal, +% California Institute of Technology +% 2018 + +%% initialize the directory where things are stored. +% close all +% clear all +% clc +p = inputParser; +p.addOptional('prevtstamp', []); +p.addParameter('stepSize', []); +p.addParameter('nW', []); +p.addParameter('nPoints', []); +p.addParameter('thinning', []); +p.addParameter('nIter', []); +p.addParameter('parallel', []); +p.addParameter('stdev', []); + +p.addParameter('multiplier', 1); +p.parse(varargin{:}); +p = p.Results; + +[tstamp, projdir, st] = project_init; + +%% We first define the model, mcmc_info struct, and the data_info struct. + +mobj = model_protein3; + +mcmc_info = mcmc_info_constgfp3ii(mobj); + +mi = mcmc_info.model_info; + + +rkfdG = 5; % nM-1s-1 +rkrdG = 300; % s-1 +rkcp1 = 0.012; %s-1 +rkcp2 = 0.024; %s-1 +cpol1 = 100; % nM +cpol2 = 200; % nM + + +masterVector = log([... +rkfdG +rkrdG +rkcp1 +rkcp2 +cpol1 +cpol2]); + +% supply parameter vectors to this function to generate simulated data. +di = data_artificial_v2({mobj}, {0:180:7200}, {mi.measuredSpecies}, ... + {mi.dosedNames}, {mi.dosedVals}, {mi.namesUnord},... 
+ {exp(masterVector([1:2 3 5])), exp(masterVector([1:2 4 6]))}); + +% Run the MCMC +if ~isempty(p.stepSize) + mcmc_info.runsim_info.stepSize = p.stepSize; +end + +if ~isempty(p.nW) + mcmc_info.runsim_info.nW = p.nW; +end + +if ~isempty(p.nPoints) + mcmc_info.runsim_info.nPoints = p.nPoints; +end + +if ~isempty(p.thinning) + mcmc_info.runsim_info.thinning = p.thinning; +end + +if ~isempty(p.nIter) + mcmc_info.runsim_info.nIter = p.nIter; +end + +if ~isempty(p.parallel) + mcmc_info.runsim_info.parallel = p.parallel; +end + +if ~isempty(p.stdev) + mcmc_info.runsim_info.stdev = p.stdev; +end + +%% +ri = mcmc_info.runsim_info; +mai = mcmc_info.master_info; + + +if isempty(p.prevtstamp) + mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... + 'InitialDistribution', 'LHS', 'multiplier', p.multiplier); +else + + specificprojdir = [projdir '/simdata_' p.prevtstamp]; + + % load mcmc_info and the updated model_info + SS = load([specificprojdir '/full_variable_set_' p.prevtstamp], 'mcmc_info'); + + marray = mcmc_get_walkers({p.prevtstamp}, {SS.mcmc_info.runsim_info.nIter},... + projdir); + % assume the projdir where this data is stored is the same one as the + % one created at the start of this file + + + pID = 1:length(mai.estNames); + marray_cut = mcmc_cut(marray, pID, flipud((mai.paramRanges)')); + if size(marray_cut, 2) < ri.nW + error('too few initial points'); + elseif size(marray_cut, 2) > ri.nW + marray_cut = marray_cut(:,1:ri.nW, :); + end + mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... + 'UserInitialize', marray_cut(:,:,end), 'multiplier', p.multiplier); +end + + % keep this commented unless using to copy paste into the linux server + % window +% [mi,mai, ri, tstamp, projdir, di] = proj_protein_constgfp3ii_linux(... +% 'prevtstamp', '20180402_160013',... +% 'stepSize', 1.01, 'nW', 400, 'nPoints', 2e4, 'thinning', 30,... 
+% 'nIter', 80, 'parallel', true, 'multiplier', 2, 'stdev', 5); +end + + + + diff --git a/mcmc_simbio/projects/proj_protein_constgfp5i.m b/mcmc_simbio/projects/proj_protein_constgfp5i.m new file mode 100644 index 0000000..6b87cf3 --- /dev/null +++ b/mcmc_simbio/projects/proj_protein_constgfp5i.m @@ -0,0 +1,109 @@ +%% MCMC toolbox demo - proj_protein_constgfp3iv.m +% +% const gfp 3, +% artificial data, +% separate, 2 extracts. +% Check if the CSPs line up exactly. +% kf fixed. +% +% This file was modeled after the file proj_protein_constgfp3i.m. +% +% Vipul Singhal, +% California Institute of Technology +% 2018 + +%% initialize the directory where things are stored. + close all +% clear all +% clc +[tstamp1, projdir, st] = project_init; + +%% We first define the model, mcmc_info struct, and the data_info struct. +% for the two extracts +mobj = model_protein5; +mcmc_info = mcmc_info_constgfp5ii(mobj); +mi = mcmc_info.model_info; + +cpol = 100; % nM +cribo = 50; %nM + +rkfdG = 10; % nM-1s-1 +rkrdG = 600; % s-1 +rkcm = 0.001; %s-1 + +rkfpG = 10; % nM-1s-1 +rkrpG = 300; % s-1 +rkcp = 1/36; + +rdel_m = log(2)/720; % 12 min half life of mrna + + masterVector = log([... + rkfdG;rkrdG;rkfpG;rkrpG;rkcm;rkcp;rdel_m;cpol;cribo]); + +di = data_artificial_v2({mobj}, {0:180:7200}, {mi.measuredSpecies}, ... + {mi.dosedNames}, {mi.dosedVals}, {mi.namesUnord},... + {exp(masterVector), [exp(masterVector(1:end-5)); 2*exp(masterVector((end-4):end))]}); + +da_extract1 = di(1).dataArray; +da_extract2 = di(2).dataArray; +tv = di(1).timeVector; + +mcmc_trajectories([], di, [], [], [], [], 'just_data_info', true); + + +%% +% Run the MCMC +ri = mcmc_info.runsim_info; +mai = mcmc_info.master_info; + +%% +mi1 = mcmc_runsim_v2(tstamp1, projdir, di(1), mcmc_info,... + 'InitialDistribution', 'LHS'); + +%% plot stuff +tstamptouse = tstamp1; +marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); +mcmc_plot(marray, mai.estNames, 'savematlabfig', true, 'savejpeg', true,... 
+ 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract1'); +% mcmc_plot(marray, mi1.namesUnord,'ks', true, 'scatter', false); +% mcmc_plot(marray, mi1.namesUnord,'transparency', 0.05); +titls = {'E1 dG 10';'E1 dG 30';'E1 dG 60'}; +lgds = {}; +mvarray1 = masterVecArray(marray, mai); + +% paramMaps accesses the full mastervec (as opposed to just the estimated values) +% to give the full vector of (unordered) values for a model. +marrayOrd = mvarray1(mi1(1).paramMaps(mi1(1).orderingIx, 1),:,:); + +fhandle = mcmc_trajectories(mi1(1).emo, di(1), mi1(1), marrayOrd, titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract1'); + +% %% 3D plot +% % the 3d saving here is trickier: 11 Choose 3 is 165. So for 2 extracts, there are 330 plots. +% % this is too many. Not going to plot any at the moment. + +%% Estimate the parameters for the second extract. +[tstamp2, projdir, st] = project_init; +mi2 = mcmc_runsim_v2(tstamp2, projdir, di(2), mcmc_info,... + 'InitialDistribution', 'LHS'); + +%% plot stuff +tstamptouse = tstamp2; +marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); +mcmc_plot(marray, mai.estNames, 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract1'); +% mcmc_plot(marray, mi2.namesUnord,'ks', true, 'scatter', false); +% mcmc_plot(marray, mi2.namesUnord,'transparency', 0.05); +titls = {'E2 dG 10';'E2 dG 30';'E2 dG 60'}; +lgds = {}; +mvarray2 = masterVecArray(marray, mai); + +% paramMaps accesses the full mastervec (as opposed to just the estimated values) +% to give the full vector of (unordered) values for a model. +marrayOrd = mvarray2(mi2(1).paramMaps(mi2(1).orderingIx, 1),:,:); + +fhandle = mcmc_trajectories(mi2(1).emo, di(2), mi2(1), marrayOrd, titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... 
+ 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract2'); + diff --git a/mcmc_simbio/projects/proj_protein_constgfp5ii.m b/mcmc_simbio/projects/proj_protein_constgfp5ii.m new file mode 100644 index 0000000..c73d422 --- /dev/null +++ b/mcmc_simbio/projects/proj_protein_constgfp5ii.m @@ -0,0 +1,98 @@ +%% MCMC toolbox demo - proj_protein_constgfp3i.m +% +% const gfp 3, artificial data, separate, 2 extracts. Check if the CSPs line up exactly. +% +% Vipul Singhal, +% California Institute of Technology +% 2018 + +%% initialize the directory where things are stored. +% close all +% clear all +% clc +[tstamp, projdir, st] = project_init; + +%% We first define the model, mcmc_info struct, and the data_info struct. + +mobj = model_protein5; + +mcmc_info = mcmc_info_constgfp5ii(mobj); + +mi = mcmc_info.model_info; + + + +rkfdG = 1; % nM-1s-1 +rkrdG = 60; % s-1 + +rkfpG = 2; % nM-1s-1 +rkrpG = 60; % s-1 + +cpol1 = 100; % nM +cribo1 = 50; %nM +rkcm1 = 0.001; %s-1 +rkcp1 = 1/36; +rdel_m1 = log(2)/720; % 12 min half life of mrna +cpol2 = cpol1*2; +cribo2 = cribo1*2; +rkcm2 = rkcm1*2; +rkcp2 = rkcp1*2; +rdel_m2 = rdel_m1*2; + +masterVector = log([... + rkfdG;rkrdG;rkfpG;rkrpG;rkcm1;rkcp1;rdel_m1;cpol1;cribo1;rkcm2;... + rkcp2;rdel_m2;cpol2;cribo2]); + +di = data_artificial_v2({mobj}, {0:180:7200}, {mi.measuredSpecies}, ... + {mi.dosedNames}, {mi.dosedVals}, {mi.namesUnord},... 
+ {exp(masterVector([1:4 5:9])), exp(masterVector([1:4 10:14]))}); + +da_extract1 = di(1).dataArray; +da_extract2 = di(2).dataArray; +tv = di(1).timeVector; +mcmc_trajectories([], di, [], [], [], [], 'just_data_info', true); +% Run the MCMC +ri = mcmc_info.runsim_info; +mai = mcmc_info.master_info; + +marray = mcmc_get_walkers({'20180322_155221'}, {20}, projdir); +pID = 1:length(mai.estNames); +marray_cut = mcmc_cut(marray, pID, flipud((mai.paramRanges)')); +if size(marray_cut, 2) < ri.nW + error('too few initial points'); +elseif size(marray_cut, 2) > ri.nW + marray_cut = marray_cut(:,1:ri.nW, :); +end +%% +mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... + 'UserInitialize', marray_cut(:,:,end), 'multiplier', 2); +% 'UserInitialize', marray_cut(:,:,end) +% 'InitialDistribution', 'LHS' + +%% plot stuff +tstamptouse = tstamp; +close all +marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); + +mcmc_plot(marray([1:2 3:7], :,1:6:end), mai.estNames([1:2 3:7]),... + 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract1'); +figure +mcmc_plot(marray([1:2 8:12], :,1:6:end), mai.estNames([1:2 8:12]),... + 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract2'); + +titls = {'E1 dG 10';'E1 dG 30';'E1 dG 60'}; + +lgds = {}; +mvarray = masterVecArray(marray, mai); +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 1),:,:); +fhandle = mcmc_trajectories(mi(1).emo, di(1), mi(1), marrayOrd, titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract1'); +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 2),:,:); +titls = {'E2 dG 10';'E2 dG 30';'E2 dG 60'}; +fhandle = mcmc_trajectories(mi(1).emo, di(2), mi(1), marrayOrd, titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... 
+ 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract2'); + diff --git a/mcmc_simbio/projects/proj_tetR1i.m b/mcmc_simbio/projects/proj_tetR1i.m new file mode 100644 index 0000000..b8bdf25 --- /dev/null +++ b/mcmc_simbio/projects/proj_tetR1i.m @@ -0,0 +1,170 @@ +%% MCMC toolbox demo - proj_protein_constgfp3iv.m +% +% const gfp 3, +% artificial data, +% separate, 2 extracts. +% Check if the CSPs line up exactly. +% kf fixed. +% +% This file was modeled after the file proj_protein_constgfp3i.m. +% +% Vipul Singhal, +% California Institute of Technology +% 2018 + +%% initialize the directory where things are stored. + close all +% clear all +% clc +[tstamp1, projdir, st] = project_init; + +%% We first define the model, mcmc_info struct, and the data_info struct. +% for the two extracts +mobj = model_tetR_repression1; +mcmc_info = mcmc_info_tetR_1i(mobj); +mi = mcmc_info.model_info; + +cpol = 100; % nM +rkfdG = 5; % nM-1s-1 +rkrdG = 300; % s-1 +rkfdT = 5; +rkrdT = 300; +rkcp = 0.012; %s-1 +rkfdimTet = 20; % nM-1s-1 +rkrdimTet = 10; % s-1 +rkfseqTet = 20; % nM-1s-1 +rkrseqTet = 10; % s-1 + +masterVector = log([... +rkfdG +rkrdG +rkfdT +rkrdT +rkfdimTet +rkrdimTet +rkfseqTet +rkrseqTet +rkcp +cpol]); + +di = data_artificial_v2({mobj}, {0:180:7200}, {mi.measuredSpecies}, ... + {mi.dosedNames}, {mi.dosedVals}, {mi.namesUnord},... + {exp(masterVector), [exp(masterVector(1:end-2)); 0.024; 200]}); + +da_extract1 = di(1).dataArray; +da_extract2 = di(2).dataArray; +tv = di(1).timeVector; + +mcmc_trajectories([], di, [], [], [], [], 'just_data_info', true); + + +%% +% Run the MCMC +ri = mcmc_info.runsim_info; +mai = mcmc_info.master_info; + +%% +mi1 = mcmc_runsim_v2(tstamp1, projdir, di(1), mcmc_info,... + 'InitialDistribution', 'LHS'); + +%% plot stuff +figure +tstamptouse = tstamp1; +marray1 = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); +mcmc_plot(marray1, mai.estNames, 'savematlabfig', true, 'savejpeg', true,... 
+ 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract1'); +% mcmc_plot(marray, mi1.namesUnord,'ks', true, 'scatter', false); +% mcmc_plot(marray, mi1.namesUnord,'transparency', 0.05); +titls = {'E1 dT 0.5 dG 10';'E1 dT 0.5 dG 30';'E1 dT 0.5 dG 60'; +'E1 dT 2 dG 10';'E1 dT 2 dG 30';'E1 dT 2 dG 60'; +'E1 dT 8 dG 10';'E1 dT 8 dG 30';'E1 dT 8 dG 60';}; +lgds = {}; +mvarray1 = masterVecArray(marray1, mai); + +% paramMaps accesses the full mastervec (as opposed to just the estimated values) +% to give the full vector of (unordered) values for a model. +marrayOrd = mvarray1(mi1(1).paramMaps(mi1(1).orderingIx, 1),:,:); + +fhandle = mcmc_trajectories(mi1(1).emo, di(1), mi1(1), marrayOrd, titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract1'); + +%% 3D plot +% the 3d saving here is trickier: 6 Choose 3 is 20. So for 2 extracts, there are 40 plots. +% this is too many. I think I am going to just plot some small subset of the 3-wise plots. +% in particular, I am interested in the covariation of the two ESPs wrt the CSPs! I do not really +% care about the covariation within the CSPs. Cool realization! +pToPlot = [5 6 1; 5 6 2; 5 6 3; 5 6 4;]; +% I am also mildly curious about the covatiation of the tetR repression system parameters +% All of these are CSPs. 
+pToPlot = [pToPlot; 2 3 4]; + +% labellist = {'kf' 'kr' 'kc1' 'Pol1' 'kc2' 'Pol2' }; +labellist = mai.estNames; +for plotID = 1:size(pToPlot, 1) + mstacked = marray1(:,:)'; + figure + XX = mstacked(1:2:end, [pToPlot(plotID,1)]); + YY = mstacked(1:2:end, [pToPlot(plotID,2)]); + ZZ = mstacked(1:2:end, [pToPlot(plotID,3)]); + scatter3(XX,YY,ZZ) + xlabel(labellist{pToPlot(plotID,1)}, 'FontSize', 20) + ylabel(labellist{pToPlot(plotID,2)}, 'FontSize', 20) + zlabel(labellist{pToPlot(plotID,3)}, 'FontSize', 20) + title('covariation in Extract 1', 'FontSize', 20) + saveas(gcf, [projdir '/simdata_' tstamp1 '/3dfig_ext1_' num2str(plotID) '_' tstamp1]); +end + + +%% Estimate the parameters for the second extract. +[tstamp2, projdir, st] = project_init; +mi2 = mcmc_runsim_v2(tstamp2, projdir, di(2), mcmc_info,... + 'InitialDistribution', 'LHS'); + +%% plot stuff +tstamptouse = tstamp2; +marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); +marray2 = marray; +mcmc_plot(marray2, mai.estNames, 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract2'); +% mcmc_plot(marray, mi2.namesUnord,'ks', true, 'scatter', false); +% mcmc_plot(marray, mi2.namesUnord,'transparency', 0.05); +titls = {'E2 dT 0.5 dG 10';'E2 dT 0.5 dG 30';'E2 dT 0.5 dG 60'; +'E2 dT 2 dG 10';'E2 dT 2 dG 30';'E2 dT 2 dG 60'; +'E2 dT 8 dG 10';'E2 dT 8 dG 30';'E2 dT 8 dG 60';}; +lgds = {}; +mvarray2 = masterVecArray(marray2, mai); + +% paramMaps accesses the full mastervec (as opposed to just the estimated values) +% to give the full vector of (unordered) values for a model. +marrayOrd = mvarray2(mi2(1).paramMaps(mi2(1).orderingIx, 1),:,:); + +fhandle = mcmc_trajectories(mi2(1).emo, di(2), mi2(1), marrayOrd, titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract2'); +%% +% the 3d saving here is trickier: 6 Choose 3 is 20. 
So for 2 extracts, there are 40 plots. +% this is too many. I think I am going to just plot some small subset of the 3-wise plots. +% in particular, I am interested in the covariation of the two ESPs wrt the CSPs! I do not really +% care about the covariation within the CSPs. Cool realization! +pToPlot = [5 6 1; 5 6 2; 5 6 3; 5 6 4;]; + +% I am also mildly curious about the covariation of the tetR repression system parameters +% All of these are CSPs. +pToPlot = [pToPlot; 2 3 4]; + +% labellist = {'kf' 'kr' 'kc1' 'Pol1' 'kc2' 'Pol2' }; +labellist = mai.estNames; +for plotID = 1:size(pToPlot, 1) + mstacked = marray2(:,:)'; + figure + XX = mstacked(1:2:end, [pToPlot(plotID,1)]); + YY = mstacked(1:2:end, [pToPlot(plotID,2)]); + ZZ = mstacked(1:2:end, [pToPlot(plotID,3)]); + scatter3(XX,YY,ZZ) + xlabel(labellist{pToPlot(plotID,1)}, 'FontSize', 20) + ylabel(labellist{pToPlot(plotID,2)}, 'FontSize', 20) + zlabel(labellist{pToPlot(plotID,3)}, 'FontSize', 20) + title('covariation in Extract 2', 'FontSize', 20) + saveas(gcf, [projdir '/simdata_' tstamp2 '/3dfig_ext2_' num2str(plotID) '_' tstamp2]); +end \ No newline at end of file diff --git a/mcmc_simbio/projects/proj_tetR1ii.m b/mcmc_simbio/projects/proj_tetR1ii.m new file mode 100644 index 0000000..79bf06b --- /dev/null +++ b/mcmc_simbio/projects/proj_tetR1ii.m @@ -0,0 +1,127 @@ +%% MCMC toolbox demo - proj_protein_constgfp3i.m +% +% const gfp 3, artificial data, separate, 2 extracts. Check if the CSPs line up exactly. +% +% Vipul Singhal, +% California Institute of Technology +% 2018 + +%% initialize the directory where things are stored. +% close all +% clear all +% clc +[tstamp, projdir, st] = project_init; + +%% We first define the model, mcmc_info struct, and the data_info struct. 
+ +mobj = model_tetR_repression1; + +mcmc_info = mcmc_info_tetR_1ii(mobj); + +mi = mcmc_info.model_info; + +rkfdG = 5; % nM-1s-1 +rkrdG = 300; % s-1 +rkfdT = 5; +rkrdT = 300; +rkfdimTet = 20; % nM-1s-1 +rkrdimTet = 10; % s-1 +rkfseqTet = 20; % nM-1s-1 +rkrseqTet = 10; % s-1 +rkcp1 = 0.012; %s-1 +rkcp2 = 0.024; %s-1 +cpol1 = 100; % nM +cpol2 = 200; % nM +activeNames = ... + {'kfdG'; 'krdG'; 'kfdT'; 'krdT'; 'kfdimTet'; 'krdimTet'; 'kfseqTet';... + 'krseqTet'; 'kcp'; 'pol'}; + +masterVector = log([... +rkfdG ;rkrdG;rkfdT;rkrdT;rkfdimTet;rkrdimTet;rkfseqTet;rkrseqTet;rkcp1;rkcp2;cpol1;cpol2]); + +di = data_artificial_v2({mobj}, {0:180:7200}, {mi.measuredSpecies}, ... + {mi.dosedNames}, {mi.dosedVals}, {mi.namesUnord},... + {exp(masterVector([1:9 11])), exp(masterVector([1:8 10 12]))}); + +da_extract1 = di(1).dataArray; +da_extract2 = di(2).dataArray; +tv = di(1).timeVector; +mcmc_trajectories([], di, [], [], [], [], 'just_data_info', true); +% Run the MCMC +ri = mcmc_info.runsim_info; +mai = mcmc_info.master_info; +mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... + 'InitialDistribution', 'LHS'); + +%% plot stuff +close all +tstamptouse = tstamp; +marray = mcmc_get_walkers({tstamptouse}, {8:ri.nIter}, projdir); +mcmc_plot(marray([1:4 5 7], :,:), mai.estNames([1:4 5 7]), 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract1'); +figure +mcmc_plot(marray([1:4 6 8], :,:), mai.estNames([1:4 6 8]), 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract2'); +titls = {'E1 dT 0.5 dG 10';'E1 dT 0.5 dG 30';'E1 dT 0.5 dG 60'; +'E1 dT 2 dG 10';'E1 dT 2 dG 30';'E1 dT 2 dG 60'; +'E1 dT 8 dG 10';'E1 dT 8 dG 30';'E1 dT 8 dG 60';}; +lgds = {}; +mvarray = masterVecArray(marray, mai); +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 1),:,:); +fhandle = mcmc_trajectories(mi(1).emo, di(1), mi(1), marrayOrd, titls, lgds,... 
+ 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract1'); +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 2),:,:); +titls = {'E2 dT 0.5 dG 10';'E2 dT 0.5 dG 30';'E2 dT 0.5 dG 60'; +'E2 dT 2 dG 10';'E2 dT 2 dG 30';'E2 dT 2 dG 60'; +'E2 dT 8 dG 10';'E2 dT 8 dG 30';'E2 dT 8 dG 60';}; +fhandle = mcmc_trajectories(mi(1).emo, di(2), mi(1), marrayOrd, titls, lgds,... + 'SimMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring', '_extract2'); + +%% 3D plot +% the 3d saving here is trickier: 6 Choose 3 is 20. So for 2 extracts, there are 40 plots. +% this is too many. I think I am going to just plot some small subset of the 3-wise plots. +% in particular, I am interested in the covariation of the two ESPs wrt the CSPs! I do not really +% care about the covariation within the CSPs. Cool realization! + +% mstacked = mvarray(:,:)'; % <---- THIS IS WRONG! use: +mstacked = marray(:,:)'; + + +pToPlot = [ 5 7 1 ; 5 7 2 ; 5 7 3 ; 5 7 4 ;]; +% CSP on the vertical axis to conform to schematics in presentations. + +% I am also mildly curious about the covatiation of the tetR repression system parameters +% All of these are CSPs. 
+pToPlot = [pToPlot; 2 3 4]; +labellist = mai.estNames; +for plotID = 1:size(pToPlot, 1) + figure + XX = mstacked(1:end, [pToPlot(plotID,1)]); + YY = mstacked(1:end, [pToPlot(plotID,2)]); + ZZ = mstacked(1:end, [pToPlot(plotID,3)]); + scatter3(XX,YY,ZZ) + xlabel(labellist{pToPlot(plotID,1)}, 'FontSize', 20) + ylabel(labellist{pToPlot(plotID,2)}, 'FontSize', 20) + zlabel(labellist{pToPlot(plotID,3)}, 'FontSize', 20) + title('covariation in Extract 1', 'FontSize', 20) + saveas(gcf, [projdir '/simdata_' tstamptouse '/3dfig_ext1_' num2str(plotID) '_' tstamptouse]); +end +% +pToPlot = [6 8 1; 6 8 2 ; 6 8 3; 6 8 4;]; +for plotID = 1:size(pToPlot, 1) + figure + XX = mstacked(1:end, [pToPlot(plotID,1)]); + YY = mstacked(1:end, [pToPlot(plotID,2)]); + ZZ = mstacked(1:end, [pToPlot(plotID,3)]); + scatter3(XX,YY,ZZ) + xlabel(labellist{pToPlot(plotID,1)}, 'FontSize', 20) + ylabel(labellist{pToPlot(plotID,2)}, 'FontSize', 20) + zlabel(labellist{pToPlot(plotID,3)}, 'FontSize', 20) + title('covariation in Extract 2', 'FontSize', 20) + saveas(gcf, [projdir '/simdata_' tstamptouse '/3dfig_ext2_' num2str(plotID) '_' tstamptouse]); +end + + + diff --git a/mcmc_simbio/projects/proj_tierra2018_calibration.m b/mcmc_simbio/projects/proj_tierra2018_calibration.m new file mode 100644 index 0000000..42ee225 --- /dev/null +++ b/mcmc_simbio/projects/proj_tierra2018_calibration.m @@ -0,0 +1,129 @@ +% proj_tierra2018_calibration +% The calibration step for the Tierra Biosciences dataset. +% +% Script for the correction of aTc data from Tierra biosciences can be +% found in the file proj_tierra2018_correction. +% +% Overview: this demo uses the following files: +% +% +% - mcmc_info_tierra2018_calib +% Constructs the mcmc_info struct. +% +% - model_protein3.m +% Constructs a constitutive expression model. 
+% + +close all +[tstamp, projdir, st] = project_init; + +mobj = model_protein3; + +mcmc_info = mcmc_info_tierra2018_calib(mobj); + +mi = mcmc_info.model_info; + +%% Get experimental data +% di = data_info_tierra2018; +close all +di = tierradataset; +di(3).timeVector = di(3).timeVector(1:81); +di(4).timeVector = di(4).timeVector(1:81); +di(3).dataArray = di(3).dataArray(1:81, :,:,:)/10; +di(4).dataArray = di(4).dataArray(1:81, :,:,:)/10; + +mcmc_trajectories([], di(3:4), [], [], [], [], 'just_data_info', true); + +% Manually find a set of parameters that get you in the approximate realm of +% the experimental data (Otherwise a lot more computation is needed.. +% completely doable, but takes some time / cluster access, which I do not +% currently have). + +% manually pick parameter values +rkfdG = 0.5; % nM-1s-1 +rkrdG = 30; % s-1 +rkcp1 = 0.12; %s-1 +rkcp2 = 0.24; %s-1 +cpol1 = 4; % nM +cpol2 = 1.5; % nM + +masterVector = log([... +rkfdG +rkrdG +rkcp1 +rkcp2 +cpol1 +cpol2]); + +% simulate the data +di_artificial = data_artificial_v2({mobj}, {di(3).timeVector}, {mi.measuredSpecies},... + {mi.dosedNames}, {mi.dosedVals}, {mi.namesUnord},... + {exp(masterVector([1:2 3 5])), exp(masterVector([1:2 4 6]))}); + +da_extract1 = di_artificial(1).dataArray; +da_extract2 = di_artificial(2).dataArray; +tv = di_artificial(1).timeVector; + +% Plot the data against experimental data from the previous step. + +mcmc_trajectories([], di_artificial, [], [], [], [], 'just_data_info', true); + +%% +ri = mcmc_info.runsim_info; + +mai = mcmc_info.master_info; + +%% +% '20181105_224005'}, {1:8} +% 20181106_010736 +% latest sim: 20181106_101841, 4iters, +% + +% 20181106_141312 % 5iter +%20181106_145414 %12iter +marraytemp = mcmc_get_walkers({'20181106_145414'}, {1:12}, projdir); +% next iter NOT run yet, nov 5, 11 am. +msz=size(marraytemp); %5 46 440 + +initialization_matrix = [marraytemp(:,:,end)]; +clear marraytemp + +tic +%% +mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... 
+ 'pausemode', true,... + 'multiplier', 4, 'UserInitialize', initialization_matrix); +%'UserInitialize', initialization_matrix +% 'InitialDistribution', 'LHS' + +toc +%% Save Stuff + +tstamptouse = tstamp; +marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); + +% plot parameter distribution corner plot, and Markov chains. +mcmc_plot(marray, mai.estNames,... + 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse,... + 'extrafignamestring', '_tierra_calib_t1'); + +%% + +titls = {'dG 1';'dG 2';'dG 4';'dG 8'}; +lgds = {}; +mvarray = masterVecArray(marray, mai); +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 1),:,:); +fhandle = mcmc_trajectories(mi(1).emo, di(3), mi(1), marrayOrd,... + titls, lgds,... + 'SimMode', 'meanstd', 'ExpMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... + '_tierra_calib_t1'); + +%% +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 2),:,:); +fhandle = mcmc_trajectories(mi(1).emo, di(4), mi(1), marrayOrd,... + titls, lgds,... + 'SimMode', 'meanstd','ExpMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... + '_tierra_calib_t1'); \ No newline at end of file diff --git a/mcmc_simbio/projects/proj_tierra2018_calibration_B.m b/mcmc_simbio/projects/proj_tierra2018_calibration_B.m new file mode 100644 index 0000000..c93e968 --- /dev/null +++ b/mcmc_simbio/projects/proj_tierra2018_calibration_B.m @@ -0,0 +1,150 @@ +% proj_tierra2018_calibration_B +% The calibration step for the Tierra Biosciences dataset. +% Second data set that Abel sent me on Nov 6 2018. +% Here we first try to correct the tetR Repression data +% using the calibration from the pTet constitutive expression data. +% +% Script for the correction of tetR data from Tierra Biosciences can be +% found in the file proj_tierra2018_correction_B. 
+% +% Overview: this demo uses the following files: +% +% +% - mcmc_info_tierra2018_calib +% Constructs the mcmc_info struct. +% +% - model_protein3.m +% Constructs a constitutive expression model. +% + +close all +[tstamp, projdir, st] = project_init; + +mobj = model_protein3; + +mcmc_info = mcmc_info_tierra2018_calib_B(mobj); + +mi = mcmc_info.model_info; + +% Get experimental data +% di = data_info_tierra2018; +close all +di = tierradataset('dataset11062018'); +% load the constitutive expression data +di(3).timeVector = di(3).timeVector(1:41); +di(4).timeVector = di(4).timeVector(1:41); +di(3).dataArray = di(3).dataArray(1:41, :,:,:)/10; +di(4).dataArray = di(4).dataArray(1:41, :,:,:)/10; + +mcmc_trajectories([], di(3:4), [], [], [], [], 'just_data_info', true); + +% Manually find a set of parameters that get you in the approximate realm of +% the experimental data (Otherwise a lot more computation is needed.. +% completely doable, but takes some time / cluster access, which I do not +% currently have). +%% +% manually pick parameter values +rkfdG = 0.5; % nM-1s-1 +rkrdG = 30; % s-1 +rkcp1 = 0.12; %s-1 +rkcp2 = 0.24; %s-1 +cpol1 = 4; % nM +cpol2 = 1.5; % nM + +masterVector = log([... +rkfdG +rkrdG +rkcp1 +rkcp2 +cpol1 +cpol2]); + +% simulate the data +di_artificial = data_artificial_v2({mobj}, {di(3).timeVector}, {mi.measuredSpecies},... + {mi.dosedNames}, {mi.dosedVals}, {mi.namesUnord},... + {exp(masterVector([1:2 3 5])), exp(masterVector([1:2 4 6]))}); + +da_extract1 = di_artificial(1).dataArray; +da_extract2 = di_artificial(2).dataArray; +tv = di_artificial(1).timeVector; + +% Plot the data against experimental data from the previous step. 
+ +mcmc_trajectories([], di_artificial, [], [], [], [], 'just_data_info', true); + +%% +ri = mcmc_info.runsim_info; + +mai = mcmc_info.master_info; + +%% + +% 20181110_144647, 12 iter, nW: 200, mai.paramRanges: +% -0.2125 14.7875 +% -7.1203 7.8797 +% -7.1203 7.8797 +% -4.5945 10.4055 +% -4.5945 10.4055 + +% 20181110_160929, 12 iter, nW: 200, mai.paramRanges: +% -5.2125 22.7875 +% -10.1203 7.8797 +% -10.1203 7.8797 +% -7.5945 10.4055 +% -7.5945 10.4055 + +% 20181110_192828, nIter: 6, nW = 200 +% -5.2125 22.7875 +% -10.1203 12.8797 +% -10.1203 12.8797 +% -7.5945 10.4055 +% -7.5945 10.4055 + + % 20181111_191438 nIter = 20 + % 20181111_211323 = 10 +marraytemp = mcmc_get_walkers({'20181111_211323'}, {1:10}, projdir); +msz=size(marraytemp); %5 46 440 +initialization_matrix = marraytemp(:,:,end); +clear marraytemp + +tic +%% + +mi = mcmc_runsim_v2(tstamp, projdir, di, mcmc_info,... + 'pausemode', true,... + 'multiplier', 1.7,'UserInitialize', initialization_matrix); +%'UserInitialize', initialization_matrix +% 'InitialDistribution', 'LHS' +% 'UserInitialize', initialization_matrix + +toc +%% Save Stuff + +tstamptouse = tstamp; +marray = mcmc_get_walkers({tstamptouse}, {1:ri.nIter}, projdir); + +% plot parameter distribution corner plot, and markov chains. +mcmc_plot(marray, mai.estNames,... + 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse,... + 'extrafignamestring', '_tierra_calib_testB'); + +%% + +titls = {'dG 1';'dG 2';'dG 4';'dG 8'}; +lgds = {}; +mvarray = masterVecArray(marray, mai); +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 1),:,:); +fhandle = mcmc_trajectories(mi(1).emo, di(3), mi(1), marrayOrd,... + titls, lgds,... + 'SimMode', 'meanstd', 'ExpMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... 
+ '_tierra_calib_testB'); + +%% +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 2),:,:); +fhandle = mcmc_trajectories(mi(1).emo, di(4), mi(1), marrayOrd,... + titls, lgds,... + 'SimMode', 'meanstd','ExpMode', 'curves', 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse, 'extrafignamestring',... + '_tierra_calib_testB'); \ No newline at end of file diff --git a/mcmc_simbio/projects/proj_tierra2018_correction1.m b/mcmc_simbio/projects/proj_tierra2018_correction1.m new file mode 100644 index 0000000..e829d82 --- /dev/null +++ b/mcmc_simbio/projects/proj_tierra2018_correction1.m @@ -0,0 +1,196 @@ +% proj_tierra2018_correction1 +% The first correction step for the Tierra Biosciences dataset. +% We first do CSP fixing in the calibration data, and generate the Fixed ESPs +% in the calibration datasets for both E2 (for correction step 1) and E1 +% (for correction step 2). +% +% This file performs Correction step 1. +% Here we fix the ESPs in E2. +% +% Overview: this demo uses the following files: +% +% - data_info_constructor_tierra2018.m +% Constructs the data info object. +% +% - mcmc_info_tierra2018_calib +% Constructs the mcmc_info struct. +% +% - model_protein3.m +% Constructs a constitutive expression model. +% +% + +close all +clear all + + +[tstamp, projdir, st] = project_init; + +%% Get the ESP parameters from the calibration data. +% 11.6.18 +% In the calibration data, the krdG is the CSP, and so we fix its value to +% its median value from the simulation with ID 20181106_010736 +% with 40iter and 1.5 stepsize and tightening 1 +% (or the finer version: 20181106_101841, stepsize 1.1, +% 4iter, and tightening 10.) +% +% Actually, use this: 20181105_112220, 10iter. This is because +% the above ones were at very large parameter values, and I think +% integration tolerance systematic errors might be occurring. +% + +calib_projdir = ... + ['/Users/vipulsinghal/Dropbox/Documents/toolbox/txtlsim_vsfork2017/'...
+ 'mcmc_simbio/projects/proj_tierra2018_calibration']; +marraytemp = mcmc_get_walkers({'20181105_112220'}, {1:10}, calib_projdir); + +msz=size(marraytemp) +marraytemp2 = marraytemp(:, :)'; +clear marraytemp +msz=size(marraytemp2) + +% take the median value for the krdg parameter, +% and use the index of that value to pick the point to use. +[sorted_krdg, Ix] = sort(marraytemp2(:,1)); +mediansIx = Ix(ceil(length(Ix)/2)); +median_fullparam_log = marraytemp2(mediansIx, :) +format short g +median_fullparam = exp(median_fullparam_log) + +% Using these, we can pick the ESPs for both the reference and +% candidate extracts: +ref_esp_ix = [2,4]; +can_esp_ix = [3,5]; + +ref_esp = median_fullparam(ref_esp_ix); +can_esp = median_fullparam(can_esp_ix); +csp_cutpoint = median_fullparam(1); +% set the model esp parameters to the can_esp + +%% + +mobj = model_aTc_induc1; + +mcmc_info = mcmc_info_tierra2018_corr(mobj, can_esp, csp_cutpoint); + +mi = mcmc_info.model_info; + +%% Get experimental data +di = tierradataset; +di(5).timeVector = di(5).timeVector(1:121); +di(6).timeVector = di(6).timeVector(1:121); +di(5).dataArray = di(5).dataArray(1:121, :,:,:); +di(6).dataArray = di(6).dataArray(1:121, :,:,:); + +mcmc_trajectories([], di(6), [], [], [], [], 'just_data_info', true); + +% Manually find a set of parameters that get you in the approximate realm of +% the experimental data (Otherwise a lot more computation is needed.. +% completely doable, but takes some time / cluster access, which I do not +% currently have). + +% manually pick parameter values +cpol = can_esp(2); % nM +rkfdG = .5; % nM-1s-1 +rkrdG = csp_cutpoint(1); % s-1 +rkfdT = .2; +rkrdT = 30; +rkcp = can_esp(1); %s-1 +frate = .5; +rrate = 20; +rkfdimTet = frate ; % nM-1s-1 +rkrdimTet = rrate; % s-1 +rkfseqTet = frate ; % nM-1s-1 +rkrseqTet = rrate; % s-1 +% rkfdimaTc = frate ; +% rkrdimaTc = rrate; +rkfseqaTc = frate ; +rkrseqaTc = rrate; + +masterVector = log([... 
+rkfdG +rkrdG +rkfdT +rkrdT +rkfdimTet +rkrdimTet +rkfseqTet +rkrseqTet +rkfseqaTc +rkrseqaTc +rkcp +cpol]); + +% rkfdimaTc +% rkrdimaTc + +% simulate the data +% di_artificial = data_artificial_v2({mobj}, {di(6).timeVector}, ... +% {mi.measuredSpecies},... +% {mi.dosedNames}, {mi.dosedVals}, ... +% {mi.namesUnord},... +% {exp(masterVector), exp(masterVector)}); + + di_artificial = data_artificial_v2(mobj, di(6).timeVector, ... + mi.measuredSpecies,... + mi.dosedNames, mi.dosedVals, ... + mi.namesUnord,... + exp(masterVector)); +da_test_cand = di_artificial(1).dataArray; +tv = di_artificial(1).timeVector; + +% Plot the data against experimental data from the previous step. + +mcmc_trajectories([], di_artificial, [], [], [], [], 'just_data_info', true); +ri = mcmc_info.runsim_info; + +mai = mcmc_info.master_info; +%% +marraytemp = mcmc_get_walkers({'20181107_135851'}, {1:2}, projdir); +% next iter NOT run yet, nov 5, 11 am. +msz=size(marraytemp); %5 46 440 + +initialization_matrix = [marraytemp(:,:,end)]; +clear marraytemp + +mi = mcmc_runsim_v2(tstamp, projdir, di(6), mcmc_info,... + 'pausemode', true,... + 'multiplier', 2,'UserInitialize', initialization_matrix); +% +% 'InitialDistribution', 'LHS' + + +%% +tstamptouse = tstamp;%'20181107_084423'; +projdir_usethis = projdir; +%['/Users/vipulsinghal/Dropbox/Documents/toolbox/txtlsim_vsfork2017/mcmc_simbio/projects/proj_tierra2018_correction1'] + +addpath(projdir_usethis) +%load(['full_variable_set_' tstamptouse]) + +% tstamptouse = tstamp; +marray = mcmc_get_walkers({tstamptouse}, {1:10}, projdir_usethis); + +% plot parameter distribution corner plot, and markov chains. +mcmc_plot(marray, mai.estNames,... + 'savematlabfig', true, 'savejpeg', true,... + 'projdir', projdir_usethis, 'tstamp', tstamptouse,... 
+ 'extrafignamestring', '_tierra_corr1_t1'); + +%% + +titls = {'aTc 10000';'aTc 1000';'aTc 100';'aTc 10';'aTc 1'}; +lgds = {}; +mvarray = masterVecArray(marray, mai); +% +marrayOrd = mvarray(mi(1).paramMaps(mi(1).orderingIx, 1),:,:); +% +fhandle = mcmc_trajectories(mi(1).emo, di(6), mi(1), marrayOrd,... + titls, lgds,... + 'SimMode', 'meanstd', 'ExpMode', 'curves', 'savematlabfig', true,... + 'savejpeg', true,... + 'projdir', projdir, 'tstamp', tstamptouse,... + 'extrafignamestring', '_tierra_corr_t1'); + + +%% \ No newline at end of file diff --git a/mcmc_simbio/projects/proj_tierra2018_correction2.m b/mcmc_simbio/projects/proj_tierra2018_correction2.m new file mode 100644 index 0000000..bf93ad5 --- /dev/null +++ b/mcmc_simbio/projects/proj_tierra2018_correction2.m @@ -0,0 +1,596 @@ +% correction step 2 for the tierra data test 1 +% (direct aTc correction for calibration via constitutive expression +% of the ptet promoter. ) +% +% Vipul Singhal, Caltech, 2018 +% + + +%% get the ESPs from the calibration experiments (this is the same as in the +% first part of the file: +% ['/Users/vipulsinghal/Dropbox/Documents/toolbox/txtlsim_vsfork2017'... +% '/mcmc_simbio/projects/proj_tierra2018_correction1.m'] +calib_projdir = ... + ['/Users/vipulsinghal/Dropbox/Documents/toolbox/txtlsim_vsfork2017/'... + 'mcmc_simbio/projects/proj_tierra2018_calibration']; +marraytemp = mcmc_get_walkers({'20181105_112220'}, {1:10}, calib_projdir); + +msz=size(marraytemp) +marraytemp2 = marraytemp(:, :)'; +clear marraytemp +msz=size(marraytemp2) + +% take the median value for the krdg parameter, +% and use the index of that value to pick the point to use. 
+[sorted_krdg, Ix] = sort(marraytemp2(:,1)); +mediansIx = Ix(ceil(length(Ix)/2)); +median_fullparam_log = marraytemp2(mediansIx, :) +format short g +median_fullparam = exp(median_fullparam_log) + +% Using these, we can pick the ESPs for both the reference and +% candidate extracts: +ref_esp_ix = [2,4]; +can_esp_ix = [3,5]; + +ref_esp = median_fullparam(ref_esp_ix); +can_esp = median_fullparam(can_esp_ix); +csp_cutpoint = median_fullparam(1); + +%% get the CSPs from correction step 1: + +% use the CSPs such that the krdG value is the closest to the +% one fixed in the calibration step. +projdir_corr1 = ['/Users/vipulsinghal/Dropbox/Documents/toolbox/txtlsim_vsfork2017/'... + 'mcmc_simbio/projects/proj_tierra2018_correction1']; +tstamp_corr1 = '20181107_145154'; +% 10 iterations +num_iter = 10; +load([projdir_corr1 '/simdata_' tstamp_corr1... + '/full_variable_set_' tstamp_corr1 '.mat']) +% do this properly, with only the right variables. + +%% correction step 1 runsim info and master info +ri = mcmc_info.runsim_info; +mai = mcmc_info.master_info; + +%% +marray_corr1 = mcmc_get_walkers({tstamp_corr1}, {1:num_iter}, projdir_corr1); +marray_corr1=marray_corr1(:, :, 51:100); +msz=size(marray_corr1) + +%% +% code to use from test015: +mstacked_corr1 = marray_corr1(:,:)'; +log(csp_cutpoint) + +% the corresponding CSPs are: +tol = 2; % pretty bad tol. +% ie, the param for csp estimated from the calibration experiment is +% totally off from the set of values used in the correction step. oh well. +% this does not violate the parameter consistency theorem in any way, but +% might lead to a future result. + +cc = intersect(find(mstacked_corr1(:, 1)>log(csp_cutpoint)-tol),... 
+ find(mstacked_corr1(:, 1)1 && currR= F(currR) + currL = currL - 1 ; + else + currR =currR + 1; + end + newIdx = currL:currR; + try + currmass = trapz(XI(newIdx), F(newIdx)); + catch + disp('error...') + end + +end + +bds = [XI(currL) XI(currR)]; +% note that the limits of the XI need not coincide with the limits of the +% the samples, since XI is the support of the gaussian fitted to the +% samples. +end + diff --git a/mcmc_simbio/src/catMC.m b/mcmc_simbio/src/catMC.m new file mode 100644 index 0000000..955d81e --- /dev/null +++ b/mcmc_simbio/src/catMC.m @@ -0,0 +1,10 @@ +function mcat = catMC(datafiles) +%catMC Take Markov Chains from GWMCMC and concat them + +load(datafiles{1}, 'm'); +mcat = m; +for i = 2:length(datafiles) + load(datafiles{i}, 'm'); + mcat = cat(3, mcat, m); +end + diff --git a/mcmc_simbio/src/computeDataStats.m b/mcmc_simbio/src/computeDataStats.m new file mode 100644 index 0000000..f16eab2 --- /dev/null +++ b/mcmc_simbio/src/computeDataStats.m @@ -0,0 +1,45 @@ +function [summst, spreadst] = computeDataStats(dataArray, dispmode) +% da: data array +% datasummary: mean, median or none +% dataspread: curves, std or none. +% rD: replicates dimension + +switch dispmode + case 'mean' + summst = mean(dataArray, 3); + spreadst = []; + case 'median' + % sum over the time dimension + % tD MUST be 1 for this to work.. I tried being fully general, + % but what is the point? It's unnecessarily difficult. + % compute the indexes of the median (in terms of sum / integral) + % curves over the replicates + [ix, mdvals] = medianIndex(sum(dataArray, 1), 3); + % again, rD MUST be 3. 
+ summst = medianReplicate(dataArray, ix); + % spreadstatistic is the empty vector + spreadst = []; + case 'meanstd' + summst = mean(dataArray, 3); + spreadst = std(dataArray, 0, 3); + case 'meancurves' + summst = mean(dataArray, 3); + spreadst = dataArray; + case 'medianstd' + [ix, mdvals] = medianIndex(sum(dataArray, 1), 3); + summst = medianReplicate(dataArray, ix); + spreadst = std(dataArray, 0, 3); + case 'mediancurves' + [ix, mdvals] = medianIndex(sum(dataArray, 1), 3); + summst = medianReplicate(dataArray, ix); + spreadst = allButMedianCurve(dataArray, ix); + case 'curves' + summst = []; + spreadst = dataArray; + + otherwise + error(['Invalid data display mode. Must be one of: ''mean'','... + ' ''median'' , ''meanstd'',''medianstd'', ''meancurves'','... + ' ''mediancurves'', ''curves''.']) +end +end diff --git a/mcmc_simbio/src/computeFitOption.m b/mcmc_simbio/src/computeFitOption.m new file mode 100644 index 0000000..1daaa59 --- /dev/null +++ b/mcmc_simbio/src/computeFitOption.m @@ -0,0 +1,19 @@ +function currda = computeFitOption(da, fo) + % fo is the fit option. + % da is the data array + switch fo + case 'FitMedian' + % Compute the curvewise median of the data. + [ix, mdvals] = medianIndex(sum(da, 1), 3); + currda = medianReplicate(da, ix); + case 'FitMean' + currda = mean(da, 3); + case 'FitAll' + currda = da; + otherwise + error(... + ['Invalid fit option. Read the documentation'... 
+ ' for how to specify inputs.']) + end + +end \ No newline at end of file diff --git a/mcmc_simbio/src/create_mobj_RNAdeg.m b/mcmc_simbio/src/create_mobj_RNAdeg.m new file mode 100644 index 0000000..2388225 --- /dev/null +++ b/mcmc_simbio/src/create_mobj_RNAdeg.m @@ -0,0 +1,265 @@ +function [ Mobj1 ] = create_mobj_RNAdeg(extract, varargin) +% commit test +%create_mobj_geneexpr Create a standard TXTL toolbox gene expression +%circuit +% [ Mobj1 ] = create_mobj_geneexpr(extract, speciesGroups, globalKdRules) +% +% Input Arguments: +% +% extract = a string specifying the extract batch for the txtl Modeling +% toolbox, example: 'E30VNPRL' +% +% speciesGroups = a structure containing info on additional species to be +% created which are the sum of some of the existing species. The main +% example of this is TotalRNA, which is the sum of all the GFP RNA in +% the system. +% +% globalKdRules is a struct which looks like this: +% globalKdRules = +% +% 2x1 struct array with fields: +% +% rxStr +% paramName +% kdVal +% fVal +% +% globalKdRules(1) +% rxStr: '[RNA rbs--deGFP] + Ribo <-> [Ribo:RNA...' +% paramName: 'TXTL_RBS_Ribo_deGFP' +% kdVal: 452.1739 +% fVal: 0.2300 +% +% See the file function_design_script for an example of the construction of +% the struct. +% +% OUTPUT Arguments +% Mobj1: a Simbiology model Object containing the constitutive gene +% expression circuit. +% +% Optional second argument: simulation time span (default: 8 hours, in seconds). +customtime = nargin > 1; +tspan = [0, 8*60*60]; +if customtime, tspan = varargin{1}; end + +% Create the standard model object +tube1 = txtl_extract(extract); +tube2 = txtl_buffer(extract); +tube3 = txtl_newtube('gene_expression'); +txtl_add_dna(tube3, ...
+ 'p70(50)', 'rbs(20)', 'deGFP(1000)', 0, 'plasmid'); +Mobj1 = txtl_combine([tube1, tube2, tube3]); +cs1 = getconfigset(Mobj1); +set(cs1.RuntimeOptions, 'StatesToLog', 'all'); + +set(cs1.SolverOptions, 'AbsoluteToleranceScaling', 1); +set(cs1.SolverOptions, 'AbsoluteTolerance', 1.0e-6); +set(cs1.SolverOptions, 'AbsoluteToleranceStepSize', tspan(end)*1.0e-6*0.1); +set(cs1.SolverOptions, 'RelativeTolerance', 1.0e-6); +% try: AbsoluteToleranceStepSize = AbsoluteTolerance * StopTime * 0.1 +tic +if customtime + [simData1] = txtl_runsim(Mobj1,tspan(end)); + else +[simData1] = txtl_runsim(Mobj1,8*60*60); +end +toc + +%% Globalize irreversible reactions too +rxstr = {'[protein deGFP] -> [protein deGFP*]' + '[term_RNAP70:DNA p70--rbs--deGFP] -> RNAP70 + [DNA p70--rbs--deGFP]' + '[RNA rbs--deGFP:RNase] -> RNase' + '[AA:AGTP:Ribo:RNA rbs--deGFP:RNase] -> AGTP + AA + Ribo + RNase' + '[Ribo:RNA rbs--deGFP:RNase] -> Ribo + RNase'}; + +pname_cat = {'TXTL_PROT_deGFP_MATURATION' +'TXTL_RNAPBOUND_TERMINATION_RATE' +'TXTL_RNAdeg_catalysis' +'TXTL_RNAdeg_catalysis' +'TXTL_RNAdeg_catalysis'}; + +k_cat = {0.00231049 +0.05 +0.00277778 +0.00277778 +0.00277778}; + +globalCatalysisParams = struct('rxStr', rxstr,... + 'paramName', pname_cat,... + 'paramValue', k_cat); + + +for k = 1: length(globalCatalysisParams) +createFrateRules(Mobj1, globalCatalysisParams(k).rxStr,... + globalCatalysisParams(k).paramName,...
+ globalCatalysisParams(k).paramValue); +end + +%% Define the parameters to move to global scope to allow for Kd estimation +rxstr = {'[RNA rbs--deGFP] + Ribo <-> [Ribo:RNA rbs--deGFP]'; + '[DNA p70--rbs--deGFP] + RNAP70 <-> [RNAP70:DNA p70--rbs--deGFP]' + 'RNAP + [protein sigma70] <-> RNAP70' + '[RNAP70:DNA p70--rbs--deGFP] + AGTP <-> [AGTP:RNAP70:DNA p70--rbs--deGFP]' + '[RNAP70:DNA p70--rbs--deGFP] + CUTP <-> [CUTP:RNAP70:DNA p70--rbs--deGFP]' + '[AGTP:RNAP70:DNA p70--rbs--deGFP] + CUTP <-> [CUTP:AGTP:RNAP70:DNA p70--rbs--deGFP]' + '[CUTP:RNAP70:DNA p70--rbs--deGFP] + AGTP <-> [CUTP:AGTP:RNAP70:DNA p70--rbs--deGFP]' + '[Ribo:RNA rbs--deGFP] + AA + AGTP <-> [AA:AGTP:Ribo:RNA rbs--deGFP]' + '[RNA rbs--deGFP] + RNase <-> [RNA rbs--deGFP:RNase]' + '[AA:AGTP:Ribo:RNA rbs--deGFP] + RNase <-> [AA:AGTP:Ribo:RNA rbs--deGFP:RNase]' + '[Ribo:RNA rbs--deGFP] + RNase <-> [Ribo:RNA rbs--deGFP:RNase]'}; + +pname = {'TXTL_RBS_Ribo_deGFP'; + 'TXTL_P70_RNAPbound_deGFP' + 'TXTL_RNAP_S70' + 'TXTL_NTP_RNAP_1' + 'TXTL_NTP_RNAP_1' + 'TXTL_NTP_RNAP_2' + 'TXTL_NTP_RNAP_2' + 'TXTL_AA' + 'TXTL_RNAdeg' + 'TXTL_RNAdeg' + 'TXTL_RNAdeg'}; +% using defaults from above. +Kdval = {104/0.23; + 725.07/0.06; + 0.1/10; + 1.2e+10 / 100000; + 1.2e+10 / 100000; + 1.2e7 / 100; + 1.2e7 / 100; + 325046 / 9.055; + 2000 / 10; + 2000 / 10; + 2000 / 10}; +Fval = {0.23; + 0.06; + 10; + 100000; + 100000; + 100; + 100; + 9.055; + 10; + 10; + 10}; + +globalKdRules = struct('rxStr', rxstr,... + 'paramName', pname,... + 'kdVal', Kdval,... + 'fVal', Fval); + + +% Create Kd rules, adding parameters to the global scope +for k = 1: length(globalKdRules) +createKdRules(Mobj1, globalKdRules(k).rxStr,... + globalKdRules(k).paramName,... + globalKdRules(k).kdVal,... + globalKdRules(k).fVal); +end + + + +%% +speciesGroups = struct('summedSpeciesName', {'TotalRNA'},... + 'speciesToSum', {{'[RNA rbs--deGFP]', '[Ribo:RNA rbs--deGFP]',... + '[AA:AGTP:Ribo:RNA rbs--deGFP]', '[RNA rbs--deGFP:RNase]',... 
+ '[AA:AGTP:Ribo:RNA rbs--deGFP:RNase]', '[Ribo:RNA rbs--deGFP:RNase]'}}); + +% Create the rule for total RNA (or total whatever) +for i = 1:length(speciesGroups) + expr = [speciesGroups(i).summedSpeciesName ' = ']; + nSpToSum = length(speciesGroups(i).speciesToSum); + for j = 1:nSpToSum-1 + expr = [expr speciesGroups(i).speciesToSum{j} ' + ']; + end + expr = [expr speciesGroups(i).speciesToSum{nSpToSum}]; + txtl_addspecies(Mobj1, speciesGroups(i).summedSpeciesName, 0); + ruleObj = addrule(Mobj1, expr, 'repeatedAssignment'); +end + +%% Create a rule for consumption reaction using length based formula + +%%%% FOR TX +% calculate tx rate. the rna len can be extracted from the userdata if +% desired. dont need to do that just yet. +rnalen = 20 + 1000; %utr length and gene length in base pairs. +elongrate = 1.5 ; % the parameter name is k_elon. this gets varied in estimation. +rx = sbioselect(Mobj1, 'reaction', ... + '[CUTP:AGTP:RNAP70:DNA p70--rbs--deGFP] -> [term_RNAP70:DNA p70--rbs--deGFP] + [RNA rbs--deGFP]'); % +pTarget = sbioselect(rx, 'type', 'parameter', 'Name', 'TXTL_transcription_rate1'); +pTarget.delete; +rx.KineticLaw.ParameterVariableNames = 'TXTL_transcription_rate1'; + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'TXTL_transcription_rate1')) +addparameter(Mobj1, 'TXTL_transcription_rate1', elongrate/rnalen); +end + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'k_elon')) +addparameter(Mobj1, 'k_elon', elongrate); +end + +ruleStr = ['TXTL_transcription_rate1 = k_elon/' num2str(rnalen)]; + +if isempty(sbioselect(Mobj1,'Type','Rule', 'Rule', ruleStr)) + addrule(Mobj1, ruleStr, 'initialAssignment'); +end + +rx = sbioselect(Mobj1, 'reaction', ... 
+ '[CUTP:AGTP:RNAP70:DNA p70--rbs--deGFP] -> RNAP70 + [DNA p70--rbs--deGFP]'); % +pTarget = sbioselect(rx, 'type', 'parameter', 'Name', 'TXTL_NTP_consumption'); +pTarget.delete; +rx.KineticLaw.ParameterVariableNames = 'TXTL_k_con_TX'; + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'TXTL_k_con_TX')) +addparameter(Mobj1, 'TXTL_k_con_TX', (rnalen/4-1)*(elongrate/rnalen)); +end + +ruleStr = ['TXTL_k_con_TX = (' num2str(rnalen) '/4-1)*(k_elon/' num2str(rnalen) ')']; +if isempty(sbioselect(Mobj1,'Type','Rule', 'Rule', ruleStr)) + addrule(Mobj1, ruleStr, 'initialAssignment'); +end + +%%%% FOR TL +genelen = 1000; % gene length in base pairs (the UTR is not included here). +elongrate_prot = 4 ; % the parameter name is k_elon_prot. this gets varied in estimation. + +rx = sbioselect(Mobj1, 'reaction', ... + '[AA:AGTP:Ribo:RNA rbs--deGFP] -> [RNA rbs--deGFP] + [protein deGFP] + Ribo'); % + +pTarget = sbioselect(rx, 'type', 'parameter', 'Name', 'TXTL_TL_rate'); +pTarget.delete; +rx.KineticLaw.ParameterVariableNames = 'TXTL_TL_rate'; + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'TXTL_TL_rate')) +addparameter(Mobj1, 'TXTL_TL_rate', elongrate_prot/genelen); +end + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'k_elon_prot')) +addparameter(Mobj1, 'k_elon_prot', elongrate_prot); +end + +ruleStr = ['TXTL_TL_rate = k_elon_prot/' num2str(genelen)]; + +if isempty(sbioselect(Mobj1,'Type','Rule', 'Rule', ruleStr)) + addrule(Mobj1, ruleStr, 'initialAssignment'); +end + +rx = sbioselect(Mobj1, 'reaction', ...
+ '[AA:AGTP:Ribo:RNA rbs--deGFP] -> [RNA rbs--deGFP] + Ribo'); % + +pTarget = sbioselect(rx, 'type', 'parameter', 'Name', 'TXTL_TL_AA_consumption'); +pTarget.delete; +rx.KineticLaw.ParameterVariableNames = 'TXTL_k_con_TL'; + + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'TXTL_k_con_TL')) +addparameter(Mobj1, 'TXTL_k_con_TL', (genelen-1)*(elongrate_prot/genelen)); +end + +ruleStr = ['TXTL_k_con_TL = (' num2str(genelen) '-1)*(k_elon_prot/' num2str(genelen) ')']; +if isempty(sbioselect(Mobj1,'Type','Rule', 'Rule', ruleStr)) + addrule(Mobj1, ruleStr, 'initialAssignment'); +end + +end + diff --git a/mcmc_simbio/src/create_mobj_geneexpr.m b/mcmc_simbio/src/create_mobj_geneexpr.m new file mode 100644 index 0000000..edc80c1 --- /dev/null +++ b/mcmc_simbio/src/create_mobj_geneexpr.m @@ -0,0 +1,267 @@ +function [ Mobj1 ] = create_mobj_geneexpr(extract, varargin) +% commit test +%create_mobj_geneexpr Create a standard TXTL toolbox gene expression +%circuit +% [ Mobj1 ] = create_mobj_geneexpr(extract, speciesGroups, globalKdRules) +% +% Input Arguments: +% +% extract = a string specifying the extract batch for the txtl Modeling +% toolbox, example: 'E30VNPRL' +% +% speciesGroups = a structure containing info on additional species to be +% created which are the sum of some of the existing species. The main +% example of this is TotalRNA, which is the sum of all the GFP RNA in +% the system. +% +% globalKdRules is a struct which looks like this: +% globalKdRules = +% +% 2x1 struct array with fields: +% +% rxStr +% paramName +% kdVal +% fVal +% +% globalKdRules(1) +% rxStr: '[RNA rbs--deGFP] + Ribo <-> [Ribo:RNA...' +% paramName: 'TXTL_RBS_Ribo_deGFP' +% kdVal: 452.1739 +% fVal: 0.2300 +% +% See the file function_design_script for an example of the construction of +% the struct. +% +% OUTPUT Arguments +% Mobj1: a Simbiology model Object containing the constitutive gene +% expression circuit. 
+% +% Optional second argument: simulation time span (default: 8 hours, in seconds). +customtime = nargin > 1; +tspan = [0, 8*60*60]; +if customtime, tspan = varargin{1}; end + +% Create the standard model object +tube1 = txtl_extract(extract); +tube2 = txtl_buffer(extract); +tube3 = txtl_newtube('gene_expression'); +txtl_add_dna(tube3, ... + 'p70(50)', 'rbs(20)', 'deGFP(1000)', 1, 'plasmid'); +Mobj1 = txtl_combine([tube1, tube2, tube3]); +cs1 = getconfigset(Mobj1); +set(cs1.RuntimeOptions, 'StatesToLog', 'all'); + +set(cs1.SolverOptions, 'AbsoluteToleranceScaling', 1); +set(cs1.SolverOptions, 'AbsoluteTolerance', 1.0e-6); +set(cs1.SolverOptions, 'AbsoluteToleranceStepSize', tspan(end)*1.0e-6*0.1); +set(cs1.SolverOptions, 'RelativeTolerance', 1.0e-6); +% try: AbsoluteToleranceStepSize = AbsoluteTolerance * StopTime * 0.1 +tic +if customtime + [simData1] = txtl_runsim(Mobj1,tspan(end)); + else +[simData1] = txtl_runsim(Mobj1,8*60*60); +end +toc + +%% Globalize irreversible reactions too +rxstr = {'[protein deGFP] -> [protein deGFP*]' + '[term_RNAP70:DNA p70--rbs--deGFP] -> RNAP70 + [DNA p70--rbs--deGFP]' + '[RNA rbs--deGFP:RNase] -> RNase' + '[AA:AGTP:Ribo:RNA rbs--deGFP:RNase] -> AGTP + AA + Ribo + RNase' + '[Ribo:RNA rbs--deGFP:RNase] -> Ribo + RNase'}; + +pname_cat = {'TXTL_PROT_deGFP_MATURATION' +'TXTL_RNAPBOUND_TERMINATION_RATE' +'TXTL_RNAdeg_catalysis' +'TXTL_RNAdeg_catalysis' +'TXTL_RNAdeg_catalysis'}; + +k_cat = {0.00231049 +0.05 +0.00277778 +0.00277778 +0.00277778}; + +globalCatalysisParams = struct('rxStr', rxstr,... + 'paramName', pname_cat,... + 'paramValue', k_cat); + + +for k = 1: length(globalCatalysisParams) +createFrateRules(Mobj1, globalCatalysisParams(k).rxStr,... + globalCatalysisParams(k).paramName,...
+ globalCatalysisParams(k).paramValue); +end + +%% Define the parameters to move to global scope to allow for Kd estimation +rxstr = {'[RNA rbs--deGFP] + Ribo <-> [Ribo:RNA rbs--deGFP]'; + '[DNA p70--rbs--deGFP] + RNAP70 <-> [RNAP70:DNA p70--rbs--deGFP]' + 'RNAP + [protein sigma70] <-> RNAP70' + '[RNAP70:DNA p70--rbs--deGFP] + AGTP <-> [AGTP:RNAP70:DNA p70--rbs--deGFP]' + '[RNAP70:DNA p70--rbs--deGFP] + CUTP <-> [CUTP:RNAP70:DNA p70--rbs--deGFP]' + '[AGTP:RNAP70:DNA p70--rbs--deGFP] + CUTP <-> [CUTP:AGTP:RNAP70:DNA p70--rbs--deGFP]' + '[CUTP:RNAP70:DNA p70--rbs--deGFP] + AGTP <-> [CUTP:AGTP:RNAP70:DNA p70--rbs--deGFP]' + '[Ribo:RNA rbs--deGFP] + AA + AGTP <-> [AA:AGTP:Ribo:RNA rbs--deGFP]' + '[RNA rbs--deGFP] + RNase <-> [RNA rbs--deGFP:RNase]' + '[AA:AGTP:Ribo:RNA rbs--deGFP] + RNase <-> [AA:AGTP:Ribo:RNA rbs--deGFP:RNase]' + '[Ribo:RNA rbs--deGFP] + RNase <-> [Ribo:RNA rbs--deGFP:RNase]'}; + +pname = {'TXTL_RBS_Ribo_deGFP'; + 'TXTL_P70_RNAPbound_deGFP' + 'TXTL_RNAP_S70' + 'TXTL_NTP_RNAP_1' + 'TXTL_NTP_RNAP_1' + 'TXTL_NTP_RNAP_2' + 'TXTL_NTP_RNAP_2' + 'TXTL_AA' + 'TXTL_RNAdeg' + 'TXTL_RNAdeg' + 'TXTL_RNAdeg'}; + +% using defaults from above. +Kdval = {104/0.23; + 725.07/0.06; + 0.1/10; + 1.2e+10 / 100000; + 1.2e+10 / 100000; + 1.2e7 / 100; + 1.2e7 / 100; + 325046 / 9.055; + 2000 / 10; + 2000 / 10; + 2000 / 10}; + +Fval = {0.23; + 0.06; + 10; + 100000; + 100000; + 100; + 100; + 9.055; + 10; + 10; + 10}; + +globalKdRules = struct('rxStr', rxstr,... + 'paramName', pname,... + 'kdVal', Kdval,... + 'fVal', Fval); + + +% Create Kd rules, adding parameters to the global scope +for k = 1: length(globalKdRules) +createKdRules(Mobj1, globalKdRules(k).rxStr,... + globalKdRules(k).paramName,... + globalKdRules(k).kdVal,... + globalKdRules(k).fVal); +end + + + +%% +speciesGroups = struct('summedSpeciesName', {'TotalRNA'},... + 'speciesToSum', {{'[RNA rbs--deGFP]', '[Ribo:RNA rbs--deGFP]',... + '[AA:AGTP:Ribo:RNA rbs--deGFP]', '[RNA rbs--deGFP:RNase]',... 
+ '[AA:AGTP:Ribo:RNA rbs--deGFP:RNase]', '[Ribo:RNA rbs--deGFP:RNase]'}}); + +% Create the rule for total RNA (or total whatever) +for i = 1:length(speciesGroups) + expr = [speciesGroups(i).summedSpeciesName ' = ']; + nSpToSum = length(speciesGroups(i).speciesToSum); + for j = 1:nSpToSum-1 + expr = [expr speciesGroups(i).speciesToSum{j} ' + ']; + end + expr = [expr speciesGroups(i).speciesToSum{nSpToSum}]; + txtl_addspecies(Mobj1, speciesGroups(i).summedSpeciesName, 0); + ruleObj = addrule(Mobj1, expr, 'repeatedAssignment'); +end + +%% Create a rule for consumption reaction using length based formula + +%%%% FOR TX +% calculate tx rate. the rna len can be extracted from the userdata if +% desired. dont need to do that just yet. +rnalen = 20 + 1000; %utr length and gene length in base pairs. +elongrate = 1.5 ; % the parameter name is k_elon. this gets varied in estimation. +rx = sbioselect(Mobj1, 'reaction', ... + '[CUTP:AGTP:RNAP70:DNA p70--rbs--deGFP] -> [term_RNAP70:DNA p70--rbs--deGFP] + [RNA rbs--deGFP]'); % +pTarget = sbioselect(rx, 'type', 'parameter', 'Name', 'TXTL_transcription_rate1'); +pTarget.delete; +rx.KineticLaw.ParameterVariableNames = 'TXTL_transcription_rate1'; + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'TXTL_transcription_rate1')) +addparameter(Mobj1, 'TXTL_transcription_rate1', elongrate/rnalen); +end + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'k_elon')) +addparameter(Mobj1, 'k_elon', elongrate); +end + +ruleStr = ['TXTL_transcription_rate1 = k_elon/' num2str(rnalen)]; + +if isempty(sbioselect(Mobj1,'Type','Rule', 'Rule', ruleStr)) + addrule(Mobj1, ruleStr, 'initialAssignment'); +end + +rx = sbioselect(Mobj1, 'reaction', ... 
+ '[CUTP:AGTP:RNAP70:DNA p70--rbs--deGFP] -> RNAP70 + [DNA p70--rbs--deGFP]'); % +pTarget = sbioselect(rx, 'type', 'parameter', 'Name', 'TXTL_NTP_consumption'); +pTarget.delete; +rx.KineticLaw.ParameterVariableNames = 'TXTL_k_con_TX'; + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'TXTL_k_con_TX')) +addparameter(Mobj1, 'TXTL_k_con_TX', (rnalen/4-1)*(elongrate/rnalen)); +end + +ruleStr = ['TXTL_k_con_TX = (' num2str(rnalen) '/4-1)*(k_elon/' num2str(rnalen) ')']; +if isempty(sbioselect(Mobj1,'Type','Rule', 'Rule', ruleStr)) + addrule(Mobj1, ruleStr, 'initialAssignment'); +end + +%%%% FOR TL +genelen = 1000; % gene length in base pairs (the UTR is not included here). +elongrate_prot = 4 ; % the parameter name is k_elon_prot. this gets varied in estimation. + +rx = sbioselect(Mobj1, 'reaction', ... + '[AA:AGTP:Ribo:RNA rbs--deGFP] -> [RNA rbs--deGFP] + [protein deGFP] + Ribo'); % + +pTarget = sbioselect(rx, 'type', 'parameter', 'Name', 'TXTL_TL_rate'); +pTarget.delete; +rx.KineticLaw.ParameterVariableNames = 'TXTL_TL_rate'; + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'TXTL_TL_rate')) +addparameter(Mobj1, 'TXTL_TL_rate', elongrate_prot/genelen); +end + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'k_elon_prot')) +addparameter(Mobj1, 'k_elon_prot', elongrate_prot); +end + +ruleStr = ['TXTL_TL_rate = k_elon_prot/' num2str(genelen)]; + +if isempty(sbioselect(Mobj1,'Type','Rule', 'Rule', ruleStr)) + addrule(Mobj1, ruleStr, 'initialAssignment'); +end + +rx = sbioselect(Mobj1, 'reaction', ...
+ '[AA:AGTP:Ribo:RNA rbs--deGFP] -> [RNA rbs--deGFP] + Ribo'); % + +pTarget = sbioselect(rx, 'type', 'parameter', 'Name', 'TXTL_TL_AA_consumption'); +pTarget.delete; +rx.KineticLaw.ParameterVariableNames = 'TXTL_k_con_TL'; + + +if isempty(sbioselect(Mobj1,'Type','Parameter', 'Name', 'TXTL_k_con_TL')) +addparameter(Mobj1, 'TXTL_k_con_TL', (genelen-1)*(elongrate_prot/genelen)); +end + +ruleStr = ['TXTL_k_con_TL = (' num2str(genelen) '-1)*(k_elon_prot/' num2str(genelen) ')']; +if isempty(sbioselect(Mobj1,'Type','Rule', 'Rule', ruleStr)) + addrule(Mobj1, ruleStr, 'initialAssignment'); +end + +end + diff --git a/mcmc_simbio/src/curvewiseMedian.m b/mcmc_simbio/src/curvewiseMedian.m new file mode 100644 index 0000000..12ec50f --- /dev/null +++ b/mcmc_simbio/src/curvewiseMedian.m @@ -0,0 +1,17 @@ +function medianval = curvewiseMedian(dataMat) + % Compute the median of a set of curves given in dataMat + % (of dimensions #points x #curves) using the max value of + % each curve to order the curves. If there is an even number + % of curves, then the larger curve is used. + + + [mx, I] = sort(max(dataMat)); + if mod(numel(I), 2)==0 + ix = I(numel(I)/2+1); + else + ix = I((numel(I)+1)/2); + end + medianval = dataMat(:,ix); +end + + diff --git a/mcmc_simbio/src/data_artificial.m b/mcmc_simbio/src/data_artificial.m new file mode 100644 index 0000000..e1e43e6 --- /dev/null +++ b/mcmc_simbio/src/data_artificial.m @@ -0,0 +1,210 @@ +function [data_info] = data_artificial(mobj, mi, tv, varargin) +% use simbiology model object to generate artificial data. +% +% Dosing and measurement strategy are defined by the mcmc_info struct (mi), +% and the time points defined by the vector tv. Additional +% name-value pair arguments can be specified. +% +% REQUIRED INPUTS +% mobj: simbiology model object. +% +% mi: mcmc_info struct. Type 'help mcmc_info' in the command line to read more +% about this. +% +% tv: a vector of timepoints which the data in the output data array will +% correspond to. 
These are the points at which the model will be simulated to +% compute the data values. +% +% +% OPTIONAL NAME VALUE PAIR ARGUMENTS - used to populate the data_info struct. +% +% 'dataInfo': Human readable description of the data. Should be specified using +% a format that makes it printable with the function fprintf (so newline +% characters \n at the very least, for example). If not specified, its value is +% simply 'Artificial Data' +% +% 'timeUnits': String specifying the units of the time axis. Default is 'seconds' +% +% 'measuredNames': A 1 x number of measured species cell array of the strings +% specifying which species are measured. If not specified, the strings from the +% mcmc_info struct will be used. When a measured species there is an aggregate +% of a number of species in the model, a string [xxxxxx ' + ...'] is used, where +% xxxxxx is the first string within the list of species whose concentrations get +% added. +% +% 'dataUnits': A cell array of strings specifying the units of the measured +% species. If not specified, the default is 'nM'. +% +% 'dosedNames': A 1 x number of dosed species cell array of the strings specifying +% which species are dosed. These strings should correspond to the strings in +% the dosedNames field in the mcmc_info struct. If this is not specified, then +% those very values from the mcmc_info struct are used. +% +% 'doseUnits': A cell array of strings specifying the units of the +% dosed species. If not specified, the default is 'nM'. +% +% 'params', VALUE, where VALUE is a vector of nonnegative parameter values, +% ordered according to mi.names_unord (the unordered array of parameter and +% species names). +% Note that the parameter values are NOT log transformed, i.e., they all lie +% in the nonnegative orthant.
Parameter +% values are an optional argument, and if this argument is not specified, +% then the geometric mean of the rows (each of which has 2 elements) of +% exp(mi.paramranges) will be used as the parameter value. +% +% 'noise', VALUE; where VALUE is a vector of standard deviations of the gaussian +% noise added to the data, each element corresponding to one measured +% species. +% +% 'replicates', VALUE, where VALUE is a positive integer giving the number +% of replicates. +% +% OUTPUTS: This function returns a data_info struct with fields +% ------------------------------------------ +% +% 'dataInfo': A human readable text description of the data. If not specified +% as a name value pair input argument, the string 'Artificial Data' is used. +% +% 'timeVector': vector of timepoints, same as tv, a required positional input. +% +% 'timeUnits': units of the time vector. Allowed options are: +% 'seconds', 'minutes', 'hours', 'days', 'weeks'. If no units are specified +% as a name value input, then the units are specified as 'seconds'. +% +% 'dataArray': An array containing the raw data that is generated by simulating +% the model according to the mcmc_info struct. Typically has dimensions +% corresponding to timepoints x measured outputs x replicates x doses. +% +% 'measuredNames': A 1 x number of measured species cell array of the strings +% specifying which species are measured. These are not strings corresponding to +% the species in the model. Taken from the corresponding name value pair input +% argument. If not specified, the values are taken from the mcmc_info struct. +% +% 'dataUnits': A 1 x number measured species cell array of units corresponding to +% the raw data in the dataArray. If no units are specified, then nM is used. +% +% 'dimensionLabels': a 1 by length(size(data_info.dataArray)) cell array of +% labels for the dimensions of the dataArray. +% +% 'dosedNames': A 1 x number of dosed species cell array of the strings specifying +% which species are dosed.
These are not strings corresponding to the species +% in the model. See mcmc_info constructor functions for that. +% +% 'dosedVals': A matrix of dose values of size +% # of dosed species by # of dose combinations +% +% 'doseUnits': A 1 x number of dosed species cell array of strings specifying the +% units of the dosed species. If no units are specified, then nM is used. +% +% ------------------------------------------ +% +% + +% Copyright (c) 2018, Vipul Singhal, Caltech +% Permission is hereby granted, free of charge, to any person obtaining a copy +% of this software and associated documentation files (the "Software"), to deal +% in the Software without restriction, including without limitation the rights +% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +% copies of the Software, and to permit persons to whom the Software is +% furnished to do so, subject to the following conditions: + +% The above copyright notice and this permission notice shall be included in all +% copies or substantial portions of the Software. + +% THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +% AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +% LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +% OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +% SOFTWARE.
+ + + +pdefaults = geomean(exp(mi.paramranges), 2); % !TODO check this +ms = mi.measuredSpecies; +noisedefaults = zeros(length(ms), 1); +dimensionLabels = {'time points', 'measured species', 'replicates', 'doses'}; +p = inputParser; + +addParameter(p, 'replicates', 1, @isnumeric); % +addParameter(p, 'params', pdefaults, @isnumeric); +addParameter(p, 'noise', noisedefaults, @isnumeric); % the default is 0 +addParameter(p, 'timeUnits', 'seconds', @ischar); +addParameter(p, 'dataUnits', 'nM'); +addParameter(p, 'dosedVals', mi.dosedVals); +addParameter(p, 'dosedNames', mi.dosedNames); +addParameter(p, 'doseUnits', 'nM'); + +addParameter(p, 'dataInfo', 'Artificial Data'); +addParameter(p, 'measuredNames', mi.measuredSpecies); % default: names from mcmc_info +addParameter(p, 'dimensionLabels', dimensionLabels); + +parse(p, varargin{:}); +p = p.Results; + +configsetObj = getconfigset(mobj, 'active'); +set(configsetObj, 'StopTime', tv(end)); + +noisevec = p.noise; +% initialize the data array +da = zeros(length(tv), length(mi.measuredSpecies),... + p.replicates, size(mi.dosedVals, 2)); +ms = mi.measuredSpecies; +% for each dose, simulate the model + +dv = mi.dosedVals; + +% set parameters and species initial concentrations in the model +paramnames = mi.names_unord; +paramvals = p.params; +for i = 1:length(paramnames) + p1 = sbioselect(mobj.parameters, 'Name', paramnames{i}); + if ~isempty(p1) + set(p1, 'Value', paramvals(i)); + end +end + +for i = 1:length(paramnames) + s1 = sbioselect(mobj.species, 'Name', paramnames{i}); + if ~isempty(s1) + set(s1, 'InitialAmount', paramvals(i)) + end +end + +% set dose values, simulate model, and populate output data array. +for dID = 1:size(dv, 2) + % set the dose value using the mcmc_info struct + for i = 1:length(p.dosedNames) + s1 = sbioselect(mobj.species, 'Name', p.dosedNames{i}); + if ~isempty(s1) + set(s1, 'InitialAmount', p.dosedVals(i, dID)) + end + end + + % simulate the model.
+ sd = sbiosimulate(mobj); + sd = resample(sd, tv); + for msID = 1:length(ms) + measuredSpecies = ms{msID}; + spSD = selectbyname(sd, measuredSpecies); + summed_trajectories = sum(spSD.Data, 2); + % add noise if needed. + for rID = 1:p.replicates + da(:, msID, rID, dID) = ... + summed_trajectories + noisevec(msID)*randn(length(tv), 1); + end + end +end + +% make the data_info struct +data_info = struct('dataInfo', {p.dataInfo}, ... + 'timeVector', {tv}, ... + 'timeUnits', {p.timeUnits}, ... + 'dataArray', {da},... + 'measuredNames', {p.measuredNames},... + 'dataUnits', {p.dataUnits}, ... + 'dimensionLabels', {p.dimensionLabels}, ... + 'dosedNames', {p.dosedNames},... + 'dosedVals', {p.dosedVals}, ... + 'doseUnits', {p.doseUnits}); diff --git a/mcmc_simbio/src/data_artificial_v2.m b/mcmc_simbio/src/data_artificial_v2.m new file mode 100644 index 0000000..98bc2c9 --- /dev/null +++ b/mcmc_simbio/src/data_artificial_v2.m @@ -0,0 +1,338 @@ +function [di] = data_artificial_v2(mobj, tv, measuredSpecies, ... + dosedNames, dosedVals, activeNames, activeValues, varargin) +% [di] = data_artificial_v2(mobj, tv, measuredSpecies, ... +% dosedNames, dosedVals, activeNames, activeValues, varargin) +% use simbiology model object to generate artificial data. +% +% There are two ways to define the inputs. In the SCALAR MODE, we have: +% +% mobj: simbiology model object +% +% tv: a vector of timepoints which the data in the output data array will +% correspond to. These are the points at which the model will be simulated to +% compute the data values. +% +% measuredSpecies: this is a cell array of cell arrays of strings. The +% species in the inner cell array get summed to create the measured +% species. The order of the outer cell array defines the ordering of the +% second dimension of the dataArray property of the data_info struct that +% is output. +% +% dosedNames: Cell array of species that get dosed. +% +% dosedVals: matrix of size #number of dosed species x number of dose +% combinations.
+% +% activeNames: cell array of parameters and species names to set in the +% model. +% +% activeValues: vector of corresponding values. Note that the parameter +% values are NOT log transformed, ie, they all lie in the nonnegative orthant. +% +% In the CELL MODE, the main difference is that the data_info output struct +% can now be a non singleton array of length nDatasets. The activeValues +% input is now a cell array of numerical vectors. It has dimensions +% nDatasets x 1 or 1 x nDatasets. +% +% mobj: A 1x1 cell containing a simbiology model object, or a row or column +% cell array of length nDatasets containing model objects. +% +% tv: A 1x1 cell containing a vector of timepoints, or a row or column +% cell array of length nDatasets containing vectors of timepoints. See +% scalar version documentation above for more info. +% +% measuredSpecies: A 1x1 cell containing a cell array of cell arrays of +% strings, or a row or column cell array of length nDatasets containing +% cell array of cell arrays of strings. See scalar version documentation +% above for more info. +% +% dosedNames: A 1x1 cell containing a Cell array of species that get dosed, +% or a row or column cell array of length nDatasets containing +% cell arrays of species that get dosed. See scalar version documentation +% above for more info. +% +% dosedVals: A 1x1 cell containing a matrix of size +% #number of dosed species x number of dose combinations, +% or a row or column cell array of length nDatasets containing +% matrix of size #number of dosed species x number of dose combinations. +% See scalar version documentation above for more info. +% +% activeNames: A 1x1 cell containing a cell array of parameters and species +% names to set, or a row or column cell array of length nDatasets containing +% cell arrays of parameters and species names to set. See scalar version +% documentation above for more info.
+% +% activeValues: A 1x1 cell containing a vector of corresponding values, +% or a row or column cell array of length nDatasets containing +% vectors of corresponding values. See scalar version documentation +% above for more info. +% +% +% OPTIONAL NAME VALUE PAIR ARGUMENTS - used to populate the data_info struct. +% If cell mode is active, then all of these are correspondingly encapsulated +% in a 1x1 cell or a row or column cell array of length nDatasets. +% +% 'dataInfo': Human readable description of the data. Should be specified using +% the format that makes it printable with the function fprintf. (so newline +% characters \n at the very least, for example.). If not specified, its value is +% simply 'Artificial Data'. +% +% 'timeUnits': String specifying the units of the time axis. Default is 'seconds' +% +% 'dataUnits': A cell array of strings specifying the units of the measured +% species. If not specified, the default is 'nM'. +% +% 'doseUnits': A cell array of strings specifying the units of the +% dosed species. If not specified, the default is 'nM'. +% +% 'noise', VALUE; where VALUE is a vector of standard deviations of the gaussian +% noise added to the data, each element corresponding to one measured +% species. +% +% 'replicates', VALUE, where VALUE is a positive integer giving the number +% of replicates. +% +% +% OUTPUTS: This function returns a data_info struct with fields +% ------------------------------------------ +% +% 'dataInfo': A human readable text description of the data. If not specified +% as a name value pair input argument, the string 'Artificial Data' is used. +% +% 'timeVector': vector of timepoints, same as tv, a required positional input. +% +% 'timeUnits': units of the time vector. Allowed options are: +% 'seconds', 'minutes', 'hours', 'days', 'weeks'. If no units are specified +% as a name value input, then the units are specified as 'seconds'.
+% +% 'dataArray': An array containing the raw data that is generated by simulating +% the model according to the mcmc_info struct. Typically has dimensions +% corresponding to timepoints x measured outputs x replicates x doses. +% +% 'measuredNames': A 1 x number of measured species cell array of the strings +% specifying which species are measured. These are not strings corresponding to +% the species in the model. Taken from the corresponding name value pair input +% argument. If not specified, the values are taken from the mcmc_info struct. +% +% 'dataUnits': A 1 x number measured species cell array of units corresponding to +% the raw data in the dataArray. If no units are specified, then nM is used. +% +% 'dimensionLabels': a 1 by length(size(data_info.dataArray)) cell array of +% labels for the dimensions of the dataArray. +% +% 'dosedNames': A 1 x number of dosed species cell array of the strings specifying +% which species are dosed. These are not strings corresponding to the species +% in the model. See mcmc_info constructor functions for that. +% +% 'dosedVals': A matrix of dose values of size +% # of dosed species by # of dose combinations +% +% 'doseUnits': A 1 x number of dosed species cell array of strings specifying the +% units of the dosed species. If no units are specified, then nM is used. +% + +% Copyright (c) 2018, Vipul Singhal, Caltech +% Permission is hereby granted, free of charge, to any person obtaining a copy +% of this software and associated documentation files (the "Software"), to deal +% in the Software without restriction, including without limitation the rights +% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +% copies of the Software, and to permit persons to whom the Software is +% furnished to do so, subject to the following conditions: + +% The above copyright notice and this permission notice shall be included in all +% copies or substantial portions of the Software.
+ +% THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +% AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +% LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +% OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +% SOFTWARE. + + +% check the sizes of things: +if iscell(activeValues) + nDatasets = length(activeValues); + cellMode = true; +else + nDatasets = 1; + cellMode = false; +end + +if nDatasets == 1 + assert(~iscell(mobj) && length(mobj) == 1); + assert(~iscell(tv) ); + assert(~iscell(dosedVals)); +end + +if cellMode + + p = inputParser; + + addParameter(p, 'replicates', {1}); % + addParameter(p, 'noise', 'default'); % the default is 0 + addParameter(p, 'timeUnits', {'seconds'}); + addParameter(p, 'dataUnits', {'nM'}); + addParameter(p, 'doseUnits', {'nM'}); + addParameter(p, 'dataInfo', {'Artificial Data'}); + addParameter(p, 'dimensionLabels', {{'time points', 'measured species',... 
+ 'replicates', 'doses'}}); + parse(p, varargin{:}); + p = p.Results; + for i = 1:nDatasets + currmeasuredSpecies = cellcontents(measuredSpecies, i); + + if strcmp(p.noise, 'default') + noisevec = zeros(length(currmeasuredSpecies), 1); + else + noisevec = cellcontents(p.noise, i); + end + nReplicates = cellcontents(p.replicates, i); + timeUnits = cellcontents(p.timeUnits, i); + dataUnits = cellcontents(p.dataUnits, i); + doseUnits = cellcontents(p.doseUnits, i); + dataInfo = cellcontents(p.dataInfo, i); + dimensionLabels = cellcontents(p.dimensionLabels, i); + currmobj = cellcontents(mobj, i); + currtv = cellcontents(tv, i); + currdosedNames = cellcontents(dosedNames, i); + currdosedVals = cellcontents(dosedVals, i); + curractiveNames = cellcontents(activeNames, i); + if ~length(activeValues)>1 + error('what on earth?') + end + curractiveValues = activeValues{i}; + + da = computeArtificialData(currmobj, currtv, noisevec, ... + currmeasuredSpecies, nReplicates, currdosedVals, currdosedNames, ... + curractiveNames, curractiveValues); + + if i ==1 + % make the data_info struct + di = struct('dataInfo', {dataInfo}, ... + 'timeVector', {currtv}, ... + 'timeUnits', {timeUnits},... + 'dataArray', {da},... + 'measuredNames', {currmeasuredSpecies},... + 'dataUnits', {dataUnits},... + 'dimensionLabels', {dimensionLabels}, ... + 'dosedNames', {currdosedNames},... + 'dosedVals', {currdosedVals}, ... + 'doseUnits', {doseUnits}); + else + currdi = struct('dataInfo', {dataInfo}, ... + 'timeVector', {currtv}, ... + 'timeUnits', {timeUnits},... + 'dataArray', {da},... + 'measuredNames', {currmeasuredSpecies},... + 'dataUnits', {dataUnits},... + 'dimensionLabels', {dimensionLabels}, ... + 'dosedNames', {currdosedNames},... + 'dosedVals', {currdosedVals}, ... 
+ 'doseUnits', {doseUnits}); + di = [di; currdi]; + end + end +else + noisedefaults = zeros(length(measuredSpecies), 1); + dimensionLabels = {'time points', 'measured species', 'replicates', 'doses'}; + p = inputParser; + addParameter(p, 'replicates', 1, @isnumeric); % + addParameter(p, 'noise', noisedefaults, @isnumeric); % the default is 0 + addParameter(p, 'timeUnits', 'seconds', @ischar); + addParameter(p, 'dataUnits', 'nM'); + addParameter(p, 'doseUnits', 'nM'); + addParameter(p, 'dataInfo', 'Artificial Data'); + addParameter(p, 'dimensionLabels', dimensionLabels); + parse(p, varargin{:}); + p = p.Results; + noisevec = p.noise; + nReplicates = p.replicates; + + da = computeArtificialData(mobj, tv, noisevec, measuredSpecies, ... + nReplicates, dosedVals, dosedNames, activeNames, activeValues); + + % make the data_info struct + di = struct('dataInfo', {p.dataInfo}, ... + 'timeVector', {tv}, ... + 'timeUnits', {p.timeUnits},... + 'dataArray', {da},... + 'measuredNames', {measuredSpecies},... + 'dataUnits', {p.dataUnits},... + 'dimensionLabels', {p.dimensionLabels}, ... + 'dosedNames', {dosedNames},... + 'dosedVals', {dosedVals}, ... + 'doseUnits', {p.doseUnits}); + +end +end + +function da = computeArtificialData(mobj, tv, noisevec, measuredSpecies, ... + nReplicates, dosedVals, dosedNames, activeNames, activeValues) + +configsetObj = getconfigset(mobj, 'active'); +set(configsetObj, 'StopTime', tv(end)); + + +% initialize the data array +da = zeros(length(tv), length(measuredSpecies),... 
+ nReplicates, size(dosedVals, 2)); + +% set parameters and species initial concentrations in the model +for i = 1:length(activeNames) + p1 = sbioselect(mobj.parameters, 'Name', activeNames{i}); + if ~isempty(p1) + set(p1, 'Value', activeValues(i)); + end +end + +for i = 1:length(activeNames) + s1 = sbioselect(mobj.species, 'Name', activeNames{i}); + if ~isempty(s1) + set(s1, 'InitialAmount', activeValues(i)) + end +end + +% set dose values, simulate model, and populate output data array. +for dID = 1:size(dosedVals, 2) + % set the dose value using the mcmc_info struct + for i = 1:length(dosedNames) + s1 = sbioselect(mobj.species, 'Name', dosedNames{i}); + if ~isempty(s1) + set(s1, 'InitialAmount', dosedVals(i, dID)) + end + end + + % simulate the model. + sd = sbiosimulate(mobj); + sd = resample(sd, tv); + for msID = 1:length(measuredSpecies) + currmeasuredSpecie = measuredSpecies{msID}; + spSD = selectbyname(sd, currmeasuredSpecie); + summed_trajectories = sum(spSD.Data, 2); + % add noise if needed. + for rID = 1:nReplicates + da(:, msID, rID, dID) = ... + summed_trajectories + ... + noisevec(msID)*randn(length(tv), 1); + end + end +end + + +end + +function currcellcontents = cellcontents(cellarray, count) +assert(iscell(cellarray)); + +if length(cellarray)>1 + currcellcontents = cellarray{count}; +else + currcellcontents = cellarray{1}; +end + +end + + + diff --git a/mcmc_simbio/src/exportmobj.m b/mcmc_simbio/src/exportmobj.m new file mode 100644 index 0000000..a55baba --- /dev/null +++ b/mcmc_simbio/src/exportmobj.m @@ -0,0 +1,107 @@ +function [da, mi, tv] = exportmobj(mi, data_info, fitOption) +%exportmobj export simbiology model object and update the model info struct +%with the ordering indices and the exported model object. 
+ + + +nTopo = length(mi); +nGeom = zeros(length(mi),1); +for i = 1:nTopo + nGeom(i) = length(mi(i).dataToMapTo); +end + +% V2 data +% for each topology geometry pair, compute the data to fit +% ASSUME that the data dimensions across geometries is the same +% only differs across topologies at most. +% Despite this, we allow for different geometries within a topology +% to point to different data info array elements, just that all of +% these elements must have the same number of timepoints, measured species, +% replicates and dosing combinations. (version 3 of this code can be even +% more general, with each topo-geom pair getting its own cell. this will be +% slower.) + +da = cell(nTopo, 1); +emo = cell(nTopo, 1); +tv = cell(nTopo, 1); +for i = 1:nTopo % each topology + % each of the nGeom(i) geometries has a data info array element it points to. + % since the dimensions of these is assumed to be equal, we just use the first + % one to set the empty array: + data_info_element = mi(i).dataToMapTo(1); + currda = data_info(data_info_element).dataArray; + % Transform Experimental - compute mean or median or nothing + currda = computeFitOption(currda, fitOption); + da{i} = currda; + tv{i} = data_info(data_info_element).timeVector; + for j = 2:nGeom(i) % each geometry + data_info_element = mi(i).dataToMapTo(j); + currda = data_info(data_info_element).dataArray; + % Transform Experimental - compute mean or median or nothing + currda = computeFitOption(currda, fitOption); + % concatenate in the 5th dimension (the geometries dimension.) + da{i} = cat(5, da{i}, currda); + end + % EXPORT MODEL object to get it ready for MCMC + % the resulting object is of class SimBiology.export.Model + % documentation: + % https://www.mathworks.com/help/simbio/ref/simbiology.export.model-class.html + % Sven Mesecke's blog post on using the exported model class for + % parameter inference applicaton. 
+ % http://sveme.org/how-to-use-global-optimization-toolbox-algorithms-for- + % simbiology-parameter-estimation-in-parallel-part-i.html + + mobj = mi(i).modelObj; + + enuo = mi(i).namesUnord;% estimated names unordered + + ep = sbioselect(mobj, 'Type', 'parameter', 'Name', ... + mi(i).namesUnord);% est parameters + + es = sbioselect(mobj, 'Type', 'species', 'Name', ... + mi(i).namesUnord);% est species + + aps = [ep; es]; % active parameters and species + + % reorder the parameter and species so they are in the same order as that + % in the model. + eno = cell(length(aps), 1);% est names ordered + ds = sbioselect(mobj, 'Type', 'species', 'Name', mi(i).dosedNames); + emo{i} = export(mobj, [ep; es; ds]); % exported model object, dosed species names. + SI = emo{i}.SimulationOptions; + + % each of the nGeom(i) geometries has a data info array element it points to. + % since the dimensions of these is assumed to be equal, we just use the first + % one to set the empty array: + data_info_element = mi(i).dataToMapTo(1); + SI.StopTime = data_info(data_info_element).timeVector(end); + accelerate(emo{i}); + + mi(i).emo = emo{i}; % exported model object. + orderingIx = zeros(length(aps),1); + orderingIx2 = orderingIx; + for k = 1:length(aps) + eno{k} = aps(k).Name; + for kk = 1:length(enuo) + if strcmp(eno{k}, enuo{kk} ) + orderingIx(k) = kk; % eno = enuo(orderingIx); + % the kth element of orderingIx is kk. so the kth element of + % enuo(orderingIx) is enuo(kk). But this is just eno(k). And eno + % has the property of the kth element being eno(k). (as seen + % from "if eno{k} == enuo{kk} ") + + orderingIx2(kk) = k; %i.e., enuo = eno(orderingIx2); + % the kkth element of orderingIx2 is k. so the kk th element of + % eno(orderingIx2) is eno(k). But the vector with this property is + % simply enuo. (as seen from "if eno{k} == enuo{kk} ") + end + end + end + + mi(i).orderingIx = orderingIx; % these two arrays will be VERY useful. 
+ mi(i).orderingIx2 = orderingIx2; % this one being the second. + mi(i).namesOrd = eno; % est names ordered. +end + +end + diff --git a/mcmc_simbio/src/gen_residuals.m b/mcmc_simbio/src/gen_residuals.m new file mode 100644 index 0000000..6bfe328 --- /dev/null +++ b/mcmc_simbio/src/gen_residuals.m @@ -0,0 +1,144 @@ +function res = gen_residuals(logp, exportedMdlObj, data_array, timevec, ... + dosedInitVals, measuredSpecies, varargin) +% generate residuals for MCMC +% dosedInitVals needs to be a nDoseCombinations x nSpeciesToDose matrix. +% the concentration of this matrix needs some work from the dosedArray struct. +% logp should be a column vector. in exportedMdlObj, we already have the things +% that can be varied to be just the parameters to be estimated and the species that can be dosed, in that order. +% measuresSpecies is a cell array of strings. +% (unlike the struct in objFun_multiDoseSpecies.m) +% data_array is a length(timevec) * nDoseCombinations by nMeasuredSPecies array. +% the order of the data is the same as the order of the dosing, and needs to be maintained carefully +% +p = inputParser; +% estimation structure is a matrix that specifies which parameters in logp +% to apply to the individual exported model objects. Ie, which parameters +% in logp get estimated for each (model, data, timevec, dose, measured +% species) tuple. +p.addParameter('multiopt_params', [], @isnumeric); +p.parse(varargin{:}) +p = p.Results; +multiparam = p.multiopt_params; + +if ~iscell(exportedMdlObj) + + meanVals = mean(data_array); + + % meanVals = mean(reshape(data_array, [size(data_array, 1) size(data_array, 3)]),1); + wt = sum(meanVals)./meanVals; %hight mean = lower wt + %!TODO is the mean the right statistic, or is the median or max a better statistic? 
+ relWt = wt/sum(wt); + CONC_temp = zeros(length(timevec)*size(dosedInitVals,1), length(measuredSpecies)); + for i = 1:size(dosedInitVals,1) + + sd = simulate(exportedMdlObj, [exp(logp); dosedInitVals(i,:)']); + sd = resample(sd, timevec); + + spSD = selectbyname(sd, measuredSpecies); + % !TODO :test if i can just feed in the cell of strings. + CONC_temp((i-1)*length(timevec) + 1:(i-1)*length(timevec) +... + length(timevec), :) = spSD.Data; % and then just do this. + end + ExpData = data_array; + if isequal(size(CONC_temp), size(ExpData)) + residuals = repmat(relWt, size(CONC_temp,1),1).*(CONC_temp - ExpData); + res = residuals(:); + % scale the residuals by relative weight. THis emphasizes + % the species which are low conc, + % and deemphasizes species which are high in magnitude + % to make them all equally important + else + error('txtl_toolbox:objFun_multiDoseSpecies:incompatibleArrays',... + 'The simulation and experimental data need to be the same sizes.') + end +else + % start combined optimization mode + % check that all the arrays are in the right format. + % I think using cell arrays here is probably very slow. But lets try it + % nonetheless for generality + if ~iscell(data_array) || ~iscell(timevec) || ~iscell(dosedInitVals) ... + || ~iscell(measuredSpecies) + error('txtl_toolbox:genresiduals:inputsIncompatible1',... + 'Cell array expected for data, time vector, dose values, and measured species') + end + assert(isequal(length(exportedMdlObj), length(data_array), ... + length(timevec), length(dosedInitVals), length(measuredSpecies)), ... + 'txtl_toolbox:genresiduals:inputsIncompatible2',... 
+ 'The number of cells in the input arrays must be equal') + + nOpt = length(exportedMdlObj); +% preallocate redidual array for speed + +nres = zeros(nOpt, 1); + + + for kk = 1:nOpt + mS = measuredSpecies{kk}; + tv = timevec{kk}; + nms = length(mS); + nres(kk) = length(tv)*nms; + + end + res = zeros(sum(nres), 1); + + for kk = 1:nOpt + [~, ~, V] = find(multiparam(kk, :)); + lp = logp(V); + da = data_array{kk}; + dIV = dosedInitVals{kk}; + mS = measuredSpecies{kk}; + tv = timevec{kk}; + eMO = exportedMdlObj{kk}; + + meanVals = mean(da); + % meanVals = mean(reshape(data_array, [size(data_array, 1) size(data_array, 3)]),1); + wt = sum(meanVals)./meanVals; %hight mean = lower wt + %!TODO is the mean the right statistic, or is the median or max a better statistic? + relWt = wt/sum(wt); + CONC_temp = zeros(length(tv)*size(dIV,1), length(mS)); + for i = 1:size(dIV,1) + + sd = simulate(eMO, [exp(lp); dIV(i,:)']); + sd = resample(sd, tv); + + spSD = selectbyname(sd, mS); + % !TODO :test if i can just feed in the cell of strings. + CONC_temp((i-1)*length(tv) + 1:(i-1)*length(tv) +length(tv), :) ... + = spSD.Data; % and then just do this. + end + ExpData = da; + if isequal(size(CONC_temp), size(ExpData)) + residuals = repmat(relWt, size(CONC_temp,1),1).*(CONC_temp - ExpData); + residx = (sum(nres(1:(kk-1)))+1):sum(nres(1:kk)); + res(residx) = residuals(:); + % scale the residuals by relative weight. THis emphasizes + % the species which are low conc, + % and deemphasizes species which are high in magnitude + % to make them all equally important + else + error('simulation and experimental data must have same sizes') + end + + end + + % I am not sure how to get away from a for loop. + % !TODO: Ask AS, RMM, or Sam about code vectorization. + % in exportedMdlObjArray, we have already set the dosing. + % All that remains to be set is the parameters. 
+ + + + + % if i can somehow have a construct that only takes params as inputs, + % and then just runs the models and subtracts from the data, instead of + % doing the fol loop after the simulations. + % I dont think I can do that. So lets not try. zzz + + % can try interp1 or sd.resample to see which is faster. + % I think deciding which strategy is faster will require some testing. + + % for now lets just use the naive sd.resample method. + % If its abysmally slow, can try to modify this code to have interp1, + % with the measured species indices predetermined. +end + diff --git a/mcmc_simbio/src/gen_residuals2.m b/mcmc_simbio/src/gen_residuals2.m new file mode 100644 index 0000000..8607d7a --- /dev/null +++ b/mcmc_simbio/src/gen_residuals2.m @@ -0,0 +1,63 @@ +function res = gen_residuals2(logp, exportedMdlObj, data_array, timevec, ... + dosedInitVals, measuredSpecies) +% NOT COMPLETE YET +% this is different from gen_residuals in that the data_array here is +% #timepoints X #doses X #measured species. +% +% dosedInitVals needs to be a nDoseCombinations x nSpeciesToDose matrix. +% the concentration of this matrix needs some work from the dosedArray struct. +% +% logp should be a column vector. in exportedMdlObj, we already have the things +% that can be varied to be just the parameters to be estimated and the species that can be dosed, in that order. +% measuresSpecies is a cell array of strings. +% (unlike the struct in objFun_multiDoseSpecies.m) +% +meanVals = mean(mean(data_array,1),2); + +% meanVals = mean(reshape(data_array, [size(data_array, 1) size(data_array, 3)]),1); +wt = sum(meanVals)./meanVals; %hight mean = lower wt +%!TODO is the mean the right statistic, or is the median or max a better statistic? 
+relWt = wt/sum(wt); +CONC_temp = zeros(length(timevec)*size(dosedInitVals,1), length(measuredSpecies)); +for i = 1:size(dosedInitVals,1) + + sd = simulate(exportedMdlObj, [exp(logp); dosedInitVals(i,:)']); + sd = resample(sd, timevec); + + spSD = selectbyname(sd, measuredSpecies); + % !TODO :test if i can just feed in the cell of strings. + CONC_temp((i-1)*length(timevec) + 1:(i-1)*length(timevec) +length(timevec), :) = spSD.Data; % and then just do this. +end +ExpData = data_array; +if isequal(size(CONC_temp), size(ExpData)) + residuals = repmat(relWt, size(CONC_temp,1),1).*(CONC_temp - ExpData); + res = residuals(:); + % scale the residuals by relative weight. THis emphasizes + % the species which are low conc, + % and deemphasizes species which are high in magnitude + % to make them all equally important +else + error('txtl_toolbox:objFun_multiDoseSpecies:incompatibleArrays',... + 'The simulation and experimental data need to be the same sizes.') +end + +% I am not sure how to get away from a for loop. +% !TODO: Ask AS, RMM, or Sam about code vectorization. +% in exportedMdlObjArray, we have already set the dosing. +% All that remains to be set is the parameters. + + + + +% if i can somehow have a construct that only takes params as inputs, +% and then just runs the models and subtracts from the data, instead of +% doing the fol loop after the simulations. +% I dont think I can do that. So lets not try. zzz + +% can try interp1 or sd.resample to see which is faster. +% I think deciding which strategy is faster will require some testing. + +% for now lets just use the naive sd.resample method. +% If its abysmally slow, can try to modify this code to have interp1, +% with the measured species indices predetermined. 
+ diff --git a/mcmc_simbio/src/gen_residuals_3.m b/mcmc_simbio/src/gen_residuals_3.m new file mode 100644 index 0000000..80aba2e --- /dev/null +++ b/mcmc_simbio/src/gen_residuals_3.m @@ -0,0 +1,178 @@ +function res = gen_residuals_3(logp, em, data_array, tv, ... + dv, ms, varargin) +% generate residuals for MCMC. +% - logp is a vector of log transformed parameter (and species) values. +% +% - em is an exported simbiology model object +% +% - da (data array) is a matlab array of numbers with dimensions of size +% #timepoints x #measured variables x #ICs (aka dose combinations) +% +% - tv is a time vector, just a vector of timepoints in seconds +% +% - dv is a matrix of dose vals of size # species to +% dose x # dose combinations. We do not need to specify the names of the +% species to dose because +% that gets done when the exported model object gets made. +% +% - ms is a cell array of subcells of strings. so for example, we have +% {{'species a'}, {'species b', 'species c'}} then the first output of the +% model is the trajectory of species a, and the second output is the sum of +% the trajectores of species b and c. These two outputs will respectively +% correspond to the first column and the second column of the data array +% (for a given dose). Note that the strings 'species x' must correspond to +% species in the model object. +% +% There is also a combined optimization mode that will exist in the future. +% I have not really completed it yet, so ignore that for now. The idea is: +% It is a way to estimate shared parameters across models when I want +% to use different data to estimate sets of parameters that overlap in +% different ways across the data. +% +% Vipul Singhal, CIT 2017 + +p = inputParser; +% estimation structure is a matrix that specifies which parameters in logp +% to apply to the individual exported model objects. Ie, which parameters +% in logp get estimated for each (model, data, timevec, dose, measured +% species) tuple. 
+p.addParameter('multiopt_params', [], @isnumeric); +p.parse(varargin{:}) +p = p.Results; +multiparam = p.multiopt_params; +summed_trajectories = zeros(length(tv), 1); + +if ~iscell(em) + meanVals = mean(mean(data_array, 1), 3); % a 1 by # measured variables array + wt = sum(meanVals)./meanVals; %hight mean = lower wt + relWt = wt/sum(wt); % note that relWt is a row vector. + + CONC_temp = zeros(length(tv), length(ms), size(dv,2)); + for i = 1:size(dv,2) + sd = simulate(em, [exp(logp); dv(:,i)]); + sd = resample(sd, tv); + for j = 1:length(ms) + measuredspecies = ms{j}; + spSD = selectbyname(sd, measuredspecies); + summed_trajectories = sum(spSD.Data, 2); + CONC_temp(:, j, i) = summed_trajectories; + end + end + + ExpData = data_array; + if isequal(size(CONC_temp), size(ExpData)) % both must be #timepoints x + % # measured species x # dose combinations + relWt_tiled = repmat(relWt, size(CONC_temp,1), 1, size(CONC_temp,3)); + residuals = relWt_tiled.*(CONC_temp - ExpData); + res = residuals(:); +% scaledsimdata = relWt_tiled.*(CONC_temp); +% scaleddata = relWt_tiled.*(ExpData); +% figure +% for ccol = 1:2 +% for rrow = 1:4 +% subplot(4, 2, (rrow-1)*2+ccol) +% plot(tv, CONC_temp(:,ccol, rrow), 'r', tv, scaledsimdata(:,ccol, rrow), 'b'); +% hold on +% plot(tv, ExpData(:,ccol, rrow), ':r', tv, scaleddata(:,ccol, rrow), ':b'); +% hold on +% plot(tv, residuals(:,ccol, rrow), 'k'); +% end +% end + +% figure +% for ccol = 1:2 +% for rrow = 1:4 +% subplot(4, 2, (rrow-1)*2+ccol) +% plot(tv, scaledsimdata(:,ccol, rrow), 'b'); +% hold on +% plot(tv, scaleddata(:,ccol, rrow), ':b'); +% hold on +% plot(tv, residuals(:,ccol, rrow), 'k'); +% end +% end + + % scale the residuals by relative weight. THis emphasizes + % the species which are low conc, + % and deemphasizes species which are high in magnitude + % to make them all equally important + else + error('txtl_toolbox:objFun_multiDoseSpecies:incompatibleArrays',... 
+        'The simulation and experimental data need to be the same sizes.')
+    end
+else
+    % start combined optimization mode % write this properly later
+    % check that all the arrays are in the right format.
+    % I think using cell arrays here is probably very slow. But let's try it
+    % nonetheless for generality
+    if ~iscell(data_array) || ~iscell(tv) || ~iscell(dv) ...
+            || ~iscell(ms)
+        error('txtl_toolbox:genresiduals:inputsIncompatible1',...
+            'Cell array expected for data, time vector, dose values, and measured species')
+    end
+    assert(isequal(length(em), length(data_array), ...
+        length(tv), length(dv), length(ms)), ...
+        'txtl_toolbox:genresiduals:inputsIncompatible2',...
+        'The number of cells in the input arrays must be equal')
+
+    nOpt = length(em);
+% preallocate residual array for speed
+
+    nres = zeros(nOpt, 1);
+
+
+    for kk = 1:nOpt
+        mS = ms{kk};
+        tvk = tv{kk}; % local copy, so the cell array tv survives for later iterations
+        nms = length(mS);
+        nres(kk) = length(tvk)*size(dv{kk},1)*nms; % timepoints x doses x measured species
+
+    end
+    res = zeros(sum(nres), 1);
+
+    for kk = 1:nOpt
+        [~, ~, V] = find(multiparam(kk, :));
+        lp = logp(V);
+        da = data_array{kk};
+        dIV = dv{kk};
+        mS = ms{kk};
+        tvk = tv{kk}; % local copy, do not overwrite the cell array tv
+        eMO = em{kk};
+
+        meanVals = mean(da);
+        % meanVals = mean(reshape(data_array, [size(data_array, 1) size(data_array, 3)]),1);
+        wt = sum(meanVals)./meanVals; % high mean = lower wt
+        %!TODO is the mean the right statistic, or is the median or max a better statistic?
+        relWt = wt/sum(wt);
+        CONC_temp = zeros(length(tvk)*size(dIV,1), length(mS));
+        for i = 1:size(dIV,1)
+
+            sd = simulate(eMO, [exp(lp); dIV(i,:)']);
+            sd = resample(sd, tvk);
+
+            spSD = selectbyname(sd, mS);
+            % !TODO: test if I can just feed in the cell of strings.
+            CONC_temp((i-1)*length(tvk) + 1:(i-1)*length(tvk) +length(tvk), :) ...
+                = spSD.Data; % and then just do this.
+ end + ExpData = da; + if isequal(size(CONC_temp), size(ExpData)) + residuals = repmat(relWt, size(CONC_temp,1),1).*(CONC_temp - ExpData); + residx = (sum(nres(1:(kk-1)))+1):sum(nres(1:kk)); + res(residx) = residuals(:); + % scale the residuals by relative weight. THis emphasizes + % the species which are low conc, + % and deemphasizes species which are high in magnitude + % to make them all equally important + else + error('simulation and experimental data must have same sizes') + end + + end + % can try interp1 or sd.resample to see which is faster. + % I think deciding which strategy is faster will require some testing. + + % for now lets just use the naive sd.resample method. + % If its abysmally slow, can try to modify this code to have interp1, + % with the measured species indices predetermined. +end + diff --git a/mcmc_simbio/src/gen_residuals_3_debug.m b/mcmc_simbio/src/gen_residuals_3_debug.m new file mode 100644 index 0000000..4f1e7a2 --- /dev/null +++ b/mcmc_simbio/src/gen_residuals_3_debug.m @@ -0,0 +1,179 @@ +function [res, residuals, scaledsimdata, scaleddata] ... + = gen_residuals_3_debug(logp, em, data_array, tv, ... + dv, ms, varargin) +% generate residuals for MCMC. +% - logp is a vector of log transformed parameter (and species) values. +% +% - em is an exported simbiology model object +% +% - da (data array) is a matlab array of numbers with dimensions of size +% #timepoints x #measured variables x #ICs (aka dose combinations) +% +% - tv is a time vector, just a vector of timepoints in seconds +% +% - dv is a matrix of dose vals of size # species to +% dose x # dose combinations. We do not need to specify the names of the +% species to dose because +% that gets done when the exported model object gets made. +% +% - ms is a cell array of subcells of strings. 
so for example, we have +% {{'species a'}, {'species b', 'species c'}} then the first output of the +% model is the trajectory of species a, and the second output is the sum of +% the trajectores of species b and c. These two outputs will respectively +% correspond to the first column and the second column of the data array +% (for a given dose). Note that the strings 'species x' must correspond to +% species in the model object. +% +% There is also a combined optimization mode that will exist in the future. +% I have not really completed it yet, so ignore that for now. The idea is: +% It is a way to estimate shared parameters across models when I want +% to use different data to estimate sets of parameters that overlap in +% different ways across the data. +% +% Vipul Singhal, CIT 2017 + +p = inputParser; +% estimation structure is a matrix that specifies which parameters in logp +% to apply to the individual exported model objects. Ie, which parameters +% in logp get estimated for each (model, data, timevec, dose, measured +% species) tuple. +p.addParameter('multiopt_params', [], @isnumeric); +p.parse(varargin{:}) +p = p.Results; +multiparam = p.multiopt_params; +summed_trajectories = zeros(length(tv), 1); + +if ~iscell(em) + meanVals = mean(mean(data_array, 1), 3); % a 1 by # measured variables array + wt = sum(meanVals)./meanVals; %hight mean = lower wt + relWt = wt/sum(wt); % note that relWt is a row vector. 
+ + CONC_temp = zeros(length(tv), length(ms), size(dv,2)); + for i = 1:size(dv,2) + sd = simulate(em, [exp(logp); dv(:,i)]); + sd = resample(sd, tv); + for j = 1:length(ms) + measuredspecies = ms{j}; + spSD = selectbyname(sd, measuredspecies); + summed_trajectories = sum(spSD.Data, 2); + CONC_temp(:, j, i) = summed_trajectories; + end + end + + ExpData = data_array; + if isequal(size(CONC_temp), size(ExpData)) % both must be #timepoints x + % # measured species x # dose combinations + relWt_tiled = repmat(relWt, size(CONC_temp,1), 1, size(CONC_temp,3)); + residuals = relWt_tiled.*(CONC_temp - ExpData); + res = residuals(:); + scaledsimdata = relWt_tiled.*(CONC_temp); + scaleddata = relWt_tiled.*(ExpData); +% figure +% for ccol = 1:2 +% for rrow = 1:4 +% subplot(4, 2, (rrow-1)*2+ccol) +% plot(tv, CONC_temp(:,ccol, rrow), 'r', tv, scaledsimdata(:,ccol, rrow), 'b'); +% hold on +% plot(tv, ExpData(:,ccol, rrow), ':r', tv, scaleddata(:,ccol, rrow), ':b'); +% hold on +% plot(tv, residuals(:,ccol, rrow), 'k'); +% end +% end + +% figure +% for ccol = 1:2 +% for rrow = 1:4 +% subplot(4, 2, (rrow-1)*2+ccol) +% plot(tv, scaledsimdata(:,ccol, rrow), 'b'); +% hold on +% plot(tv, scaleddata(:,ccol, rrow), ':b'); +% hold on +% plot(tv, residuals(:,ccol, rrow), 'k'); +% end +% end + + % scale the residuals by relative weight. THis emphasizes + % the species which are low conc, + % and deemphasizes species which are high in magnitude + % to make them all equally important + else + error('txtl_toolbox:objFun_multiDoseSpecies:incompatibleArrays',... + 'The simulation and experimental data need to be the same sizes.') + end +else + % start combined optimization mode % write this properly later + % check that all the arrays are in the right format. + % I think using cell arrays here is probably very slow. But lets try it + % nonetheless for generality + if ~iscell(data_array) || ~iscell(tv) || ~iscell(dv) ... + || ~iscell(ms) + error('txtl_toolbox:genresiduals:inputsIncompatible1',... 
+            'Cell array expected for data, time vector, dose values, and measured species')
+    end
+    assert(isequal(length(em), length(data_array), ...
+        length(tv), length(dv), length(ms)), ...
+        'txtl_toolbox:genresiduals:inputsIncompatible2',...
+        'The number of cells in the input arrays must be equal')
+
+    nOpt = length(em);
+% preallocate residual array for speed
+
+    nres = zeros(nOpt, 1);
+
+
+    for kk = 1:nOpt
+        mS = ms{kk};
+        tvk = tv{kk}; % local copy, so the cell array tv survives for later iterations
+        nms = length(mS);
+        nres(kk) = length(tvk)*size(dv{kk},1)*nms; % timepoints x doses x measured species
+
+    end
+    res = zeros(sum(nres), 1);
+
+    for kk = 1:nOpt
+        [~, ~, V] = find(multiparam(kk, :));
+        lp = logp(V);
+        da = data_array{kk};
+        dIV = dv{kk};
+        mS = ms{kk};
+        tvk = tv{kk}; % local copy, do not overwrite the cell array tv
+        eMO = em{kk};
+
+        meanVals = mean(da);
+        % meanVals = mean(reshape(data_array, [size(data_array, 1) size(data_array, 3)]),1);
+        wt = sum(meanVals)./meanVals; % high mean = lower wt
+        %!TODO is the mean the right statistic, or is the median or max a better statistic?
+        relWt = wt/sum(wt);
+        CONC_temp = zeros(length(tvk)*size(dIV,1), length(mS));
+        for i = 1:size(dIV,1)
+
+            sd = simulate(eMO, [exp(lp); dIV(i,:)']);
+            sd = resample(sd, tvk);
+
+            spSD = selectbyname(sd, mS);
+            % !TODO: test if I can just feed in the cell of strings.
+            CONC_temp((i-1)*length(tvk) + 1:(i-1)*length(tvk) +length(tvk), :) ...
+                = spSD.Data; % and then just do this.
+        end
+        ExpData = da;
+        if isequal(size(CONC_temp), size(ExpData))
+            residuals = repmat(relWt, size(CONC_temp,1),1).*(CONC_temp - ExpData);
+            residx = (sum(nres(1:(kk-1)))+1):sum(nres(1:kk));
+            res(residx) = residuals(:);
+            % scale the residuals by relative weight. This emphasizes
+            % the species which are low conc,
+            % and deemphasizes species which are high in magnitude
+            % to make them all equally important
+        else
+            error('simulation and experimental data must have same sizes')
+        end
+
+    end
+    % can try interp1 or sd.resample to see which is faster.
+    % I think deciding which strategy is faster will require some testing.
+ + % for now lets just use the naive sd.resample method. + % If its abysmally slow, can try to modify this code to have interp1, + % with the measured species indices predetermined. +end + diff --git a/mcmc_simbio/src/gen_residuals_4.m b/mcmc_simbio/src/gen_residuals_4.m new file mode 100644 index 0000000..aeb8006 --- /dev/null +++ b/mcmc_simbio/src/gen_residuals_4.m @@ -0,0 +1,82 @@ +function llike = gen_residuals_4(logp, em, data_array, tv, ... + dv, ms, logresvec, stdev) +%{ +% This code is an intermediary between existing code (gen_residuals_3) and +% the future code that will be the final version. Basically here I +% prototype the capability for the computation of the log likelihood a bit. +% In particular I try to reduce the size of the matrices that must be kept +in memory by computing the log likelihood in parts. +% +% +% - logp is a vector of log transformed parameter (and species) values. +% +% - em is an exported simbiology model object +% +% - da (data array) is a matlab array of numbers with dimensions: +% dim 1: has the length of tv +% dim 2: species (length is # of measured species) +% dim 3: replicates (#replicates) +% dim 4: dosing / ICs +% +% +% - tv is a time vector, just a vector of timepoints in seconds +% +% - dv is a matrix of dose vals of size # species to +% dose x # dose combinations. We do not need to specify the names of the +% species to dose because +% that gets done when the exported model object gets made. +% +% - ms is a cell array of subcells of strings. so for example, we have +% {{'species a'}, {'species b', 'species c'}} then the first output of the +% model is the trajectory of species a, and the second output is the sum of +% the trajectores of species b and c. These two outputs will respectively +% correspond to the first column and the second column of the data array +% (for a given dose). Note that the strings 'species x' must correspond to +% species in the model object. 
+% +% There is also a combined optimization mode that I will attempt to build here +% soon The idea is: +% It is a way to estimate shared parameters across models when I want +% to use different data to estimate sets of parameters that overlap in +% different ways across the data. +% +% Vipul Singhal, CIT 2017 + %} + + meanVals = mean(mean(mean(data_array, 1), 3), 4); + % a 1 by # measured variables array + wt = sum(meanVals)./meanVals; %hight mean = lower wt + relWt = wt/sum(wt); % note that relWt is a row vector. + + CONC_temp = zeros(length(tv), length(ms), 1, size(dv,2)); + llike = 0; + for i = 1:size(dv,2) + sd = simulate(em, [exp(logp); dv(:,i)]); + sd = resample(sd, tv); + for j = 1:length(ms) + % COMPUTE THE SIMULATED TRAJECTORY + % each set of measures species to sum - can remove the loop + % if each species is individual.. in the main version have a + % different mode + measuredspecies = ms{j}; + spSD = selectbyname(sd, measuredspecies); + summed_trajectories = sum(spSD.Data, 2); + CONC_temp(:, j,1, i) = summed_trajectories; + + % COMPUTE THE RESIDUAL - can use repmat here because the + % matrices are probably not big enough to slow the code down. + % On the other hand, the for loop to do the replicates might + % actually slow the code down. + relWt_tiled = repmat(relWt(1,j), size(CONC_temp,1), 1, size(data_array,3)); + replicatedsimdata = repmat(CONC_temp(:, j,1, i), [1, 1, size(data_array, 3)]); + residuals = relWt_tiled.*(replicatedsimdata - data_array(:, j, :, i)); + res = residuals(:); + llike = llike + sum(logresvec(res, stdev)); + end + end + +end + + + + diff --git a/mcmc_simbio/src/gen_residuals_5.m b/mcmc_simbio/src/gen_residuals_5.m new file mode 100644 index 0000000..0cb1477 --- /dev/null +++ b/mcmc_simbio/src/gen_residuals_5.m @@ -0,0 +1,130 @@ +function llike = gen_residuals_5(logp, em, data_array, tv, ... + dv, ms, logresvec, stdev, parametermap) + +%{ + % Actually the simplest thing I can do rigth now is just shared CSPs. 
+ % this pretty much follows the function log_likelihood_sharedCSP.m + % so 1 model topo. and two geometries (extracts)... or actually any number + % of geometries. and a spec for which + params are the CSPs, and which are the extract specific parameters and + extract specific species. + % + + THIS VERSION OF THE CODE IS THE FIRST CUT AT BUILDING MODELS WHERE + PARAMETERS ARE SHARED ACROSS GEOMETRIES AND TOPOLOGIES +% This version of the code is not the one where we have fully general paraneter sharing +across topologies and geometries (that comes later, and I suspect will be +slower). Here we simply have up to 2 topologies (calibration and test) and +2 geometries (before going to the full arbitrary sharing, we will +generalize this to arbitrary # of topos and geos, but still in the crossed +calib-corr method. + + +% in the more general code: em is an array of exported model objects. Here +we just have em1 and em2 for the two topologies. +% +% I will start with considering the following modes: + x0b, x2b, x0, +% +% - logp is a vector of log transformed parameter (and species) values. +% +% - em is an exported simbiology model object +% +% - da (data array) is a matlab array of numbers with dimensions: +% dim 1: has the length of tv +% dim 2: species (length is # of measured species) +% dim 3: replicates (#replicates) +% dim 4: dosing / ICs +% dim 5: extracts (geometries) +% +% +% - tv is a time vector, just a vector of timepoints in seconds +% +% - dv is a matrix of dose vals of size # species to +% dose x # dose combinations. We do not need to specify the names of the +% species to dose because +% that gets done when the exported model object gets made. +% +% - ms is a cell array of subcells of strings. so for example, we have +% {{'species a'}, {'species b', 'species c'}} then the first output of the +% model is the trajectory of species a, and the second output is the sum of +% the trajectores of species b and c. 
These two outputs will respectively +% correspond to the first column and the second column of the data array +% (for a given dose). Note that the strings 'species x' must correspond to +% species in the model object. +% +% There is also a combined optimization mode that I will attempt to build here +% soon The idea is: +% It is a way to estimate shared parameters across models when I want +% to use different data to estimate sets of parameters that overlap in +% different ways across the data. +% +% Vipul Singhal, CIT 2017 + %} + + + + espIX = parametermap{1}; + cspIX = parametermap{2}; + nESP = length(espIX); % the ESP indices in the model (not in logpjoint) + nCSP = length(cspIX); % the CSP indices in the model (not in logpjoint) + + + nEnv = size(data_array, 5); + meanVals = mean(mean(mean(mean(data_array, 1), 3), 4), 5); + % a 1 by # measured variables array + wt = sum(meanVals)./meanVals; %hight mean = lower wt + relWt = wt/sum(wt); % note that relWt is a row vector. + + CONC_temp = zeros(length(tv), length(ms)); + % dont need the other dimensions! remove from gen_residuals_4 too. + + + paramvec = zeros(nESP+nCSP,1); + cspindices = (nESP*nEnv+1):length(logp); + logpcsp = logp(cspindices); + paramvec(cspIX) = logpcsp; + + + llike = 0; + for envID = 1:nEnv + % pick out the relevant parameters from the joint parameter vector + espindices = (envID-1)*nESP + (1:nESP); + logpesp = logp(espindices); + paramvec(espIX) = logpesp; + + + % set the vector of parameters and species that get estimated (ie + % non dosing values to simulate with) + + for i = 1:size(dv,2) + sd = simulate(em, [exp(paramvec); dv(:,i)]); + sd = resample(sd, tv); + for j = 1:length(ms) + % COMPUTE THE SIMULATED TRAJECTORY + % each set of measures species to sum - can remove the loop + % if each species is individual.. 
in the main version have a + % different mode + measuredspecies = ms{j}; + spSD = selectbyname(sd, measuredspecies); + summed_trajectories = sum(spSD.Data, 2); + CONC_temp(:, j) = summed_trajectories; + + % COMPUTE THE RESIDUAL - can use repmat here because the + % matrices are probably not big enough to slow the code down. + % On the other hand, the for loop to do the replicates might + % actually slow the code down. + relWt_tiled = repmat(relWt(1,j), size(CONC_temp,1), 1, size(data_array,3)); + replicatedsimdata = repmat(CONC_temp(:, j), [1, 1, size(data_array, 3)]); + residuals = relWt_tiled.*(replicatedsimdata - data_array(:, j, :, i,envID)); + res = residuals(:); + llike = llike + sum(logresvec(res, stdev)); + end + end + end + +end + + + + diff --git a/mcmc_simbio/src/gen_residuals_v2.m b/mcmc_simbio/src/gen_residuals_v2.m new file mode 100644 index 0000000..0f6339e --- /dev/null +++ b/mcmc_simbio/src/gen_residuals_v2.m @@ -0,0 +1,160 @@ +function llike = gen_residuals_v2(logp, estParamIx, fixedMasterVec, data_array,... + timeVec, mi, logresvec, stdev) +% This code is for computing the log likelihood with parameters spread out over +% multiple models (network topologies) - geometries. +%{ OLD DOCUMENTATION from gen_residuals_4 +% This code is an intermediary between existing code (gen_residuals_3) and +% the future code that will be the final version. Basically here I +% prototype the capability for the computation of the log likelihood a bit. +% In particular I try to reduce the size of the matrices that must be kept +% in memory by computing the log likelihood in parts. +% +% +% - logp is a vector of log transformed parameter (and species) values. 
+% +% - em is an exported simbiology model object +% +% - da (data array) is a matlab array of numbers with dimensions: +% dim 1: has the length of tv +% dim 2: species (length is # of measured species) +% dim 3: replicates (#replicates) +% dim 4: dosing / ICs +% +% +% - tv is a time vector, just a vector of timepoints in seconds +% +% - dv is a matrix of dose vals of size # species to +% dose x # dose combinations. We do not need to specify the names of the +% species to dose because +% that gets done when the exported model object gets made. +% +% - mspecies is a cell array of subcells of strings. so for example, we have +% {{'species a'}, {'species b', 'species c'}} then the first output of the +% model is the trajectory of species a, and the second output is the sum of +% the trajectores of species b and c. These two outputs will respectively +% correspond to the first column and the second column of the data array +% (for a given dose). Note that the strings 'species x' must correspond to +% species in the model object. +% +% There is also a combined optimization mode that I will attempt to build here +% soon The idea is: +% It is a way to estimate shared parameters across models when I want +% to use different data to estimate sets of parameters that overlap in +% different ways across the data. +% +% Vipul Singhal, CIT 2017 + %} + + % the unpacking happens in steps. (quite similar to integrableLHS_v2) + +fixedMasterVec(estParamIx) = logp; +fullMasterVec = fixedMasterVec; + + + + + + + llike = 0; + + for kk = 1:length(mi) % for each topo + if isfield(mi(kk), 'experimentWeighting') + if ~isempty(mi(kk).experimentWeighting) + % The relative importance of this topology + topoWeight = mi(kk).experimentWeighting; % a scalar number + else + topoWeight = 1; + end + else + topoWeight = 1; + end + + + pmaps = mi(kk).paramMaps; + + % ds = struct('names', {mi(kk).dosednames},... 
+ % 'dosematrix', mi(kk).dosedvals); + + dv = mi(kk).dosedVals; % can doses be reordered in the export process? +% This is very important to check. + if isfield(mi(kk), 'doseWeighting') + % the dose weighting muse have size 1 by number of dose + % combintations. + if isequal(size(mi(kk).doseWeighting), [1 size(dv,2)]) + + % The ralative importance of this topology is given by + doseWeight = mi(kk).doseWeighting; + else + % otherwise, all the doses are equally weighted. + doseWeight = ones(1,size(dv,2)); + end + else + doseWeight = ones(1,size(dv,2)); + end + + em = mi(kk).emo; + mspecies = mi(kk).measuredSpecies; + % for each geom + for hh = 1:size(pmaps, 2) + + % pmaps is in the order defined by the unordered list of names + % in each model info. we need to reorder these indices. + pIX_tg = pmaps(mi(kk).orderingIx, hh); + % THIS REORDERING STEP IS VERY IMPORTANT. + + pvec_tg = fullMasterVec(pIX_tg); + da = data_array{kk}(:, :, :, :, hh); + tv = timeVec{kk}; + + meanVals = mean(mean(mean(da, 1), 3), 4); + % a 1 by # measured variables array + wt = sum(meanVals)./meanVals; %hight mean = lower wt + relWt = wt/sum(wt); % note that relWt is a row vector. + + CONC_temp = zeros(length(tv), length(mspecies), 1, size(dv,2)); + + for ii = 1:size(dv,2) + + % pvec_tg needs to be in the ordered state, ie, + % mi(kk).namesOrd. + + %try + sd = simulate(em, [exp(pvec_tg); dv(:,ii)]); + %catch ME + % disp(ME.identifier); + %end + + sd = resample(sd, tv); + for jj = 1:length(mspecies) + % COMPUTE THE SIMULATED TRAJECTORY + % each set of measures species to sum - can remove the loop + % if each species is individual.. in the main version have a + % different mode + measuredspecies = mspecies{jj}; + spSD = selectbyname(sd, measuredspecies); + summed_trajectories = sum(spSD.Data, 2); + CONC_temp(:, jj,1, ii) = summed_trajectories; + + % COMPUTE THE RESIDUAL - can use repmat here because the + % matrices are probably not big enough to slow the code down. 
+ % On the other hand, the for loop to do the replicates might + % actually slow the code down. + relWt_tiled = repmat(relWt(1,jj), size(CONC_temp,1), 1, size(da,3)); + replicatedsimdata = repmat(CONC_temp(:, jj,1, ii), [1, 1, size(da, 3)]); + residuals = relWt_tiled.*(replicatedsimdata - da(:, jj, :, ii)); + + % multiply the residuals with the topology's relative + % importance, and the dose's relative importance. + res = topoWeight*doseWeight(ii)*residuals(:); + llike = llike + sum(logresvec(res, stdev)); + end + end + end + end + + +end + + + + diff --git a/mcmc_simbio/src/generateStandardPlots.m b/mcmc_simbio/src/generateStandardPlots.m new file mode 100644 index 0000000..0c7f114 --- /dev/null +++ b/mcmc_simbio/src/generateStandardPlots.m @@ -0,0 +1,55 @@ +function mcat_l10 = generateStandardPlots(datafiles, titlestr, legends, plotmode) +% datafiles need to be from the same batch, so that things like +% {'simulatedDataMatrix', 'dosedInitVals','measuredSpecies',... +% 'exportedMdlObj','tspan'}; +% are compatible + + +if strcmp(plotmode, 'txtl') + vars = {'simulatedDataMatrix', 'dosedInitVals','measuredSpecies',... + 'exportedMdlObj','tspan'}; + load(datafiles{1}, vars{:}); + mcat = catMC(datafiles); + mcat_l10 = mcat/log(10); + plotEstimTraces003(mcat,exportedMdlObj,tspan, ... + simulatedDataMatrix, dosedInitVals,... + measuredSpecies);% , 'paramID', [1 2 3] +elseif strcmp(plotmode, 'simplemodel') + + mcat = catMC(datafiles); + mcat_l10 = mcat/log(10); +end + +clear mcat + +plotChains(mcat_l10, 100, legends); +suptitle(titlestr) + +figure +[C,lags,ESS]=eacorr(mcat_l10); +plot(lags,C,'.-',lags([1 end]),[0 0],'k'); +grid on +xlabel('lags') +ylabel('autocorrelation'); +text(lags(end),0,sprintf('Effective Sample Size (ESS): %.0f_ ',ceil(mean(ESS))),... + 'verticalalignment','bottom','horizontalalignment','right') +suptitle(titlestr) +% +figure; +ecornerplot_vse(mcat_l10,... + 'ess', 30,'ks',true, 'color',[.6 .35 .3], ... 
+ 'names', legends, 'fontsize', 16) +% suptitle(titlestr) + +figure; +ecornerplot_vse(mcat_l10,... + 'scatter', true,'transparency',0.01, 'color',[.6 .35 .3], ... + 'names', legends); +suptitle(titlestr); + + + + + +end + diff --git a/mcmc_simbio/src/gwmcmc/.gitattributes b/mcmc_simbio/src/gwmcmc/.gitattributes new file mode 100755 index 0000000..bdb0cab --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/.gitattributes @@ -0,0 +1,17 @@ +# Auto detect text files and perform LF normalization +* text=auto + +# Custom for Visual Studio +*.cs diff=csharp + +# Standard to msysgit +*.doc diff=astextplain +*.DOC diff=astextplain +*.docx diff=astextplain +*.DOCX diff=astextplain +*.dot diff=astextplain +*.DOT diff=astextplain +*.pdf diff=astextplain +*.PDF diff=astextplain +*.rtf diff=astextplain +*.RTF diff=astextplain diff --git a/mcmc_simbio/src/gwmcmc/.gitignore b/mcmc_simbio/src/gwmcmc/.gitignore new file mode 100755 index 0000000..958969c --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/.gitignore @@ -0,0 +1,94 @@ + + +repoexclude/ +demos/*.zip +demos/*/ + + +# Windows image file caches +Thumbs.db +ehthumbs.db + +# Folder config file +Desktop.ini + +# Recycle Bin used on file shares +$RECYCLE.BIN/ + +# Windows Installer files +*.cab +*.msi +*.msm +*.msp + + +# ========================= +# Operating System Files +# ========================= + +# OSX +# ========================= + +.DS_Store +.AppleDouble +.LSOverride + +# Icon must ends with two \r. +Icon + + +# Thumbnails +._* + +# Files that might appear on external disk +.Spotlight-V100 +.Trashes + +# +# List of files that should be excluded from the public repository. +# + +#this is for preparing the example data +prepare*.m + +#Standard exclude files for matlab: +*.asv +*.m~ +*.mex* +slprj/ + + + +#standard gitignore rules. .. 
Some not relevant for this particular project but excluded anyway: +#compiled +*.com +*.pyc +*.class +*.dll +*.exe +*.o +*.so + +#compressed files +*.zip +*.7z +*.dmg +*.gz +*.iso +*.jar +*.rar +*.tar + +#logs & dbs +*.log +*.sql +*.sqllite + +#OS generated files +.DS_Store +.DS_Store? +*._ +.Spotlight-V100 +*.Trashes +ehthumbs.db +Thumbs.db diff --git a/mcmc_simbio/src/gwmcmc/README.md b/mcmc_simbio/src/gwmcmc/README.md new file mode 100755 index 0000000..96e0966 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/README.md @@ -0,0 +1,36 @@ + +GWMCMC +======================= + +GWMCMC is an implementation of the Goodman and Weare 2010 Affine +invariant ensemble Markov Chain Monte Carlo (MCMC) sampler. MCMC sampling +enables bayesian inference. The problem with many traditional MCMC samplers +is that they can have slow convergence for badly scaled problems, and that +it is difficult to optimize the random walk for high-dimensional problems. +This is where the GW-algorithm really excels as it is affine invariant. It +can achieve much better convergence on badly scaled problems. It is much +simpler to get to work straight out of the box, and for that reason it +truly deserves to be called the MCMC hammer. + +![line fitting example ecornerplot](html/ex_linefit_05.png) + +Authors: [Aslak Grinsted](http://www.glaciology.net) + + +## Examples + ++ [Line fitting example](html/ex_linefit.md) ++ [Rosenbrock banana](html/ex_rosenbrockbanana.md) ++ [Be happy](html/ex_behappy.md) ++ [Fitting a trend change model](html/ex_breakfit.md) + + +## Licensing + +The majority of the code is licensed under a very permissive MIT license, but some routines and example data are licensed under other terms. See licensing details in LICENSE.txt and individual files. + + + +## Acknowledgements + +This software has been developed at [Centre for Ice and Climate](http://www.iceandclimate.nbi.ku.dk), Niels Bohr Institute, University of Copenhagen. 
It is partly inspired by emcee for python, but not modelled after it. diff --git a/mcmc_simbio/src/gwmcmc/animation.m b/mcmc_simbio/src/gwmcmc/animation.m new file mode 100755 index 0000000..2f7ec25 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/animation.m @@ -0,0 +1,83 @@ +n=netcdfobj('C:\Users\Aslak\HugeData\GriddedData\had4_krig_v2_0_0.nc'); + +T=permute(n.vars.temperature_anomaly.value,[2 1 3]); +x=n.vars.longitude.value; +y=n.vars.latitude.value; +[X,Y]=meshgrid(x,y); +up=5; +X=imresize(X,up,'bilinear');Y=imresize(Y,up,'bilinear'); +t=double(n.vars.time.value); +ix=tsM(2) + m=permute(m,[2 3 1]); + else + m=permute(m,[1 3 2]); + end +end + + +M=size(m,1);W=size(m,2);T=size(m,3); +N=W*T; + +lags=(0:T-1)'; +nfft = 2^nextpow2(2*T-1); +C=nan(T,M); +for mix=1:M + c=zeros(T,1); + mm=mean(m(mix,:)); %center data using ensemble average! + for wix=1:W + d=m(mix,wix,:); + d=d(:)-mm; + r=ifft( abs(fft(d,nfft)).^2); + c=c+r(1:T); + end +% C(:,mix)=c./(T-lags); %biased/unbiased +% C(:,mix)=C(:,mix)./C(1,mix); + C(:,mix)=c./c(1); +end +if isreal(m) + C=real(C); +end + +if nargout>2 + ESS=nan(1,size(C,2)); + %we use N/(1+2*sum(ACF)) (eqn 7.11 in DBDA2) + %Here, I assume that the ACF can be approximated with exp(-k/lag) + + for ii=1:size(C,2) + kix=find(C(:,ii)<=0.5,1);%we determine k at the lag where C~=0.5; + if isempty(kix), kix=2; end %use lag1 as fall-back for short chains. TODO:warn? + if (C(kix,ii)<0.05)&&(kix==2), ESS(ii)=N;continue;end %essentially no autocorrelation... 
+ k=-log(C(kix,ii))./lags(kix); + sumACF=1/(exp(k)-1); %http://functions.wolfram.com/ElementaryFunctions/Exp/23/01/0001/ + ESS(ii)=N/(1+2*sumACF); + end +end + +if nargout==0 + plot(lags,C,'.-',lags([1 end]),[0 0],'k'); + grid on + xlabel('lags') + ylabel('autocorrelation'); + clearvars lags C ESS +end diff --git a/mcmc_simbio/src/gwmcmc/ecornerplot.m b/mcmc_simbio/src/gwmcmc/ecornerplot.m new file mode 100755 index 0000000..d907aff --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/ecornerplot.m @@ -0,0 +1,198 @@ +function H=ecornerplot(m,varargin) +%% Corner plot with allowance for effective sample size +% +% ecornerplot(m,[parameter,value]) +% +% INPUTS: +% m: a matrix of values that should be plotted in the corner plot. +% +% +% When m is a 3d matrix (ndims(m)==3), then it is assumed to have the form +% MxWxT as output from GWMCMC, where M is the number of parameters, W is +% the number of walkers, and T is the number of steps in each markov chain. +% +% +% NAMED PARAMETERS: +% range: Restrict visual limits to central [99.5] percentile. +% names: A cell of strings with labels for each parameter in m. +% ks: enable kernel smoothing density instead of histograms [false] +% support: a 2xM matrix with upper and lower limits. +% ess: effective sample size. [default: auto-determine ess using EACORR.] +% - used to adjust bandwidth estimates in kernel density estimates. +% scatter: show scatter plot instead of 2d-kernel density estimate [true if #points<2000]. +% fullmatrix: display upper corner of plotmatrix. [false] +% color: A color-theme for the plot. [.5 .5 .5]. +% grid: show grid. [false]. 
+% +% +% Notes: The 2d kernel density contours are plotted as the highest density +% contours at intervals of 10%:20%:90% +% +% EXAMPLE: +% mu = [1 -1 1]; C= [.9 .4 .1; .4 .3 .2; .1 .2 1]; +% m=mvnrnd(mu,C,6000); +% m(:,3)=exp(m(:,3)/2); +% ecornerplot(m,'support',[nan nan;nan nan; 0 nan]','ks',true); +% +% Aslak Grinsted 2015 + +if nargin==0 + close all + m=randn(10000,4);%m(:,2)=m(:,2)+100000; + ecornerplot(m,'ks',true,'fullmatrix',false,'grid',true); + error('test mode... ') + return +end + +p = inputParser; +p.addOptional('range',99.5,@isnumeric); +p.addOptional('names',{},@iscellstr); +p.addOptional('ks',false,@islogical); %TODO: allow definition of support? +p.addOptional('support',[]); +p.addOptional('grid',false,@islogical); +p.addOptional('scatter',nan,@islogical); +p.addOptional('fullmatrix',false,@islogical); +p.addOptional('color',[1 1 1]*.5,@(x)all(size(x)==[1 3])) +p.addOptional('ess',[]); +% p.addOptional('truth',[fa],@isnumeric); +p.parse(varargin{:}); +p=p.Results; + + +if (size(m,1)Np + error('Effective Sample Size (ess) must be smaller than number of samples') +end +if M>20 + error('Too many dimensions. You probably don''t want to make that many subplots. 
') +end +if isnan(p.scatter) + p.scatter=Np<2000; +end + +p.range=prctile(m,[50+[-1 1]*p.range/2 0 100]); %first 2 +rng=p.range(4,:)-p.range(3,:); +if isempty(p.support),p.support=nan(2,M);end +ix=isnan(p.support(1,:)); p.support(1,ix)=p.range(3,ix)-rng(ix)/4; +ix=isnan(p.support(2,:)); p.support(2,ix)=p.range(4,ix)+rng(ix)/4; + + + + +for ii=length(p.names)+1:M + p.names{ii}=sprintf('m_{%.0f}',ii); +end +% for ii=size(p.truth,2)+1:M +% p.truth(ii,:)=nan; +% end +if p.grid + p.grid='on'; +else + p.grid='off'; +end + +clf +H=nan(M); +for r=1:M + for c=1:max(r,M*p.fullmatrix) + H(r,c)=subaxis(M,M,c,r,'s',0.01,'mb',0.12,'mt',0.05,'ml',0.12,'mr',0.0); + if c==r + if p.ks + [F,X,bw]=ksdensity(m(:,r),'support',p.support(:,r)); %TODO: use ESS + if p.ess0, set(gca,'Ylim',p.range(1:2,r)); end + end + if r==M, xlabel(['^{ }' p.names{c} '_{ }']);end + if (c==1)&(r>1-p.fullmatrix), ylabel(['^{ }' p.names{r} '_{ }']);end + if diff(p.range(1:2,c))>0, set(gca,'Xlim',p.range(1:2,c)'); end + end + +end +h=H(:,2:end);h(isnan(h))=[]; +set(h,'YTickLabel',[]) +h=H(1:M-1,:);h(isnan(h))=[]; +set(h,'XTickLabel',[]) +colormap(bsxfun(@minus,[1 1 1],linspace(0,.7,300)')); + + +%LINK the axes for zooming: +hlink={}; +drawnow +lh=cellfun(@(x)double(x),get(H(~isnan(H)),'Xlabel')); +set(lh,'units','normalized') +set(lh,'position',min(cell2mat(get(lh,'position')))); +hlink{end+1}=linkprop(lh,'position'); +lh=cellfun(@(x)double(x),get(H(~isnan(H)),'Ylabel')); +set(lh,'units','normalized'); +set(lh,'position',min(cell2mat(get(lh,'position')))); +hlink{end+1}=linkprop(lh,'position'); + +for ii=1:M + h=H(:,ii); h(isnan(h))=[]; + set(h,'XLimMode','manual') + hlink{end+1}=linkprop(h,'XLim'); + h=H(ii,1:ii-1); h(isnan(h))=[]; + set(h,'YLimMode','manual') + hlink{end+1}=linkprop(h,{'YLim','YTick'}); +end +setappdata(gcf,'aplotmatrix_linkprop_handles',hlink) + + +if nargout==0 + clearvars H +end diff --git a/mcmc_simbio/src/gwmcmc/ecornerplot_vse.m b/mcmc_simbio/src/gwmcmc/ecornerplot_vse.m new file mode 
100644 index 0000000..b66c782 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/ecornerplot_vse.m @@ -0,0 +1,214 @@ +function H=ecornerplot_vse(m,varargin) +%% Corner plot with allowance for effective sample size +% +% ecornerplot(m,[parameter,value]) +% +% INPUTS: +% m: a matrix of values that should be plotted in the corner plot. +% +% +% When m is a 3d matrix (ndims(m)==3), then it is assumed to have the form +% MxWxT as output from GWMCMC, where M is the number of parameters, W is +% the number of walkers, and T is the number of steps in each markov chain. +% +% +% NAMED PARAMETERS: +% range: Restrict visual limits to central [99.5] percentile. +% names: A cell of strings with labels for each parameter in m. +% ks: enable kernel smoothing density instead of histograms [false] +% support: a 2xM matrix with upper and lower limits. +% ess: effective sample size. [default: auto-determine ess using EACORR.] +% - used to adjust bandwidth estimates in kernel density estimates. +% scatter: show scatter plot instead of 2d-kernel density estimate +% [true if #points<2000]. +% fullmatrix: display upper corner of plotmatrix. [false] +% color: A color-theme for the plot. [.5 .5 .5]. +% grid: show grid. [false]. +% +% +% Notes: The 2d kernel density contours are plotted as the highest density +% contours at intervals of 10%:20%:90% +% +% EXAMPLE: +% mu = [1 -1 1]; C= [.9 .4 .1; .4 .3 .2; .1 .2 1]; +% m=mvnrnd(mu,C,6000); +% m(:,3)=exp(m(:,3)/2); +% ecornerplot(m,'support',[nan nan;nan nan; 0 nan]','ks',true); +% +% Aslak Grinsted 2015 + +if nargin==0 + close all + m=randn(10000,4);%m(:,2)=m(:,2)+100000; + ecornerplot_vse(m,'ks',true,'fullmatrix',false,'grid',true); + error('test mode... ') + return +end + +p = inputParser; +p.addOptional('range',99.5,@isnumeric); +p.addOptional('names',{},@iscellstr); +p.addOptional('ks',false,@islogical); %TODO: allow definition of support? 
+p.addOptional('support',[]); +p.addOptional('grid',false,@islogical); +p.addOptional('scatter',nan,@islogical); +p.addOptional('fullmatrix',false,@islogical); +p.addOptional('color',[1 1 1]*.5,@(x)all(size(x)==[1 3])) +p.addOptional('ess',[]); +p.addOptional('transparency', 0.2, @isnumeric); +p.addOptional('fontsize', 10, @isnumeric); +p.addOptional('nbins', [], @isnumeric); +% p.addOptional('truth',[fa],@isnumeric); +p.parse(varargin{:}); +p=p.Results; + + +if (size(m,1)Np + error('Effective Sample Size (ess) must be smaller than number of samples') +end +if M>25 + error('Too many dimensions. You probably don''t want to make that many subplots. ') +end +if isnan(p.scatter) + p.scatter=Np<2000; +end + +p.range=prctile(m,[50+[-1 1]*p.range/2 0 100]); %first 2 +rng=p.range(4,:)-p.range(3,:); +if isempty(p.support),p.support=nan(2,M);end +ix=isnan(p.support(1,:)); p.support(1,ix)=p.range(3,ix)-rng(ix)/4; +ix=isnan(p.support(2,:)); p.support(2,ix)=p.range(4,ix)+rng(ix)/4; + +for ii=length(p.names)+1:M + p.names{ii}=sprintf('m_{%.0f}',ii); +end +% for ii=size(p.truth,2)+1:M +% p.truth(ii,:)=nan; +% end +if p.grid + p.grid='on'; +else + p.grid='off'; +end +ss = get(0, 'screensize'); +figure +set(gcf, 'Position', [ss(3)*(1-1/1.05) ss(4)*(1-1/1.15) ss(3)/1.05 ss(4)/1.15]); + +% clf +% ff = gcf; +% set(ff, 'Position', [100 100 900 600]) +H=nan(M); +for r=1:M + for c=1:max(r,M*p.fullmatrix) + H(r,c)=subaxis(M,M,c,r,'s',0.01,'mb',0.12,'mt',0.05,'ml',0.12,'mr',0.0); + if c==r + if p.ks + [F,X,bw]=ksdensity(m(:,r),'support',p.support(:,r)); %TODO: use ESS + if p.ess0, set(gca,'Ylim',p.support(1:2,r)); end + %!VSE: changed p.range to p.support + end + if r==M, xlabel(['^{ }' p.names{c} '_{ }'], 'FontSize', p.fontsize);end + if (c==1)&(r>1-p.fullmatrix), ylabel(['^{ }' p.names{r} '_{ }'], 'FontSize', p.fontsize);end + if diff(p.support(1:2,c))>0, set(gca,'Xlim',p.support(1:2,c)'); end + end + +end +h=H(:,2:end);h(isnan(h))=[]; +set(h,'YTickLabel',[]) 
+h=H(1:M-1,:);h(isnan(h))=[]; +set(h,'XTickLabel',[]) +colormap(bsxfun(@minus,[1 1 1],linspace(0,.7,300)')); + + +%LINK the axes for zooming: +hlink={}; +drawnow +lh=cellfun(@(x)double(x),get(H(~isnan(H)),'Xlabel')); +set(lh,'units','normalized') +set(lh,'position',min(cell2mat(get(lh,'position')))); +hlink{end+1}=linkprop(lh,'position'); +lh=cellfun(@(x)double(x),get(H(~isnan(H)),'Ylabel')); +set(lh,'units','normalized'); +set(lh,'position',min(cell2mat(get(lh,'position')))); +hlink{end+1}=linkprop(lh,'position'); + +for ii=1:M + h=H(:,ii); h(isnan(h))=[]; + set(h,'XLimMode','manual') + hlink{end+1}=linkprop(h,'XLim'); + h=H(ii,1:ii-1); h(isnan(h))=[]; + set(h,'YLimMode','manual') + hlink{end+1}=linkprop(h,{'YLim','YTick'}); +end +setappdata(gcf,'aplotmatrix_linkprop_handles',hlink) + + +if nargout==0 + clearvars H +end diff --git a/mcmc_simbio/src/gwmcmc/ex_behappy.m b/mcmc_simbio/src/gwmcmc/ex_behappy.m new file mode 100755 index 0000000..fc747a8 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/ex_behappy.m @@ -0,0 +1,33 @@ +%% Don't worry, be Happy +% +% Sampling a smiley face likelihood function. +% + +%% Smiley face equation +% Formulate a likelihood function inspired by an equation of a smiley face. +% +% Source Michael Borcherds: https://twitter.com/mike_geogebra/status/135391208703930369 + +logHappiness=@(m)1-exp(1e-4*((m(1)^4+2*m(1)^2*m(2)^2-0.3*m(1)^2*m(2)-40.75*m(1)^2+m(2)^4-m(2)^3-40.75*m(2)^2+25*m(2)+393.75)*((m(1)+3)^2+(m(2)-7)^2-1)*((m(1)-3)^2+(m(2)-7)^2-1)*(m(1)^2+(m(2)-2)^2-64))); + + +%% Draw samples from the distribution using GWMCMC +% +% Now we apply the MCMC hammer to draw samples from the logHappiness distribution. 
+% + +[models,logP]=gwmcmc(randn(2,100),logHappiness,100000,'ThinChain',2); +models(:,:,1:end*.2)=[]; +models=models(:,:)'; + + +plot(models(:,1),models(:,2),'yo','markerfacecolor',[1 1 0]*.8); + +axis equal off + + +title('GWMCMC says: "Don''t Worry, Be Happy!"'); + + +%% Important links +% Bobby McFerrin on youtube: https://www.youtube.com/watch?v=d-diB65scQU \ No newline at end of file diff --git a/mcmc_simbio/src/gwmcmc/ex_breakfit.m b/mcmc_simbio/src/gwmcmc/ex_breakfit.m new file mode 100755 index 0000000..1fba0f7 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/ex_breakfit.m @@ -0,0 +1,147 @@ +%% Fitting a trend-change model to a time series +% +% This code fits a trend-change model to a historical time series of sea level +% in Amsterdam with gaps. +% + +%% Input data +% +% Amsterdam sea level from this source: http://www.psmsl.org/data/longrecords/ + +Y=[1700 -152; 1701 -158; 1702 -132; 1703 -172; 1704 -135; 1705 -167; 1706 -192; 1707 -153; 1708 -149; 1709 -187; 1710 -168; 1711 -140; 1712 -129; 1713 -151; + 1714 -106; 1715 -172; 1716 -168; 1717 -164; 1718 -185; 1719 -182; 1720 -109; 1721 -146; 1722 -141; 1723 -99; 1724 -145; 1725 -166; 1726 -108; 1727 -136; + 1728 -195; 1729 -176; 1730 -148; 1731 -108; 1732 -134; 1733 -160; 1734 -165; 1735 -181; 1736 -109; 1737 -92; 1738 -152; 1739 -123; 1740 -124; 1741 -122; + 1742 -154; 1743 -144; 1744 -148; 1745 -178; 1746 -178; 1747 -142; 1748 -147; 1749 -167; 1766 -175; 1767 -111; 1768 -160; 1769 -86; 1770 -94; 1771 -87; + 1772 -142; 1773 -143; 1774 -135; 1775 -127; 1776 -150; 1777 -131; 1778 -155; 1779 -131; 1780 -130; 1781 -134; 1782 -160; 1783 -157; 1784 -173; 1785 -178; + 1786 -178; 1787 -125; 1788 -204; 1789 -161; 1790 -109; 1791 -92; 1792 -150; 1793 -154; 1794 -118; 1795 -121; 1796 -157; 1797 -134; 1798 -135; 1799 -177; + 1800 -175; 1801 -90; 1802 -159; 1803 -172; 1804 -130; 1805 -142; 1806 -106; 1807 -105; 1808 -183; 1809 -151; 1810 -128; 1811 -137; 1812 -141; 1813 -150; + 1814 -185; 1815 -144; 1816 -113; 1817 -102; 
1818 -160; 1819 -158; 1820 -194; 1821 -123; 1822 -125; 1823 -198; 1824 -97; 1825 -87; 1826 -126; 1827 -97; + 1828 -124; 1829 -119; 1830 -141; 1831 -94; 1832 -141; 1833 -106; 1834 -77; 1835 -105; 1836 -96; 1837 -88; 1838 -117; 1839 -114; 1840 -111; 1841 -85; + 1842 -132; 1843 -57; 1844 -53; 1845 -90; 1846 -80; 1847 -118; 1848 -141; 1849 -101; 1850 -91; 1851 -102; 1852 -97; 1853 -113; 1854 -49; 1855 -111; + 1856 -85; 1857 -145; 1858 -137; 1859 -102; 1860 -113; 1861 -94; 1862 -125; 1863 -121; 1864 -161; 1865 -157; 1866 -93; 1867 -58; 1868 -91; 1869 -75; 1870 -129; + 1871 -141; 1874 -110; 1875 -125; 1876 -80; 1877 -43; 1878 -60; 1879 -79; 1880 -31; 1881 -64; 1882 -74; 1883 -58; 1884 -54; 1885 -75; 1886 -88; 1887 -64; 1888 -86; + 1889 -53; 1890 -84; 1891 -94; 1892 -78; 1893 -67; 1894 -92; 1895 -74; 1896 -81; 1897 -82; 1898 -32; 1899 -36; 1900 -67; 1901 -45; 1902 -62; 1903 -25; 1904 -58; 1905 -32; + 1906 -34; 1907 -75; 1908 -66; 1909 -36; 1910 -12; 1911 -24; 1912 -7; 1913 -22; 1914 0; 1915 7; 1916 -5; 1917 -37; 1918 -44; 1919 -38; 1920 14; 1921 -10; + 1922 -16; 1923 -38;1925 29]; +t=Y(:,1); +Y=Y(:,2); + + +%% Define trend change forward model: + +forwardmodel=@(t,m)(m(1)*(tm(3))).*(t-m(3))+m(4); + +%% Make an initial guess for the model parameters. +p=polyfit(t-mean(t),Y,1); +m0=[p(1) p(1) mean(t) p(2)]'; +sigma=std(Y-forwardmodel(t,m0)); +m0=[m0 ; log(sigma)]; + + + +%% Likelihood +% +% We assume the data are normally distributed around the forward model. +% + +% First we define a helper function equivalent to calling log(normpdf(x,mu,sigma)) +% but has higher precision because it avoids truncation errors associated with calling +% log(exp(xxx)). +lognormpdf=@(x,mu,sigma)-0.5*((x-mu)./sigma).^2 -log(sqrt(2*pi).*sigma); + + +logLike=@(m)sum(lognormpdf(Y,forwardmodel(t,m),exp(m(5)))); + + +%% Prior information +% +% We want to restrict the model to place the kink-point within the observed +% time interval. All other parameters have a uniform prior. 
+% + +logprior = @(m)(m(3)>min(t))&(m(3)-5)&&(m(1)<0.5) && (m(2)>0)&&(m(2)<10) && (m(3)>-10)&&(m(3)<1) ; + +%% Find the posterior distribution using GWMCMC +% +% Now we apply the MCMC hammer to draw samples from the posterior. +% +% + +% first we initialize the ensemble of walkers in a small gaussian ball +% around the max-likelihood estimate. +minit=bsxfun(@plus,m_maxlike,randn(3,100)*0.01); + +%% Apply the hammer: +% +% Draw samples from the posterior +% +tic +m=gwmcmc(minit,{logprior logLike},100000,'ThinChain',5,'burnin',.2); +toc + + + +%% Auto-correlation function +% + + +figure +[C,lags,ESS]=eacorr(m); +plot(lags,C,'.-',lags([1 end]),[0 0],'k'); +grid on +xlabel('lags') +ylabel('autocorrelation'); +text(lags(end),0,sprintf('Effective Sample Size (ESS): %.0f_ ',ceil(mean(ESS))),'verticalalignment','bottom','horizontalalignment','right') +title('Markov Chain Auto Correlation') + +%% Corner plot of parameters +% + +figure +ecornerplot(m,'ks',true,'color',[.6 .35 .3]) + +%% Plot of posterior fit +% + +figure +m=m(:,:)'; %flatten the chain + +%plot 100 samples... +for kk=1:100 + r=ceil(rand*size(m,1)); + h=plot(x,forwardmodel(m(r,:)),'color',[.6 .35 .3].^.3); + hold on +end +h(2)=errorbar(x,y,yerr,'ks','markerfacecolor',[1 1 1]*.4,'markersize',4); + +h(4)=plot(x,forwardmodel(m_lsq),'b--','linewidth',2); +h(3)=plot(x,forwardmodel(median(m)),'color',[.6 .35 .3],'linewidth',3); +h(5)=plot(x,forwardmodel(m_true),'r','linewidth',2); + +axis tight +legend(h,'Samples from posterior','Data','GWMCMC median','LSQ fit','Truth') diff --git a/mcmc_simbio/src/gwmcmc/ex_rosenbrockbanana.m b/mcmc_simbio/src/gwmcmc/ex_rosenbrockbanana.m new file mode 100755 index 0000000..ba38a26 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/ex_rosenbrockbanana.m @@ -0,0 +1,65 @@ +%% The MCMC hammer +% +% GWMCMC is an implementation of the Goodman and Weare 2010 Affine +% invariant ensemble Markov Chain Monte Carlo (MCMC) sampler. MCMC sampling +% enables bayesian inference. 
The problem with many traditional MCMC samplers +% is that they can have slow convergence for badly scaled problems, and that +% it is difficult to optimize the random walk for high-dimensional problems. +% This is where the GW-algorithm really excels as it is affine invariant. It +% can achieve much better convergence on badly scaled problems. It is much +% simpler to get to work straight out of the box, and for that reason it +% truly deserves to be called the MCMC hammer. +% +% See also: +% +% + +%% Rosenbrock: A badly scaled example +% +% A classical difficult low dimensional problem is the rosenbrock density. +% It is defined by the following log-probability function: +% + +logPfun=@(m) -(100*(m(2,:)-m(1,:).^2).^2 +(1-m(1,:)).^2)/20; + +%lets visualize it: +close all +[X,Y]=meshgrid(-4:.01:6,-1:.02:34); +Z=logPfun([X(:) Y(:)]'); Z=reshape(Z,size(X)); +contour(X,Y,exp(Z)) +colormap(parula) +title('The Rosenbrock banana') +xlim([-4 6]) +ylim([-1 34]) + +%% Apply the MCMC hammer: +% +% Now we apply the Goodman & Weare MCMC sampler and plot the results on top +% + +M=2; %number of model parameters +Nwalkers=40; %number of walkers/chains. +minit=randn(M,Nwalkers); +tic +models=gwmcmc(minit, logPfun,100000,'StepSize',30,'burnin',.2); +toc + + +%flatten the chain: analyze all the chains as one + +models=models(:,:); + +%plot the results + +hold on +plot(models(1,:),models(2,:),'k.') + +legend('Rosenbrock','GWMCMC samples','location','northwest') + + + +%% References: +% * Goodman & Weare (2010), Ensemble Samplers With Affine Invariance, Comm. App. Math. Comp. Sci., Vol. 5, No. 
1, 65-80 +% * Foreman-Mackey, Hogg, Lang, Goodman (2013), emcee: The MCMC Hammer, arXiv:1202.3665 +% +% -Aslak Grinsted 2015 diff --git a/mcmc_simbio/src/gwmcmc/gwmcmc.m b/mcmc_simbio/src/gwmcmc/gwmcmc.m new file mode 100755 index 0000000..64aa852 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/gwmcmc.m @@ -0,0 +1,279 @@ +function [models,logP]=gwmcmc(minit,logPfuns,mccount,varargin) +%% Cascaded affine invariant ensemble MCMC sampler. "The MCMC hammer" +% +% GWMCMC is an implementation of the Goodman and Weare 2010 Affine +% invariant ensemble Markov Chain Monte Carlo (MCMC) sampler. MCMC sampling +% enables bayesian inference. The problem with many traditional MCMC samplers +% is that they can have slow convergence for badly scaled problems, and that +% it is difficult to optimize the random walk for high-dimensional problems. +% This is where the GW-algorithm really excels as it is affine invariant. It +% can achieve much better convergence on badly scaled problems. It is much +% simpler to get to work straight out of the box, and for that reason it +% truly deserves to be called the MCMC hammer. +% +% (This code uses a cascaded variant of the Goodman and Weare algorithm). +% +% USAGE: +% [models,logP]=gwmcmc(minit,logPfuns,mccount,[Parameter,Value,Parameter,Value]); +% +% INPUTS: +% minit: an MxW matrix of initial values for each of the walkers in the +% ensemble. (M:number of model params. W: number of walkers). W +% should be atleast 2xM. (see e.g. mvnrnd). +% logPfuns: a cell of function handles returning the log probality of a +% proposed set of model parameters. Typically this cell will +% contain two function handles: one to the logprior and another +% to the loglikelihood. E.g. {@(m)logprior(m) @(m)loglike(m)} +% mccount: What is the desired total number of monte carlo proposals. +% This is the total number, -NOT the number per chain. +% +% Named Parameter-Value pairs: +% 'StepSize': unit-less stepsize (default=2.5). 
+% 'ThinChain': Thin all the chains by only storing every N'th step (default=10) +% 'ProgressBar': Show a text progress bar (default=true) +% 'Parallel': Run in ensemble of walkers in parallel. (default=false) +% 'BurnIn': fraction of the chain that should be removed. (default=0) +% +% OUTPUTS: +% models: A MxWxT matrix with the thinned markov chains (with T samples +% per walker). T=~mccount/p.ThinChain/W. +% logP: A PxWxT matrix of log probabilities for each model in the +% models. here P is the number of functions in logPfuns. +% +% Note on cascaded evaluation of log probabilities: +% The logPfuns-argument can be specifed as a cell-array to allow a cascaded +% evaluation of the probabilities. The computationally cheapest function should be +% placed first in the cell (this will typically the prior). This allows the +% routine to avoid calculating the likelihood, if the proposed model can be +% rejected based on the prior alone. +% logPfuns={logprior loglike} is faster but equivalent to +% logPfuns={@(m)logprior(m)+loglike(m)} +% +% TIP: if you aim to analyze the entire set of ensemble members as a single +% sample from the distribution then you may collapse output models-matrix +% thus: models=models(:,:); This will reshape the MxWxT matrix into a +% Mx(W*T)-matrix while preserving the order. +% +% +% EXAMPLE: Here we sample a multivariate normal distribution. 
+% +% %define problem: +% mu = [5;-3;6]; +% C = [.5 -.4 0;-.4 .5 0; 0 0 1]; +% iC=pinv(C); +% logPfuns={@(m)-0.5*sum((m-mu)'*iC*(m-mu))} +% +% %make a set of starting points for the entire ensemble of walkers +% minit=randn(length(mu),length(mu)*2); +% +% %Apply the MCMC hammer +% [models,logP]=gwmcmc(minit,logPfuns,100000); +% models(:,:,1:floor(size(models,3)*.2))=[]; %remove 20% as burn-in +% models=models(:,:)'; %reshape matrix to collapse the ensemble member dimension +% scatter(models(:,1),models(:,2)) +% prctile(models,[5 50 95]) +% +% +% References: +% Goodman & Weare (2010), Ensemble Samplers With Affine Invariance, Comm. App. Math. Comp. Sci., Vol. 5, No. 1, 65-80 +% Foreman-Mackey, Hogg, Lang, Goodman (2013), emcee: The MCMC Hammer, arXiv:1202.3665 +% +% WebPage: https://github.com/grinsted/gwmcmc +% +% -Aslak Grinsted 2015 + + +persistent isoctave; +if isempty(isoctave) + isoctave = (exist ('OCTAVE_VERSION', 'builtin') > 0); +end + +if nargin<3 + error('GWMCMC:toofewinputs','GWMCMC requires atleast 3 inputs.') +end +M=size(minit,1); +if size(minit,2)==1 + minit=bsxfun(@plus,minit,randn(M,M*5)); +end + + +p=inputParser; +if isoctave + p=p.addParamValue('StepSize',2,@isnumeric); %addParamValue is chose for compatibility with octave. Still Untested. + p=p.addParamValue('ThinChain',10,@isnumeric); + p=p.addParamValue('ProgressBar',true,@islogical); + p=p.addParamValue('Parallel',false,@islogical); + p=p.addParamValue('BurnIn',0,@(x)(x>=0)&&(x<1)); + p=p.parse(varargin{:}); +else + p.addParameter('StepSize',2,@isnumeric); %addParamValue is chose for compatibility with octave. Still Untested. 
+ p.addParameter('ThinChain',10,@isnumeric); + p.addParameter('ProgressBar',true,@islogical); + p.addParameter('Parallel',false,@islogical); + p.addParameter('BurnIn',0,@(x)(x>=0)&&(x<1)); + p.parse(varargin{:}); +end +p=p.Results; + + +Nwalkers=size(minit,2); + +if size(minit,1)*2>size(minit,2) + warning('GWMCMC:minitdimensions','Check minit dimensions.\nIt is recommended that there be atleast twice as many walkers in the ensemble as there are model dimension.') +end + +if p.ProgressBar + progress=@textprogress; +else + progress=@noaction; +end + + +Nkeep=ceil(mccount/p.ThinChain/Nwalkers); %number of samples drawn from each walker +mccount=(Nkeep-1)*p.ThinChain+1; + +models=nan(M,Nwalkers,Nkeep); %pre-allocate output matrix + +models(:,:,1)=minit; + +if ~iscell(logPfuns) + logPfuns={logPfuns}; +end + +NPfun=numel(logPfuns); + +%calculate logP state initial pos of walkers +logP=nan(NPfun,Nwalkers,Nkeep); +for wix=1:Nwalkers % walker index + for fix=1:NPfun % function index + + v=logPfuns{fix}(minit(:,wix)); + if islogical(v) %reformulate function so that false=-inf for logical constraints. + v=-1/v;logPfuns{fix}=@(m)-1/logPfuns{fix}(m); %experimental implementation of experimental feature + end + logP(fix,wix,1)=v; + + end +end + +if ~all(all(isfinite(logP(:,:,1)))) + error('Starting points for all walkers must have finite logP') +end +reject=zeros(Nwalkers,1); +curm=models(:,:,1); +curlogP=logP(:,:,1); +progress(0,0,0) +totcount=Nwalkers; +for row=1:Nkeep + for jj=1:p.ThinChain + %generate proposals for all walkers + %(done outside walker loop, in order to be compatible with parfor - some penalty for memory): + %-Note it appears to give a slight performance boost for non-parallel. 
+ rix=mod((1:Nwalkers)+floor(rand*(Nwalkers-1)),Nwalkers)+1; %pick a random partner + zz=((p.StepSize - 1)*rand(1,Nwalkers) + 1).^2/p.StepSize; + proposedm=curm(:,rix) - bsxfun(@times,(curm(:,rix)-curm),zz); + logrand=log(rand(NPfun+1,Nwalkers)); %moved outside because rand is slow inside parfor + if p.Parallel + %parallel/non-parallel code is currently mirrored in + %order to enable experimentation with separate optimization + %techniques for each branch. Parallel is not really great yet. + %TODO: use SPMD instead of parfor. + + parfor wix=1:Nwalkers + cp=curlogP(:,wix); + lr=logrand(:,wix); + acceptfullstep=true; + proposedlogP=nan(NPfun,1); + if lr(1)<(numel(proposedm(:,wix))-1)*log(zz(wix)) + for fix=1:NPfun + proposedlogP(fix)=logPfuns{fix}(proposedm(:,wix)); %have tested workerobjwrapper but that is slower. + if lr(fix+1)>proposedlogP(fix)-cp(fix) || ~isreal(proposedlogP(fix)) || isnan( proposedlogP(fix) ) + %if ~(lr(fix+1)proposedlogP(fix)-curlogP(fix,wix) || ~isreal(proposedlogP(fix)) || isnan(proposedlogP(fix)) + %if ~(logrand(fix+1,wix)0 + crop=ceil(Nkeep*p.BurnIn); + models(:,:,1:crop)=[]; %TODO: never allocate space for them ? + logP(:,:,1:crop)=[]; +end + + +% TODO: make standard diagnostics to give warnings... 
+% TODO: make some diagnostic plots if nargout==0; + + + +function textprogress(pct,curm,rejectpct) +persistent lastNchar lasttime starttime +if isempty(lastNchar)||pct==0 + lasttime=cputime-10;starttime=cputime;lastNchar=0; + pct=1e-16; +end +if pct==1 + fprintf('%s',repmat(char(8),1,lastNchar));lastNchar=0; + return +end +if (cputime-lasttime>0.1) + + ETA=datestr((cputime-starttime)*(1-pct)/(pct*60*60*24),13); + progressmsg=[183-uint8((1:40)<=(pct*40)).*(183-'*') '']; + %progressmsg=['-'-uint8((1:40)<=(pct*40)).*('-'-'') '']; + %progressmsg=[uint8((1:40)<=(pct*40)).*'#' '']; + curmtxt=sprintf('% 9.3g\n',curm(1:min(end,20),1)); + %curmtxt=mat2str(curm); + progressmsg=sprintf('\nGWMCMC %5.1f%% [%s] %s\n%3.0f%% rejected\n%s\n',pct*100,progressmsg,ETA,rejectpct*100,curmtxt); + + fprintf('%s%s',repmat(char(8),1,lastNchar),progressmsg); + drawnow;lasttime=cputime; + lastNchar=length(progressmsg); +end + +function noaction(varargin) + +% Acknowledgements: I became aware of the GW algorithm via a student report +% which was using emcee for python. Great stuff. diff --git a/mcmc_simbio/src/gwmcmc/gwmcmc_light.m b/mcmc_simbio/src/gwmcmc/gwmcmc_light.m new file mode 100644 index 0000000..ed7fd9a --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/gwmcmc_light.m @@ -0,0 +1,377 @@ +function [models,logP, rejProp]=gwmcmc_light(minit,logPfuns,mccount,varargin) +%% Cascaded affine invariant ensemble MCMC sampler. "The MCMC hammer" +% +% Originally written by Aslak Grinsted. Edited by Vipul Singhal +% This code differs from the original function gwmcmc or my other version +% gwmcmc_vse in the following ways: +% 1. It allows for the possibility of plotting real time progress in terms +% of scatterplots of the point clouds. +% 2. It does not track the non integrable points as the vse version does. +% In doing so, it should use less memory. +% +% +% GWMCMC is an implementation of the Goodman and Weare 2010 Affine +% invariant ensemble Markov Chain Monte Carlo (MCMC) sampler. 
MCMC sampling +% enables bayesian inference. The problem with many traditional MCMC samplers +% is that they can have slow convergence for badly scaled problems, and that +% it is difficult to optimize the random walk for high-dimensional problems. +% This is where the GW-algorithm really excels as it is affine invariant. It +% can achieve much better convergence on badly scaled problems. It is much +% simpler to get to work straight out of the box, and for that reason it +% truly deserves to be called the MCMC hammer. +% +% (This code uses a cascaded variant of the Goodman and Weare algorithm). +% +% USAGE: +% [models,logP]=gwmcmc(minit,logPfuns,mccount,[Parameter,Value,Parameter,Value]); +% +% INPUTS: +% minit: an MxW matrix of initial values for each of the walkers in the +% ensemble. (M:number of model params. W: number of walkers). W +% should be atleast 2xM. (see e.g. mvnrnd). +% logPfuns: a cell of function handles returning the log probality of a +% proposed set of model parameters. Typically this cell will +% contain two function handles: one to the logprior and another +% to the loglikelihood. E.g. {@(m)logprior(m) @(m)loglike(m)} +% mccount: What is the desired total number of monte carlo proposals. +% This is the total number, -NOT the number per chain. +% +% Named Parameter-Value pairs: +% 'StepSize': unit-less stepsize (default=2.5). +% 'ThinChain': Thin all the chains by only storing every N'th step (default=10) +% 'ProgressBar': Show a text progress bar (default=true) +% 'Parallel': Run in ensemble of walkers in parallel. (default=false) +% 'BurnIn': fraction of the chain that should be removed. (default=0) +% +% OUTPUTS: +% models: A MxWxT matrix with the thinned markov chains (with T samples +% per walker). T=~mccount/p.ThinChain/W. +% logP: A PxWxT matrix of log probabilities for each model in the +% models. here P is the number of functions in logPfuns. 
+% +% Note on cascaded evaluation of log probabilities: +% The logPfuns-argument can be specifed as a cell-array to allow a cascaded +% evaluation of the probabilities. The computationally cheapest function should be +% placed first in the cell (this will typically the prior). This allows the +% routine to avoid calculating the likelihood, if the proposed model can be +% rejected based on the prior alone. +% logPfuns={logprior loglike} is faster but equivalent to +% logPfuns={@(m)logprior(m)+loglike(m)} +% +% TIP: if you aim to analyze the entire set of ensemble members as a single +% sample from the distribution then you may collapse output models-matrix +% thus: models=models(:,:); This will reshape the MxWxT matrix into a +% Mx(W*T)-matrix while preserving the order. +% +% +% EXAMPLE: Here we sample a multivariate normal distribution. +% +% %define problem: +% mu = [5;-3;6]; +% C = [.5 -.4 0;-.4 .5 0; 0 0 1]; +% iC=pinv(C); +% logPfuns={@(m)-0.5*sum((m-mu)'*iC*(m-mu))} +% +% %make a set of starting points for the entire ensemble of walkers +% minit=randn(length(mu),length(mu)*2); +% +% %Apply the MCMC hammer +% [models,logP]=gwmcmc(minit,logPfuns,100000); +% models(:,:,1:floor(size(models,3)*.2))=[]; %remove 20% as burn-in +% models=models(:,:)'; %reshape matrix to collapse the ensemble member dimension +% scatter(models(:,1),models(:,2)) +% prctile(models,[5 50 95]) +% +% +% References: +% Goodman & Weare (2010), Ensemble Samplers With Affine Invariance, Comm. App. Math. Comp. Sci., Vol. 5, No. 
1, 65-80 +% Foreman-Mackey, Hogg, Lang, Goodman (2013), emcee: The MCMC Hammer, arXiv:1202.3665 +% +% WebPage: https://github.com/grinsted/gwmcmc +% +% -Aslak Grinsted 2015 + + +persistent isoctave; +if isempty(isoctave) + isoctave = (exist ('OCTAVE_VERSION', 'builtin') > 0); +end + +if nargin<3 + error('GWMCMC:toofewinputs','GWMCMC requires atleast 3 inputs.') +end +M=size(minit,1); +if size(minit,2)==1 + minit=bsxfun(@plus,minit,randn(M,M*5)); +end + + +p=inputParser; +if isoctave + p=p.addParamValue('StepSize',2,@isnumeric); %addParamValue is chose for compatibility with octave. Still Untested. + p=p.addParamValue('ThinChain',10,@isnumeric); + p=p.addParamValue('ProgressBar',true,@islogical); + p=p.addParamValue('Parallel',false,@islogical); + p=p.addParamValue('BurnIn',0,@(x)(x>=0)&&(x<1)); + p=p.parse(varargin{:}); +else + p.addParameter('StepSize',2,@isnumeric); %addParamValue is chose for compatibility with octave. Still Untested. + p.addParameter('ThinChain',10,@isnumeric); + p.addParameter('ProgressBar',true,@islogical); + p.addParameter('Parallel',false,@islogical); + p.addParameter('BurnIn',0,@(x)(x>=0)&&(x<1)); + p.parse(varargin{:}); +end +p=p.Results; + + + +Nwalkers=size(minit,2); % MxW, ie, W = Nwalkers +if size(minit,1)*2>size(minit,2) + warning('GWMCMC:minitdimensions','Check minit dimensions.\nIt is recommended that there be atleast twice as many walkers in the ensemble as there are model dimension.') +end +if p.ProgressBar + progress=@textprogress; +else + progress=@noaction; +end + + + +Nkeep=ceil(mccount/p.ThinChain/Nwalkers); %number of samples drawn from each walker +mccount=(Nkeep-1)*p.ThinChain+1; % !todo what is this used for. + +% models is the matrix of parameter samples that make up the posterior +% distribution. M is the number of parameters. Nkeep is the number of +% steps the full walker ensemble takes after thinning. +% The largest arrays should be of this size, and ideally you dont want to +% keep too many of these in memory too. 
+models=nan(M,Nwalkers,Nkeep); %pre-allocate output matrix +models(:,:,1)=minit; % set the first walker positions to be according to minit. + + +if ~iscell(logPfuns) + logPfuns={logPfuns}; +end +NPfun=numel(logPfuns); +%calculate logP state initial pos of walkers +% logP is the result of the function(s) evaluation(s) for all walkers, at all +% timesteps we choose to keep +logP=nan(NPfun,Nwalkers,Nkeep); + + +% do the evaluation once for the minit points. (for all the walkers) +for wix=1:Nwalkers % walker index + for fix=1:NPfun % function index + try + v=logPfuns{fix}(minit(:,wix)); + catch ME + if strcmp(ME.identifier, 'DESuite:ODE15S:IntegrationToleranceNotMet') + v = -Inf; + else + error('there was an unknown error'); % need this, otherwise if there is an unknown error, the catch statement does not report it + %and the code continues to execute. + end + end + if islogical(v) + + + %reformulate function so that false=-inf for logical constraints. + % what happens here is that the first time the logical function + % is called, the value is converted to a -Inf. But more than + % that, even the function is changed from something logical to + % something that returns -Inf every time the conditions are not + % met. very cool. + v=-1/v;logPfuns{fix}=@(m)-1/logPfuns{fix}(m); + %experimental implementation of experimental feature + end + logP(fix,wix,1)=v; + end +end + +% both the priors and the likelihood should be finite at the intial points. +% Ie, the prior must be met and the likelihood must not have tolerance +% errors. +% +if ~all(all(isfinite(logP(:,:,1)))) + error('Starting points for all walkers must have finite logP') +end + + +reject=zeros(Nwalkers,1); +rejProp = nan(1,Nkeep); +curm=models(:,:,1); % current model, all walkers +curlogP=logP(:,:,1); +progress(0,0,0) % show progress bar. +totcount=Nwalkers; + +% ! 
this line gets removed: preallocate matrix to store parameter values that could not get simulated +% non_integrable_m = nan(M, Nwalkers, Nkeep*p.ThinChain); +for row=1:Nkeep + for jj=1:p.ThinChain + %generate proposals for all walkers + %(done outside walker loop, in order to be compatible with parfor - some penalty for memory): + %-Note it appears to give a slight performance boost for non-parallel. + + rix=mod((1:Nwalkers)+floor(rand*(Nwalkers-1)),Nwalkers)+1; %pick a random partner + + zz=((p.StepSize - 1)*rand(1,Nwalkers) + 1).^2/p.StepSize; + %VSE: why -1 here? this is what makes the system fail at 1 + proposedm=curm(:,rix) - bsxfun(@times,(curm(:,rix)-curm),zz); + logrand=log(rand(NPfun+1,Nwalkers)); %moved outside because rand is slow inside parfor + % logrand is what we compare our computed log likelihood to. + + totalcount = (row-1)*p.ThinChain+jj; + + if p.Parallel + %parallel/non-parallel code is currently mirrored in + %order to enable experimentation with separate optimization + %techniques for each branch. Parallel is not really great yet. + %TODO: use SPMD instead of parfor. + + parfor wix=1:Nwalkers + cp=curlogP(:,wix); + lr=logrand(:,wix); + acceptfullstep=true; + proposedlogP=nan(NPfun,1); + + if lr(1)<(numel(proposedm(:,wix))-1)*log(zz(wix)) + % no idea why only some of the walkers are chosen to be updated. + for fix=1:NPfun + try + proposedlogP(fix)=logPfuns{fix}(proposedm(:,wix)); + %have tested workerobjwrapper but that is slower. + catch ME + if strcmp(ME.identifier, ... + 'DESuite:ODE15S:IntegrationToleranceNotMet') + % non_integrable_m(:,wix,totalcount) = proposedm(:,wix); + proposedlogP(fix) = -Inf; + else + error('there was an unknown error'); + % need this, otherwise if there is an unknown error, the catch statement does not report it + %and the code continues to execute. + + end + end + + if lr(fix+1)>proposedlogP(fix)-cp(fix) ||... + ~isreal(proposedlogP(fix)) ||... 
+ isnan( proposedlogP(fix) ) + %if ~(lr(fix+1)proposedlogP(fix)-curlogP(fix,wix) || ~isreal(proposedlogP(fix)) || isnan(proposedlogP(fix)) + %if ~(logrand(fix+1,wix)0 + crop=ceil(Nkeep*p.BurnIn); + models(:,:,1:crop)=[]; %TODO: never allocate space for them ? + logP(:,:,1:crop)=[]; +end + + +% TODO: make standard diagnostics to give warnings... +% TODO: make some diagnostic plots if nargout==0; + + + +function textprogress(pct,curm,rejectpct) +persistent lastNchar lasttime starttime +if isempty(lastNchar)||pct==0 + lasttime=cputime-10;starttime=cputime;lastNchar=0; + pct=1e-16; +end +if pct==1 + fprintf('%s',repmat(char(8),1,lastNchar));lastNchar=0; + return +end +if (cputime-lasttime>0.1) + + ETA=datestr((cputime-starttime)*(1-pct)/(pct*60*60*24),13); + progressmsg=[183-uint8((1:40)<=(pct*40)).*(183-'*') '']; + %progressmsg=['-'-uint8((1:40)<=(pct*40)).*('-'-'') '']; + %progressmsg=[uint8((1:40)<=(pct*40)).*'#' '']; + curmtxt=sprintf('% 9.3g\n',curm(1:min(end,20),1)); + %curmtxt=mat2str(curm); + progressmsg=sprintf('\nGWMCMC %5.1f%% [%s] %s\n%3.0f%% rejected\n%s\n',pct*100,progressmsg,ETA,rejectpct*100,curmtxt); + + fprintf('%s%s',repmat(char(8),1,lastNchar),progressmsg); + drawnow;lasttime=cputime; + lastNchar=length(progressmsg); +end + +function noaction(varargin) + +% Acknowledgements: I became aware of the GW algorithm via a student report +% which was using emcee for python. Great stuff. diff --git a/mcmc_simbio/src/gwmcmc/gwmcmc_par.m b/mcmc_simbio/src/gwmcmc/gwmcmc_par.m new file mode 100644 index 0000000..deab158 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/gwmcmc_par.m @@ -0,0 +1,282 @@ +function [models,logP]=gwmcmc_par(minit,logPfuns,mccount,varargin) +%% Cascaded affine invariant ensemble MCMC sampler. "The MCMC hammer" +% Parallelization fixed gwmcmc +% GWMCMC is an implementation of the Goodman and Weare 2010 Affine +% invariant ensemble Markov Chain Monte Carlo (MCMC) sampler. MCMC sampling +% enables bayesian inference. 
The problem with many traditional MCMC samplers +% is that they can have slow convergence for badly scaled problems, and that +% it is difficult to optimize the random walk for high-dimensional problems. +% This is where the GW-algorithm really excels as it is affine invariant. It +% can achieve much better convergence on badly scaled problems. It is much +% simpler to get to work straight out of the box, and for that reason it +% truly deserves to be called the MCMC hammer. +% +% (This code uses a cascaded variant of the Goodman and Weare algorithm). +% +% USAGE: +% [models,logP]=gwmcmc(minit,logPfuns,mccount,[Parameter,Value,Parameter,Value]); +% +% INPUTS: +% minit: an MxW matrix of initial values for each of the walkers in the +% ensemble. (M: number of model params. W: number of walkers). W +% should be at least 2xM. (see e.g. mvnrnd). +% logPfuns: a cell of function handles returning the log probability of a +% proposed set of model parameters. Typically this cell will +% contain two function handles: one to the logprior and another +% to the loglikelihood. E.g. {@(m)logprior(m) @(m)loglike(m)} +% mccount: The desired total number of Monte Carlo proposals. +% This is the total number, -NOT the number per chain. +% +% Named Parameter-Value pairs: +% 'StepSize': unit-less stepsize (default=2). +% 'ThinChain': Thin all the chains by only storing every N'th step (default=10) +% 'ProgressBar': Show a text progress bar (default=true) +% 'Parallel': Run the ensemble of walkers in parallel. (default=false) +% 'BurnIn': fraction of the chain that should be removed. (default=0) +% +% OUTPUTS: +% models: A MxWxT matrix with the thinned Markov chains (with T samples +% per walker). T=~mccount/p.ThinChain/W. +% logP: A PxWxT matrix of log probabilities for each model in the +% models. Here P is the number of functions in logPfuns.
+% +% Note on cascaded evaluation of log probabilities: +% The logPfuns-argument can be specified as a cell-array to allow a cascaded +% evaluation of the probabilities. The computationally cheapest function should be +% placed first in the cell (this will typically be the prior). This allows the +% routine to avoid calculating the likelihood, if the proposed model can be +% rejected based on the prior alone. +% logPfuns={logprior loglike} is faster but equivalent to +% logPfuns={@(m)logprior(m)+loglike(m)} +% +% TIP: if you aim to analyze the entire set of ensemble members as a single +% sample from the distribution then you may collapse the output models-matrix +% thus: models=models(:,:); This will reshape the MxWxT matrix into a +% Mx(W*T)-matrix while preserving the order. +% +% +% EXAMPLE: Here we sample a multivariate normal distribution. +% +% %define problem: +% mu = [5;-3;6]; +% C = [.5 -.4 0;-.4 .5 0; 0 0 1]; +% iC=pinv(C); +% logPfuns={@(m)-0.5*sum((m-mu)'*iC*(m-mu))} +% +% %make a set of starting points for the entire ensemble of walkers +% minit=randn(length(mu),length(mu)*2); +% +% %Apply the MCMC hammer +% [models,logP]=gwmcmc(minit,logPfuns,100000); +% models(:,:,1:floor(size(models,3)*.2))=[]; %remove 20% as burn-in +% models=models(:,:)'; %reshape matrix to collapse the ensemble member dimension +% scatter(models(:,1),models(:,2)) +% prctile(models,[5 50 95]) +% +% +% References: +% Goodman & Weare (2010), Ensemble Samplers With Affine Invariance, Comm. App. Math. Comp. Sci., Vol. 5, No.
1, 65-80 +% Foreman-Mackey, Hogg, Lang, Goodman (2013), emcee: The MCMC Hammer, arXiv:1202.3665 +% +% WebPage: https://github.com/grinsted/gwmcmc +% +% -Aslak Grinsted 2015 +% -edits: vs 2016 + + +persistent isoctave; +if isempty(isoctave) + isoctave = (exist ('OCTAVE_VERSION', 'builtin') > 0); +end + +if nargin<3 + error('GWMCMC:toofewinputs','GWMCMC requires at least 3 inputs.') +end +M=size(minit,1); +if size(minit,2)==1 + minit=bsxfun(@plus,minit,randn(M,M*5)); +end + + +p=inputParser; +if isoctave + p=p.addParamValue('StepSize',2,@isnumeric); %addParamValue is chosen for compatibility with octave. Still Untested. + p=p.addParamValue('ThinChain',10,@isnumeric); + p=p.addParamValue('ProgressBar',true,@islogical); + p=p.addParamValue('Parallel',false,@islogical); + p=p.addParamValue('BurnIn',0,@(x)(x>=0)&&(x<1)); + p=p.parse(varargin{:}); +else + p.addParameter('StepSize',2,@isnumeric); %addParamValue is chosen for compatibility with octave. Still Untested. + p.addParameter('ThinChain',10,@isnumeric); + p.addParameter('ProgressBar',true,@islogical); + p.addParameter('Parallel',false,@islogical); + p.addParameter('BurnIn',0,@(x)(x>=0)&&(x<1)); + p.parse(varargin{:}); +end +p=p.Results; + + +Nwalkers=size(minit,2); + +if size(minit,1)*2>size(minit,2) + warning('GWMCMC:minitdimensions','Check minit dimensions.\nIt is recommended that there be at least twice as many walkers in the ensemble as there are model dimensions.') +end + +if p.ProgressBar + progress=@textprogress; +else + progress=@noaction; +end + + +Nkeep=ceil(mccount/p.ThinChain/Nwalkers); %number of samples drawn from each walker +mccount=(Nkeep-1)*p.ThinChain+1; + +models=nan(M,Nwalkers,Nkeep); %pre-allocate output matrix + +models(:,:,1)=minit; + +if ~iscell(logPfuns) + logPfuns={logPfuns}; +end + +NPfun=numel(logPfuns); + +%calculate logP state initial pos of walkers +logP=nan(NPfun,Nwalkers,Nkeep); +for wix=1:Nwalkers + for fix=1:NPfun + v=logPfuns{fix}(minit(:,wix)); + if islogical(v) %reformulate function
so that false=-inf for logical constraints. + v=-1/v;logPfuns{fix}=@(m)-1/logPfuns{fix}(m); %experimental implementation of experimental feature + end + logP(fix,wix,1)=v; + end +end + +if ~all(all(isfinite(logP(:,:,1)))) + error('Starting points for all walkers must have finite logP') +end + + +reject=zeros(Nwalkers,1); + + +curm=models(:,:,1); +curlogP=logP(:,:,1); +progress(0,0,0) +totcount=Nwalkers; +for row=1:Nkeep + for jj=1:p.ThinChain + %generate proposals for all walkers + %(done outside walker loop, in order to be compatible with parfor - some penalty for memory): + %-Note it appears to give a slight performance boost for non-parallel. + rix=mod((1:Nwalkers)+floor(rand*(Nwalkers-1)),Nwalkers)+1; %pick a random partner + zz=((p.StepSize - 1)*rand(1,Nwalkers) + 1).^2/p.StepSize; + proposedm=curm(:,rix) - bsxfun(@times,(curm(:,rix)-curm),zz); + logrand=log(rand(NPfun+1,Nwalkers)); %moved outside because rand is slow inside parfor + if p.Parallel + %parallel/non-parallel code is currently mirrored in + %order to enable experimentation with separate optimization + %techniques for each branch. Parallel is not really great yet. + %TODO: use SPMD instead of parfor. + + parfor wix=1:Nwalkers + cp=curlogP(:,wix); + lr=logrand(:,wix); + acceptfullstep=true; + proposedlogP=nan(NPfun,1); + if lr(1)<(numel(proposedm(:,wix))-1)*log(zz(wix)) + for fix=1:NPfun + proposedlogP(fix)=logPfuns{fix}(proposedm(:,wix)); %have tested workerobjwrapper but that is slower. + if lr(fix+1)>proposedlogP(fix)-cp(fix) || ~isreal(proposedlogP(fix)) || isnan( proposedlogP(fix) ) + %if ~(lr(fix+1)proposedlogP(fix)-curlogP(fix,wix) || ~isreal(proposedlogP(fix)) || isnan(proposedlogP(fix)) + %if ~(logrand(fix+1,wix)0 + crop=ceil(Nkeep*p.BurnIn); + models(:,:,1:crop)=[]; %TODO: never allocate space for them ? + logP(:,:,1:crop)=[]; +end + + +% TODO: make standard diagnostics to give warnings... 
+% TODO: make some diagnostic plots if nargout==0; + + + +function textprogress(pct,curm,rejectpct) +persistent lastNchar lasttime starttime +if isempty(lastNchar)||pct==0 + lasttime=cputime-10;starttime=cputime;lastNchar=0; + pct=1e-16; +end +if pct==1 + fprintf('%s',repmat(char(8),1,lastNchar));lastNchar=0; + return +end +if (cputime-lasttime>0.1) + + ETA=datestr((cputime-starttime)*(1-pct)/(pct*60*60*24),13); + progressmsg=[183-uint8((1:40)<=(pct*40)).*(183-'*') '']; + %progressmsg=['-'-uint8((1:40)<=(pct*40)).*('-'-'') '']; + %progressmsg=[uint8((1:40)<=(pct*40)).*'#' '']; + curmtxt=sprintf('% 9.3g\n',curm(1:min(end,20),1)); + %curmtxt=mat2str(curm); + progressmsg=sprintf('\nGWMCMC %5.1f%% [%s] %s\n%3.0f%% rejected\n%s\n',pct*100,progressmsg,ETA,rejectpct*100,curmtxt); + + fprintf('%s%s',repmat(char(8),1,lastNchar),progressmsg); + drawnow;lasttime=cputime; + lastNchar=length(progressmsg); +end + +function noaction(varargin) + +% Acknowledgements: I became aware of the GW algorithm via a student report +% which was using emcee for python. Great stuff. diff --git a/mcmc_simbio/src/gwmcmc/gwmcmc_vse.m b/mcmc_simbio/src/gwmcmc/gwmcmc_vse.m new file mode 100644 index 0000000..4186ff3 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/gwmcmc_vse.m @@ -0,0 +1,373 @@ +function [models,logP, rejProp]=gwmcmc_vse(minit,logPfuns,mccount,varargin) +%% Cascaded affine invariant ensemble MCMC sampler. "The MCMC hammer" +% +% GWMCMC is an implementation of the Goodman and Weare 2010 Affine +% invariant ensemble Markov Chain Monte Carlo (MCMC) sampler. MCMC sampling +% enables bayesian inference. The problem with many traditional MCMC samplers +% is that they can have slow convergence for badly scaled problems, and that +% it is difficult to optimize the random walk for high-dimensional problems. +% This is where the GW-algorithm really excels as it is affine invariant. It +% can achieve much better convergence on badly scaled problems. 
It is much +% simpler to get to work straight out of the box, and for that reason it +% truly deserves to be called the MCMC hammer. +% +% (This code uses a cascaded variant of the Goodman and Weare algorithm). +% +% USAGE: +% [models,logP]=gwmcmc(minit,logPfuns,mccount,[Parameter,Value,Parameter,Value]); +% +% INPUTS: +% minit: an MxW matrix of initial values for each of the walkers in the +% ensemble. (M: number of model params. W: number of walkers). W +% should be at least 2xM. (see e.g. mvnrnd). +% logPfuns: a cell of function handles returning the log probability of a +% proposed set of model parameters. Typically this cell will +% contain two function handles: one to the logprior and another +% to the loglikelihood. E.g. {@(m)logprior(m) @(m)loglike(m)} +% mccount: The desired total number of Monte Carlo proposals. +% This is the total number, -NOT the number per chain. +% +% Named Parameter-Value pairs: +% 'StepSize': unit-less stepsize (default=2). +% 'ThinChain': Thin all the chains by only storing every N'th step (default=10) +% 'ProgressBar': Show a text progress bar (default=true) +% 'Parallel': Run the ensemble of walkers in parallel. (default=false) +% 'BurnIn': fraction of the chain that should be removed. (default=0) +% +% OUTPUTS: +% models: A MxWxT matrix with the thinned Markov chains (with T samples +% per walker). T=~mccount/p.ThinChain/W. +% logP: A PxWxT matrix of log probabilities for each model in the +% models. Here P is the number of functions in logPfuns. +% +% Note on cascaded evaluation of log probabilities: +% The logPfuns-argument can be specified as a cell-array to allow a cascaded +% evaluation of the probabilities. The computationally cheapest function should be +% placed first in the cell (this will typically be the prior). This allows the +% routine to avoid calculating the likelihood, if the proposed model can be +% rejected based on the prior alone.
+% logPfuns={logprior loglike} is faster but equivalent to +% logPfuns={@(m)logprior(m)+loglike(m)} +% +% TIP: if you aim to analyze the entire set of ensemble members as a single +% sample from the distribution then you may collapse the output models-matrix +% thus: models=models(:,:); This will reshape the MxWxT matrix into a +% Mx(W*T)-matrix while preserving the order. +% +% +% EXAMPLE: Here we sample a multivariate normal distribution. +% +% %define problem: +% mu = [5;-3;6]; +% C = [.5 -.4 0;-.4 .5 0; 0 0 1]; +% iC=pinv(C); +% logPfuns={@(m)-0.5*sum((m-mu)'*iC*(m-mu))} +% +% %make a set of starting points for the entire ensemble of walkers +% minit=randn(length(mu),length(mu)*2); +% +% %Apply the MCMC hammer +% [models,logP]=gwmcmc(minit,logPfuns,100000); +% models(:,:,1:floor(size(models,3)*.2))=[]; %remove 20% as burn-in +% models=models(:,:)'; %reshape matrix to collapse the ensemble member dimension +% scatter(models(:,1),models(:,2)) +% prctile(models,[5 50 95]) +% +% +% References: +% Goodman & Weare (2010), Ensemble Samplers With Affine Invariance, Comm. +% App. Math. Comp. Sci., Vol. 5, No. 1, 65-80 +% Foreman-Mackey, Hogg, Lang, Goodman (2013), emcee: The MCMC Hammer, arXiv:1202.3665 +% +% WebPage: https://github.com/grinsted/gwmcmc +% +% -Aslak Grinsted 2015 + + +persistent isoctave; +if isempty(isoctave) + isoctave = (exist ('OCTAVE_VERSION', 'builtin') > 0); +end + +if nargin<3 + error('GWMCMC:toofewinputs','GWMCMC requires at least 3 inputs.') +end +M=size(minit,1); +if size(minit,2)==1 + minit=bsxfun(@plus,minit,randn(M,M*5)); +end + + +p=inputParser; +if isoctave + p=p.addParamValue('StepSize',2,@isnumeric); + %addParamValue is chosen for compatibility with octave. Still Untested.
+ p=p.addParamValue('ThinChain',10,@isnumeric); + p=p.addParamValue('ProgressBar',true,@islogical); + p=p.addParamValue('Parallel',false,@islogical); + p=p.addParamValue('BurnIn',0,@(x)(x>=0)&&(x<1)); + p=p.parse(varargin{:}); +else + p.addParameter('StepSize',2,@isnumeric); + %addParamValue is chose for compatibility with octave. Still Untested. + p.addParameter('ThinChain',10,@isnumeric); + p.addParameter('ProgressBar',true,@islogical); + p.addParameter('Parallel',false,@islogical); + p.addParameter('BurnIn',0,@(x)(x>=0)&&(x<1)); + p.parse(varargin{:}); +end +p=p.Results; + +Nwalkers=size(minit,2); +if size(minit,1)*2>size(minit,2) + warning('GWMCMC:minitdimensions',... + ['Check minit dimensions.\nIt is recommended that there be atleast'... + ' twice as many walkers in the ensemble as there are model dimension.']) +end +if p.ProgressBar + progress=@textprogress; +else + progress=@noaction; +end + +Nkeep=ceil(mccount/p.ThinChain/Nwalkers); %number of samples drawn from each walker +mccount=(Nkeep-1)*p.ThinChain+1; +models=nan(M,Nwalkers,Nkeep); %pre-allocate output matrix +models(:,:,1)=minit; + +if ~iscell(logPfuns) + logPfuns={logPfuns}; +end +NPfun=numel(logPfuns); +%calculate logP state initial pos of walkers +% logP is the result of the function(s) evaluation(s) for all walkers, at all +% timesteps we choose to keep +logP=nan(NPfun,Nwalkers,Nkeep); + +disp(['Parallel compute of whether initial positions meet limit criteria using ' num2str(Nwalkers) ' walkers.']); +fix = 1; +tic +currfun = logPfuns{fix}; +parfor wix=1:Nwalkers % walker index + %for fix=1:NPfun % function index + + try + valv=currfun(minit(:,wix)); + if valv + logP(fix,wix,1)= -1; + else + logP(fix,wix,1)= -Inf; + end + + + catch ME + if strcmp(ME.identifier, 'DESuite:ODE15S:IntegrationToleranceNotMet') + logP(fix,wix,1) = -Inf; % could potentially change this to something like -50, but then + % if here integration tolerances are not met, an error is + % thrown. 
So we NEED integration tolerances to be met + % initially. This is because of the way things are compared + % in the future iterations, and accepted and rejected. + % Should be -Inf, but I have changed it to -20. in the + % later iterations, keep it at -Inf, so that any future + % integration tol errors also get rejected. + end + end +% if islogical(valv) +% %reformulate function so that false=-inf for logical constraints. +% v=-1/valv;currfun=@(m)-1/currfun(m); +% %experimental implementation of experimental feature +% end +% logP(fix,wix,1)=v; + + %end +end +toc +tic +disp(['Parallel compute of initial position residuals using ' num2str(Nwalkers) ' walkers.']); +fix = 2; +currfun = logPfuns{fix}; +parfor wix=1:Nwalkers % walker index + %for fix=1:NPfun % function index + try + + logP(fix,wix,1)=currfun(minit(:,wix)); + catch ME + if strcmp(ME.identifier, 'DESuite:ODE15S:IntegrationToleranceNotMet') + logP(fix,wix,1) = -Inf; % could potentially change this to something like -50, but then + % if here integration tolerances are not met, an error is + % thrown. So we NEED integration tolerances to be met + % initially. This is because of the way things are compared + % in the future iterations, and accepted and rejected. + % Should be -Inf, but I have changed it to -20. in the + % later iterations, keep it at -Inf, so that any future + % integration tol errors also get rejected. + end + end + + + %end +end +toc +disp(['End of initial parallel computations, with exit code ' ... 
+ num2str(~all(all(isfinite(logP(:,:,1)))))]); + +if ~all(all(isfinite(logP(:,:,1)))) + error('Starting points for all walkers must have finite logP') +end +reject=zeros(Nwalkers,1); +rejProp = nan(1,Nkeep); +curm=models(:,:,1); % current model, all walkers +curlogP=logP(:,:,1); +progress(0,0,0) +totcount=Nwalkers; +% preallocate matrix to store parameter values that could not get simulated +% non_integrable_m = nan(M, Nwalkers, Nkeep*p.ThinChain); +for row=1:Nkeep + disp(['Step ' num2str(row) ' of ' num2str(Nkeep) '.']) + + for jj=1:p.ThinChain + + %generate proposals for all walkers + %(done outside walker loop, in order to be compatible with parfor - some penalty for memory): + %-Note it appears to give a slight performance boost for non-parallel. + rix=mod((1:Nwalkers)+floor(rand*(Nwalkers-1)),Nwalkers)+1; %pick a random partner + zz=((p.StepSize - 1)*rand(1,Nwalkers) + 1).^2/p.StepSize; + %VSE: whu -1 here? this is what makes the system fail at 1 + proposedm=curm(:,rix) - bsxfun(@times,(curm(:,rix)-curm),zz); + logrand=log(rand(NPfun+1,Nwalkers)); %moved outside because rand is slow inside parfor + % logrand is what we compare our computed log likelihood to. + totalcount = (row-1)*p.ThinChain+jj; + + if p.Parallel + %parallel/non-parallel code is currently mirrored in + %order to enable experimentation with separate optimization + %techniques for each branch. Parallel is not really great yet. + %TODO: use SPMD instead of parfor. + + parfor wix=1:Nwalkers + cp=curlogP(:,wix); + lr=logrand(:,wix); + acceptfullstep=true; + proposedlogP=nan(NPfun,1); + + if lr(1)<(numel(proposedm(:,wix))-1)*log(zz(wix)) + for fix=1:NPfun + try + proposedlogP(fix)=logPfuns{fix}(proposedm(:,wix)); + %have tested workerobjwrapper but that is slower. + catch ME + if strcmp(ME.identifier, 'DESuite:ODE15S:IntegrationToleranceNotMet') +% non_integrable_m(:,wix,totalcount) = proposedm(:,wix); + proposedlogP(fix) = -Inf; + end + end + + if lr(fix+1)>proposedlogP(fix)-cp(fix) ||... 
+ ~isreal(proposedlogP(fix)) || isnan( proposedlogP(fix) ) + %if ~(lr(fix+1)proposedlogP(fix)-curlogP(fix,wix)... + || ~isreal(proposedlogP(fix)) || isnan(proposedlogP(fix)) + %if ~(logrand(fix+1,wix)0 + crop=ceil(Nkeep*p.BurnIn); + models(:,:,1:crop)=[]; %TODO: never allocate space for them ? + logP(:,:,1:crop)=[]; +end + + +% TODO: make standard diagnostics to give warnings... +% TODO: make some diagnostic plots if nargout==0; + + + +function textprogress(pct,curm,rejectpct) +persistent lastNchar lasttime starttime +if isempty(lastNchar)||pct==0 + lasttime=cputime-10;starttime=cputime;lastNchar=0; + pct=1e-16; +end +if pct==1 + fprintf('%s',repmat(char(8),1,lastNchar));lastNchar=0; + return +end +if (cputime-lasttime>0.1) + + ETA=datestr((cputime-starttime)*(1-pct)/(pct*60*60*24),13); + progressmsg=[183-uint8((1:40)<=(pct*40)).*(183-'*') '']; + %progressmsg=['-'-uint8((1:40)<=(pct*40)).*('-'-'�') '']; + %progressmsg=[uint8((1:40)<=(pct*40)).*'#' '']; + curmtxt=sprintf('% 9.3g\n',curm(1:min(end,20),1)); + %curmtxt=mat2str(curm); + progressmsg=sprintf('\nGWMCMC %5.1f%% [%s] %s\n%3.0f%% rejected\n%s\n',... + pct*100,progressmsg,ETA,rejectpct*100,curmtxt); + + fprintf('%s%s',repmat(char(8),1,lastNchar),progressmsg); + drawnow;lasttime=cputime; + lastNchar=length(progressmsg); +end + +function noaction(varargin) + +% Acknowledgements: I became aware of the GW algorithm via a student report +% which was using emcee for python. Great stuff. diff --git a/mcmc_simbio/src/gwmcmc/gwmcmc_vse_deterministic.m b/mcmc_simbio/src/gwmcmc/gwmcmc_vse_deterministic.m new file mode 100644 index 0000000..129d0cd --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/gwmcmc_vse_deterministic.m @@ -0,0 +1,314 @@ +function [models,logP, non_integrable_m, rejProp]=gwmcmc_vse_deterministic(minit,logPfuns,mccount,varargin) +%% Cascaded affine invariant ensemble MCMC sampler. 
"The MCMC hammer" +% +% GWMCMC is an implementation of the Goodman and Weare 2010 Affine +% invariant ensemble Markov Chain Monte Carlo (MCMC) sampler. MCMC sampling +% enables Bayesian inference. The problem with many traditional MCMC samplers +% is that they can have slow convergence for badly scaled problems, and that +% it is difficult to optimize the random walk for high-dimensional problems. +% This is where the GW-algorithm really excels as it is affine invariant. It +% can achieve much better convergence on badly scaled problems. It is much +% simpler to get to work straight out of the box, and for that reason it +% truly deserves to be called the MCMC hammer. +% +% (This code uses a cascaded variant of the Goodman and Weare algorithm). +% +% USAGE: +% [models,logP]=gwmcmc(minit,logPfuns,mccount,[Parameter,Value,Parameter,Value]); +% +% INPUTS: +% minit: an MxW matrix of initial values for each of the walkers in the +% ensemble. (M: number of model params. W: number of walkers). W +% should be at least 2xM. (see e.g. mvnrnd). +% logPfuns: a cell of function handles returning the log probability of a +% proposed set of model parameters. Typically this cell will +% contain two function handles: one to the logprior and another +% to the loglikelihood. E.g. {@(m)logprior(m) @(m)loglike(m)} +% mccount: The desired total number of Monte Carlo proposals. +% This is the total number, -NOT the number per chain. +% +% Named Parameter-Value pairs: +% 'StepSize': unit-less stepsize (default=2). +% 'ThinChain': Thin all the chains by only storing every N'th step (default=10) +% 'ProgressBar': Show a text progress bar (default=true) +% 'Parallel': Run the ensemble of walkers in parallel. (default=false) +% 'BurnIn': fraction of the chain that should be removed. (default=0) +% +% OUTPUTS: +% models: A MxWxT matrix with the thinned Markov chains (with T samples +% per walker). T=~mccount/p.ThinChain/W.
+% logP: A PxWxT matrix of log probabilities for each model in the +% models. Here P is the number of functions in logPfuns. +% +% Note on cascaded evaluation of log probabilities: +% The logPfuns-argument can be specified as a cell-array to allow a cascaded +% evaluation of the probabilities. The computationally cheapest function should be +% placed first in the cell (this will typically be the prior). This allows the +% routine to avoid calculating the likelihood, if the proposed model can be +% rejected based on the prior alone. +% logPfuns={logprior loglike} is faster but equivalent to +% logPfuns={@(m)logprior(m)+loglike(m)} +% +% TIP: if you aim to analyze the entire set of ensemble members as a single +% sample from the distribution then you may collapse the output models-matrix +% thus: models=models(:,:); This will reshape the MxWxT matrix into a +% Mx(W*T)-matrix while preserving the order. +% +% +% EXAMPLE: Here we sample a multivariate normal distribution. +% +% %define problem: +% mu = [5;-3;6]; +% C = [.5 -.4 0;-.4 .5 0; 0 0 1]; +% iC=pinv(C); +% logPfuns={@(m)-0.5*sum((m-mu)'*iC*(m-mu))} +% +% %make a set of starting points for the entire ensemble of walkers +% minit=randn(length(mu),length(mu)*2); +% +% %Apply the MCMC hammer +% [models,logP]=gwmcmc(minit,logPfuns,100000); +% models(:,:,1:floor(size(models,3)*.2))=[]; %remove 20% as burn-in +% models=models(:,:)'; %reshape matrix to collapse the ensemble member dimension +% scatter(models(:,1),models(:,2)) +% prctile(models,[5 50 95]) +% +% +% References: +% Goodman & Weare (2010), Ensemble Samplers With Affine Invariance, Comm. App. Math. Comp. Sci., Vol. 5, No.
1, 65-80 +% Foreman-Mackey, Hogg, Lang, Goodman (2013), emcee: The MCMC Hammer, arXiv:1202.3665 +% +% WebPage: https://github.com/grinsted/gwmcmc +% +% -Aslak Grinsted 2015 + + +persistent isoctave; +if isempty(isoctave) + isoctave = (exist ('OCTAVE_VERSION', 'builtin') > 0); +end + +if nargin<3 + error('GWMCMC:toofewinputs','GWMCMC requires at least 3 inputs.') +end +M=size(minit,1); +if size(minit,2)==1 + minit=bsxfun(@plus,minit,randn(M,M*5)); +end + + +p=inputParser; +if isoctave + p=p.addParamValue('StepSize',2,@isnumeric); %addParamValue is chosen for compatibility with octave. Still Untested. + p=p.addParamValue('ThinChain',10,@isnumeric); + p=p.addParamValue('ProgressBar',true,@islogical); + p=p.addParamValue('Parallel',false,@islogical); + p=p.addParamValue('BurnIn',0,@(x)(x>=0)&&(x<1)); + p=p.parse(varargin{:}); +else + p.addParameter('StepSize',2,@isnumeric); %addParamValue is chosen for compatibility with octave. Still Untested. + p.addParameter('ThinChain',10,@isnumeric); + p.addParameter('ProgressBar',true,@islogical); + p.addParameter('Parallel',false,@islogical); + p.addParameter('BurnIn',0,@(x)(x>=0)&&(x<1)); + p.parse(varargin{:}); +end +p=p.Results; + +Nwalkers=size(minit,2); +if size(minit,1)*2>size(minit,2) + warning('GWMCMC:minitdimensions','Check minit dimensions.\nIt is recommended that there be at least twice as many walkers in the ensemble as there are model dimensions.') +end +if p.ProgressBar + progress=@textprogress; +else + progress=@noaction; +end + +Nkeep=ceil(mccount/p.ThinChain/Nwalkers); %number of samples drawn from each walker +mccount=(Nkeep-1)*p.ThinChain+1; +models=nan(M,Nwalkers,Nkeep); %pre-allocate output matrix +models(:,:,1)=minit; + +if ~iscell(logPfuns) + logPfuns={logPfuns}; +end +NPfun=numel(logPfuns); +%calculate logP state initial pos of walkers +% logP is the result of the function(s) evaluation(s) for all walkers, at all +% timesteps we choose to keep +logP=nan(NPfun,Nwalkers,Nkeep); + +for wix=1:Nwalkers % walker
index + for fix=1:NPfun % function index + try + v=logPfuns{fix}(minit(:,wix)); + catch ME + if strcmp(ME.identifier, 'DESuite:ODE15S:IntegrationToleranceNotMet') + v = -Inf; + % could potentially change this to something like -50, but then + % if here integration tolerances are not met, an error is + % thrown. So we NEED integration tolerances to be met + % initially. This is because of the way things are compared + % in the future iterations, and accepted and rejected. + % Should be -Inf, but I have changed it to -20. in the + % later iterations, keep it at -Inf, so that any future + % integration tol errors also get rejected. + end + end + if islogical(v) + %reformulate function so that false=-inf for logical constraints. + v=-1/v;logPfuns{fix}=@(m)-1/logPfuns{fix}(m); + %experimental implementation of experimental feature + end + logP(fix,wix,1)=v; + end +end + +if ~all(all(isfinite(logP(:,:,1)))) + error('Starting points for all walkers must have finite logP') +end +reject=zeros(Nwalkers,1); +rejProp = nan(1,Nkeep); +curm=models(:,:,1); % current model, all walkers +curlogP=logP(:,:,1); +progress(0,0,0) +totcount=Nwalkers; +% preallocate matrix to store parameter values that could not get simulated +non_integrable_m = nan(M, Nwalkers, Nkeep*p.ThinChain); +for row=1:Nkeep + for jj=1:p.ThinChain + %generate proposals for all walkers + %(done outside walker loop, in order to be compatible with parfor - some penalty for memory): + %-Note it appears to give a slight performance boost for non-parallel. + rix=mod((1:Nwalkers)+floor(rand*(Nwalkers-1)),Nwalkers)+1; %pick a random partner + zz=((p.StepSize - 1)*rand(1,Nwalkers) + 1).^2/p.StepSize; + %VSE: whu -1 here? this is what makes the system fail at 1 + proposedm=curm(:,rix) - bsxfun(@times,(curm(:,rix)-curm),zz); + logrand=log(rand(NPfun+1,Nwalkers)); %moved outside because rand is slow inside parfor + % logrand is what we compare our computed log likelihood to. 
+ totalcount = (row-1)*p.ThinChain+jj; + if p.Parallel + %parallel/non-parallel code is currently mirrored in + %order to enable experimentation with separate optimization + %techniques for each branch. Parallel is not really great yet. + %TODO: use SPMD instead of parfor. + + parfor wix=1:Nwalkers + cp=curlogP(:,wix); + lr=logrand(:,wix); + acceptfullstep=true; + proposedlogP=nan(NPfun,1); + + if lr(1)<(numel(proposedm(:,wix))-1)*log(zz(wix)) + for fix=1:NPfun + try + proposedlogP(fix)=logPfuns{fix}(proposedm(:,wix)); %have tested workerobjwrapper but that is slower. + catch ME + if strcmp(ME.identifier, 'DESuite:ODE15S:IntegrationToleranceNotMet') + non_integrable_m(:,wix,totalcount) = proposedm(:,wix); + proposedlogP(fix) = -Inf; + end + end + + if lr(fix+1)>proposedlogP(fix)-cp(fix) || ~isreal(proposedlogP(fix)) || isnan( proposedlogP(fix) ) + %if ~(lr(fix+1)proposedlogP(fix)-curlogP(fix,wix) || ~isreal(proposedlogP(fix)) || isnan(proposedlogP(fix)) + %if ~(logrand(fix+1,wix)0 + crop=ceil(Nkeep*p.BurnIn); + models(:,:,1:crop)=[]; %TODO: never allocate space for them ? + logP(:,:,1:crop)=[]; +end + + +% TODO: make standard diagnostics to give warnings... 
+% TODO: make some diagnostic plots if nargout==0; + + + +function textprogress(pct,curm,rejectpct) +persistent lastNchar lasttime starttime +if isempty(lastNchar)||pct==0 + lasttime=cputime-10;starttime=cputime;lastNchar=0; + pct=1e-16; +end +if pct==1 + fprintf('%s',repmat(char(8),1,lastNchar));lastNchar=0; + return +end +if (cputime-lasttime>0.1) + + ETA=datestr((cputime-starttime)*(1-pct)/(pct*60*60*24),13); + progressmsg=[183-uint8((1:40)<=(pct*40)).*(183-'*') '']; + %progressmsg=['-'-uint8((1:40)<=(pct*40)).*('-'-'') '']; + %progressmsg=[uint8((1:40)<=(pct*40)).*'#' '']; + curmtxt=sprintf('% 9.3g\n',curm(1:min(end,20),1)); + %curmtxt=mat2str(curm); + progressmsg=sprintf('\nGWMCMC %5.1f%% [%s] %s\n%3.0f%% rejected\n%s\n',pct*100,progressmsg,ETA,rejectpct*100,curmtxt); + + fprintf('%s%s',repmat(char(8),1,lastNchar),progressmsg); + drawnow;lasttime=cputime; + lastNchar=length(progressmsg); +end + +function noaction(varargin) + +% Acknowledgements: I became aware of the GW algorithm via a student report +% which was using emcee for python. Great stuff. diff --git a/mcmc_simbio/src/gwmcmc/gwmcmc_vse_diagnosticmode.m b/mcmc_simbio/src/gwmcmc/gwmcmc_vse_diagnosticmode.m new file mode 100644 index 0000000..6c1e369 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/gwmcmc_vse_diagnosticmode.m @@ -0,0 +1,316 @@ +function [models,logP, non_integrable_m, rejProp]=gwmcmc_vse_diagnosticmode(minit,logPfuns,mccount,varargin) +%% Cascaded affine invariant ensemble MCMC sampler. "The MCMC hammer" +% % this is just a copy of the gwmcmc_vse, and contains the non integrable +% points array. Going forward, I am removing that array from the +% computation so reduce the space overhead. +% +% GWMCMC is an implementation of the Goodman and Weare 2010 Affine +% invariant ensemble Markov Chain Monte Carlo (MCMC) sampler. MCMC sampling +% enables bayesian inference. 
The problem with many traditional MCMC samplers +% is that they can have slow convergence for badly scaled problems, and that +% it is difficult to optimize the random walk for high-dimensional problems. +% This is where the GW-algorithm really excels as it is affine invariant. It +% can achieve much better convergence on badly scaled problems. It is much +% simpler to get to work straight out of the box, and for that reason it +% truly deserves to be called the MCMC hammer. +% +% (This code uses a cascaded variant of the Goodman and Weare algorithm). +% +% USAGE: +% [models,logP]=gwmcmc(minit,logPfuns,mccount,[Parameter,Value,Parameter,Value]); +% +% INPUTS: +% minit: an MxW matrix of initial values for each of the walkers in the +% ensemble. (M:number of model params. W: number of walkers). W +% should be atleast 2xM. (see e.g. mvnrnd). +% logPfuns: a cell of function handles returning the log probality of a +% proposed set of model parameters. Typically this cell will +% contain two function handles: one to the logprior and another +% to the loglikelihood. E.g. {@(m)logprior(m) @(m)loglike(m)} +% mccount: What is the desired total number of monte carlo proposals. +% This is the total number, -NOT the number per chain. +% +% Named Parameter-Value pairs: +% 'StepSize': unit-less stepsize (default=2.5). +% 'ThinChain': Thin all the chains by only storing every N'th step (default=10) +% 'ProgressBar': Show a text progress bar (default=true) +% 'Parallel': Run in ensemble of walkers in parallel. (default=false) +% 'BurnIn': fraction of the chain that should be removed. (default=0) +% +% OUTPUTS: +% models: A MxWxT matrix with the thinned markov chains (with T samples +% per walker). T=~mccount/p.ThinChain/W. +% logP: A PxWxT matrix of log probabilities for each model in the +% models. here P is the number of functions in logPfuns. 
+% +% Note on cascaded evaluation of log probabilities: +% The logPfuns-argument can be specifed as a cell-array to allow a cascaded +% evaluation of the probabilities. The computationally cheapest function should be +% placed first in the cell (this will typically the prior). This allows the +% routine to avoid calculating the likelihood, if the proposed model can be +% rejected based on the prior alone. +% logPfuns={logprior loglike} is faster but equivalent to +% logPfuns={@(m)logprior(m)+loglike(m)} +% +% TIP: if you aim to analyze the entire set of ensemble members as a single +% sample from the distribution then you may collapse output models-matrix +% thus: models=models(:,:); This will reshape the MxWxT matrix into a +% Mx(W*T)-matrix while preserving the order. +% +% +% EXAMPLE: Here we sample a multivariate normal distribution. +% +% %define problem: +% mu = [5;-3;6]; +% C = [.5 -.4 0;-.4 .5 0; 0 0 1]; +% iC=pinv(C); +% logPfuns={@(m)-0.5*sum((m-mu)'*iC*(m-mu))} +% +% %make a set of starting points for the entire ensemble of walkers +% minit=randn(length(mu),length(mu)*2); +% +% %Apply the MCMC hammer +% [models,logP]=gwmcmc(minit,logPfuns,100000); +% models(:,:,1:floor(size(models,3)*.2))=[]; %remove 20% as burn-in +% models=models(:,:)'; %reshape matrix to collapse the ensemble member dimension +% scatter(models(:,1),models(:,2)) +% prctile(models,[5 50 95]) +% +% +% References: +% Goodman & Weare (2010), Ensemble Samplers With Affine Invariance, Comm. App. Math. Comp. Sci., Vol. 5, No. 
1, 6580 +% Foreman-Mackey, Hogg, Lang, Goodman (2013), emcee: The MCMC Hammer, arXiv:1202.3665 +% +% WebPage: https://github.com/grinsted/gwmcmc +% +% -Aslak Grinsted 2015 + + +persistent isoctave; +if isempty(isoctave) + isoctave = (exist ('OCTAVE_VERSION', 'builtin') > 0); +end + +if nargin<3 + error('GWMCMC:toofewinputs','GWMCMC requires atleast 3 inputs.') +end +M=size(minit,1); +if size(minit,2)==1 + minit=bsxfun(@plus,minit,randn(M,M*5)); +end + + +p=inputParser; +if isoctave + p=p.addParamValue('StepSize',2,@isnumeric); %addParamValue is chose for compatibility with octave. Still Untested. + p=p.addParamValue('ThinChain',10,@isnumeric); + p=p.addParamValue('ProgressBar',true,@islogical); + p=p.addParamValue('Parallel',false,@islogical); + p=p.addParamValue('BurnIn',0,@(x)(x>=0)&&(x<1)); + p=p.parse(varargin{:}); +else + p.addParameter('StepSize',2,@isnumeric); %addParamValue is chose for compatibility with octave. Still Untested. + p.addParameter('ThinChain',10,@isnumeric); + p.addParameter('ProgressBar',true,@islogical); + p.addParameter('Parallel',false,@islogical); + p.addParameter('BurnIn',0,@(x)(x>=0)&&(x<1)); + p.parse(varargin{:}); +end +p=p.Results; + +Nwalkers=size(minit,2); +if size(minit,1)*2>size(minit,2) + warning('GWMCMC:minitdimensions','Check minit dimensions.\nIt is recommended that there be atleast twice as many walkers in the ensemble as there are model dimension.') +end +if p.ProgressBar + progress=@textprogress; +else + progress=@noaction; +end + +Nkeep=ceil(mccount/p.ThinChain/Nwalkers); %number of samples drawn from each walker +mccount=(Nkeep-1)*p.ThinChain+1; +models=nan(M,Nwalkers,Nkeep); %pre-allocate output matrix +models(:,:,1)=minit; + +if ~iscell(logPfuns) + logPfuns={logPfuns}; +end +NPfun=numel(logPfuns); +%calculate logP state initial pos of walkers +% logP is the result of the function(s) evaluation(s) for all walkers, at all +% timesteps we choose to keep +logP=nan(NPfun,Nwalkers,Nkeep); + +for wix=1:Nwalkers % walker 
index + for fix=1:NPfun % function index + try + v=logPfuns{fix}(minit(:,wix)); + catch ME + if strcmp(ME.identifier, 'DESuite:ODE15S:IntegrationToleranceNotMet') + v = -Inf; % could potentially change this to something like -50, but then + % if here integration tolerances are not met, an error is + % thrown. So we NEED integration tolerances to be met + % initially. This is because of the way things are compared + % in the future iterations, and accepted and rejected. + % Should be -Inf, but I have changed it to -20. in the + % later iterations, keep it at -Inf, so that any future + % integration tol errors also get rejected. + end + end + if islogical(v) + %reformulate function so that false=-inf for logical constraints. + v=-1/v;logPfuns{fix}=@(m)-1/logPfuns{fix}(m); + %experimental implementation of experimental feature + end + logP(fix,wix,1)=v; + end +end + +if ~all(all(isfinite(logP(:,:,1)))) + error('Starting points for all walkers must have finite logP') +end +reject=zeros(Nwalkers,1); +rejProp = nan(1,Nkeep); +curm=models(:,:,1); % current model, all walkers +curlogP=logP(:,:,1); +progress(0,0,0) +totcount=Nwalkers; +% preallocate matrix to store parameter values that could not get simulated +non_integrable_m = nan(M, Nwalkers, Nkeep*p.ThinChain); +for row=1:Nkeep + for jj=1:p.ThinChain + %generate proposals for all walkers + %(done outside walker loop, in order to be compatible with parfor - some penalty for memory): + %-Note it appears to give a slight performance boost for non-parallel. + rix=mod((1:Nwalkers)+floor(rand*(Nwalkers-1)),Nwalkers)+1; %pick a random partner + zz=((p.StepSize - 1)*rand(1,Nwalkers) + 1).^2/p.StepSize; + %VSE: whu -1 here? this is what makes the system fail at 1 + proposedm=curm(:,rix) - bsxfun(@times,(curm(:,rix)-curm),zz); + logrand=log(rand(NPfun+1,Nwalkers)); %moved outside because rand is slow inside parfor + % logrand is what we compare our computed log likelihood to. 
+ totalcount = (row-1)*p.ThinChain+jj; + if p.Parallel + %parallel/non-parallel code is currently mirrored in + %order to enable experimentation with separate optimization + %techniques for each branch. Parallel is not really great yet. + %TODO: use SPMD instead of parfor. + + parfor wix=1:Nwalkers + cp=curlogP(:,wix); + lr=logrand(:,wix); + acceptfullstep=true; + proposedlogP=nan(NPfun,1); + + if lr(1)<(numel(proposedm(:,wix))-1)*log(zz(wix)) + for fix=1:NPfun + try + proposedlogP(fix)=logPfuns{fix}(proposedm(:,wix)); %have tested workerobjwrapper but that is slower. + catch ME + if strcmp(ME.identifier, 'DESuite:ODE15S:IntegrationToleranceNotMet') + non_integrable_m(:,wix,totalcount) = proposedm(:,wix); + proposedlogP(fix) = -Inf; + end + end + + if lr(fix+1)>proposedlogP(fix)-cp(fix) || ~isreal(proposedlogP(fix)) || isnan( proposedlogP(fix) ) + %if ~(lr(fix+1)proposedlogP(fix)-curlogP(fix,wix) || ~isreal(proposedlogP(fix)) || isnan(proposedlogP(fix)) + %if ~(logrand(fix+1,wix)0 + crop=ceil(Nkeep*p.BurnIn); + models(:,:,1:crop)=[]; %TODO: never allocate space for them ? + logP(:,:,1:crop)=[]; +end + + +% TODO: make standard diagnostics to give warnings... 
+% TODO: make some diagnostic plots if nargout==0; + + + +function textprogress(pct,curm,rejectpct) +persistent lastNchar lasttime starttime +if isempty(lastNchar)||pct==0 + lasttime=cputime-10;starttime=cputime;lastNchar=0; + pct=1e-16; +end +if pct==1 + fprintf('%s',repmat(char(8),1,lastNchar));lastNchar=0; + return +end +if (cputime-lasttime>0.1) + + ETA=datestr((cputime-starttime)*(1-pct)/(pct*60*60*24),13); + progressmsg=[183-uint8((1:40)<=(pct*40)).*(183-'*') '']; + %progressmsg=['-'-uint8((1:40)<=(pct*40)).*('-'-'') '']; + %progressmsg=[uint8((1:40)<=(pct*40)).*'#' '']; + curmtxt=sprintf('% 9.3g\n',curm(1:min(end,20),1)); + %curmtxt=mat2str(curm); + progressmsg=sprintf('\nGWMCMC %5.1f%% [%s] %s\n%3.0f%% rejected\n%s\n',pct*100,progressmsg,ETA,rejectpct*100,curmtxt); + + fprintf('%s%s',repmat(char(8),1,lastNchar),progressmsg); + drawnow;lasttime=cputime; + lastNchar=length(progressmsg); +end + +function noaction(varargin) + +% Acknowledgements: I became aware of the GW algorithm via a student report +% which was using emcee for python. Great stuff. diff --git a/mcmc_simbio/src/gwmcmc/gwmcmc_vse_origjan30_19.m b/mcmc_simbio/src/gwmcmc/gwmcmc_vse_origjan30_19.m new file mode 100644 index 0000000..c0eda78 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/gwmcmc_vse_origjan30_19.m @@ -0,0 +1,323 @@ +function [models,logP, rejProp]=gwmcmc_vse(minit,logPfuns,mccount,varargin) +%% Cascaded affine invariant ensemble MCMC sampler. "The MCMC hammer" +% +% GWMCMC is an implementation of the Goodman and Weare 2010 Affine +% invariant ensemble Markov Chain Monte Carlo (MCMC) sampler. MCMC sampling +% enables bayesian inference. The problem with many traditional MCMC samplers +% is that they can have slow convergence for badly scaled problems, and that +% it is difficult to optimize the random walk for high-dimensional problems. +% This is where the GW-algorithm really excels as it is affine invariant. It +% can achieve much better convergence on badly scaled problems. 
It is much +% simpler to get to work straight out of the box, and for that reason it +% truly deserves to be called the MCMC hammer. +% +% (This code uses a cascaded variant of the Goodman and Weare algorithm). +% +% USAGE: +% [models,logP]=gwmcmc(minit,logPfuns,mccount,[Parameter,Value,Parameter,Value]); +% +% INPUTS: +% minit: an MxW matrix of initial values for each of the walkers in the +% ensemble. (M:number of model params. W: number of walkers). W +% should be atleast 2xM. (see e.g. mvnrnd). +% logPfuns: a cell of function handles returning the log probality of a +% proposed set of model parameters. Typically this cell will +% contain two function handles: one to the logprior and another +% to the loglikelihood. E.g. {@(m)logprior(m) @(m)loglike(m)} +% mccount: What is the desired total number of monte carlo proposals. +% This is the total number, -NOT the number per chain. +% +% Named Parameter-Value pairs: +% 'StepSize': unit-less stepsize (default=2.5). +% 'ThinChain': Thin all the chains by only storing every N'th step (default=10) +% 'ProgressBar': Show a text progress bar (default=true) +% 'Parallel': Run in ensemble of walkers in parallel. (default=false) +% 'BurnIn': fraction of the chain that should be removed. (default=0) +% +% OUTPUTS: +% models: A MxWxT matrix with the thinned markov chains (with T samples +% per walker). T=~mccount/p.ThinChain/W. +% logP: A PxWxT matrix of log probabilities for each model in the +% models. here P is the number of functions in logPfuns. +% +% Note on cascaded evaluation of log probabilities: +% The logPfuns-argument can be specifed as a cell-array to allow a cascaded +% evaluation of the probabilities. The computationally cheapest function should be +% placed first in the cell (this will typically the prior). This allows the +% routine to avoid calculating the likelihood, if the proposed model can be +% rejected based on the prior alone. 
+% logPfuns={logprior loglike} is faster but equivalent to +% logPfuns={@(m)logprior(m)+loglike(m)} +% +% TIP: if you aim to analyze the entire set of ensemble members as a single +% sample from the distribution then you may collapse output models-matrix +% thus: models=models(:,:); This will reshape the MxWxT matrix into a +% Mx(W*T)-matrix while preserving the order. +% +% +% EXAMPLE: Here we sample a multivariate normal distribution. +% +% %define problem: +% mu = [5;-3;6]; +% C = [.5 -.4 0;-.4 .5 0; 0 0 1]; +% iC=pinv(C); +% logPfuns={@(m)-0.5*sum((m-mu)'*iC*(m-mu))} +% +% %make a set of starting points for the entire ensemble of walkers +% minit=randn(length(mu),length(mu)*2); +% +% %Apply the MCMC hammer +% [models,logP]=gwmcmc(minit,logPfuns,100000); +% models(:,:,1:floor(size(models,3)*.2))=[]; %remove 20% as burn-in +% models=models(:,:)'; %reshape matrix to collapse the ensemble member dimension +% scatter(models(:,1),models(:,2)) +% prctile(models,[5 50 95]) +% +% +% References: +% Goodman & Weare (2010), Ensemble Samplers With Affine Invariance, Comm. +% App. Math. Comp. Sci., Vol. 5, No. 1, 65�80 +% Foreman-Mackey, Hogg, Lang, Goodman (2013), emcee: The MCMC Hammer, arXiv:1202.3665 +% +% WebPage: https://github.com/grinsted/gwmcmc +% +% -Aslak Grinsted 2015 + + +persistent isoctave; +if isempty(isoctave) + isoctave = (exist ('OCTAVE_VERSION', 'builtin') > 0); +end + +if nargin<3 + error('GWMCMC:toofewinputs','GWMCMC requires atleast 3 inputs.') +end +M=size(minit,1); +if size(minit,2)==1 + minit=bsxfun(@plus,minit,randn(M,M*5)); +end + + +p=inputParser; +if isoctave + p=p.addParamValue('StepSize',2,@isnumeric); + %addParamValue is chose for compatibility with octave. Still Untested. 
+ p=p.addParamValue('ThinChain',10,@isnumeric); + p=p.addParamValue('ProgressBar',true,@islogical); + p=p.addParamValue('Parallel',false,@islogical); + p=p.addParamValue('BurnIn',0,@(x)(x>=0)&&(x<1)); + p=p.parse(varargin{:}); +else + p.addParameter('StepSize',2,@isnumeric); + %addParamValue is chose for compatibility with octave. Still Untested. + p.addParameter('ThinChain',10,@isnumeric); + p.addParameter('ProgressBar',true,@islogical); + p.addParameter('Parallel',false,@islogical); + p.addParameter('BurnIn',0,@(x)(x>=0)&&(x<1)); + p.parse(varargin{:}); +end +p=p.Results; + +Nwalkers=size(minit,2); +if size(minit,1)*2>size(minit,2) + warning('GWMCMC:minitdimensions',... + ['Check minit dimensions.\nIt is recommended that there be atleast'... + ' twice as many walkers in the ensemble as there are model dimension.']) +end +if p.ProgressBar + progress=@textprogress; +else + progress=@noaction; +end + +Nkeep=ceil(mccount/p.ThinChain/Nwalkers); %number of samples drawn from each walker +mccount=(Nkeep-1)*p.ThinChain+1; +models=nan(M,Nwalkers,Nkeep); %pre-allocate output matrix +models(:,:,1)=minit; + +if ~iscell(logPfuns) + logPfuns={logPfuns}; +end +NPfun=numel(logPfuns); +%calculate logP state initial pos of walkers +% logP is the result of the function(s) evaluation(s) for all walkers, at all +% timesteps we choose to keep +logP=nan(NPfun,Nwalkers,Nkeep); + +parfor wix=1:Nwalkers % walker index + for fix=1:NPfun % function index + try + v=logPfuns{fix}(minit(:,wix)); + catch ME + if strcmp(ME.identifier, 'DESuite:ODE15S:IntegrationToleranceNotMet') + v = -Inf; % could potentially change this to something like -50, but then + % if here integration tolerances are not met, an error is + % thrown. So we NEED integration tolerances to be met + % initially. This is because of the way things are compared + % in the future iterations, and accepted and rejected. + % Should be -Inf, but I have changed it to -20. 
in the + % later iterations, keep it at -Inf, so that any future + % integration tol errors also get rejected. + end + end + if islogical(v) + %reformulate function so that false=-inf for logical constraints. + v=-1/v;logPfuns{fix}=@(m)-1/logPfuns{fix}(m); + %experimental implementation of experimental feature + end + logP(fix,wix,1)=v; + end +end + +if ~all(all(isfinite(logP(:,:,1)))) + error('Starting points for all walkers must have finite logP') +end +reject=zeros(Nwalkers,1); +rejProp = nan(1,Nkeep); +curm=models(:,:,1); % current model, all walkers +curlogP=logP(:,:,1); +progress(0,0,0) +totcount=Nwalkers; +% preallocate matrix to store parameter values that could not get simulated +% non_integrable_m = nan(M, Nwalkers, Nkeep*p.ThinChain); +for row=1:Nkeep + for jj=1:p.ThinChain + %generate proposals for all walkers + %(done outside walker loop, in order to be compatible with parfor - some penalty for memory): + %-Note it appears to give a slight performance boost for non-parallel. + rix=mod((1:Nwalkers)+floor(rand*(Nwalkers-1)),Nwalkers)+1; %pick a random partner + zz=((p.StepSize - 1)*rand(1,Nwalkers) + 1).^2/p.StepSize; + %VSE: whu -1 here? this is what makes the system fail at 1 + proposedm=curm(:,rix) - bsxfun(@times,(curm(:,rix)-curm),zz); + logrand=log(rand(NPfun+1,Nwalkers)); %moved outside because rand is slow inside parfor + % logrand is what we compare our computed log likelihood to. + totalcount = (row-1)*p.ThinChain+jj; + if p.Parallel + %parallel/non-parallel code is currently mirrored in + %order to enable experimentation with separate optimization + %techniques for each branch. Parallel is not really great yet. + %TODO: use SPMD instead of parfor. 
+ + parfor wix=1:Nwalkers + cp=curlogP(:,wix); + lr=logrand(:,wix); + acceptfullstep=true; + proposedlogP=nan(NPfun,1); + + if lr(1)<(numel(proposedm(:,wix))-1)*log(zz(wix)) + for fix=1:NPfun + try + proposedlogP(fix)=logPfuns{fix}(proposedm(:,wix)); + %have tested workerobjwrapper but that is slower. + catch ME + if strcmp(ME.identifier, 'DESuite:ODE15S:IntegrationToleranceNotMet') +% non_integrable_m(:,wix,totalcount) = proposedm(:,wix); + proposedlogP(fix) = -Inf; + end + end + + if lr(fix+1)>proposedlogP(fix)-cp(fix) ||... + ~isreal(proposedlogP(fix)) || isnan( proposedlogP(fix) ) + %if ~(lr(fix+1)proposedlogP(fix)-curlogP(fix,wix)... + || ~isreal(proposedlogP(fix)) || isnan(proposedlogP(fix)) + %if ~(logrand(fix+1,wix)0 + crop=ceil(Nkeep*p.BurnIn); + models(:,:,1:crop)=[]; %TODO: never allocate space for them ? + logP(:,:,1:crop)=[]; +end + + +% TODO: make standard diagnostics to give warnings... +% TODO: make some diagnostic plots if nargout==0; + + + +function textprogress(pct,curm,rejectpct) +persistent lastNchar lasttime starttime +if isempty(lastNchar)||pct==0 + lasttime=cputime-10;starttime=cputime;lastNchar=0; + pct=1e-16; +end +if pct==1 + fprintf('%s',repmat(char(8),1,lastNchar));lastNchar=0; + return +end +if (cputime-lasttime>0.1) + + ETA=datestr((cputime-starttime)*(1-pct)/(pct*60*60*24),13); + progressmsg=[183-uint8((1:40)<=(pct*40)).*(183-'*') '']; + %progressmsg=['-'-uint8((1:40)<=(pct*40)).*('-'-'�') '']; + %progressmsg=[uint8((1:40)<=(pct*40)).*'#' '']; + curmtxt=sprintf('% 9.3g\n',curm(1:min(end,20),1)); + %curmtxt=mat2str(curm); + progressmsg=sprintf('\nGWMCMC %5.1f%% [%s] %s\n%3.0f%% rejected\n%s\n',... + pct*100,progressmsg,ETA,rejectpct*100,curmtxt); + + fprintf('%s%s',repmat(char(8),1,lastNchar),progressmsg); + drawnow;lasttime=cputime; + lastNchar=length(progressmsg); +end + +function noaction(varargin) + +% Acknowledgements: I became aware of the GW algorithm via a student report +% which was using emcee for python. Great stuff. 
diff --git a/mcmc_simbio/src/gwmcmc/html/ex_behappy.html b/mcmc_simbio/src/gwmcmc/html/ex_behappy.html new file mode 100755 index 0000000..671580e --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/html/ex_behappy.html @@ -0,0 +1,117 @@ + + + + + Don't worry, be Happy

Don't worry, be Happy

Sampling a smiley face likelihood function.

Contents

Smiley face equation

Formulate a likelihood function inspired by an equation of a smiley face.

Source Michael Borcherds: https://twitter.com/mike_geogebra/status/135391208703930369

logHappiness=@(m)1-exp(1e-4*((m(1)^4+2*m(1)^2*m(2)^2-0.3*m(1)^2*m(2)-40.75*m(1)^2+m(2)^4-m(2)^3-40.75*m(2)^2+25*m(2)+393.75)*((m(1)+3)^2+(m(2)-7)^2-1)*((m(1)-3)^2+(m(2)-7)^2-1)*(m(1)^2+(m(2)-2)^2-64)));
+

Draw samples from the distribution using GWMCMC

Now we apply the MCMC hammer to draw samples from the logHappiness distribution.

[models,logP]=gwmcmc(randn(2,100),logHappiness,100000,'ThinChain',2);
+models(:,:,1:end*.2)=[];
+models=models(:,:)';
+
+
+plot(models(:,1),models(:,2),'yo','markerfacecolor',[1 1 0]*.8);
+
+axis equal off
+
+
+title('GWMCMC says: "Don''t Worry, Be Happy!"');
+

Important links

Bobby McFerrin on youtube: https://www.youtube.com/watch?v=d-diB65scQU

\ No newline at end of file diff --git a/mcmc_simbio/src/gwmcmc/html/ex_behappy.md b/mcmc_simbio/src/gwmcmc/html/ex_behappy.md new file mode 100755 index 0000000..c8b4421 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/html/ex_behappy.md @@ -0,0 +1,46 @@ +Don't worry, be Happy +======================================= + +Sampling a smiley face likelihood function. + + + +Smiley face equation +---------------------------------------------------------- + +Formulate a likelihood function inspired by an equation of a smiley face. + +Source Michael Borcherds: https://twitter.com/mike_geogebra/status/135391208703930369 + +```matlab +logHappiness=@(m)1-exp(1e-4*((m(1)^4+2*m(1)^2*m(2)^2-0.3*m(1)^2*m(2)-40.75*m(1)^2+m(2)^4-m(2)^3-40.75*m(2)^2+25*m(2)+393.75)*((m(1)+3)^2+(m(2)-7)^2-1)*((m(1)-3)^2+(m(2)-7)^2-1)*(m(1)^2+(m(2)-2)^2-64))); +``` + + +Draw samples from the distribution using GWMCMC +---------------------------------------------------------- + +Now we apply the MCMC hammer to draw samples from the logHappiness distribution. 
+ +```matlab +[models,logP]=gwmcmc(randn(2,100),logHappiness,100000,'ThinChain',2); +models(:,:,1:end*.2)=[]; +models=models(:,:)'; + + +plot(models(:,1),models(:,2),'yo','markerfacecolor',[1 1 0]*.8); + +axis equal off + + +title('GWMCMC says: "Don''t Worry, Be Happy!"'); +``` + +![IMAGE](ex_behappy_01.png) + + +Important links +---------------------------------------------------------- + +Bobby McFerrin on youtube: https://www.youtube.com/watch?v=d-diB65scQU + diff --git a/mcmc_simbio/src/gwmcmc/html/ex_behappy_01.png b/mcmc_simbio/src/gwmcmc/html/ex_behappy_01.png new file mode 100755 index 0000000..bf840d4 Binary files /dev/null and b/mcmc_simbio/src/gwmcmc/html/ex_behappy_01.png differ diff --git a/mcmc_simbio/src/gwmcmc/html/ex_breakfit.html b/mcmc_simbio/src/gwmcmc/html/ex_breakfit.html new file mode 100755 index 0000000..e97688e --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/html/ex_breakfit.html @@ -0,0 +1,301 @@ + + + + + Fitting a trend-change model to a time series

Fitting a trend-change model to a time series

This code fits a trend-change model to a historical time series of sea level in Amsterdam with gaps.

Contents

Input data

Amsterdam sea level from this source: http://www.psmsl.org/data/longrecords/

Y=[1700 -152; 1701 -158; 1702 -132; 1703 -172; 1704 -135; 1705 -167; 1706 -192; 1707 -153; 1708 -149; 1709 -187; 1710 -168; 1711 -140; 1712 -129; 1713 -151;
+   1714 -106; 1715 -172; 1716 -168; 1717 -164; 1718 -185; 1719 -182; 1720 -109; 1721 -146; 1722 -141; 1723 -99; 1724 -145; 1725 -166; 1726 -108; 1727 -136;
+   1728 -195; 1729 -176; 1730 -148; 1731 -108; 1732 -134; 1733 -160; 1734 -165; 1735 -181; 1736 -109; 1737 -92; 1738 -152; 1739 -123; 1740 -124; 1741 -122;
+   1742 -154; 1743 -144; 1744 -148; 1745 -178; 1746 -178; 1747 -142; 1748 -147; 1749 -167; 1766 -175; 1767 -111; 1768 -160; 1769 -86; 1770 -94; 1771 -87;
+   1772 -142; 1773 -143; 1774 -135; 1775 -127; 1776 -150; 1777 -131; 1778 -155; 1779 -131; 1780 -130; 1781 -134; 1782 -160; 1783 -157; 1784 -173; 1785 -178;
+   1786 -178; 1787 -125; 1788 -204; 1789 -161; 1790 -109; 1791 -92; 1792 -150; 1793 -154; 1794 -118; 1795 -121; 1796 -157; 1797 -134; 1798 -135; 1799 -177;
+   1800 -175; 1801 -90; 1802 -159; 1803 -172; 1804 -130; 1805 -142; 1806 -106; 1807 -105; 1808 -183; 1809 -151; 1810 -128; 1811 -137; 1812 -141; 1813 -150;
+   1814 -185; 1815 -144; 1816 -113; 1817 -102; 1818 -160; 1819 -158; 1820 -194; 1821 -123; 1822 -125; 1823 -198; 1824 -97; 1825 -87; 1826 -126; 1827 -97;
+   1828 -124; 1829 -119; 1830 -141; 1831 -94; 1832 -141; 1833 -106; 1834 -77; 1835 -105; 1836 -96; 1837 -88; 1838 -117; 1839 -114; 1840 -111; 1841 -85;
+   1842 -132; 1843 -57; 1844 -53; 1845 -90; 1846 -80; 1847 -118; 1848 -141; 1849 -101; 1850 -91; 1851 -102; 1852 -97; 1853 -113; 1854 -49; 1855 -111;
+   1856 -85;  1857 -145; 1858 -137; 1859 -102; 1860 -113; 1861 -94; 1862 -125; 1863 -121; 1864 -161; 1865 -157; 1866 -93; 1867 -58; 1868 -91; 1869 -75; 1870 -129;
+   1871 -141; 1874 -110; 1875 -125; 1876 -80; 1877 -43; 1878 -60; 1879 -79; 1880 -31; 1881 -64; 1882 -74; 1883 -58; 1884 -54; 1885 -75; 1886 -88; 1887 -64; 1888 -86;
+   1889 -53; 1890 -84; 1891 -94; 1892 -78; 1893 -67; 1894 -92; 1895 -74; 1896 -81; 1897 -82; 1898 -32; 1899 -36; 1900 -67; 1901 -45; 1902 -62; 1903 -25; 1904 -58; 1905 -32;
+   1906 -34; 1907 -75; 1908 -66; 1909 -36; 1910 -12; 1911 -24; 1912 -7; 1913 -22; 1914 0; 1915 7; 1916 -5; 1917 -37; 1918 -44; 1919 -38; 1920 14; 1921 -10;
+   1922 -16; 1923 -38;1925 29];
+t=Y(:,1);
+Y=Y(:,2);
+

Define trend change forward model:

forwardmodel=@(t,m)(m(1)*(t<m(3))+m(2)*(t>m(3))).*(t-m(3))+m(4);
+
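As a quick sanity check of the piecewise-linear form, the model can be evaluated at a few points. This is only a sketch: the parameter values below are made up for illustration and are not fitted values.

```matlab
% Same forward model as above: slope m(1) before the kink at m(3),
% slope m(2) after it, and level m(4) at the kink itself.
forwardmodel=@(t,m)(m(1)*(t<m(3))+m(2)*(t>m(3))).*(t-m(3))+m(4);

% Hypothetical parameters (illustration only): [rate_1 rate_2 kink level]
mtest=[1 2 1800 -100];
ttest=[1790 1800 1810];

ytest=forwardmodel(ttest,mtest);
disp(ytest)   % -110  -100   -80
```

Ten years before the kink the value is 10 below the level m(4) (slope 1), and ten years after it is 20 above (slope 2).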

Make an initial guess for the model parameters.

p=polyfit(t-mean(t),Y,1);
+m0=[p(1) p(1) mean(t) p(2)]';
+sigma=std(Y-forwardmodel(t,m0));
+m0=[m0 ; log(sigma)];
+

Likelihood

We assume the data are normally distributed around the forward model.

% First we define a helper function equivalent to calling log(normpdf(x,mu,sigma))
+% but has higher precision because it avoids truncation errors associated with calling
+% log(exp(xxx)).
+lognormpdf=@(x,mu,sigma)-0.5*((x-mu)./sigma).^2  -log(sqrt(2*pi).*sigma);
+
+
+logLike=@(m)sum(lognormpdf(Y,forwardmodel(t,m),m(5)));
+
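The precision remark in the comment above can be illustrated with a small sketch. The `naivelog` helper is hypothetical and written out by hand here (rather than calling `normpdf`) so that no toolbox is assumed; the residual of 40 standard deviations is a made-up value.

```matlab
% Log-density helper from above.
lognormpdf=@(x,mu,sigma)-0.5*((x-mu)./sigma).^2  -log(sqrt(2*pi).*sigma);

% Hypothetical naive alternative: evaluate the density first, then take the log.
naivelog=@(x,mu,sigma)log(exp(-0.5*((x-mu)./sigma).^2)./(sqrt(2*pi).*sigma));

% A residual 40 standard deviations away from the mean (illustration only).
stable=lognormpdf(40,0,1);   % finite: -800-log(sqrt(2*pi))
naive=naivelog(40,0,1);      % exp(-800) underflows to 0, so the log is -Inf
```

A finite value for a poor fit still lets the sampler rank one bad proposal against another, whereas `-Inf` carries no such information.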

Prior information

We want to restrict the model to place the kink-point within the observed time interval. All other parameters have a uniform prior.

logprior = @(m)(m(3)>min(t))&(m(3)<max(t));
+
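The prior above returns a logical value rather than a log probability. The gwmcmc_vse source in this diff handles that case by replacing a logical result `v` with `-1/v`. A minimal sketch of what that mapping does, using hypothetical bounds `tmin` and `tmax` in place of `min(t)` and `max(t)`:

```matlab
% Hypothetical bounds standing in for min(t) and max(t).
tmin=1700; tmax=1925;
logprior=@(m)(m(3)>tmin)&(m(3)<tmax);

% Mapping applied by the sampler to logical results: v -> -1/v
inside=-1/logprior([0 0 1800 0 0]);    % true  -> -1   (a finite constant)
outside=-1/logprior([0 0 1600 0 0]);   % false -> -Inf (proposal is always rejected)
```

Because every admissible point maps to the same constant, the constraint acts as a uniform prior over the allowed kink interval.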

Find the posterior distribution using GWMCMC

Now we apply the MCMC hammer to draw samples from the posterior.

% first we initialize the ensemble of walkers in a small gaussian ball
+% around the m0 estimate.
+
+ball=randn(length(m0),30)*0.1;
+ball(3,:)=ball(3,:)*200;
+mball=bsxfun(@plus,m0,ball);
+

Apply the hammer:

Draw samples from the posterior.

tic
+m=gwmcmc(mball,{logprior logLike},300000,'burnin',.3,'stepsize',2);
+toc
+
Elapsed time is 25.385783 seconds.
+
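The size of the returned chain follows from the call arguments. The sketch below assumes that `gwmcmc` uses the same bookkeeping as the gwmcmc_vse source in this diff (`Nkeep=ceil(mccount/p.ThinChain/Nwalkers)` and `crop=ceil(Nkeep*p.BurnIn)`) and the default `ThinChain` of 10; the variable names are local to this example.

```matlab
% Values taken from the call above; ThinChain is the documented default.
mccount=300000; Nwalkers=30; ThinChain=10; BurnIn=0.3;

Nkeep=ceil(mccount/ThinChain/Nwalkers);   % samples stored per walker: 1000
crop=ceil(Nkeep*BurnIn);                  % samples discarded as burn-in: 300
Nleft=Nkeep-crop;                         % samples returned per walker: 700
```

Under these assumptions `m` would be a 5x30x700 array: 5 parameters, 30 walkers and 700 retained samples per walker.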

Plot the auto-correlation function

And determine the effective sample size.

figure
+[C,lags,ESS]=eacorr(m);
+plot(lags,C,'.-',lags([1 end]),[0 0],'k');
+grid on
+xlabel('lags')
+ylabel('autocorrelation');
+text(lags(end),0,sprintf('Effective Sample Size (ESS): %.0f_ ',ceil(mean(ESS))),'verticalalignment','bottom','horizontalalignment','right')
+title('Markov Chain Auto Correlation')
+

Corner plot of parameters

The corner plot shows a bi-modal distribution with two different places you might place the kink in the trend-change model.

figure
+ecornerplot(m,'ks',true,'color',[.6 .35 .3],'names',{'rate_1' 'rate_2' 'kink' 'k' '\sigma'})
+

Plot of posterior fit

figure
+m=m(:,:)'; %flatten the chain
+
+
+%make a 2d histogram of forwardmodel of the posterior samples
+ygrid=linspace(min(Y),max(Y),200);
+tgrid=min(t):max(t);
+Ycount=zeros(length(ygrid),length(tgrid));
+for kk=1:1000
+    r=ceil(rand*size(m,1));
+    Ymodel=forwardmodel(tgrid,m(r,:));
+    Ybin=round((Ymodel-ygrid(1))*length(ygrid)/(ygrid(end)-ygrid(1)));
+    for jj=1:length(tgrid)
+        Ycount(Ybin(jj),jj)	=Ycount(Ybin(jj),jj)+1;
+    end
+end
+Ycount(Ycount==0)=nan;
+h=imagesc(Ycount,'Xdata',tgrid,'Ydata',ygrid);
+axis xy
+
+hold on
+
+h=plot(t,Y,'ks','markersize',5);
+
+[~, mm]=kmeans(m, 2); %use Kmeans to characterize two solutions
+
+h(2)=plot(tgrid,forwardmodel(tgrid,mm(1,:)),'color',[.6 .45 .3],'linewidth',2);
+h(3)=plot(tgrid,forwardmodel(tgrid,mm(2,:)),'color',[.6 .3 .45],'linewidth',2);
+
+axis tight
+legend(h,'Data','Model A','Model B','location','best')
+
\ No newline at end of file diff --git a/mcmc_simbio/src/gwmcmc/html/ex_breakfit.md b/mcmc_simbio/src/gwmcmc/html/ex_breakfit.md new file mode 100755 index 0000000..acda5e5 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/html/ex_breakfit.md @@ -0,0 +1,180 @@ +Fitting a trend-change model to a time series +======================================= + +This code fits a trend-change model to a historical time series of sea level in Amsterdam with gaps. + + + +Input data +---------------------------------------------------------- + +Amsterdam sea level from this source: http://www.psmsl.org/data/longrecords/ + +```matlab +Y=[1700 -152; 1701 -158; 1702 -132; 1703 -172; 1704 -135; 1705 -167; 1706 -192; 1707 -153; 1708 -149; 1709 -187; 1710 -168; 1711 -140; 1712 -129; 1713 -151; + 1714 -106; 1715 -172; 1716 -168; 1717 -164; 1718 -185; 1719 -182; 1720 -109; 1721 -146; 1722 -141; 1723 -99; 1724 -145; 1725 -166; 1726 -108; 1727 -136; + 1728 -195; 1729 -176; 1730 -148; 1731 -108; 1732 -134; 1733 -160; 1734 -165; 1735 -181; 1736 -109; 1737 -92; 1738 -152; 1739 -123; 1740 -124; 1741 -122; + 1742 -154; 1743 -144; 1744 -148; 1745 -178; 1746 -178; 1747 -142; 1748 -147; 1749 -167; 1766 -175; 1767 -111; 1768 -160; 1769 -86; 1770 -94; 1771 -87; + 1772 -142; 1773 -143; 1774 -135; 1775 -127; 1776 -150; 1777 -131; 1778 -155; 1779 -131; 1780 -130; 1781 -134; 1782 -160; 1783 -157; 1784 -173; 1785 -178; + 1786 -178; 1787 -125; 1788 -204; 1789 -161; 1790 -109; 1791 -92; 1792 -150; 1793 -154; 1794 -118; 1795 -121; 1796 -157; 1797 -134; 1798 -135; 1799 -177; + 1800 -175; 1801 -90; 1802 -159; 1803 -172; 1804 -130; 1805 -142; 1806 -106; 1807 -105; 1808 -183; 1809 -151; 1810 -128; 1811 -137; 1812 -141; 1813 -150; + 1814 -185; 1815 -144; 1816 -113; 1817 -102; 1818 -160; 1819 -158; 1820 -194; 1821 -123; 1822 -125; 1823 -198; 1824 -97; 1825 -87; 1826 -126; 1827 -97; + 1828 -124; 1829 -119; 1830 -141; 1831 -94; 1832 -141; 1833 -106; 1834 -77; 1835 -105; 1836 -96; 1837 -88; 1838 -117; 1839 -114; 1840 -111; 
1841 -85; + 1842 -132; 1843 -57; 1844 -53; 1845 -90; 1846 -80; 1847 -118; 1848 -141; 1849 -101; 1850 -91; 1851 -102; 1852 -97; 1853 -113; 1854 -49; 1855 -111; + 1856 -85; 1857 -145; 1858 -137; 1859 -102; 1860 -113; 1861 -94; 1862 -125; 1863 -121; 1864 -161; 1865 -157; 1866 -93; 1867 -58; 1868 -91; 1869 -75; 1870 -129; + 1871 -141; 1874 -110; 1875 -125; 1876 -80; 1877 -43; 1878 -60; 1879 -79; 1880 -31; 1881 -64; 1882 -74; 1883 -58; 1884 -54; 1885 -75; 1886 -88; 1887 -64; 1888 -86; + 1889 -53; 1890 -84; 1891 -94; 1892 -78; 1893 -67; 1894 -92; 1895 -74; 1896 -81; 1897 -82; 1898 -32; 1899 -36; 1900 -67; 1901 -45; 1902 -62; 1903 -25; 1904 -58; 1905 -32; + 1906 -34; 1907 -75; 1908 -66; 1909 -36; 1910 -12; 1911 -24; 1912 -7; 1913 -22; 1914 0; 1915 7; 1916 -5; 1917 -37; 1918 -44; 1919 -38; 1920 14; 1921 -10; + 1922 -16; 1923 -38;1925 29]; +t=Y(:,1); +Y=Y(:,2); +``` + + +Define trend change forward model: +---------------------------------------------------------- + +```matlab +forwardmodel=@(t,m)(m(1)*(tm(3))).*(t-m(3))+m(4); +``` + + +Make an initial guess for the model parameters. +---------------------------------------------------------- + +```matlab +p=polyfit(t-mean(t),Y,1); +m0=[p(1) p(1) mean(t) p(2)]'; +sigma=std(Y-forwardmodel(t,m0)); +m0=[m0 ; log(sigma)]; +``` + + +Likelihood +---------------------------------------------------------- + +We assume the data are normally distributed around the forward model. + +```matlab +% First we define a helper function equivalent to calling log(normpdf(x,mu,sigma)) +% but has higher precision because it avoids truncation errors associated with calling +% log(exp(xxx)). +lognormpdf=@(x,mu,sigma)-0.5*((x-mu)./sigma).^2 -log(sqrt(2*pi).*sigma); + + +logLike=@(m)sum(lognormpdf(y,forwardmodel(t,m),m(5))); +``` + + +Prior information +---------------------------------------------------------- + +We want to restrict the model to place the kink-point within the observed time interval. All other parameters have a uniform prior. 
+ +```matlab +logprior = @(m)(m(3)>min(t))&(m(3) + + + Fitting a line

Fitting a line

This demo follows the line-fitting example of emcee for Python. See the full description here: http://dan.iel.fm/emcee/current/user/line/

Generate synthetic data

First we generate some noisy data that fall on a line. We know the true parameters of the line and the parameters of the noise added to the observations.

This surrogate data has two sources of uncertainty: one with known variance (yerr), and a multiplicative one with unknown variance.

% These are the true model parameters used to generate the data
+m_true = [-0.9594;4.294;log(0.534)]
+
+N = 50;
+x = sort(10*rand(1,N));
+yerr = 0.1+0.5*rand(1,N);
+y = m_true(1)*x+m_true(2);
+y = y + abs(exp(m_true(3))*y) .* randn(1,N);
+y = y + yerr .* randn(1,N);
+
+
+close all %close all figures
+errorbar(x,y,yerr,'ks','markerfacecolor',[1 1 1]*.4,'markersize',4);
+axis tight
+
m_true =
+      -0.9594
+        4.294
+     -0.62736
+

Least squares fit

lscov can be used to fit a straight line to the data assuming that the errors in yerr are correct. Notice how this results in very optimistic uncertainties on the slope and intercept. This is because this method only accounts for the known source of error.

[m_lsq,sigma_mlsq,MSE]=lscov([x;ones(size(x))]',y',diag(yerr.^2));
+sigma_m_lsq=sigma_mlsq/sqrt(MSE); %see help on lscov
+m_lsq
+sigma_m_lsq
+
+hold on
+plot(x,polyval(m_lsq,x),'b--','linewidth',2)
+legend('Data','LSQ fit')
+
m_lsq =
+      -1.0692
+       4.4279
+sigma_m_lsq =
+     0.011028
+     0.076933
+
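
The rescaling by sqrt(MSE) can be checked against the weighted normal equations. The following is a minimal sketch on hypothetical synthetic data (the slope, intercept, and seed are assumptions chosen for illustration, not the values from this example). It assumes the errors in yerr are exact, which is the case where the lscov standard errors are divided by sqrt(MSE).

```matlab
% Sketch: weighted least squares via the normal equations, compared with lscov.
rng(1);                                   % reproducible synthetic data (hypothetical values)
N = 50;
x = sort(10*rand(1,N));
yerr = 0.1 + 0.5*rand(1,N);               % known per-point standard deviations
y = -1*x + 4 + yerr.*randn(1,N);

A = [x; ones(size(x))]';                  % design matrix: slope and intercept columns
W = diag(1./yerr.^2);                     % weights are inverse variances
m_explicit = (A'*W*A) \ (A'*W*y');        % weighted least-squares estimate
sigma_explicit = sqrt(diag(inv(A'*W*A))); % standard errors when yerr is exact

[m_lsq, stdx, mse] = lscov(A, y', diag(yerr.^2));
disp([m_explicit m_lsq])                  % the two estimates should agree
disp([sigma_explicit stdx/sqrt(mse)])     % so should the rescaled standard errors
```

Because only yerr enters the weights, these standard errors ignore the multiplicative noise term, which is consistent with the optimistic uncertainties noted above.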

Likelihood

We define a likelihood function consistent with how the data was generated, and then we use fminsearch to find the max-likelihood fit of the model to the data.

% First we define a helper function equivalent to calling log(normpdf(x,mu,sigma))
+% but has higher precision because it avoids truncation errors associated with calling
+% log(exp(xxx)).
+lognormpdf=@(x,mu,sigma)-0.5*((x-mu)./sigma).^2  -log(sqrt(2*pi).*sigma);
+
+forwardmodel=@(m)m(1)*x + m(2);
+variancemodel=@(m) yerr.^2 + (forwardmodel(m)*exp(m(3))).^2;
+
+logLike=@(m)sum(lognormpdf(y,forwardmodel(m),sqrt(variancemodel(m))));
+
+m_maxlike=fminsearch(@(m)-logLike(m),[polyfit(x,y,1) 0]');
+
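
The precision remark in the code comment can be illustrated with a small sketch. The `naive` handle below is a hypothetical name introduced here; it evaluates the normal density explicitly and then takes the log, which is assumed to mirror what log(normpdf(...)) would do without requiring the Statistics Toolbox.

```matlab
lognormpdf = @(x,mu,sigma) -0.5*((x-mu)./sigma).^2 - log(sqrt(2*pi).*sigma);
% Naive version: evaluate the normal density first, then take the log.
naive = @(x,mu,sigma) log(exp(-0.5*((x-mu)./sigma).^2) ./ (sqrt(2*pi).*sigma));

disp([lognormpdf(1,0,1)  naive(1,0,1)])    % both about -1.4189
disp([lognormpdf(50,0,1) naive(50,0,1)])   % about -1250.9 versus -Inf
```

Far from the mean the density underflows to zero, so the naive form returns -Inf while the direct form stays finite. A sampler that starts in a poor region benefits from the finite value.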

Prior information

Here we formulate our prior knowledge about the model parameters. We use flat priors within hard limits for each of the 3 model parameters. GWMCMC allows you to specify these kinds of priors as logical expressions.

logprior =@(m) (m(1)>-5)&&(m(1)<0.5) && (m(2)>0)&&(m(2)<10) && (m(3)>-10)&&(m(3)<1) ;
+
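
One way to read a logical prior is as a flat log-prior: 0 inside the box and -Inf outside. The sketch below spells that reading out. The names `inbounds` and `logprior_explicit` are hypothetical, and the mapping is an assumption about the intended meaning, not a description of how gwmcmc treats logical values internally.

```matlab
% Hypothetical helper names; the bounds are the ones used in logprior above.
inbounds = @(m) (m(1)>-5)&&(m(1)<0.5) && (m(2)>0)&&(m(2)<10) && (m(3)>-10)&&(m(3)<1);
logprior_explicit = @(m) log(double(inbounds(m)));  % log(1)=0 inside, log(0)=-Inf outside

disp(logprior_explicit([-1; 4; -0.6]))   % inside the box: 0
disp(logprior_explicit([ 1; 4; -0.6]))   % slope outside its range: -Inf
```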

Find the posterior distribution using GWMCMC

Now we apply the MCMC hammer to draw samples from the posterior.

% first we initialize the ensemble of walkers in a small gaussian ball
+% around the max-likelihood estimate.
+minit=bsxfun(@plus,m_maxlike,randn(3,100)*0.01);
+

Apply the hammer:

Draw samples from the posterior.

tic
+m=gwmcmc(minit,{logprior logLike},100000,'ThinChain',5,'burnin',.2);
+toc
+
Elapsed time is 6.605606 seconds.
+

Auto-correlation function

figure
+[C,lags,ESS]=eacorr(m);
+plot(lags,C,'.-',lags([1 end]),[0 0],'k');
+grid on
+xlabel('lags')
+ylabel('autocorrelation');
+text(lags(end),0,sprintf('Effective Sample Size (ESS): %.0f_ ',ceil(mean(ESS))),'verticalalignment','bottom','horizontalalignment','right')
+title('Markov Chain Auto Correlation')
+

Corner plot of parameters

figure
+ecornerplot(m,'ks',true,'color',[.6 .35 .3])
+

Plot of posterior fit

figure
+m=m(:,:)'; %flatten the chain
+
+%plot 100 samples...
+for kk=1:100
+    r=ceil(rand*size(m,1));
+    h=plot(x,forwardmodel(m(r,:)),'color',[.6 .35 .3].^.3);
+    hold on
+end
+h(2)=errorbar(x,y,yerr,'ks','markerfacecolor',[1 1 1]*.4,'markersize',4);
+
+h(4)=plot(x,forwardmodel(m_lsq),'b--','linewidth',2);
+h(3)=plot(x,forwardmodel(median(m)),'color',[.6 .35 .3],'linewidth',3);
+h(5)=plot(x,forwardmodel(m_true),'r','linewidth',2);
+
+axis tight
+legend(h,'Samples from posterior','Data','GWMCMC median','LSQ fit','Truth')
+
\ No newline at end of file diff --git a/mcmc_simbio/src/gwmcmc/html/ex_linefit.md b/mcmc_simbio/src/gwmcmc/html/ex_linefit.md new file mode 100755 index 0000000..07474d3 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/html/ex_linefit.md @@ -0,0 +1,182 @@ +Fitting a line +======================================= + +This demo follows the linefit example of EMCEE for python. See full description here: http://dan.iel.fm/emcee/current/user/line/ + + + +Generate synthetic data +---------------------------------------------------------- + +First we generate some noisy data which falls on a line. We know the true parameters of the line and the parameters of the noise added to the observations. + +In this surrogate data there are two sources of uncertainty. One source with known variance (yerr), and another multiplicative uncertainty with unknown variance. + +```matlab +% This is the true model parameters used to generate the noise +m_true = [-0.9594;4.294;log(0.534)] + +N = 50; +x = sort(10*rand(1,N)); +yerr = 0.1+0.5*rand(1,N); +y = m_true(1)*x+m_true(2); +y = y + abs(exp(m_true(3))*y) .* randn(1,N); +y = y + yerr .* randn(1,N); + + +close all %close all figures +errorbar(x,y,yerr,'ks','markerfacecolor',[1 1 1]*.4,'markersize',4); +axis tight +``` + +``` +m_true = + -0.9594 + 4.294 + -0.62736 + +``` + +![IMAGE](ex_linefit_01.png) + + +Least squares fit +---------------------------------------------------------- + +lscov can be used to fit a straight line to the data assuming that the errors in yerr are correct. Notice how this results in very optimistic uncertainties on the slope and intercept. This is because this method only accounts for the known source of error. 
+ +```matlab +[m_lsq,sigma_mlsq,MSE]=lscov([x;ones(size(x))]',y',diag(yerr.^2)); +sigma_m_lsq=sigma_mlsq/sqrt(MSE); %see help on lscov +m_lsq +sigma_m_lsq + +hold on +plot(x,polyval(m_lsq,x),'b--','linewidth',2) +legend('Data','LSQ fit') +``` + +``` +m_lsq = + -1.0376 + 4.0345 +sigma_m_lsq = + 0.0095665 + 0.058728 + +``` + +![IMAGE](ex_linefit_02.png) + + +Likelihood +---------------------------------------------------------- + +We define a likelihood function consistent with how the data was generated, and then we use fminsearch to find the max-likelihood fit of the model to the data. + +```matlab +% First we define a helper function equivalent to calling log(normpdf(x,mu,sigma)) +% but has higher precision because it avoids truncation errors associated with calling +% log(exp(xxx)). +lognormpdf=@(x,mu,sigma)-0.5*((x-mu)./sigma).^2 -log(sqrt(2*pi).*sigma); + +forwardmodel=@(m)m(1)*x + m(2); +variancemodel=@(m) yerr.^2 + (forwardmodel(m)*exp(m(3))).^2; + +logLike=@(m)sum(lognormpdf(y,forwardmodel(m),sqrt(variancemodel(m)))); + +m_maxlike=fminsearch(@(m)-logLike(m),[polyfit(x,y,1) 0]'); +``` + + +Prior information +---------------------------------------------------------- + +Here we formulate our prior knowledge about the model parameters. Here we use flat priors within a hard limits for each of the 3 model parameters. GWMCMC allows you to specify these kinds of priors as logical expressions. + +```matlab +logprior =@(m) (m(1)>-5)&&(m(1)<0.5) && (m(2)>0)&&(m(2)<10) && (m(3)>-10)&&(m(3)<1) ; +``` + + +Find the posterior distribution using GWMCMC +---------------------------------------------------------- + +Now we apply the MCMC hammer to draw samples from the posterior. + +```matlab +% first we initialize the ensemble of walkers in a small gaussian ball +% around the max-likelihood estimate. 
+minit=bsxfun(@plus,m_maxlike,randn(3,100)*0.01); +``` + + +Apply the hammer: +---------------------------------------------------------- + +Draw samples from the posterior + +```matlab +tic +m=gwmcmc(minit,{logprior logLike},100000,'ThinChain',5,'burnin',.2); +toc +``` + +``` +Elapsed time is 6.766257 seconds. + +``` + + +Auto-correlation function +---------------------------------------------------------- + +```matlab +figure +[C,lags,ESS]=eacorr(m); +plot(lags,C,'.-',lags([1 end]),[0 0],'k'); +grid on +xlabel('lags') +ylabel('autocorrelation'); +text(lags(end),0,sprintf('Effective Sample Size (ESS): %.0f_ ',ceil(mean(ESS))),'verticalalignment','bottom','horizontalalignment','right') +title('Markov Chain Auto Correlation') +``` + +![IMAGE](ex_linefit_03.png) + + +Corner plot of parameters +---------------------------------------------------------- + +```matlab +figure +ecornerplot(m,'ks',true,'color',[.6 .35 .3]) +``` + +![IMAGE](ex_linefit_04.png) + + +Plot of posterior fit +---------------------------------------------------------- + +```matlab +figure +m=m(:,:)'; %flatten the chain + +%plot 100 samples... 
+for kk=1:100 + r=ceil(rand*size(m,1)); + h=plot(x,forwardmodel(m(r,:)),'color',[.6 .35 .3].^.3); + hold on +end +h(2)=errorbar(x,y,yerr,'ks','markerfacecolor',[1 1 1]*.4,'markersize',4); + +h(4)=plot(x,forwardmodel(m_lsq),'b--','linewidth',2); +h(3)=plot(x,forwardmodel(median(m)),'color',[.6 .35 .3],'linewidth',3); +h(5)=plot(x,forwardmodel(m_true),'r','linewidth',2); + +axis tight +legend(h,'Samples from posterior','Data','GWMCMC median','LSQ fit','Truth') +``` + +![IMAGE](ex_linefit_05.png) diff --git a/mcmc_simbio/src/gwmcmc/html/ex_linefit_01.png b/mcmc_simbio/src/gwmcmc/html/ex_linefit_01.png new file mode 100755 index 0000000..99d70a5 Binary files /dev/null and b/mcmc_simbio/src/gwmcmc/html/ex_linefit_01.png differ diff --git a/mcmc_simbio/src/gwmcmc/html/ex_linefit_02.png b/mcmc_simbio/src/gwmcmc/html/ex_linefit_02.png new file mode 100755 index 0000000..e988fb1 Binary files /dev/null and b/mcmc_simbio/src/gwmcmc/html/ex_linefit_02.png differ diff --git a/mcmc_simbio/src/gwmcmc/html/ex_linefit_03.png b/mcmc_simbio/src/gwmcmc/html/ex_linefit_03.png new file mode 100755 index 0000000..6c7d6e5 Binary files /dev/null and b/mcmc_simbio/src/gwmcmc/html/ex_linefit_03.png differ diff --git a/mcmc_simbio/src/gwmcmc/html/ex_linefit_04.png b/mcmc_simbio/src/gwmcmc/html/ex_linefit_04.png new file mode 100755 index 0000000..42a0628 Binary files /dev/null and b/mcmc_simbio/src/gwmcmc/html/ex_linefit_04.png differ diff --git a/mcmc_simbio/src/gwmcmc/html/ex_linefit_05.png b/mcmc_simbio/src/gwmcmc/html/ex_linefit_05.png new file mode 100755 index 0000000..97f6d91 Binary files /dev/null and b/mcmc_simbio/src/gwmcmc/html/ex_linefit_05.png differ diff --git a/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana.html b/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana.html new file mode 100755 index 0000000..93a7aab --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana.html @@ -0,0 +1,168 @@ + + + + + The MCMC hammer

The MCMC hammer

GWMCMC is an implementation of the Goodman and Weare (2010) affine-invariant ensemble Markov chain Monte Carlo (MCMC) sampler. MCMC sampling enables Bayesian inference. Many traditional MCMC samplers converge slowly on badly scaled problems, and their random walk is difficult to tune for high-dimensional problems. This is where the GW algorithm excels: because it is affine invariant, it can achieve much better convergence on badly scaled problems. It is also much simpler to get working straight out of the box, and for that reason it truly deserves to be called the MCMC hammer.

See also: http://astrobites.org/2012/02/20/code-you-can-use-the-mcmc-hammer/
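
The core update of the Goodman and Weare sampler is the stretch move. Below is a minimal serial sketch of one sweep over the ensemble. It is not taken from the gwmcmc source; the target `logP`, the ensemble size, and the step parameter `a = 2` are assumptions chosen for illustration.

```matlab
% Illustrative stretch move (Goodman & Weare 2010); not the gwmcmc source.
rng(0);                                  % reproducible random numbers
logP = @(m) -0.5*sum(m.^2,1);            % hypothetical target: standard normal
M = 2; Nwalkers = 10; a = 2;             % a is the stretch step-size parameter
X = randn(M,Nwalkers);                   % current ensemble, one walker per column
for k = 1:Nwalkers
    j = randi(Nwalkers-1);               % pick a partner walker ...
    if j >= k, j = j+1; end              % ... different from walker k
    z = ((a-1)*rand+1)^2/a;              % draw z in [1/a, a] with density ~ 1/sqrt(z)
    proposal = X(:,j) + z*(X(:,k)-X(:,j));
    logAccept = (M-1)*log(z) + logP(proposal) - logP(X(:,k));
    if log(rand) < logAccept
        X(:,k) = proposal;               % accept; otherwise keep the old position
    end
end
```

Each proposal lies on the line through two walkers, so an affine rescaling of the parameters rescales the proposals in the same way. That is why no per-parameter step tuning is needed.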

Rosenbrock: A badly scaled example

A classic difficult low-dimensional problem is the Rosenbrock density. It is defined by the following log-probability function:

logPfun=@(m) -(100*(m(2,:)-m(1,:).^2).^2 +(1-m(1,:)).^2)/20;
+
+%let's visualize it:
+close all
+[X,Y]=meshgrid(-4:.01:6,-1:.02:34);
+Z=logPfun([X(:) Y(:)]'); Z=reshape(Z,size(X));
+contour(X,Y,exp(Z))
+colormap(parula)
+title('The Rosenbrock banana')
+xlim([-4 6])
+ylim([-1 34])
+

Apply the MCMC hammer:

Now we apply the Goodman & Weare MCMC sampler and plot the results on top of the contour plot.

M=2; %number of model parameters
+Nwalkers=40; %number of walkers/chains.
+minit=randn(M,Nwalkers);
+tic
+models=gwmcmc(minit, logPfun,100000,'StepSize',30,'burnin',.2);
+toc
+
+
+%flatten the chain: analyze all the chains as one
+
+models=models(:,:);
+
+%plot the results
+
+hold on
+plot(models(1,:),models(2,:),'k.')
+
+legend('Rosenbrock','GWMCMC samples','location','northwest')
+
Elapsed time is 2.228519 seconds.
+

References:

  • Goodman & Weare (2010), Ensemble Samplers With Affine Invariance, Comm. App. Math. Comp. Sci., Vol. 5, No. 1, 65–80
  • Foreman-Mackey, Hogg, Lang, Goodman (2013), emcee: The MCMC Hammer, arXiv:1202.3665

-Aslak Grinsted 2015

\ No newline at end of file diff --git a/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana.md b/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana.md new file mode 100755 index 0000000..8985534 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana.md @@ -0,0 +1,72 @@ +The MCMC hammer +======================================= + +GWMCMC is an implementation of the Goodman and Weare 2010 Affine invariant ensemble Markov Chain Monte Carlo (MCMC) sampler. MCMC sampling enables bayesian inference. The problem with many traditional MCMC samplers is that they can have slow convergence for badly scaled problems, and that it is difficult to optimize the random walk for high-dimensional problems. This is where the GW-algorithm really excels as it is affine invariant. It can achieve much better convergence on badly scaled problems. It is much simpler to get to work straight out of the box, and for that reason it truly deserves to be called the MCMC hammer. + +See also: http://astrobites.org/2012/02/20/code-you-can-use-the-mcmc-hammer/ + + + +Rosenbrock: A badly scaled example +---------------------------------------------------------- + +A classical difficult low dimensional problem is the rosenbrock density. It is defined by the following log-probability function: + +```matlab +logPfun=@(m) -(100*(m(2,:)-m(1,:).^2).^2 +(1-m(1,:)).^2)/20; + +%lets visualize it: +close all +[X,Y]=meshgrid(-4:.01:6,-1:.02:34); +Z=logPfun([X(:) Y(:)]'); Z=reshape(Z,size(X)); +contour(X,Y,exp(Z)) +colormap(parula) +title('The Rosenbrock banana') +xlim([-4 6]) +ylim([-1 34]) +``` + +![IMAGE](ex_rosenbrockbanana_01.png) + + +Apply the MCMC hammer: +---------------------------------------------------------- + +Now we apply the Goodman & Weare MCMC sampler and plot the results on top + +```matlab +M=2; %number of model parameters +Nwalkers=40; %number of walkers/chains. 
+minit=randn(M,Nwalkers); +tic +models=gwmcmc(minit, logPfun,100000,'StepSize',30,'burnin',.2); +toc + + +%flatten the chain: analyze all the chains as one + +models=models(:,:); + +%plot the results + +hold on +plot(models(1,:),models(2,:),'k.') + +legend('Rosenbrock','GWMCMC samples','location','northwest') +``` + +``` +Elapsed time is 2.202490 seconds. + +``` + +![IMAGE](ex_rosenbrockbanana_02.png) + + +References: +---------------------------------------------------------- + + + Goodman & Weare (2010), Ensemble Samplers With Affine Invariance, Comm. App. Math. Comp. Sci., Vol. 5, No. 1, 65�80 + + Foreman-Mackey, Hogg, Lang, Goodman (2013), emcee: The MCMC Hammer, arXiv:1202.3665 +-Aslak Grinsted 2015 + diff --git a/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana_01.png b/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana_01.png new file mode 100755 index 0000000..cb95885 Binary files /dev/null and b/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana_01.png differ diff --git a/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana_02.png b/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana_02.png new file mode 100755 index 0000000..fdd029e Binary files /dev/null and b/mcmc_simbio/src/gwmcmc/html/ex_rosenbrockbanana_02.png differ diff --git a/mcmc_simbio/src/gwmcmc/private/kde2d.m b/mcmc_simbio/src/gwmcmc/private/kde2d.m new file mode 100755 index 0000000..d78a9e0 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/private/kde2d.m @@ -0,0 +1,244 @@ +function [bandwidth,density,X,Y]=kde2d(data,n,MIN_XY,MAX_XY,EffectiveSampleSize) +%% fast and accurate state-of-the-art bivariate kernel density estimator with diagonal bandwidth matrix. +% +% The kernel is assumed to be Gaussian. The two bandwidth parameters are +% chosen optimally without ever using/assuming a parametric model for the data or any "rules of thumb". +% Unlike many other procedures, this one is immune to accuracy failures in the estimation of +% multimodal densities with widely separated modes (see examples). 
+% +% Usage: [bandwidth,density,X,Y]=kde2d(data,n[,MIN_XY,MAX_XY,EffectiveSampleSize]) +% +% INPUTS: data - an N by 2 array with continuous data +% n - size of the n by n grid over which the density is computed +% n has to be a power of 2, otherwise n=2^ceil(log2(n)); +% the default value is 2^8; +% MIN_XY,MAX_XY- limits of the bounding box over which the density is computed; +% the format is: +% MIN_XY=[lower_Xlim,lower_Ylim] +% MAX_XY=[upper_Xlim,upper_Ylim]. +% The dafault limits are computed as: +% MAX=max(data,[],1); MIN=min(data,[],1); Range=MAX-MIN; +% MAX_XY=MAX+Range/4; MIN_XY=MIN-Range/4; +% [EffectiveSampleSize]- allows you to choose a smaller effective sample +% size than the number of points in data. +% OUTPUT: bandwidth - a row vector with the two optimal +% bandwidths for a bivaroate Gaussian kernel; +% the format is: +% bandwidth=[bandwidth_X, bandwidth_Y]; +% density - an n by n matrix containing the density values over the n by n grid; +% density is not computed unless the function is asked for such an output; +% X,Y - the meshgrid over which the variable "density" has been computed; +% the intended usage is as follows: +% surf(X,Y,density) +% Example (simple Gaussian mixture) +% clear all +% % generate a Gaussian mixture with distant modes +% data=[randn(500,2); +% randn(500,1)+3.5, randn(500,1);]; +% % call the routine +% [bandwidth,density,X,Y]=kde2d(data); +% % plot the data and the density estimate +% contour3(X,Y,density,50), hold on +% plot(data(:,1),data(:,2),'r.','MarkerSize',5) +% +% Example (Gaussian mixture with distant modes): +% +% clear all +% % generate a Gaussian mixture with distant modes +% data=[randn(100,1), randn(100,1)/4; +% randn(100,1)+18, randn(100,1); +% randn(100,1)+15, randn(100,1)/2-18;]; +% % call the routine +% [bandwidth,density,X,Y]=kde2d(data); +% % plot the data and the density estimate +% surf(X,Y,density,'LineStyle','none'), view([0,60]) +% colormap hot, hold on, alpha(.8) +% set(gca, 'color', 'blue'); +% 
plot(data(:,1),data(:,2),'w.','MarkerSize',5) +% +% Example (Sinusoidal density): +% +% clear all +% X=rand(1000,1); Y=sin(X*10*pi)+randn(size(X))/3; data=[X,Y]; +% % apply routine +% [bandwidth,density,X,Y]=kde2d(data); +% % plot the data and the density estimate +% surf(X,Y,density,'LineStyle','none'), view([0,70]) +% colormap hot, hold on, alpha(.8) +% set(gca, 'color', 'blue'); +% plot(data(:,1),data(:,2),'w.','MarkerSize',5) +% +% Notes: If you have a more accurate density estimator +% (as measured by which routine attains the smallest +% L_2 distance between the estimate and the true density) or you have +% problems running this code, please email me at botev@maths.uq.edu.au +% + +% +% This version has been modified by Aslak Grinsted to allow effective +% sample size adjustment to the bandwidth calculation. +% + +%LICENSE: +% Copyright (c) 2015, Dr. Zdravko Botev +% All rights reserved. +% +% Redistribution and use in source and binary forms, with or without +% modification, are permitted provided that the following conditions are +% met: +% +% * Redistributions of source code must retain the above copyright +% notice, this list of conditions and the following disclaimer. +% * Redistributions in binary form must reproduce the above copyright +% notice, this list of conditions and the following disclaimer in +% the documentation and/or other materials provided with the distribution +% * Neither the name of the The University of New South Wales nor the names +% of its contributors may be used to endorse or promote products derived +% from this software without specific prior written permission. +% +% THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +% AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +% IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +% ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +% LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +% CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +% SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +% INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +% CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +% ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +% POSSIBILITY OF SUCH DAMAGE. + +% Reference: Z. I. Botev, J. F. Grotowski and D. P. Kroese +% "KERNEL DENSITY ESTIMATION VIA DIFFUSION" ,Submitted to the +% Annals of Statistics, 2009 +if nargin<2 + n=2^8; +end +n=2^ceil(log2(n)); % round up n to the next power of 2; +N=size(data,1); +if (nargin<3), MIN_XY=[], end +if (nargin<4), MAX_XY=[], end +if isempty(MIN_XY)||isempty(MAX_XY) + MAX=max(data,[],1); MIN=min(data,[],1); Range=MAX-MIN; + if isempty(MAX_XY),MAX_XY=MAX+Range/4; end + if isempty(MIN_XY),MIN_XY=MIN-Range/4; end +end +scaling=MAX_XY-MIN_XY; +if N<=size(data,2) + error('data has to be an N by 2 array where each row represents a two dimensional observation') +end +transformed_data=(data-repmat(MIN_XY,N,1))./repmat(scaling,N,1); +if nargin>=5 + if EffectiveSampleSize>N + error('EffectiveSampleSize>size(Data,1)') + end + N=EffectiveSampleSize; +end +%bin the data uniformly using regular grid; +initial_data=ndhist(transformed_data,n); +% discrete cosine transform of initial data +a= dct2d(initial_data); +% now compute the optimal bandwidth^2 +I=(0:n-1).^2; A2=a.^2; + +t_star=fzero( @(t)(t-evolve(t)),[0,0.1]); + +p_02=func([0,2],t_star);p_20=func([2,0],t_star); p_11=func([1,1],t_star); +t_y=(p_02^(3/4)/(4*pi*N*p_20^(3/4)*(p_11+sqrt(p_20*p_02))))^(1/3); +t_x=(p_20^(3/4)/(4*pi*N*p_02^(3/4)*(p_11+sqrt(p_20*p_02))))^(1/3); +% smooth the discrete cosine transform of initial data using t_star +a_t=exp(-(0:n-1)'.^2*pi^2*t_x/2)*exp(-(0:n-1).^2*pi^2*t_y/2).*a; +% now 
apply the inverse discrete cosine transform +if nargout>1 + density=idct2d(a_t)*(numel(a_t)/prod(scaling)); + [X,Y]=meshgrid(MIN_XY(1):scaling(1)/(n-1):MAX_XY(1),MIN_XY(2):scaling(2)/(n-1):MAX_XY(2)); +end +bandwidth=sqrt([t_x,t_y]).*scaling; + +%####################################### + function [out,time]=evolve(t) + Sum_func = func([0,2],t) + func([2,0],t) + 2*func([1,1],t); + time=(2*pi*N*Sum_func)^(-1/3); + out=(t-time)/time; + end +%####################################### + function out=func(s,t) + if sum(s)<=4 + Sum_func=func([s(1)+1,s(2)],t)+func([s(1),s(2)+1],t); const=(1+1/2^(sum(s)+1))/3; + time=(-2*const*K(s(1))*K(s(2))/N/Sum_func)^(1/(2+sum(s))); + out=psi(s,time); + else + out=psi(s,t); + end + + end +%####################################### + function out=psi(s,Time) + % s is a vector + w=exp(-I*pi^2*Time).*[1,.5*ones(1,length(I)-1)]; + wx=w.*(I.^s(1)); + wy=w.*(I.^s(2)); + out=(-1)^sum(s)*(wy*A2*wx')*pi^(2*sum(s)); + end +%####################################### + function out=K(s) + out=(-1)^s*prod((1:2:2*s-1))/sqrt(2*pi); + end +%####################################### + function data=dct2d(data) + % computes the 2 dimensional discrete cosine transform of data + % data is an nd cube + [nrows,ncols]= size(data); + if nrows~=ncols + error('data is not a square array!') + end + % Compute weights to multiply DFT coefficients + w = [1;2*(exp(-i*(1:nrows-1)*pi/(2*nrows))).']; + weight=w(:,ones(1,ncols)); + data=dct1d(dct1d(data)')'; + function transform1d=dct1d(x) + + % Re-order the elements of the columns of x + x = [ x(1:2:end,:); x(end:-2:2,:) ]; + + % Multiply FFT by weights: + transform1d = real(weight.* fft(x)); + end + end +%####################################### + function data = idct2d(data) + % computes the 2 dimensional inverse discrete cosine transform + [nrows,ncols]=size(data); + % Compute wieghts + w = exp(i*(0:nrows-1)*pi/(2*nrows)).'; + weights=w(:,ones(1,ncols)); + data=idct1d(idct1d(data)'); + function out=idct1d(x) + y = 
real(ifft(weights.*x)); + out = zeros(nrows,ncols); + out(1:2:nrows,:) = y(1:nrows/2,:); + out(2:2:nrows,:) = y(nrows:-1:nrows/2+1,:); + end + end +%####################################### + function binned_data=ndhist(data,M) + % this function computes the histogram + % of an n-dimensional data set; + % 'data' is nrows by n columns + % M is the number of bins used in each dimension + % so that 'binned_data' is a hypercube with + % size length equal to M; + [nrows,ncols]=size(data); + bins=zeros(nrows,ncols); + for i=1:ncols + [dum,bins(:,i)] = histc(data(:,i),[0:1/M:1],1); + bins(:,i) = min(bins(:,i),M); + end + % Combine the vectors of 1D bin counts into a grid of nD bin + % counts. + binned_data = accumarray(bins(all(bins>0,2),:),1/nrows,M(ones(1,ncols))); + end + + + +end diff --git a/mcmc_simbio/src/gwmcmc/private/mxdom2md.xsl b/mcmc_simbio/src/gwmcmc/private/mxdom2md.xsl new file mode 100755 index 0000000..944ecfe --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/private/mxdom2md.xsl @@ -0,0 +1,252 @@ + + + + + ]> + + + + + + + + + +======================================= + + + + + + + + + + + + + + + +========================================================== + + + +---------------------------------------------------------- + + + + + + + + + + + + + + + + + + + + + + + + +** + + + + + + + + + + + +Contents +------------------------- + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +`````` + + +```matlab + +``` + + + +**** +`` +** + + + + + + + + + + + + + + + + + + + + + +```matlab + +``` + + + + + + + + + +``` + +``` + + + + + + + + +![IMAGE]() + + + + +```matlab``` + + + + + + + + + + + + + + + + + + + + + + + *\* + #\% + /\# + <\< + >\> + [\[ + ]\] + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/mcmc_simbio/src/gwmcmc/private/parseArgs.m b/mcmc_simbio/src/gwmcmc/private/parseArgs.m new file mode 100755 index 0000000..f43d22b --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/private/parseArgs.m @@ -0,0 +1,159 @@ 
+function ArgStruct=parseArgs(args,ArgStruct,varargin) +% Helper function for parsing varargin. +% +% +% ArgStruct=parseArgs(varargin,ArgStruct[,FlagtypeParams[,Aliases]]) +% +% * ArgStruct is the structure full of named arguments with default values. +% * Flagtype params is params that don't require a value. (the value will be set to 1 if it is present) +% * Aliases can be used to map one argument-name to several argstruct fields +% +% +% example usage: +% -------------- +% function parseargtest(varargin) +% +% %define the acceptable named arguments and assign default values +% Args=struct('Holdaxis',0, ... +% 'SpacingVertical',0.05,'SpacingHorizontal',0.05, ... +% 'PaddingLeft',0,'PaddingRight',0,'PaddingTop',0,'PaddingBottom',0, ... +% 'MarginLeft',.1,'MarginRight',.1,'MarginTop',.1,'MarginBottom',.1, ... +% 'rows',[],'cols',[]); +% +% %The capital letters define abrreviations. +% % Eg. parseargtest('spacingvertical',0) is equivalent to parseargtest('sv',0) +% +% Args=parseArgs(varargin,Args, ... % fill the arg-struct with values entered by the user +% {'Holdaxis'}, ... %this argument has no value (flag-type) +% {'Spacing' {'sh','sv'}; 'Padding' {'pl','pr','pt','pb'}; 'Margin' {'ml','mr','mt','mb'}}); +% +% disp(Args) +% +% +% +% +% Aslak Grinsted 2004 + +% ------------------------------------------------------------------------- +% Copyright (C) 2002-2004, Aslak Grinsted +% This software may be used, copied, or redistributed as long as it is not +% sold and this copyright notice is reproduced on each copy made. This +% routine is provided as is without any express or implied warranties +% whatsoever. 
+ +persistent matlabver + +if isempty(matlabver) + matlabver=ver('MATLAB'); + matlabver=str2double(matlabver.Version); +end + +Aliases={}; +FlagTypeParams=''; + +if (length(varargin)>0) + FlagTypeParams=lower(strvcat(varargin{1})); %#ok + if length(varargin)>1 + Aliases=varargin{2}; + end +end + + +%---------------Get "numeric" arguments +NumArgCount=1; +while (NumArgCount<=size(args,2))&&(~ischar(args{NumArgCount})) + NumArgCount=NumArgCount+1; +end +NumArgCount=NumArgCount-1; +if (NumArgCount>0) + ArgStruct.NumericArguments={args{1:NumArgCount}}; +else + ArgStruct.NumericArguments={}; +end + + +%--------------Make an accepted fieldname matrix (case insensitive) +Fnames=fieldnames(ArgStruct); +for i=1:length(Fnames) + name=lower(Fnames{i,1}); + Fnames{i,2}=name; %col2=lower + Fnames{i,3}=[name(Fnames{i,1}~=name) ' ']; %col3=abreviation letters (those that are uppercase in the ArgStruct) e.g. SpacingHoriz->sh + %the space prevents strvcat from removing empty lines + Fnames{i,4}=isempty(strmatch(Fnames{i,2},FlagTypeParams)); %Does this parameter have a value? +end +FnamesFull=strvcat(Fnames{:,2}); %#ok +FnamesAbbr=strvcat(Fnames{:,3}); %#ok + +if length(Aliases)>0 + for i=1:length(Aliases) + name=lower(Aliases{i,1}); + FieldIdx=strmatch(name,FnamesAbbr,'exact'); %try abbreviations (must be exact) + if isempty(FieldIdx) + FieldIdx=strmatch(name,FnamesFull); %&??????? exact or not? 
+ end + Aliases{i,2}=FieldIdx; + Aliases{i,3}=[name(Aliases{i,1}~=name) ' ']; %the space prevents strvcat from removing empty lines + Aliases{i,1}=name; %dont need the name in uppercase anymore for aliases + end + %Append aliases to the end of FnamesFull and FnamesAbbr + FnamesFull=strvcat(FnamesFull,strvcat(Aliases{:,1})); %#ok + FnamesAbbr=strvcat(FnamesAbbr,strvcat(Aliases{:,3})); %#ok +end + +%--------------get parameters-------------------- +l=NumArgCount+1; +while (l<=length(args)) + a=args{l}; + if ischar(a) + paramHasValue=1; % assume that the parameter has is of type 'param',value + a=lower(a); + FieldIdx=strmatch(a,FnamesAbbr,'exact'); %try abbreviations (must be exact) + if isempty(FieldIdx) + FieldIdx=strmatch(a,FnamesFull); + end + if (length(FieldIdx)>1) %shortest fieldname should win + [mx,mxi]=max(sum(FnamesFull(FieldIdx,:)==' ',2));%#ok + FieldIdx=FieldIdx(mxi); + end + if FieldIdx>length(Fnames) %then it's an alias type. + FieldIdx=Aliases{FieldIdx-length(Fnames),2}; + end + + if isempty(FieldIdx) + error(['Unknown named parameter: ' a]) + end + for curField=FieldIdx' %if it is an alias it could be more than one. + if (Fnames{curField,4}) + if (l+1>length(args)) + error(['Expected a value for parameter: ' Fnames{curField,1}]) + end + val=args{l+1}; + else %FLAG PARAMETER + if (l=6 + ArgStruct.(Fnames{curField,1})=val; %try the line below if you get an error here + else + ArgStruct=setfield(ArgStruct,Fnames{curField,1},val); %#ok <-works in old matlab versions + end + end + l=l+1+paramHasValue; %if a wildcard matches more than one + else + error(['Expected a named parameter: ' num2str(a)]) + end +end \ No newline at end of file diff --git a/mcmc_simbio/src/gwmcmc/private/publishexamples.m b/mcmc_simbio/src/gwmcmc/private/publishexamples.m new file mode 100755 index 0000000..ce49793 --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/private/publishexamples.m @@ -0,0 +1,83 @@ +%%publish examples. +% +% This code publishes the examples to the html folder. 
+% +function publishexamples + + +font='Rockwell'; +set(0,'defaultUicontrolFontName',font); +set(0,'defaultUitableFontName',font); +set(0,'defaultAxesFontName',font); +set(0,'defaultTextFontName',font); +set(0,'defaultUipanelFontName',font); +set(0,'defaultFigureColor',[1 1 1]); +set(0,'defaultAxesColor',[1 1 1]*.97); +set(0,'defaultaxesxcolor',[1 1 1]*.4); +set(0,'defaultaxesycolor',[1 1 1]*.4); +set(0, 'defaulttextcolor',[1 1 1]*.4); +set(0,'defaultaxesbox','off'); +set(0,'defaultlegendbox','off'); +set(0,'defaultaxestickdir','out','defaultAxesTickDirMode', 'manual'); +set(0,'defaultfigureinverthardcopy','off'); +set(0,'defaultfigurecolormap', hslcolormap('yr',[0 .7 .9],[.98 .2])); + +% reset random number generator... +s = RandStream('mt19937ar','Seed',0); +RandStream.setGlobalStream(s); + + + +% publishfile('..\ex_linefit','html') +% publishfile('..\ex_rosenbrockbanana','html') +% publishfile('..\ex_behappy','html') +publishfile('..\ex_breakfit','html') + +% publishfile('..\ex_linefit','markdown') +% publishfile('..\ex_rosenbrockbanana','markdown') +% publishfile('..\ex_behappy','markdown') +publishfile('..\ex_breakfit','markdown') + + + + +function publishfile(fname,outputformat) + +options=[]; +options.format= 'html'; % 'html' | 'doc' | 'pdf' | 'ppt' | 'xml' | 'latex' +%options.stylesheet= 'C:\Users\Aslak\Documents\MATLAB\gwmcmc\repoexclude\robotoslab.xsl'; % '' | an XSL filename (ignored when format = 'doc', 'pdf', or 'ppt') +options.outputDir= 'html'; +options.imageFormat= 'png'; % '' (default based on format) 'bmp' | 'eps' | 'epsc' | 'jpeg' | 'meta' | 'png' | 'ps' | 'psc' | 'tiff' +options.figureSnapMethod= 'print'; % 'entireGUIWindow'| 'print' | 'getframe' | 'entireFigureWindow' +options.useNewFigure= true; % true | false +options.showCode= true; % true | false +options.evalCode= true; % true | false +options.catchError= true; % true | false +options.createThumbnail= false; % true | false +options.maxOutputLines= inf; % Inf | non-negative integer + 
+switch outputformat + case 'markdown',markdown=true; + otherwise, markdown=false; +end + +if markdown + privatepath=fileparts(mfilename('fullpath')); + options.stylesheet= fullfile(privatepath,'mxdom2md.xsl'); + options.format= 'latex'; +end + +[examplepath,name]=fileparts(fname); +oldpath=pwd; +try + cd(examplepath) + publish(name,options); + if markdown + target=fullfile(options.outputDir,name); + movefile([target '.tex'],[target '.md']); + end +catch ME + cd(oldpath) + rethrow(ME) +end +cd(oldpath) diff --git a/mcmc_simbio/src/gwmcmc/private/subaxis.m b/mcmc_simbio/src/gwmcmc/private/subaxis.m new file mode 100755 index 0000000..fb1476f --- /dev/null +++ b/mcmc_simbio/src/gwmcmc/private/subaxis.m @@ -0,0 +1,106 @@ +function h=subaxis(varargin) +%SUBAXIS Create axes in tiled positions. (just like subplot) +% Usage: +% h=subaxis(rows,cols,cellno[,settings]) +% h=subaxis(rows,cols,cellx,celly[,settings]) +% h=subaxis(rows,cols,cellx,celly,spanx,spany[,settings]) +% +% SETTINGS: Spacing,SpacingHoriz,SpacingVert +% Padding,PaddingRight,PaddingLeft,PaddingTop,PaddingBottom +% Margin,MarginRight,MarginLeft,MarginTop,MarginBottom +% Holdaxis +% +% all units are relative (i.e. from 0 to 1) +% +% Abbreviations of parameters can be used.. (Eg MR instead of MarginRight) +% (holdaxis means that it wont delete any axes below.) +% +% +% Example: +% +% >> subaxis(2,1,1,'SpacingVert',0,'MR',0); +% >> imagesc(magic(3)) +% >> subaxis(2,'p',.02); +% >> imagesc(magic(4)) +% +% 2001-2014 / Aslak Grinsted (Feel free to modify this code.) + +f=gcf; + + + +UserDataArgsOK=0; +Args=get(f,'UserData'); +if isstruct(Args) + UserDataArgsOK=isfield(Args,'SpacingHorizontal')&isfield(Args,'Holdaxis')&isfield(Args,'rows')&isfield(Args,'cols'); +end +OKToStoreArgs=isempty(Args)|UserDataArgsOK; + +if isempty(Args)&&(~UserDataArgsOK) + Args=struct('Holdaxis',0, ... + 'SpacingVertical',0.05,'SpacingHorizontal',0.05, ... + 'PaddingLeft',0,'PaddingRight',0,'PaddingTop',0,'PaddingBottom',0, ... 
+ 'MarginLeft',.1,'MarginRight',.1,'MarginTop',.1,'MarginBottom',.1, ... + 'rows',[],'cols',[]); +end +Args=parseArgs(varargin,Args,{'Holdaxis'},{'Spacing' {'sh','sv'}; 'Padding' {'pl','pr','pt','pb'}; 'Margin' {'ml','mr','mt','mb'}}); + +if (length(Args.NumericArguments)>2) + Args.rows=Args.NumericArguments{1}; + Args.cols=Args.NumericArguments{2}; +%remove these 2 numerical arguments + Args.NumericArguments={Args.NumericArguments{3:end}}; +end + +if OKToStoreArgs + set(f,'UserData',Args); +end + + +switch length(Args.NumericArguments) + case 0 + return % no arguments but rows/cols.... + case 1 + if numel(Args.NumericArguments{1}) > 1 % restore subplot(m,n,[x y]) behaviour + [x1 y1] = ind2sub([Args.cols Args.rows],Args.NumericArguments{1}(1)); % subplot and ind2sub count differently (column instead of row first) --> switch cols/rows + [x2 y2] = ind2sub([Args.cols Args.rows],Args.NumericArguments{1}(end)); + else + x1=mod((Args.NumericArguments{1}-1),Args.cols)+1; x2=x1; + y1=floor((Args.NumericArguments{1}-1)/Args.cols)+1; y2=y1; + end +% x1=mod((Args.NumericArguments{1}-1),Args.cols)+1; x2=x1; +% y1=floor((Args.NumericArguments{1}-1)/Args.cols)+1; y2=y1; + case 2 + x1=Args.NumericArguments{1};x2=x1; + y1=Args.NumericArguments{2};y2=y1; + case 4 + x1=Args.NumericArguments{1};x2=x1+Args.NumericArguments{3}-1; + y1=Args.NumericArguments{2};y2=y1+Args.NumericArguments{4}-1; + otherwise + error('subaxis argument error') +end + + +cellwidth=((1-Args.MarginLeft-Args.MarginRight)-(Args.cols-1)*Args.SpacingHorizontal)/Args.cols; +cellheight=((1-Args.MarginTop-Args.MarginBottom)-(Args.rows-1)*Args.SpacingVertical)/Args.rows; +xpos1=Args.MarginLeft+Args.PaddingLeft+cellwidth*(x1-1)+Args.SpacingHorizontal*(x1-1); +xpos2=Args.MarginLeft-Args.PaddingRight+cellwidth*x2+Args.SpacingHorizontal*(x2-1); +ypos1=Args.MarginTop+Args.PaddingTop+cellheight*(y1-1)+Args.SpacingVertical*(y1-1); +ypos2=Args.MarginTop-Args.PaddingBottom+cellheight*y2+Args.SpacingVertical*(y2-1); + +if 
Args.Holdaxis + h=axes('position',[xpos1 1-ypos2 xpos2-xpos1 ypos2-ypos1]); +else + h=subplot('position',[xpos1 1-ypos2 xpos2-xpos1 ypos2-ypos1]); +end + + +set(h,'box','on'); +%h=axes('position',[x1 1-y2 x2-x1 y2-y1]); +set(h,'units',get(gcf,'defaultaxesunits')); +set(h,'tag','subaxis'); + + + +if (nargout==0), clear h; end; + diff --git a/mcmc_simbio/src/integrableLHS.m b/mcmc_simbio/src/integrableLHS.m new file mode 100644 index 0000000..20aa57a --- /dev/null +++ b/mcmc_simbio/src/integrableLHS.m @@ -0,0 +1,137 @@ +function int_minit = integrableLHS(eMO, nW, paramranges,... + enames, ds, varargin) +%integrableLHS generate a set of integrable latin hypercube distributed +% parameter points for simbiology models +% eMO = exported model object +% nW = number of walkers + +% OLD VERSION: +% spread = log-spread of the parameter values around logp +% logp = the parameters that set the center of the latin hypercube +% NEW VERSION: +% just specify the parameter ranges explicitly. these are log transformed. + +% enames = names of estimated parameters +% ds = dosing strategy +% optional name value pair: 'multiopt_params', mop is a matrix containing +% the indiced for the parameters to use for each sub optimization problem. +% it has rows corresponding to each sub optimization problem, and each row +% us the indices of the parameters to be estimated for that problem, +% followed by zeros to pad. + + +p = inputParser; +p.addParameter('multiopt_params', [], @isnumeric) +p.addParameter('distribution', 'LHS', @ischar) +p.addParameter('width', .001, @isnumeric) + +p.parse(varargin{:}); + +p = p.Results; + +if iscell(eMO) + % TODO: update this multiopt version. for now, we just do the single + % opt version. + mop = p.multiopt_params; + assert(~isempty(mop),... + 'Please ensure that in multi optimization mode, the estimation structure is specified') + assert(isequal(length(logp), length(estNamesFull), size(mop,2),... 
+ 'The length of logp, full estimation names, and the # columns in estimation structure must match')); + assert(isequal(length(eMO), size(mop, 1)), ... + 'The number of sub optimization problems must be consistent in the array of model objects and the estimation structure array'); + nOpt = length(eMO); + nparam = length(logp); % logp not defined? + % for each sub problem, get indices of the integrable points + % Most of the time we seem for have 70% or more integrability. So assume we lose 30% of points per sub problem. + npts = round(nW/max([0.65^nOpt 0.1])); %we allow for up to 10X multiplication, need to rethink the method if this does not suffice + lhsamp = paramranges*(lhsdesign(npts, nparam)-0.5); + minit=bsxfun(@plus,logp,lhsamp'); + IP = cell(nOpt,1); + % number of points to generate initially should be + tic + for kk = 1:nOpt + % here we reorder the parameters for the individual optimization problem. The estNames input to the sbiointegrable function below + % should be in the order that the parameters appear in the exported model object. + [~,~,V] = find(mop(kk,:)); + en = enames(V); + % order en correctly by working through valueInfo + count = 1; + for ii = 1:length(eMO.ValueInfo) + index1 = strcmp(en, eMO.ValueInfo(ii).Name); + index2 = find(index1); + if index2~= 0 + en2{count} = en{index2}; + count = count+1; + end + end + + IP{kk} = sbiointegrable(eMO{kk}, minit, en2, ds{kk}); + numInt{kk} = sum(all(IP{kk}==1, 2)); + intIx{kk} = find(all(IP{kk}==1, 2)); + end + toc ; + + simul_int = (1:npts)'; + for kk = 1:nOpt + simul_int = intersection(intIx{kk}, simul_int); + end + + if nW >= length(simulint) + int_minit = minit(:, simul_int); + warning('Number of integrable points less than number specified. 
Using only integrable points.') + + else + % randomly generate nW samples from list of integrable indices + r = randperm(length(simulint), nW); + int_minit = minit(:, simul_int(r)); + end + +else + + nparam = length(enames); + npts = round(nW*2); % can tolerate up to 50% non integrability. + % Increase the factor here if you need to tolerate more. + + % generate latin hyper cube distributed points to test for + % integrability + switch p.distribution + case 'LHS' + lhsamp = lhsdesign(npts, nparam); + lhsamp = lhsamp'; % nparam by npts matrix of LHS points + + minit= ... + lhsamp.*(repmat(paramranges(:, 2), 1, npts)-repmat(paramranges(:, 1), 1, npts))+... + repmat(paramranges(:, 1), 1, npts); + + case 'gaussian' + midpt = (repmat(paramranges(:, 2), 1, npts) +... + repmat(paramranges(:, 1), 1, npts))/2; + + minit = p.width*randn(nparam,npts)-p.width/2 + midpt; + + case 'unifrand' + midpt = (repmat(paramranges(:, 2), 1, npts) +... + repmat(paramranges(:, 1), 1, npts))/2; + width = (repmat(paramranges(:, 2), 1, npts) -... + repmat(paramranges(:, 1), 1, npts)); + minit = width.*rand(nparam,npts)-width/2 + midpt; + end + + + tic + IP = sbiointegrable(eMO, minit, enames, ds); + toc ; + + numInt = sum(all(IP==1, 2)); + intIx = find(all(IP==1, 2)); + if nW >= numInt + int_minit = minit(:, intIx); + else + % randomly generate nW samples from list of integrable indices + r = randperm(numInt, nW); + int_minit = minit(:, intIx(r)); + end +end + +end + diff --git a/mcmc_simbio/src/integrableLHS_v2.m b/mcmc_simbio/src/integrableLHS_v2.m new file mode 100644 index 0000000..65160e0 --- /dev/null +++ b/mcmc_simbio/src/integrableLHS_v2.m @@ -0,0 +1,171 @@ +function int_minit = integrableLHS_v2(mi, mai, ri, varargin) + % version 2 of the integrable LHS function. 
+ % + % OLD HELP FILE: + %integrableLHS generate a set of integrable latin hypercube distributed + % parameter points for simbiology models + % eMO = exported model object + % ri.nW = number of walkers + + % OLD VERSION: + % spread = log-spread of the parameter values around logp + % logp = the parameters that set the center of the latin hypercube + % NEW VERSION: + % just specify the parameter ranges explicitly. these are log transformed. + + % enames = names of estimated parameters + % ds = dosing strategy + % optional name value pair: 'multiopt_params', mop is a matrix containing + % the indiced for the parameters to use for each sub optimization problem. + % it has rows corresponding to each sub optimization problem, and each row + % us the indices of the parameters to be estimated for that problem, + % followed by zeros to pad. + + + p = inputParser; + p.addParameter('multiopt_params', [], @isnumeric) + p.addParameter('distribution', 'LHS', @ischar) + p.addParameter('width', .001, @isnumeric) + + p.parse(varargin{:}); + + p = p.Results; + + % compute the reduced number of parameters. + % nreduc = sum(cellfun(@numel, mi.semanticGroups)) ... + % - numel(mi.semanticGroups); + + % reducedvec -> mastervec -> -> -> distribute across all models + % and geometries and doses. if the integration passes for all cases, + % then that points is integrable. Probably need to expand the candidate + % points to be 5x the original number of walkers. + % make sure the parameter order is correct. + % + [~, rpr, ~] = reduceMasterVec(mai); + nparam = size(rpr, 1); + + + npts = round(ri.nW*3); % can tolerate up to 67% non integrability. + % Increase the factor here if you need to tolerate more. + + % Compute the parameter sharing across all topologies and geometries for + % initial walker estimation purposess. 
+ + % generate latin hyper cube distributed points to test for + % integrability + + switch p.distribution + case 'LHS' + lhsamp = lhsdesign(npts, nparam); + lhsamp = lhsamp'; % nparam by npts matrix of LHS points + + minit= ... + lhsamp.*(repmat(rpr(:, 2), 1, npts)-... + repmat(rpr(:, 1), 1, npts))+... + repmat(rpr(:, 1), 1, npts); + + case 'gaussian' + midpt = (repmat(rpr(:, 2), 1, npts) +... + repmat(rpr(:, 1), 1, npts))/2; + + minit = p.width*randn(nparam,npts)-p.width/2 + midpt; + + case 'unifrand' + midpt = (repmat(rpr(:, 2), 1, npts) +... + repmat(rpr(:, 1), 1, npts))/2; + width = (repmat(rpr(:, 2), 1, npts) -... + repmat(rpr(:, 1), 1, npts)); + minit = width.*rand(nparam,npts)-width/2 + midpt; + end + + % rebuild the master vector + minit_justEstParams = rebuildMasterVec(minit, mai); + % only has #estparams in the columns. + % not the full master vector. + % all the values should be in log space. + + % we build the full set of npts master vectors, arranged into a matrix + % of size #full master vector elements x npts: + mv = mai.masterVector; + minit_fixedAndEstParams = repmat(mv, 1, npts); + estParamsIx = setdiff((1:length(mv))', mai.fixedParams); + minit_fixedAndEstParams(estParamsIx, :) = minit_justEstParams; + % i changed this line on 3.12.2018. not sure if this break previous + % code or fixes it. check how the previous code was working in the + % first place. yeah i think the previous code, which was only the vnprl + % and the protein with the mrna parameters fixed to 10 sets of values + % all never had semantic groups for parameters to be ESTIMATED. + + % each column of minit, when distributed across topologies and geometries + % should work for every topology geometry pair. + + IProw_old = ones(1, npts); + for i = 1:length(mi) % for each topology + + %number of params in a given topo (model) + nParam_TopoGeom = size(mi(i).paramMaps, 1); + + % define dose matrix. This is correct, can also put a {} around the dosedvals + % matrix. 
Cant remove the braces around the names. + ds = struct('names', {mi(i).dosedNames}, 'dosematrix', mi(i).dosedVals); + + + for j = 1:size(mi(i).paramMaps, 2) % for each geometry, do everything. + + % build the minit for this topo - geom pair + % from the master one (ie, minit_fixedAndEstParams) + pIX_TopoGeom = mi(i).paramMaps(mi(i).orderingIx, j); + minitTopoGeom = minit_fixedAndEstParams(pIX_TopoGeom, :); + + % REORDER TO MAKE THEM CORRECT WITH THE EXPORTED MODEL'S + % parameter ORDERING. + % + % ??? MAYBE NOT NEEDED. JUST REORDER AFTR THE ESTIMATION. THE + % PARAM RANGERS ARE MAYBE ALREADY ORDERED. + +% minitTopoGeom_reordered = minitTopoGeom( + + + + % now simulate this the t-g for this minit and + % for each column of minit, report if it passes. + % IP is a matrix of dimension npts x # dose combinations. + % Three possible values: + % 0 = integration tol not met + % 1 = run successfully + % 2 = unknown error. + % + tic + IP = sbiointegrable(mi(i).emo, minitTopoGeom, mi(i).namesOrd, ds); + toc + IP = IP'; % make IP nDoses x npts + + + % numInt = sum(all(IP==1, 1)); + IProw_new = all(IP==1, 1); + % intIx = find(all(IP==1, 1)); + IPtemp = [IProw_old; IProw_new]; + IProw_old = all(IPtemp==1, 1); + end + % do this for each geometry and each topology. the set of points that + % pass every t-g pair are our valid starting point. could be stringent. + end + IProw = IProw_old; + numInt = sum(IProw==1); + disp([num2str(numInt) ' points out of ' num2str(size(minitTopoGeom,2)) ... + ' are integrable. Need ' num2str(ri.nW) ' walkers.']) + intIx = find(IProw==1); + if ri.nW >= numInt + % not enough integrable points + warning(['Not enough integrable points. '... 
+ 'Reducing to maximal set of integrable points.']) + int_minit = minit_justEstParams(:, intIx); + else + % randomly generate ri.nW samples from list of integrable indices + r = randperm(numInt, ri.nW); + int_minit = minit_justEstParams(:, intIx(r)); + end + + +end + diff --git a/mcmc_simbio/src/loglike_gaussian.m b/mcmc_simbio/src/loglike_gaussian.m new file mode 100644 index 0000000..31fc448 --- /dev/null +++ b/mcmc_simbio/src/loglike_gaussian.m @@ -0,0 +1,34 @@ +function [llh] = loglike_gaussian(logp, stdev, ICarray,measuredspidx, tspan, ... + dataArray, genemodel, lognormvec) +% loglike_gaussian Gaussian log likelihood +% Compute the log likelihood of the data given the model parameters and +% the standard deviation. + +% Data is input as an array that is N x M x nIC. N is the number of time points in tspan, +% M is the number of species. + +mV = mean(dataArray(:,measuredspidx,:),1); +meanVals = mean(mV,3); %meanVals is a row vec. + +wt = sum(meanVals)./meanVals; %high mean = lower wt +relWt = wt/sum(wt); +CONC_temp = zeros(size(dataArray)); + + +nIC = size(ICarray,1); % number of different initial conditions. +% initial conditions are row vectors. Rows are different sets of initial conditions. + +for i = 1:nIC +[~,CONC_temp(:,:,i)] = genemodel(logp, ICarray(i,:), tspan); +end + +residuals = dataArray(:,measuredspidx,:) - CONC_temp(:,measuredspidx,:); +resi = bsxfun(@times, residuals, relWt); + +res = resi(:); +llh = sum(lognormvec(res, stdev)); + + + +end + diff --git a/mcmc_simbio/src/loglike_gaussian_joint.m b/mcmc_simbio/src/loglike_gaussian_joint.m new file mode 100644 index 0000000..3ac500c --- /dev/null +++ b/mcmc_simbio/src/loglike_gaussian_joint.m @@ -0,0 +1,60 @@ +function [llh] = loglike_gaussian_joint(jointlogp, stdev, ICarray,measuredspidx, tspan, ... + dataArray1,dataArray2, genemodel, lognormvec) +% +% loglike_gaussian_joint Joint Gaussian log likelihood +% Compute the log likelihood of the data given the model parameters and +% the standard deviation. 
+% +% Data is input as an array that is N x M x nIC. N is the number of time points in tspan, +% M is the number of species. +% +% One parameter vector is used to populate two runs, one per data set. +logp1 = jointlogp([1:4 7:8]); + +mV = mean(dataArray1(:,measuredspidx,:),1); +meanVals = mean(mV,3); %meanVals is a row vec. + +wt = sum(meanVals)./meanVals; %high mean = lower wt +relWt = wt/sum(wt); +CONC_temp = zeros(size(dataArray1)); + +nIC = size(ICarray,1); % number of different initial conditions. +% initial conditions are row vectors. Rows are different sets of initial conditions. + +for i = 1:nIC +[~,CONC_temp(:,:,i)] = genemodel(logp1, ICarray(i,:), tspan); +end + +residuals = dataArray1(:,measuredspidx,:) - CONC_temp(:,measuredspidx,:); +resi = bsxfun(@times, residuals, relWt); + +res1 = resi(:); + +logp2 = jointlogp([1:2 5:8]); +mV = mean(dataArray2(:,measuredspidx,:),1); +meanVals = mean(mV,3); %meanVals is a row vec. + +wt = sum(meanVals)./meanVals; %high mean = lower wt +relWt = wt/sum(wt); +CONC_temp = zeros(size(dataArray2)); + + +nIC = size(ICarray,1); % number of different initial conditions. +% initial conditions are row vectors. Rows are different sets of initial conditions. 
+ +for i = 1:nIC +[~,CONC_temp(:,:,i)] = genemodel(logp2, ICarray(i,:), tspan); +end + +residuals = dataArray2(:,measuredspidx,:) - CONC_temp(:,measuredspidx,:); +resi = bsxfun(@times, residuals, relWt); + +res2 = resi(:); + +res = [res1; res2]; +llh = sum(lognormvec(res, stdev)); + + + +end + diff --git a/mcmc_simbio/src/masterVecArray.m b/mcmc_simbio/src/masterVecArray.m new file mode 100644 index 0000000..1303a82 --- /dev/null +++ b/mcmc_simbio/src/masterVecArray.m @@ -0,0 +1,13 @@ +function [mvarray] = masterVecArray(marray, mai) +%masterVecArray Expand an array of estimated parameters into full master vectors. +% Fixed entries are copied from mai.masterVector; rows not listed in mai.fixedParams are filled from marray. + +szm = size(marray); +mvarray = repmat(mai.masterVector, [1, szm(2:end)]) ; + +estParamsIx = setdiff((1:length(mai.masterVector))', mai.fixedParams); +mvarray(estParamsIx, :, :) = marray; + + +end + diff --git a/mcmc_simbio/src/mcmc_3D.m b/mcmc_simbio/src/mcmc_3D.m new file mode 100644 index 0000000..9363dd3 --- /dev/null +++ b/mcmc_simbio/src/mcmc_3D.m @@ -0,0 +1,22 @@ +function [fig, ax] = mcmc_3D(mcut, axL, titl) + % mcmc_3D takes an array of dimensions nPoints x 3 and generates a + % scatterplot. + % + % mcut is a nPoints x 3 matrix of points to scatterplot. + % axL must be a 1 x 3 cell array of the axis label strings + % corresponding to the 3 columns of mcut. + % titl is a title string. + % +fig = figure; + + +XX = mcut(:, 1); +YY = mcut(:, 2); +ZZ = mcut(:, 3); +scatter3(XX,YY,ZZ) +xlabel(axL{1}, 'FontSize', 20) +ylabel(axL{2}, 'FontSize', 20) +zlabel(axL{3}, 'FontSize', 20) +title(titl, 'FontSize', 20) +ax = gca; +end diff --git a/mcmc_simbio/src/mcmc_cut.m b/mcmc_simbio/src/mcmc_cut.m new file mode 100644 index 0000000..4f77463 --- /dev/null +++ b/mcmc_simbio/src/mcmc_cut.m @@ -0,0 +1,80 @@ +function outputslice = mcmc_cut(marray, pID, pRanges, varargin) +% Cut a lower dimensional slice out of a high dimensional +% parameter distribution. 
+% +% - marray: This function takes an array of numbers of size +% nPoints x nParam, and returns an array of size +% nPointsFewer x nParam, where nPointsFewer is the number of +% points satisfying the slicing condition described below. +% marray can also be an array of size nParam x nWalkers x nsteps, +% in which case the output is of the same dimension, or of dimension +% nParam x nWalkersSmaller, where nWalkersSmaller is the total +% number of points where the slicing condition holds. Note that the walker +% positions with respect to the steps is not preserved. +% +% - pID is a vector of parameter indices ranging from 1 to +% nParam. +% +% - pRanges is a 2 x length(pID) array of upper and lower bounds +% (first and second row resp) for parameter values corresponding +% to the parameters indexed by the array pID. +% +% SLICING CONDITION: +% This function returns the subset of points (rows) in marray +% for which the condition all(pranges(1, :) >= point(pID)) and +% all(pranges(2, :) <= point(pID)), where the vector 'point' +% is a row of the array of parameter points marray. (The Output +% 'outputslice' is a matrix of all such points.) +% +% Optional Input name-value pair +% 'marginalize' (default: false). If set to true, we remove +% the parameters with indices in the index vector pID from +% the returned array, and and in this case the output array +% OUTPUTSLICE will have dimensions +% nPointsFewer x (nParam - length(pID)). +% +p = inputParser; +p.addParameter('marginalize', false, @islogical) +p.parse(varargin{:}); +p = p.Results; +convertTo3D = false; +if ndims(marray) == 3 + convertTo3D = true; + szm = size(marray); + marray = marray(:, :)'; +end + +fun1 = @(rowvec) all(rowvec(pID) CAN JUST USE MCMC_TRAJECTORIES WITH A CUSTOM SPECIES ARGUMENT. +%mcmc_plotCustomSpecies Take an exported model, and a parameter array (2D), and plot the +%trajectories for up to the last 5 parameter points in the parameter array +% -- em: exported model object. 
+% -- m: parameter array, ORDERED (ie, parameters ordered by the following +% type of code: +% mvarray = masterVecArray(m, mai); +% marrayOrd = mvarray(mi.paramMaps(mi.orderingIx, 1),:,:); +% and then made 2D if needed. +% This array has dimensions: nPoints x nParam +% last 5 points get plotted. If there are less than 5 points, then +% all of them get plotted. +% -- mi: model_info struct. Must be a scalar struct. +% -- mai: master_info struct. must be a scalar struct. +% -- spl: cell array vector list of cell array vector lists of species to plot. +% The outer cell array is the number of subplots. The inner cell +% array is the list of species whose trajectories get summed in +% the plotting. This is the same specification as the +% measuredSpecies field in the model_info struct. +% + +% determine the subplot dimensions from the +nParam = size(m, 2); +[n1 n2] = twofactors(nParam); +if isprime(nParam) + n1 = ceil(nParam/5); % 5 columns + n2 = 5; +end +nPoints = size(m, 1); +if nPoints >4 + sIx = nPoints-4; + nSimCurves = 5; + +else + sIx = 1; + nSimCurves = nPoints; +end + +tv = di(mi.dataToMapTo).timeVector; +dose = mi.dosedVals; + +% simulate the model at the required points. +[da, idxnotused] = simulatecurves(em,m, nSimCurves, dose, tv, spl); + +for i = sIx:nPoints + + + + + +% plot the results + + + + +end + diff --git a/mcmc_simbio/src/mcmc_runsim.m b/mcmc_simbio/src/mcmc_runsim.m new file mode 100644 index 0000000..e39f602 --- /dev/null +++ b/mcmc_simbio/src/mcmc_runsim.m @@ -0,0 +1,216 @@ +function mi = mcmc_runsim(tstamp, projdir,di, mobj, mi, varargin) +% mcmc_runsim: run the mcmc estimation using the affine invariant +% ensemble sampler. +% +% INPUTS: +% +% tstamp: The time stamp string of the simulation generated by +% project_init. Format is 'yyyymmdd_HHMMSS' +% +% projdir: Directory where the generated data subdirectory +% (simdata_yyyymmdd_HHMMSS, where yyyymmdd_HHMMSS is +% the time stamp) will be created. 
+% +% tv: time vector in the units of seconds. +% +% da: A matlab array containing experimental data. This array +% has dimensions nTimePoints x nMS x nReplicates x nDoses +% where +% -nTimePoints: length(tv), i.e., the number of time points. +% -nMS: The number of measured species. Corresponds +% to values given in the mcmc_info and data_info +% structs. +% -nReplicates +% -nDoses: Number of dose combinations (initial conditions) +% +% mobj: Simbiology model object. +% +% mi: mcmc_info struct. Type 'help mcmc_info_dsg2014_mrna' or +% 'help mcmc_info_template' into the MATLAB command line to learn +% more. +% +% Optional name-value pair arguments: +% InitialDistribution: Initial distribution of walker points. This can be +% Latin hyprcube sampled (Value: 'LHS'), gaussian +% distributed (Value: 'gaussian') about the midpoint of +% mi.paramranges or uniformly distributed (Value: 'unifrand'). +% +% Width: Applies to the width of the gaussian or uniform random +% parameter distribution around the midpoint given by +% mi.paramranges. +% +% UserInitialize: The user provides a matrix of initial walker positions. +% When this input is specified, 'InitialDistribution' and +% 'Width' are ignored. +% +% FitOption: Allows for fitting to data in 3 modes: +% 'FitMedian': This mode computes the curvewise median (Default). +% of the data over the replicates, and fits the model +% to this. +% 'FitMean': Compute the mean of the replicates, and fit to this +% mean +% 'FitAll' Fit all the curves. 
+ +% Copyright (c) 2018, Vipul Singhal, Caltech +% Permission is hereby granted, free of charge, to any person obtaining a copy +% of this software and associated documentation files (the "Software"), to deal +% in the Software without restriction, including without limitation the rights +% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +% copies of the Software, and to permit persons to whom the Software is +% furnished to do so, subject to the following conditions: + +% The above copyright notice and this permission notice shall be included in all +% copies or substantial portions of the Software. + +% THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +% AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +% LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +% OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +% SOFTWARE. + +p = inputParser; +p.addParameter('InitialDistribution', 'LHS', @ischar); % LHS, gaussian, unifrand +p.addParameter('Width', 0.1, @isnumeric) +p.addParameter('UserInitialize', [], @isnumeric) +p.addParameter('FitOption', 'FitMedian', @ischar); % 'FitMean', 'FitAll' +p.parse(varargin{:}); +p = p.Results; + +% Transform Experimental data to the appropriate form for fitting +da = di.dataArray; +tv = di.timeVector; +switch p.FitOption + case 'FitMedian' + % Compute the curvewise median of the data. + [ix, mdvals] = medianIndex(sum(da, 1), 3); + da = medianReplicate(da, ix); + case 'FitMean' + da = mean(da, 3); + case 'FitAll' + % do nothing +% da = da; + otherwise + error('Invalid fit option. 
Read the documentation for how to specify inputs.') +end + +%% EXPORT MODEL object to get it ready for MCMC +% the resulting object is of class SimBiology.export.Model +% documentation: https://www.mathworks.com/help/simbio/ref/simbiology.export.model-class.html +% Sven Mesecke's blog post on using the exported model class for parameter inference applicaton. +% http://sveme.org/how-to-use-global-optimization-toolbox-algorithms-for-simbiology-parameter-estimation-in-parallel-part-i.html + +% export and accelerate simbiology model object using estimated species +% and dosing species names + +% select the parameters and species objects using the name array +ep = sbioselect(mobj, 'Type', 'parameter', 'Name', ... + mi.namesUnord);% est parameters + +es = sbioselect(mobj, 'Type', 'species', 'Name', ... + mi.namesUnord);% est species + +aps = [ep; es]; % active parameters and species + +% reorder the parameter and species so they are in the same order as that +% in the model. +eno = cell(length(aps), 1);% est names ordered + +for i = 1:length(aps) + eno{i} = aps(i).Name; +end + +% +ds = sbioselect(mobj, 'Type', 'species', 'Name', mi.dosedNames); + +emo = export(mobj, [ep; es; ds]); % exported model object, dosed species names. +SI = emo.SimulationOptions; +SI.StopTime = tv(end); +accelerate(emo); + +mi.names_ord = eno; % est names ordered. +mi.emo = emo; % exported model object. + + +%% COMPUTE INITIAL WALKER POSITIONS +ds = struct('names', {mi.dosedNames}, 'dosematrix', mi.dosedVals); +if isempty(p.UserInitialize) + minit = integrableLHS(emo, mi.nW, mi.paramRanges, eno, ... + ds, 'distribution', p.InitialDistribution, 'width', p.Width); +else + minit = p.UserInitialize; + % assume all the user defined points are integrable. +end + +%% SETUP FUNCTIONS +% setup the log prior, log likelihood function and lognormvec functions +lognormvec=@(res,sig) -(res./sig).^2 -log(sqrt(2*pi)).*sig; + +logprior = @(logp) all(mi.paramRanges(:, 1) < logp) &&... 
+ all(logp < mi.paramRanges(:,2)); + +sigg = mi.stdev/mi.tightening; + + + +loglike = @(logp) gen_residuals_4(logp, emo, da, tv, ... + mi.dosedVals, mi.measuredSpecies, lognormvec, sigg); + + +% BURN IN: run the burn in simulation +if isempty(p.UserInitialize) + tic + [m] =gwmcmc_vse(minit,{logprior loglike},... + mi.nPoints,... + 'StepSize',mi.stepSize , ... + 'ThinChain',mi.thinning,... + 'Parallel', mi.parallel); + toc + + minit = m(:,:,end); + clear m +else + disp('User initialized initial walker positions, skipping burn in phase') +end + + +% specify where to save things +cfname = cell(mi.nIter, 1); +specificproj = [projdir '/simdata_' tstamp]; + +%% we save useful variables in a one off manner here (ie, outside the loops) +fname = ['full_variable_set_' tstamp]; % filename +save([specificproj '/' fname]); + +% run the actual simulations, saving the data every iteration +for i = 1:mi.nIter % + if ~mod(i, 3) + fprintf('Pausing for 10 minutes before starting run number %d. \n', i); + pause(6) + end + + tic + fprintf('starting mcmc %d\n', i); + [m] = gwmcmc_vse(minit,{logprior loglike},... + mi.nPoints, ... + 'StepSize',mi.stepSize , ... + 'ThinChain',mi.thinning, 'Parallel', mi.parallel); + + fprintf('ending mcmc %d\n', i); + toc + fname = ['mcmc' tstamp '_ID' num2str(i)] ; + cfname{i} = fname; + save([specificproj '/' fname], 'm'); + % the only thing that is different in each run are the above + % pause(1) + minit = m(:,:,end);% + 0.1*randn(size(m(:,:,end-1))); + + clear m + +end + +% generate log file +mcmc_log(tstamp, projdir,specificproj, mi, di) +end + diff --git a/mcmc_simbio/src/mcmc_runsim_4.m b/mcmc_simbio/src/mcmc_runsim_4.m new file mode 100644 index 0000000..e445ad5 --- /dev/null +++ b/mcmc_simbio/src/mcmc_runsim_4.m @@ -0,0 +1,208 @@ +function mcmc_runsim_4(tstamp, projdir, tv,da, mobj, mi, varargin) +% mcmc_runsim run the mcmc estimation - This version works with (the +% prototyping version) the gen_residuals_4 function. 
+% +% mobj is the simbiology model +% mi is the mcmc info struct created using the mcmc_info_... function. +% funcs are the various functions needed, the prior, the lognormvec. + +% !LATER check if a directory to save the simulation results exists, +% and create it if it does not. +% for now we just assume that project_init did its job right. +% +% has the following name value input pairs. +% 'distribution', 'LHS' +% 'width', 0.1 +% +% +% +% order the est names in the same order as +% run an initial pass to find a set of points where integration tolerances +% are met. This valid points will be our initial walker positions. + +p = inputParser; +p.addParameter('distribution', 'LHS', @ischar) +p.addParameter('width', 0.1, @isnumeric) +p.addParameter('userinitialize', [], @isnumeric) + +p.parse(varargin{:}); + +p = p.Results; + +%% EXPORT MODEL object to get it ready for MCMC +% the resulting object is of class SimBiology.export.Model +% documentation: https://www.mathworks.com/help/simbio/ref/simbiology.export.model-class.html +% Sven Mesecke's blog post on using the exported model class for parameter inference applicaton. +% http://sveme.org/how-to-use-global-optimization-toolbox-algorithms-for-simbiology-parameter-estimation-in-parallel-part-i.html + +% export and accelerate simbiology model object using estimated species +% and dosing species names + +% select the parameters and species objects using the name array +ep = sbioselect(mobj, 'Type', 'parameter', 'Name', ... + mi.names_unord);% est parameters + +es = sbioselect(mobj, 'Type', 'species', 'Name', ... + mi.names_unord);% est species + +aps = [ep; es]; % active parameters and species + +% reorder the parameter and species so they are in the same order as that +% in the model. 
+eno = cell(length(aps), 1);% est names ordered + +for i = 1:length(aps) + eno{i} = aps(i).Name; +end + +% +ds = sbioselect(mobj, 'Type', 'species', 'Name', mi.dosednames); + +emo = export(mobj, [ep; es; ds]); % exported model object, dosed species names. +SI = emo.SimulationOptions; +SI.StopTime = tv(end); +accelerate(emo); + +mi.names_ord = eno; % est names ordered. +mi.emo = emo; % exported model object. + + +ds = struct('names', {mi.dosednames}, 'dosematrix', mi.dosedvals); +if isempty(p.userinitialize) + minit = integrableLHS(emo, mi.nW, mi.paramranges, eno, ... + ds, 'distribution', p.distribution, 'width', p.width); +else + minit = p.userinitialize; + % assume all the user defined points are integrable. +end + +% setup the log prior, log likelihood function and lognormvec functions +lognormvec=@(res,sig) -(res./sig).^2 -log(sqrt(2*pi)).*sig; + +logprior = @(logp) all(mi.paramranges(:, 1) < logp) &&... + all(logp < mi.paramranges(:,2)); + +sigg = mi.stdev/mi.tightening; + +loglike = @(logp) gen_residuals_4(logp, emo, da, tv, ... + mi.dosedvals, mi.measuredspecies, lognormvec, sigg); + + +% run the burn in simulation +if isempty(p.userinitialize) + tic + [m] =gwmcmc_vse(minit,{logprior loglike},... + mi.npoints,... + 'StepSize',mi.stepsize , ... + 'ThinChain',mi.thinning,... + 'Parallel', mi.parallel); + toc + + minit = m(:,:,end); + clear m +else + disp('User initialized initial walker positions, skipping burn in phase') +end + + +% specify where to save things +cfname = cell(mi.niter, 1); +specificproj = [projdir '/simdata_' tstamp]; + +%% we save useful variables in a one off manner here (ie, outside the loops) +fname = ['full_variable_set_' tstamp]; % filename +save([specificproj '/' fname]); + +% run the actual simulations, saving the data every iteration +for i = 1:mi.niter % + if ~mod(i, 3) + fprintf('Pausing for 10 minutes before starting run number %d. 
\n', i); + pause(600) + end + + tic + fprintf('starting mcmc %d\n', i); + [m] = gwmcmc_vse(minit,{logprior loglike},... + mi.npoints, ... + 'StepSize',mi.stepsize , ... + 'ThinChain',mi.thinning, 'Parallel', mi.parallel); + + fprintf('ending mcmc %d\n', i); + toc + fname = ['mcmc' tstamp '_ID' num2str(i)] ; + cfname{i} = fname; + save([specificproj '/' fname], 'm'); + % the only thing that is different in each run are the above + % pause(1) + minit = m(:,:,end);% + 0.1*randn(size(m(:,:,end-1))); + + clear m + +end + +% generate log file + + + +%% generate simulation log file +% do this. also, write a log file using log4m and +% fprintf. +currdir = pwd; +cd(specificproj); +fileID = fopen(['summary_' tstamp '.txt'],'w'); + +% !TODO FIX THIS. REALLY SHOULD NOT BE LIKE THIS. +% siminfo = {'MG apt and GFP', '''data_dsg2014'''}; +% fS = 'MCMC estimation for %s, using the TXTL modeling toolbox \n'; +% fprintf(fileID,fS,siminfo{1}); +% fS = 'Data from file %s \n'; +% fprintf(fileID,fS, siminfo{2}); + +fS = ['################################################ \n' ... + 'The estimated species and parameters are: \n']; +fprintf(fileID,fS); +fS = '%s in range [%0.5g %0.5g] \n'; +[nn] = length(eno); +for i = 1:nn + fprintf(fileID,fS,eno{i}, exp(mi.paramranges(i, 1)),... + exp(mi.paramranges(i, 2))); +end + +fS = ['################################################ \n' ... + 'Dosed Species are: \n']; +fprintf(fileID,fS); +fS = '%s \n'; +[nn] = length(ds.names); +for i = 1:nn + fprintf(fileID,fS,ds.names{i}); +end + +% do the measured species later: +% fS = ['################################################ '... +% '\nMeasured Species are: \n']; +% fprintf(fileID,fS); +% fS = '%s \n'; +% [nn] = length(mi.measuredspecies); +% for i = 1:nn +% fprintf(fileID,fS,mi.measuredspecies{i}); +% end + +fS = ['################################################ \n'... 
+ 'Simulation Parameters are: \n']; +fprintf(fileID,fS); +fprintf(fileID,'path: %s \n', projdir); +fprintf(fileID,'stdev: %3.1f \n', mi.stdev); +fprintf(fileID,'number of walkers: %d \n', mi.nW); +fprintf(fileID,'step size:%4.2f \n', mi.stepsize); +fprintf(fileID,'tightening: %4.2f \n', mi.tightening); +fprintf(fileID,'number of repeats: %d \n', mi.niter); +fprintf(fileID,'thinning: %d \n', mi.thinning); +fprintf(fileID,'points per iter: %d \n', mi.npoints); + +% MAP estimates +% median +fclose(fileID); +cd(currdir) + +end + diff --git a/mcmc_simbio/src/mcmc_runsim_5.m b/mcmc_simbio/src/mcmc_runsim_5.m new file mode 100644 index 0000000..4c208f4 --- /dev/null +++ b/mcmc_simbio/src/mcmc_runsim_5.m @@ -0,0 +1,179 @@ +function mcmc_runsim_5(tstamp, projdir, tv,da, em, mi, pmap, varargin) +%mcmc_runsim run the mcmc estimation - This version works with (the +%prototyping version) the gen_residuals_5 function. +% +% em is the exported simbiology model +% mi is the mcmc info struct created using the mcmc_info_... function. +% funcs are the various functions needed, the prior, the lognormvec. + +% !LATER check if a directory to save the simulation results exists, +% and create it if it does not. +% for now we just assume that project_init did its job right. +% +% has the following name value input pairs. +% 'distribution', 'LHS' +% 'width', 0.1 +% +% +% order the est names in the same order as +% run an initial pass to find a set of points where integration tolerances +% are met. This valid points will be our initial walker positions. + +p = inputParser; +p.addParameter('distribution', 'LHS', @ischar) +p.addParameter('width', 0.1, @isnumeric) +p.addParameter('userinitialize', [], @isnumeric) + +p.parse(varargin{:}); + +p = p.Results; +eno = mi.names_ord; +ds = struct('names', {mi.dosednames}, 'dosematrix', mi.dosedvals); + +if isempty(p.userinitialize) + % need to generalize this to the multi geometry case. + minit = integrableLHS(em, mi.nW, mi.paramranges, eno, ... 
+ ds, 'distribution', p.distribution, 'width', p.width); +else + minit = p.userinitialize; + % assume all the user defined points are integrable. +end + +% setup the log prior, log likelihood function and lognormvec functions +lognormvec=@(res,sig) -(res./sig).^2 -log(sqrt(2*pi)).*sig; + +espix = pmap{1}; +cspix = pmap{2}; +nExt = size(da, 5); + + +% nparam = nExt*length(espix) + length(cspix); +paramranges = mi.paramranges; + +priorboundz = [repmat(paramranges(espix,:), length(espix),1) ; paramranges(cspix,:)]; + + +logprior = @(logp) all(priorboundz(:, 1) < logp) &&... + all(logp < priorboundz(:,2)); + +sigg = mi.stdev/mi.tightening; + +loglike = @(logp) gen_residuals_5(logp, em, da, tv, ... + mi.dosedvals, mi.measuredspecies, lognormvec, sigg, pmap ); + + +% run the burn in simulation +if isempty(p.userinitialize) + tic + [m] =gwmcmc_vse(minit,{logprior loglike},... + mi.npoints,... + 'StepSize',mi.stepsize , ... + 'ThinChain',mi.thinning,... + 'Parallel', mi.parallel); + toc + + minit = m(:,:,end); + clear m +else + disp('User initialized intitial walker positions, skipping burn in phase') +end + + + % run the actual simuations, saving the data every iteration + cfname = cell(mi.niter, 1); + specificproj = [projdir '/simdata_' tstamp]; + for i = 1:mi.niter % + pause(10) + tic + disp(sprintf('starting mcmc %d\n', i)); + [m] = gwmcmc_vse(minit,{logprior loglike},... + mi.npoints, ... + 'StepSize',mi.stepsize , ... 
+ 'ThinChain',mi.thinning, 'Parallel', mi.parallel); + + disp(sprintf('ending mcmc %d\n', i)); + toc + fname = ['mcmc' tstamp '_ID' num2str(i)] ; + cfname{i} = fname; + save([specificproj '/' fname], 'm'); + % the only thing that is different in each run are the above +% pause(1) + minit = m(:,:,end);% + 0.1*randn(size(m(:,:,end-1))); + + clear m + + end + +% generate log file + + +%% we save all the other things here (ie, outside the loops +fname = ['full_variable_set_' tstamp]; % filename +save([specificproj '/' fname]); + + + +%% generate simulation log file + % do this. also, write a log file using log4m and +% fprintf. +currdir = pwd; +cd(specificproj); +fileID = fopen(['summary_' tstamp '.txt'],'w'); + +% !TODO FIX THIS. REALLY SHOULD NOT BE LIKE THIS. +% siminfo = {'MG apt and GFP', '''data_dsg2014'''}; +% fS = 'MCMC estimation for %s, using the TXTL modeling toolbox \n'; +% fprintf(fileID,fS,siminfo{1}); +% fS = 'Data from file %s \n'; +% fprintf(fileID,fS, siminfo{2}); + + +% !TODO : THIS IS NOT RIGHT. +fS = ['################################################ \n' ... +'The estimated species and parameters are: \n']; +fprintf(fileID,fS); +fS = '%s in range [%0.5g %0.5g] \n'; +[nn] = length(eno); +for i = 1:nn + fprintf(fileID,fS,eno{i}, exp(mi.paramranges(i, 1)),... + exp(mi.paramranges(i, 2))); +end + +fS = ['################################################ \n' ... + 'Dosed Species are: \n']; +fprintf(fileID,fS); +fS = '%s \n'; +[nn] = length(ds.names); +for i = 1:nn + fprintf(fileID,fS,ds.names{i}); +end + +% do the measured species later: +% fS = ['################################################ '... +% '\nMeasured Species are: \n']; +% fprintf(fileID,fS); +% fS = '%s \n'; +% [nn] = length(mi.measuredspecies); +% for i = 1:nn +% fprintf(fileID,fS,mi.measuredspecies{i}); +% end + +fS = ['################################################ \n'... 
+ 'Simulation Parameters are: \n']; +fprintf(fileID,fS); +fprintf(fileID,'path: %s \n', projdir); +fprintf(fileID,'stdev: %3.1f \n', mi.stdev); +fprintf(fileID,'number of walkers: %d \n', mi.nW); +fprintf(fileID,'step size:%4.2f \n', mi.stepsize); +fprintf(fileID,'tightening: %4.2f \n', mi.tightening); +fprintf(fileID,'number of repeats: %d \n', mi.niter); +fprintf(fileID,'thinning: %d \n', mi.thinning); +fprintf(fileID,'points per iter: %d \n', mi.npoints); + +% MAP estimates +% median +fclose(fileID); +cd(currdir) + +end + diff --git a/mcmc_simbio/src/mcmc_runsim_OLD.m b/mcmc_simbio/src/mcmc_runsim_OLD.m new file mode 100644 index 0000000..8619daf --- /dev/null +++ b/mcmc_simbio/src/mcmc_runsim_OLD.m @@ -0,0 +1,135 @@ +function mcmc_runsim(tstamp, projdir, tv,da, em, mi) +%mcmc_runsim run the mcmc estimation +% em is the exported simbiology model +% mi is the mcmc info struct created using the mcmc_info_... function. +% funcs are the various functions needed, the prior, the lognormvec. + +% !LATER check if a directory to save the simulation results exists, and create it +% if it does not. +% for now we just assume that project_init did its job right. + + +% order the est names in the same order as +% run an initial pass to find a set of points where integration tolerances +% are met. This valid points will be our initial walker positions. +eno = mi.names_ord; +ds = struct('names', {mi.dosednames}, 'dosematrix', mi.dosedvals); +minit = integrableLHS(em, mi.nW, mi.paramranges, eno, ... + ds); + +% setup the log prior, log likelihood function and lognormvec functions +lognormvec=@(res,sig) -(res./sig).^2 -log(sqrt(2*pi)).*sig; + +logprior = @(logp) all(mi.paramranges(:, 1) < logp) &&... + all(logp < mi.paramranges(:,2)); + +sigg = mi.stdev/mi.tightening; + +% in the future, change this to use input supplied functions in place of +% lognormvec and gen_residuals_3. +loglike = @(logp) sum(lognormvec(gen_residuals_3(logp, em, da, tv, ... 
+ mi.dosedvals, mi.measuredspecies),sigg)); + +% run the burn in simulation + + tic + [m] =gwmcmc_vse(minit,{logprior loglike},... + mi.npoints,... + 'StepSize',mi.stepsize , ... + 'ThinChain',mi.thinning,... + 'Parallel', mi.parallel); + toc + +% pause(10) + + minit = m(:,:,end); + clear m + % run the actual simuations, saving the data every iteration + cfname = cell(mi.niter, 1); + specificproj = [projdir '/simdata_' tstamp]; + for i = 1:mi.niter % + tic + disp(sprintf('starting mcmc %d\n', i)); + [m, ~, ~, ~] = gwmcmc_vse(minit,{logprior loglike},... + mi.npoints, ... + 'StepSize',mi.stepsize , ... + 'ThinChain',mi.thinning, 'Parallel', mi.parallel); + + disp(sprintf('ending mcmc %d\n', i)); + toc + fname = ['mcmc' tstamp '_ID' num2str(i)] ; + cfname{i} = fname; + save([specificproj '/' fname], 'm'); + % the only thing that is different in each run are the above +% pause(1) + minit = m(:,:,end);% + 0.1*randn(size(m(:,:,end-1))); + + clear m + end + +% generate log file + + +%% we save all the other things here (ie, outside the loops +fname = ['full_variable_set_' tstamp]; % filename +save([specificproj '/' fname]); + + + +%% generate simulation log file + % do this. also, write a log file using log4m and +% fprintf. +currdir = pwd; +cd(specificproj); +fileID = fopen(['summary_' tstamp '.txt'],'w'); + +siminfo = {'MG apt and GFP', '''data_dsg2014'''}; +fS = 'MCMC estimation for %s, using the TXTL modeling toolbox \n'; +fprintf(fileID,fS,siminfo{1}); +fS = 'Data from file %s \n'; +fprintf(fileID,fS, siminfo{2}); + +fS = '################################################ \nThe estimated species and parameters are: \n'; +fprintf(fileID,fS); +fS = '%s in range [%0.5g %0.5g] \n'; +[nn] = length(eno); +for i = 1:nn + fprintf(fileID,fS,eno{i}, exp(mi.paramranges(i, 1)),... 
+ exp(mi.paramranges(i, 2))); +end + +fS = '################################################ \nDosed Species are: \n'; +fprintf(fileID,fS); +fS = '%s \n'; +[nn] = length(ds.names); +for i = 1:nn + fprintf(fileID,fS,ds.names{i}); +end + +% do the measured species later: +% fS = '################################################ \nMeasured Species are: \n'; +% fprintf(fileID,fS); +% fS = '%s \n'; +% [nn] = length(mi.measuredspecies); +% for i = 1:nn +% fprintf(fileID,fS,mi.measuredspecies{i}); +% end + +fS = '################################################ \nSimulation Parameters are: \n'; +fprintf(fileID,fS); +fprintf(fileID,'path: %s \n', projdir); +fprintf(fileID,'stdev: %3.1f \n', mi.stdev); +fprintf(fileID,'number of walkers: %d \n', mi.nW); +fprintf(fileID,'step size:%4.2f \n', mi.stepsize); +fprintf(fileID,'tightening: %4.2f \n', mi.tightening); +fprintf(fileID,'number of repeats: %d \n', mi.niter); +fprintf(fileID,'thinning: %d \n', mi.thinning); +fprintf(fileID,'points per iter: %d \n', mi.npoints); + +% MAP estimates +% median +fclose(fileID); +cd(currdir) + +end + diff --git a/mcmc_simbio/src/mcmc_runsim_v2.m b/mcmc_simbio/src/mcmc_runsim_v2.m new file mode 100644 index 0000000..5cae907 --- /dev/null +++ b/mcmc_simbio/src/mcmc_runsim_v2.m @@ -0,0 +1,320 @@ +function mi = mcmc_runsim_v2(tstamp, projdir, data_info, mcmc_info, varargin) +% version 2 of the runsim file. +% This version is for the multi modal version of the mcmc problem. +% +% +% mcmc_runsim: run the mcmc estimation using the affine invariant +% ensemble sampler. +% +% INPUTS: +% +% tstamp: The time stamp string of the simulation generated by +% project_init. Format is 'yyyymmdd_HHMMSS' +% +% projdir: Directory where the generated data subdirectory +% (simdata_yyyymmdd_HHMMSS, where yyyymmdd_HHMMSS is +% the time stamp) will be created. +% +% tv: time vector in the units of seconds. +% +% da: A matlab array containing experimental data. 
This array +% has dimensions nTimePoints x nMS x nReplicates x nDoses +% where +% -nTimePoints: length(tv), i.e., the number of time points. +% -nMS: The number of measured species. Corresponds +% to values given in the mcmc_info and data_info +% structs. +% -nReplicates: Number of replicates. +% -nDoses: Number of dose combinations (initial conditions) +% +% mobj: Simbiology model object. +% +% mi: mcmc_info struct. Type 'help mcmc_info_dsg2014_mrna' or +% 'help mcmc_info_template' into the MATLAB command line to learn +% more. +% +% Optional name-value pair arguments: +% InitialDistribution: Initial distribution of walker points. This can be +% Latin hypercube sampled (Value: 'LHS'), Gaussian +% distributed (Value: 'gaussian') about the midpoint of +% mi.paramranges or uniformly distributed (Value: 'unifrand'). +% +% Width: Applies to the width of the Gaussian or uniform random +% parameter distribution around the midpoint given by +% mi.paramranges. +% +% UserInitialize: The user provides a matrix of initial walker positions. +% When this input is specified, 'InitialDistribution' and +% 'Width' are ignored. +% +% FitOption: Allows for fitting to data in 3 modes: +% 'FitMedian': (Default) Compute the curvewise median +% of the data over the replicates, and fit the model +% to this. +% 'FitMean': Compute the mean of the replicates, and fit to this +% mean. +% 'FitAll': Fit all the curves. 
+ +% Copyright (c) 2018, Vipul Singhal, Caltech +% Permission is hereby granted, free of charge, to any person obtaining a copy +% of this software and associated documentation files (the "Software"), to deal +% in the Software without restriction, including without limitation the rights +% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +% copies of the Software, and to permit persons to whom the Software is +% furnished to do so, subject to the following conditions: + +% The above copyright notice and this permission notice shall be included in all +% copies or substantial portions of the Software. + +% THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +% AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +% LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +% OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +% SOFTWARE. + +p = inputParser; +p.addParameter('InitialDistribution', 'LHS', @ischar); % LHS, gaussian, unifrand +p.addParameter('Width', 0.1, @isnumeric) +p.addParameter('UserInitialize', [], @isnumeric) +p.addParameter('FitOption', 'FitMedian', @ischar); % 'FitMean', 'FitAll' +p.addParameter('pausemode', false, @islogical) +p.addParameter('DoseNormalization', false, @islogical) + +p.addParameter('multiplier', 1, @isnumeric); % every 3rd iteration, +% the stepsize is multiplied by this number. Set it to something like 2 or 4 +% in the hope that the walkers +% will be able to explore more of the space during this iteration. +p.parse(varargin{:}); +p = p.Results; + +% get the three objects. 
+ri = mcmc_info.runsim_info; +mi = mcmc_info.model_info; +mai = mcmc_info.master_info; + +[da, mi, tv] = exportmobj(mi, data_info, p.FitOption); +di = data_info; +% +% nTopo = length(mi); +% nGeom = zeros(length(mi),1); +% for i = 1:nTopo +% nGeom(i) = length(mi(i).dataToMapTo); +% end +% +% % V2 data +% % for each topology geometry pair, compute the data to fit +% % ASSUME that the data dimensions across geometries is the same +% % only differs across topologies at most. +% % Despite this, we allow for different geometries within a topology +% % to point to different data info array elements, just that all of +% % these elements must have the same number of timepoints, measured species, +% % replicates and dosing combinations. (version 3 of this code can be even +% % more general, with each topo-geom pair getting its own cell. this will be +% % slower.) +% +% da = cell(nTopo, 1); +% +% for i = 1:nTopo % each topology +% % each of the nGeom(i) geometries has a data info array element it points to. +% % since the dimensions of these is assumed to be equal, we just use the first +% % one to set the empty array: +% data_info_element = mi(i).dataToMapTo(1); +% currda = data_info(data_info_element).dataArray; +% % Transform Experimental - compute mean or median or nothing +% currda = computeFitOption(currda, p.FitOption); +% da{i} = currda; +% tv{i} = data_info(data_info_element).timeVector; +% for j = 2:nGeom(i) % each geometry +% data_info_element = mi(i).dataToMapTo(j); +% currda = data_info(data_info_element).dataArray; +% % Transform Experimental - compute mean or median or nothing +% currda = computeFitOption(currda, p.FitOption); +% % concatenate in the 5th dimension (the geometries dimension.) 
+% da{i} = cat(5, da{i}, currda); +% end +% % EXPORT MODEL object to get it ready for MCMC +% % the resulting object is of class SimBiology.export.Model +% % documentation: +% % https://www.mathworks.com/help/simbio/ref/simbiology.export.model-class.html +% % Sven Mesecke's blog post on using the exported model class for +% % parameter inference applicaton. +% % http://sveme.org/how-to-use-global-optimization-toolbox-algorithms-for- +% % simbiology-parameter-estimation-in-parallel-part-i.html +% +% mobj = mi(i).modelObj; +% +% enuo = mi(i).namesUnord;% estimated names unordered +% +% ep = sbioselect(mobj, 'Type', 'parameter', 'Name', ... +% mi(i).namesUnord);% est parameters +% +% es = sbioselect(mobj, 'Type', 'species', 'Name', ... +% mi(i).namesUnord);% est species +% +% aps = [ep; es]; % active parameters and species +% +% % reorder the parameter and species so they are in the same order as that +% % in the model. +% eno = cell(length(aps), 1);% est names ordered +% ds = sbioselect(mobj, 'Type', 'species', 'Name', mi(i).dosedNames); +% emo{i} = export(mobj, [ep; es; ds]); % exported model object, dosed species names. +% SI = emo{i}.SimulationOptions; +% +% % each of the nGeom(i) geometries has a data info array element it points to. +% % since the dimensions of these is assumed to be equal, we just use the first +% % one to set the empty array: +% data_info_element = mi(i).dataToMapTo(1); +% SI.StopTime = data_info(data_info_element).timeVector(end); +% accelerate(emo{i}); +% +% mi(i).emo = emo{i}; % exported model object. +% orderingIx = zeros(length(aps),1); +% orderingIx2 = orderingIx; +% for k = 1:length(aps) +% eno{k} = aps(k).Name; +% for kk = 1:length(enuo) +% if strcmp(eno{k}, enuo{kk} ) +% orderingIx(k) = kk; % eno = enuo(orderingIx); +% % the kth element of orderingIx is kk. so the kth element of +% % enuo(orderingIx) is enuo(kk). But this is just eno(k). And eno +% % has the property of the kth element being eno(k). 
(as seen +% % from "if eno{k} == enuo{kk} ") +% +% orderingIx2(kk) = k; %i.e., enuo = eno(orderingIx2); +% % the kkth element of orderingIx2 is k. so the kk th element of +% % eno(orderingIx2) is eno(k). But the vector with this property is +% % simply enuo. (as seen from "if eno{k} == enuo{kk} ") +% end +% end +% end +% +% mi(i).orderingIx = orderingIx; % these two arrays will be VERY useful. +% mi(i).orderingIx2 = orderingIx2; % this one being the second. +% mi(i).namesOrd = eno; % est names ordered. +% end + +% V2 model export - export with all parameters, and set fixed and estimated +% parameters per iteration of mcmc. One exported model for each topology +% make sure the reordering of the parameters when exporting is carefully +% taken care of. + +%% COMPUTE INITIAL WALKER POSITIONS +% a very cool idea: if a parameter with the same semantic meaning +% appears as individual parameters across different topologies and +% geometries, then we expect that generally the final estimated values +% should be close between the different versions of the parameter +% (the number of RNAP in extract 1, extract 2 and so on should be similar +% to an order of magnitude, even if they are different.) +% To be clear, this is not saying that parameters that get shared across +% topologies and geometries need to be close. Those are EQUAL BY DEFINITION. +% All it is saying is that within a master vector if a parameter is semantically +% similar to another, they should have the same STARTING values. + +if isempty(p.UserInitialize) + minit = integrableLHS_v2(mi, mai, ri, ... + 'distribution', p.InitialDistribution, ... + 'width', p.Width); + +else + minit = p.UserInitialize; + % assume all the user defined points are integrable. +end + +%% SETUP FUNCTIONS +% setup the log prior, log likelihood function and lognormvec functions +lognormvec=@(res,sig) -(res./sig).^2 -log(sqrt(2*pi)).*sig; + +logprior = @(logp) all(mai.paramRanges(:, 1) < logp) &&... 
+ all(logp < mai.paramRanges(:,2)); + +sigg = ri.stdev/ri.tightening; + + +% need to transform the data array to summary stats before +% sending it into the gen residuals function. +mv = mai.masterVector; +estParamIx = setdiff((1:length(mv))', mai.fixedParams); + +loglike = @(logp) gen_residuals_v2(logp, estParamIx, ... + mv, da,... + tv, mi, lognormvec, sigg); + +% BURN IN: run the burn in simulation +if isempty(p.UserInitialize) + tic + [m] =gwmcmc_vse(minit,{logprior loglike},... + ri.nPoints,... + 'StepSize',ri.stepSize , ... + 'ThinChain',ri.thinning,... + 'Parallel', ri.parallel); + toc + + minit = m(:,:,end); + clear m +else + disp('User initialized initial walker positions, skipping burn in phase') +end + +% specify where to save things +cfname = cell(ri.nIter, 1); +specificproj = [projdir '/simdata_' tstamp]; + +%% we save useful variables in a one off manner here (ie, outside the loops) +fname = ['full_variable_set_' tstamp]; % filename +save([specificproj '/' fname]); + +% run the actual simulations, saving the data every iteration +for i = 1:ri.nIter % + disp(['Iteration number ' num2str(i) '.']); + if p.pausemode + if ~mod(i, 1) + fprintf('Pausing for 2 minutes before starting run number %d. \n', i); + pause(120) + end + end + + + if ~mod(i, 3) % every 3rd iteration is a mixup round + ssize = p.multiplier*ri.stepSize; + fprintf('Mixup round! The step size for this iteration is set to \n %g * %g = %g.\n',... + p.multiplier, ri.stepSize, ssize); + else + ssize = ri.stepSize; + + + end + + + tic + fprintf('starting mcmc %d\n', i); + [m] = gwmcmc_vse(minit,{logprior loglike},... + ri.nPoints, ... + 'StepSize',ssize , ... 
+ 'ThinChain',ri.thinning, 'Parallel', ri.parallel); + + fprintf('ending mcmc %d\n', i); + toc + fname = ['mcmc' tstamp '_ID' num2str(i)] ; + cfname{i} = fname; + save([specificproj '/' fname], 'm'); + % the only thing that is different in each run are the above + % pause(1) + minit = m(:,:,end);% + 0.1*randn(size(m(:,:,end-1))); + + clear m + +end + +% generate log file + if isempty(p.UserInitialize) + initialization_used = p.InitialDistribution; + else + initialization_used = 'User_initialized'; + end +mcmc_log_v2(tstamp,projdir, specificproj, mcmc_info, data_info, initialization_used); +end + + + diff --git a/mcmc_simbio/src/mcmc_traj_CustomSpecies.m b/mcmc_simbio/src/mcmc_traj_CustomSpecies.m new file mode 100644 index 0000000..dbc33e8 --- /dev/null +++ b/mcmc_simbio/src/mcmc_traj_CustomSpecies.m @@ -0,0 +1,924 @@ +function fighandle = mcmc_traj_CustomSpecies(em, mi, marray, titl, lgds, varargin) +% +% Plot the data time course trajectories and simulated model trajectories +% for each dose and measured species. The doses are arranged in rows of a +% subplot, or all collapsed into a single row. Each measured species has its +% own figure. +% +% Subplot arrangement: +% - Subplot Column: The column corresponds to a measured species. If there +% are multiple measured species, then multiple figures are generated. +% If the separateExpSim optional input parameter's value is set to +% true, then there are two columns, the first one corresponding the +% experimental data and the second corresponding to the simulated data. +% - Subplot Rows: Default is to use one row for each dose. But if the +% collateDoses input parameter is set to true, then all the doses +% get plotted on a single row. +% +% The experimental data and the simulation may be plotted in a few different +% ways: Mean + standard deviation, (curvewise) median + rest of the curves, +% just the curves, just the mean, just the median. 
+% The default is to use the median for the experimental data, and the mean +% + standard deviation for the simulated curves. +% The standard deviation is plotted as a shaded region. When the individual +% curves are plotted, they are plotted as thin lines: solid for experimental +% data and dotted for simulations. +% +% INPUTS +% em: The exported simbiology model object that is to be simulated. +% This is created when the mcmc_runsim is run, and the parameters, species and +% doses that can be set are predefined. +% +% data_info: This is the data info struct that contains the data and the +% related information. +% +% mcmc_info: This is the mcmc info struct that contains info on the mcmc +% run (including which species are dosed, measured etc.) +% +% marray: A numerical array of dimensions nParam x nWalkers x nSamp +% or nPoints x nParam. +% +% OPTIONAL NAME VALUE PAIRS +% collateDoses: Default is false. If true, all the doses get plotted on the +% same row of the subplots. +% +% separateExpSim: Default is false. If true, the experimental data and the +% simulation graphs are each given their own column. +% +% ExpMode: How to plot the experimental data. Can be 'mean', 'median' +% 'meanstd','medianstd', 'mediancurves', 'meancurves', 'curves' or 'none'. +% Default is 'median'. +% +% SimMode: How to plot the simulated data. Can be 'mean', 'median' +% 'meanstd','medianstd', 'mediancurves', 'meancurves', 'curves' or 'none'. +% Default is 'median'. +% If both the Exp and Sim Mode are 'none', or if separateExpSim is set to +% true and either mode is 'none', then an error is thrown. +% +% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % +% How to specify TITLES and LEGENDS. +% Titles and Legends depend on the combination of CollatedDoses and separateExpSim +% +% CollatedDoses 1; separateExpSim 1 +% The subplots within each species' figure are arranged as 1 x 2. 
+% +% Title: This should be a cell array of strings of size 1 x 2 x nMS, where nMS is the +% number of measured species. The first column corresponds to the experimental +% data, and the second to the simulated data. Each string should specify the +% measured species that is being displayed in the plot, and whether it is +% experimental or simulated data. Furthermore, you should also specify +% what kind of statistic is being shown: mean or median with standard +% deviation or just sample curves etc. +% Legends: This should be a cell array of strings of dimension 1 x nICs or +% 2 x nICs (the two rows corresponding to experimental data column, +% and simulated data column respectively). If either ExpMode or SimMode is +% 'none', then the dimension must be 1 x nICs. If the dimension is 1 x nICs, +% and both ExpMode and SimMode are specified, then the same legend is used for +% both. +% +% CollatedDoses 1; separateExpSim 0 +% Here the subplots are arranged as 1 by 1 for each species. +% +% Title: This should be a cell array of strings of size 1 x nMS, where nMS is the +% number of measured species. Each string corresponds to one measured species. +% You should also probably use this place to specify what the statistics being +% plotted are: (mean, median or none) with (std, curves or none) for both +% experimental data and simulated data. +% +% Legends: This should be a cell array of strings of dimension 1 x (nICs + 1). +% The strings should describe the curves corresponding to Dose 1 of +% experimental data, Dose 1 of simulated data, Doses 2 to last of +% experimental data. For the simulated data doses 2 to last, the fact that +% the same colors are used for the same dose for both the simulated and +% experimental data should be used to interpret the simulated data curves. +% If either ExpMode or SimMode is 'none', then the dimension must +% be 1 x nICs. The same legend string array is used for every measured +% species. 
+% +% CollatedDoses 0; separateExpSim 1 +% Here the number of subplots is nICs x 2 per figure, and there is one +% figure for each species. +% +% Title: This should be a cell array of size nICs x 2 x nMS, where nMS +% is the number of measured species. +% Legends: No legend array is needed. +% +% CollatedDoses 0; separateExpSim 0 +% Number of subplots: nICs x 1 per figure, and there is one figure for +% each species. +% +% Title: This should be a cell array of size nICs x 1 x nMS, where nMS +% is the number of measured species. +% Legends: No legend array is needed. In the first dose, the experimental +% data and simulated data plots are labeled as 'exp' and 'sim' +% respectively. +% +% +% +% +% +% +% If SimMode or ExpMode is 'curves', then the curves are used in the legend. +% If either of them is 'none', then no legend is used. +% +% TITLES: +% +% +% +% If either the ExpMode or the SimMode are 'curves', then the +% + +% Specify the title and legend inputs as follows: +% For +% The legend that gets specified, if one is to be specified +% (see 'Conditions under which legends are included in the plots'), +% for both ExpMode and SimMode is as follows: +% 'mean', 'median': Specify legend for the mean or median curves. +% + +% +% If collateDoses is true, then the legends are the doses, otherwise the +% titles have the dose information, and the legends depend on the summary +% and spread statistics displayed in each subplot. +% +% The plot titles are taken from the data info struct's measuredNames field. +% If that field is empty, then the measured species field of the mcmc info +% struct is used. Dosing information is taken from the dosedNames and the +% dosedVals fields of the data info struct, and if these are not populated, +% then it is taken from the dosedNames and dosedVals fields of the mcmc_info +% struct. +% +% When the collateDoses option is false, i.e., each dose is +% plotted in a separate row, then the dose values are also used in the +% title string.
If collateDoses is true, then the dose info is used to produce +% the legend strings. Depending on the values of the options ExpSummary, ExpSpread, +% SimSummary, and SimSpread, we also include legends for the corresponding lines +% (if there are any) for the first dose. +% +% +% +% -------------------------------------------------------------------------- + +% Copyright (c) 2018, Vipul Singhal, Caltech +% Permission is hereby granted, free of charge, to any person obtaining a copy +% of this software and associated documentation files (the "Software"), to deal +% in the Software without restriction, including without limitation the rights +% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +% copies of the Software, and to permit persons to whom the Software is +% furnished to do so, subject to the following conditions: + +% The above copyright notice and this permission notice shall be included in all +% copies or substantial portions of the Software. + +% THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +% AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +% LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +% OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +% SOFTWARE. + +%% precompute a bunch of things to set the defaults for this function +% Number of simulation curves to plot + +p = inputParser; +p.addParameter('nSimCurves', 50, @isnumeric); +p.addParameter('collateDoses', false, @islogical); +p.addParameter('separateExpSim', false, @islogical); +p.addParameter('just_data_info', false, @islogical); +p.addParameter('subplot_arrangement', [], @isnumeric); % [nrows ncols]. Must have nrows*ncols = ndoses. 
only gets used if collateDoses is false +% Modes for the next two inputs: 'mean', 'median', 'meanstd', 'medianstd', +% 'meancurves', 'mediancurves', 'curves' +p.addParameter('ExpMode', 'median', @ischar); +p.addParameter('SimMode', 'meanstd', @ischar); +p.addParameter('title', {}, @iscell); +p.addParameter('legends', {}, @iscell); +p.addParameter('savematlabfig', false, @islogical); % if this is true, then projdir and tstamp must be specified. +p.addParameter('savejpeg', false, @islogical); % if this is true, then projdir and tstamp must be specified. +p.addParameter('projdir', [], @ischar); +p.addParameter('tstamp', [], @ischar); +p.addParameter('extrafignamestring', [], @ischar); +p.parse(varargin{:}) +p = p.Results; + +if p.separateExpSim && (strcmp(p.ExpMode, 'none') || strcmp(p.SimMode, 'none')) + error(['Cannot have unspecified experimental data or simulation data '... + 'if separateExpSim is set to true']) +end + +if p.just_data_info + % just plot the data in data info + % One plot per measured species in each data info. All doses collated + % in the same plot. + fighandle = cell(length(di), 1); + + for dID = 1:length(di) + currdi = di(dID); + [expsummst, expspreadst] = computeDataStats(currdi.dataArray, p.ExpMode); + % dimensionLabels = currdi.dimensionLabels; + % expmax = computeMaxes(expsummst, expspreadst, p.ExpMode, dimensionLabels); + dNames = currdi.dosedNames; + dVals = currdi.dosedVals; + [ndNames, nICs] = size(dVals); + assert(length(dNames) == ndNames); + linehandle = zeros(nICs, 1); + % ptchhandle = zeros(nICs, 1); + nMS = length(currdi.measuredNames); + + tv = currdi.timeVector; + timeUnits = currdi.timeUnits; + tv = converttosec(tv, timeUnits); + colorz = parula(nICs+2); + for msnum = 1:nMS + fighandle{dID}(msnum) = figure; + ax = gca; + legendentry = cell(nICs, 1); % one legend entry per dose combination + for i=1:nICs + [ax, linehandle(i)] = ... + plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,...
+ 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle + legendentry{i} = []; + + for dnID = 1:ndNames-1 + legendentry{i} = [legendentry{i} dNames{dnID} ' = ' num2str(dVals(dnID, i)) ', ']; + end + + legendentry{i} = [legendentry{i} dNames{ndNames} ' = ' num2str(dVals(ndNames,i ))]; + end + legend(linehandle, legendentry); + title(currdi.measuredNames{msnum}{1:end}) + + end + + + end + + +else + + % Compute the y axis limits for each measured species. All the y axis for a + % given measured species are set to the max y axis limits. + + %% set the number of curves to simulate + % number of walkers is the second dimension of the 3D version of the + % paraemter array OR if the parameter array is 2D, it is taken from the + % mi array. + if ndims(marray) == 3 + % compute the number of walkers + nWalkers = size(marray, 2); + m = marray(:,:)'; % the parameter array is now #points x #params + elseif ismatrix(marray) + % nWalkers = mi.nW; + m = marray; % assume that the 2 dims are correct - npoints x nparams + end + + % Number of curves to simulate is the minimum of the specified number and the + % number of available walkers. + % if p.nSimCurves > nWalkers + % p.nSimCurves = nWalkers; + % end + + if p.nSimCurves > size(m, 1) + p.nSimCurves = size(m, 1); + end + + %% initialize things + [nDSP, nICs] = size(mi.dosedVals); % number of dosed species, + % and number of dose combinations + + dn = mi.dosedNames; %cell array + dose = mi.dosedVals'; + % recall from the data_info documentation: + % 'dosedVals': A matrix of dose values of size + % # of dosed species by # of dose combinations + % thus dose has dimensions #combos x #species + + % convert time vector to seconds. 
+ tv = di.timeVector; + timeUnits = di.timeUnits; + tv = converttosec(tv, timeUnits); + + nts = length(tv); + nMS = length(mi.measuredSpecies); + + % initialize arrays (nominal array is: + % timepoints x measured outputs x nSimCurves x doses ) + da = zeros(nts, nMS, p.nSimCurves, nICs); + ms = mi.measuredSpecies; + dv = mi.dosedVals; + %% + % figure out how to incorporate Extract differences too. + % !TODO + + %% Simulate the model for the given parameters. + % set parameters in the model + % for each dose simulate the model + % simulate the model + + [da, idxnotused] = simulatecurves(em,m, p.nSimCurves, dose, tv, ms); + % compute the simulation maxes, with the non - integrable points removed. + pointstouse = setdiff(1:(p.nSimCurves),idxnotused); + if isempty(pointstouse) + error('none of the points are integrable. Something has gone wrong...') + end + %% Compute relevant statistics depending on what is specified in the inputs. + % for the experimental data. + [expsummst, expspreadst] = computeDataStats(di.dataArray, p.ExpMode); + [simsummst, simspreadst] = computeDataStats(da, p.SimMode); + + dimensionLabels = {'time points', 'measured species', 'replicates', 'doses'}; + expmax = computeMaxes(expsummst, expspreadst, p.ExpMode, dimensionLabels); + simmax = computeMaxes(simsummst, simspreadst, p.SimMode, dimensionLabels); + colorz = parula(nICs+2); + colorz2 = summer(nICs+2); + colorz3 = winter(nICs+2); + fighandle = zeros(nMS, 1); + % titl = p.title; + % legs = p.legends; + + for msnum = 1:nMS + fighandle(msnum) = figure; + ax = gca; + if p.collateDoses + if p.separateExpSim + % SEPARATEEXPSIM 1 , COLLATEDOSES 1 + ax = subplot(1, 2, 1); % get axis + linehandle = zeros(nICs, 1); + ptchhandle = zeros(nICs, 1); + for i=1:nICs + [ax, linehandle(i)] = ... + plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... 
+ 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); + end + + legends([linehandle], legs(1,:)); + title(titl{1, 1, msnum}); + % !! p.legends must be a cell array of size 2 x nICs + % p.title must be a 2 x nMS cell array containing titles + % specifying experiment and simulation for a given measures species. + % can talk about what the summary and spread statistic refer to. + ax = subplot(1, 2, 2); + for i=1:nICs + [ax, linehandle] = ... + plotintoaxis(ax, p.SimMode,... + tv, simsummst, simspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '--',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', ':',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle + end + % legends([linehandle(1); ptchhandle(1); linehandle(2:end)], + % legs(2,:)); + legends([linehandle], legs(2,:)); + title(titl{1, 2, msnum}) + + else + % SEPARATEEXPSIM 0 , COLLATEDOSES 1 + + linehandle = zeros(nICs, 1); + ptchhandle = zeros(nICs, 1); + + linehandle2 = zeros(nICs, 1); + ptchhandle2 = zeros(nICs, 1); + + for i=1:nICs + [ax, linehandle] = ... + plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle + end + + for i=1:nICs + [ax, linehandle2] = ... + plotintoaxis(ax, p.SimMode,... + tv, simsummst, simspreadst, ... + i, msnum, ... + 'LineColor', colorz2(i, :),... + 'LineStyle', '-.',... + 'LineWidth', 1,... + 'SpreadColor', colorz2(i, :),... + 'SpreadLineStyle', ':',... + 'SpreadLineWidth', 0.25,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle2 + end + title(titl{msnum}); + % !! p.legends must be a legend array saying exp dose 1, + % sim dose 1, exp dose 2 to end. 
+ legends([linehandle(1); linehandle2(1); linehandle(2:end)],... + p.legends); + end + else + if ~isempty(p.subplot_arrangement) + assert(numel(p.subplot_arrangement) == 2); + assert(prod(p.subplot_arrangement) == nICs); + pCell = num2cell(p.subplot_arrangement); + [nrows, ncols] = pCell{:}; + + + linehandle = zeros(nrows, ncols); + ptchhandle = zeros(nrows, ncols); + + linehandle2 = zeros(nrows, ncols); + ptchhandle2 = zeros(nrows, ncols); + + for i = 1:nICs + rowIX = floor((i-1)/ncols)+1; + colIX = i-floor((i-1)/ncols)*ncols; + + if p.separateExpSim + warning('Both subplot arrangement and separate experiment and sim are specified.\n Not going to do anything') + % future version : just make two figures. + + % SEPARATEEXPSIM 1 , COLLATEDOSES 0, + % subplot_arrangement specified. +% +% ind = (i-1)*2+1; +% ax = subplot(nICs, 2, ind); % exp data plot. +% [ax, linehandle] = ... +% plotintoaxis(ax, p.ExpMode,... +% tv, expsummst, expspreadst, ... +% i, msnum, ... +% 'LineColor', colorz(i, :),... +% 'LineStyle', '-',... +% 'LineWidth', 2,... +% 'SpreadColor', colorz(i, :),... +% 'SpreadLineStyle', '--',... +% 'SpreadLineWidth', 0.5,... +% 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle +% % set title +% title(titl{i, 1, msnum}); +% ind = (i-1)*2+2; +% ax = subplot(nICs, 2, ind); % simulated data plot +% [ax, linehandle] = ... +% plotintoaxis(ax, p.SimMode,... +% tv, simsummst, simspreadst, ... +% i, msnum, ... +% 'LineColor', colorz(i, :),... +% 'LineStyle', '-.',... +% 'LineWidth', 2,... +% 'SpreadColor', colorz(i, :),... +% 'SpreadLineStyle', ':',... +% 'SpreadLineWidth', 0.5,... +% 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle +% % set title +% title(titl{i, 2, msnum}); +% % in this case, the title has to be a cell array +% % of dimensions nICs x 2 x nMS + else + % % SEPARATEEXPSIM 0 , COLLATEDOSES 0 + + + ax = subplot(nrows, ncols, i); % start plotting row first. ugh. why matlab, why? 
+ [ax, linehandle(rowIX, colIX)] = plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , , ptchhandle(i) + hold on + [ax, linehandle2(rowIX, colIX)] = ... + plotintoaxis(ax, p.SimMode,... + tv, simsummst, simspreadst, ... + i, msnum, ... + 'LineColor', colorz2(i, :),... + 'LineStyle', '-.',... + 'LineWidth', 1,... + 'SpreadColor', colorz2(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.25,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) ,, ptchhandle2(i) + hold on + % set titles + title(titl{i, 1, msnum}); + % in this case, the p.title has to be a cell array + % of dimensions + % nICs x 1 x nMS + % set legends only for the top subplot. + % and distinguish the experimental and simulated data. + + if i == 1 + legend([linehandle(1); linehandle2(1)],... + {'exp', 'sim'}, 'Location', 'SouthEast') + end + end + end + + else + for i = 1:nICs + if p.separateExpSim + + % SEPARATEEXPSIM 1 , COLLATEDOSES 0 + + ind = (i-1)*2+1; + ax = subplot(nICs, 2, ind); % exp data plot. + [ax, linehandle] = ... + plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle + % set title + title(titl{i, 1, msnum}); + ind = (i-1)*2+2; + ax = subplot(nICs, 2, ind); % simulated data plot + [ax, linehandle] = ... + plotintoaxis(ax, p.SimMode,... + tv, simsummst, simspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-.',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', ':',... 
+ 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle + % set title + title(titl{i, 2, msnum}); + % in this case, the title has to be a cell array + % of dimensions nICs x 2 x nMS + else + % % SEPARATEEXPSIM 0 , COLLATEDOSES 0 + + linehandle = zeros(nICs, 1); + ptchhandle = zeros(nICs, 1); + + linehandle2 = zeros(nICs, 1); + ptchhandle2 = zeros(nICs, 1); + + ax = subplot(nICs, 1, i); + [ax, linehandle(i)] = plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , , ptchhandle(i) + hold on + [ax, linehandle2(i)] = ... + plotintoaxis(ax, p.SimMode,... + tv, simsummst, simspreadst, ... + i, msnum, ... + 'LineColor', colorz2(i, :),... + 'LineStyle', '-.',... + 'LineWidth', 1,... + 'SpreadColor', colorz2(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.25,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) ,, ptchhandle2(i) + hold on + % set titles + title(titl{i, 1, msnum}); + % in this case, the p.title has to be a cell array + % of dimensions + % nICs x 1 x nMS + % set legends only for the top subplot. + % and distinguish the experimental and simulated data. + + if i == 1 + legend([linehandle(1); linehandle2(1)],... + {'exp', 'sim'}) + end + end + end + + end + + + end + + % save figure here + if p.savematlabfig + if isempty(p.projdir) || isempty(p.tstamp) + warning('timestamp and project directory not specified. Nothing will be saved.') + else + specificproj = [p.projdir '/simdata_' p.tstamp]; + saveas(gcf, [specificproj '/traj' p.tstamp num2str(msnum) p.extrafignamestring]); + + end + end + + if p.savejpeg + if isempty(p.projdir) || isempty(p.tstamp) + warning('timestamp and project directory not specified. 
Nothing will be saved.') + else + specificproj = [p.projdir '/simdata_' p.tstamp]; + print(gcf, '-djpeg', '-r200', [specificproj '/traj' p.tstamp num2str(msnum) p.extrafignamestring]) + end + end + end +end + + + +end +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%% LOCAL FUNCTIONS %%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +function tv = converttosec(tv, timeUnits) +% convert weeks, days, hours, or minutes to seconds. +switch timeUnits + case 'weeks' + tv = tv*7*24*3600; + case 'days' + tv = tv*24*3600; + case 'hours' + tv = tv*3600; + case 'minutes' + tv = tv*60; + case 'seconds' + tv = tv; +end +end + +function [timeDim, measuredDim, doseDim, replicateDim] =... + dimensionLabelMaps(dimensionLabels) + +% dimension lebels have to be the strings +% 'replicates', 'time points', 'measured species', 'doses'. 
+replicateDim = strcmp(dimensionLabels, 'replicates'); +replicateDim = find(replicateDim); +if isempty(replicateDim) + error('There is no ''replicates'' entry in the dimension labels array.') +end +timeDim = strcmp(dimensionLabels, 'time points'); +timeDim = find(timeDim); +if isempty(timeDim) + error('There is no ''time points'' entry in the dimension labels array.') +end +measuredDim = strcmp(dimensionLabels, 'measured species'); +measuredDim = find(measuredDim); +if isempty(measuredDim) + error('There is no ''measured species'' entry in the dimension labels array.') +end +doseDim = strcmp(dimensionLabels, 'doses'); +doseDim = find(doseDim); +if isempty(doseDim) + error('There is no ''doses'' entry in the dimension labels array.') +end + +end + + +function datamax = computeMaxes(summst, spreadst, dispmode, dimensionLabels) +[tD, mD, dD, rD] = dimensionLabelMaps(dimensionLabels); + +switch dispmode + case 'mean' + % summst must be a time x measured species x 1 x doses array. + datamax = max(max(summst, [],1), [], 4); % max over time and doses. + case 'median' + % summst must be a time x measured species x 1 x doses array. + datamax = max(max(summst, [],1), [], 4); + case 'meanstd' + % summst must be a time x measured species x 1 x doses array. + % spreadst must be a time x measured species x 1 x doses array. + summ_spread_st = summst + spreadst; + datamax = max(max(summ_spread_st, [],1), [], 4); + case 'meancurves' + datamax = max(max(max(spreadst, [],1), [], 4), [], 3); + case 'medianstd' + % summst must be a time x measured species x 1 x doses array. + % spreadst must be a time x measured species x 1 x doses array. + summ_spread_st = summst + spreadst; + datamax = max(max(summ_spread_st, [],1), [], 4); + case 'mediancurves' + datamax = max(max(max(cat(3,spreadst, summst), [],1), [], 4), [], 3); + case 'curves' + datamax = max(max(max(spreadst, [],1), [], 4), [], 3); + otherwise + error(['Invalid data display mode. Must be one of: ''mean'','... 
+ ' ''median'' , ''meanstd'',''medianstd'', ''meancurves'', ''mediancurves'', ''curves''.']) +end +end + + +%!! +function [summst, spreadst] = computeDataStats(dataArray, dispmode) +% dataArray: data array of dimensions time x measured species x replicates x doses +% dispmode: display mode; selects the summary statistic (mean, median or none) +% and the spread statistic (curves, std or none). +% The replicates dimension (rD) must be 3. + +switch dispmode + case 'mean' + summst = mean(dataArray, 3); spreadst = []; % no spread statistic in this mode + case 'median' + % sum over the time dimension + % tD MUST be 1 for this to work.. I tried being fully general, + % but what is the point? It's unnecessarily difficult. + % compute the indexes of the median (in terms of sum / integral) + % curves over the replicates + [ix, mdvals] = medianIndex(sum(dataArray, 1), 3); + % again, rD MUST be 3. + summst = medianReplicate(dataArray, ix); + % spread statistic is the empty vector + spreadst = []; + case 'meanstd' + summst = mean(dataArray, 3); + spreadst = std(dataArray, 0, 3); + case 'meancurves' + summst = mean(dataArray, 3); + spreadst = dataArray; + case 'medianstd' + [ix, mdvals] = medianIndex(sum(dataArray, 1), 3); + summst = medianReplicate(dataArray, ix); + spreadst = std(dataArray, 0, 3); + case 'mediancurves' + [ix, mdvals] = medianIndex(sum(dataArray, 1), 3); + summst = medianReplicate(dataArray, ix); + spreadst = allButMedianCurve(dataArray, ix); + case 'curves' + summst = []; + spreadst = dataArray; + + otherwise + error(['Invalid data display mode. Must be one of: ''mean'','... + ' ''median'' , ''meanstd'',''medianstd'', ''meancurves'','... + ' ''mediancurves'', ''curves''.']) +end +end + + + +%!! + +function nonMedianCurves = allButMedianCurve(dataArray, ix) +% data array must be time x measuredspecies x replicates x doses + +% Get the non-median curves. Can this be done via pure vector indexing? +% !UNIMPORTANT but could be fun to think about sometime. +nonMedianCurves = zeros(size(dataArray, 1), size(dataArray, 2), ...
+ size(dataArray, 3)-1, size(dataArray, 4)); +ixx = 1:size(dataArray, 3); +for i = 1:size(dataArray, 2) + for j = 1:size(dataArray, 4) + ixxx = setdiff(ixx , ix(1,i, 1,j)); % remove median curve index + for k = 1:size(nonMedianCurves, 3) + nonMedianCurves(:,i,k,j) = dataArray(:, i, ixxx(k), j); + end + end +end +end + + +function [title, legend] = titlelegend(p, di, mi) +end + +function [ax, varargout] = plotintoaxis(ax, mode, tv,... + summst, spreadst, dosei, msi, varargin) +% plot a single summary statistic and spread statistic on a given axis +% optional name value pair arguments have names: +% +% 'LineColor' +% 'LineStyle' +% 'LineWidth' +% 'SpreadColor' +% 'SpreadLineStyle' +% 'SpreadLineWidth' +% 'FaceAlpha' +% +p = inputParser; +p.addParameter('LineColor', parula(1), @isnumeric); +p.addParameter('LineStyle','-' , @ischar); +p.addParameter('LineWidth', 2, @isnumeric); +p.addParameter('SpreadColor', summer(1), @isnumeric); +p.addParameter('SpreadLineStyle', ':', @ischar); % gets used if the spread is curves. +p.addParameter('SpreadLineWidth', 2, @isnumeric); + +% gets used if the spread is standard deviation +p.addParameter('FaceAlpha', 0.25, @isnumeric); + +% shading. +p.parse(varargin{:}) +p = p.Results; + +% make the axes to be plotted on current, and bring the relevant figure into focus +axes(ax); + +if strcmp(mode, 'meanstd') || strcmp(mode, 'medianstd') + % plot summary and std as the spread. + [linehandle, ptchhandle] =... + boundedline(tv, summst(:,msi,1, dosei),... + spreadst(:,msi,1, dosei)); + set(ptchhandle, ... + 'FaceColor', p.SpreadColor,... + 'FaceAlpha', p.FaceAlpha); + set(linehandle, ... + 'Color', p.LineColor,... + 'LineStyle', p.LineStyle,... + 'LineWidth', p.LineWidth); + hold on + outstuff = {linehandle, ptchhandle}; + +elseif strcmp(mode, 'mean') || strcmp(mode, 'median') + % plot only the mean or median (no spread) + [linehandle] = plot(tv, summst(:,msi,1, dosei)); + set(linehandle, ... + 'Color', p.LineColor,... 
+ 'LineStyle', p.LineStyle,... + 'LineWidth', p.LineWidth); + hold on + outstuff = {linehandle}; + +elseif strcmp(mode, 'mediancurves') || strcmp(mode, 'meancurves') + + [linehandle] = plot(tv, ... + summst(:,msi,1, dosei)); + set(linehandle, ... + 'Color', p.LineColor,... + 'LineStyle', p.LineStyle,... + 'LineWidth', p.LineWidth); + hold on + + spreadhandles = zeros(size(spreadst, 3), 1); + for j = 1:size(spreadst, 3) % number of replicates + spreadhandles(j) = plot(tv, spreadst(:,msi,j, dosei)); + set(spreadhandles(j),... + 'Color', p.SpreadColor,... + 'LineStyle', p.SpreadLineStyle,... + 'LineWidth', p.SpreadLineWidth); + hold on + end + outstuff = {linehandle, spreadhandles}; + +elseif strcmp(mode, 'curves') + spreadhandles = zeros(size(spreadst, 3),1); + for j = 1:size(spreadst, 3) + spreadhandles(j) = plot(tv, ... + spreadst(:,msi,j, dosei)); + set(spreadhandles(j),... + 'Color', p.SpreadColor,... + 'LineStyle', p.SpreadLineStyle,... + 'LineWidth', p.SpreadLineWidth); + hold on + end + outstuff = {spreadhandles(1)}; +else + warning(['No valid exp data plotting format'... + ' specified. No experimental data will be plotted.']) +end +hold on + +% process variable outputs +nout = nargout-1; +varargout(1:nout) = outstuff(1:nout); + + + +end + + + + + + + + + + + + + diff --git a/mcmc_simbio/src/mcmc_trajectories.m b/mcmc_simbio/src/mcmc_trajectories.m new file mode 100644 index 0000000..cc337f1 --- /dev/null +++ b/mcmc_simbio/src/mcmc_trajectories.m @@ -0,0 +1,899 @@ +function [fighandle, varargout] = mcmc_trajectories(em, di, mi, marray,... + titl, lgds, varargin) +% +% Plot the data time course trajectories and simulated model trajectories +% for each dose and measured species. The doses are arranged in rows of a +% subplot, or all collapsed into a single row. Each measured species has its +% own figure. +% +% Subplot arrangement: +% - Subplot Column: The column corresponds to a measured species. 
If there +% are multiple measured species, then multiple figures are generated. +% If the separateExpSim optional input parameter's value is set to +% true, then there are two columns, the first one corresponding to the +% experimental data and the second corresponding to the simulated data. +% - Subplot Rows: Default is to use one row for each dose. But if the +% collateDoses input parameter is set to true, then all the doses +% get plotted on a single row. +% +% The experimental data and the simulation may be plotted in a few different +% ways: Mean + standard deviation, (curvewise) median + rest of the curves, +% just the curves, just the mean, just the median. +% The default is to use the median for the experimental data, and the mean +% + standard deviation for the simulated curves. +% The standard deviation is plotted as a shaded region. When the individual +% curves are plotted, they are plotted as thin lines: solid for experimental +% data and dotted for simulations. +% +% OUTPUTS: fighandle and two optional outputs: +% varargout{1} = data array of dimensions nTimepoints x nMeasuredSpecies x +% nSimCurves x nDoseCombinations +% varargout{2} = idxnotused. This is the set of indices of the third +% (nSimCurves) dimension that have at least one dose that led to an error +% during the simulation. The corresponding vector in the da array will have +% all NaNs. +% +% INPUTS +% em: The exported simbiology model object that is to be simulated. +% This is created when the mcmc_runsim is run, and the parameters, species and +% doses that can be set are predefined. +% +% data_info: This is the data info struct that contains the data and the +% related information. +% +% mcmc_info: This is the mcmc info struct that contains info on the mcmc +% run (including which species are dosed, measured etc.) +% +% marray: A numerical array of dimensions nParam x nWalkers x nSamp +% or nPoints x nParam. +% +% OPTIONAL NAME VALUE PAIRS +% collateDoses: Default is false.
If true, all the doses get plotted on the +% same row of the subplots. +% +% separateExpSim: Default is false. If true, the experimental data and the +% simulation graphs are each given their own column. +% +% ExpMode: How to plot the experimental data. Can be 'mean', 'median', +% 'meanstd','medianstd', 'mediancurves', 'meancurves', 'curves' or 'none'. +% Default is 'median'. +% +% SimMode: How to plot the simulated data. Can be 'mean', 'median', +% 'meanstd','medianstd', 'mediancurves', 'meancurves', 'curves' or 'none'. +% Default is 'meanstd'. +% If both ExpMode and SimMode are 'none', or if separateExpSim is set to +% true and either mode is 'none', then an error is thrown. +% +% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % +% How to specify TITLES and LEGENDS. +% Titles and Legends depend on the combination of CollatedDoses and separateExpSim +% +% CollatedDoses 1; separateExpSim 1 +% The subplots within each species' figure are arranged as 1 x 2. +% +% Title: This should be a cell array of strings of size 1 x 2 x nMS, where nMS is the +% number of measured species. The first column corresponds to the experimental +% data, and the second to the simulated data. Each string should specify the +% measured species that is being displayed in the plot, and whether it is +% experimental or simulated data. Furthermore, you should also specify +% what kind of statistic is being shown: mean or median with standard +% deviation or just sample curves etc. +% Legends: This should be a cell array of strings of dimension 1 x nICs or +% 2 x nICs (the two rows corresponding to experimental data column, +% and simulated data column respectively). If either ExpMode or SimMode is +% 'none', then the dimension must be 1 x nICs. If the dimension is 1 x nICs, +% and both ExpMode and SimMode are specified, then the same legend is used for +% both. +% +% CollatedDoses 1; separateExpSim 0 +% Here the subplots are arranged as 1 by 1 for each species.
+% +% Title: This should be a cell array of strings of size 1 x nMS, where nMS is the +% number of measured species. Each string corresponds to one measured species. +% You should also probably use this place to specify what the statistics being +% plotted are: (mean, median or none) with (std, curves or none) for both +% experimental data and simulated data. +% +% Legends: This should be a cell array of strings of dimension 1 x (nICs + 1). +% The strings should describe the curves corresponding to Dose 1 of +% experimental data, Dose 1 of simulated data, Doses 2 to last of +% experimental data. For the simulated data doses 2 to last, the fact that +% the same colors are used for the same dose for both the simulated and +% experimental data should be used to interpret the simulated data curves. +% If either ExpMode or SimMode is 'none', then the dimension must +% be 1 x nICs. The same legend string array is used for every measured +% species. +% +% CollatedDoses 0; separateExpSim 1 +% Here the number of subplots is nICs x 2 per figure, and there is one +% figure for each species. +% +% Title: This should be a cell array of size nICs x 2 x nMS, where nMS +% is the number of measured species. +% Legends: No legend array is needed. +% +% CollatedDoses 0; separateExpSim 0 +% Number of subplots: nICs x 1 per figure, and there is one figure for +% each species. +% +% Title: This should be a cell array of size nICs x 1 x nMS, where nMS +% is the number of measured species. +% Legends: No legend array is needed. In the first dose, the experimental +% data and simulated data plots are labeled as 'exp' and 'sim' +% respectively. +% +% +% +% +% +% +% If SimMode or ExpMode is 'curves', then the curves are used in the legend. +% If either of them is 'none', then no legend is used.
+% +% TITLES: +% +% +% +% If either the ExpMode or the SimMode are 'curves', then the +% + +% Specify the title and legend inputs as follows: +% For +% The legend that gets specified, if one is to be specified +% (see 'Conditions under which legends are included in the plots'), +% for both ExpMode and SimMode is as follows: +% 'mean', 'median': Specify legend for the mean or median curves. +% + +% +% If collateDoses is true, then the legends are the doses, otherwise the +% titles have the dose information, and the legends depend on the summary +% and spread statistics displayed in each subplot. +% +% The plot titles are taken from the data info struct's measuredNames field. +% If that field is empty, then the measured species field of the mcmc info +% struct is used. Dosing information is taken from the dosedNames and the +% dosedVals fields of the data info struct, and if these are not populated, +% then it is taken from the dosedNames and dosedVals fields of the mcmc_info +% struct. +% +% When the collateDoses option is false, i.e., each dose is +% is plotted in a separate row, then the dose values are also used in the +% title string. If collateDoses is true, then the dose info is used to produce +% the legend strings. Depending on the values of the options ExpSummary, ExpSpread, +% SimSummary, and SimSpread, we also include legends for the corresponding lines +% (if there are any) for the first dose. 
+% +% +% +% -------------------------------------------------------------------------- + +% Copyright (c) 2018, Vipul Singhal, Caltech +% Permission is hereby granted, free of charge, to any person obtaining a copy +% of this software and associated documentation files (the "Software"), to deal +% in the Software without restriction, including without limitation the rights +% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +% copies of the Software, and to permit persons to whom the Software is +% furnished to do so, subject to the following conditions: + +% The above copyright notice and this permission notice shall be included in all +% copies or substantial portions of the Software. + +% THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +% AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +% LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +% OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +% SOFTWARE. + +%% precompute a bunch of things to set the defaults for this function +% Number of simulation curves to plot + +p = inputParser; +p.addParameter('nSimCurves', 50, @isnumeric); +p.addParameter('collateDoses', false, @islogical); +p.addParameter('separateExpSim', false, @islogical); +p.addParameter('just_data_info', false, @islogical); +p.addParameter('subplot_arrangement', [], @isnumeric); % [nrows ncols]. Must have nrows*ncols = ndoses. 
Only used if collateDoses is false. +% Modes for the next two inputs: 'mean', 'median', 'meanstd', 'medianstd', +% 'mediancurves', 'meancurves', 'curves', 'none' +p.addParameter('ExpMode', 'median', @ischar); +p.addParameter('SimMode', 'meanstd', @ischar); +p.addParameter('title', {}, @iscell); +p.addParameter('legends', {}, @iscell); +p.addParameter('savematlabfig', false, @islogical); % if this is true, then projdir and tstamp must be specified. +p.addParameter('savejpeg', false, @islogical); % if this is true, then projdir and tstamp must be specified. +p.addParameter('projdir', [], @ischar); +p.addParameter('tstamp', [], @ischar); +p.addParameter('extrafignamestring', [], @ischar); +p.parse(varargin{:}) +p = p.Results; + +if p.separateExpSim && (strcmp(p.ExpMode, 'none') || strcmp(p.SimMode, 'none')) + error(['Cannot have unspecified experimental data or simulation data '... + 'if separateExpSim is set to true']) +end + +if p.just_data_info + % just plot the data in data info + % One plot per measured species in each data info. All doses collated + % in the same plot. + fighandle = cell(length(di), 1); + + for dID = 1:length(di) + currdi = di(dID); + [expsummst, expspreadst] = computeDataStats(currdi.dataArray, p.ExpMode); + % dimensionLabels = currdi.dimensionLabels; + % expmax = computeMaxes(expsummst, expspreadst, p.ExpMode, dimensionLabels); + dNames = currdi.dosedNames; + dVals = currdi.dosedVals; + [ndNames, nICs] = size(dVals); + assert(length(dNames) == ndNames); + linehandle = zeros(nICs, 1); + % ptchhandle = zeros(nICs, 1); + nMS = length(currdi.measuredNames); + + tv = currdi.timeVector; + timeUnits = currdi.timeUnits; + tv = converttosec(tv, timeUnits); + colorz = parula(nICs+2); + for msnum = 1:nMS + fighandle{dID}(msnum) = figure; + ax = gca; + legendentry = cell(nICs, 1); % one legend entry per dose combination + for i=1:nICs + [ax, linehandle(i)] = ... + plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,...
+ 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle + legendentry{i} = []; + + for dnID = 1:ndNames-1 + legendentry{i} = [legendentry{i} dNames{dnID} ' = ' num2str(dVals(dnID, i)) ', ']; + end + + legendentry{i} = [legendentry{i} dNames{ndNames} ' = ' num2str(dVals(ndNames,i ))]; + end + legend(linehandle, legendentry); + title(currdi.measuredNames{msnum}{1:end}) + + end + + + end + + +else + + % Compute the y axis limits for each measured species. All the y axis for a + % given measured species are set to the max y axis limits. + + %% set the number of curves to simulate + % number of walkers is the second dimension of the 3D version of the + % paraemter array OR if the parameter array is 2D, it is taken from the + % mi array. + if ndims(marray) == 3 + % compute the number of walkers + nWalkers = size(marray, 2); + m = marray(:,:)'; % the parameter array is now #points x #params + elseif ismatrix(marray) + % nWalkers = mi.nW; + m = marray; % assume that the 2 dims are correct - npoints x nparams + end + + % Number of curves to simulate is the minimum of the specified number and the + % number of available walkers. + % if p.nSimCurves > nWalkers + % p.nSimCurves = nWalkers; + % end + + if p.nSimCurves > size(m, 1) + p.nSimCurves = size(m, 1); + end + + %% initialize things + [nDSP, nICs] = size(mi.dosedVals); % number of dosed species, + % and number of dose combinations + + dn = mi.dosedNames; %cell array + dose = mi.dosedVals'; + % recall from the data_info documentation: + % 'dosedVals': A matrix of dose values of size + % # of dosed species by # of dose combinations + % thus dose has dimensions #combos x #species + + % convert time vector to seconds. 
+ tv = di.timeVector; + timeUnits = di.timeUnits; + tv = converttosec(tv, timeUnits); + + nts = length(tv); + nMS = length(mi.measuredSpecies); + + % initialize arrays (nominal array is: + % timepoints x measured outputs x nSimCurves x doses ) + da = zeros(nts, nMS, p.nSimCurves, nICs); + ms = mi.measuredSpecies; + dv = mi.dosedVals; + %% + % figure out how to incorporate Extract differences too. + % !TODO + + %% Simulate the model for the given parameters. + % set parameters in the model + % for each dose simulate the model + % simulate the model + + [da, idxnotused] = simulatecurves(em,m, p.nSimCurves, dose, tv, ms); + % compute the simulation maxes, with the non-integrable points removed. + pointstouse = setdiff(1:(p.nSimCurves),idxnotused); + if isempty(pointstouse) + error('none of the points are integrable. Something has gone wrong...') + end + %% Compute relevant statistics depending on what is specified in the inputs. + % for the experimental data. + [expsummst, expspreadst] = computeDataStats(di.dataArray, p.ExpMode); + [simsummst, simspreadst] = computeDataStats(da(:,:,pointstouse, :), p.SimMode); + + dimensionLabels = {'time points', 'measured species', 'replicates', 'doses'}; + expmax = computeMaxes(expsummst, expspreadst, p.ExpMode, dimensionLabels); + simmax = computeMaxes(simsummst, simspreadst, p.SimMode, dimensionLabels); + colorz = parula(nICs+2); + colorz2 = summer(nICs+2); + colorz3 = winter(nICs+2); + fighandle = zeros(nMS, 1); + titl = p.title; % used by the title(...) calls below + legs = p.legends; % used by the legend(...) calls below + + for msnum = 1:nMS + fighandle(msnum) = figure; + ax = gca; + if p.collateDoses + if p.separateExpSim + % SEPARATEEXPSIM 1 , COLLATEDOSES 1 + ax = subplot(1, 2, 1); % get axis + linehandle = zeros(nICs, 1); + ptchhandle = zeros(nICs, 1); + for i=1:nICs + [ax, linehandle(i)] = ... + plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),...
+ 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); + end + + legend(linehandle, p.legends(1,:)); + title(p.title{1, 1, msnum}); + % !! p.legends must be a cell array of size 1 x nICs or 2 x nICs + % p.title must be a 1 x 2 x nMS cell array containing titles + % specifying experiment and simulation for a given measured species. + % can talk about what the summary and spread statistic refer to. + ax = subplot(1, 2, 2); + for i=1:nICs + [ax, linehandle(i)] = ... + plotintoaxis(ax, p.SimMode,... + tv, simsummst, simspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '--',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', ':',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle + end + % legends([linehandle(1); ptchhandle(1); linehandle(2:end)], + % legs(2,:)); + legend(linehandle, p.legends(end,:)); % row 2 if given, otherwise reuse row 1 + title(p.title{1, 2, msnum}) + + else + % SEPARATEEXPSIM 0 , COLLATEDOSES 1 + + linehandle = zeros(nICs, 1); + ptchhandle = zeros(nICs, 1); + + linehandle2 = zeros(nICs, 1); + ptchhandle2 = zeros(nICs, 1); + % plot experimental data + for i=1:nICs + [ax, linehandle(i)] = ... + plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle + end + % plot simulation data + for i=1:nICs + [ax, linehandle2(i)] = ... + plotintoaxis(ax, p.SimMode,... + tv, simsummst, simspreadst, ... + i, msnum, ... + 'LineColor', colorz2(i, :),... + 'LineStyle', '-.',... + 'LineWidth', 1,... + 'SpreadColor', colorz2(i, :),... + 'SpreadLineStyle', ':',... + 'SpreadLineWidth', 0.25,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle2 + end + title(p.title{msnum}); + % !!
p.legends must be a legend array saying exp dose 1, + % sim dose 1, exp dose 2 to end. + legend([linehandle(1); linehandle2(1); linehandle(2:end)],... + p.legends); + end + else + if ~isempty(p.subplot_arrangement) + assert(numel(p.subplot_arrangement) == 2); + assert(prod(p.subplot_arrangement) == nICs); + pCell = num2cell(p.subplot_arrangement); + [nrows, ncols] = pCell{:}; + + + linehandle = zeros(nrows, ncols); + ptchhandle = zeros(nrows, ncols); + + linehandle2 = zeros(nrows, ncols); + ptchhandle2 = zeros(nrows, ncols); + + for i = 1:nICs + rowIX = floor((i-1)/ncols)+1; + colIX = i-floor((i-1)/ncols)*ncols; + + if p.separateExpSim + warning('Both subplot_arrangement and separateExpSim are specified. Not going to do anything.') + % future version : just make two figures. + + % SEPARATEEXPSIM 1 , COLLATEDOSES 0, + % subplot_arrangement specified. +% +% ind = (i-1)*2+1; +% ax = subplot(nICs, 2, ind); % exp data plot. +% [ax, linehandle] = ... +% plotintoaxis(ax, p.ExpMode,... +% tv, expsummst, expspreadst, ... +% i, msnum, ... +% 'LineColor', colorz(i, :),... +% 'LineStyle', '-',... +% 'LineWidth', 2,... +% 'SpreadColor', colorz(i, :),... +% 'SpreadLineStyle', '--',... +% 'SpreadLineWidth', 0.5,... +% 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle +% % set title +% title(titl{i, 1, msnum}); +% ind = (i-1)*2+2; +% ax = subplot(nICs, 2, ind); % simulated data plot +% [ax, linehandle] = ... +% plotintoaxis(ax, p.SimMode,... +% tv, simsummst, simspreadst, ... +% i, msnum, ... +% 'LineColor', colorz(i, :),... +% 'LineStyle', '-.',... +% 'LineWidth', 2,... +% 'SpreadColor', colorz(i, :),... +% 'SpreadLineStyle', ':',... +% 'SpreadLineWidth', 0.5,...
+% 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle +% % set title +% title(titl{i, 2, msnum}); +% % in this case, the title has to be a cell array +% % of dimensions nICs x 2 x nMS + else + % % SEPARATEEXPSIM 0 , COLLATEDOSES 0 + + + ax = subplot(nrows, ncols, i); % start plotting row first. ugh. why matlab, why? + [ax, linehandle(rowIX, colIX)] = plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , , ptchhandle(i) + hold on + [ax, linehandle2(rowIX, colIX)] = ... + plotintoaxis(ax, p.SimMode,... + tv, simsummst, simspreadst, ... + i, msnum, ... + 'LineColor', colorz2(i, :),... + 'LineStyle', '-.',... + 'LineWidth', 1,... + 'SpreadColor', colorz2(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.25,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) ,, ptchhandle2(i) + hold on + % set titles + title(titl{i, 1, msnum}); + % in this case, the p.title has to be a cell array + % of dimensions + % nICs x 1 x nMS + % set legends only for the top subplot. + % and distinguish the experimental and simulated data. + + if i == 1 + legend([linehandle(1); linehandle2(1)],... + {'exp', 'sim'}, 'Location', 'SouthEast') + end + end + end + + else + for i = 1:nICs + if p.separateExpSim + + % SEPARATEEXPSIM 1 , COLLATEDOSES 0 + + ind = (i-1)*2+1; + ax = subplot(nICs, 2, ind); % exp data plot. + [ax, linehandle] = ... + plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... 
+ 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle + % set title + title(titl{i, 1, msnum}); + ind = (i-1)*2+2; + ax = subplot(nICs, 2, ind); % simulated data plot + [ax, linehandle] = ... + plotintoaxis(ax, p.SimMode,... + tv, simsummst, simspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-.',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', ':',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , ptchhandle + % set title + title(titl{i, 2, msnum}); + % in this case, the title has to be a cell array + % of dimensions nICs x 2 x nMS + else + % % SEPARATEEXPSIM 0 , COLLATEDOSES 0 + + linehandle = zeros(nICs, 1); + ptchhandle = zeros(nICs, 1); + + linehandle2 = zeros(nICs, 1); + ptchhandle2 = zeros(nICs, 1); + + ax = subplot(nICs, 1, i); + [ax, linehandle(i)] = plotintoaxis(ax, p.ExpMode,... + tv, expsummst, expspreadst, ... + i, msnum, ... + 'LineColor', colorz(i, :),... + 'LineStyle', '-',... + 'LineWidth', 2,... + 'SpreadColor', colorz(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.5,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) , , ptchhandle(i) + hold on + [ax, linehandle2(i)] = ... + plotintoaxis(ax, p.SimMode,... + tv, simsummst, simspreadst, ... + i, msnum, ... + 'LineColor', colorz2(i, :),... + 'LineStyle', '-.',... + 'LineWidth', 1,... + 'SpreadColor', colorz2(i, :),... + 'SpreadLineStyle', '--',... + 'SpreadLineWidth', 0.25,... + 'FaceAlpha', 0.25); % removed out arg: (doesnt work yet) ,, ptchhandle2(i) + hold on + % set titles + title(titl{i, 1, msnum}); + % in this case, the p.title has to be a cell array + % of dimensions + % nICs x 1 x nMS + % set legends only for the top subplot. + % and distinguish the experimental and simulated data. + + if i == 1 + legend([linehandle(1); linehandle2(1)],... 
+ {'exp', 'sim'}) + end + end + end + + end + + + end + + % save figure here + if p.savematlabfig + if isempty(p.projdir) || isempty(p.tstamp) + warning('timestamp and project directory not specified. Nothing will be saved.') + else + specificproj = [p.projdir '/simdata_' p.tstamp]; + saveas(gcf, [specificproj '/traj' p.tstamp num2str(msnum) p.extrafignamestring]); + + end + end + + if p.savejpeg + if isempty(p.projdir) || isempty(p.tstamp) + warning('timestamp and project directory not specified. Nothing will be saved.') + else + specificproj = [p.projdir '/simdata_' p.tstamp]; + print(gcf, '-djpeg', '-r200', [specificproj '/traj' p.tstamp num2str(msnum) p.extrafignamestring]) + end + end + end +end + +if nargout == 2 + varargout{1} = da; +elseif nargout == 3 + varargout{1} = da; + varargout{2} = idxnotused; +end + + + + + +end +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%% LOCAL FUNCTIONS %%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +function tv = converttosec(tv, timeUnits) +% convert weeks, days, hours, or minutes to seconds. +switch timeUnits + case 'weeks' + tv = tv*7*24*3600; + case 'days' + tv = tv*24*3600; + case 'hours' + tv = tv*3600; + case 'minutes' + tv = tv*60; + case 'seconds' + tv = tv; +end +end + +function [timeDim, measuredDim, doseDim, replicateDim] =... + dimensionLabelMaps(dimensionLabels) + +% dimension lebels have to be the strings +% 'replicates', 'time points', 'measured species', 'doses'. 
+replicateDim = strcmp(dimensionLabels, 'replicates'); +replicateDim = find(replicateDim); +if isempty(replicateDim) + error('There is no ''replicates'' entry in the dimension labels array.') +end +timeDim = strcmp(dimensionLabels, 'time points'); +timeDim = find(timeDim); +if isempty(timeDim) + error('There is no ''time points'' entry in the dimension labels array.') +end +measuredDim = strcmp(dimensionLabels, 'measured species'); +measuredDim = find(measuredDim); +if isempty(measuredDim) + error('There is no ''measured species'' entry in the dimension labels array.') +end +doseDim = strcmp(dimensionLabels, 'doses'); +doseDim = find(doseDim); +if isempty(doseDim) + error('There is no ''doses'' entry in the dimension labels array.') +end + +end + + +function datamax = computeMaxes(summst, spreadst, dispmode, dimensionLabels) +[tD, mD, dD, rD] = dimensionLabelMaps(dimensionLabels); + +switch dispmode + case 'mean' + % summst must be a time x measured species x 1 x doses array. + datamax = max(max(summst, [],1), [], 4); % max over time and doses. + case 'median' + % summst must be a time x measured species x 1 x doses array. + datamax = max(max(summst, [],1), [], 4); + case 'meanstd' + % summst must be a time x measured species x 1 x doses array. + % spreadst must be a time x measured species x 1 x doses array. + summ_spread_st = summst + spreadst; + datamax = max(max(summ_spread_st, [],1), [], 4); + case 'meancurves' + datamax = max(max(max(spreadst, [],1), [], 4), [], 3); + case 'medianstd' + % summst must be a time x measured species x 1 x doses array. + % spreadst must be a time x measured species x 1 x doses array. + summ_spread_st = summst + spreadst; + datamax = max(max(summ_spread_st, [],1), [], 4); + case 'mediancurves' + datamax = max(max(max(cat(3,spreadst, summst), [],1), [], 4), [], 3); + case 'curves' + datamax = max(max(max(spreadst, [],1), [], 4), [], 3); + otherwise + error(['Invalid data display mode. Must be one of: ''mean'','... 
+ ' ''median'' , ''meanstd'',''medianstd'', ''mediancurves'', ''curves''.']) +end +end + + +%!! + + + + +%!! + +function nonMedianCurves = allButMedianCurve(dataArray, ix) +% data array must be time x measuredspecies x replicates x doses + +% Get the median curves. Can this be done via pure vector indexing? +% !UNIMPORTANT but could be fun to think about sometime. +nonMedianCurves = zeros(size(dataArray, 1), size(dataArray, 2), ... + size(dataArray, 3)-1, size(dataArray, 4)); +ixx = 1:size(dataArray, 3); +for i = 1:size(dataArray, 2) + for j = 1:size(dataArray, 4) + ixxx = setdiff(ixx , ix(1,i, 1,j)); % remove median curve index + for k = 1:size(nonMedianCurves, 3) + nonMedianCurves(:,i,k,j) = dataArray(:, i, ixxx(k), j); + end + end +end +end + + +function [title, legend] = titlelegend(p, di, mi) +end + +function [ax, varargout] = plotintoaxis(ax, mode, tv,... + summst, spreadst, dosei, msi, varargin) +% plot a single summary statistic and spread statistic on a given axis +% optional name value pair arguments have names: +% +% 'LineColor' +% 'LineStyle' +% 'LineWidth' +% 'SpreadColor' +% 'SpreadLineStyle' +% 'SpreadLineWidth' +% 'FaceAlpha' +% +p = inputParser; +p.addParameter('LineColor', parula(1), @isnumeric); +p.addParameter('LineStyle','-' , @ischar); +p.addParameter('LineWidth', 2, @isnumeric); +p.addParameter('SpreadColor', summer(1), @isnumeric); +p.addParameter('SpreadLineStyle', ':', @ischar); % gets used if the spread is curves. +p.addParameter('SpreadLineWidth', 2, @isnumeric); + +% gets used if the spread is standard deviation +p.addParameter('FaceAlpha', 0.25, @isnumeric); + +% shading. +p.parse(varargin{:}) +p = p.Results; + +% make the axes to be plotted on current, and bring the relevant figure into focus +axes(ax); + +if strcmp(mode, 'meanstd') || strcmp(mode, 'medianstd') + % plot summary and std as the spread. + [linehandle, ptchhandle] =... + boundedline(tv, summst(:,msi,1, dosei),... + spreadst(:,msi,1, dosei)); + set(ptchhandle, ... 
+ 'FaceColor', p.SpreadColor,... + 'FaceAlpha', p.FaceAlpha); + set(linehandle, ... + 'Color', p.LineColor,... + 'LineStyle', p.LineStyle,... + 'LineWidth', p.LineWidth); + hold on + outstuff = {linehandle, ptchhandle}; + +elseif strcmp(mode, 'mean') || strcmp(mode, 'median') + % plot only the mean or median (no spread) + [linehandle] = plot(tv, summst(:,msi,1, dosei)); + set(linehandle, ... + 'Color', p.LineColor,... + 'LineStyle', p.LineStyle,... + 'LineWidth', p.LineWidth); + hold on + outstuff = {linehandle}; + +elseif strcmp(mode, 'mediancurves') || strcmp(mode, 'meancurves') + + [linehandle] = plot(tv, ... + summst(:,msi,1, dosei)); + set(linehandle, ... + 'Color', p.LineColor,... + 'LineStyle', p.LineStyle,... + 'LineWidth', p.LineWidth); + hold on + + spreadhandles = zeros(size(spreadst, 3), 1); + for j = 1:size(spreadst, 3) % number of replicates + spreadhandles(j) = plot(tv, spreadst(:,msi,j, dosei)); + set(spreadhandles(j),... + 'Color', p.SpreadColor,... + 'LineStyle', p.SpreadLineStyle,... + 'LineWidth', p.SpreadLineWidth); + hold on + end + outstuff = {linehandle, spreadhandles}; + +elseif strcmp(mode, 'curves') + spreadhandles = zeros(size(spreadst, 3),1); + for j = 1:size(spreadst, 3) + spreadhandles(j) = plot(tv, ... + spreadst(:,msi,j, dosei)); + set(spreadhandles(j),... + 'Color', p.SpreadColor,... + 'LineStyle', p.SpreadLineStyle,... + 'LineWidth', p.SpreadLineWidth); + hold on + end + outstuff = {spreadhandles(1)}; +else + warning(['No valid exp data plotting format'... + ' specified. 
No experimental data will be plotted.']) +end +hold on + +% process variable outputs +nout = nargout-1; +varargout(1:nout) = outstuff(1:nout); + + + +end + + + + + + + + + + + + + diff --git a/mcmc_simbio/src/medianIndex.m b/mcmc_simbio/src/medianIndex.m new file mode 100644 index 0000000..8cadb6a --- /dev/null +++ b/mcmc_simbio/src/medianIndex.m @@ -0,0 +1,163 @@ +function [ix, medianvals] = medianIndex(inputarray, dim) + % compute the index of the median element of an + % array along a specified dimension. If there is an + % even number of elements, then pick the bigger of the middle + % two elements. + % ix is an array of the same size as 'inputarray', but with a width + % of 1 on dimension number dim. The elements of ix give the element + % along 'dim' in the array 'inputarray' that is either the median + % element along that dimension, or if there are an even number of + % element in that dimension, then is the closest overestimate of + % the median. The output medianvals is an array of the values in + % inputarray that ix points to. + % + % (c) Vipul Singhal + + % EXAMPLE: inputarray is, for eg, 1-2-1-4 ---> 1-4-1-2 when dim = 3. + % time - ms - replicates - doses ---> replicates - doses - time - ms + shiftedarray = shiftdim(inputarray, dim-1); + + + % if all the leading dimensions to be shifted are singletons, then they + % will be lost when shiftdim works: ie, if dim = 3, so the shifting is + % to be by 2 dimensions, then the result of shifting is: + % 1-2-3-4 ---> 3-4-1-2 (stays 4D array!) + % 1-1-3-4 ---> 3-4 (becomes 2D array!) + % + % Also, note that + % rr = rand(2, 1, 3, 4); + % size(shiftdim(rr, 2)) + % + % ans = + % + % 3 4 2 + % + % ie, basically matlab will just not report trailing singletons, and + % therein lies the problem. 
+ % + % + % so later, when we do the shift back, we have to do it as follows: + % + % First pad on the left with singletons equal to the number removed: + % + % [1-2-3-4 --->] 3-4-1-2 --(pad with 0)--> 3-4-1-2, then rotate by + % ndims - ((ndims - dim +1) - 0) == 2, ---> 1-2-3-4. [ORIGINAL] + % + % + % + % [1-1-3-4 --->] 3-4 --(pad with 2 singletons)--> 1-1-3-4, then rotate + % by ndims - ((ndims - dim +1) - 2) == 4, ---> 1-1-3-4 [ORIGINAL] + % + % + % + % [2-1-3-4 --->] 3-4-2 --(pad with one singleton)--> 1-3-4-2 + % then rotate by ndims -((ndims - dim +1) - 1) == 3, ---> 2-1-3-4 [ORIGINAL] + % + % Then do the rotation in the same direction as the original, but + % ndims - dim +1 + % + + %{ OLD: + % This can be undone by first rotating the + % remaining dimensions back, and then adding singletons on the right. + % if ndims(shiftedarray) < ndims(inputarray) + % dimsToRightPad = ndims(inputarray) - ndims(shiftedarray); + % end + % otherwise: + % undo using shiftdim(shiftedarray, ndims(array) - dim +1) + % actually instead of circularly shifting, can just do : + % shiftdim(I5, -dim+1) to undo it!!!! <--- NOPE. + % From the documentation: "When N is negative, shiftdim + % shifts the dimensions to the right and pads with singletons." + %} + + % EXAMPLE (cont): srted is 1-4-1-2, and in this case the same as + % shiftedarray, since dimension 1 has length 1, so there is nothing to + % sort. I is an array of ones of the same size.
+ [srted, I] = sort(shiftedarray, 1); + + %index of the element in srted that is the closest + % overestimate of the median + % also indices the first dimension of I + + II = ceil((size(I, 1)+1)/2); + + % Reshape I into a matrix + III = I(:,:); + srted3 = srted(:,:); + % index using II (ie, find the closest-to-middle element) + I4 = III(II,:); + srted4 = srted3(II, :); + % Now reshape back using the original dimensions + + szmat = size(I); + I5 = reshape(I4, [1, szmat(2:end)]); + srted5 = reshape(srted4, [1, szmat(2:end)]); + % shift back to the original dimensions + % there are 2 cases here: + % if the original count of the dimension sizes was + % 1-1-nRep-nDoses, then the shiftdim above resulted in a 2D matrix + % + + singletonsToPad = ndims(inputarray) - ndims(shiftedarray); + padded_srted5 = shiftdim(srted5, -singletonsToPad); + padded_I5 = shiftdim(I5, -singletonsToPad); + + % dimsToRotateBy = ndims -((ndims - dim +1) - singletonsToPad) + % = dim -1 + singletonsToPad + dimsToRotateBy = dim - 1 + singletonsToPad; + ix = shiftdim(padded_I5,dimsToRotateBy); + medianvals = shiftdim(padded_srted5,dimsToRotateBy); + + + + +% coding notes: + % first bring the dimension along which the median is to + % be found to the first dimension. + %%%% + % dim is 3 + % 4 total # of dims. + % 1 2 3 5 (sumovertime, ms, rep, dose) + % > shiftdim by dim - 1 + % shifted has size: 3 5 1 2 (rep, dose, sumovertime, ms) + % ie, 3 element columns, and 5 columns per pane, 1 pane per set1, + % 2 set1's. + %%%% + + + % then sort along this dimension, and subsequently pick + % out the (possibly closest overestimate of the) median value. + + %%%% + % sort + % pick out median indices + + % Ix has size 1 5 1 2 and points to elements of shifted + % shifted(Ix) is a 1 5 1 2 array of the median elements. + % more than that, Ix is what we want. 
+ % in fact, unshifting Ix as follows: + % shiftdim(Ix, numel(size(array)) - dim + 1) brings Ix to + % 1 2 1 5 <---- Ix2 + % Now use Ix2 to index the full array back in the calling function + % array2(:, Ix(1,:,:,:)) maybe? I have no idea. Try it out I guess. + % no i dont think this works. + + + + + % % bring dim to be the first dimension + % shftsrted = shiftdim(srted, dim-1); + % medianarray = shftsrted(ceil((size(I, dim)+1)/2), ) + + + + %%%% + + + + + + + + end diff --git a/mcmc_simbio/src/medianReplicate.m b/mcmc_simbio/src/medianReplicate.m new file mode 100644 index 0000000..fdb1393 --- /dev/null +++ b/mcmc_simbio/src/medianReplicate.m @@ -0,0 +1,29 @@ +function medianCurves = medianReplicate(dataArray, ix) + % Pick out the median replicate from data array, the replicate to be picked + % out specified by the index array ix. + % + % This function takes two inputs: dataArray and ix. dataArray is an array + % containing data that has dimensions time x measured species x replicates + % x doses. ix is an array of dimensions 1 x nMS x 1 x nDoses OR + % nTimePoints x nMS x 1 x nDoses. If it is the latter, only the element + % corresponding to the first time point is used. + % I.e., for the ith measures species, and the jth dose, the index of the + % replicate within dataArray that is used comes from ix(1,i, 1,j), so that + % if ix(2:end,i, 1,j) exist, they are not used in any way. The function that + % creates the ix array is medianindex, and this function is used, for example + % the computeDataStats function above, or in mcmc_runsim.m. + + + % data array must be time x measuredspecies x replicates x doses + + % Get the median curves. Can this be done via pure vector indexing? + % !UNIMPORTANT but could be fun to think about sometime. + medianCurves = zeros(size(dataArray, 1), size(dataArray, 2), ... 
+ 1, size(dataArray, 4)); + + for i = 1:size(dataArray, 2) + for j = 1:size(dataArray, 4) + medianCurves(:,i,1,j) = dataArray(:, i, ix(1,i, 1,j), j); + end + end +end \ No newline at end of file diff --git a/mcmc_simbio/src/plotChains.m b/mcmc_simbio/src/plotChains.m new file mode 100644 index 0000000..146b75b --- /dev/null +++ b/mcmc_simbio/src/plotChains.m @@ -0,0 +1,36 @@ +function plotChains(m, nW, legends, varargin) +% Plot Markov chain trajectories. +% m is an nParam x nWalkers x numSamples array. +% nW is the (maximum) number of randomly chosen walkers to plot. +% legends is a cell array of subplot titles, one per parameter. +p = inputParser; +p.addParameter('Visible', 'on', @ischar) +p.parse(varargin{:}); +p=p.Results; + +[nParam, nWalkers, nSamples] = size(m); +[n1 n2] = twofactors(nParam); + +if isprime(nParam) + n1 = ceil(nParam/5); % 5 columns + n2 = 5; +end + +wix = unique(ceil(rand(nW, 1)*nWalkers)); +m = m(:,wix, :); +figure('Visible', p.Visible) +ss = get(0, 'screensize'); +set(gcf, 'Position', [ss(3)*(1-1/1.1) ss(4)*(1-1/1.15) ss(3)/1.1 ss(4)/1.15]); +for i = 1:nParam + subplot(n1, n2, i) + for j = 1:length(wix) + plot(1:nSamples, squeeze(m(i, j, :)),... + 'LineWidth', 0.1,... + 'color', [0.2 0.7 0.1].^2) + hold on + end + title(legends{i}) +end + +end + diff --git a/mcmc_simbio/src/plotEstimTraces.m b/mcmc_simbio/src/plotEstimTraces.m new file mode 100644 index 0000000..873e885 --- /dev/null +++ b/mcmc_simbio/src/plotEstimTraces.m @@ -0,0 +1,139 @@ +function [f, meanconc, stdconc ] = plotEstimTraces(m,em,ts, datmat, ds, ms, varargin) +%plotEstimTraces Plot estimated trace mean and standard deviation for +%txtl data +% m is the MCMC data, either in 3D or 2D +% ts = tspan +% datmat = data matrix +% em = exported model object +% ds = dosing strategy +% ms = measured species + + + +p = inputParser; +p.addParameter('title',[] ,@ischar) +p.addParameter('nplot',150,@isnumeric); +p.addParameter('colorseq', [7, 4, 6, 1, 8, 9, 10], @isnumeric); % meangirls (RNA), rainforest (GFP), swamplands (CFP), bog (YFP), ...
+p.addParameter('Visible', 'on', @ischar) +p.parse(varargin{:}); +p=p.Results; + + +if ndims(m) == 3 + m = m(:,:)'; +end + +idx = randperm(size(m, 1), p.nplot); + + + +% construct measured names array +nMS = length(ms); +mn = cell(nMS, 1); % names of measured species +for i = 1:nMS + mn{i} = ms(i).objectName; +end + +% extract dose matrix and dose names from dosing strat +nICs = length(ds(1).concentrations); +nDSP = length(ds); +dn = cell(length(ds)); +for i = 1:length(ds) + dn{i} = ds(i).species; +end +dose = zeros(nICs , nDSP); +for j = 1:nICs + for i = 1:nDSP + dose(j,i) = ds(i).concentrations(j); + end +end +nts = length(ts); +% Compute sample trajectories, max values for axis limits, means, standard deviations +meanconc = zeros(nts, nMS, nICs); +stdconc = zeros(nts, nMS, nICs); +samp = zeros(nts, nMS, p.nplot, nICs); + +idxnotused = zeros(p.nplot, nICs); +for i = 1:nICs + for kk=1:p.nplot + try + sd = simulate(em, [exp(m(idx(kk),:)'); dose(i,:)']); + sd = resample(sd, ts); + spSD = selectbyname(sd, mn); + % output the relevant data to the samples + samp(:,:,kk, i) = spSD.Data; + catch + idx(kk) + idxnotused(kk, i) = idx(kk); + end + end + % compute mean and std over the p.nplot dimension, for each timepoint, IC and + % each species + kknotused = find(idxnotused(:,i)); + for j = 1:nMS + meanconc(:,j, i) = mean(samp(:,j,setdiff(1:(p.nplot), kknotused), i), 3); + stdconc(:, j, i) = std(samp(:,j,setdiff(1:(p.nplot), kknotused), i),0, 3); + end +end + +%% + +% +cc = colorschemes; +f = figure('Visible', p.Visible); +% ss = get(0, 'screensize'); +% set(gcf, 'Position', [50 100 ss(3)/1.1 ss(4)/1.3]); +% Compute species maxes for plotting +mxtemp = max(max(meanconc + stdconc, [], 1), [],3); +mxtemp = max(mxtemp, max(datmat)); % dm is the data matrix, DIM1:time, DIM2:measured species +maxsp = mxtemp'; + +h = zeros(nICs, nMS); +ptch = zeros(nICs, nMS); +d = zeros(nICs, nMS); + +csq = p.colorseq; + +for i = 1:nICs + for j = 1:nMS + subplot(nICs, nMS,nMS*(i-1)+j); + [h(i, 
j), ptch(i, j)] = boundedline(ts/3600, meanconc(:,j, i), stdconc(:, j, i)); + set(ptch(i, j), 'FaceColor', cc{2,csq(j)}(1,:), 'FaceAlpha', 0.5); + set(h(i, j), 'Color', cc{2,csq(j)}(2,:), 'LineStyle', '--'); + hold on + set(h(i, j), 'LineWidth', 2) + d(i, j)=plot(ts/3600,datmat((i-1)*nts + 1 : (i)*nts,j),'color',cc{2,csq(j)}(3,:) ,'linewidth',2); + hold on + set(gca, 'Ylim', [0, round(maxsp(j)*1.1, -(order(maxsp(j)*1.1)-2))]) + xlabel('time/h') + ylabel('conc/nM') + + title(sprintf('%s conc, dosed %s = %0.2g nM', mn{j}, dn{1}, dose(i,1))) + % right now I can only support a single species dose. Need to come up + % with an elegant way of putting all the dosing information in the + % title or in floating text. + + if i ==1 && j ==1 + ax = gca; + end + + end +end +if ~isempty(p.title) +suptitle(p.title) +end + +handles = [h(1, :), d(1,:)]; +lg1 = cell(1, nMS); +lg2 = cell(1, nMS); +for i = 1:nMS + lg1{i} = [mn{i} ' sample mean']; + lg2{i} = [mn{i} ' exp data']; +end + +legstr = [lg1, lg2]; + legend(ax, handles,legstr, 'Location', 'NorthWest'); + +end + + diff --git a/mcmc_simbio/src/plotEstimTraces003.m b/mcmc_simbio/src/plotEstimTraces003.m new file mode 100644 index 0000000..b46ef0d --- /dev/null +++ b/mcmc_simbio/src/plotEstimTraces003.m @@ -0,0 +1,174 @@ +function [ output_args ] = plotEstimTraces003(m,exportedMdlObj,tspan, ... + simulatedDataMatrix, dosedInitVals,... + measuredSpecies, varargin) +%plotEstimTraces003 Plot estimated trace mean and standard deviation for +%txtl data +% + +p = inputParser; +p.addParameter('paramID',[10 12 13],@isnumeric); +p.addParameter('titlestr',[] ,@ischar) +p.parse(varargin{:}); +p=p.Results; + +colIdx = p.paramID; +if ndims(m) == 3 + m = m(:,:)'; +end + +I = sampleIntersections(m, colIdx, 'mass', 0.4); + +numPostPlots = min([250, length(I)]); %plot numPostPlots samples... 
+if numPostPlots== 0 + error('No parameter points meet specified search criteria') +elseif numPostPlots<30 + warning(['Only using ' num2str(numPostPlots) ' parameter points. Expand search parameters']) +end + +r = ceil(rand(numPostPlots,1)*size(I,1)); +% I = (1:size(msimple,1))'; +idx = I(r); +conc = simulatedDataMatrix; +nICs = size(dosedInitVals,1); +if length(measuredSpecies)==2 +[measuredNames{1}, measuredNames{2}] = deal(measuredSpecies.objectName); +elseif length(measuredSpecies)==1 + [measuredNames{1}] = deal(measuredSpecies.objectName); +end + + + +% Compute sample trajectories, max values for axis limits, means, standard deviations +meanGFP = zeros(length(tspan), nICs); +meanRNA = zeros(length(tspan), nICs); +stdevGFP = zeros(length(tspan), nICs); +stdevRNA = zeros(length(tspan), nICs); +GFPsampleTraj = zeros(length(tspan), nICs, numPostPlots); +RNAsampleTraj = zeros(length(tspan), nICs, numPostPlots); + +idxnotused = []; +kknotused = []; +for i = 1:nICs + for kk=1:numPostPlots + try + sd = simulate(exportedMdlObj, [exp(m(idx(kk),:)'); dosedInitVals(i,:)']); + sd = resample(sd, tspan); + spSD = selectbyname(sd, measuredNames); + [RNAsampleTraj(:,i, kk), GFPsampleTraj(:,i,kk)] = deal(spSD.Data(:,1), spSD.Data(:,2)); + catch + idx(kk) + idxnotused = [idxnotused; idx(kk)]; + kknotused = [kknotused;kk]; + + end + end + % compute mean and std + + meanGFP(:,i)=mean(GFPsampleTraj(:,i,setdiff(1:numPostPlots, kknotused)),3); + meanRNA(:,i)=mean(RNAsampleTraj(:,i,setdiff(1:numPostPlots, kknotused)),3); + stdevGFP(:,i)=std(GFPsampleTraj(:,i,setdiff(1:numPostPlots, kknotused)),0,3); + stdevRNA(:,i)=std(RNAsampleTraj(:,i,setdiff(1:numPostPlots, kknotused)),0,3); +end + +%% + +% +cc = colorschemes; +figure +ss = get(0, 'screensize'); +set(gcf, 'Position', [50 100 ss(3)/1.1 ss(4)/1.3]); +% + maxRNA = max(max(max(meanRNA+stdevRNA)), max(conc(:,1))); + maxGFP = max(max(max(meanGFP+stdevGFP)), max(conc(:,2))); + +for i = 1:nICs + + % RNA + subplot(nICs, 2,2*(i-1)+1); + 
[h(1), ptch(1)] = boundedline(tspan/3600, meanRNA(:,i), stdevRNA(:,i)); + set(ptch(1), 'FaceColor', cc{2,7}(1,:), 'FaceAlpha', 0.5); + set(h(1), 'Color', cc{2,7}(2,:), 'LineStyle', '--'); + hold on + set(h(1), 'LineWidth', 2) + h(2)=plot(tspan/3600,conc((i-1)*161 + 1 : (i)*161,1),'color',cc{2,7}(3,:) ,'linewidth',2); + hold on + set(gca, 'Ylim', [0, round(maxRNA+5, -1)]) + xlabel('time/h') + ylabel('conc/nM') + title(sprintf('%s conc, DNA=%0.2g nM', 'RNA', dosedInitVals(i,:)')) + + % GFP + subplot(nICs, 2,2*(i-1)+2); + [h(3), ptch(2)] = boundedline(tspan/3600, meanGFP(:,i), stdevGFP(:,i)); + set(ptch(2), 'FaceColor', cc{2,9}(3,:), 'FaceAlpha', 0.5); + set(h(3), 'Color', cc{2,9}(2,:), 'LineStyle', '--'); + hold on + set(h(3), 'LineWidth', 2) + h(4)=plot(tspan/3600,conc((i-1)*161 + 1 : (i)*161,2),'color',cc{2,9}(1,:) ,'linewidth',2); + hold on + set(gca, 'Ylim', [0, round(maxGFP+10, -1)]) + xlabel('time/h') + ylabel('conc/nM') + title(sprintf('%s conc, DNA=%0.2g nM', 'GFP', dosedInitVals(i,:)')) + + + ax = gca; +end +if ~isempty(p.titlestr) +suptitle(p.titlestr) +end + legend(ax, [h(1), h(2), h(3), h(4)],{'RNA sample mean','RNA Truth',... + 'GFP sample mean',... + 'GFP Truth'}) + + %% Plot the true traces, estimated means and shade the standard deviations. 
+ figure +ss = get(0, 'screensize'); +set(gcf, 'Position', [50 100 ss(3)/1.1 ss(4)/1.3]); +% + maxRNA = max(max(max(meanRNA+stdevRNA)), max(conc(:,1))); + maxGFP = max(max(max(meanGFP+stdevGFP)), max(conc(:,2))); + +for i = 1:nICs + + % RNA + subplot(nICs, 2,2*(i-1)+1); + [h(1), ptch(1)] = boundedline(tspan/3600, meanRNA(:,i), stdevRNA(:,i)); + set(ptch(1), 'FaceColor', cc{2,7}(1,:), 'FaceAlpha', 0.5); + set(h(1), 'Color', cc{2,7}(2,:), 'LineStyle', '--'); + hold on + set(h(1), 'LineWidth', 2) + h(2)=plot(tspan/3600,conc((i-1)*161 + 1 : (i)*161,1),'color',cc{2,7}(3,:) ,'linewidth',2); + hold on + set(gca, 'Ylim', [0, round(maxRNA+5, -1)]) + xlabel('time/h') + ylabel('conc/nM') + title(sprintf('%s conc, DNA=%0.2g nM', 'RNA', dosedInitVals(i,:)')) + + % GFP + subplot(nICs, 2,2*(i-1)+2); + [h(3), ptch(2)] = boundedline(tspan/3600, meanGFP(:,i), stdevGFP(:,i)); + set(ptch(2), 'FaceColor', cc{2,9}(3,:), 'FaceAlpha', 0.5); + set(h(3), 'Color', cc{2,9}(2,:), 'LineStyle', '--'); + hold on + set(h(3), 'LineWidth', 2) + h(4)=plot(tspan/3600,conc((i-1)*161 + 1 : (i)*161,2),'color',cc{2,9}(1,:) ,'linewidth',2); + hold on + set(gca, 'Ylim', [0, round(maxGFP+10, -1)]) + xlabel('time/h') + ylabel('conc/nM') + title(sprintf('%s conc, DNA=%0.2g nM', 'GFP', dosedInitVals(i,:)')) + + + ax = gca; +end + +if ~isempty(p.titlestr) +suptitle(p.titlestr) +end + legend(ax, [h(1), h(2), h(3), h(4)],{'RNA sample mean','RNA Truth',... + 'GFP sample mean',... + 'GFP Truth'}) + +end + diff --git a/mcmc_simbio/src/plotEstimTraces008.m b/mcmc_simbio/src/plotEstimTraces008.m new file mode 100644 index 0000000..5fb2d07 --- /dev/null +++ b/mcmc_simbio/src/plotEstimTraces008.m @@ -0,0 +1,183 @@ +function plotEstimTraces008(m, genemodel, ICarray, tspan,... + spconc1, spconc2, varargin) +%plotEstimTraces008 Sample from the posterior distributions, and generate +%time traces, means, standard deviations. +% Optional Capability: if true parameters are provided, plot the true data too. 
+% +% +% +% +% +nSp = 3; +if ndims(m) == 3 + m = m(:,:)'; +end + +p = inputParser; +p.addParameter('mode','samplesonly',@ischar); % 'samplesonly', 'trueparams' +p.addParameter('sampling', 'all', @ischar) % 'all', 'MAPint, 'percentiles' +p.addParameter('paramID',[10 12 13],@isnumeric); +p.addParameter('titlestr',[] ,@ischar) +p.addParameter('ncurves',200 ,@isnumeric) +p.parse(varargin{:}); +p=p.Results; + +ncurves = p.ncurves; + +if strcmp(p.sampling, 'all') + idx = ceil(rand(ncurves, 1)*size(m,1)); +elseif strcmp(p.sampling, 'percentiles') + %!TODO, percentiles +elseif strcmp(p.sampling, 'MAPint') + %!TODO: MAP intervals, some mass around +end + +nICs = size(ICarray,1); +% compute the mean and the axis limits + +estimconc1 = zeros(length(tspan),nSp, nICs , ncurves); +estimconc2 = zeros(length(tspan),nSp, nICs , ncurves); +maxRNA_sd = 0; +maxGFP_sd = 0; +for i = 1:nICs + for kk=1:ncurves + sp0 = ICarray(i,:); + [~,estimconc1(:,:, i, kk)] = genemodel(m(idx(kk),[1:4 7 8]), sp0, tspan); + [~,estimconc2(:,:, i, kk)] = genemodel(m(idx(kk),[1:2 5:8]), sp0, tspan); + end + % compute mean and std + mean1 = mean(estimconc1, 4); % mean in dim 4 + mean2 = mean(estimconc2, 4); + std1 = std(estimconc1, 0, 4); % SD in dim 4 + std2 = std(estimconc2, 0, 4); + maxRNA_sd = max([maxRNA_sd, max(mean1(:,2, i)+std1(:,2, i)), max(mean2(:,2,i)+std2(:,2,i))]); + maxGFP_sd = max([maxGFP_sd, max(mean1(:,3,i)+std1(:,3,i)), max(mean2(:,3,i)+std2(:,3,i))]); +end + +maxRNA_curves = max(max(max(max([estimconc1(:,2,:,:), estimconc2(:,2,:,:)])))); +maxGFP_curves = max(max(max(max([estimconc1(:,3,:,:), estimconc2(:,3,:,:)])))); + +cc = colorschemes; +% figure +% ss = get(0, 'screensize'); +% set(gcf, 'Position', [50 100 ss(3)/1.1 ss(4)/1.3]); +% % plot the mean and individual curves +% for i = 1:nICs +% for kk=1:ncurves +% subplot(nICs, 4,4*(i-1)+1); +% h(1)=plot(tspan,estimconc1(:,2,i,kk),'color',[.6 .35 .3].^.2); +% hold on +% subplot(nICs, 4,4*(i-1)+2); +% 
h(2)=plot(tspan,estimconc1(:,3,i,kk),'color',[.2 .75 .2].^.2); +% hold on +% subplot(nICs, 4,4*(i-1)+3); +% h(3)=plot(tspan,estimconc2(:,2,i,kk),'color',[.6 .35 .3].^.2); +% hold on +% subplot(nICs, 4,4*(i-1)+4); +% h(4)=plot(tspan,estimconc2(:,3,i,kk),'color',[.2 .75 .2].^.2); +% hold on +% end +% subplot(nICs, 4,4*(i-1)+1); +% h(5)=plot(tspan,spconc1(:,2,i),'r','linewidth',2); +% h(9) = plot(tspan,mean1(:,2,i),'color',[.6 .35 .3],'linewidth',2, 'LineStyle',':'); +% title(sprintf('%s, E%d, DNA = %0.2g', 'RNA', 1, ICarray(i,1))) +% set(gca, 'Ylim', [0, round(maxRNA_curves+5)]) +% hold on +% subplot(nICs, 4,4*(i-1)+2); +% h(6)=plot(tspan,spconc1(:,3,i),'g','linewidth',2); +% h(10) = plot(tspan,mean1(:,3,i),'color',[.2 .75 .2],'linewidth',2, 'LineStyle',':'); +% title(sprintf('%s, E%d, DNA = %0.2g', 'GFP', 1, ICarray(i,1))) +% set(gca, 'Ylim', [0, round(maxGFP_curves+5)]) +% hold on +% subplot(nICs, 4,4*(i-1)+3); +% h(7)=plot(tspan,spconc2(:,2,i),'r','linewidth',2); +% h(11) = plot(tspan,mean2(:,2,i),'color',[.6 .35 .3],'linewidth',2, 'LineStyle',':'); +% title(sprintf('%s, E%d, DNA = %0.2g', 'RNA', 2, ICarray(i,1))) +% set(gca, 'Ylim', [0, round(maxRNA_curves+5)]) +% hold on +% subplot(nICs, 4,4*(i-1)+4); +% h(8)=plot(tspan,spconc2(:,3,i),'g','linewidth',2); +% h(12) = plot(tspan,mean2(:,3,i),'color',[.2 .75 .2],'linewidth',2, 'LineStyle',':'); +% title(sprintf('%s, E%d, DNA = %0.2g', 'DNA', 2, ICarray(i,1))) +% set(gca, 'Ylim', [0, round(maxGFP_curves+5)]) +% hold on +% end +% +% axis tight +% legend([h(1), h(2), h(5), h(6), h(9), h(10)],'RNA Samples','GFP Samples',... 
+% 'RNA True', 'GFP True', 'RNA mean', 'GFP mean') +% suplabel('Plots using parameter estimates' ,'t'); + +% plot the mean and standard deviation +figure +ss = get(0, 'screensize'); +set(gcf, 'Position', [50 100 ss(3)/1.8 ss(4)/1.8]); +for i = 1:nICs + % compute mean and std + + subplot(nICs, 4,4*(i-1)+1); + h(5)=plot(tspan,spconc1(:,2,i),'color',cc{2,7}(2,:) ,'linewidth',2); + hold on + [h(1), ptch(1)] = boundedline(tspan, mean1(:,2,i), std1(:,2,i)); + set(ptch(1), 'FaceColor', cc{2,7}(1,:), 'FaceAlpha', 0.5); + set(h(1), 'Color', cc{2,7}(2,:).^2, 'LineStyle', '--'); + hold on + set(h(1), 'LineWidth', 2) + + set(gca, 'Ylim', [0, round(maxRNA_sd+5)]) + title(sprintf('%s, E%d, DNA = %0.2g', 'RNA', 1, ICarray(i,1)), 'FontSize', 16) + + subplot(nICs, 4,4*(i-1)+2); + h(6)=plot(tspan,spconc1(:,3,i),'color',cc{2,9}(2,:),'linewidth',2); + title(sprintf('%s, E%d, DNA = %0.2g', 'GFP', 1, ICarray(i,1)), 'FontSize', 16) + hold on + [h(2), ptch(2)] = boundedline(tspan, mean1(:,3,i), std1(:,3,i)); + set(ptch(2), 'FaceColor', cc{2,9}(3,:), 'FaceAlpha', 0.5); + set(h(2), 'Color', cc{2,9}(2,:).^2, 'LineStyle', '--'); + hold on + set(h(2), 'LineWidth', 2) + + set(gca, 'Ylim', [0, round(maxGFP_sd+5)]) + + subplot(nICs, 4,4*(i-1)+3); + h(7)=plot(tspan,spconc2(:,2,i),'color',cc{2,7}(2,:) ,'linewidth',2); + title(sprintf('%s, E%d, DNA = %0.2g', 'RNA', 2, ICarray(i,1)), 'FontSize', 16) + hold on + [h(3),ptch(3)] = boundedline(tspan, mean2(:,2,i), std2(:,2,i)); + set(ptch(3), 'FaceColor', cc{2,7}(1,:), 'FaceAlpha', 0.5); + set(h(3), 'Color', cc{2,7}(2,:).^2, 'LineStyle', '--'); + hold on + set(h(3), 'LineWidth', 2) + + set(gca, 'Ylim', [0, round(maxRNA_sd+5)]) + + + subplot(nICs, 4,4*(i-1)+4); + h(8)=plot(tspan,spconc2(:,3,i),'color',cc{2,9}(2,:),'linewidth',2); + title(sprintf('%s, E%d, DNA = %0.2g', 'GFP', 2, ICarray(i,1)), 'FontSize', 16) + hold on + [h(4), ptch(4)] = boundedline(tspan, mean2(:,3,i), std2(:,3,i)); + set(ptch(4), 'FaceColor', cc{2,9}(3,:), 'FaceAlpha', 0.5); + 
+ set(h(4), 'Color', cc{2,9}(2,:).^2, 'LineStyle', '--'); + hold on + set(h(4), 'LineWidth', 2) + + set(gca, 'Ylim', [0, round(maxGFP_sd+5)]) + if i ==1 + ax = gca; + end +end + +% +% axis tight +% suplabel('Plots using parameter estimates' ,'t'); + [ax1,h1] =suplabel('time, arbitrary units' ); + [ax2,h2] =suplabel('concentration, arbitrary units' ,'y'); + set(h1,'FontSize',20) + set(h2,'FontSize',20) + legend(ax, [h(3), h(4), h(7), h(8)],{'RNA mean','GFP mean',... + 'RNA True', 'GFP True'}, 'FontSize', 12, 'Location', 'NorthEast') + + + +end diff --git a/mcmc_simbio/src/plotEstimTraces009s0s1s2.m b/mcmc_simbio/src/plotEstimTraces009s0s1s2.m new file mode 100644 index 0000000..6721ef2 --- /dev/null +++ b/mcmc_simbio/src/plotEstimTraces009s0s1s2.m @@ -0,0 +1,12 @@ +function [ output_args ] = plotEstimTraces009s0s1s2(m, S, varargin) +%plotEstimTraces009s0s1s2 Sample from the posterior distributions, and generate +%time traces, means, standard deviations. +% Optional Capability: if true parameters are provided, plot the true data too. +% +% +% +% +% + + +end diff --git a/mcmc_simbio/src/plotEstimTraces_singleplot.m b/mcmc_simbio/src/plotEstimTraces_singleplot.m new file mode 100644 index 0000000..7be15c3 --- /dev/null +++ b/mcmc_simbio/src/plotEstimTraces_singleplot.m @@ -0,0 +1,140 @@ +function [f, meanconc, stdconc ] = plotEstimTraces_singleplot(m,em,ts, datmat, ds, ms, varargin) +%plotEstimTraces_singleplot Plot estimated trace mean and standard deviation for +%txtl data +% m is the MCMC data, either in 3D or 2D +% ts = tspan +% datmat = data matrix +% em = exported model object +% ds = dosing strategy +% ms = measured species + + + +p = inputParser; +p.addParameter('title',[] ,@ischar) +p.addParameter('nplot',150,@isnumeric); +p.addParameter('colorseq', [7, 4, 6, 1, 8, 9, 10], @isnumeric); % meangirls (RNA), rainforest (GFP), swamplands (CFP), bog (YFP), ... 
+p.addParameter('Visible', 'on', @ischar) +p.parse(varargin{:}); +p=p.Results; + + +if ndims(m) == 3 + m = m(:,:)'; +end + +idx = randperm(size(m, 1), p.nplot); + + + +% construct measured names array +nMS = length(ms); +mn = cell(nMS, 1); % names of measured species +for i = 1:nMS + mn{i} = ms(i).objectName; +end + +% extract dose matrix and dose names from dosing strat +nICs = length(ds(1).concentrations); +nDSP = length(ds); +dn = cell(length(ds)); +for i = 1:length(ds) + dn{i} = ds(i).species; +end +dose = zeros(nICs , nDSP); +for j = 1:nICs + for i = 1:nDSP + dose(j,i) = ds(i).concentrations(j); + end +end +nts = length(ts); +% Compute sample trajectories, max values for axis limits, means, standard deviations +meanconc = zeros(nts, nMS, nICs); +stdconc = zeros(nts, nMS, nICs); +samp = zeros(nts, nMS, p.nplot, nICs); + +idxnotused = zeros(p.nplot, nICs); +for i = 1:nICs + for kk=1:p.nplot + try + sd = simulate(em, [exp(m(idx(kk),:)'); dose(i,:)']); + sd = resample(sd, ts); + spSD = selectbyname(sd, mn); + % output the relevant data to the samples + samp(:,:,kk, i) = spSD.Data; + catch + idx(kk) + idxnotused(kk, i) = idx(kk); + end + end + % compute mean and std over the p.nplot dimension, for each timepoint, IC and + % each species + kknotused = find(idxnotused(:,i)); + for j = 1:nMS + meanconc(:,j, i) = mean(samp(:,j,setdiff(1:(p.nplot), kknotused), i), 3); + stdconc(:, j, i) = std(samp(:,j,setdiff(1:(p.nplot), kknotused), i),0, 3); + end +end + +%% + +% +cc = colorschemes; +f = figure('Visible', p.Visible); +% ss = get(0, 'screensize'); +% set(gcf, 'Position', [50 100 ss(3)/1.1 ss(4)/1.3]); +% Compute species maxes for plotting +mxtemp = max(max(meanconc + stdconc, [], 1), [],3); +mxtemp = max(mxtemp, max(datmat)); % dm is the data matrix, DIM1:time, DIM2:measured species +maxsp = mxtemp'; + +h = zeros(nICs, nMS); +ptch = zeros(nICs, nMS); +d = zeros(nICs, nMS); + +csq = p.colorseq; + + +for j = 1:nMS + for i = 1:nICs + subplot(1, nMS,j); + [h(i, j), ptch(i, 
j)] = boundedline(ts/3600, meanconc(:,j, i), stdconc(:, j, i)); + set(ptch(i, j), 'FaceColor', i*(1/nICs)*cc{2,csq(j)}(1,:), 'FaceAlpha', 0.5); + set(h(i, j), 'Color', i*(1/nICs)*cc{2,csq(j)}(2,:), 'LineStyle', '--'); + hold on + set(h(i, j), 'LineWidth', 2) + d(i, j)=plot(ts/3600,datmat((i-1)*nts + 1 : (i)*nts,j),'color',i*(1/nICs)*cc{2,csq(j)}(3,:) ,'linewidth',2); + hold on + set(gca, 'Ylim', [0, round(maxsp(j)*1.1, -(order(maxsp(j)*1.1)-2))]) + xlabel('time/h') + ylabel('conc/nM') + + title(sprintf('%s conc, dosed %s = %0.2g nM', mn{j}, dn{1}, dose(i,1))) + % right now I can only support a single species dose. Need to come up + % with an elegant way of putting all the dosing information in the + % title or in floating text. + + if i ==1 && j ==1 + ax = gca; + end + + end +end +if ~isempty(p.title) +suptitle(p.title) +end + +handles = [h(1, :), d(1,:)]; +lg1 = cell(1, nMS); +lg2 = cell(1, nMS); +for i = 1:nMS + lg1{i} = [mn{i} ' sample mean']; + lg2{i} = [mn{i} ' exp data']; +end + +legstr = [lg1, lg2]; + legend(ax, handles,legstr, 'Location', 'NorthWest'); + +end + + diff --git a/mcmc_simbio/src/plotEstimTraces_ver201711.m b/mcmc_simbio/src/plotEstimTraces_ver201711.m new file mode 100644 index 0000000..0fea061 --- /dev/null +++ b/mcmc_simbio/src/plotEstimTraces_ver201711.m @@ -0,0 +1,159 @@ +function [f, meanconc, stdconc , idxnotused] = ... + plotEstimTraces_ver201711(m,em,ts, datmat, ds, ms, varargin) +%plotEstimTraces_ver201711 based on the function plotEstimTraces, except +%measured species is as used in the mcmc toolbox. Ie, ms is a cell array of +%cell arrays. +% +% m is the MCMC data, either in 3D or 2D +% ts = tspan +% datmat = data matrix +% em = exported model object +% ds = dosing strat +% ms = measured species + + + +p = inputParser; +p.addParameter('title',[] ,@ischar) +p.addParameter('nplot',150,@isnumeric); +p.addParameter('colorseq', [7, 4, 6, 1, 8, 9, 10],... + @isnumeric);... 
+ % colorscheme: meangirls (RNA), rainforest (GFP), swamplands (CFP), +% bog (YFP), ... +p.addParameter('Visible', 'on', @ischar) +p.parse(varargin{:}); +p=p.Results; + + +if ndims(m) == 3 + m = m(:,:)'; +end + +idx = randperm(size(m, 1), p.nplot); + + +% extract dose matrix and dose names from dosing strat +nICs = length(ds(1).concentrations); +nDSP = length(ds); +dn = cell(length(ds)); +for i = 1:length(ds) + dn{i} = ds(i).species; +end +dose = zeros(nICs , nDSP); +for j = 1:nICs + for i = 1:nDSP + dose(j,i) = ds(i).concentrations(j); + end +end + +nts = length(ts); +nMS = length(ms); +% Compute sample trajectories, max values for axis limits, means, +% standard deviations +meanconc = zeros(nts, nMS, nICs); +stdconc = zeros(nts, nMS, nICs); +samp = zeros(nts, nMS, p.nplot, nICs); + +idxnotused = zeros(p.nplot, nICs); +for i = 1:nICs + for kk=1:p.nplot + try + sd = simulate(em, [exp(m(idx(kk),:)'); dose(i,:)']); + sd = resample(sd, ts); + + % since measured species is a cell array of cell arrays of strings, + % we need to loop over the outer cell, summing the values of the + % species in the inner cell. + for ss = 1:nMS + [~, XX] = selectbyname(sd, ms{ss}); + X = sum(XX, 2); + % output the relevant data to the samples + samp(:,ss,kk, i) = X; + + end + + + + + + catch ME + ME.message + idx(kk); + idxnotused(kk, i) = idx(kk); + end + end + % compute mean and std over the p.nplot dimension, + % for each timepoint, IC and + % each species + kknotused = find(idxnotused(:,i)); + for j = 1:nMS + meanconc(:,j, i) = ... + mean(samp(:,j,setdiff(1:(p.nplot),kknotused), i), 3); + stdconc(:, j, i) = ... 
+ std(samp(:,j,setdiff(1:(p.nplot), kknotused), i),0, 3); + end +end + +%% + +% + +cc = colorschemes; +f = figure('Visible', p.Visible); +% ss = get(0, 'screensize'); +% set(gcf, 'Position', [50 100 ss(3)/1.1 ss(4)/1.3]); +% Compute species maxes for plotting +mxtemp = max(max(meanconc + stdconc, [], 1), [],3); +mxtemp = max(mxtemp, max(datmat)); % dm is the data matrix, +% DIM1:time, DIM2:measured species +maxsp = mxtemp'; + +h = zeros(nICs, nMS); +ptch = zeros(nICs, nMS); +d = zeros(nICs, nMS); + +csq = p.colorseq; + +for i = 1:nICs + for j = 1:nMS + subplot(nICs, nMS,nMS*(i-1)+j); + [h(i, j), ptch(i, j)] = boundedline(ts/3600, meanconc(:,j, i),... + stdconc(:, j, i)); + set(ptch(i, j), 'FaceColor', cc{2,csq(j)}(1,:), 'FaceAlpha', 0.5); + set(h(i, j), 'Color', cc{2,csq(j)}(2,:), 'LineStyle', '--'); + hold on + set(h(i, j), 'LineWidth', 2) + d(i, j)=plot(ts/3600,datmat((i-1)*nts + 1 : (i)*nts,j),... + 'color',cc{2,csq(j)}(3,:) ,'linewidth',2); + hold on + set(gca, 'Ylim', [0, round(maxsp(j)*1.1, -(order(maxsp(j)*1.1)-2))]) + xlabel('time/h') + ylabel('conc/nM') + +% title(sprintf('%s conc, dosed %s = %0.2g nM', mn{j}, dn{1}, dose(i,1))) + % right now I can only support a single species dose. Need to come up + % with an elegant way of putting all the dosing information in the + % title or in floating text. + if i ==1 && j ==1 + ax = gca; + end + end +end +% if ~isempty(p.title) +% suptitle(p.title) +% end + +% handles = [h(1, :), d(1,:)]; +% lg1 = cell(1, nMS); +% lg2 = cell(1, nMS); +% for i = 1:nMS +% lg1{i} = [mn{i} ' sample mean']; +% lg2{i} = [mn{i} ' exp data']; +% end + +% legstr = [lg1, lg2]; +% legend(ax, handles,legstr, 'Location', 'NorthWest'); + +end + + diff --git a/mcmc_simbio/src/plotEstimTraces_ver201801.m b/mcmc_simbio/src/plotEstimTraces_ver201801.m new file mode 100644 index 0000000..e745aea --- /dev/null +++ b/mcmc_simbio/src/plotEstimTraces_ver201801.m @@ -0,0 +1,174 @@ +function [f, meanconc, stdconc , idxnotused] = ... 
+ plotEstimTraces_ver201801(m,em,ts, datmat, ds, ms, varargin) +%plotEstimTraces_ver201801 based on the function plotEstimTraces, except +%measured species is as used in the mcmc toolbox. I.e., ms is a cell array of +%cell arrays. +% +% m is the MCMC data, either in 3D or 2D +% ts = tspan +% datmat = data matrix of dimensions time by ms by replicates by doses +% em = exported model object +% ds = dosing strategy +% ms = measured species + +p = inputParser; +p.addParameter('title',[] ,@ischar) +p.addParameter('nplot',150,@isnumeric); +p.addParameter('colorseq', [7, 4, 6, 1, 8, 9, 10],... + @isnumeric);... + % colorscheme: meangirls (RNA), rainforest (GFP), swamplands (CFP), +% bog (YFP), ... +p.addParameter('Visible', 'on', @ischar) +p.parse(varargin{:}); +p=p.Results; + + +if ndims(m) == 3 + m = m(:,:)'; +end + +idx = randperm(size(m, 1), p.nplot); + + +% extract dose matrix and dose names from dosing strategy +nICs = length(ds(1).concentrations); +nDSP = length(ds); % number of dosed species per combination +dn = cell(length(ds)); % dose names (i.e., the species that get dosed) +for i = 1:length(ds) + dn{i} = ds(i).species; +end +dose = zeros(nICs , nDSP); % # of dose combos x number of species per combo +for j = 1:nICs + for i = 1:nDSP + dose(j,i) = ds(i).concentrations(j); + end +end + +nts = length(ts); % number of timepoints +nMS = length(ms); % number of measured species. +% Compute sample trajectories, max values for axis limits, means, +% standard deviations +meanconc = zeros(nts, nMS, nICs); +stdconc = zeros(nts, nMS, nICs); +samp = zeros(nts, nMS, p.nplot, nICs); + +idxnotused = zeros(p.nplot, nICs); +for i = 1:nICs + for kk=1:p.nplot + try + sd = simulate(em, [exp(m(idx(kk),:)'); dose(i,:)']); + sd = resample(sd, ts); + % since measured species is a cell array of cell arrays of strings, + % we need to loop over the outer cell, summing the values of the + % species in the inner cell. 
+ for ss = 1:nMS + [~, XX] = selectbyname(sd, ms{ss}); + X = sum(XX, 2); + % output the relevant data to the samples + samp(:,ss,kk, i) = X; + end + catch ME + ME.message + idx(kk); + idxnotused(kk, i) = idx(kk); + end + end + % compute mean and std over the p.nplot dimension, + % for each timepoint, IC and + % each species + kknotused = find(idxnotused(:,i)); + for j = 1:nMS + meanconc(:,j, i) = ... + mean(samp(:,j,setdiff(1:(p.nplot),kknotused), i), 3); + stdconc(:, j, i) = ... + std(samp(:,j,setdiff(1:(p.nplot), kknotused), i),0, 3); + end +end + +%% + +% + +cc = colorschemes; +f = figure('Visible', p.Visible); +% ss = get(0, 'screensize'); +% set(gcf, 'Position', [50 100 ss(3)/1.1 ss(4)/1.3]); +% Compute species maxes for plotting +mxtemp = max(max(meanconc + stdconc, [], 1), [],3); +mxtemp = max(mxtemp, max(max(max(datmat, [], 1), [], 3),[],4)); % dm is the data matrix, +% DIM1:time, DIM2:measured species +maxsp = squeeze(mxtemp)'; + +h = zeros(nICs, nMS); +ptch = zeros(nICs, nMS); +d = zeros(nICs, nMS); +hdata= zeros(nICs, nMS); +ptchdata = zeros(nICs, nMS); +csq = p.colorseq; + +if size(datmat, 3)>1 + hasreplicates = true; +else + hasreplicates = false; + +end + +for i = 1:nICs + for j = 1:nMS + subplot(nICs, nMS,nMS*(i-1)+j); + [h(i, j), ptch(i, j)] = boundedline(ts/3600, meanconc(:,j, i),... + stdconc(:, j, i)); + set(ptch(i, j), 'FaceColor', cc{2,csq(j)}(1,:), 'FaceAlpha', 0.5); + set(h(i, j), 'Color', cc{2,csq(j)}(2,:), 'LineStyle', '--'); + hold on + set(h(i, j), 'LineWidth', 1) + +% the data can have replicates, and therefore have a mean and a +% standard deviation. + + if hasreplicates + % compute standard deviation of the data and plot the shaded region + [hdata(i, j), ptchdata(i, j)] = boundedline(... + ts/3600, mean(datmat(:,j, :, i), 3),... 
+ std(datmat(:,j, :, i),0, 3)); + set(ptchdata(i, j), 'FaceColor', cc{2,csq(j)}(4,:), 'FaceAlpha', 0.5); + set(hdata(i, j), 'Color', cc{2,csq(j)}(3,:), 'LineStyle', '-'); + hold on + set(hdata(i, j), 'LineWidth', 2) + else + d(i, j)=plot(ts/3600,mean(datmat(:,j, :, i), 3),... + 'color',cc{2,csq(j)}(3,:) ,'linewidth',2); + end + + hold on + set(gca, 'Ylim', [0, round(maxsp(j)*1.1, -(order(maxsp(j)*1.1)-2))]) + xlabel('time/h', 'FontSize', 14) + ylabel('conc/nM', 'FontSize', 14) + +% title(sprintf('%s conc, dosed %s = %0.2g nM', mn{j}, dn{1}, dose(i,1))) + % right now I can only support a single species dose. Need to come up + % with an elegant way of putting all the dosing information in the + % title or in floating text. + if i ==1 && j ==1 + ax = gca; + end + end +end +if ~isempty(p.title) +suptitle(p.title) +end + +% handles = [h(1, :), d(1,:)]; +% lg1 = cell(1, nMS); +% lg2 = cell(1, nMS); +% for i = 1:nMS +% lg1{i} = [mn{i} ' sample mean']; +% lg2{i} = [mn{i} ' exp data']; +% end + +% legstr = [lg1, lg2]; +% legend(ax, handles,legstr, 'Location', 'NorthWest'); + +end + + diff --git a/mcmc_simbio/src/plotOrigTraces008.m b/mcmc_simbio/src/plotOrigTraces008.m new file mode 100644 index 0000000..32dbb96 --- /dev/null +++ b/mcmc_simbio/src/plotOrigTraces008.m @@ -0,0 +1,190 @@ +function plotOrigTraces008(m, genemodel, ICarray, tspan,... + spconc1, spconc2, varargin) +%plotOrigTraces008 Sample from the posterior distributions, and generate +%time traces, means, standard deviations. +% Optional Capability: if true parameters are provided, plot the true data too. 
+% +% +% +% +% +nSp = 3; +if ndims(m) == 3 + m = m(:,:)'; +end + +p = inputParser; +p.addParameter('mode','samplesonly',@ischar); % 'samplesonly', 'trueparams' +p.addParameter('sampling', 'all', @ischar) % 'all', 'MAPint, 'percentiles' +p.addParameter('paramID',[10 12 13],@isnumeric); +p.addParameter('titlestr',[] ,@ischar) +p.addParameter('ncurves',200 ,@isnumeric) +p.parse(varargin{:}); +p=p.Results; + +ncurves = p.ncurves; + +if strcmp(p.sampling, 'all') + idx = ceil(rand(ncurves, 1)*size(m,1)); +elseif strcmp(p.sampling, 'percentiles') + %!TODO, percentiles +elseif strcmp(p.sampling, 'MAPint') + %!TODO: MAP intervals, some mass around +end + +nICs = size(ICarray,1); +% compute the mean and the axis limits + +estimconc1 = zeros(length(tspan),nSp, nICs , ncurves); +estimconc2 = zeros(length(tspan),nSp, nICs , ncurves); +maxRNA_sd = 0; +maxGFP_sd = 0; +for i = 1:nICs + for kk=1:ncurves + sp0 = ICarray(i,:); + [~,estimconc1(:,:, i, kk)] = genemodel(m(idx(kk),[1:4 7 8]), sp0, tspan); + [~,estimconc2(:,:, i, kk)] = genemodel(m(idx(kk),[1:2 5:8]), sp0, tspan); + end + % compute mean and std + mean1 = mean(estimconc1, 4); % mean in dim 4 + mean2 = mean(estimconc2, 4); + std1 = std(estimconc1, 0, 4); % SD in dim 4 + std2 = std(estimconc2, 0, 4); + maxRNA_sd = max([maxRNA_sd, max(mean1(:,2, i)+std1(:,2, i)), max(mean2(:,2,i)+std2(:,2,i))]); + maxGFP_sd = max([maxGFP_sd, max(mean1(:,3,i)+std1(:,3,i)), max(mean2(:,3,i)+std2(:,3,i))]); +end + +maxRNA_curves = max(max(max(max([estimconc1(:,2,:,:), estimconc2(:,2,:,:)])))); +maxGFP_curves = max(max(max(max([estimconc1(:,3,:,:), estimconc2(:,3,:,:)])))); + +cc = colorschemes; +% figure +% ss = get(0, 'screensize'); +% set(gcf, 'Position', [50 100 ss(3)/1.1 ss(4)/1.3]); +% plot the mean and individual curves +% for i = 1:nICs +% for kk=1:ncurves +% subplot(nICs, 4,4*(i-1)+1); +% h(1)=plot(tspan,estimconc1(:,2,i,kk),'color',[.6 .35 .3].^.2); +% hold on +% subplot(nICs, 4,4*(i-1)+2); +% 
h(2)=plot(tspan,estimconc1(:,3,i,kk),'color',[.2 .75 .2].^.2); +% hold on +% subplot(nICs, 4,4*(i-1)+3); +% h(3)=plot(tspan,estimconc2(:,2,i,kk),'color',[.6 .35 .3].^.2); +% hold on +% subplot(nICs, 4,4*(i-1)+4); +% h(4)=plot(tspan,estimconc2(:,3,i,kk),'color',[.2 .75 .2].^.2); +% hold on +% end +% subplot(nICs, 4,4*(i-1)+1); +% h(5)=plot(tspan,spconc1(:,2,i),'r','linewidth',2); +% h(9) = plot(tspan,mean1(:,2,i),'color',[.6 .35 .3],'linewidth',2, 'LineStyle',':'); +% title(sprintf('%s, E%d, DNA = %0.2g', 'RNA', 1, ICarray(i,1))) +% set(gca, 'Ylim', [0, round(maxRNA_curves+5)]) +% hold on +% subplot(nICs, 4,4*(i-1)+2); +% h(6)=plot(tspan,spconc1(:,3,i),'g','linewidth',2); +% h(10) = plot(tspan,mean1(:,3,i),'color',[.2 .75 .2],'linewidth',2, 'LineStyle',':'); +% title(sprintf('%s, E%d, DNA = %0.2g', 'GFP', 1, ICarray(i,1))) +% set(gca, 'Ylim', [0, round(maxGFP_curves+5)]) +% hold on +% subplot(nICs, 4,4*(i-1)+3); +% h(7)=plot(tspan,spconc2(:,2,i),'r','linewidth',2); +% h(11) = plot(tspan,mean2(:,2,i),'color',[.6 .35 .3],'linewidth',2, 'LineStyle',':'); +% title(sprintf('%s, E%d, DNA = %0.2g', 'RNA', 2, ICarray(i,1))) +% set(gca, 'Ylim', [0, round(maxRNA_curves+5)]) +% hold on +% subplot(nICs, 4,4*(i-1)+4); +% h(8)=plot(tspan,spconc2(:,3,i),'g','linewidth',2); +% h(12) = plot(tspan,mean2(:,3,i),'color',[.2 .75 .2],'linewidth',2, 'LineStyle',':'); +% title(sprintf('%s, E%d, DNA = %0.2g', 'DNA', 2, ICarray(i,1))) +% set(gca, 'Ylim', [0, round(maxGFP_curves+5)]) +% hold on +% end +% +% axis tight +% legend([h(1), h(2), h(5), h(6), h(9), h(10)],'RNA Samples','GFP Samples',... 
+% 'RNA True', 'GFP True', 'RNA mean', 'GFP mean') +% suplabel('Plots using parameter estimates' ,'t'); + +% plot the mean and standard deviation +figure +ss = get(0, 'screensize'); +set(gcf, 'Position', [50 100 ss(3)/1.8 ss(4)/1.8]); +for i = 1:nICs + % compute mean and std + + subplot(nICs, 4,4*(i-1)+1); + h(5)=plot(tspan,spconc1(:,2,i),'color',cc{2,7}(2,:) ,'linewidth',2); + hold on +% [h(1), ptch(1)] = boundedline(tspan, mean1(:,2,i), std1(:,2,i)); +% set(ptch(1), 'FaceColor', cc{2,7}(1,:), 'FaceAlpha', 0.5); +% set(h(1), 'Color', cc{2,7}(2,:), 'LineStyle', '--'); +% hold on +% set(h(1), 'LineWidth', 2) + + set(gca, 'Ylim', [0, round(maxRNA_sd+5)]) + title(sprintf('%s, E%d, DNA = %0.2g', 'RNA', 1, ICarray(i,1)), 'FontSize', 16) + + subplot(nICs, 4,4*(i-1)+2); + h(6)=plot(tspan,spconc1(:,3,i),'color',cc{2,9}(2,:),'linewidth',2); + title(sprintf('%s, E%d, DNA = %0.2g', 'GFP', 1, ICarray(i,1)), 'FontSize', 16) + hold on +% [h(2), ptch(2)] = boundedline(tspan, mean1(:,3,i), std1(:,3,i)); +% set(ptch(2), 'FaceColor', cc{2,9}(3,:), 'FaceAlpha', 0.5); +% set(h(2), 'Color', cc{2,9}(2,:), 'LineStyle', '--'); +% hold on +% set(h(2), 'LineWidth', 2) + + set(gca, 'Ylim', [0, round(maxGFP_sd+5)]) + + subplot(nICs, 4,4*(i-1)+3); +% if i ==1 +% h(1)=plot(tspan,spconc2(:,2,i),'color',cc{2,7}(2,:) ,'linewidth',2); +% else + h(7)=plot(tspan,spconc2(:,2,i),'color',cc{2,7}(2,:) ,'linewidth',2); +% end + title(sprintf('%s, E%d, DNA = %0.2g', 'RNA', 2, ICarray(i,1)), 'FontSize', 16) + hold on +% [h(3),ptch(3)] = boundedline(tspan, mean2(:,2,i), std2(:,2,i)); +% set(ptch(3), 'FaceColor', cc{2,7}(1,:), 'FaceAlpha', 0.5); +% set(h(3), 'Color', cc{2,7}(2,:), 'LineStyle', '--'); +% hold on +% set(h(3), 'LineWidth', 2) + + set(gca, 'Ylim', [0, round(maxRNA_sd+5)]) + + + subplot(nICs, 4,4*(i-1)+4); +% if i ==1 +% h(2)=plot(tspan,spconc2(:,3,i),'color',cc{2,9}(2,:),'linewidth',2); +% else + h(8)=plot(tspan,spconc2(:,3,i),'color',cc{2,9}(2,:),'linewidth',2); +% end + title(sprintf('%s, E%d, 
DNA = %0.2g', 'GFP', 2, ICarray(i,1)), 'FontSize', 16) + hold on +% [h(4), ptch(4)] = boundedline(tspan, mean2(:,3,i), std2(:,3,i)); +% set(ptch(4), 'FaceColor', cc{2,9}(3,:), 'FaceAlpha', 0.5); +% set(h(4), 'Color', cc{2,9}(2,:), 'LineStyle', '--'); +% hold on +% set(h(4), 'LineWidth', 2) + + set(gca, 'Ylim', [0, round(maxGFP_sd+5)]) + if i ==1 + ax = gca; + end +end + +% +% axis tight +% suplabel('Plots using parameter estimates' ,'t'); + [ax1,h1] = suplabel('time, arbitrary units' ); + [ax2,h2] = suplabel('concentration, arbitrary units' ,'y'); + set(h1,'FontSize',20) + set(h2,'FontSize',20) + legend(ax, [h(7), h(8)],{'RNA True', 'GFP True'}, 'FontSize', 14, 'Location', 'NorthEast') + + + +end diff --git a/mcmc_simbio/src/project_init.m b/mcmc_simbio/src/project_init.m new file mode 100644 index 0000000..1181a80 --- /dev/null +++ b/mcmc_simbio/src/project_init.m @@ -0,0 +1,74 @@ +function [tstamp, specificprojdir, st] = project_init(varargin) +% this function is to be called at the beginning of the project script stored in the projects +% directory of the MCMC toolbox. +% The script calling this should be a project file (.m) that sets up the data, models, +% MCMC simulation, runs it, and optionally plots the results. +% The main function of this file is that every time a given project is run, a directory +% created (using a timestamp) where the results of that run are stored. +% +% This function can be run without any arguments, in which case it will check if a directory +% with the same name as the project that calls it exists in the projects folder. If it does, +% then all the simulation data is stored in timestamped subdirectories within this directory, +% and this directory is what gets checked first for the various things like models, +% experimental data, est_info structs, model_map structs, etc. If any of these things are +% not present in this directory, then the models or data directories in the main directory +% are checked. 
If those directories also do not contain the files required, an error is thrown. +% If the directory does not exist, then a directory with the same name as the project file +% gets created, and all the models/ data etc are looked for in the main directories. This +% directory will be where the simulation data will be stored. +% +% On the other hand, if a string argument is supplied, then the above applies, but instead of +% the project name, we use this string as the name. +% +% The first output of this function (tstamp) is a string that specifies the timestamp, for the +% purposes of identifying the directory created. + + + +% get the name of the (project) file calling this function, and its path. +st = dbstack(1,'-completenames'); + +fp = st(1).file; +slashes = regexp(fp, '/'); +projdir = fp(1:slashes(end)-1); % full path to the directory containing the +% function/script that called this function, without the final slash .../dirname +projname = st(1).name; % name of the function / script that called this function +tstamp = datestr(now, 'yyyymmdd_HHMMSS'); + +% get the optional input (the name of the directory where things will be stored) +p = inputParser; +addOptional(p, 'proj', projname, @ischar); % +p.parse(varargin{:}); +p=p.Results; +specificprojdir = [projdir '/' p.proj]; +% display some output text +disp(sprintf('############################################ \n')) +disp(sprintf('File and directory info:\n')) +disp(sprintf('Project name: \n ''%s'' \n', projname)) +disp(sprintf('Directory where the project file is stored: \n ''%s'' \n', projdir)) +disp(sprintf('Directory where data will be stored: \n ''%s'' \n', specificprojdir)) +disp(sprintf('Timestamp for this run (yyyymmdd_HHMMSS): \n ''%s'' \n', tstamp)) + +% check if the project directory already exists, and create it if it does not. +if exist( specificprojdir ,'dir') + disp(sprintf(['Project directory already exists, using this to store data\n' ... + ' (in a subdirectory named ''%s''). 
\n'], ['simdata_' tstamp])); + + addpath(specificprojdir); + addpath(genpath(specificprojdir)); + rmpath(genpath(specificprojdir)); + mkdir([specificprojdir '/simdata_' tstamp]); + addpath([specificprojdir '/simdata_' tstamp]); +else + mkdir(specificprojdir); + mkdir([specificprojdir '/simdata_' tstamp]); + addpath([specificprojdir '/simdata_' tstamp]); + disp(sprintf(['Project directory does not exist. Creating it,' ... + ' and using this to store data\n' ... + ' (in a subdirectory named ''%s''). \n'], ['simdata_' tstamp])); + +end +disp(sprintf('############################################ \n')) + +end + diff --git a/mcmc_simbio/src/rebuildMasterVec.m b/mcmc_simbio/src/rebuildMasterVec.m new file mode 100644 index 0000000..4ab520c --- /dev/null +++ b/mcmc_simbio/src/rebuildMasterVec.m @@ -0,0 +1,38 @@ +function [mv] = rebuildMasterVec(rmv, mai) + % rmv = reduced master vector or array of vectors + % mai: master_info struct + % mv = rebuilt master vector. + + % sg = cell array of semantic group vectors + sg = mai.semanticGroups; + mv = zeros(size(mai.paramRanges, 1), size(rmv, 2)); + % size(rmv, 2) is not necessarily 1: it can also equal the + % number of walkers. For example, minit can be + % fed into this function, as done by integrableLHS_v2.m + + for i = 1:length(sg) + + % build the matrix of semanticGroup normalizations + % This is very interesting! can write a paper about this. + % data initialization structure. + % This is a cool bit of code! Very non intuitive why I am doing + % this here. Worth explaining somewhere for future reference.... + % but am I going to? 
+ % + % + % +% y = datasample(s,1:2,2,'Replace',false) + % + multfact = ones(numel(sg{i}), size(rmv, 2)); + for k = 1:size(rmv, 2) + y = datasample(1:numel(sg{i}),numel(sg{i}),'Replace',false); + for j = 1:length(y) + % work on the y(j)th index + multfact(y(j), k) = (-0.25 + 0.5*rand(1) + 0.74)^(y(j)-1); + + end + end + + mv(sg{i}, :) = multfact.*repmat(rmv(i, :), numel(sg{i}), 1); + end +end diff --git a/mcmc_simbio/src/reduceMasterVec.m b/mcmc_simbio/src/reduceMasterVec.m new file mode 100644 index 0000000..2d80ae2 --- /dev/null +++ b/mcmc_simbio/src/reduceMasterVec.m @@ -0,0 +1,35 @@ +function [rmv, rpr, sgnames] = reduceMasterVec(master_info) + % in mcmc_info = mcmc_info_constgfp3ii(modelObj), we have the note: + % semanticGroups = {1, [2 4] [3 5]}; % cant do this, then the points never +% get differentiated at all. need some jitter. think about this actually. +% +% return to that and try some things out. % working on this +% nov 2018 +% +% Copyright, Vipul Singhal + mv = master_info.masterVector; + estParamsIx = setdiff((1:length(mv))', master_info.fixedParams); + logp = mv(estParamsIx); + % reduce the logp (take only the first element for each group) + % nreduc = sum(cellfun(@numel, master_info.semanticGroups)) ... 
+ % - numel(master_info.semanticGroups); + % reducedLength = length(logp) - nreduc; + reducedLength = length(master_info.semanticGroups); + % reduced master vector (actually the to-be-estimated part of it) + rmv = zeros(reducedLength, 1); + % reduced parameter ranges + pr = master_info.paramRanges; + rpr = zeros(reducedLength, 2); + sgnames = cell(reducedLength, 1); % list of names for the semantic groups + + mastnames = master_info.estNames; + for i = 1:reducedLength + sgi = master_info.semanticGroups{i}; % ith semantic group + rmv(i) = logp(sgi(1)); + rpr(i, :) = [max(pr(sgi, 1)) min(pr(sgi, 2))]; + sgnames{i} = mastnames{sgi(1)}; % the first name in the group + end + + + +end diff --git a/mcmc_simbio/src/sampleIntersections.m b/mcmc_simbio/src/sampleIntersections.m new file mode 100644 index 0000000..7e8e53f --- /dev/null +++ b/mcmc_simbio/src/sampleIntersections.m @@ -0,0 +1,43 @@ +function [Ikeep, bds, mx, mp] = sampleIntersections(m, colIdx, varargin) +%sampleIntersections Indices of the MCMC samples to use for plotting +% generate sample intersections using either custom bounds, percentile +% bounds, or MAP estimate mass bounds +% m is a numSamples x numParameters matrix +% colIdx holds the indices of the parameters to use + +p = inputParser; +p.addParameter('mode','MAPmass',@ischar); % other modes: percentile, custom +p.addParameter('npoints',100,@isnumeric); +p.addParameter('bw',[],@isnumeric); +p.addParameter('mass',0.2,@isnumeric); + +p.parse(varargin{:}); +p=p.Results; + +if strcmp(p.mode, 'MAPmass') + bds = zeros(2, length(colIdx)); + mx = zeros(1, length(colIdx)); + mp = zeros(1, length(colIdx)); + Ikeep = (1:size(m, 1))'; + for i = 1:length(colIdx) + if isempty(p.bw) + [bds(:,i), mx(i), mp(i) ] = MPinterval(m(:,colIdx(i)), 'npoints', p.npoints,... + 'mass', p.mass); + else + [bds(:,i), mx(i), mp(i) ] = MPinterval(m(:,colIdx(i)), 'npoints', p.npoints,... 
+ 'mass', p.mass, 'bw', p.bw); + BW(i) = p.bw; + end + + + Inew = find(m(:,colIdx(i))>bds(1,i) & m(:,colIdx(i)) one different initial values vector) -data.x0 = repmat(data.x0,1,size(speciesInitialValues,2)); -data.x0(speciesInd,:) = speciesInitialValues; - -% which species are subject to initial value estimation -%data.x0_mapping = speciesInd; -data.x0_mapping = []; -% adjusting parameter vector size based on the initial values -if ~isempty(data.x0_mapping) - data.p0 = [data.p0; data.x0(speciesInd,:)']; -end - -% select parameters for estimation, list is given by parseGetEqOutput (parameterNames field) -ParametersToEstimate = {'TXTL_TL_rate',... - 'TXTL_UTR_RBS_F',... - 'TXTL_transcription_rate1' - }; - -pSelect = cellfun(@(x) findStringInAList(dataOut.parameterNames,x),ParametersToEstimate); - -% selecting parameters of interest -data.p_mapping = pSelect; - -gfp = findStringInAList(dataOut.speciesNames,'[protein deGFP*]'); - -% cost function target species -data.targetSpecies = gfp; -% conversion btw nM vs uM -data.speciesScaleFactor = 0.001; -% ploting simulation results with the origianl data in each step -data.debugMode =0; -% output check -data.outputCheck = gfp; - -% parameter perturbation -% select parameters to perturb: -parametersToPerturb = data.p_mapping; -data.parameterPerturbNum = 10; -% perturbation range in percentage -data.parameterPerturbRange = 0.5; -% give the indexes of parameters are subject to parameter estimation - -data.parameterPerturbList = find(ismember(data.p_mapping,parametersToPerturb) == 1); -data.notParameterPerturbList = find(ismember(data.p_mapping,parametersToPerturb) == 0); - -% parameter dependency rules -ntp_consumption_rate = findStringInAList(dataOut.parameterNames,'TXTL_NTP_consumption'); -aa_consumption_rate = findStringInAList(dataOut.parameterNames,'TXTL_TL_AA_consumption'); -K_tx = findStringInAList(dataOut.parameterNames,'TXTL_transcription_rate1'); -K_tl = findStringInAList(dataOut.parameterNames,'TXTL_TL_rate'); 
-data.p_dependecies{1} = sprintf('p(%d)=9*p(%d)',ntp_consumption_rate,K_tx); -data.p_dependecies{2} = sprintf('p(%d)=2*p(%d)',aa_consumption_rate,K_tl); - -options = psoptimset('TolMesh',1e-8,'MaxIter',2000); - -% parameters subject to parameter estimation -pVec = data.p0(data.p_mapping); - -% default generate lower and upper bounds -default_lb = 0; -default_ub = 100; - -LB = repmat(default_lb,size(pVec),1); -UB = repmat(default_ub,size(pVec),1); - -% handle perturbed initial parameter cases -% Latin Hypercube sampling from a uniform distribution -p0Array = zeros(size(data.parameterPerturbList,2)+size(data.notParameterPerturbList,2),data.parameterPerturbNum); -p0Array(data.parameterPerturbList,:) = lhsu(pVec(data.parameterPerturbList)-data.parameterPerturbRange.*pVec(data.parameterPerturbList),(1+data.parameterPerturbRange).*pVec(data.parameterPerturbList),data.parameterPerturbNum)'; -p0Array(data.notParameterPerturbList,:) = repmat(pVec(data.notParameterPerturbList),1,data.parameterPerturbNum); - - -for k = 1:data.parameterPerturbNum - -[x(:,k),fval(k),exitflag(k),output(k)] = ... - patternsearch(@(y)data.costFcn(y,data),p0Array(:,k),[],[],[],[],LB,UB,[],options); -end - - - - - - diff --git a/sysID/MCMC_example.m b/sysID/MCMC_example.m deleted file mode 100644 index 042ffd1..0000000 --- a/sysID/MCMC_example.m +++ /dev/null @@ -1,5 +0,0 @@ -% MCMC Parameter Estimation example using the TXTL toolbox -% This file demonstrates how to use the Goodman and Weare MCMC algorithm -% for performing Bayesian Parameter inference. - -% \ No newline at end of file diff --git a/tests/RNAdegradation_test.m b/tests/RNAdegradation_test.m deleted file mode 100644 index ac60de8..0000000 --- a/tests/RNAdegradation_test.m +++ /dev/null @@ -1,47 +0,0 @@ -% Accessing individual parameters: Mobj.UserData.ReactionConfig.RNase_F for example. -% Accessing initial concentrations. Mobj.species(2).initialAmount (for the -% second species. can use findspecies to get tthe index). 
- -% We do a parameter study of the RNA degradation. - -% Nominal Parameters -close all; clear all; clc -tube1 = txtl_extract('E30VNPRL'); -tube2 = txtl_buffer('E30VNPRL'); -tube3 = txtl_newtube('gene_expression'); -dna_deGFP = txtl_add_dna(tube3, 'p70(50)', 'rbs(20)', 'deGFP(1000)', 0, 'plasmid'); - -Mobj = txtl_combine([tube1, tube2, tube3]); -txtl_addspecies(Mobj, 'RNA rbs--deGFP', 200); - - - -[simData] = txtl_runsim(Mobj,14*60*60); -t_ode = simData.Time; -x_ode = simData.Data; -RNA1 = findspecies(Mobj, 'RNA rbs--deGFP'); -RNA2 = findspecies(Mobj, 'Ribo:RNA rbs--deGFP'); -RNA3 = findspecies(Mobj, 'AA:AGTP:Ribo:RNA rbs--deGFP') ; -RNA4 = findspecies(Mobj, 'RNA rbs--deGFP:RNase') ; -RNA5 = findspecies(Mobj, 'AA:AGTP:Ribo:RNA rbs--deGFP:RNase') ; -RNA6 = findspecies(Mobj, 'Ribo:RNA rbs--deGFP:RNase') ; - -figure -subplot(2,1,1) -semilogy(t_ode/60, x_ode(:,RNA1)+x_ode(:,RNA2)+x_ode(:,RNA3)+x_ode(:,RNA4)+x_ode(:,RNA5)+x_ode(:,RNA6)) -axis([0.01 240 0.01 300]) -title('logy, RNA degradation, initial RNA conc = 200nM') - -subplot(2,1,2) -plot(t_ode/60, x_ode(:,RNA1)+x_ode(:,RNA2)+x_ode(:,RNA3)+x_ode(:,RNA4)+x_ode(:,RNA5)+x_ode(:,RNA6)) -axis([0 240 0.01 300]) -title('RNA degradation, initial RNA conc = 200nM') - -% The graph has an initial half life of 15 min, about what we want, and this half life stays at this value. -% This is pretty good. What we would like is to get the number closer to -% 12 min. -% Lets see what happens when we change initial RNase conc, the k_deg, and the -% k_on and k_off. - -% Actually lets do this later. For now, this is good enough. We can try to get it close to 12 nM in the next iteration. 
- diff --git a/tests/atp_test.m b/tests/atp_test.m deleted file mode 100644 index ce11144..0000000 --- a/tests/atp_test.m +++ /dev/null @@ -1,30 +0,0 @@ -% ATP degradation work - -close all; clear all; clc -tube1 = txtl_extract('E30'); -tube2 = txtl_buffer('E30'); -tube3 = txtl_newtube('gene_expression'); -dna_deGFP = txtl_add_dna(tube3, 'p70(50)', 'rbs(20)', 'deGFP(1000)', 0, 'plasmid'); - -Mobj = txtl_combine([tube1, tube2, tube3]); - - - - -[simData] = txtl_runsim(Mobj,14*60*60); -t_ode = simData.Time; -x_ode = simData.Data; -ATP1 = findspecies(Mobj, 'ATP'); -ATP2 = findspecies(Mobj, 'AA:ATP:Ribo:RNA rbs--deGFP:RNase'); -ATP3 = findspecies(Mobj, 'AA:ATP:Ribo:RNA rbs--deGFP'); - -figure -subplot(2,1,1) -semilogy(t_ode/60, x_ode(:,ATP1)+x_ode(:,RNA2)+x_ode(:,RNA3)+x_ode(:,RNA4)+x_ode(:,RNA5)+x_ode(:,RNA6)) -axis([0 240 0.01 300]) -title('logy, RNA degradation, initial RNA conc = 200nM') - -subplot(2,1,2) -plot(t_ode/60, x_ode(:,ATP1)+x_ode(:,RNA2)+x_ode(:,RNA3)+x_ode(:,RNA4)+x_ode(:,RNA5)+x_ode(:,RNA6)) -axis([0 240 0.01 300]) -title('RNA degradation, initial RNA conc = 200nM') \ No newline at end of file diff --git a/tests/geneexpr_DNAsweep.m b/tests/geneexpr_DNAsweep.m deleted file mode 100644 index e45f278..0000000 --- a/tests/geneexpr_DNAsweep.m +++ /dev/null @@ -1,52 +0,0 @@ -close all; clear all - -% 30 nM initial DNA conc, match figure 1c in PRL paper. 
-tube1 = txtl_extract('E31VNPRL'); -tube2 = txtl_buffer('E31VNPRL'); -tube3 = txtl_newtube('gene_expression'); -dna_deGFP = txtl_add_dna(tube3,'p70(50)', 'rbs(20)', 'deGFP(1000)', 30, 'plasmid'); -Mobj = txtl_combine([tube1, tube2, tube3]); -[simData] = txtl_runsim(Mobj,14*60*60); -txtl_plot(simData,Mobj); -t_ode = simData.Time; - x_ode = simData.Data; -RNA1 = findspecies(Mobj, 'RNA rbs--deGFP'); -RNA2 = findspecies(Mobj, 'Ribo:RNA rbs--deGFP'); -RNA3 = findspecies(Mobj, 'AA:AGTP:Ribo:RNA rbs--deGFP') ; -RNA4 = findspecies(Mobj, 'RNA rbs--deGFP:RNase') ; -RNA5 = findspecies(Mobj, 'AA:AGTP:Ribo:RNA rbs--deGFP:RNase') ; -RNA6 = findspecies(Mobj, 'Ribo:RNA rbs--deGFP:RNase') ; -figure -plot(t_ode/60, x_ode(:,RNA1)+x_ode(:,RNA2)+x_ode(:,RNA3)+x_ode(:,RNA4)+x_ode(:,RNA5)+x_ode(:,RNA6)) -title('RNA levels, initial DNA conc = 30nM') - -count = 0; Mobj = cell(6,1); simData = cell(6,1); t_ode = cell(6,1); x_ode = cell(6,1); -for DNAinitial = [0.5 5 10 20 30 40] - count = count+1; - tube1 = txtl_extract('E31VNPRL'); - tube2 = txtl_buffer('E31VNPRL'); - tube3 = txtl_newtube('gene_expression'); - dna_deGFP = txtl_add_dna(tube3, 'p70(50)', 'rbs(20)', 'deGFP(1000)', DNAinitial, 'plasmid'); - Mobj{count} = txtl_combine([tube1, tube2, tube3]); - [simData{count}] = txtl_runsim(Mobj{count},8*60*60); - t_ode{count} = simData{count}.Time; - x_ode{count} = simData{count}.Data; -end -figure -colororder = lines; -for i = 1:6 - iGFP = findspecies(Mobj{i}, 'protein deGFP*'); - h(i) = plot(t_ode{i}/60, x_ode{i}(:,iGFP)); - hold on - set(h(i), 'Color', colororder(i,:), 'LineWidth', 1.5); - hold on -end - axis([0 55 0 1350]) - legend(h, {'0.5 nM', '5 nM', '10 nM', '20 nM', '30 nM', '40 nM'}, 'Location', 'NorthEastOutside') - title('deGFP expression as a function of initial DNA conc'); -xlabel('time, min') -ylabel('protein conc nM') -% Automatically use matlab mode in emacs (keep at end of file) -% Local variables: -% mode: matlab -% End: diff --git a/txtl_de_init.m b/txtl_de_init.m new file 
mode 100644 index 0000000..de9f5bd --- /dev/null +++ b/txtl_de_init.m @@ -0,0 +1,15 @@ +function txtldir = txtl_de_init +fp = mfilename('fullpath'); +slashes = regexp(fp, '/'); +filedir = fp(1:slashes(end)-1); +rmpath(filedir); +rmpath([filedir '/auxiliary']) +rmpath([filedir '/components']) +rmpath([filedir '/config']) +rmpath([filedir '/core']) +rmpath([filedir '/examples']) +rmpath([filedir '/examples/CompanionFiles']) +rmpath([filedir '/tests']) +rmpath([filedir '/data']) +txtldir = filedir; +end diff --git a/txtl_init.m b/txtl_init.m index 740b008..12c7043 100644 --- a/txtl_init.m +++ b/txtl_init.m @@ -1,9 +1,16 @@ -addpath([pwd '/auxiliary']) -addpath([pwd '/components']) -addpath([pwd '/config']) -addpath([pwd '/core']) -addpath([pwd '/examples']) -addpath([pwd '/examples/CompanionFiles']) -addpath([pwd '/modules/paramest']) -addpath([pwd '/tests']) -addpath([pwd '/data']) \ No newline at end of file +function txtldir = txtl_init +fp = mfilename('fullpath'); +slashes = regexp(fp, '/'); +filedir = fp(1:slashes(end)-1); +addpath(filedir); +addpath(genpath([filedir '/auxiliary'])) +addpath([filedir '/components']) +addpath([filedir '/config']) +addpath([filedir '/core']) +addpath([filedir '/examples']) +addpath([filedir '/examples/CompanionFiles']) +addpath([filedir '/tests']) +addpath([filedir '/data']) +addpath([filedir '/mcmc_simbio']) +txtldir = filedir; +end diff --git a/txtl_tutorial.m b/txtl_tutorial.m index 7d70410..7b32f7f 100644 --- a/txtl_tutorial.m +++ b/txtl_tutorial.m @@ -1,5 +1,6 @@ %% TXTL Tutorial % txtl_tutorial.m - basic usage of the TXTL modeling toolbox +% % Vipul Singhal, 28 July 2017 % % This file contains a simple tutorial of the TXTL modeling toolbox. You @@ -33,7 +34,7 @@ Mobj = txtl_combine([tube1, tube2, tube3]); % Run a simulaton -% +% % At this point, the entire experiment is set up and loaded into 'Mobj'. % So now we just use standard Simbiology and MATLAB commands to run % and plot our results!