Questions related to Bioinformatics and Computational Biology
I am looking for journals that publish newly developed tools, servers, web applications, or pipelines that are useful in biology, or newly curated databases of biological significance.
Can anyone suggest journals in Bioinformatics and Computational Biology that publish papers on:
- Bioinformatics Tools/Servers (Machine Learning, Deep Learning based or else)
- Text Mining
- Pipeline etc.
I know a few such as:
- Nucleic Acids Research
- Nature Scientific Data
- Nature Computational Science
- Briefings in Bioinformatics
- BMC Bioinformatics
- PLOS Computational Biology
- Journal of Cheminformatics
If you know more, kindly suggest the journal names. Thank you in advance.
I have been working on a protein-ligand complex simulation. Although I was careful in preparing the necessary files, including the .top and .gro files, mdrun stops with the error "2 particles communicated to PME rank 4 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x". An initial look into this issue suggested that the system is blowing up. I tried to troubleshoot it by reducing the time step, as suggested in the GROMACS documentation, but that did not resolve the problem. Could anybody give suggestions regarding this issue?
How can I determine which bacterial virulence factors (bacterial toxins or cell wall components) relevant to human sepsis or bacterial infection interact with or regulate my target protein of interest? I have tested LPS treatment in a dose- and time-dependent fashion, but I did not notice any difference in expression. Is there a commercially available panel of bacterial virulence factors, or can this be approached bioinformatically?
I am studying a simultaneous proton transfer, bond breakage, and nucleophilic attack (by a water molecule) using the umbrella sampling (US) approach, for which I have already performed a 5 ns QM/MM simulation.
All three reactions take place in a single step (inversion mechanism for a glycoside hydrolase). Now I am confused about how to define the restraint variables.
I have selected 4 Reaction Coordinates:
1. RC1: Proton transfer from Base residue to leaving group
OE1-HE1 -> C----O4 (this glycosidic bond breaks and HE1 is transferred to O4 )
So, the reaction coordinate for this reaction is difference in distance between OE1-HE1 and O4--HE1.
2. RC2: Glycosidic bond breakage:
C-----O4 -> C O4. The reaction coordinate for this reaction is the distance between C and O4
3. RC3: Nucleophilic attack by water:
H(i)O(w)H(w) [this is nucleophilic water] ---- C (anomeric carbon of the broken glycosidic bond). The reaction coordinate for this reaction is the distance between C and O(w).
4. RC4: Proton transfer from water (H(i)) to Acid Residue
H(i)O(w)H(w) -- OD1 (Acid residue). The reaction coordinate for this step is difference in distance between O(w)-H(i) and OD1-H(i).
For the RC2, I have made the following restraint file:
# distance restraint
&rst iat=8122,8132 r1=0, r2=1.8, r3=1.8, r4=5, rstwt=1,-1, rk2 = 500.0, rk3 = 500.0, /
I have increased the values of r2 and r3 in steps of 0.2, up to 3.4. I do not understand what the values of r1 and r4 should be. Could anyone please comment on this and explain it briefly?
I also do not understand how to write the restraint file for a difference in distances between two sets of atoms, as in the case of RC4 and RC1. It would be helpful if somebody could explain this too, with an example.
I also want to visualize all four reaction steps, so which trajectory files from the four RCs should I look at?
Since I am new to US, it would be a great help if somebody can guide me through this.
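To make the roles of r1 to r4 and of a "difference in distances" coordinate concrete, here is a minimal Python sketch. It assumes the flat-well form described in the AMBER manual for &rst restraints (zero between r2 and r3, parabolic between r1 and r2 and between r3 and r4, linear outside r1 and r4); please check that against the manual for your AMBER version. The function names and the coordinates in the usage note are hypothetical. In AMBER itself, a difference of two distances is usually written as a generalized distance coordinate (four atoms in iat together with rstwt=1,-1), and that syntax should also be verified in the manual.

```python
import math


def difference_rc(donor, hydrogen, acceptor):
    """Reaction coordinate d(donor-H) - d(acceptor-H) from three xyz tuples."""
    return math.dist(donor, hydrogen) - math.dist(acceptor, hydrogen)


def flat_well_energy(r, r1, r2, r3, r4, rk2, rk3):
    """Flat-well restraint energy (assumed AMBER-style form, see lead-in)."""
    if r < r1:
        # Linear continuation using the slope of the lower parabola at r1.
        return rk2 * (r1 - r2) ** 2 + 2.0 * rk2 * (r1 - r2) * (r - r1)
    if r < r2:
        return rk2 * (r - r2) ** 2
    if r <= r3:
        return 0.0
    if r <= r4:
        return rk3 * (r - r3) ** 2
    # Linear continuation using the slope of the upper parabola at r4.
    return rk3 * (r4 - r3) ** 2 + 2.0 * rk3 * (r4 - r3) * (r - r4)
```

With r2 = r3 at the window centre, and r1 and r4 placed far below and far above it, the restraint behaves as a plain harmonic well over the range the window actually samples. For example, `flat_well_energy(2.0, 0.0, 1.8, 1.8, 5.0, 500.0, 500.0)` evaluates the RC2 window from the question at a C-O4 distance of 2.0.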
Hi everyone! I'm trying to acquire the Raman spectra of a leaf section using confocal Raman spectroscopy. The samples are pure, dried, and powdered leaf samples. I am going to use a 785 nm laser source.
However, all I get is a spectrum with no peaks, or one that is strongly masked by fluorescence. Do you have any tricks or sample preparations to avoid the fluorescence, which I'm afraid covers the Raman signal, or to enhance the Raman signal, since the compounds might have a relatively weak Raman signal compared to the background and the fluorescence? Are there any sample preparations that can be done without water or an immersion objective, such as solid matrices that can be mixed with the sample? Thank you.
We were carrying out in vacuo energy minimization studies of a protein dimer (which is experimentally proven to be a dimer). Earlier, the same work was done in our lab using an older version of GROMACS (4.5.5) with the group cutoff scheme, coulomb type = cutoff, and no pbc.
When we restarted the work and had to use GROMACS 5.0.4, the default cutoff scheme had changed to Verlet. We observe that with the Verlet cutoff scheme the monomers dissociate from each other, which is not the case, even in this version, when using the group cutoff scheme.
I searched the literature and found that the differences are probably in the pair list generation. In my graduate courses I read about energy drift in molecular dynamics simulations, and I am aware (though not in detail) that the Verlet algorithm has something to do with it.
Can anyone shed light on this problem? The minimization runs fine and the protein remains dimerized when using the group cutoff. This happens even after solvation. We used xyz pbc and grid neighbour searching with the default Fourier spacing and rlist, as we did not set the last two parameters explicitly in the mdp file.
I want to know the theory behind this. Please help.
I have performed RAPD for V. cholerae isolates with 1281 and 1283 random primers and found a distinct band pattern. I have attached a picture.
I am looking for a command that will merge the 3 chains in the original pdb into a single chain and then renumber all of the residues. I have tried the alter command, but when I export the pdb I get only one chain (of the initial trimer) and not the merged chain.
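One way to see what "merge and renumber" has to do is to operate on the PDB text directly. The sketch below is plain Python, not a PyMOL command, and relies only on the fixed-column PDB format (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). The function name `merge_chains` is hypothetical, and the sketch assumes fewer than 10000 residues in total.

```python
def merge_chains(pdb_lines, new_chain="A", start=1):
    """Put all ATOM/HETATM records into one chain and renumber residues.

    A new residue number is issued whenever the original
    (chain, residue number + insertion code) changes.
    TER records between the original chains are dropped.
    """
    merged = []
    counter = start - 1
    previous_key = None
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            key = (line[21], line[22:27])
            if key != previous_key:
                counter += 1
                previous_key = key
            # Rewrite chain ID, residue number, and blank the insertion code.
            line = f"{line[:21]}{new_chain}{counter:>4} {line[27:]}"
        elif line.startswith("TER"):
            continue
        merged.append(line)
    return merged
```

Usage would be along the lines of reading the exported file with `open(...).read().splitlines()`, passing the list to `merge_chains`, and writing the result back out joined with newlines.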
I am looking for a recent diagnosis for chikungunya virus through computational biology techniques.
My final-year project is on drug efflux pumps and persistence in methicillin-resistant Staphylococcus aureus, and we are going to focus on persister cells to study the pathway of antimicrobial resistance. My question is: how can I link bioinformatics and some coding to this project without requiring WGS, because it is not an option in our lab? I need a small yet useful technique or tool that I can learn and implement by myself. PS: I love programming in general, but I'm still new to bioinformatics, so I need help linking my passion for coding with my field, biotechnology.
Hope everyone is having a good day.
I want to learn computational biology. I have a PhD in pharmacology. I have often heard about computational biology/bioinformatics but never had a guideline on how to learn it or how to start in this interesting field of research.
It would be very helpful if you can guide me through this.
Have a nice day.
I am curious if there exist any bioinformatics tools that can predict changes in secondary structure for proteins and/or nucleic acids. For instance, say a C-terminal loop on a protein reorganizes into a helix in response to binding to RNA. Or say an intrinsically unstructured segment of RNA forms a transient stem-loop in order to bind to a protein. Are there any computational tools that could predict such a change short of performing in-lab experiments?
Hello, I'm a PhD student. In my master's thesis, I investigated the cytotoxic, apoptotic, and cell cycle effects of an anticancer drug (Danusertib) on pancreatic cancer cells (CFPAC-1 and Mia-PaCa-2) using xCELLigence and flow cytometry in a cell culture lab.
However, I want to do my PhD thesis with virtual experiments using databases (OMIM, COSMIC, GAD, TCGA) and computing power (maybe on Amazon Web Services, Google Cloud, or Azure), because of limited funding and because I like to spend time with computers. I don't know where to start researching these things. Can I do sound research with these databases? Can anyone give a tip or advice?
INVdock software has been used to predict the first receptor of low-molecular-weight drugs and to find or predict cell targets. I would be thankful to anybody who could let me know how I could get the software and the procedure for working with it.
How popular is the INVdock software?
Is this software free, and what is the procedure for working with it?
Applications of bioinformatics in medicine are a key factor in technological advancement in the field of modern medical technologies.
In which areas of medical technology are the technological achievements of bioinformatics used?
What are the applications of bioinformatics in medicine?
I invite you to the discussion
Thank you very much
I'm trying to use GridMAT to get the area per lipid and thickness so I installed activeperl on my laptop (on windows) and put the three necessary files in and ran this command:
> perl GridMAT-MD.pl param_example
It doesn't give me any valuable output.
#### I attach perl screen and its error ####
Would you help me please to get the desired results?
For genetic association analysis there are usually lots of SNPs, but we generally select a few tagSNPs based on the LD value (r2). How can we calculate the r2 value to know which SNPs should be chosen?
In the FASTA output of Prokka listing the gene names, some genes do not have any name ("gene: NA"). My question is whether these genes are hypothetical or simply do not have a name.
If the former is the case, how does Prokka determine them?
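For reference, r2 between two biallelic SNPs is D^2 / (pA(1-pA) pB(1-pB)), where D = pAB - pA*pB. The Python sketch below computes it from phased haplotypes and is only meant to show the arithmetic; the function name and the 0/1 allele coding are assumptions, and in practice this is normally done by dedicated software (for example PLINK) that also handles unphased genotypes.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, coded 0/1,
    one pair per phased haplotype.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # linkage disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denominator
```

A SNP pair with r2 above a chosen threshold (0.8 is a common choice) is treated as redundant, so only one of the two needs to be genotyped as a tag.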
Dear RG members,
I am trying to install AMBER in parallel on a cluster that has the ifort compiler. I am getting the error "MPIF90 command not found". I read the configure2 file in AmberTools/src, which says I need to install the serial version first.
setenv AMBERHOME "amber path"
./configure -noX11 intel
It is running perfectly.
How can I proceed to add the MPI build?
Kindly suggest the next commands.
Should I run
./configure -mpi intel
or is there some other trick?
I am failing each time with a few errors.
Kindly share the complete commands after the serial installation steps.
I need to run MD simulations of the wild type and ten variants for 50 ns. I am looking for a low-cost cloud service or simulation environment. Could you please suggest one?
Thanks in advance.
Can anybody help me in submitting 8 Leishmania donovani partial gene sequences in genbank (ncbi)?
I have eight sequences with small differences.
The promoter region of the ldmdr1 gene of L. donovani was amplified and the PCR products were sequenced using the Sanger dideoxy sequencing method. Now I have eight different sequences (450-490 nucleotides long), but I don't know the "features" of those sequences, as I am not very good at molecular biology. Could somebody kindly help me submit my sequences? I shall be highly obliged and will acknowledge the person in my PhD thesis :-)
I'm a molecular biologist, and I have a few projects coming up in transcriptome and small RNA analysis. Can I get by without knowing any programming, using user-friendly software such as Geneious Prime or another program you can suggest, or is programming an absolute must?
When I load the repeat simulations of my mutated protein (10 ns, 20 ns, 30 ns), the RMSD graph looks as shown in the picture. I watched my dcd and my protein goes out of the water box. I tried to put it back inside the box with the command "pbc wrap -centersel "protein" -center com -compound residue -all". The protein entered the box, but the RMSD values did not change. How can I solve this problem?
Hi to all,
I'm approaching the HADDOCK web tool for the first time. I got the username and password for the easy interface.
I'd like to know whether I'm on the right track.
Once I've uploaded the pdb files to be docked, I have to specify both the active and the passive residues.
To determine the active residues I performed an NMR titration of the unlabelled protein with the labelled ligand and vice versa. Then I calculated the chemical shift perturbations.
Now I have to determine which of them are the active residues in the protein-ligand interaction.
So, should I submit the pdb to a SASA (solvent accessible surface area) calculation program and choose the residues with chemical shift perturbations that are also solvent accessible according to the SASA program?
Is that correct?
Do you recommend any software or web tool? (I know NACCESS, but there is a very tedious procedure to follow in order to get the codes to decrypt the rar files.)
What should I do for the passive residues? Is the HADDOCK option that determines them automatically reliable?
Can you suggest some free journals in the fields of bioinformatics and computational biology that provide green open access? If anyone can recommend a free-to-publish journal with relevant scope, it will be greatly appreciated!
I downloaded a database from BindingDB. It contains a lot of duplicate structures. How can I remove these duplicate structures?
Suppose I have a DNA sequence and I want to find the transcription start site, CDS, poly(A) signal, etc. Which software will be useful for this?
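A simple way to approach this is to keep only the first row for each structure identifier. The Python sketch below assumes a tab-separated export with one column that identifies the structure; the default column name "Ligand InChI Key" is an assumption about the export and should be replaced by whatever identifier column your file actually has. Note that repeated structures in such a file may correspond to different measurements, so dropping rows also drops those extra data points.

```python
import csv


def drop_duplicate_ligands(handle, key_column="Ligand InChI Key", delimiter="\t"):
    """Return (fieldnames, rows) keeping the first row per structure key.

    Rows whose key is empty are kept, since they cannot be compared.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    seen = set()
    unique_rows = []
    for row in reader:
        key = (row.get(key_column) or "").strip()
        if key and key in seen:
            continue  # structure already encountered
        if key:
            seen.add(key)
        unique_rows.append(row)
    return reader.fieldnames, unique_rows
```

The returned rows can be written back with `csv.DictWriter` using the same field names and delimiter.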
In the context of mapping next generation sequence reads (of RNA->cDNA) to a reference genome to estimate allele specific (AS) expression:
Allelic imbalance (AI: more reads mapping to one allele than another) can be due to a variety of technical and biological factors, so it is important to control for causes of AI that are not biological if you want to estimate AS expression. Several strategies have been developed to address these problems, including read masking and genomic blacklists.
What is the difference between read masking and genomic blacklists?
I've been studying the expression of 5 microRNAs using TaqMan assays in several cell lines (the microRNAs were selected through literature review), and I obtained statistically significant results, although the results were not what I expected.
Is it now possible to perform gene ontology studies and construct networks with Cytoscape to better understand the role of these microRNAs in my samples, without a prior global microRNA expression analysis?
A few of us wanted to create a Discord server for biophysics. What we intend is to start a common place for discussions and numerical experiments, and possibly to document the results in the form of blogs or other media.
I believe that there are many biophysics/computational biophysics/Molecular Dynamics enthusiasts here. Here is the server link: https://discord.gg/qRQRq2k
Come and join us. Let us learn together.
I was trying to design a nanocluster of 10 nm diameter using Materials Studio (MS) software. Because Materials Studio lacks an "nm" size option, I used the 50 Å (angstrom) radius option available in MS to construct the nanocluster. I was confused when I found 60000 atoms in the generated nanocluster of 10 nm diameter (100 Å). Is my conversion (nm to Å) correct, or is the Å used in MS different from my conversion? I doubt that a 10 nm nanoparticle would have 60000 atoms in its cluster form. Please help me with this issue.
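Since 1 nm = 10 Å, a 10 nm diameter does correspond to a 50 Å radius. Whether 60000 atoms is plausible can be checked with a rough estimate: sphere volume times the number density of the crystal. The Python sketch below does that arithmetic; the material is not stated in the question, so the fcc cell with 4 atoms and a 0.408 nm lattice constant in the usage note is purely an illustrative assumption.

```python
import math


def atoms_in_sphere(diameter_nm, atoms_per_cell, lattice_constant_nm):
    """Rough atom count for a spherical cluster cut from a cubic crystal."""
    radius = diameter_nm / 2.0
    volume = 4.0 / 3.0 * math.pi * radius ** 3             # nm^3
    density = atoms_per_cell / lattice_constant_nm ** 3    # atoms per nm^3
    return round(volume * density)
```

For example, `atoms_in_sphere(10.0, 4, 0.408)` gives the estimate for the assumed fcc cell. The sphere volume is about 524 nm^3, so counts in the tens of thousands are expected for typical solids, and the exact number depends on how densely the chosen material packs.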
I actually have two queries. Biochemical studies suggested presence of an enzyme in an organism but the gene encoding the enzyme is not known. I would like to find out gene candidates based on homology search/sequence similarity, using sequences of similar enzymes present in other organisms. My questions are:
1. What points should I consider when selecting an already known gene/protein for use in a homology search?
2. To find the orthologue of the known gene/protein by bioinformatics, which database or software should I use?
Please suggest papers, websites, or software for beginners.
I have 14 miRNAs that are related to a particular disease. I want to draw a network like a gene network (GeneMANIA). I can easily draw a network by entering gene names in GeneMANIA, but which software can take miRNA names as input like this? Which software is better? I was trying to use Cytoscape, but it requires existing network data (if I am not wrong). I am not sure whether I can get any existing network data for miRNAs. Some of the miRNAs are quite new, and some old versions of the software can't recognize them.
Please help me get a network like GeneMANIA. I can only input the different miRNA names and the particular disease. Thanks.
RNA docking using AutoDock requires a different approach. What steps are required to compute the Gasteiger charges in particular?
I am interested in analyzing a set of differentially expressed genes. We used to have access to Ingenuity, where I discovered the "Upstream Regulator" tool, which identifies upstream regulatory proteins enriched for your dataset.
Since we no longer have access to Ingenuity, I have been trying to find an alternative, preferably free.
I am not looking for pathway analysis tools, but specifically for an upstream regulator tool.
I am working on FIV and have sequenced each gene individually and retrieved multiple complete genomes and individual genes from GenBank. I am trying to align all the different genes to the complete genome. I need a program that can align each individual sequence to one reference sequence. I have tried MEGA, but from what I understand the pairwise alignment compares two subsequent sequences in the list and tries to align them, then the next pair, and so forth (this won't work as I don't want to align different genes), while the multiple sequence alignment tries to align each sequence to all the other sequences (this also won't work for the same reason). Is there a setting in MEGA that will allow me to set a reference sequence to which all other sequences are compared, or another program that will allow me to do this?
The only other option I can think of is to compare each sequence individually to the reference sequence and that will be a nightmare as I have 4289 sequences. Please if you have any suggestions let me know. Thank you.
Hi everyone, I'm currently taking the course "Whole-genome sequencing and its applications" from the Technical University of Denmark and working on the final project.
I have five unknown samples, provided as genomes. That means they are in FASTA format, not FASTQ or raw reads. They are already assembled.
Three of those five genomes are the outbreak strains, and I have to find out which three.
Because the samples are unknown, I first have to know what they are, and then how to treat them.
To answer those questions, I was given some hints.
For example, to find out what they are, I can use KmerFinder to identify the species, and once I know the species I can also determine the sequence type using the MLST tool.
Then I can check whether my samples contain any plasmids using PlasmidFinder.
If a sample contains a plasmid, I can find out what kind of plasmid it is with pMLST (plasmid typing).
I am trying to run SANDER in parallel on a cluster. In GROMACS, I used to run
dplace -c 0-59 mdrun -v -s em.tpr -c em.pdb -nt 60 (here I was defining CPU numbers 0-59, and -nt was assigning the number of threads).
For Sander in Amber, I tried
mpirun -np 10 sander -O -i min.in -o min.out -p test-solv.prmtop -c test-solv.inpcrd -r min.rst -ref test.inpcrd
I have also tried
mpirun -np 10 pmemd -O -i heat.in -o heat.out -p test-solv.prmtop -c min.rst -r heat.rst -x heat.mdcrd -ref min.rst -inf heat.mdinfo &
However, it is running on only one CPU. I have also tried using -nt 10 or -t 10, etc., but these produce errors.
In this regard, can somebody help me find the correct command to assign 10 CPUs to an MD calculation using Sander? I have followed the mpirun usage from the Amber FAQ and group discussions, but it is not working on my cluster.
For bacterial taxonomy based on the 16S rRNA gene, the currently accepted similarity cutoffs are <97% for species differentiation and <94% for genus differentiation.
As the use of 16S alone fails to distinguish among close relatives, researchers have started to use multilocus sequence analysis (MLSA), in which the concatenated conserved sequences of several housekeeping genes are used.
Does anyone know the similarity thresholds for species and genus delineation in this case, when using the MLSA approach to identify bacterial isolates?
Thanks in advance
Can anyone please explain to me how to use the Genscan tool to find exons and introns for a given gene sequence? I want to calculate sensitivity and specificity to check its performance for a given gene sequence.
Thank you in advance
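Once the predicted exons and the annotated exons are available as coordinate ranges, nucleotide-level accuracy is a matter of counting overlapping positions. The Python sketch below assumes 1-based inclusive intervals and uses the convention common in the gene-prediction literature, where sensitivity = TP/(TP+FN) and "specificity" = TP/(TP+FP) (what other fields call precision). The function names are hypothetical, and parsing the Genscan output into intervals is left out.

```python
def covered_positions(intervals):
    """Set of nucleotide positions covered by 1-based inclusive intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def exon_accuracy(true_exons, predicted_exons):
    """Nucleotide-level (sensitivity, specificity) of an exon prediction."""
    truth = covered_positions(true_exons)
    predicted = covered_positions(predicted_exons)
    tp = len(truth & predicted)    # coding and predicted coding
    fn = len(truth - predicted)    # coding but missed
    fp = len(predicted - truth)    # predicted coding but not annotated
    sensitivity = tp / (tp + fn) if truth else 0.0
    specificity = tp / (tp + fp) if predicted else 0.0
    return sensitivity, specificity
```

For a single annotated exon at 1-100 and a predicted exon at 51-150, half of the true coding positions are recovered and half of the predicted positions are correct.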
Any suggestions on which software to use? I would also like to know whether I should align each gene's sequences (in FASTA format) first and then concatenate, or first concatenate all the genes and then align them across the different species, before using them for phylogeny.
I've been using the refine.bio website to download normalized transcriptome data; each downloaded dataset consists of a compressed directory with an expression matrix in .tsv format, its metadata also in .tsv format, and an aggregated metadata file in .json format.
I'm trying to associate the expression matrix with its metadata using the R programming language, but I don't know how to do it, and I can't find the way in the site's documentation. I only know that I need to read these files with these commands:
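A common workflow is to align each gene separately and then join the per-gene alignments into one supermatrix, padding species that lack a gene with gaps. The Python sketch below illustrates only that joining step; it assumes each gene alignment is already available as a dictionary of species name to aligned sequence, and the function name is hypothetical.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments into one supermatrix.

    gene_alignments: list of dicts {species: aligned_sequence}, one per gene.
    Species missing from a gene are filled with gap characters.
    """
    species = sorted({name for alignment in gene_alignments for name in alignment})
    parts = {name: [] for name in species}
    for alignment in gene_alignments:
        lengths = {len(sequence) for sequence in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        width = lengths.pop()
        for name in species:
            parts[name].append(alignment.get(name, "-" * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}
```

Aligning first and concatenating afterwards keeps homologous columns of each gene together, which is lost if unaligned genes are concatenated before alignment.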
> expression_df <- read.delim('SRP068114/SRP068114.tsv', header = TRUE,
> row.names = 1, stringsAsFactors = FALSE)
> metadata_list <- fromJSON(file = 'aggregated_metadata.json')
but I have no idea how to merge them to generate a fully informative matrix.
Can someone help me, please?
Thank you so much.
Does anybody know about published high-throughput mRNA expression data, with microRNA over-expression or knockdown vs. control experiments in HEK293 cells? It can be using any technology like micro arrays, RNA-seq, CAGE or other. Could you point to the paper and/or GEO accession?
I need a free web-based or Windows-based bioinformatics tool that can design an antibody against a given antigen sequence. I have found Abie Pro 3.0 but I could not understand how to use it.
Where can we find the best web course that starts from basic Python programming and using kernels and goes up to analyzing data with machine learning algorithms?
I think most biologists don't need to know much about how to write programs or design kernels, but they should be able to write the essential code to analyze data with already available programs.
I found some online courses. One is pythonforbiologists.com, which appears immediately when searching with the relevant keywords, but I was not able to find detailed reviews.
Another is Python for Genomic Data Science on Coursera. The course content looks good, but the reviews say that the later materials (Weeks 3 and 4) on uses and applications are lacking, which makes it virtually impossible to finish the course.
Suggestions for an efficient web course on Python for biologists are appreciated.
I am having some difficulty understanding how JPRED4 works with a multiple sequence alignment. I have aligned 300 amino acid sequences from a protein family and was able to run JPRED4 with this alignment. The secondary structure prediction is shown for the first amino acid sequence in FASTA format.
I would appreciate help with the following questions:
A. Has the prediction algorithm used all the sequences in the multiple alignment?
B. The secondary structure prediction is shown for the first amino acid sequence. Is this prediction valid only for the first amino acid sequence?
C. What should I do if I am interested in an amino acid sequence other than the first one in the FASTA file? Can I just move it to the top?
D. I would like to obtain a secondary structure prediction that reflects all the sequences in the multiple sequence alignment. How can I use JPRED4 to obtain this result?
Dear community members,
We have available WGBS data for a species. We used Bismark to map the reads to the reference genome. Unmethylated lambda phage DNA was spiked-in, to estimate the bisulfite conversion error rate. My question is, how can I use the information of this error rate to distinguish truly methylated reads from false positives?
I have seen that many people use a binomial distribution to achieve this and apply a Benjamini-Hochberg correction, citing the 1995 paper (https://www.jstor.org/stable/2346101?seq=1#page_scan_tab_contents).
Is there a tool or an easy way to do this? Excuse my ignorance but I am really new to the field.
Thank you for your time,
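The binomial approach mentioned above can be sketched in a few lines of Python. The assumption is that, for each cytosine with n covering reads of which k are called methylated, the null hypothesis is "all k calls are conversion failures", with the failure rate p estimated from the lambda spike-in. The one-sided p-value is P(X >= k) under Binomial(n, p), and the p-values across all sites are then adjusted with the Benjamini-Hochberg procedure. Function names are hypothetical, and this is a per-site test, not a per-read one.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda index: pvalues[index])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

Sites whose adjusted p-value falls below the chosen FDR (for example 0.05) would then be treated as methylated rather than as conversion artefacts.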
Help appreciated!! We have empirical data sets of plant barcodes (matK, ITS and rbcL) and want to compare the analytical methodology used for the empirical data sets with synthetic data sets.
How can we generate this kind of synthetic data set? I have found relevant papers but was not able to understand the methodology. I would appreciate it if anyone could provide a detailed procedure or a tutorial.
REFERENCE: Supervised DNA barcodes paper:
Real DNA Barcode datasets are simulated with Coalescent package in Mesquite version 2.75 (see the related work ). The data are simulated considering time of species divergence and the effective population size (Ne), i.e., the number of individuals in a population (of a species) that are contributing genes to the succeeding generations.
Another similar paper...
Conference Paper Species Identification using DNA Barcode Sequences through S...
After reading the well-known publication "More Bang for your Buck", I built a workstation of my own using a Xeon CPU (>3 GHz, E5 series) along with one RTX and one GTX GPU. For single runs of systems with approx. 60k atoms, I am getting 140-150 ns/day.
The problem starts when I try to run two simulations in parallel without oversubscribing (16 threads). I am not even going beyond 8 threads.
For a single run, the PME/PP ratio is around 1.04-1.05 and the load imbalance is around 2-3% with DLB on. Fourier spacing is kept at 0.10 and cutoffs at 1.0 nm.
Is there any specific reason for this? Is there any way to solve this?
I'm looking for a book on microarray data analysis. I'm a mathematician and I'm interested in a book that gives a framework for microarray data analysis from beginning to end (background correction, normalization, dimension reduction, clustering, etc.). I found this: http://www.springer.com/gp/book/9781402072604
Are there any more appropriate ones?
I am frustrated by these compressed .vcf files that contain a compiled list of variants related to genes.
My question is: how can I access the content of those files?
It is explained that they are compressed text files. So, do I understand correctly that once decompressed, I can open them with any text editor?
Or do I need the VCFtools software to open them?
Thank you for your help!
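A gzip-compressed VCF is ordinary tab-separated text once decompressed, so it can be inspected without special software. As a minimal Python sketch (the function name and the `limit` parameter are hypothetical), the standard `gzip` module can read the file directly: lines starting with "##" are metadata, the line starting with "#CHROM" names the columns, and every following line is one variant.

```python
import gzip


def read_vcf_records(path, limit=5):
    """Return the first `limit` variant records of a .vcf.gz file as dicts."""
    header = None
    records = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # metadata lines
            if line.startswith("#CHROM"):
                header = line[1:].rstrip("\n").split("\t")
                continue
            if header is None:
                raise ValueError("no #CHROM header line found before the data")
            records.append(dict(zip(header, line.rstrip("\n").split("\t"))))
            if len(records) >= limit:
                break
    return records
```

Dedicated tools become useful for filtering or summarizing large files, but they are not required just to look at the content.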
I am currently using fastq-dump from the SRA Toolkit, but it is taking a long time. I have to download a really large amount of bacterial genome data. Is there any alternative?
We don't have any reviewed sequence in UniProt, and no tertiary structure is available for any of the proteins of this virus. How can we work at the protein level?
I am using RING web server (Residue Interaction Network Generator http://protein.bio.unipd.it/ring/ ) to generate residue interaction network from a pdb file. I am using the complex interface of RING web server for generation of network of a protein based on alpha carbon distance. The cutoff alpha carbon distance I am using is 8.0. All the other parameters are set to default values.
The problem is that the generated .sif file always has two amino acids missing. I tried 3 or 4 proteins, but every time the generated .sif file has two residues missing. Why is this so? How can I solve this problem?
I am currently trying to find homologues of a protein I am working with, but BLAST has been giving me nothing usable. I have now found a dataset of 1500 protein sequences of potential candidates that I want to align to my reference sequence. I have tried Clustal, MEGA, MUSCLE, MAFFT and pretty much everything under the sun, but with this many sequences and only limited experience, I am having trouble achieving what I want, as the programs simply crash or lock up after a few minutes.
Instead of the traditional multiple sequence alignment, where every sequence gets aligned to every other sequence with multiple iterations, I want all of the sequences from the dataset to only be aligned to my one reference sequence. Think of it as doing 1500 pairwise alignments only. What would be the best way to perform this kind of alignment?
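The "1500 pairwise alignments" idea can be sketched as a loop over a plain global (Needleman-Wunsch) alignment of each sequence against the reference. The Python sketch below is illustrative only: the match/mismatch/gap scores are arbitrary assumptions, whereas a real protein alignment would normally use a substitution matrix such as BLOSUM62 and affine gap penalties, as established alignment tools do.

```python
def global_align(reference, sequence, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a simple linear gap penalty."""
    rows, cols = len(reference) + 1, len(sequence) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if reference[i - 1] == sequence[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_ref, aligned_seq = [], []
    i, j = len(reference), len(sequence)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and reference[i - 1] == sequence[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            aligned_ref.append(reference[i - 1]); aligned_seq.append(sequence[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(reference[i - 1]); aligned_seq.append("-")
            i -= 1
        else:
            aligned_ref.append("-"); aligned_seq.append(sequence[j - 1])
            j -= 1
    return "".join(reversed(aligned_ref)), "".join(reversed(aligned_seq)), score[-1][-1]


def align_all_to_reference(reference, sequences):
    """Align every sequence in {name: sequence} to the same reference."""
    return {name: global_align(reference, seq) for name, seq in sequences.items()}
```

Because each sequence is aligned independently, the cost grows linearly with the number of sequences, which is what makes this approach manageable where a full multiple alignment is not.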
I am working in bioinformatics and computational biology. I am using the Cytoscape tool for network visualization and analysis. I am now working on lung cancer-related genes. In this work, I want to add some clustering methods to the analysis for a better outcome. Please suggest which plugin is better than the others.
I already use the ClusterONE plugin but want something more, such as MCL or k-medoid clustering.
Thanks in advance.
The Genetic Algorithm can be modified in different ways. Even Goldberg's book, Genetic Algorithms in Search, Optimization and Machine Learning, specifies various enhancements that could be done for genetic search.
Which exact method is implemented in Weka for genetic search for attribute selection? The software does refer to Goldberg's book, but I'm still not sure about the deeper specifics.
Some of the specifics I am unsure about are:
- Precise fitness evaluation function
- If fitness modification to add weight to diversity is implemented
- Structure encoding of the chromosome
- Type/variant of crossover implemented: single point, multi-point, and which point to be exact.
I am very new to RDKit and it is extremely confusing. I have a set of ligands. I have taken their SDF files as well. Now I need to find the common core structure among all the ligands. So I am thinking of using RDKit for that. If you could send me a link (other than https://www.rdkit.org/docs/GettingStartedInPython.html), or help me with it, it would be helpful. Thanks and good day!
I have two signalling networks that I want to merge with "retaining network layout". I simply merged the networks with network merging utility in cytoscape however, doing this I lost the manually adjusted layout of both the networks.
The search for shortest unique substrings (shustrings) of the genome is an important problem, for in some sense it is the mix of these shustrings that defines the phenotype of an organism. A related question concerns the set of shustrings that *is not* found within any genome, for the presence of these may correspond to specific detrimental consequence in organisms. So, my question addresses both the issue of any prior review (there has been) and any discussion of justification for the observance of non-representation of these shustrings within organisms. The shorter the length of such sequences, the more important would be any corresponding justification, I should think.
I want to analyze ligand-protein interaction (3D) with bioinformatics tools, please give recommendation about reliable software or web service.
Until now, many programs have been developed for the specific task of peptide-spectrum matching. A few commonly used ones in proteomics studies are: MSGFPlus, Sequest, Mascot, Andromeda, X!Tandem, PEAKS DB, Myrimatch, pFind, Comet and OMSSA.
Personally, I use MSGFPlus most often. Its performance is also recognized by many users. Another reason I like it is that it is a Java program and easy to integrate into different bioinformatics pipelines.
MS-GF+ makes progress towards a universal database search tool for proteomics. Nature Communications (2014)
See the article here. https://www.nature.com/articles/ncomms6277.
For label-free proteomics, it is more common to use the Andromeda+MaxQuant combination. It is distributed as a Windows program with a nice, friendly interface.
Unfortunately, only 35-40% of peptide spectra in a bottom-up proteomics experiment (data-dependent acquisition) are identified. Recently, a few programs using deep learning and neural networks have been developed. I list a few here. These methods claim a significant improvement over previous methods.
Is there anyone who has tested these tools and would like to share their experiences? Discussion is very welcome here.
Protein identification with deep learning: from abc to xyz. arXiv:1710.02765, 2017.
De novo peptide sequencing by deep learning. Proceedings of the National Academy of Sciences, 2017.
Deep learning enables de novo peptide sequencing from data-independent-acquisition mass spectrometry. Nature Methods volume 16, pages 63–66 (2019)
Dynamic Bayesian network (DBN)
Learning Peptide-Spectrum Alignment Models for Tandem Mass Spectrometry. Uncertain Artif Intell. 2014; 30: 320–329.
Proteomics of natural bacterial isolates powered by deep learning-based de novo identification. bioRxiv September 27, 2018.
Peptide-Spectra Matching from Weak Supervision. arXiv 22 Aug 2018
DeepIso: A Deep Learning Model for Peptide Feature Detection. 9 Dec 2017
Where can I find good molecular dynamics simulation papers without experimental (wet lab) counterparts? This is for computational biologists without a wet lab setup. Most journals require computational work to be supported or validated by experimental work, so I wonder whether there are journals that consider purely computational work without experiments. I am particularly interested in the areas of protein-ligand, ion channel, and membrane simulation.
I've used the same normalization and chip to search for a gene's expression in the R2 Genomics Analysis and Visualization Platform, but I have compared very different cells (like lung cancer and melanoma). So my question is: is it correct/safe to say that a gene is more highly expressed in lung cancer, for example, than in melanoma (p<0.05) based on comparisons made in R2 Genomics?
PS: my intention is to prove this experimentally, and the results obtained in R2 would serve as a guide for me.
I have 1000 pdb files. I want to add polar hydrogens to the files. Could someone suggest a command-line program to add polar hydrogens to pdb files?
The production run stopped because the wall time was up. Now I have prd.cpt and prd_prev.cpt. What is the difference between them, and which one should be used to restart the simulation? Also, in the tutorial, is topol.tpr a topology file or a binary file produced by grompp?
Several .tpr files were produced during the simulation, such as "#prd.tpr.67#", "#prd.tpr.64#" and "prd.tpr". What is the difference between them? Which one should be used to restart the simulation?