
Bioinformatics and Computational Biology - Science topic

Questions related to Bioinformatics and Computational Biology
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I am looking for journals that publish newly developed tools, servers, web applications, or pipelines that are useful in biology, or newly curated databases of biological significance.
Can anyone suggest journals in Bioinformatics and Computational Biology that publish:
  • Bioinformatics tools/servers (machine-learning-based, deep-learning-based, or otherwise)
  • Text mining
  • Databases
  • Datasets
  • Pipelines, etc.
I know a few such as:
  1. Bioinformatics
  2. Nucleic Acids Research
  3. Database
  4. GigaScience
  5. Scientific Data (Nature Portfolio)
  6. Nature Computational Science
  7. Briefings in Bioinformatics
  8. BMC Bioinformatics
  9. PLOS Computational Biology
  10. Journal of Cheminformatics
If you know more, kindly suggest the journal names. Thank you in advance.
Relevant answer
Answer
I think my suggestion still holds. If your tool is useful for, say, people working on viruses, journals focused on that topic might be interested.
Perhaps a good approach is simply to search for "bioinformatics tool" in Google Scholar and see which journals come up (besides the ones you already know).
Again good luck.
Best regards.
  • asked a question related to Bioinformatics and Computational Biology
Question
15 answers
Can anyone help me convert a Maestro-format file to PDB?
Thanks in advance.
Relevant answer
Answer
Thank you Rahul sir.
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I have been running a protein-ligand complex simulation. Although I was careful in preparing all the necessary files, including the .top and .gro files, mdrun fails with the error "2 particles communicated to PME rank 4 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x". A first look into this suggested that the system is blowing up. I tried reducing the time step, as suggested in the GROMACS documentation, but that did not resolve the issue. Could anybody give suggestions?
Thanks
Relevant answer
Answer
From the page:
Possible causes include:
  • you didn't minimize well enough,
  • you have a bad starting structure, perhaps with steric clashes,
  • you are using too large a timestep (particularly given your choice of constraints),
  • you are doing particle insertion in free energy calculations without using soft core,
  • you are using inappropriate pressure coupling (e.g. when you are not in equilibrium, Berendsen can be best while relaxing the volume, but you will need to switch to a more accurate pressure-coupling algorithm later),
  • ...
There is a troubleshooting list here on how to diagnose the problem. I suggest trying the list first.
I'm sorry I don't have a better answer. I'm having similar issues and if I solve them I'll return with more info.
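One item on the list above, a bad starting structure with steric clashes, can be screened for before running mdrun. The sketch below is our own illustration, not a GROMACS tool: it reads the fixed-column coordinate section of a .gro file and reports atom pairs closer than 0.05 nm, which is shorter than any covalent bond and therefore almost certainly an overlap. It ignores periodic images and does not use the topology, so treat any hits as leads to inspect rather than a diagnosis.

```python
import math
from collections import defaultdict
from itertools import product


def read_gro(text):
    """Parse .gro text into a list of (label, (x, y, z)) with coordinates in nm."""
    lines = text.splitlines()
    n_atoms = int(lines[1].strip())
    atoms = []
    for line in lines[2:2 + n_atoms]:
        # Fixed columns: resnr 1-5, resname 6-10, atom name 11-15, atom nr 16-20, x/y/z 8 chars each
        label = f"{line[0:5].strip()}{line[5:10].strip()}:{line[10:15].strip()}"
        xyz = (float(line[20:28]), float(line[28:36]), float(line[36:44]))
        atoms.append((label, xyz))
    return atoms


def find_clashes(atoms, cutoff=0.05):
    """Return (i, j, distance) for atom pairs closer than `cutoff` nm.

    Atoms are binned into cubic cells of edge `cutoff`, so only atoms in the
    same or neighbouring cells need to be compared. Periodic images are ignored.
    """
    cells = defaultdict(list)
    for idx, (_, (x, y, z)) in enumerate(atoms):
        cells[(int(x // cutoff), int(y // cutoff), int(z // cutoff))].append(idx)
    clashes = []
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), [])
            for i in members:
                for j in neighbours:
                    if i < j:  # count each pair once
                        d = math.dist(atoms[i][1], atoms[j][1])
                        if d < cutoff:
                            clashes.append((i, j, d))
    return clashes
```

Usage: `find_clashes(read_gro(open("complex.gro").read()))`, where `complex.gro` is a placeholder for your own coordinate file; each hit gives two atom indices and their distance in nm.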
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
How can I determine which bacterial virulence factors (bacterial toxins or cell-wall components) relevant to human sepsis or bacterial infection interact with or regulate my target protein of interest? I have tested LPS treatment in a dose- and time-dependent manner, but I did not see any difference in expression. Are there any commercially available panels of bacterial virulence factors, or can this be done bioinformatically?
Relevant answer
Answer
But it doesn't contain data for all pathogenic bacteria: I am searching for toxins of Gardnerella vaginalis and couldn't find them.
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
Relevant answer
Answer
Thank you for your answer. If you mean the university library, I currently have no access to it due to the pandemic, Hidetoshi Shimizu.
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
Hi,
I am studying a simultaneous proton transfer, bond breakage, and nucleophilic attack (by a water molecule) using umbrella sampling (US), for which I have already performed a 5 ns QM/MM simulation.
All three reactions take place in a single step (inverting mechanism of a glycoside hydrolase). Now I am confused about how to define the restraint variables.
I have selected 4 Reaction Coordinates:
1. RC1: Proton transfer from Base residue to leaving group
OE1-HE1 -> C----O4 (this glycosidic bond breaks and HE1 is transferred to O4)
So, the reaction coordinate for this reaction is the difference between the OE1-HE1 and O4--HE1 distances.
2. RC2: Glycosidic bond breakage:
C-----O4 -> C O4. The reaction coordinate for this reaction is the distance between C and O4
3. RC3: Nucleophilic attack by water:
H(i)O(w)H(w) [this is nucleophilic water] ---- C (anomeric carbon of the broken glycosidic bond). The reaction coordinate for this reaction is the distance between C and O(w).
4. RC4: Proton transfer from water (H(i)) to Acid Residue
H(i)O(w)H(w) -- OD1 (Acid residue). The reaction coordinate for this step is the difference between the O(w)-H(i) and OD1-H(i) distances.
For the RC2, I have made the following restraint file:
# distance restraint
&rst iat=8122,8132 r1=0, r2=1.8, r3=1.8, r4=5, rstwt=1,-1, rk2 = 500.0, rk3 = 500.0, /
I have increased the values of r2 and r3 in steps of 0.2, up to 3.4. I do not understand what the values of r1 and r4 should be. Could anyone please comment on this and explain it briefly?
I also do not understand how to write the restraint file for a difference in distances between two sets of atoms, as in RC1 and RC4. It would be helpful if somebody could explain this too, with an example.
I also want to visualize all four reaction steps; which trajectory files from the four RCs should I look at?
Since I am new to US, it would be a great help if somebody could guide me through this.
Regards
BHARAT
Relevant answer
Answer
Bharat Gupta, how were you able to write your umbrella sampling input DISANG file (as described in the AMBER tutorial) using the LCOD method? Could you tell me, please?
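The questions about r1/r4 and about distance differences can be illustrated with a small sketch. As we read the AMBER manual, an &rst restraint is flat-bottomed: harmonic with force constant rk2 between r1 and r2, zero between r2 and r3, harmonic with rk3 between r3 and r4, and linear beyond r1 and r4, with energies written as k(r - r0)² (no factor ½). For umbrella sampling one usually sets r2 = r3 = the window centre and places r1 and r4 far enough away that the sampled values never leave the harmonic region; the ±50 Å default below is our arbitrary choice. For a linear combination of distances (LCOD), iat lists the atoms pairwise and rstwt gives one weight per pair, so rstwt=1.0,-1.0 with four atoms restrains d(atom1, atom2) - d(atom3, atom4); a plain two-atom distance such as RC2 does not need rstwt. Treat the namelist layout as an assumption to check against the manual for your AMBER version.

```python
def flat_bottom_energy(r, r1, r2, r3, r4, rk2, rk3):
    """Energy (kcal/mol) of an AMBER-style flat-bottomed restraint at value r.

    AMBER writes the harmonic parts as k*(r - r0)**2, without a factor 1/2.
    Outside [r1, r4] the potential continues linearly with the slope at the edge.
    """
    if r < r1:
        return rk2 * (r1 - r2) ** 2 + 2 * rk2 * (r1 - r2) * (r - r1)
    if r < r2:
        return rk2 * (r - r2) ** 2
    if r <= r3:
        return 0.0  # flat bottom between r2 and r3
    if r <= r4:
        return rk3 * (r - r3) ** 2
    return rk3 * (r4 - r3) ** 2 + 2 * rk3 * (r4 - r3) * (r - r4)


def rst_line(iat, center, k=500.0, rstwt=None, half_width=50.0):
    """One &rst namelist line for an umbrella window centred at `center` (Å).

    iat:   two atom numbers for a plain distance, or two per distance for an
           LCOD restraint, in which case rstwt holds one weight per distance.
    r1/r4: placed `half_width` Å away so the bias stays harmonic in practice.
    """
    if rstwt is None and len(iat) != 2:
        raise ValueError("a plain distance restraint needs exactly two atoms")
    if rstwt is not None and len(iat) != 2 * len(rstwt):
        raise ValueError("an LCOD restraint needs two atoms per weight")
    fields = ["iat=" + ",".join(str(a) for a in iat)]
    if rstwt is not None:
        fields.append("rstwt=" + ",".join(f"{w:.1f}" for w in rstwt))
    fields += [f"r1={center - half_width:.2f}", f"r2={center:.2f}",
               f"r3={center:.2f}", f"r4={center + half_width:.2f}",
               f"rk2={k:.1f}", f"rk3={k:.1f}"]
    return "&rst " + ", ".join(fields) + ", /"
```

For example, `[rst_line([8122, 8132], 1.8 + 0.2 * i) for i in range(9)]` produces one line per window from 1.8 to 3.4 Å for the RC2 atoms in the question, and `rst_line([oe1, he1, o4, he1], center, rstwt=[1.0, -1.0])`, with your own atom numbers in place of these hypothetical names, sketches the RC1 difference of distances.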
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I have RNA-Seq data for different cell lines and I'm looking for lncRNAs that may be differentially expressed.
Relevant answer
Answer
Is there any method to work with NONCODE in R?
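Whatever annotation source you use, a practical step is to identify which genes in your count table are annotated as lncRNAs. The sketch below assumes a GENCODE- or Ensembl-style GTF whose gene records carry a `gene_type` or `gene_biotype` attribute; the biotype names (`lncRNA`, plus `lincRNA` from older releases) are assumptions to check against the annotation release you used. Version suffixes such as `.5` are ignored when matching IDs. Many people prefer to run the differential expression analysis on all genes and filter the results afterwards, so that normalization uses the full library; the same ID set works either way.

```python
import re

# GTF attribute column: key "value"; key "value"; ...
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def lncrna_gene_ids(gtf_lines, biotypes=("lncRNA", "lincRNA")):
    """Return unversioned gene IDs whose gene biotype is in `biotypes`."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        biotype = attrs.get("gene_type") or attrs.get("gene_biotype")
        if biotype in biotypes and "gene_id" in attrs:
            ids.add(attrs["gene_id"].split(".")[0])
    return ids


def subset_counts(counts, gene_ids):
    """Keep rows of a {gene_id: counts} table whose unversioned ID is in gene_ids."""
    return {g: c for g, c in counts.items() if g.split(".")[0] in gene_ids}
```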
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
Hi everyone! I'm trying to acquire Raman spectra of a leaf section using confocal Raman spectroscopy. The samples are pure, dried, powdered leaf samples, and I am using a 785 nm laser source.
However, all I get are spectra with no peaks, or spectra strongly masked by fluorescence. Do you have any tricks or sample preparations to avoid the fluorescence, which I'm afraid covers the Raman signal, or to enhance the Raman signal, since the compounds might have a relatively weak Raman signal compared with the background and the fluorescence? Are there any sample preparations that can be done without water or an immersion objective, for example solid matrices that can be mixed with the sample? Thank you.
Relevant answer
Answer
Not all of these suggestions may be feasible for John. I guess a Raman microscope is what he has, and he needs to obtain Raman peaks out of a strong fluorescence background.
John, may I ask which model your equipment is? In general, you could try different configurations to enhance the Raman peaks and reduce the fluorescence background: the objective lens (generally, the higher the magnification the better); shorter exposure times with more accumulations; adjusting the sampling focal point (the best Raman response is not always obtained with the sample in focus); and sampling larger particles (larger particles have relatively less surface area and weaker fluorescence).
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Hi,
We were carrying out in vacuo energy minimization of a protein dimer (which is experimentally proven to be a dimer). Earlier, the same work was done in our lab using an older version of GROMACS (4.5.5) with the group cutoff scheme, coulombtype = cut-off, and no PBC.
When we restarted the work, we had to use GROMACS 5.0.4, in which the default cutoff scheme is Verlet. With the Verlet cutoff scheme, the monomers dissociate from each other, which does not happen in this version when we use the group cutoff scheme.
Searching the literature, I found that the difference probably lies in pair-list generation. In my graduate courses I read about energy drift in molecular dynamics simulations, and I am aware (though not in detail) that the Verlet algorithm has something to do with it.
Can anyone elucidate this problem? The minimization runs fine and the protein remains a dimer when using the group cutoff scheme. This happens even after solvation. We used xyz PBC and grid neighbour searching, with the default Fourier spacing and rlist, since we did not set those two parameters explicitly in the mdp file.
I want to understand the theory behind this. Please help.
Relevant answer
Answer
I believe the answer to this question is covered here:
  • asked a question related to Bioinformatics and Computational Biology
Question
23 answers
I have performed RAPD on V. cholerae isolates with the random primers 1281 and 1283 and found distinct band patterns. I have attached a picture.
Relevant answer
Answer
You can use GelJ, a free, user-friendly, Java-based program that can draw dendrograms using different analysis methods.
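For readers who want to see what such a dendrogram is built from: band patterns are usually scored as presence/absence at each band position, compared with a similarity coefficient, and then clustered. The sketch below uses the Dice coefficient and UPGMA, a common combination for band data; it is not GelJ's code, and GelJ itself lets you choose among different analyses. The isolate names and band scores in the test are made up.

```python
from itertools import combinations


def dice(a, b):
    """Dice similarity of two 0/1 band profiles: 2 * shared bands / total bands."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2 * shared / total if total else 1.0


def upgma(profiles):
    """UPGMA tree (Newick string) from {isolate: band profile} using 1 - Dice."""
    # Each cluster is keyed by its Newick text and stores (size, height).
    nodes = {name: (1, 0.0) for name in profiles}
    dist = {frozenset((p, q)): 1.0 - dice(profiles[p], profiles[q])
            for p, q in combinations(profiles, 2)}
    while len(nodes) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        a, b = sorted(pair)
        d = dist.pop(pair)
        (size_a, h_a), (size_b, h_b) = nodes.pop(a), nodes.pop(b)
        height = d / 2                          # merge at half the distance
        merged = f"({a}:{height - h_a:.4f},{b}:{height - h_b:.4f})"
        for c in nodes:                         # size-weighted average distance
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        nodes[merged] = (size_a + size_b, height)
    return next(iter(nodes)) + ";"
```

The Newick string can be opened in any tree viewer; scoring the gel (deciding which band positions match across lanes) remains the hard, manual part.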
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
I am looking for a command that will merge the 3 chains of the original PDB into a single chain and then renumber all of the residues. I have tried the alter command, but when I export the PDB I get only one chain (of the initial trimer) instead of the merged chain.
Relevant answer
Answer
You can first renumber the residues using the alter command (https://pymolwiki.org/index.php?title=Alter&redirect=no) so that each residue has a unique residue number,
e.g.
alter (chain B),resi=str(int(resi)+100)
alter (chain C),resi=str(int(resi)+200)
to give chains B and C an offset of 100 and 200, respectively.
Then use the alter command again to change the chain label,
e.g.
alter (all), chain='A'
sort
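If you would rather renumber residues consecutively (1, 2, 3, ...) across the merged chain than use fixed offsets, a small script over the exported PDB also works. This sketch is an alternative to the PyMOL commands, not something PyMOL provides: it rewrites the chain ID (column 22) and residue number (columns 23-26) of every ATOM/HETATM record, clears the insertion code, drops the TER records between the old chains, and writes a single TER before END. It assumes a single-model file with fewer than 10,000 residues; atom serial numbers are left unchanged.

```python
def merge_chains(pdb_lines, new_chain="A"):
    """Put all ATOM/HETATM records on one chain and renumber residues 1..N.

    A new residue starts whenever (chain, resSeq, iCode) changes between
    consecutive records. TER records between the old chains are dropped and a
    single TER is written before END.
    """
    out, last_key, resnum = [], None, 0
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            key = (line[21], line[22:26], line[26])   # chain, resSeq, iCode
            if key != last_key:
                resnum += 1
                last_key = key
            # column 22 = chain ID, 23-26 = resSeq, 27 = iCode (cleared)
            out.append(f"{line[:21]}{new_chain}{resnum:4d} {line[27:]}")
        elif line.startswith("TER"):
            continue
        elif line.startswith("END"):
            out.extend(["TER", line])
        else:
            out.append(line)
    return out
```

Usage: `merge_chains(open("trimer.pdb").read().splitlines())`, with `trimer.pdb` standing in for your exported file, then write the returned lines back out joined by newlines.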
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
I am looking for recent approaches to diagnosing chikungunya virus using computational biology techniques.
Relevant answer
Answer
Molecular docking
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
My final-year project is on drug efflux pumps and persistence in methicillin-resistant Staphylococcus aureus (MRSA), and we will focus on persister cells to study the pathway of antimicrobial resistance. My question is: how can I bring bioinformatics and some coding into this project without whole-genome sequencing (WGS), since that is not an option in our lab? I need a small but useful technique or tool that I can learn and implement by myself. PS: I love programming in general, but I'm still new to bioinformatics, so I need help linking my passion for coding with my field, biotechnology.
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
Hope everyone is having a good day.
I want to learn computational biology. I have a PhD in pharmacology. I have often heard about computational biology/bioinformatics but never had guidance on how to learn it or get started in this interesting field of research.
It would be very helpful if you can guide me through this.
Have a nice day.
Relevant answer
Answer
Dear Apu, bioinformatics is a means, not an end in itself, and confusing a means with an end is the reason for the dramatic crisis science (not technology) is experiencing these days (see for example https://www.pnas.org/content/113/34/9384.short).
Thus, first of all, you must acquire a 'quantitative sensibility' for biological problems. That means: when facing a biological problem, restate it so that you can easily recognize the statistical units, the variables of interest, and the most interesting scale to look at, and whether you can provide a suitable metric that preserves the original biological meaning.
Then the informatics will come by itself. This means you must learn statistics (with special emphasis on multidimensional descriptive methods such as PCA, cluster analysis, and MDS), complex network analysis, the fundamentals of non-linear dynamics (what an attractor is, what a transition is), and the fundamentals of probability.
Attached you will find a sketchy representation of the quantitative needs for facing biological problems.
  • asked a question related to Bioinformatics and Computational Biology
Question
9 answers
(For survival tests and probit analysis)
Relevant answer
Answer
Dear Mr. Kapoor,
You will have to email Dr. Ehab Mostafa Bakr the registration number of your LdP Line software to get the key number. The registration number can be found in the registration window when you open LdP Line in the trial version. His email address is [email protected]. I got mine for free.
  • asked a question related to Bioinformatics and Computational Biology
Question
10 answers
Please suggest some protein-ligand docking servers for docking online.
I also need web servers that allow docking multiple ligands at a time.
Thank you to all the knowledgeable people.
Relevant answer
Answer
You might try Webina too: https://durrantlab.pitt.edu/webina/ . It runs AutoDock Vina in your browser, without having to install anything.
  • asked a question related to Bioinformatics and Computational Biology
Question
8 answers
I am curious if there exist any bioinformatics tools that can predict changes in secondary structure for proteins and/or nucleic acids. For instance, say a C-terminal loop on a protein reorganizes into a helix in response to binding to RNA. Or say an intrinsically unstructured segment of RNA forms a transient stem-loop in order to bind to a protein. Are there any computational tools that could predict such a change short of performing in-lab experiments?
Relevant answer
Answer
As indicated by Farhana Rumzum Bhuiyan, you most likely have to look at program packages like GROMACS (which takes considerable effort to understand).
It sounds to me like you need to enter the field of molecular docking:
A variety of programs (and approaches like molecular dynamics as used by GROMACS and Monte Carlo etc.) can be used:
Hope this helps you further.
Best regards.
PS. Tools like PSIPRED are excellent for getting a fairly reliable prediction for the protein itself, but they do not let you combine this with a prediction for the same protein in the presence of some kind of ligand.
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
Hello, I'm a PhD student. In my master's thesis, I investigated the cytotoxic, apoptotic, and cell-cycle effects of an anticancer drug (danusertib) on pancreatic cancer cells (CFPAC-1 and MIA PaCa-2) using xCELLigence and flow cytometry in a cell culture lab.
However, I want to do my PhD thesis with virtual experiments using databases (OMIM, COSMIC, GAD, TCGA) and computing power (perhaps on Amazon Web Services, Google Cloud, or Azure), because of limited funding and because I like working with computers. I don't know where to start researching these things. Can I do meaningful research with these databases? Can anyone give a tip or advice?
Relevant answer
Answer
Yes, you can use these datasets for research work equivalent to a PhD thesis. As a reference, you can check the publications by TCGA and other groups that used TCGA data. A series of these publications was published by Cell Press as the TCGA Pan-Cancer Atlas.
As you can see, purely in silico work has been published in Cell Press journals. But before relying entirely on these datasets, think about what novel question you can address. If you have a highly relevant question, go for it. Otherwise, a simple and safe plan is to generate hypotheses from the datasets and validate them with in vitro studies, or vice versa. This type of combined work is regularly published and will be more acceptable to most universities and individuals. All the best.
You can also check our papers, in which we used simple tools to analyze TCGA data.
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Besides gene essentiality and non-homology with human proteins.
Relevant answer
Answer
Receptors are good targets, because the drug does not have to enter the cell. Proteins expressed in large quantity are poor targets, because you need a lot of drug. Proteins with fewer variants are better targets, because it may be harder for the pathogen to evolve resistance.
  • asked a question related to Bioinformatics and Computational Biology
Question
12 answers
INVdock software has been used to predict the protein targets of low-molecular-weight drugs and to find or predict cellular targets. I would be thankful to anybody who could tell me how to get the software and how to work with it.
How popular is INVdock?
Is this software free, and what is the procedure for working with it?
Relevant answer
After all this time, unfortunately, it is still inaccessible.
Kindly see the attached file and the link below:
Regards
  • asked a question related to Bioinformatics and Computational Biology
Question
23 answers
Applications of bioinformatics in medicine are a key factor in technological advancement in modern medical technologies.
In which areas of medical technology are the technological achievements of bioinformatics used?
What are the applications of bioinformatics in medicine?
Please reply
I invite you to the discussion
Thank you very much
Best wishes
Relevant answer
Answer
In precision med... Precision medicine is rapidly emerging as a strategy to tailor medical treatment to a small group or even individual patients based on their genetics, environment and lifestyle. Precision medicine relies heavily on developments in systems biology and omics disciplines, including metabolomics. Combination of metabolomics with sophisticated bioinformatics analysis and mathematical modeling has an extreme power to provide a metabolic snapshot of the patient over the course of disease and treatment or classifying patients into subpopulations and subgroups requiring individual medical treatment.... Azad, R. K., & Shulaev, V. (2019). Metabolomics technology and bioinformatics for precision medicine. Briefings in bioinformatics, 20(6), 1957-1971.
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
Can anyone help with a simplified workflow for allele-specific expression analysis from RNA-Seq data?
Relevant answer
Answer
Just to add a very different approach to those already mentioned: you could also use competitive read mapping. The idea is that you map the reads from individual 1 to two parental genotype-specific references. Then you parse both BAM files, compare the alignment and mismatch scores of uniquely mapping reads overlapping a given genomic interval, and count the reads that align best to each reference.
I wrote a Python program, hosted on GitHub (https://github.com/santiagosnchez/CompMap), that streamlines this. In addition, the program will add the number of equally well-mapping reads to each genotype-specific count. You can do this directly or stochastically, by sampling from a binomial distribution.
I've tested it with simulated reads and it works pretty well for genes with species-level divergence. I still need to test it on genes with lower, population-level divergence.
Feel free to give it a try.
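The counting step described above can be sketched in a few lines. This is our simplified illustration, not CompMap's code: it takes, for each uniquely mapped read overlapping one gene, the alignment score against each parental reference (for example the AS tag, which pysam exposes through `AlignedSegment.get_tag("AS")`), counts which reference wins, and either reports tied reads separately or splits them between the parents with a Binomial(n, 0.5) draw.

```python
import random


def competitive_counts(scores, rng=None):
    """Allele counts for one gene from competitive mapping to two parental references.

    scores: (score_parent1, score_parent2) for each uniquely mapped read that
            overlaps the gene; higher means a better alignment.
    Returns (n_parent1, n_parent2, n_tied). If `rng` (a random.Random) is given,
    tied reads are split between the parents by a Binomial(n_tied, 0.5) draw
    and n_tied is returned as 0.
    """
    n1 = n2 = tied = 0
    for s1, s2 in scores:
        if s1 > s2:
            n1 += 1
        elif s2 > s1:
            n2 += 1
        else:
            tied += 1
    if rng is not None and tied:
        k = sum(rng.random() < 0.5 for _ in range(tied))   # Binomial(tied, 0.5)
        return n1 + k, n2 + tied - k, 0
    return n1, n2, tied
```

Passing a seeded generator, e.g. `competitive_counts(pairs, rng=random.Random(1))`, keeps the stochastic split reproducible between runs.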
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I'm trying to use GridMAT-MD to get the area per lipid and bilayer thickness, so I installed ActivePerl on my Windows laptop, put the three necessary files in place, and ran this command:
> perl GridMAT-MD.pl param_example
It doesn't give me any useful output.
(A screenshot of the Perl window and its error was attached.)
Could you please help me get the desired results?
Relevant answer
Answer
In case you are still facing the problem, can you please post your param_example file?
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
For genetic association analysis there are usually many SNPs, but we generally select a few tag SNPs based on LD (r²). How can we calculate r² to decide which SNPs to choose?
Relevant answer
Answer
PLINK v1.9 can compute pairwise LD measures for multiple SNPs (genome-wide).
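To see what such a tool computes: for two biallelic SNPs, D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). The sketch below computes r² from phased haplotypes coded 0/1 (PLINK works from genotype data, usually unphased, so its values can differ somewhat) and then picks tag SNPs greedily, repeatedly choosing the SNP that covers the most untagged SNPs at r² ≥ 0.8. The threshold and the greedy rule are common choices, not the only ones.

```python
from itertools import combinations


def r_squared(hap_a, hap_b):
    """LD r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    hap_a[i] and hap_b[i] are the alleles of haplotype i at SNP A and SNP B.
    D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP has undefined LD; treated as 0 here
    return (p_ab - p_a * p_b) ** 2 / denom


def greedy_tags(snps, threshold=0.8):
    """Greedy tag-SNP selection: every SNP ends up with r^2 >= threshold to a tag.

    snps: {snp_name: list of 0/1 alleles, one per haplotype}.
    """
    r2 = {frozenset((s, t)): r_squared(snps[s], snps[t]) for s, t in combinations(snps, 2)}
    untagged, tags = set(snps), []

    def covered_by(t):
        return {s for s in untagged if s == t or r2[frozenset((s, t))] >= threshold}

    while untagged:
        best = max(sorted(untagged), key=lambda t: len(covered_by(t)))
        tags.append(best)
        untagged -= covered_by(best)
    return tags
```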
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
I have docked a ligand to a protein. I want to save the protein-ligand complex as a PDB file in AutoDock so that a protein viewer can display it. Also, why can PyMOL not display the AutoDock result I saved through "write complex"? Thanks
Relevant answer
Answer
The reason might be the file type: when browsing for the docked complex, change the file format accordingly, and PyMOL will display your structure.
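If changing the file type does not help, another workaround is to build the complex yourself from the receptor PDB and the docked ligand PDBQT. PDBQT reuses PDB columns 1-66 and appends a partial charge and an AutoDock atom type, so cutting the ligand lines to 66 columns yields PDB-style records. The sketch below is our own, with a hypothetical `LIG` residue name; it keeps only the first MODEL of the pose file, which in AutoDock Vina output is the top-scoring pose, but check the ordering for other outputs. Element columns are not written, so the viewer has to infer elements from atom names, and ligand serial numbers may repeat receptor ones.

```python
def complex_from_pose(receptor_lines, pose_lines, ligand_resname="LIG"):
    """Receptor PDB lines + first docked pose from a PDBQT file -> PDB lines.

    PDBQT keeps PDB columns 1-66 and appends charge/atom type, so pose atoms
    are cut to 66 columns, written as HETATM, and given `ligand_resname`.
    """
    out = [l for l in receptor_lines if l.startswith(("ATOM", "HETATM"))]
    out.append("TER")
    for line in pose_lines:
        if line.startswith("ENDMDL"):
            break  # keep only the first MODEL
        if line.startswith(("ATOM", "HETATM")):
            # columns 7-17 (serial, name, altLoc) kept; 18-20 = residue name
            out.append("HETATM" + line[6:17] + f"{ligand_resname:>3}" + line[20:66])
    out.append("END")
    return out
```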
  • asked a question related to Bioinformatics and Computational Biology
Question
9 answers
In the FASTA output of Prokka listing gene names, some genes do not have a name ("gene: NA"). My question is whether these genes are hypothetical or simply unnamed.
If the former, how does Prokka determine that?
Relevant answer
Answer
If you mean the gene feature, you can use Prokka's --addgenes option. Sebawe Syaj
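To answer the original question directly, you can tabulate the CDS features in Prokka's GFF output. As far as we know, Prokka puts a `gene=` attribute only on CDSs it could name and uses the product `hypothetical protein` when no function was assigned; the sketch below counts CDSs in three groups (named, annotated product but no gene name, hypothetical) and stops at the `##FASTA` line, after which Prokka appends the sequences. The attribute names are assumptions worth checking against your own file.

```python
from urllib.parse import unquote


def classify_cds(gff_lines):
    """Count Prokka CDS features as named, hypothetical, or unnamed-but-annotated."""
    counts = {"named": 0, "hypothetical": 0, "no_gene_name": 0}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # Prokka appends contig sequences after this line
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        product = unquote(attrs.get("product", ""))
        if product == "hypothetical protein":
            counts["hypothetical"] += 1
        elif "gene" in attrs:
            counts["named"] += 1
        else:
            counts["no_gene_name"] += 1
    return counts
```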
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
Dear RG members,
I am trying to install a parallel build of AMBER on a cluster that has the ifort compiler, but I get the error "mpif90: command not found". I read the configure2 file in AmberTools/src, which says the serial version must be installed first.
For the serial build:
setenv AMBERHOME "amber path"
./configure -noX11 intel
make install
This runs perfectly.
How can I proceed to add the MPI build?
Kindly suggest the next commands.
Should I run
./configure -mpi intel
make install
or is some other trick needed?
I fail each time with a few errors.
Kindly share the complete commands to run after the serial installation steps.
Relevant answer
Answer
Bikash Ranjan Sahoo, what command did you use for the successful MPI build? Could you please share it here?
Thank you.
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
Hi everyone,
I need to run 50 ns MD simulations of a wild type and ten variants. I am looking for a low-cost cloud service or simulation environment. Could you please suggest any?
Thanks in advance.
Relevant answer
Answer
You can check Docker: https://www.docker.com/
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Can anybody help me submit 8 partial gene sequences of Leishmania donovani to GenBank (NCBI)?
I have eight sequences with small differences.
The promoter region of the L. donovani ldmdr1 gene was amplified, and the PCR products were sequenced by Sanger dideoxy sequencing. Now I have eight different sequences (450-490 nucleotides long), but I don't know the "features" of those sequences, as I am not very experienced in molecular biology. Could somebody kindly help me submit my sequences? I would be highly obliged and will acknowledge that person in my PhD thesis :-)
Relevant answer
Answer
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
I'm a molecular biologist, and I have a few projects coming up in transcriptome and small RNA analysis. Can I get by without knowing any programming, using user-friendly software such as Geneious Prime or another program you could suggest, or is programming absolutely a must?
Relevant answer
Answer
Hi,
To be an efficient bioinformatician, you need to learn at least one programming language. You need not be a high-end developer, but you should at least know how to do your part. You can also be comfortable using various user-friendly GUIs, but they need more time and space, whereas with code you can customize the analysis to your needs.
  • asked a question related to Bioinformatics and Computational Biology
Question
8 answers
When I load the repeat simulations of my mutated protein (10 ns, 20 ns, 30 ns), I get the RMSD graph shown in the picture. Watching the DCD trajectory, I saw that my protein goes out of the water box. I tried to put it back inside the box with "pbc wrap -centersel "protein" -center com -compound residue -all". The protein entered the box, but the RMSD values do not change. How can I solve this problem?
Relevant answer
Answer
I assume you use the GROMACS package for the RMSD calculation.
First, you should remove all PBC artifacts from your trajectory file. Try the following steps:
1. If your system contains protein + ligand, make an index file with a protein+ligand group.
2. gmx trjconv -f xxx.xtc -s xxx.tpr -n index.ndx -o xxx_nowater.xtc -pbc cluster
When prompted, select the group number of protein+ligand.
3. Now calculate the RMSD using xxx_nowater.xtc.
Hope it works.
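The most likely reason wrapping in VMD does not change the RMSD is that the jump is still there: RMSD needs each molecule to follow a continuous path, not merely to sit inside the box. The sketch below is our illustration of what nojump-style unwrapping does (as in `gmx trjconv -pbc nojump`), assuming a rectangular box of constant size, which an NPT box only approximates: it shifts each atom's frame-to-frame displacement by whole box lengths so that no component exceeds half a box length, then computes RMSD without any fitting.

```python
import math


def unwrap_nojump(frames, box):
    """Undo jumps across periodic boundaries for a rectangular, constant box.

    frames: list of frames, each a list of (x, y, z) per atom; box: (Lx, Ly, Lz).
    Each atom's move since the previous (already unwrapped) frame is shifted by
    whole box lengths so that no component exceeds half a box length.
    """
    unwrapped = [list(frames[0])]
    for frame in frames[1:]:
        prev = unwrapped[-1]
        current = []
        for p, c in zip(prev, frame):
            current.append(tuple(ci - L * round((ci - pi) / L)
                                 for pi, ci, L in zip(p, c, box)))
        unwrapped.append(current)
    return unwrapped


def rmsd(a, b):
    """RMSD between two coordinate lists, without any superposition fit."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))
```

In the test case, an atom that crosses the boundary from x = 9.5 to x = 0.5 in a 10-unit box gives an apparent RMSD of 9 before unwrapping and 1 after.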
  • asked a question related to Bioinformatics and Computational Biology
Question
12 answers
Hi to all,
I'm using the HADDOCK web tool for the first time. I got a username and password for the Easy interface.
I'd like to know whether I'm on the right track.
Once I've uploaded the PDB files to be docked, I have to specify both the active and the passive residues.
To determine the active residues, I performed an NMR titration of the unlabelled protein with the labelled ligand and vice versa, and then calculated the chemical shift perturbations.
Now I have to determine which of these are the active residues in the protein-ligand interaction.
So, should I submit the PDB to a SASA (solvent accessible surface area) calculation program and choose the residues with chemical shift perturbations that are also solvent accessible according to the SASA program?
Is that correct?
Do you recommend any software or web tool? (I know NACCESS, but I have to follow a very tedious procedure to get the codes to decrypt the RAR files.)
Thank you.
What should I do for the passive residues? Is the HADDOCK option that determines them automatically reliable?
Bye
Relevant answer
Answer
HADDOCK is a very good protein-protein docking server, and the new upgraded version, HADDOCK 2.4, is comparatively much more advanced. Besides, the results from the server come refined and energy-minimized by default. To find the active and passive interacting residues you don't even need other software: CPORT, a server from the same developers, can give you the list of active and passive residues for the PDB files you upload, and these residues can then be used for docking in HADDOCK 2.4.
Here are the links for the both these servers:
Link for HADDOCK 2.4 server:
Link for CPORT server:
Hope it helps.
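The procedure proposed in the question (chemical shift perturbation plus solvent accessibility) can be written down as a small filter. The sketch below combines 1H and 15N shift changes as √(ΔδH² + (α·ΔδN)²) with α = 0.14, a commonly used nitrogen scaling, selects residues whose combined CSP exceeds the mean plus one standard deviation, and keeps those with a relative accessibility of at least 40%. All three choices are assumptions to tune, and the relative SASA values would come from whatever accessibility tool you use.

```python
import math
from statistics import mean, pstdev


def combined_csp(d_h, d_n, alpha=0.14):
    """Combined 1H/15N chemical shift perturbation in ppm."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def active_residues(shifts, rel_sasa, sasa_cutoff=0.40):
    """Residues with CSP > mean + 1 SD that are also solvent exposed.

    shifts:   {residue_number: (delta_H, delta_N)} in ppm
    rel_sasa: {residue_number: relative accessibility between 0 and 1}
    """
    csp = {res: combined_csp(h, n) for res, (h, n) in shifts.items()}
    values = list(csp.values())
    cutoff = mean(values) + pstdev(values)
    return sorted(res for res, v in csp.items()
                  if v > cutoff and rel_sasa.get(res, 0.0) >= sasa_cutoff)
```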
  • asked a question related to Bioinformatics and Computational Biology
Question
7 answers
Can you suggest some free journals in bioinformatics and computational biology that provide green open access? A recommendation for a free-to-publish journal with a relevant scope would be greatly appreciated!
Relevant answer
Answer
I have extracted and compiled the necessary information on all academic journals in Excel format here. It also indicates which journals are free (subscription-based) or paid open access:
Please request and recommend it at that link.
  • asked a question related to Bioinformatics and Computational Biology
Question
8 answers
Mainly in B cells
Relevant answer
Answer
A good protein design is required for it to induce a strong immune response.
Here are some of the criteria you can follow:
1. Choosing the epitopes, i.e., epitopes with good scores and good binding affinities
2. The chosen epitopes should be promiscuous in nature, i.e., efficient in covering a wide range of immune molecules
3. The protein should have a good antigenicity score
4. The protein should be highly immunogenic and efficient in evoking the hypothesized immune response
5. The protein should contain sequences and conformations that can be easily recognized by molecules such as TLRs
If the protein is not immunogenic enough, its immunogenicity can be enhanced by connecting it to an adjuvant molecule via suitable linkers.
Hope it helps.
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
I downloaded a database from BindingDB. It contains many duplicate structures. How can I remove them?
Relevant answer
Answer
I realize this is a very old post. Nevertheless, I think that many people might encounter the problem of removing duplicates from a multi-structure file. One convenient way to do this is to use OpenBabel. Suppose you have an SDF file ("oldfile.sdf") that contains multiple structures and you would like to generate a new SDF file ("new_noduplicates.sdf") with duplicates removed. In a terminal or command prompt, do the following:
> obabel -i sdf oldfile.sdf -o sdf -O new_noduplicates.sdf --unique
The new file is generated and the output in the terminal window shows a listing of all the duplicates that were identified in the process.
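If your BindingDB file carries an identifier field for each ligand (for example an InChIKey or a ligand ID), duplicates can also be removed without a cheminformatics toolkit by keeping the first record per identifier. This is a weaker criterion than Open Babel's `--unique`, which compares the structures themselves; `LIGAND_KEY` in the test is a placeholder, and you must pass the name of a data field that actually exists in your file.

```python
def dedupe_sdf(text, key_tag):
    """Keep the first SDF record for each value of the data field <key_tag>.

    Records are split on the '$$$$' terminator; records without the field are kept.
    """
    kept, seen = [], set()
    for chunk in text.split("$$$$"):
        if chunk.startswith("\n"):
            chunk = chunk[1:]  # newline that followed the previous '$$$$'
        if not chunk.strip():
            continue
        lines = chunk.splitlines()
        key = None
        for i, line in enumerate(lines[:-1]):
            if line.startswith(">") and f"<{key_tag}>" in line:
                key = lines[i + 1].strip()  # the value is on the next line
                break
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        kept.append(chunk + "$$$$\n")
    return "".join(kept)
```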
  • asked a question related to Bioinformatics and Computational Biology
Question
11 answers
Suppose I have a DNA sequence and I want to find the transcription start site, CDS, poly(A) signal, etc. Which software would be useful for this?
Relevant answer
Answer
GlimmerHMM is a gene finder based on a Generalized Hidden Markov Model (GHMM). It helps you annotate a draft genome by running a few simple commands.
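To show what the simplest possible version of this search looks like, the sketch below scans all six reading frames for ATG...stop open reading frames and the canonical AATAAA poly(A) signal hexamer. It is a naive illustration, not GlimmerHMM's GHMM: it ignores splicing, promoters, and transcription start sites, so for real annotation, especially of eukaryotic sequences, use a proper gene finder. The 100-codon minimum is an arbitrary default.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """ATG...stop ORFs in all six frames as (strand, start, end).

    Coordinates are 0-based, end-exclusive, on the forward strand; the length
    in codons includes the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append(("+", start, i + 3))
                        else:  # map reverse-strand coordinates back to the forward strand
                            orfs.append(("-", n - (i + 3), n - start))
                    start = None
    return orfs


def polya_signals(seq, motif="AATAAA"):
    """0-based positions of the canonical poly(A) signal hexamer on the + strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]
```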
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
In the context of mapping next generation sequence reads (of RNA->cDNA) to a reference genome to estimate allele specific (AS) expression:
Allelic imbalance (AI: more reads mapping to one allele than the other) can be due to a variety of technical and biological factors, so it is important to control for non-biological causes of AI if you want to estimate AS expression. Several strategies have been developed to address these problems, including read masking and genomic blacklists.
What is the difference between read masking and genomic blacklists?
Thanks!!!
Relevant answer
Answer
Do you know about blacklists of biological factors?
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
Hi everyone!
I have been studying the expression of 5 microRNAs using TaqMan assays in several cell lines (the microRNAs were selected through a literature review), and I obtained statistically significant results, although they were not what I expected.
Is it possible now to perform gene ontology analyses and construct networks with Cytoscape to better understand the role of these microRNAs in my samples, without a prior global microRNA expression analysis?
Relevant answer
Answer
Hi everyone!
I've been using the Cytoscape app BiNGO, but my output is always empty when I select my network for analysis. Can anyone help me resolve this?
Attached is the output that I obtain.
Thank you. :)
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
A few of us wanted to create a Discord server for biophysics. We intend it to be a common place for discussions and numerical experiments, and possibly to document the results in the form of blogs or other media.
I believe that there are many biophysics/computational biophysics/Molecular Dynamics enthusiasts here. Here is the server link: https://discord.gg/qRQRq2k
Come and join us. Let us learn together.
Relevant answer
Answer
Dear Devanand,
Can you explain more about the purpose of this post and what Discord is?
Bog
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
I was trying to design a nanocluster of 10 nm diameter using Materials Studio (MS). Because MS has no "nm" size option, I used the 50 Å (angstrom) radius option available in MS to construct the nanocluster. I was confused when I found 60000 atoms in the generated nanocluster of 10 nm (100 Å) diameter. Is my conversion (nm to Å) correct, or is the Å in MS different from my conversion? I doubt that a 10 nm nanoparticle would have 60000 atoms in its cluster form. Please help me with this issue.
Thanking you
Hari Prasath
Relevant answer
Answer
What material are you using for the cluster?
Do you apply PBC or not?
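As a rough sanity check: 1 nm = 10 Å, so a 10 nm diameter is a 100 Å diameter, i.e. a 50 Å radius, which means the conversion itself is right. The expected atom count is approximately the sphere volume times the bulk atomic number density (atoms per unit cell divided by cell volume). The materials and lattice constants below are illustrative assumptions, since the question does not say which material was used, and surface effects are ignored.

```python
import math

def atoms_in_sphere(diameter_nm, lattice_const_angstrom, atoms_per_cell):
    """Approximate atom count in a spherical particle cut from a cubic crystal."""
    radius = diameter_nm * 10.0 / 2.0                         # 1 nm = 10 Å
    volume = 4.0 / 3.0 * math.pi * radius ** 3                # Å^3
    density = atoms_per_cell / lattice_const_angstrom ** 3    # atoms per Å^3
    return volume * density

# Illustrative cubic crystals: (lattice constant in Å, atoms per conventional cell)
examples = {
    "gold (fcc)": (4.078, 4),
    "silicon (diamond)": (5.431, 8),
    "carbon (diamond)": (3.567, 8),
}
for name, (a, z) in examples.items():
    print(f"{name}: ~{atoms_in_sphere(10, a, z):,.0f} atoms")
```

With these example densities a 10 nm sphere holds from roughly 26,000 (silicon) to roughly 92,000 (diamond) atoms, so 60,000 atoms is not unreasonable for a dense material; the exact number depends on the lattice and on how the cluster surface is cut.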
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I actually have two queries. Biochemical studies suggested the presence of an enzyme in an organism, but the gene encoding the enzyme is not known. I would like to find candidate genes based on homology search/sequence similarity, using sequences of similar enzymes present in other organisms. My questions are:
1. What points should I consider when selecting an already known gene/protein for use in the homology search?
2. To find the orthologue of the known gene/protein by bioinformatics, which database/software should I use?
Please suggest papers/websites/software for beginners.
Thanks!
Relevant answer
Answer
1. It is recommended that you use genes/proteins whose function has been verified, taken from databases of closely related species for which information is available.
2. You can use the NCBI database to obtain the complete gene sequences, or use the InterPro database to obtain the domains (recommended). Then align your sequences against one of the two databases using BLAST.
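A common way to turn BLAST searches into orthologue candidates is the reciprocal best hit (RBH) criterion: gene a in organism A and gene b in organism B are candidates if b is a's best hit in B and a is b's best hit in A. A minimal sketch, assuming both searches were run with BLAST tabular output (`-outfmt 6`, where column 1 is the query, column 2 the subject and column 12 the bit score); RBH is a heuristic, not a proof of orthology, and the function names are hypothetical.

```python
def best_hits(blast_tab_lines):
    """Map each query to its highest-bit-score subject from BLAST -outfmt 6 lines."""
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b_lines, b_vs_a_lines):
    """Return sorted (a, b) pairs that are each other's best hit."""
    a_to_b = best_hits(a_vs_b_lines)
    b_to_a = best_hits(b_vs_a_lines)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)
```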
  • asked a question related to Bioinformatics and Computational Biology
Question
29 answers
I have 14 miRNAs that are related to a particular disease. I want to draw a network like the gene networks in GeneMANIA. I can draw a network easily by entering gene names in GeneMANIA, but which software can take miRNA names as input like this, and which is better? I was trying to use Cytoscape, but it requires pre-existing network data (if I am not wrong), and I am not sure whether I can get such data for miRNAs. Some of the miRNAs are quite new, and some older software versions can't recognize them.
Please help me get a network like GeneMANIA's. I can only input the different miRNA names and the particular disease. Thanks.
Relevant answer
Answer
We recently developed miRViz to interpret microRNA datasets using microRNA networks:
To build miR-mRNA network, a Canadian group has developed miRNet: https://www.mirnet.ca/miRNet/home.xhtml
Both may be useful for you. No need for computational skills. I would suggest first miRViz, and then miRNet.
If you want more information, we also published in 2015 a paper building microRNA networks (we use these networks in miRViz):
Hope this helps.
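If you go the Cytoscape route, it can import a network from a simple interaction format (SIF) file, in which each line is `source interaction target`. A minimal sketch that turns miRNA-target pairs (from whichever target database you trust) into SIF lines, plus miRNA-miRNA edges for miRNAs sharing at least `min_shared` targets. The pairs are hypothetical placeholders, not real interactions, and the interaction labels `targets` / `shares_targets` are arbitrary.

```python
from itertools import combinations

def to_sif(pairs, min_shared=2):
    """Build SIF lines: miRNA-gene edges plus miRNA-miRNA edges for shared targets."""
    targets = {}
    for mirna, gene in pairs:
        targets.setdefault(mirna, set()).add(gene)
    lines = [f"{m}\ttargets\t{g}" for m, g in sorted(set(pairs))]
    for m1, m2 in combinations(sorted(targets), 2):
        if len(targets[m1] & targets[m2]) >= min_shared:
            lines.append(f"{m1}\tshares_targets\t{m2}")
    return lines

# Hypothetical placeholder pairs; replace with your own miRNA-target table.
pairs = [("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-B", "GENE1"),
         ("miR-B", "GENE2"), ("miR-C", "GENE3")]
print("\n".join(to_sif(pairs)))
```

Writing the same lines to a file ending in `.sif` (e.g. `open("mirna_network.sif", "w").write("\n".join(to_sif(pairs)) + "\n")`) gives something Cytoscape can import as a network.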
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
RNA docking with AutoDock requires a different approach. What steps are required to compute the Gasteiger charges in particular?
Relevant answer
Answer
Did you get an answer already? I have the same problem.
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
While preparing a .pdbqt file I got a warning like this: "Total kollman charge added = -166.952". What does it mean? Please explain.
Relevant answer
Answer
Use prepare_receptor4.py
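The message reports the sum of the Kollman partial charges assigned to the molecule. One quick check is to recompute that total from the .pdbqt file yourself. A minimal sketch, assuming standard PDBQT ATOM/HETATM records in which the partial charge is the second-to-last whitespace-separated field (followed by the AutoDock atom type); lines where coordinates run together would need fixed-column parsing instead. The function names are hypothetical.

```python
def total_charge(pdbqt_lines):
    """Sum partial charges from PDBQT ATOM/HETATM records."""
    total = 0.0
    for line in pdbqt_lines:
        if line.startswith(("ATOM", "HETATM")):
            total += float(line.split()[-2])   # charge precedes the atom type
    return total

def charge_report(pdbqt_lines):
    """Return the total charge and its distance from the nearest integer."""
    q = total_charge(pdbqt_lines)
    return q, abs(q - round(q))
```

For example, `charge_report(open("receptor.pdbqt"))` (hypothetical file name) shows whether the total is close to an integer net charge.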
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
I cannot make flexible and rigid .pdbqt files for DNA. I cannot select the sugar and nucleotides separately.
  • asked a question related to Bioinformatics and Computational Biology
Question
7 answers
I am interested in doing analysis on a set of differentially expressed genes. We used to have access to Ingenuity, where I discovered the "Upstream Regulator" tool, which identifies upstream regulatory proteins enriched for your dataset.
Since we no longer have access to Ingenuity, I have been trying to find an alternative, preferably free. 
I am not looking for pathway analysis tools, but specifically for these upstream regulator-tool.  
Thank you, 
Relevant answer
Answer
We settled on Enrichr (transcription factor binding), though it is not ideal, since no experiment-specific background can be uploaded.
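The core of this kind of regulator/TF-target enrichment is an over-representation test, and running it yourself lets you choose the background (the limitation mentioned above). A minimal sketch of a one-sided hypergeometric test, assuming you already have a regulator-to-targets mapping from some database and a background list of all genes measured in your experiment. It is not a reimplementation of IPA or Enrichr, and the function names are hypothetical.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, successes n, draws N)."""
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1)) / comb(M, N)

def tf_enrichment(de_genes, tf_targets, background):
    """Rank regulators by over-representation of their targets among DE genes.

    de_genes, background: iterables of gene symbols.
    tf_targets: dict mapping regulator -> set of target genes.
    Returns (regulator, overlap, n_targets, p_value) sorted by p-value.
    """
    background = set(background)
    de = set(de_genes) & background
    results = []
    for tf, targets in tf_targets.items():
        targets = set(targets) & background       # restrict to measured genes
        overlap = len(de & targets)
        p = hypergeom_sf(overlap, len(background), len(targets), len(de))
        results.append((tf, overlap, len(targets), p))
    return sorted(results, key=lambda r: r[3])
```

With many regulators tested, the p-values should still be corrected for multiple testing.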
  • asked a question related to Bioinformatics and Computational Biology
Question
7 answers
I am working on FIV and have sequenced each gene individually and retrieved multiple complete genomes and individual genes from GenBank. I am trying to align all the different genes to the complete genome, so I need a program that can align each individual sequence to one reference sequence. I have tried MEGA, but from what I understand, the pairwise alignment compares two subsequent sequences in the list and tries to align them, then the next pair, and so forth (this won't work, as I don't want to align different genes to each other), while the multiple sequence alignment tries to align each sequence to all the other sequences (this won't work for the same reason). Is there a setting in MEGA that lets me set a reference sequence to which all other sequences are compared, or another program that will do this?
The only other option I can think of is to compare each sequence individually to the reference sequence and that will be a nightmare as I have 4289 sequences. Please if you have any suggestions let me know. Thank you.
Relevant answer
Answer
Hi,
you can use MUSCLE. It allows you to align two alignments with each other (known as profile-profile alignment).
For a single sequence in the profile:
muscle -profile -in1 first.fasta -in2 second.fasta -out combined_result.fasta
This aligns the alignments in the two files first.fasta and second.fasta to each other and writes the result to combined_result.fasta.
You can also do the same profile alignment with MAFFT.
With ClustalW2:
clustalw2 -profile1=data.fa -sequences -profile2=first.fa
assuming one sequence is in first.fa and the pre-aligned others are in data.fa.
You can also look at AVID:
Bray, N., Dubchak, I. and Pachter, L. (2003) AVID: A Global Alignment Program. Genome Res., 13:97
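Because each gene should map inside the genome, the alignment wanted here is a "fit" (semi-global) alignment: the whole gene is aligned, but the parts of the genome before and after it are free. A minimal sketch of that dynamic-programming recurrence with linear gap penalties; the scoring values are arbitrary assumptions, memory grows with gene length times genome length, and pure Python will be slow for thousands of sequences, so treat it as an illustration of the idea rather than a replacement for a dedicated aligner.

```python
def fit_align(gene, genome, match=2, mismatch=-1, gap=-2):
    """Align all of `gene` to its best-scoring window in `genome`.

    Returns (score, start, end): 0-based half-open genome coordinates of the window.
    Unaligned genome sequence before and after the window is not penalized.
    """
    n, m = len(gene), len(genome)
    # score[i][j]: best score aligning gene[:i] so that it ends at genome position j
    score = [[0] * (m + 1) for _ in range(n + 1)]          # row 0: free leading genome
    start = [list(range(m + 1)) for _ in range(n + 1)]     # window start for each cell
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
        start[i][0] = 0
        for j in range(1, m + 1):
            s = match if gene[i - 1] == genome[j - 1] else mismatch
            options = [
                (score[i - 1][j - 1] + s, start[i - 1][j - 1]),  # (mis)match
                (score[i - 1][j] + gap, start[i - 1][j]),        # gap in genome
                (score[i][j - 1] + gap, start[i][j - 1]),        # gap in gene
            ]
            score[i][j], start[i][j] = max(options, key=lambda o: o[0])
    best_j = max(range(m + 1), key=lambda j: score[n][j])  # free trailing genome
    return score[n][best_j], start[n][best_j], best_j
```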
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Hi everyone, I'm currently taking the course "Whole-genome sequencing and its applications" from the Technical University of Denmark and working on the final project.
I have five unknown samples. They are provided as genomes in FASTA format (not FASTQ or raw reads), so they are already assembled.
Of those five genomes, three are outbreak strains, and I have to find out which three.
Because the samples are unknown, I first have to identify what they are, and then determine how to treat them.
The course gives some hints for these questions. For example, to find out what they are, I can use KmerFinder to identify the species, and once I know the species, I can determine the sequence type with the MLST tool.
Then I can check whether my samples contain any plasmids using PlasmidFinder, and if they do, I can identify the plasmids with pMLST (plasmid typing).
Relevant answer
Answer
Your question is not quite clear enough for me to understand. It sounds like you are asking about bacteria infecting humans, but it could possibly be an agricultural (pigs, cattle, horses, chickens, fish, etc.) or other type of outbreak. And even within human bacterial infections or "outbreaks" there is a large difference between different types of bacteria. The Escherichia coli that carry a Shiga toxin gene (STEC, usually on a prophage), for example, are quite different from Klebsiella pneumoniae or other "outbreak" bacteria.
Some plasmids (or phages carried on plasmids) are rather promiscuous and can travel between rather diverse host bacterial "species", while others are usually associated with a single bacterial lineage. Sometimes it is a drug resistance issue, making the infection more difficult to treat, and sometimes it is a bacterial toxin causing a more acute issue. Some bacteria are spread by wastewater or other unsanitary conditions, and some are food-borne or require close human-to-human contact.
Anyway, the type of "detective work" needed in the databases to sort this out can be quite different for the different bacteria.
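One quick, assembly-friendly way to see which of the five genomes group together is to compare their k-mer content, in the same k-mer spirit as KmerFinder (this is not KmerFinder's algorithm). A minimal sketch, assuming each assembly is a FASTA file of contigs; k = 16 and the file names in the usage note are hypothetical choices. Closely related outbreak isolates would be expected to share a much larger fraction of k-mers with one another than with unrelated strains; a SNP-based comparison would be the usual follow-up.

```python
from itertools import combinations

def read_contigs(path):
    """Return the contig sequences in a FASTA file (upper-cased, not concatenated)."""
    contigs, current = [], []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    contigs.append("".join(current))
                current = []
            elif line:
                current.append(line.upper())
    if current:
        contigs.append("".join(current))
    return contigs

def revcomp(seq):
    """Reverse complement of an ACGT string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def kmer_set(contigs, k=16):
    """Canonical k-mers (the smaller of a k-mer and its reverse complement)."""
    kmers = set()
    for contig in contigs:
        for i in range(len(contig) - k + 1):
            kmer = contig[i:i + k]
            if set(kmer) <= set("ACGT"):           # skip k-mers containing N etc.
                kmers.add(min(kmer, revcomp(kmer)))
    return kmers

def jaccard_matrix(kmer_sets):
    """Pairwise Jaccard similarity between named, non-empty k-mer sets."""
    return {(a, b): len(kmer_sets[a] & kmer_sets[b]) / len(kmer_sets[a] | kmer_sets[b])
            for a, b in combinations(sorted(kmer_sets), 2)}
```

Usage: build `sets = {name: kmer_set(read_contigs(name + ".fasta")) for name in ["s1", "s2", "s3", "s4", "s5"]}` and sort `jaccard_matrix(sets).items()` by similarity to see which three samples cluster.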
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
I am trying to run SANDER in parallel on a cluster. In GROMACS, I used to run
dplace -c 0-59 mdrun -v -s em.tpr -c em.pdb -nt 60 (here I was pinning the run to CPUs 0-59, and -nt set the number of threads).
For sander in Amber, I tried
mpirun -np 10 sander -O -i min.in -o min.out -p test-solv.prmtop -c test-solv.inpcrd -r min.rst -ref test.inpcrd
I have also tried
mpirun -np 10 pmemd -O -i heat.in -o heat.out -p test-solv.prmtop -c min.rst -r heat.rst -x heat.mdcrd -ref min.rst -inf heat.mdinfo &
However, it runs on only one CPU. I have also tried -nt 10, -t 10, etc., but these throw errors.
Can somebody help me find the correct command to assign 10 CPUs to an MD calculation with sander? I have referred to the mpirun usage in the Amber FAQ and group discussions, but it does not work on my cluster.
Relevant answer
Answer
Here is my Amber parallel note. It is in Turkish, but all you need are the commands, so I think you can take advantage of it: https://www.evernote.com/shard/s685/sh/cf35c8ff-cba9-0f18-d20a-e664cd0d14d9/16b27a12b90c39faf16ea63e11c79d50
  • asked a question related to Bioinformatics and Computational Biology
Question
29 answers
Which book should I read to understand bioinformatics from the very beginning?
Relevant answer
Answer
Hi, we have been using these books: Introduction to Bioinformatics by Arthur Lesk.
Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum.
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
Hello everyone, 
For bacterial taxonomy based on the 16S rRNA gene, the currently accepted similarity cut-offs are <97% for species differentiation and <94% for genus differentiation.
As 16S alone fails to distinguish among close relatives, researchers have started to use multilocus sequence analysis (MLSA), in which the concatenated conserved sequences of several housekeeping genes are used.
Does anyone know the similarity thresholds for species and genus delineation in this case, i.e. when using the MLSA approach to identify bacterial isolates?
Thanks in advance
Relevant answer
Answer
Nithin Sam .R You can do that by collecting the sequence of the housekeeping genes and create a phylogenetic Tree using software like MEGA (freely available).
Cheers,
Timsy
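Once the concatenated housekeeping-gene sequences are aligned, pairwise similarity is usually reported as percent identity over the aligned positions. A minimal sketch, assuming an alignment of equal-length gapped sequences; columns where either sequence has a gap are skipped here, which is only one of several conventions. The cut-off is deliberately a parameter: the 16S thresholds quoted in the question are not established MLSA thresholds, which depend on the genus and gene set. Function names are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical positions among columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0

def pairs_below(alignment, cutoff):
    """List (name1, name2, identity) for pairs whose identity falls below `cutoff`."""
    names = sorted(alignment)
    out = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pid = percent_identity(alignment[a], alignment[b])
            if pid < cutoff:
                out.append((a, b, round(pid, 2)))
    return out
```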
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Hi,
Can anyone please explain to me how to use the GENSCAN tool to find exons and introns in a given gene sequence? I want to calculate sensitivity and specificity to check its performance on a given gene sequence.
Thank you in advance
Relevant answer
Answer
Explanation of the GENSCAN output
Gn.Ex : gene number, exon number (for reference)
Type : Init = Initial exon (ATG to 5' splice site)
Intr = Internal exon (3' splice site to 5' splice site)
Term = Terminal exon (3' splice site to stop codon)
Sngl = Single-exon gene (ATG to stop)
Prom = Promoter (TATA box / initiation site)
PlyA = poly-A signal (consensus: AATAAA)
S : DNA strand (+ = input strand; - = opposite strand)
Begin : beginning of exon or signal (numbered on input strand)
End : end point of exon or signal (numbered on input strand)
Len : length of exon or signal (bp)
Fr : reading frame (a forward strand codon ending at x has frame x mod 3)
Ph : net phase of exon (exon length modulo 3)
I/Ac : initiation signal or 3' splice site score (tenth bit units)
Do/T : 5' splice site or termination signal score (tenth bit units)
CodRg : coding region score (tenth bit units)
P : probability of exon (sum over all parses containing exon)
Tscr : exon score (depends on length, I/Ac, Do/T and CodRg scores)
Comments
The SCORE of a predicted feature (e.g., exon or splice site) is a log-odds measure of the quality of the feature based on local sequence properties. For example, a predicted 5' splice site with score > 100 is strong; 50-100 is moderate; 0-50 is weak; and below 0 is poor (more than likely not a real donor site).
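For the sensitivity and specificity part of the question: gene-prediction accuracy is often measured at the nucleotide level, with Sn = TP/(TP+FN) and Sp = TP/(TP+FP) (this "specificity" is what other fields call precision). A minimal sketch, assuming the true exon coordinates come from an annotation and the predicted ones from the Begin/End columns of the GENSCAN output, all 1-based inclusive on the input sequence; strand and exon type are ignored, and the function names are hypothetical.

```python
def exon_positions(exons):
    """Expand 1-based inclusive (begin, end) exon coordinates into a set of positions.

    Minus-strand features may be listed with Begin > End, so each pair is normalized.
    """
    positions = set()
    for begin, end in exons:
        lo, hi = min(begin, end), max(begin, end)
        positions.update(range(lo, hi + 1))
    return positions

def nucleotide_sn_sp(true_exons, predicted_exons):
    """Nucleotide-level sensitivity and specificity of predicted coding positions."""
    truth = exon_positions(true_exons)
    pred = exon_positions(predicted_exons)
    tp = len(truth & pred)
    fn = len(truth - pred)
    fp = len(pred - truth)
    sn = tp / (tp + fn) if truth else 0.0
    sp = tp / (tp + fp) if pred else 0.0
    return sn, sp
```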
  • asked a question related to Bioinformatics and Computational Biology
Question
47 answers
Any suggestions on which software to use? I would also like to know whether I should align each gene separately (in FASTA format) and then concatenate them, or first concatenate all the genes and then align them, for the different species used in the phylogeny.
Relevant answer
Answer
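Use this code in R to concatenate genes (each input file must already be aligned, and sequence names must match across files):
# Folder holding the aligned FASTA files
setwd("D:\\project\\snake\\sequences")
library(ape)
# read.dna() returns a DNAbin matrix when all sequences in a file have the same length (i.e. aligned)
ex1 <- read.dna("Cytb.fasta", format="fasta")
ex2 <- read.dna("c-mos.fasta", format="fasta")
ex3 <- read.dna("16s.fasta", format="fasta")
# Concatenate by sequence name; taxa missing from one gene are filled with gaps
output <- cbind(ex1, ex2, ex3, fill.with.gaps=TRUE)
write.dna(output, "Cytb_c-mos_16s.txt", format="fasta")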
  • asked a question related to Bioinformatics and Computational Biology
Question
2 answers
Hi!
I've been using the refine.bio website to download normalized transcriptome data; each downloaded dataset consists of a compressed directory with an expression matrix in .tsv format, its metadata in .tsv format too, and an aggregated metadata file in .json format.
I'm trying to associate the expression matrix with its metadata using R, but I don't know how to do it, and I can't find the way in the site's documentation. I only know that I need to read these files with these commands:
library(rjson)
expression_df <- read.delim('SRP068114/SRP068114.tsv', header = TRUE,
                            row.names = 1, stringsAsFactors = FALSE)
metadata_list <- fromJSON(file = 'aggregated_metadata.json')
but I have no idea how to merge them to generate a fully informative matrix.
Can someone help me, please?
Thank you so much.
Relevant answer
Answer
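First you need to flatten your JSON file:
library(jsonlite)
metadata <- fromJSON("File.json", flatten = TRUE)
View(metadata)
# After that, read your expression table (genes in rows, samples in columns):
expression <- read.delim("SRP068114.tsv", header = TRUE, row.names = 1, check.names = FALSE)
# Put samples in rows so each row can be matched to its metadata.
# "ColumnName" is a placeholder for the metadata column holding the sample accession;
# this assumes metadata is a data frame with one row per sample.
expr_by_sample <- data.frame(sample = colnames(expression), t(expression), check.names = FALSE)
Mergedataset <- merge(expr_by_sample, metadata, by.x = "sample", by.y = "ColumnName")
head(Mergedataset)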
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
To be used in Genome MuSIC.
Relevant answer
Answer
  • asked a question related to Bioinformatics and Computational Biology
Question
10 answers
Does anybody know about published high-throughput mRNA expression data, with microRNA over-expression or knockdown vs. control experiments in HEK293 cells? It can be using any technology like micro arrays, RNA-seq, CAGE or other. Could you point to the paper and/or GEO accession?
Relevant answer
Answer
I think this website can help you.
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I need a free web-based or Windows-based bioinformatics tool that can design an antibody against a given antigen sequence. I have found Abie Pro 3.0, but I could not understand how to use it.
Relevant answer
Answer
Dear Mustafa,
However, it's limited to a few model organisms.
Don't forget to cite.
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I am looking for databases that contain microRNA-drug interactions. Any suggestions or recommendations?
Relevant answer
Answer
Dear Ali Akbar Jamali ,
Look at the link; it may be useful.
Regards,
Shafagat
  • asked a question related to Bioinformatics and Computational Biology
Question
7 answers
Where can we find the best web course that starts from basic Python programming and the use of kernels and goes up to analyzing data with machine learning algorithms?
I think most biologists don't need to know much about how to write programs or design kernels, but they should be able to write the essential code to analyze data with already available programs.
I found some online courses. One is pythonforbiologists.com, which appears immediately when searching with the relevant keywords, but I was not able to find detailed reviews.
Another is Python for Genomic Data Science on Coursera. The course content looks good, but the reviews say that the later weeks (3 and 4) lack material on uses/applications, which makes it virtually impossible to finish the course.
Suggestions of efficient web courses on Python for biologists are appreciated.
Relevant answer
Answer
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Hi all,
I am having some difficulty understanding how JPRED4 works with a multiple sequence alignment. I aligned 300 amino acid sequences from a protein family and was able to run JPRED4 with this alignment. The secondary structure prediction is shown for the first amino acid sequence in the FASTA file.
I would appreciate help with the following questions:
A. Has the prediction algorithm used all the sequences in the multiple alignment?
B. The secondary structure prediction is shown for the first amino acid sequence. Is this prediction valid only for the first amino acid sequence?
C. What should I do if I am interested in a sequence other than the first one in the FASTA file? Can I just move it to the top?
D. I would like a secondary structure prediction that reflects all the sequences in the multiple sequence alignment. How can I use JPRED4 to obtain this?
Thank you,
Rohit
Relevant answer
Answer
Thanks for your answers. I will try out CLUSTAL W.
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
Dear community members,
We have available WGBS data for a species. We used Bismark to map the reads to the reference genome. Unmethylated lambda phage DNA was spiked-in, to estimate the bisulfite conversion error rate. My question is, how can I use the information of this error rate to distinguish truly methylated reads from false positives?
I have seen that many people use a binomial distribution to achieve this and apply a Benjamini-Hochberg correction, citing their 1995 paper (https://www.jstor.org/stable/2346101?seq=1#page_scan_tab_contents).
Is there a tool or an easy way to do this? Excuse my ignorance but I am really new to the field.
Thank you for your time,
Panos
Relevant answer
Answer
DNA Methylation Data Analysis Workshop
How to use bisulfite-treated sequencing to study DNA methylation
When?
12-15 November 2019
Where?
Berlin, Germany
Link?
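The approach described in the question can be sketched directly: with the lambda-derived conversion error rate p as the chance of seeing an unconverted C by error, a cytosine covered by n reads of which k are methylated gets a one-sided binomial p-value P(X >= k), and the p-values across all sites are then adjusted with the Benjamini-Hochberg procedure. A minimal sketch, assuming per-site (methylated, total) read counts such as the methylated/unmethylated counts Bismark reports; the error rate, FDR level and function names are placeholders, and real pipelines usually also apply a minimum-coverage filter.

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def methylated_sites(counts, error_rate, fdr=0.01):
    """Indices of sites whose methylated-read count exceeds conversion-error expectations.

    counts: list of (methylated_reads, total_reads) per cytosine.
    """
    pvals = [binom_sf(k, n, error_rate) for k, n in counts]
    qvals = benjamini_hochberg(pvals)
    return [i for i, q in enumerate(qvals) if q <= fdr]
```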
  • asked a question related to Bioinformatics and Computational Biology
Question
8 answers
Help appreciated! We have empirical datasets of plant barcodes (matK, ITS and rbcL) and want to compare the analytical methodology used for the empirical datasets with synthetic datasets.
How can we generate such synthetic datasets? I have found relevant papers but was not able to understand the methodology. I would appreciate it if anyone could provide a detailed procedure or a tutorial.
REFERENCE: Supervised DNA barcodes paper:
From paper....
Synthetic data:
Real DNA Barcode datasets are simulated with Coalescent package in Mesquite version 2.75 (see the related work [8]). The data are simulated considering time of species divergence and the effective population size (Ne), i.e., the number of individuals in a population (of a species) that are contributing genes to the succeeding generations.
Another similar paper...
Relevant answer
Answer
Hey,
may I ask for what purpose you need this synthetic dataset, what it should look like, and what you did with your "real" dataset (methodologically)?
Regards,
André
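As a rough illustration of what such simulators do (this is not Mesquite's Coalescent package, and it models a single population rather than a species tree with divergence times): a minimal Kingman-coalescent sketch that draws a genealogy for n sampled sequences from a diploid population of effective size Ne, then evolves a random ancestral sequence down the tree under the Jukes-Cantor model. All parameter values and function names are hypothetical; simulating several species with divergence times would need a multispecies-coalescent simulator.

```python
import math
import random

BASES = "ACGT"

def coalescent_tree(n, ne, rng):
    """Kingman coalescent for n samples from a diploid population of size ne.

    Returns (parent, node_time, root); times are in generations before present.
    Internal nodes get increasing ids, so every parent id exceeds its children's ids.
    """
    lineages = list(range(n))
    node_time = {i: 0.0 for i in range(n)}
    parent = {}
    t, next_id = 0.0, n
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2 / (2 * ne))   # waiting time to next merger
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        parent[a] = parent[b] = next_id
        node_time[next_id] = t
        lineages.append(next_id)
        next_id += 1
    return parent, node_time, lineages[0]

def mutate_jc(seq, subs_per_site, rng):
    """Evolve seq along one branch under Jukes-Cantor with the given expected substitutions."""
    p_change = 0.75 * (1 - math.exp(-4.0 / 3.0 * subs_per_site))
    return "".join(rng.choice(BASES.replace(b, "")) if rng.random() < p_change else b
                   for b in seq)

def simulate_alignment(n=10, ne=10_000, length=600, mu=1e-6, seed=1):
    """Return n simulated sequences (the leaves of one coalescent genealogy)."""
    rng = random.Random(seed)
    parent, node_time, root = coalescent_tree(n, ne, rng)
    seqs = {root: "".join(rng.choice(BASES) for _ in range(length))}
    for node in range(root - 1, -1, -1):                  # parents before children
        par = parent[node]
        branch = node_time[par] - node_time[node]         # generations
        seqs[node] = mutate_jc(seqs[par], mu * branch, rng)
    return [seqs[i] for i in range(n)]
```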
  • asked a question related to Bioinformatics and Computational Biology
Question
9 answers
Hello,
After reading the well-known publication "More Bang for your Bucks", I built a workstation of my own with a Xeon CPU (>3 GHz, E5 series) and one RTX and one GTX GPU. For single runs of systems with approximately 60k atoms, I am getting 140-150 ns/day.
The problem starts when I try to run two simulations in parallel without oversubscribing (16 threads); I am not even going beyond 8 threads.
For a single run, the PME/PP ratio is around 1.04-1.05 and the load imbalance is around 2-3% with DLB on. Fourier spacing is kept at 0.10 and cut-offs at 1.0 nm.
Is there any specific reason for this? Is there any way to solve this?
Relevant answer
Answer
Souparno Adhikary Optimal performance typically requires pinning each mdrun to specific CPU cores and GPUs. You can think of it as a sort of "divide and conquer" approach. I'd recommend running your simulations in parallel on separate GPUs, because crosstalk between them is likely slow.
I'm guessing here, but I think you could run at least 4 simulations (2 per GPU) with that setup before you start losing performance. Here's part of an old script I use to run 4 simulations on a Broadwell node with 32 cores and 2 GPUs (0, 1).
##########################################################
# run each simulation from its own directory (run1..run4) so the identical -deffnm names do not overwrite each other
(cd run1 && gmx mdrun -deffnm production -ntmpi 2 -ntomp 4 -gputasks 00 -pin on -pinoffset 0 -nb gpu -nstlist 120 -pme gpu -npme 1) &
(cd run2 && gmx mdrun -deffnm production -ntmpi 2 -ntomp 4 -gputasks 00 -pin on -pinoffset 8 -nb gpu -nstlist 120 -pme gpu -npme 1) &
(cd run3 && gmx mdrun -deffnm production -ntmpi 2 -ntomp 4 -gputasks 11 -pin on -pinoffset 16 -nb gpu -nstlist 120 -pme gpu -npme 1) &
(cd run4 && gmx mdrun -deffnm production -ntmpi 2 -ntomp 4 -gputasks 11 -pin on -pinoffset 24 -nb gpu -nstlist 120 -pme gpu -npme 1) &
wait
#########################################################
I think the standard flags/options here are "-pin on", "-pme gpu", "-npme 1", and an -nstlist at or above 100. I'd start by benchmarking short runs with different combinations of the -ntmpi, -ntomp, -pinoffset, and -gputasks flags.
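The pattern in that script (each run gets its own block of cores via -pinoffset, and runs are spread over the GPUs via -gputasks) can be generated for other core and GPU counts. A minimal sketch of a hypothetical helper that only builds the command strings, reusing the flags from the script above; it assumes each run lives in its own directory (run1, run2, ...) so that identical -deffnm names do not collide, that 2 thread-MPI ranks (one PP, one PME) share one GPU as in the script, and that there are fewer than 10 GPUs.

```python
def mdrun_commands(n_runs, n_cores, n_gpus, ntmpi=2, deffnm="production"):
    """Build one backgrounded gmx mdrun command per run, with non-overlapping core pinning."""
    cores_per_run = n_cores // n_runs
    if cores_per_run < ntmpi:
        raise ValueError("not enough cores for the requested runs")
    ntomp = cores_per_run // ntmpi
    cmds = []
    for r in range(n_runs):
        gpu = r * n_gpus // n_runs            # spread runs evenly over the GPUs
        gputasks = str(gpu) * ntmpi           # all ranks of this run on the same GPU
        cmds.append(
            f"(cd run{r + 1} && gmx mdrun -deffnm {deffnm} -ntmpi {ntmpi} -ntomp {ntomp} "
            f"-gputasks {gputasks} -pin on -pinoffset {r * cores_per_run} "
            f"-nb gpu -nstlist 120 -pme gpu -npme 1) &"
        )
    return cmds + ["wait"]
```

For the node described above, `print("\n".join(mdrun_commands(4, 32, 2)))` reproduces the same pin offsets (0, 8, 16, 24) and GPU assignments (00, 00, 11, 11).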
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I need to have this program installed on my own PC. I tried other analysis programs but didn't like them. I want CellQuest from BD.
Relevant answer
Answer
I need the same program too. Did you find a solution? If so, please advise me.
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
Hey guys, please send me a few good bioinformatics e-books.
Relevant answer
Answer
Bioinformatics for Dummies
Book by Cedric Notredame and Jean-Michel Claverie
  • asked a question related to Bioinformatics and Computational Biology
Question
9 answers
I'm looking for a book on microarray data analysis. I'm a mathematician, and I'm interested in finding a book that gives a framework for microarray data analysis from beginning to end (background correction, normalization, dimensionality reduction, clustering, etc.). I found this: http://www.springer.com/gp/book/9781402072604
Are there more appropriate ones?
Thanks
Relevant answer
Answer
thanks
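Since normalization is one of the steps listed, here is a minimal sketch of quantile normalization, a method widely used for microarrays: every array is forced to share the same intensity distribution, namely the mean of the sorted columns. It assumes a genes x arrays matrix (list of rows) with no missing values, and it breaks ties by sort order instead of averaging them as some implementations do.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a list of rows (genes) x columns (arrays)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # order[j][r] = row index holding the r-th smallest value of array j
    order = [sorted(range(n_rows), key=lambda i: col[i]) for col in columns]
    # Reference distribution: mean across arrays of the r-th smallest values
    reference = [sum(columns[j][order[j][r]] for j in range(n_cols)) / n_cols
                 for r in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for r, i in enumerate(order[j]):
            result[i][j] = reference[r]       # replace each value by its rank's reference
    return result
```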
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Hi! everybody,
I am frustrated with these compressed .vcf files that contain a compiled list of variants related to genes.
My question is: how can I access the content of those files?
It is explained that they are compressed text files. So I understand that, once decompressed, I can open them with any text editor?
Or do I need VCFtools to open them?
Thank you for your help!
bernard
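Once decompressed, a VCF is plain tab-separated text that a text editor can open (very large files may be slow). .vcf.gz files are usually bgzip-compressed, which is gzip-compatible, so they can also be read directly by a script without any special tool. A minimal sketch, assuming a standard VCF with `##` meta lines, a `#CHROM` header line and tab-separated records; the function name and file name are hypothetical.

```python
import gzip

def read_vcf(path):
    """Yield one dict per VCF record from a plain or gzip/bgzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    columns = None
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("##"):
                continue                                    # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                columns = [c.lstrip("#") for c in fields]   # CHROM POS ID REF ALT ...
                continue
            if columns is None:
                raise ValueError("no #CHROM header line found")
            yield dict(zip(columns, fields))
```

For a quick look, `itertools.islice(read_vcf("variants.vcf.gz"), 5)` returns the first five records, each with keys such as `CHROM`, `POS`, `REF` and `ALT`.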
Relevant answer
  • asked a question related to Bioinformatics and Computational Biology
Question
24 answers
I'm currently using fastq-dump from the SRA Toolkit, but it takes a long time. I have to download really large amounts of bacterial genome data. Any alternatives?
Relevant answer
Answer
It's a very old question, but this can save a lot of time and also it works pretty well
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
There are no reviewed sequences in UniProt and no tertiary structures available for any of the proteins of this virus. How can we work at the protein level?
Relevant answer
Answer
I suggest reading this recent article:
Hope it helps.
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
I am using the RING web server (Residue Interaction Network Generator http://protein.bio.unipd.it/ring/ ) to generate a residue interaction network from a PDB file. I am using the complex interface of the RING web server to generate a protein network based on alpha-carbon distance, with an alpha-carbon distance cut-off of 8.0 Å. All the other parameters are set to default values.
The problem is that the generated .sif file always has two amino acids missing. I tried 3 or 4 proteins, but every time the generated .sif file has two residues missing. Why is this? How can I solve this problem?
Relevant answer
Answer
Use the bio3d package in R; you will get a network graph of all residues.
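To find out which residues go missing, you can also build the same kind of Cα-distance network yourself and compare its residue list with the nodes in the .sif file. A minimal sketch, assuming a standard fixed-column PDB file (first model only, ATOM records only, duplicate alternate locations skipped) and the 8.0 Å cut-off used above; it does not reproduce whatever filtering RING applies internally, and the function names and the `CA_contact` label are hypothetical.

```python
import math
from itertools import combinations

def read_ca_atoms(pdb_lines):
    """Return [(residue_label, (x, y, z))] for CA atoms of the first model."""
    residues, seen = [], set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break                                        # first model only
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain, resseq, icode = line[21], line[22:26].strip(), line[26].strip()
            label = f"{chain}:{resseq}{icode}:{line[17:20].strip()}"
            if label in seen:
                continue                                 # skip alternate locations
            seen.add(label)
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            residues.append((label, xyz))
    return residues

def ca_network(residues, cutoff=8.0):
    """SIF lines linking residue pairs whose CA atoms are within `cutoff` angstroms."""
    edges = []
    for (a, pa), (b, pb) in combinations(residues, 2):
        if math.dist(pa, pb) <= cutoff:
            edges.append(f"{a}\tCA_contact\t{b}")
    return edges
```

Running `read_ca_atoms(open("protein.pdb"))` (hypothetical file name) lists every residue with a CA atom; residues in that list but absent from RING's .sif output are the ones to investigate.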
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I am currently trying to find homologues of a protein I am working with, but BLAST has been giving me nothing usable. I have now found a dataset of 1500 protein sequences of potential candidates that I want to align to my reference sequence. I have tried Clustal, MEGA, MUSCLE, MAFFT and pretty much everything under the sun, but with this many sequences and only limited experience, I am having trouble achieving what I want, as the programs simply crash or lock up after a few minutes.
Instead of the traditional multiple sequence alignment, where every sequence gets aligned to every other sequence with multiple iterations, I want all of the sequences from the dataset to only be aligned to my one reference sequence. Think of it as doing 1500 pairwise alignments only. What would be the best way to perform this kind of alignment?
Relevant answer
Do a pairwise alignment (PWA) in Jalview and retrieve the first n-1 alignments from the result file.
I think it is easy as you work with GUI.
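If you prefer a script to the GUI, "1500 pairwise alignments against one reference" is just a loop of global pairwise alignments. A minimal Needleman-Wunsch sketch that reports the score and percent identity of each candidate against the reference; it uses a simple match/mismatch score and a linear gap penalty (arbitrary assumptions, not a substitution matrix such as BLOSUM62 that protein aligners normally use), and pure Python will be slow for long proteins. Function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    sub = lambda x, y: match if x == y else mismatch
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Traceback from the bottom-right corner
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return S[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

def identity(aln_a, aln_b):
    """Percent identity over all alignment columns (gap columns count as differences)."""
    same = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a) if aln_a else 0.0

def align_all_to_reference(reference, candidates):
    """Map candidate name -> (score, percent identity) against the reference."""
    results = {}
    for name, seq in candidates.items():
        score, ref_aln, cand_aln = needleman_wunsch(reference, seq)
        results[name] = (score, round(identity(ref_aln, cand_aln), 1))
    return results
```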
  • asked a question related to Bioinformatics and Computational Biology
Question
20 answers
I have to calculate molecular descriptors using freeware. Which software is best for calculating molecular descriptors? Or does anyone have a license for any (paid) software? Please help me out.
Relevant answer
Answer
For those who have some basic experience of Python or are willing to acquire basic Python notions:
RDKit is a quick and free way to get a number of simple descriptors, ranging from 1D to 3D. Also, note that if your molecule names are not too niche, you can easily convert them into SMILES using the cirpy library. I had a good experience with this library: from a set of about 4000 molecules, it converted names to SMILES automatically for about 3800 of them, and I had just 200 to convert manually. Once you have the SMILES of your molecules, descriptors can easily be calculated with RDKit.
Check the list of descriptors, 3D descriptors and fingerprints here:
Note that fragment descriptors, for group contributions, can also be computed once suitably defined.
I recommend using the latest version of Python as installed with Anaconda; this way, you will have the least possible hassle with updating libraries. Personally, I simply use Notepad++ as the text editor and execute my scripts from the Windows command line, so the entire workflow is free.
As an example, here is a simple script I wrote that converts a list of names (one per line) in a names.txt file into their SMILES and prints them to the command-line output.
#!/usr/bin/python
# -*- coding: Latin-1 -*-
# ----- Get SMILES from list of names -----
#
#
# (c) 2019, University of Nanjing
#
# A: Theophile Gaudin
#
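import cirpy

with open("names.txt") as file:
    data = file.readlines()

for molecule in data:
    name = molecule.rstrip("\n\r")
    try:
        # Resolve the name to SMILES, then look up formula and molecular weight
        SMILES = cirpy.resolve(name, 'smiles')
        if SMILES is None:
            raise ValueError("name could not be resolved")
        formula = cirpy.resolve(SMILES, 'formula')
        Mw = cirpy.resolve(SMILES, 'mw')
        print(name, ";", SMILES, ";", formula, ";", Mw)
    except Exception:
        # Unresolvable name or network error
        print(name, "; none")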
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
Dear altruist,
I am working on bioinformatics and computational biology, using the Cytoscape tool for network visualization and analysis. I am currently working on lung cancer-related genes, and I want to add some clustering methods to the analysis for a better outcome. Which plugin would you suggest as better than the others?
I already use the ClusterONE plugin but want something more, such as MCL or k-medoid clustering.
Thanks in advance.
Relevant answer
Answer
Hi
you can find here a list of plugins that can be used for clustering on Cytoscape.
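For context on what the MCL option in such plugins does: Markov clustering alternates "expansion" (squaring a column-stochastic matrix, i.e. taking random-walk steps) and "inflation" (raising entries to a power, then re-normalizing columns) until the matrix stops changing, and clusters are read from the rows that keep non-zero entries. A minimal pure-Python sketch for small graphs, not the implementation in any Cytoscape app; the inflation value 2.0 and the tolerance are common but arbitrary choices, and the function names are hypothetical.

```python
def normalize_columns(M):
    """Scale each column of a square matrix to sum to 1."""
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-6):
    """Markov clustering of a symmetric adjacency matrix; returns clusters of node indices."""
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic
    M = normalize_columns([[float(adjacency[i][j]) + (i == j) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(M, M)                                          # expansion
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - M[i][j]) for i in range(n) for j in range(n))
        M = inflated
        if delta < tol:
            break
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if M[i][j] > tol)       # attractor rows
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```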
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
The Genetic Algorithm can be modified in different ways. Even Goldberg's book, Genetic Algorithms in Search, Optimization and Machine Learning, specifies various enhancements that could be done for genetic search.
Which exact method is implemented in Weka's genetic search for attribute selection? The software refers to Goldberg's book, but I'm still not sure about its deeper specifics.
Some of these are:
  • Precise fitness evaluation function
  • Whether fitness modification to add weight to diversity is implemented
  • Structure encoding of the chromosome
  • Type/variant of crossover implemented: single point, multi-point, and which point to be exact.
Relevant answer
Answer
The GA in WEKA is from Goldberg (1989)
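For reference when reading Weka's documentation, here is a minimal sketch of a Goldberg-style simple genetic algorithm for attribute selection: each chromosome is a bit string (bit i = 1 means attribute i is selected), with fitness-proportional (roulette-wheel) selection, single-point crossover and bit-flip mutation. It is a generic illustration, not Weka's GeneticSearch source; the elitism step, the default parameters and the toy fitness in the test are assumptions, and in practice fitness would be the merit score from an attribute evaluator. It needs at least 2 attributes for the crossover point.

```python
import random

def genetic_search(n_attrs, fitness, pop_size=20, generations=20,
                   p_cross=0.6, p_mut=0.033, seed=0):
    """Simple GA over bit strings; returns (best chromosome, best fitness per generation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        weights = [max(fitness(c), 0.0) + 1e-12 for c in pop]   # roulette needs positives
        children = [best[:]]                                    # elitism: keep best so far
        while len(children) < pop_size:
            mom, dad = rng.choices(pop, weights=weights, k=2)   # fitness-proportional
            if rng.random() < p_cross:
                point = rng.randint(1, n_attrs - 1)             # single-point crossover
                kids = [mom[:point] + dad[point:], dad[:point] + mom[point:]]
            else:
                kids = [mom[:], dad[:]]
            for kid in kids:
                for i in range(n_attrs):
                    if rng.random() < p_mut:
                        kid[i] = 1 - kid[i]                     # bit-flip mutation
                children.append(kid)
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)
        history.append(fitness(best))
    return best, history
```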
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
I am very new to RDKit and find it extremely confusing. I have a set of ligands and their SDF files. Now I need to find the common core structure among all the ligands, so I am thinking of using RDKit for that. If you could send me a link (other than https://www.rdkit.org/docs/GettingStartedInPython.html), or help me with it, that would be helpful. Thanks and good day!
  • asked a question related to Bioinformatics and Computational Biology
Question
4 answers
I have two signalling networks that I want to merge while retaining their network layouts. I simply merged the networks with the network merging utility in Cytoscape; however, by doing this I lost the manually adjusted layouts of both networks.
Relevant answer
Answer
If I were you, I would not use Cytoscape; I have used it before and don't really like it that much. Try STRING or NetworkAnalyst (personal favourites). If you have to use Cytoscape, integrate the platforms of the different software tools, and then you won't have to deal with technical difficulties.
  • asked a question related to Bioinformatics and Computational Biology
Question
6 answers
The search for shortest unique substrings (shustrings) of the genome is an important problem, for in some sense it is the mix of these shustrings that defines the phenotype of an organism. A related question concerns the set of shustrings that *is not* found within any genome, for the presence of these may correspond to specific detrimental consequence in organisms. So, my question addresses both the issue of any prior review (there has been) and any discussion of justification for the observance of non-representation of these shustrings within organisms. The shorter the length of such sequences, the more important would be any corresponding justification, I should think.
Relevant answer
Answer
Dr. Konopka:
Ultimately, nullomers are short sequences of nucleotides that are not represented within a DNA molecule. For a long sequence drawn on a small alphabet, such as DNA, there are many, many short sub-sequences that are not represented; I will forward to you in a subsequent message an analysis of the shustrings and nullomers of the RAND million random digits.
I am especially interested in particularly short nullomers: why are they not conserved in the genetic record? Please consider this graph, derived from a shustring analysis of the Vibrio cholerae genome.
wrb
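To make the two notions concrete: nullomers are words absent from a sequence, while shustrings are the shortest words present exactly once. The brute-force sketch below looks at a single strand only (genome-scale analyses often also account for the reverse complement) and enumerates all 4^k words, so it is meant for small sequences and small k, not whole genomes.

```python
from collections import Counter
from itertools import product

def nullomers(seq, k, alphabet="ACGT"):
    """All length-k words over `alphabet` that never occur in `seq` (one strand only)."""
    seq = seq.upper()
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    words = ("".join(p) for p in product(alphabet, repeat=k))  # 4**k candidates
    return sorted(w for w in words if w not in present)

def shortest_nullomer_length(seq, max_k=12):
    """Smallest k with at least one nullomer, or None if there is none up to max_k."""
    for k in range(1, max_k + 1):
        if nullomers(seq, k):
            return k
    return None

def shustrings(seq):
    """Shortest substrings occurring exactly once in `seq` (brute force)."""
    seq = seq.upper()
    for k in range(1, len(seq) + 1):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        unique = sorted(s for s, n in counts.items() if n == 1)
        if unique:
            return unique
    return []
```

For the toy sequence `ACGTACGT`, every single base occurs, but only 4 of the 16 dinucleotides do, so the shortest nullomers have length 2, and the only unique dinucleotide is `TA`.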
  • asked a question related to Bioinformatics and Computational Biology
Question
7 answers
I want to analyze ligand-protein interactions (in 3D) with bioinformatics tools. Could you recommend reliable software or web services?
Thanks.
Relevant answer
Answer
1. Free academic molecular docking software:
AutoDock (http://autodock.scripps.edu/)
2. Commercial Software:
If you need technical support, I know that Creative Biostructure may offer related services. I have worked with them before and had a good experience, so you can try them.
Hope it helps!
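Once you have a protein-ligand complex (for example a docked pose), a first-pass interaction analysis is simply listing the residues near the ligand. The sketch below assumes one PDB file with the protein as ATOM records and the ligand as HETATM records under a residue name (the hypothetical `LIG`); the 4.0 Å cutoff is an illustrative choice, and dedicated tools classify interaction types (H-bonds, stacking, etc.) far more carefully.

```python
import math

def parse_atoms(pdb_text):
    """Parse ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "record": line[0:6].strip(),
                "resname": line[17:20].strip(),
                "chain": line[21],
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms

def contact_residues(pdb_text, ligand_resname, cutoff=4.0):
    """Protein residues with any atom within `cutoff` angstroms of any ligand atom."""
    atoms = parse_atoms(pdb_text)
    ligand = [a for a in atoms if a["record"] == "HETATM" and a["resname"] == ligand_resname]
    protein = [a for a in atoms if a["record"] == "ATOM"]
    contacts = set()
    for atom in protein:
        if any(math.dist(atom["xyz"], lig["xyz"]) <= cutoff for lig in ligand):
            contacts.add((atom["chain"], atom["resname"], atom["resseq"]))
    return sorted(contacts)
```

Usage: `with open("complex.pdb") as fh: print(contact_residues(fh.read(), "LIG"))`, where `complex.pdb` is a placeholder file name.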
  • asked a question related to Bioinformatics and Computational Biology
Question
5 answers
To date, many programs have been developed for the specific task of peptide-spectrum matching. A few commonly used ones in proteomics studies are MSGFPlus, Sequest, Mascot, Andromeda, X!Tandem, PEAKS DB, MyriMatch, pFind, Comet and OMSSA.
Personally, I use MSGFPlus most often. Its performance is also recognized by many users. Another reason I like it is that it is a Java program and easy to integrate into different bioinformatics pipelines.
MS-GF+ makes progress towards a universal database search tool for proteomics. Nature Communications (2014)
For label-free proteomics, it is more common to use the Andromeda+MaxQuant combination. It is distributed as a Windows program with a nice, friendly interface.
Unfortunately, only about 35-40% of peptide spectra in a bottom-up proteomics experiment (data-dependent acquisition) are identified. Recently, a few programs based on deep learning and neural networks have been developed; I list a few here. These methods claim a significant improvement over previous ones.
Is there anyone who has tested these tools and would like to share their experiences? Discussions are very much welcome here.
DeepNovo
Protein identification with deep learning: from abc to xyz. arXiv:1710.02765, 2017.
De novo peptide sequencing by deep learning. Proceedings of the National Academy of Sciences, 2017.
DeepNovo-DIA
Deep learning enables de novo peptide sequencing from data-independent-acquisition mass spectrometry. Nature Methods volume 16, pages 63–66 (2019)
Dynamic Bayesian network (DBN)
Learning Peptide-Spectrum Alignment Models for Tandem Mass Spectrometry. Uncertain Artif Intell. 2014; 30: 320–329.
Kaiko
Proteomics of natural bacterial isolates powered by deep learning-based de novo identification. bioRxiv September 27, 2018.
Peptide-Spectra Matching from Weak Supervision. arXiv 22 Aug 2018
DeepIso: A Deep Learning Model for Peptide Feature Detection. 9 Dec 2017
Relevant answer
Answer
A recently published search engine, IdentiPy (Levitsky et al., JPR 2018), was compared with several of the above-mentioned methods, including Comet, MS-GF+, MaxQuant, Morpheus and X!Tandem.
On the three selected datasets, the two best search engines among the compared methods (yielding the most PSMs at 1% FDR) were IdentiPy and MS-GF+. In two of the three datasets, IdentiPy identified slightly more PSMs than MS-GF+; in the third, MS-GF+ identified the most PSMs.
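Comparisons like this count target PSMs accepted at a 1% false discovery rate. A minimal sketch of the usual target-decoy estimate, assuming each PSM is a `(score, is_decoy)` pair with higher scores being better: the FDR at a threshold is estimated as decoy hits over target hits, then converted to q-values (the lowest FDR at which a PSM would still be accepted). Real search engines differ in details such as tie handling and the exact FDR formula.

```python
def psms_at_fdr(psms, fdr=0.01):
    """Count target PSMs with q-value <= fdr using a simple target-decoy estimate."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    decoys = targets = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR at this score threshold
    # q-value: minimum estimated FDR at this or any lower threshold
    qvals = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qvals[i] = running
    return sum(1 for (_, is_decoy), q in zip(ranked, qvals) if not is_decoy and q <= fdr)
```

With scores `[(10, False), (9, False), (8, True), (7, False)]`, only the two top-scoring targets survive at 1% FDR, because the decoy at score 8 raises the estimated FDR for everything below it.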
  • asked a question related to Bioinformatics and Computational Biology
Question
18 answers
Where can I publish good molecular dynamics simulation papers that have no experimental (wet-lab) counterpart? This is for a computational biologist without a wet-lab setup. Most journals require computational work to be supported or validated by experiments, so I wonder whether there are journals that consider purely computational work. I am particularly interested in protein-ligand, ion channel and membrane simulations, etc.
Relevant answer
Answer
I don't think it necessarily has to do with the journal. Good MD research gets published when it either:
A) is corroborated by experimental (wet-lab) data (as Stéphane Abel said), and/or
B) uses best practices for MD simulations to quantify uncertainty and sampling quality of your system
It seems like you want option B.
I have found Daniel Zuckerman's blog (http://statisticalbiophysicsblog.org/) very helpful in understanding issues relevant to MD simulations.
I think the Best Practices checklists laid out in the papers from LiveCoMS (Living Journal of Computational Molecular Science) are a good place to start an MD-only research project. Here are a few:
Best Practices for Foundations in Molecular Simulations: https://bit.ly/2HPkcmv
Best Practices for Quantification of Uncertainty & Sampling Quality in Molecular Simulations: https://bit.ly/2H2Vj5a
Simulation Best Practices for Lipid Membranes: https://bit.ly/2J5isGj
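One standard technique for option B is block averaging: split a time series of an observable (e.g. RMSD, area per lipid) into contiguous blocks and use the spread of block means to estimate the standard error of the overall mean. This is a minimal sketch of that idea, not code from the papers above; it assumes the blocks are longer than the correlation time, which you can check by repeating it for several block counts and looking for a plateau in the error.

```python
import math
import statistics

def block_average(values, n_blocks):
    """Mean and block-averaging standard error of a time series.

    The series is cut into n_blocks contiguous blocks; if each block is longer
    than the correlation time, block means are roughly independent, so their
    spread estimates the uncertainty of the overall mean. Trailing frames that
    do not fill a whole block are dropped.
    """
    if n_blocks < 2:
        raise ValueError("need at least 2 blocks")
    block_size = len(values) // n_blocks
    if block_size < 1:
        raise ValueError("fewer values than blocks")
    means = [statistics.fmean(values[i * block_size:(i + 1) * block_size])
             for i in range(n_blocks)]
    return statistics.fmean(means), statistics.stdev(means) / math.sqrt(n_blocks)
```

Usage: `mean, sem = block_average(rmsd_series, 5)`, where `rmsd_series` is a hypothetical list of per-frame values; report `mean ± sem` only once the error has stopped changing with block size.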
  • asked a question related to Bioinformatics and Computational Biology
Question
1 answer
I've used the same normalization and chip to look up a gene's expression in the R2 Genomics Analysis and Visualization Platform, but I compared very different cells (such as lung cancer and melanoma). So my question is: is it correct/safe to say that a gene is more highly expressed in lung cancer than in melanoma, for example (p < 0.05), based on comparisons made in R2 Genomics?
PS: I intend to verify this experimentally; the results obtained in R2 would serve as a guide for me.
Thank you!
Relevant answer
Answer
What is the normalization based on?
I think differences in gene expression between tissues are tricky on two levels.
In the first place, there is the normalization issue. In my experience, if you test for differential gene expression between tissues, a large proportion of the transcriptome comes out DEG, and it all depends on the normalizations used. So for a given gene: do you consider its expression relative to the rest of the transcriptome? Dangerous, because highly expressed genes in the background may "dilute" your transcript of interest, even if stoichiometrically, it is "upregulated" relative to the genes it interacts with. Or do you normalize relative to housekeeping genes? It is known that housekeeping genes exhibit tissue-specific expression. Do you consider it relative to the genes it interacts with? Interesting, but requires a lot of prior knowledge...
Secondly, there is also the question of what it actually means to be differentially expressed between tissues. Transcripts (and/or their corresponding products) hardly ever function on their own. They interact with the rest of the transcriptome/proteome, causing feedback to gene expression etc. These "backgrounds" are drastically different between cell types. Also, the same amount of mRNA may lead to different amounts of protein depending on the status of the cell (presence/activation of ribosomes, tRNA,...). As a result, the impact of a change in the number of copies of a transcript may be enormous in one tissue, versus negligible in the other.
These are actually considerations that to a large extent hold true even when comparing gene expression between conditions in the same tissue, but obviously the more different your samples are to start with, the more exaggerated the effects will be. So, my two cents is: be careful, it's a dangerous comparison. I would rather try healthy lung vs lung cancer and healthy skin vs melanoma.
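The "dilution" point about normalization can be made concrete with hypothetical numbers. In the sketch below, a gene and a reference (housekeeping-like) gene have identical counts in two samples, but sample B has many more reads from other, highly expressed genes. Normalizing to the whole transcriptome (counts per million) makes the gene look about 10 times lower in B, whereas normalizing to the reference gene shows no difference; neither answer is automatically "right", which is exactly the ambiguity described above.

```python
def cpm(counts):
    """Counts per million: each gene relative to the whole measured transcriptome."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}

def relative_to_reference(counts, reference_gene):
    """Each gene relative to one chosen reference (e.g. housekeeping) gene."""
    ref = counts[reference_gene]
    return {gene: c / ref for gene, c in counts.items()}

# Hypothetical counts: GENE_X and the reference HK are identical in both samples,
# but sample B has far more reads from other, highly expressed genes.
sample_a = {"GENE_X": 100, "HK": 1000, "OTHER": 8900}
sample_b = {"GENE_X": 100, "HK": 1000, "OTHER": 98900}

cpm_ratio = cpm(sample_a)["GENE_X"] / cpm(sample_b)["GENE_X"]  # about 10
ref_ratio = (relative_to_reference(sample_a, "HK")["GENE_X"]
             / relative_to_reference(sample_b, "HK")["GENE_X"])  # 1.0
```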
  • asked a question related to Bioinformatics and Computational Biology
Question
3 answers
I'd like to share tips and tricks about this useful software.
Relevant answer
Answer
I use Tableau mostly for visualization.
  • asked a question related to Bioinformatics and Computational Biology
Question
8 answers
I have 1000 PDB files and want to add polar hydrogens to them. Could someone suggest a command-line program to add polar hydrogens to PDB files?
Relevant answer
Answer
If you have Python (2.7) and Chimera installed, you can follow the steps below.
1) Save a text file listing the .pdb files you wish to edit (one file name per line) as pdblist.txt.
2) Save the code below as a Python script named addph.py:
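import sys

if __name__ == "__main__":
    # Read the PDB file names from the list file, one per line, skipping blank lines
    with open(sys.argv[1]) as handle:
        plist = [line.strip() for line in handle if line.strip()]
    # For each file, Chimera will: open it, add all hydrogens, delete the
    # non-polar (HC) hydrogens, overwrite the file, and close the model
    script = ""
    for p in plist:
        script += "open %s\naddh\ndelete HC\nwrite #0 %s\nclose #0\n" % (p, p)
    with open("script.com", "w") as handle:
        handle.write(script)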
3) Make sure the two files created above and the .pdb files to be edited are in the directory where you will run the commands.
4) On the command line, type the command below, which should generate a script.com file:
pathto/python addph.py pdblist.txt
5) Now, again on the command line, type:
pathto/chimera --nogui script.com
This will use Chimera to edit each pdb file step by step. Wait until all the files are edited.
  • asked a question related to Bioinformatics and Computational Biology
Question
8 answers
1 ns/hr for an unrestrained 60k-atom system... GROMACS 2019... 1.0 nm cutoff and 0.14 nm FFT spacing...
Approximately 160 ns in 9 days... counting the power outages (totalling almost 9-10 hrs last week) and 3 hrs of rest per day...
Bad??? I don't think so... Any opinions???
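A quick sanity check of these figures, assuming the 3 h of daily rest means the run was paused and taking 9.5 h as the midpoint of the quoted outage: 1 ns/hr is 24 ns/day (the unit GROMACS reports), and 9 days minus the downtime leaves about 179.5 running hours, i.e. roughly 180 ns at a steady rate, the same ballpark as the ~160 ns observed.

```python
def expected_ns(rate_ns_per_hr, days, rest_hr_per_day=0.0, outage_hr=0.0):
    """Simulated nanoseconds expected from a steady throughput and the hours actually run."""
    running_hr = days * 24 - days * rest_hr_per_day - outage_hr
    return rate_ns_per_hr * running_hr

ns_per_day = 1.0 * 24  # performance in the ns/day unit GROMACS prints
budget = expected_ns(1.0, 9, rest_hr_per_day=3, outage_hr=9.5)  # 216 - 27 - 9.5 hours
```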
Relevant answer
Answer
I find post-MD 'analysis' more time-consuming; it pushes your limits to learn more and more about your system in order to come up with something meaningful.
  • asked a question related to Bioinformatics and Computational Biology
Question
7 answers
The wall time ran out, so the production run stopped. Now I have prd.cpt and prd_prev.cpt. What is the difference between them, and which one should be used to restart the simulation? Also, in the tutorial, is topol.tpr a topology file or a binary file produced by grompp?
Several .tpr files were produced during the simulation, such as "#prd.tpr.67#", "#prd.tpr.64#" and "prd.tpr". What is the difference between them, and which one should be used to restart the simulation?
Thanks.
Relevant answer
Answer
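To extend an MD run by 1 ns (-extend is given in ps):
- gmx convert-tpr -s md.tpr -extend 1000 -o prod2.tpr
- gmx mdrun -s prod2.tpr -deffnm md -cpi md.cpt -append
In this case, you need a new .tpr because the previous one has reached the end time specified in it.
If you want to restart an interrupted MD run, use:
- gmx mdrun -s md.tpr -cpi md.cpt -append -deffnm md
This is for when your MD stopped for some reason and you want to complete it. md.cpt is the most recent checkpoint; md_prev.cpt is the one written before it, kept as a backup in case the latest is damaged. When appending, keep the same output file names as the original run.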
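To relate the question's file names to this: GROMACS renames a file it is about to overwrite to `#name.N#`, so `#prd.tpr.67#` and `#prd.tpr.64#` are backups of earlier `prd.tpr` files, and `prd.cpt` is newer than `prd_prev.cpt`. A hypothetical helper that applies these rules to a directory listing might look like this; it is a sketch of the naming conventions, not a GROMACS tool.

```python
import re

# GROMACS backups of overwritten files look like "#prd.tpr.67#"
BACKUP_PATTERN = re.compile(r"^#.+\.\d+#$")

def restart_files(filenames, deffnm):
    """Return (tpr, cpt) to continue the run named `deffnm`, ignoring backup copies."""
    names = {f for f in filenames if not BACKUP_PATTERN.match(f)}
    tpr = f"{deffnm}.tpr" if f"{deffnm}.tpr" in names else None
    # deffnm.cpt is the latest checkpoint; deffnm_prev.cpt is the one written before it
    for cpt in (f"{deffnm}.cpt", f"{deffnm}_prev.cpt"):
        if cpt in names:
            return tpr, cpt
    return tpr, None
```

Usage: `restart_files(os.listdir("."), "prd")` on the question's directory would pick `prd.tpr` and `prd.cpt`, falling back to `prd_prev.cpt` only if the latest checkpoint is missing.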