Human biology relies on a high level of protein complexity to perform cellular functions. Thousands of amino acid variations and a wide variety of posttranslational modifications (PTMs) give rise to myriad proteoforms with diverse functions, each carrying a specific set of PTMs at specific amino acid positions along the polypeptide backbone. We developed Proteoform Suite, an interactive software solution for the analysis of intact proteoform mass spectrometry (MS) data. The source code is openly available at https://github.com/smith-chem-wisc/proteoform-suite. The software identifies proteoforms by comparing the intact mass and lysine count of each observed proteoform to those of all theoretical proteoforms generated from known protein sequences and annotated PTMs. These comparisons reveal both exact-mass matches and mass differences characteristic of known PTMs. Proteoform Suite also quantifies relative proteoform abundances between two conditions by calculating intensity ratios for each identified proteoform. Finally, it streamlines the visualization of proteoform abundance changes and PTM relationships in the program Cytoscape. The resulting constellations of proteoforms can then be traced to better understand the biological changes measured in the MS experiment.
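The matching step described above can be illustrated with a minimal sketch. It is not the Proteoform Suite implementation: the `Proteoform` class, the `match_proteoform` function, the 0.02 Da tolerance, and the small PTM table are assumptions chosen for illustration. The sketch accepts a theoretical candidate only when its lysine count equals the observed one and the observed-minus-theoretical mass difference equals zero or a known PTM mass shift.

```python
from dataclasses import dataclass

# Monoisotopic mass shifts (Da) for a few common PTMs.
# Illustrative subset only; a real analysis would use a full annotated PTM list.
PTM_DELTAS = {
    "unmodified": 0.0,
    "methylation": 14.01565,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


@dataclass(frozen=True)
class Proteoform:
    """Hypothetical record: a name, an intact mass, and a lysine count."""
    name: str
    mass: float          # monoisotopic intact mass in Da
    lysine_count: int


def match_proteoform(experimental, theoreticals, tolerance_da=0.02):
    """Return (theoretical name, PTM label) pairs consistent with the observation.

    The tolerance is an assumed value, not one taken from the paper.
    """
    matches = []
    for theo in theoreticals:
        # The lysine count must agree before masses are compared.
        if theo.lysine_count != experimental.lysine_count:
            continue
        delta = experimental.mass - theo.mass
        for ptm, shift in PTM_DELTAS.items():
            if abs(delta - shift) <= tolerance_da:
                matches.append((theo.name, ptm))
    return matches


theoreticals = [
    Proteoform("P1", 10000.0, 5),
    Proteoform("P2", 10000.0, 6),
]
observed = Proteoform("E1", 10042.0106, 5)
result = match_proteoform(observed, theoreticals)
```

Here `P2` is rejected on lysine count alone, while `P1` is retained because the mass difference lies within tolerance of the acetylation shift. Using the lysine count as a second constraint narrows the candidate list before any mass comparison is made.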
We demonstrate the capabilities of this analysis with intact proteoform MS data from baker's yeast under no-stress and salt-stress conditions. Proteoform Suite found 2712 unique proteoforms in this dataset, which, after connecting acceptable mass differences, formed 349 proteoform families containing 1536 proteoforms. These families comprise 88% of the 42,197 total experimental proteoform observations, with the remainder being orphans, i.e., proteoforms not associated with other proteoforms or UniProt accession numbers. A total of 128 families (590 proteoforms) correspond to a known protein, in that they are associated with a single UniProt accession number; 16 families (359 proteoforms) are ambiguously identified, in that they are associated with two or more accession numbers; and the remaining 205 families (587 proteoforms) are unidentified. Relative quantification of proteoforms between the no-stress and salt-stress conditions showed that 92 proteoforms had fold changes greater than 2 with p-values below 0.05 by randomization tests. GO analysis of these proteoforms revealed changes to machinery involved in translation, RNA binding, and response to metal ions (p-values below 0.05 by randomization tests).
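One way to picture the family construction is as finding connected components in a graph whose nodes are proteoforms and whose edges are accepted mass-difference relations. The sketch below is an assumption about how such grouping could be done, not the method used by Proteoform Suite; `build_families` and the node labels are hypothetical. Nodes left without any edge are reported as orphans.

```python
from collections import defaultdict


def build_families(proteoforms, edges):
    """Group proteoforms into families (connected components) and orphans.

    proteoforms: iterable of hashable, sortable labels
    edges: iterable of (a, b) pairs, one per accepted mass-difference relation
    """
    proteoforms = list(proteoforms)
    parent = {p: p for p in proteoforms}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two components joined by each accepted relation.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for p in proteoforms:
        groups[find(p)].append(p)

    families = [sorted(g) for g in groups.values() if len(g) > 1]
    orphans = [g[0] for g in groups.values() if len(g) == 1]
    return families, orphans


nodes = ["E1", "E2", "E3", "E4", "T1"]
relations = [("E1", "T1"), ("E2", "E1")]
families, orphans = build_families(nodes, relations)
```

In this toy input, `E1`, `E2`, and `T1` form one family because they are linked directly or through a shared neighbor, while `E3` and `E4` have no accepted relations and remain orphans.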