Search Toolkit Book for CFormatGuess
This class implements several ad hoc, unreliable heuristics for identifying file formats.
#include <util/format_guess.hpp>
Definition at line 50 of file format_guess.hpp.
◆ EFormat
The formats are checked in the same order as declared here.
Enumerator
eUnknown: Unknown format.
eBinaryASN: Binary ASN.1.
eRmo: RepeatMasker Output.
eGtf_POISENED: Old and Dead GFF/GTF style annotations.
eGlimmer3: Glimmer3 predictions.
eAgp: AGP format assembly, AgpRead.
eXml: XML.
eWiggle: UCSC WIGGLE file format.
eBed: UCSC BED file format, CBedReader.
eBed15: UCSC BED15 or microarray format.
eNewick: Newick file.
eAlignment: Text alignment.
eDistanceMatrix: Distance matrix file.
eFlatFileSequence: GenBank/GenPept/DDBJ/EMBL flat-file sequence portion.
eFiveColFeatureTable: Five-column feature table.
eSnpMarkers: SNP Marker flat file.
eFasta: FASTA format sequence record, CFastaReader.
eTextASN: Text ASN.1.
eTaxplot: Taxplot file.
ePhrapAce: Phrap ACE assembly file.
eTable: Generic table.
eGtf: New GTF, CGtfReader.
eGff3: GFF3, CGff3Reader.
eGff2: GFF2, CGff2Reader, any GFF-like that doesn't fit the others.
eHgvs: HGVS, CHgvsParser.
eGvf: GVF, CGvfReader.
eZip: Zip compressed file.
eGZip: GNU zip compressed file.
eBZip2: bzip2 compressed file.
eLzo: lzo compressed file.
eSra: INSDC Sequence Read Archive file.
eBam: Binary alignment/map file.
eVcf: VCF, CVcfReader.
eUCSCRegion: UCSC Region file format.
eGffAugustus: GFFish output of Augustus Gene Prediction.
eJSON: JSON.
ePsl: PSL alignment format.
eAltGraphX
eBed5FloatScore
eBedGraph
eBedRnaElements
eBigBarChart
eBigBed
eBigPsl
eBigChain
eBigMaf
eBigWig
eBroadPeak
eChain
eClonePos
eColoredExon
eCtgPos
eDownloadsOnly
eEncodeFiveC
eExpRatio
eFactorSource
eGenePred
eLd2
eNarrowPeak
eNetAlign
ePeptideMapping
eRmsk
eSnake
eVcfTabix
eWigMaf
eFlatFileGenbank
eFlatFileEna
eFlatFileUniProt
eZstd: Zstandard (zstd) compressed data.
eFormat_max: Max value of EFormat.
Definition at line 54 of file format_guess.hpp.
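Because the formats are checked in declaration order, the first test that matches decides the result. The sketch below illustrates that first-match-wins idea for the four compressed formats that have well-known leading signatures. It is a standalone illustration, not the toolkit's code: `CompressedFormat`, `MagicEntry`, `kChecks`, and `GuessByMagic` are hypothetical names, and the real `TestFormatZip()`, `TestFormatGZip()`, `TestFormatBZip2()`, and `TestFormatZstd()` may inspect the data differently.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a few EFormat values; not the toolkit's enum.
enum class CompressedFormat { Unknown, Zip, GZip, BZip2, Zstd };

struct MagicEntry {
    CompressedFormat format;
    const char*      magic;   // leading bytes that identify the format
    std::size_t      length;  // number of bytes in `magic`
};

// Entries are tried top to bottom, so an earlier entry wins over a later one.
static const MagicEntry kChecks[] = {
    {CompressedFormat::Zip,   "PK\x03\x04",       4},  // ZIP local file header
    {CompressedFormat::GZip,  "\x1f\x8b",         2},  // gzip member header
    {CompressedFormat::BZip2, "BZh",              3},  // bzip2 stream header
    {CompressedFormat::Zstd,  "\x28\xb5\x2f\xfd", 4},  // zstd frame magic
};

// Returns the first format whose signature matches the start of `data`.
CompressedFormat GuessByMagic(const std::string& data) {
    for (const MagicEntry& entry : kChecks) {
        if (data.size() >= entry.length &&
            std::memcmp(data.data(), entry.magic, entry.length) == 0) {
            return entry.format;
        }
    }
    return CompressedFormat::Unknown;
}
```

A table of checks keeps the order explicit in one place, which matters when two heuristics could both accept the same input.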
◆ EMode
◆ EOnError
Enumerator
eDefault: Return eUnknown.
eThrowOnBadSource: Throw an exception if the data source (stream, file) can't be read.
Definition at line 161 of file format_guess.hpp.
◆ ESequenceType
◆ ESTStrictness
Enumerator
eST_Lax: Implement historic behavior, risking false positives.
eST_Default: Be relatively strict, but still allow for typos.
eST_Strict: Require 100% encodability of printable non-digits.
Definition at line 155 of file format_guess.hpp.
◆ CFormatGuess() [1/3]
CFormatGuess::CFormatGuess ( )
◆ CFormatGuess() [2/3]
◆ CFormatGuess() [3/3]
◆ ~CFormatGuess()
CFormatGuess::~CFormatGuess ( )
◆ EnsureSplitLines()
bool CFormatGuess::EnsureSplitLines ( ) protected
Definition at line 3693 of file format_guess.cpp.
References data, NStr::fSplit_Tokenize, i, m_bSplitDone, m_iTestBufferSize, m_iTestDataSize, m_pTestBuffer, m_TestLines, and NStr::Split().
Referenced by IsAllComment(), TestFormatAgp(), TestFormatAlignment(), TestFormatAugustus(), TestFormatBed(), TestFormatBed15(), TestFormatDistanceMatrix(), TestFormatFiveColFeatureTable(), TestFormatFlatFileEna(), TestFormatFlatFileGenbank(), TestFormatFlatFileSequence(), TestFormatFlatFileUniProt(), TestFormatGff2(), TestFormatGff3(), TestFormatGlimmer3(), TestFormatGtf(), TestFormatGvf(), TestFormatHgvs(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatPsl(), TestFormatRepeatMasker(), TestFormatSnpMarkers(), TestFormatTable(), TestFormatVcf(), and TestFormatWiggle().
◆ EnsureStats()
bool CFormatGuess::EnsureStats ( ) protected
Definition at line 688 of file format_guess.cpp.
References EnsureTestBuffer(), fAlpha, fDigit, fDNA_Main_Alphabet, fProtein_Alphabet, fSpace, i, init_symbol_type_table(), m_bStatsAreValid, m_iStatsCountAaChars, m_iStatsCountAlNumChars, m_iStatsCountBraces, m_iStatsCountData, m_iStatsCountDnaChars, m_iTestDataSize, m_pTestBuffer, NcbiGetline(), ncbi::grid::netcache::search::fields::size, and symbol_type_table.
Referenced by TestFormatBed(), TestFormatBed15(), TestFormatFasta(), TestFormatFlatFileEna(), TestFormatFlatFileGenbank(), TestFormatFlatFileUniProt(), TestFormatHgvs(), TestFormatRepeatMasker(), TestFormatTextAsn(), TestFormatVcf(), and TestFormatWiggle().
◆ EnsureTestBuffer()
bool CFormatGuess::EnsureTestBuffer ( ) protected
Definition at line 626 of file format_guess.cpp.
References IsAllComment(), m_iTestBufferSize, m_iTestDataSize, m_pTestBuffer, m_Stream, NULL, and CStreamUtils::Stepback().
Referenced by EnsureStats(), GuessFormat(), TestFormatAgp(), TestFormatAlignment(), TestFormatAugustus(), TestFormatBinaryAsn(), TestFormatBZip2(), TestFormatCLUSTAL(), TestFormatDistanceMatrix(), TestFormatFiveColFeatureTable(), TestFormatFlatFileSequence(), TestFormatGff2(), TestFormatGff3(), TestFormatGlimmer3(), TestFormatGtf(), TestFormatGvf(), TestFormatGZip(), TestFormatLzo(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatPsl(), TestFormatSnpMarkers(), TestFormatSra(), TestFormatTable(), TestFormatXml(), TestFormatZip(), and TestFormatZstd().
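The reference lists for EnsureTestBuffer() mention a test buffer, the input stream, and CStreamUtils::Stepback(), which suggests a pattern of sampling the start of the stream and then giving the data back. The sketch below shows that general pattern with the standard library only. It is an assumption about the idea, not the toolkit's implementation: `PeekSample` is a hypothetical helper, and it rewinds with `seekg`, which only works on seekable streams, whereas the toolkit uses its own step-back mechanism.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, used in the usage note below
#include <string>

// Reads up to `max_bytes` from `in`, then rewinds to the starting position
// so a later parser still sees the stream from the beginning.
// Assumption: `in` is seekable (for example a file or string stream).
std::string PeekSample(std::istream& in, std::size_t max_bytes) {
    const std::istream::pos_type start = in.tellg();
    std::string sample(max_bytes, '\0');
    in.read(&sample[0], static_cast<std::streamsize>(max_bytes));
    sample.resize(static_cast<std::size_t>(in.gcount()));
    in.clear();       // a short read sets eofbit/failbit; reset before seeking
    in.seekg(start);  // give the sampled bytes back to the caller
    return sample;
}
```

For example, calling `PeekSample` on a `std::istringstream` holding `"ABCDEFGH"` with `max_bytes` set to 4 yields `"ABCD"`, and a subsequent read from the same stream still starts at `'A'`.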
◆ Format() [1/2]
◆ Format() [2/2]
◆ GetFormatHints()
◆ GetFormatName()
Definition at line 290 of file format_guess.cpp.
References format, NStr::IntToString(), and NCBI_THROW.
Referenced by CMultiReader::LoadGFF3Fasta(), CMultiReader::LoadIndexedAnnot(), ReadProject(), CFormatGuessApp::Run(), CMakeBlastDBApp::x_AddSequenceData(), CFileLoadWizard::x_CheckFormatConflict(), CFileLoadManager::x_CheckFormatConflict(), CAgpconvertApplication::x_LoadTemplate(), CGapStatsApplication::x_ReadFileOrAccn(), xCreateASNStream(), and CMultiReader::xCreateASNStream().
◆ GuessFormat() [1/2]
Definition at line 446 of file format_guess.cpp.
References eDefault.
Referenced by CCompressedFile::CCompressedFile(), Format(), CMacroFunction_UpdateProteinSeqs::TheFunction(), CFileLoadWizard::x_CheckFormatConflict(), CAlignFilter::x_GetRegionMap(), CCompressedFile::x_GuessFormat(), CCompressedFile::x_GuessFormatNetwork(), CGapStatsApplication::x_ReadFileOrAccn(), CUpdateMultipleSeq_Input::x_ReadFromStream(), CUpdateSeq_Input::x_ReadFromStream(), and CMultiReader::xAnnotGetFormat().
◆ GuessFormat() [2/2]
Definition at line 453 of file format_guess.cpp.
References eNewick, EnsureTestBuffer(), eQuick, eUnknown, f, CFormatGuess::CFormatHints::IsDisabled(), CFormatGuess::CFormatHints::IsEmpty(), CFormatGuess::CFormatHints::IsPreferred(), m_Hints, m_Stream, sm_CheckOrder, sm_CheckOrder_Size, TestFormatNewick(), x_TestFormat(), and x_TestInput().
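The references above (IsPreferred(), IsDisabled(), sm_CheckOrder, x_TestFormat()) suggest a loop that consults format hints while walking a fixed check order. The following is a minimal sketch of one way such a loop could work, assuming preferred formats are tried first and disabled formats are skipped in the general pass. All names here (`Format`, `Hints`, `Check`, `GuessWithHints`) are hypothetical, and the actual control flow in format_guess.cpp may differ.

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a few EFormat values.
enum class Format { Unknown, Xml, Json, Fasta };

// Hypothetical stand-in for CFormatGuess::CFormatHints.
struct Hints {
    std::set<Format> preferred;
    std::set<Format> disabled;
};

// One entry of the check order: a format and the test that recognizes it.
struct Check {
    Format format;
    std::function<bool(const std::string&)> test;
};

Format GuessWithHints(const std::string& sample,
                      const std::vector<Check>& check_order,
                      const Hints& hints) {
    // Pass 1: formats the caller marked as preferred get the first chance.
    for (const Check& check : check_order) {
        if (hints.preferred.count(check.format) && check.test(sample)) {
            return check.format;
        }
    }
    // Pass 2: every remaining format in declaration order, minus disabled ones.
    for (const Check& check : check_order) {
        if (hints.disabled.count(check.format)) {
            continue;
        }
        if (check.test(sample)) {
            return check.format;
        }
    }
    return Format::Unknown;
}
```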
◆ Initialize()
void CFormatGuess::Initialize ( void ) protected
◆ IsAllComment()
bool CFormatGuess::IsAllComment ( ) protected
◆ IsAsciiText()
bool CFormatGuess::IsAsciiText ( ) protected
◆ IsAsnComment()
◆ IsEnabled()
◆ IsInputRepeatMaskerWithHeader()
bool CFormatGuess::IsInputRepeatMaskerWithHeader ( ) protected
◆ IsInputRepeatMaskerWithoutHeader()
bool CFormatGuess::IsInputRepeatMaskerWithoutHeader ( ) protected
◆ IsLabelNewick()
◆ IsLineAgp()
◆ IsLineAugustus()
◆ IsLineFlatFileSequence()
◆ IsLineGff2()
◆ IsLineGff3()
◆ IsLineGlimmer3()
◆ IsLineGtf()
◆ IsLineGvf()
◆ IsLineHgvs()
◆ IsLinePhrapId()
◆ IsLinePsl()
◆ IsLineRmo()
◆ IsSampleNewick()
◆ IsSupportedFormat()
bool CFormatGuess::IsSupportedFormat ( EFormat format ) static
◆ SequenceType()
Guess sequence type.
The function computes the sequence alphabet and determines whether the source is a nucleotide or a protein sequence.
Definition at line 308 of file format_guess.cpp.
References eNucleotide, eProtein, eST_Default, eST_Lax, eST_Strict, eUndefined, fAlpha, fDigit, fDNA_Ambig_Alphabet, fDNA_Main_Alphabet, fProtein_Alphabet, fSpace, i, init_symbol_type_table(), str(), and symbol_type_table.
Referenced by CFastaReader::AssignMolType(), CPsiBlastValidate::QueryFactory(), CAlnReader::x_GetSequenceMolType(), and x_TryProcessCLUSTALSeqData().
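To make the alphabet-based idea behind SequenceType() concrete, here is a simplified, standalone sketch. It counts how many letters fall into a small nucleotide alphabet and into the 20 standard amino-acid codes, then applies a single threshold. The names `SeqType` and `GuessSequenceType`, the alphabets chosen, and the 0.9 threshold are all assumptions made for illustration; the toolkit's function uses its own symbol tables and the ESTStrictness levels, which this sketch does not reproduce.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for ESequenceType.
enum class SeqType { Undefined, Nucleotide, Protein };

// Classifies `seq` by the fraction of letters found in each alphabet.
// `min_fraction` is an illustrative threshold, not a toolkit constant.
SeqType GuessSequenceType(const std::string& seq, double min_fraction = 0.9) {
    static const std::string kDna = "ACGTUN";
    static const std::string kProtein = "ACDEFGHIKLMNPQRSTVWY";
    std::size_t letters = 0, dna = 0, protein = 0;
    for (unsigned char ch : seq) {
        if (!std::isalpha(ch)) {
            continue;  // digits, spaces, and punctuation are ignored
        }
        const char upper = static_cast<char>(std::toupper(ch));
        ++letters;
        if (kDna.find(upper) != std::string::npos) ++dna;
        if (kProtein.find(upper) != std::string::npos) ++protein;
    }
    if (letters == 0) {
        return SeqType::Undefined;
    }
    // Nucleotide is tested first: A, C, G, T, and N are also amino-acid codes,
    // so a pure DNA string would otherwise pass the protein test as well.
    if (dna >= min_fraction * letters) return SeqType::Nucleotide;
    if (protein >= min_fraction * letters) return SeqType::Protein;
    return SeqType::Undefined;
}
```

Raising `min_fraction` toward 1.0 makes the guess stricter, at the cost of rejecting sequences that contain a few unexpected characters.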
◆ TestFormat() [1/2]
◆ TestFormat() [2/2]
◆ TestFormatAgp()
◆ TestFormatAlignment()
bool CFormatGuess::TestFormatAlignment ( EMode ) protected
◆ TestFormatAugustus()
bool CFormatGuess::TestFormatAugustus ( EMode ) protected
◆ TestFormatBam()
bool CFormatGuess::TestFormatBam ( EMode mode ) protected
◆ TestFormatBed()
Definition at line 1748 of file format_guess.cpp.
References columns, EnsureSplitLines(), EnsureStats(), NStr::fSplit_Tokenize, ITERATE, m_TestLines, s_IsTokenPosInt(), NStr::Split(), NStr::StartsWith(), str(), and NStr::TruncateSpaces().
Referenced by x_TestFormat().
◆ TestFormatBed15()
◆ TestFormatBinaryAsn()
bool CFormatGuess::TestFormatBinaryAsn ( EMode ) protected
◆ TestFormatBZip2()
◆ TestFormatCLUSTAL()
bool CFormatGuess::TestFormatCLUSTAL ( void ) protected
◆ TestFormatDistanceMatrix()
bool CFormatGuess::TestFormatDistanceMatrix ( EMode ) protected
◆ TestFormatFasta()
◆ TestFormatFiveColFeatureTable()
bool CFormatGuess::TestFormatFiveColFeatureTable ( EMode ) protected
◆ TestFormatFlatFileEna()
bool CFormatGuess::TestFormatFlatFileEna ( EMode ) protected
◆ TestFormatFlatFileGenbank()
bool CFormatGuess::TestFormatFlatFileGenbank ( EMode ) protected
◆ TestFormatFlatFileSequence()
bool CFormatGuess::TestFormatFlatFileSequence ( EMode ) protected
◆ TestFormatFlatFileUniProt()
bool CFormatGuess::TestFormatFlatFileUniProt ( EMode ) protected
◆ TestFormatGff2()
◆ TestFormatGff3()
◆ TestFormatGlimmer3()
bool CFormatGuess::TestFormatGlimmer3 ( EMode ) protected
◆ TestFormatGtf()
◆ TestFormatGvf()
◆ TestFormatGZip()
◆ TestFormatHgvs()
◆ TestFormatJson()
Definition at line 2809 of file format_guess.cpp.
References NStr::eTrunc_Begin, NStr::IsBlank(), m_iTestDataSize, m_pTestBuffer, NStr::TruncateSpacesInPlace(), x_CheckJsonStart(), x_CheckStripJsonNumbers(), x_CheckStripJsonPunctuation(), x_IsTruncatedJsonKeyword(), x_IsTruncatedJsonNumber(), x_StripJsonKeywords(), and x_StripJsonStrings().
Referenced by x_TestFormat().
◆ TestFormatLzo()
◆ TestFormatNewick()
bool CFormatGuess::TestFormatNewick ( EMode ) protected
Definition at line 1062 of file format_guess.cpp.
References EnsureSplitLines(), EnsureTestBuffer(), NStr::FindNoCase(), i, IsSampleNewick(), ITERATE, m_iTestDataSize, m_pTestBuffer, m_Stream, m_TestLines, NPOS, read_size(), and CStreamUtils::Stepback().
Referenced by GuessFormat(), and x_TestFormat().
◆ TestFormatPhrapAce()
bool CFormatGuess::TestFormatPhrapAce ( EMode ) protected
◆ TestFormatPsl()
bool CFormatGuess::TestFormatPsl ( EMode mode ) protected
◆ TestFormatRepeatMasker()
bool CFormatGuess::TestFormatRepeatMasker ( EMode ) protected
◆ TestFormatSnpMarkers()
bool CFormatGuess::TestFormatSnpMarkers ( EMode ) protected
◆ TestFormatSra()
◆ TestFormatTable()
◆ TestFormatTaxplot()
bool CFormatGuess::TestFormatTaxplot ( EMode ) protected
◆ TestFormatTextAsn()
bool CFormatGuess::TestFormatTextAsn ( EMode ) protected
◆ TestFormatVcf()
◆ TestFormatWiggle()
bool CFormatGuess::TestFormatWiggle ( EMode ) protected
◆ TestFormatXml()
Definition at line 1278 of file format_guess.cpp.
References ArraySize(), NStr::eCase, NStr::eNocase, EnsureTestBuffer(), NStr::eTrunc_Begin, i, input(), m_iTestDataSize, m_pTestBuffer, NStr::StartsWith(), and NStr::TruncateSpacesInPlace().
Referenced by x_TestFormat().
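The references for TestFormatXml() (leading-space truncation, StartsWith(), and a case-insensitive comparison mode) point to a prefix test on the trimmed buffer. A minimal sketch of that kind of test is shown below, assuming the only accepted prefix is the XML declaration `<?xml` compared without regard to case. `StartsWithNoCase` and `LooksLikeXml` are hypothetical helpers; the toolkit's function may recognize additional prefixes that are not shown here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive prefix comparison (ASCII only).
bool StartsWithNoCase(const std::string& text, const std::string& prefix) {
    if (text.size() < prefix.size()) {
        return false;
    }
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(text[i])) !=
            std::tolower(static_cast<unsigned char>(prefix[i]))) {
            return false;
        }
    }
    return true;
}

// Drops leading whitespace, then checks for an XML declaration.
// Assumption: only "<?xml" is recognized in this sketch.
bool LooksLikeXml(std::string sample) {
    std::size_t first = 0;
    while (first < sample.size() &&
           std::isspace(static_cast<unsigned char>(sample[first]))) {
        ++first;
    }
    sample.erase(0, first);
    return StartsWithNoCase(sample, "<?xml");
}
```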
◆ TestFormatZip()
◆ TestFormatZstd()
◆ x_CheckJsonStart()
◆ x_CheckStripJsonNumbers()
bool CFormatGuess::x_CheckStripJsonNumbers ( string & testString ) const private
◆ x_CheckStripJsonPunctuation()
bool CFormatGuess::x_CheckStripJsonPunctuation ( string & testString ) const private
◆ x_FindJsonStringLimits()
void CFormatGuess::x_FindJsonStringLimits ( const string & testString, list< size_t > & limits ) const private
◆ x_FindNextJsonStringStop()
◆ x_IsBlankOrNumbers()
◆ x_IsNumber()
◆ x_IsTruncatedJsonKeyword()
bool CFormatGuess::x_IsTruncatedJsonKeyword ( const string & testString ) const private
◆ x_IsTruncatedJsonNumber()
bool CFormatGuess::x_IsTruncatedJsonNumber ( const string & testString ) const private
◆ x_LooksLikeCLUSTALConservedInfo()
bool CFormatGuess::x_LooksLikeCLUSTALConservedInfo ( const string & line ) const private
◆ x_StripJsonKeywords()
void CFormatGuess::x_StripJsonKeywords ( string & testString ) const private
◆ x_StripJsonPunctuation()
size_t CFormatGuess::x_StripJsonPunctuation ( string & testString ) const private
◆ x_StripJsonStrings()
void CFormatGuess::x_StripJsonStrings ( string & testString ) const private
◆ x_TestFormat()
Definition at line 513 of file format_guess.cpp.
References eAgp, eAlignment, eBam, eBed, eBed15, eBinaryASN, eBZip2, eDistanceMatrix, eFasta, eFiveColFeatureTable, eFlatFileEna, eFlatFileGenbank, eFlatFileSequence, eFlatFileUniProt, eGff2, eGff3, eGffAugustus, eGlimmer3, eGtf, eGvf, eGZip, eHgvs, eJSON, eLzo, eNewick, ePhrapAce, ePsl, eRmo, eSnpMarkers, eSra, eTable, eTaxplot, eTextASN, eUCSCRegion, eVcf, eWiggle, eXml, eZip, eZstd, format, CFormatGuess::CFormatHints::IsDisabled(), m_Hints, NCBI_THROW, NStr::NumericToString(), TestFormatAgp(), TestFormatAlignment(), TestFormatAugustus(), TestFormatBam(), TestFormatBed(), TestFormatBed15(), TestFormatBinaryAsn(), TestFormatBZip2(), TestFormatDistanceMatrix(), TestFormatFasta(), TestFormatFiveColFeatureTable(), TestFormatFlatFileEna(), TestFormatFlatFileGenbank(), TestFormatFlatFileSequence(), TestFormatFlatFileUniProt(), TestFormatGff2(), TestFormatGff3(), TestFormatGlimmer3(), TestFormatGtf(), TestFormatGvf(), TestFormatGZip(), TestFormatHgvs(), TestFormatJson(), TestFormatLzo(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatPsl(), TestFormatRepeatMasker(), TestFormatSnpMarkers(), TestFormatSra(), TestFormatTable(), TestFormatTaxplot(), TestFormatTextAsn(), TestFormatVcf(), TestFormatWiggle(), TestFormatXml(), TestFormatZip(), and TestFormatZstd().
Referenced by GuessFormat(), and TestFormat().
◆ x_TestInput()
◆ x_TestTableDelimiter()
◆ x_TryProcessCLUSTALSeqData()
bool CFormatGuess::x_TryProcessCLUSTALSeqData ( const string & line, string & id, size_t & seg_length ) const private
◆ m_bOwnsStream
bool CFormatGuess::m_bOwnsStream protected
◆ m_bSplitDone
bool CFormatGuess::m_bSplitDone protected
◆ m_bStatsAreValid
bool CFormatGuess::m_bStatsAreValid protected
◆ m_Hints
◆ m_iStatsCountAaChars
unsigned int CFormatGuess::m_iStatsCountAaChars protected
◆ m_iStatsCountAlNumChars
unsigned int CFormatGuess::m_iStatsCountAlNumChars protected
◆ m_iStatsCountBraces
unsigned int CFormatGuess::m_iStatsCountBraces protected
◆ m_iStatsCountData
unsigned int CFormatGuess::m_iStatsCountData protected
◆ m_iStatsCountDnaChars
unsigned int CFormatGuess::m_iStatsCountDnaChars protected
◆ m_iTestBufferSize
streamsize CFormatGuess::m_iTestBufferSize protected
◆ m_iTestDataSize
streamsize CFormatGuess::m_iTestDataSize protected
Definition at line 388 of file format_guess.hpp.
Referenced by EnsureSplitLines(), EnsureStats(), EnsureTestBuffer(), IsAsciiText(), TestFormatBinaryAsn(), TestFormatBZip2(), TestFormatCLUSTAL(), TestFormatFasta(), TestFormatGZip(), TestFormatHgvs(), TestFormatJson(), TestFormatLzo(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatSra(), TestFormatTextAsn(), TestFormatXml(), TestFormatZip(), TestFormatZstd(), and x_TestTableDelimiter().
◆ m_pTestBuffer
char* CFormatGuess::m_pTestBuffer protected
Definition at line 386 of file format_guess.hpp.
Referenced by EnsureSplitLines(), EnsureStats(), EnsureTestBuffer(), Initialize(), IsAsciiText(), TestFormatBinaryAsn(), TestFormatBZip2(), TestFormatCLUSTAL(), TestFormatFasta(), TestFormatGZip(), TestFormatHgvs(), TestFormatJson(), TestFormatLzo(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatSra(), TestFormatTextAsn(), TestFormatXml(), TestFormatZip(), TestFormatZstd(), and ~CFormatGuess().
◆ m_Stream
◆ m_TestLines
Definition at line 397 of file format_guess.hpp.
Referenced by EnsureSplitLines(), IsAllComment(), IsInputRepeatMaskerWithHeader(), IsInputRepeatMaskerWithoutHeader(), TestFormatAgp(), TestFormatAlignment(), TestFormatAugustus(), TestFormatBed(), TestFormatBed15(), TestFormatDistanceMatrix(), TestFormatFiveColFeatureTable(), TestFormatFlatFileEna(), TestFormatFlatFileGenbank(), TestFormatFlatFileSequence(), TestFormatFlatFileUniProt(), TestFormatGff2(), TestFormatGff3(), TestFormatGlimmer3(), TestFormatGtf(), TestFormatGvf(), TestFormatHgvs(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatPsl(), TestFormatSnpMarkers(), TestFormatVcf(), TestFormatWiggle(), and x_TestTableDelimiter().
The documentation for this class was generated from the following files: