Showing content from http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/classCFormatGuess.html below:

NCBI C++ ToolKit: CFormatGuess Class Reference


The class implements a variety of ad hoc, and not fully reliable, file format identification heuristics.

#include <util/format_guess.hpp>

enum EFormat {
  eUnknown = 0, eBinaryASN = 1, eRmo = 2, eGtf_POISENED = 3,
  eGlimmer3 = 4, eAgp = 5, eXml = 6, eWiggle = 7,
  eBed = 8, eBed15 = 9, eNewick = 10, eAlignment = 11,
  eDistanceMatrix = 12, eFlatFileSequence = 13, eFiveColFeatureTable = 14, eSnpMarkers = 15,
  eFasta = 16, eTextASN = 17, eTaxplot = 18, ePhrapAce = 19,
  eTable = 20, eGtf = 21, eGff3 = 22, eGff2 = 23,
  eHgvs = 24, eGvf = 25, eZip = 26, eGZip = 27,
  eBZip2 = 28, eLzo = 29, eSra = 30, eBam = 31,
  eVcf = 32, eUCSCRegion = 33, eGffAugustus = 34, eJSON = 35,
  ePsl = 36, eAltGraphX = 37, eBed5FloatScore = 38, eBedGraph = 39,
  eBedRnaElements = 40, eBigBarChart = 41, eBigBed = 42, eBigPsl = 43,
  eBigChain = 44, eBigMaf = 45, eBigWig = 46, eBroadPeak = 47,
  eChain = 48, eClonePos = 49, eColoredExon = 50, eCtgPos = 51,
  eDownloadsOnly = 52, eEncodeFiveC = 53, eExpRatio = 54, eFactorSource = 55,
  eGenePred = 56, eLd2 = 57, eNarrowPeak = 58, eNetAlign = 59,
  ePeptideMapping = 60, eRmsk = 61, eSnake = 62, eVcfTabix = 63,
  eWigMaf = 64, eFlatFileGenbank = 65, eFlatFileEna = 66, eFlatFileUniProt = 67,
  eZstd = 68, eFormat_max
}
The formats are checked in the same order as declared here.

enum ESequenceType { eUndefined, eNucleotide, eProtein }
enum EMode { eQuick, eThorough }
enum ESTStrictness { eST_Lax, eST_Default, eST_Strict }
enum EOnError { eDefault = 0, eThrowOnBadSource }


Definition at line 50 of file format_guess.hpp.

◆ EFormat

The formats are checked in the same order as declared here.

Enumerator eUnknown 

Unknown format.

eBinaryASN 

Binary ASN.1.

eRmo 

RepeatMasker Output.

eGtf_POISENED 

Obsolete, no longer used GFF/GTF-style annotations.

eGlimmer3 

Glimmer3 predictions.

eAgp 

AGP format assembly, AgpRead.

eXml 

XML.

eWiggle 

UCSC WIGGLE file format.

eBed 

UCSC BED file format, CBedReader.

eBed15 

UCSC BED15 or microarray format.

eNewick 

Newick file.

eAlignment 

Text alignment.

eDistanceMatrix 

Distance matrix file.

eFlatFileSequence 

GenBank/GenPept/DDBJ/EMBL flat-file sequence portion.

eFiveColFeatureTable 

Five-column feature table.

eSnpMarkers 

SNP Marker flat file.

eFasta 

FASTA format sequence record, CFastaReader.

eTextASN 

Text ASN.1.

eTaxplot 

Taxplot file.

ePhrapAce 

Phrap ACE assembly file.

eTable 

Generic table.

eGtf 

New GTF, CGtfReader.

eGff3 

GFF3, CGff3Reader.

eGff2 

GFF2, CGff2Reader, any GFF-like that doesn't fit the others.

eHgvs 

HGVS, CHgvsParser.

eGvf 

GVF, CGvfReader.

eZip 

ZIP-compressed file.

eGZip 

GNU zip compressed file.

eBZip2 

bzip2-compressed file.

eLzo 

LZO-compressed file.

eSra 

INSDC Sequence Read Archive file.

eBam 

Binary alignment/map file.

eVcf 

VCF, CVcfReader.

eUCSCRegion 

UCSC region file format.

eGffAugustus 

GFF-like output of the Augustus gene prediction program.

eJSON 

JSON.

ePsl 

PSL alignment format.

eAltGraphX, eBed5FloatScore, eBedGraph, eBedRnaElements, eBigBarChart, eBigBed, eBigPsl, eBigChain, eBigMaf, eBigWig, eBroadPeak, eChain, eClonePos, eColoredExon, eCtgPos, eDownloadsOnly, eEncodeFiveC, eExpRatio, eFactorSource, eGenePred, eLd2, eNarrowPeak, eNetAlign, ePeptideMapping, eRmsk, eSnake, eVcfTabix, eWigMaf, eFlatFileGenbank, eFlatFileEna, eFlatFileUniProt

(no description)

eZstd

Zstandard (zstd) compressed data.

eFormat_max 

Max value of EFormat.

Definition at line 54 of file format_guess.hpp.
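Several EFormat values (eZip, eGZip, eBZip2, eZstd) denote compressed containers, which are normally recognized by a fixed leading byte signature rather than by parsing text. The sketch below illustrates that idea with the publicly documented magic numbers; `SniffCompression` and `CompressionKind` are hypothetical names, and this is not the toolkit's TestFormatZip()/TestFormatGZip()/TestFormatBZip2()/TestFormatZstd() code.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical result type for this sketch; not part of the NCBI API.
enum class CompressionKind { None, Zip, GZip, BZip2, Zstd };

// True if the first 'mlen' bytes of 'data' (length 'len') equal 'magic'.
static bool StartsWithBytes(const unsigned char* data, std::size_t len,
                            const unsigned char* magic, std::size_t mlen)
{
    return len >= mlen && std::memcmp(data, magic, mlen) == 0;
}

// Classifies a sample of leading bytes by its compression signature.
CompressionKind SniffCompression(const std::string& sample)
{
    const auto* p = reinterpret_cast<const unsigned char*>(sample.data());
    const std::size_t n = sample.size();

    static const unsigned char kZip[]   = {0x50, 0x4B, 0x03, 0x04}; // "PK\3\4": ZIP local file header
    static const unsigned char kGZip[]  = {0x1F, 0x8B};             // gzip ID1, ID2 (RFC 1952)
    static const unsigned char kBZip2[] = {0x42, 0x5A, 0x68};       // "BZh"
    static const unsigned char kZstd[]  = {0x28, 0xB5, 0x2F, 0xFD}; // zstd frame magic 0xFD2FB528, little-endian

    if (StartsWithBytes(p, n, kZip,   sizeof kZip))   return CompressionKind::Zip;
    if (StartsWithBytes(p, n, kGZip,  sizeof kGZip))  return CompressionKind::GZip;
    if (StartsWithBytes(p, n, kBZip2, sizeof kBZip2)) return CompressionKind::BZip2;
    if (StartsWithBytes(p, n, kZstd,  sizeof kZstd))  return CompressionKind::Zstd;
    return CompressionKind::None;
}
```

Only the first few bytes are needed, which is why binary formats like these are cheap to test compared with the text formats. Note that empty or spanned ZIP archives use other signatures that this sketch does not check.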

◆ EMode

◆ EOnError

Enumerator eDefault

Return eUnknown.

eThrowOnBadSource 

Throw an exception if the data source (stream, file) can't be read.

Definition at line 161 of file format_guess.hpp.

◆ ESequenceType

◆ ESTStrictness

Enumerator eST_Lax

Implement historic behavior, risking false positives.

eST_Default 

Be relatively strict, but still allow for typos.

eST_Strict 

Require 100% encodability of printable non-digits.

Definition at line 155 of file format_guess.hpp.

◆ CFormatGuess() [1/3]
CFormatGuess::CFormatGuess ( )

◆ CFormatGuess() [2/3]

◆ CFormatGuess() [3/3]

◆ ~CFormatGuess()
CFormatGuess::~CFormatGuess ( )

◆ EnsureSplitLines()
bool CFormatGuess::EnsureSplitLines ( ) protected

Definition at line 3693 of file format_guess.cpp.

References data, NStr::fSplit_Tokenize, i, m_bSplitDone, m_iTestBufferSize, m_iTestDataSize, m_pTestBuffer, m_TestLines, and NStr::Split().

Referenced by IsAllComment(), TestFormatAgp(), TestFormatAlignment(), TestFormatAugustus(), TestFormatBed(), TestFormatBed15(), TestFormatDistanceMatrix(), TestFormatFiveColFeatureTable(), TestFormatFlatFileEna(), TestFormatFlatFileGenbank(), TestFormatFlatFileSequence(), TestFormatFlatFileUniProt(), TestFormatGff2(), TestFormatGff3(), TestFormatGlimmer3(), TestFormatGtf(), TestFormatGvf(), TestFormatHgvs(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatPsl(), TestFormatRepeatMasker(), TestFormatSnpMarkers(), TestFormatTable(), TestFormatVcf(), and TestFormatWiggle().
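EnsureSplitLines() turns the sampled test buffer into m_TestLines once (guarded by m_bSplitDone), using NStr::Split() with fSplit_Tokenize so that runs of delimiters do not produce empty lines. The sketch below shows the same idea in standard C++. `SplitSampleLines` is hypothetical, and dropping a possibly truncated last line when the buffer was full is an assumed design choice, not something this page states.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of splitting a sampled buffer into lines.
// Runs of '\r'/'\n' count as a single delimiter, so no empty lines are kept.
// If the sample filled the whole buffer and does not end with a line break,
// the last line was probably cut off, so it is dropped (when others remain).
std::vector<std::string> SplitSampleLines(const std::string& sample,
                                          std::size_t bufferCapacity)
{
    std::vector<std::string> lines;
    std::string current;
    for (char c : sample) {
        if (c == '\n' || c == '\r') {
            if (!current.empty()) {
                lines.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty())
        lines.push_back(current);

    const bool bufferFull  = sample.size() >= bufferCapacity;
    const bool endsMidLine = !sample.empty() && sample.back() != '\n' && sample.back() != '\r';
    if (bufferFull && endsMidLine && lines.size() > 1)
        lines.pop_back();
    return lines;
}
```

Because many line-oriented tests (GFF, BED, PSL, and so on) validate every line, discarding a half-line avoids rejecting a valid file just because the sample boundary fell mid-record.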

◆ EnsureStats()
bool CFormatGuess::EnsureStats ( ) protected

Definition at line 688 of file format_guess.cpp.

References EnsureTestBuffer(), fAlpha, fDigit, fDNA_Main_Alphabet, fProtein_Alphabet, fSpace, i, init_symbol_type_table(), m_bStatsAreValid, m_iStatsCountAaChars, m_iStatsCountAlNumChars, m_iStatsCountBraces, m_iStatsCountData, m_iStatsCountDnaChars, m_iTestDataSize, m_pTestBuffer, NcbiGetline(), size, and symbol_type_table.

Referenced by TestFormatBed(), TestFormatBed15(), TestFormatFasta(), TestFormatFlatFileEna(), TestFormatFlatFileGenbank(), TestFormatFlatFileUniProt(), TestFormatHgvs(), TestFormatRepeatMasker(), TestFormatTextAsn(), TestFormatVcf(), and TestFormatWiggle().

◆ EnsureTestBuffer()
bool CFormatGuess::EnsureTestBuffer ( ) protected

Definition at line 626 of file format_guess.cpp.

References IsAllComment(), m_iTestBufferSize, m_iTestDataSize, m_pTestBuffer, m_Stream, NULL, and CStreamUtils::Stepback().

Referenced by EnsureStats(), GuessFormat(), TestFormatAgp(), TestFormatAlignment(), TestFormatAugustus(), TestFormatBinaryAsn(), TestFormatBZip2(), TestFormatCLUSTAL(), TestFormatDistanceMatrix(), TestFormatFiveColFeatureTable(), TestFormatFlatFileSequence(), TestFormatGff2(), TestFormatGff3(), TestFormatGlimmer3(), TestFormatGtf(), TestFormatGvf(), TestFormatGZip(), TestFormatLzo(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatPsl(), TestFormatSnpMarkers(), TestFormatSra(), TestFormatTable(), TestFormatXml(), TestFormatZip(), and TestFormatZstd().

◆ Format() [1/2]

◆ Format() [2/2]

◆ GetFormatHints()

◆ GetFormatName()

Definition at line 290 of file format_guess.cpp.

References format, NStr::IntToString(), and NCBI_THROW.

Referenced by CMultiReader::LoadGFF3Fasta(), CMultiReader::LoadIndexedAnnot(), ReadProject(), CFormatGuessApp::Run(), CMakeBlastDBApp::x_AddSequenceData(), CFileLoadWizard::x_CheckFormatConflict(), CFileLoadManager::x_CheckFormatConflict(), CAgpconvertApplication::x_LoadTemplate(), CGapStatsApplication::x_ReadFileOrAccn(), xCreateASNStream(), and CMultiReader::xCreateASNStream().
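GetFormatName() references NStr::IntToString() and NCBI_THROW, which suggests a lookup that reports the offending integer when a value has no name. The sketch below shows that pattern with a switch and a standard exception; the namespace, the function `FormatName`, and every display string are hypothetical and need not match the toolkit's actual names.

```cpp
#include <stdexcept>
#include <string>

namespace format_name_sketch {

// A few EFormat values copied from the declaration above; the rest are omitted.
enum EFormat { eUnknown = 0, eBinaryASN = 1, eFasta = 16, eZstd = 68, eFormat_max };

// Hypothetical lookup: the display strings are illustrative, not the toolkit's.
// Values without an entry raise an exception that reports the integer value.
std::string FormatName(EFormat format)
{
    switch (format) {
    case eUnknown:   return "unknown";
    case eBinaryASN: return "binary ASN.1";
    case eFasta:     return "FASTA";
    case eZstd:      return "zstd";
    default:
        throw std::invalid_argument("FormatName: unnamed format value " +
                                    std::to_string(static_cast<int>(format)));
    }
}

} // namespace format_name_sketch
```

Including the numeric value in the message matters because an unnamed enumerator cannot be printed any other way.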

◆ GuessFormat() [1/2]

Definition at line 446 of file format_guess.cpp.

References eDefault.

Referenced by CCompressedFile::CCompressedFile(), Format(), CMacroFunction_UpdateProteinSeqs::TheFunction(), CFileLoadWizard::x_CheckFormatConflict(), CAlignFilter::x_GetRegionMap(), CCompressedFile::x_GuessFormat(), CCompressedFile::x_GuessFormatNetwork(), CGapStatsApplication::x_ReadFileOrAccn(), CUpdateMultipleSeq_Input::x_ReadFromStream(), CUpdateSeq_Input::x_ReadFromStream(), and CMultiReader::xAnnotGetFormat().

◆ GuessFormat() [2/2]
Note: If the instance of the class is built on a std::istream, then on completion this function pushes whatever data it had to read (in order to detect the format) back to the stream, using CStreamUtils::Stepback().

Definition at line 453 of file format_guess.cpp.

References eNewick, EnsureTestBuffer(), eQuick, eUnknown, f, CFormatGuess::CFormatHints::IsDisabled(), CFormatGuess::CFormatHints::IsEmpty(), CFormatGuess::CFormatHints::IsPreferred(), m_Hints, m_Stream, sm_CheckOrder, sm_CheckOrder_Size, TestFormatNewick(), x_TestFormat(), and x_TestInput().
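GuessFormat() consults m_Hints (IsEmpty(), IsPreferred(), IsDisabled()) and walks sm_CheckOrder. A plausible reading, which is an assumption rather than something this page states, is that preferred formats are tried first and disabled formats are skipped. The sketch below models that with plain ints and predicates; `FormatTest`, `HintsSketch` and `GuessInOrder` are hypothetical names, not toolkit types.

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: a format is a plain int, and its test is a
// predicate over the sampled text.
struct FormatTest {
    int format;
    std::function<bool(const std::string&)> test;
};

// Loosely modeled on preferred/disabled hints; the layout is an assumption.
struct HintsSketch {
    std::set<int> preferred;
    std::set<int> disabled;
};

// Pass 1 tries preferred (and not disabled) formats in check order;
// pass 2 tries every remaining enabled format in check order.
int GuessInOrder(const std::string& sample,
                 const std::vector<FormatTest>& checkOrder,
                 const HintsSketch& hints, int unknown = 0)
{
    for (const FormatTest& ft : checkOrder)
        if (hints.preferred.count(ft.format) && !hints.disabled.count(ft.format) &&
            ft.test(sample))
            return ft.format;
    for (const FormatTest& ft : checkOrder)
        if (!hints.preferred.count(ft.format) && !hints.disabled.count(ft.format) &&
            ft.test(sample))
            return ft.format;
    return unknown;
}
```

A permissive test (such as a generic table check) must come late in the order, otherwise it would claim inputs that a more specific test would have recognized; a preference hint is one way for a caller to override that order deliberately.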

◆ Initialize()
void CFormatGuess::Initialize ( void ) protected

◆ IsAllComment()
bool CFormatGuess::IsAllComment ( ) protected

◆ IsAsciiText()
bool CFormatGuess::IsAsciiText ( ) protected

◆ IsAsnComment()

◆ IsEnabled()

◆ IsInputRepeatMaskerWithHeader()
bool CFormatGuess::IsInputRepeatMaskerWithHeader ( ) protected

◆ IsInputRepeatMaskerWithoutHeader()
bool CFormatGuess::IsInputRepeatMaskerWithoutHeader ( ) protected

◆ IsLabelNewick()

◆ IsLineAgp()

◆ IsLineAugustus()

◆ IsLineFlatFileSequence()

◆ IsLineGff2()

◆ IsLineGff3()

◆ IsLineGlimmer3()

◆ IsLineGtf()

◆ IsLineGvf()

◆ IsLineHgvs()

◆ IsLinePhrapId()

◆ IsLinePsl()

◆ IsLineRmo()

◆ IsSampleNewick()

◆ IsSupportedFormat()
bool CFormatGuess::IsSupportedFormat ( EFormat format ) static

◆ SequenceType()

Guess sequence type.

The function computes the alphabet used by the sequence and determines whether the source is a nucleotide or a protein sequence.

Definition at line 308 of file format_guess.cpp.

References eNucleotide, eProtein, eST_Default, eST_Lax, eST_Strict, eUndefined, fAlpha, fDigit, fDNA_Ambig_Alphabet, fDNA_Main_Alphabet, fProtein_Alphabet, fSpace, i, init_symbol_type_table(), str(), and symbol_type_table.

Referenced by CFastaReader::AssignMolType(), CPsiBlastValidate::QueryFactory(), CAlnReader::x_GetSequenceMolType(), and x_TryProcessCLUSTALSeqData().
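SequenceType() classifies by alphabet, and the ESTStrictness levels trade false positives (eST_Lax) against requiring every printable non-digit to be encodable (eST_Strict). The sketch below shows one way to implement such a classifier; `ClassifySequence`, `SeqType`, the `minEncodable` knob and the 0.75 core-base threshold are all hypothetical and are not the toolkit's tables or cut-offs.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class SeqType { Undefined, Nucleotide, Protein };

// Hypothetical classifier over "printable non-digits": whitespace and digits
// (e.g. flat-file coordinates) are skipped, everything else is a symbol.
// 'minEncodable' is an assumed knob: 1.0 demands that every symbol be
// encodable (in the spirit of eST_Strict); lower values tolerate typos.
SeqType ClassifySequence(const std::string& seq, double minEncodable = 0.9)
{
    const std::string coreNuc    = "ACGTUN";                     // bases plus N
    const std::string ambigNuc   = "RYSWKMBDHV";                 // IUPAC ambiguity codes
    const std::string aminoAcids = "ACDEFGHIKLMNPQRSTVWYBZXUO*";
    const double kMinCoreFraction = 0.75;                        // assumed threshold

    std::size_t symbols = 0, core = 0, nucEncodable = 0, aaEncodable = 0;
    for (unsigned char c : seq) {
        if (std::isspace(c) || std::isdigit(c)) continue;
        const char u = static_cast<char>(std::toupper(c));
        ++symbols;
        const bool isCore = coreNuc.find(u) != std::string::npos;
        if (isCore) ++core;
        if (isCore || ambigNuc.find(u) != std::string::npos) ++nucEncodable;
        if (aminoAcids.find(u) != std::string::npos) ++aaEncodable;
    }
    if (symbols == 0) return SeqType::Undefined;

    const double n = static_cast<double>(symbols);
    // Nucleotide only if core bases dominate; ambiguity codes alone overlap
    // heavily with amino-acid letters and would misclassify proteins.
    if (core / n >= kMinCoreFraction && nucEncodable / n >= minEncodable)
        return SeqType::Nucleotide;
    if (aaEncodable / n >= minEncodable)
        return SeqType::Protein;
    return SeqType::Undefined;
}
```

Every nucleotide letter is also a valid amino-acid code, so the nucleotide test has to run first and rely on composition, not just encodability.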

◆ TestFormat() [1/2]

◆ TestFormat() [2/2]

◆ TestFormatAgp()

◆ TestFormatAlignment()
bool CFormatGuess::TestFormatAlignment ( EMode ) protected

◆ TestFormatAugustus()
bool CFormatGuess::TestFormatAugustus ( EMode ) protected

◆ TestFormatBam()
bool CFormatGuess::TestFormatBam ( EMode mode ) protected

◆ TestFormatBed()

Definition at line 1748 of file format_guess.cpp.

References columns, EnsureSplitLines(), EnsureStats(), NStr::fSplit_Tokenize, ITERATE, m_TestLines, s_IsTokenPosInt(), NStr::Split(), NStr::StartsWith(), str(), and NStr::TruncateSpaces().

Referenced by x_TestFormat().
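TestFormatBed() works on m_TestLines and references NStr::StartsWith(), NStr::Split() and s_IsTokenPosInt(), i.e. it skips header-like lines and checks that coordinate columns are integers. The sketch below is a hypothetical heuristic in that spirit; `LooksLikeBed` and `IsSmallNonNegInt` are invented names, and the exact rules (which prefixes are skipped, the 3-column minimum, start <= end, a consistent column count) are assumptions drawn from the public BED description rather than from the toolkit.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// True for 1..18 decimal digits (short enough to convert without overflow).
static bool IsSmallNonNegInt(const std::string& tok)
{
    if (tok.empty() || tok.size() > 18) return false;
    for (char c : tok)
        if (c < '0' || c > '9') return false;
    return true;
}

// Hypothetical BED heuristic over already-split sample lines.
bool LooksLikeBed(const std::vector<std::string>& lines)
{
    std::size_t expectedColumns = 0, dataLines = 0;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::vector<std::string> cols;
        for (std::string tok; in >> tok; )              // whitespace-separated columns
            cols.push_back(tok);
        if (cols.empty() || cols[0][0] == '#' || cols[0] == "track" || cols[0] == "browser")
            continue;                                    // not a data line
        if (cols.size() < 3 || !IsSmallNonNegInt(cols[1]) || !IsSmallNonNegInt(cols[2]))
            return false;                                // need chrom, start, end
        if (std::stoull(cols[1]) > std::stoull(cols[2]))
            return false;                                // start must not exceed end
        if (expectedColumns == 0)
            expectedColumns = cols.size();
        else if (cols.size() != expectedColumns)
            return false;                                // column count must be consistent
        ++dataLines;
    }
    return dataLines > 0;                                // comments alone prove nothing
}
```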

◆ TestFormatBed15()

◆ TestFormatBinaryAsn()
bool CFormatGuess::TestFormatBinaryAsn ( EMode ) protected

◆ TestFormatBZip2()

◆ TestFormatCLUSTAL()
bool CFormatGuess::TestFormatCLUSTAL ( void ) protected

◆ TestFormatDistanceMatrix()
bool CFormatGuess::TestFormatDistanceMatrix ( EMode ) protected

◆ TestFormatFasta()

◆ TestFormatFiveColFeatureTable()
bool CFormatGuess::TestFormatFiveColFeatureTable ( EMode ) protected

◆ TestFormatFlatFileEna()
bool CFormatGuess::TestFormatFlatFileEna ( EMode ) protected

◆ TestFormatFlatFileGenbank()
bool CFormatGuess::TestFormatFlatFileGenbank ( EMode ) protected

◆ TestFormatFlatFileSequence()
bool CFormatGuess::TestFormatFlatFileSequence ( EMode ) protected

◆ TestFormatFlatFileUniProt()
bool CFormatGuess::TestFormatFlatFileUniProt ( EMode ) protected

◆ TestFormatGff2()

◆ TestFormatGff3()

◆ TestFormatGlimmer3()
bool CFormatGuess::TestFormatGlimmer3 ( EMode ) protected

◆ TestFormatGtf()

◆ TestFormatGvf()

◆ TestFormatGZip()

◆ TestFormatHgvs()

◆ TestFormatJson()

Definition at line 2809 of file format_guess.cpp.

References NStr::eTrunc_Begin, NStr::IsBlank(), m_iTestDataSize, m_pTestBuffer, NStr::TruncateSpacesInPlace(), x_CheckJsonStart(), x_CheckStripJsonNumbers(), x_CheckStripJsonPunctuation(), x_IsTruncatedJsonKeyword(), x_IsTruncatedJsonNumber(), x_StripJsonKeywords(), and x_StripJsonStrings().

Referenced by x_TestFormat().
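Judging by the helpers it references (x_CheckJsonStart(), x_StripJsonStrings(), x_StripJsonKeywords(), x_CheckStripJsonNumbers(), x_CheckStripJsonPunctuation(), x_IsTruncatedJsonKeyword()), TestFormatJson() appears to strip recognizable JSON tokens and check that nothing else remains, while tolerating a sample cut off mid-token. The sketch below is a hypothetical simplification of that strip-and-check idea; `LooksLikeJson` and `IsAcceptableJsonWord` are invented names, and it deliberately does not validate nesting or number syntax.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if 'word' is a JSON keyword, a number exponent marker, or (when it sits
// at the very end of the sample) a prefix of a keyword cut off by truncation.
static bool IsAcceptableJsonWord(const std::string& word, bool atEnd)
{
    if (word == "e" || word == "E") return true;
    static const char* const kKeywords[] = {"true", "false", "null"};
    for (const char* kw : kKeywords) {
        const std::string k(kw);
        if (word == k) return true;
        if (atEnd && word.size() < k.size() && k.compare(0, word.size(), word) == 0) return true;
    }
    return false;
}

bool LooksLikeJson(const std::string& sample)
{
    std::size_t i = 0;
    while (i < sample.size() && std::isspace(static_cast<unsigned char>(sample[i]))) ++i;
    if (i == sample.size() || (sample[i] != '{' && sample[i] != '[')) return false;

    // Pass 1: replace each string literal with a space; escapes are honored and
    // an unterminated literal at the end (a truncated sample) is tolerated.
    std::string rest;
    bool inString = false, escaped = false;
    for (; i < sample.size(); ++i) {
        const char c = sample[i];
        if (inString) {
            if (escaped)        escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"')  inString = false;
        } else if (c == '"') {
            inString = true;
            rest += ' ';
        } else {
            rest += c;
        }
    }

    // Pass 2: what remains may only be punctuation, number characters,
    // whitespace, and keywords.
    const std::string allowed = "{}[]:,-+.0123456789";
    for (std::size_t j = 0; j < rest.size(); ) {
        const unsigned char c = static_cast<unsigned char>(rest[j]);
        if (std::isalpha(c)) {
            std::size_t k = j;
            while (k < rest.size() && std::isalpha(static_cast<unsigned char>(rest[k]))) ++k;
            if (!IsAcceptableJsonWord(rest.substr(j, k - j), k == rest.size())) return false;
            j = k;
        } else if (std::isspace(c) || allowed.find(static_cast<char>(c)) != std::string::npos) {
            ++j;
        } else {
            return false;
        }
    }
    return true;
}
```

Removing strings first is what makes the later character check meaningful: arbitrary text is only legal inside quotes.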

◆ TestFormatLzo()

◆ TestFormatNewick()
bool CFormatGuess::TestFormatNewick ( EMode ) protected

Definition at line 1062 of file format_guess.cpp.

References EnsureSplitLines(), EnsureTestBuffer(), NStr::FindNoCase(), i, IsSampleNewick(), ITERATE, m_iTestDataSize, m_pTestBuffer, m_Stream, m_TestLines, NPOS, read_size(), and CStreamUtils::Stepback().

Referenced by GuessFormat(), and x_TestFormat().
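TestFormatNewick() delegates the structural check to IsSampleNewick(). The sketch below is a hypothetical stand-in for such a check, not the toolkit's implementation: `LooksLikeNewick` requires a leading '(', balanced parentheses, and ';' only at depth 0, skips [comments] and 'quoted labels', and accepts a tree that is still open when the sample ends, on the assumption that the sample was truncated.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

bool LooksLikeNewick(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
    if (i == s.size() || s[i] != '(') return false;

    int depth = 0;
    bool sawCompleteTree = false;
    for (; i < s.size(); ++i) {
        const char c = s[i];
        if (c == '[' || c == '\'') {
            // Skip a [comment] or a 'quoted label' in one step.
            const std::size_t close = s.find(c == '[' ? ']' : '\'', i + 1);
            if (close == std::string::npos) break;     // sample ends inside it
            i = close;
        } else if (c == '(') {
            ++depth;
        } else if (c == ')') {
            if (--depth < 0) return false;             // unbalanced ')'
        } else if (c == ';') {
            if (depth != 0) return false;              // ';' only ends a whole tree
            sawCompleteTree = true;
        }
    }
    // Accept a finished tree, or a tree still open when the sample ran out.
    return sawCompleteTree || depth > 0;
}
```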

◆ TestFormatPhrapAce()
bool CFormatGuess::TestFormatPhrapAce ( EMode ) protected

◆ TestFormatPsl()
bool CFormatGuess::TestFormatPsl ( EMode mode ) protected

◆ TestFormatRepeatMasker()
bool CFormatGuess::TestFormatRepeatMasker ( EMode ) protected

◆ TestFormatSnpMarkers()
bool CFormatGuess::TestFormatSnpMarkers ( EMode ) protected

◆ TestFormatSra()

◆ TestFormatTable()

◆ TestFormatTaxplot()
bool CFormatGuess::TestFormatTaxplot ( EMode ) protected

◆ TestFormatTextAsn()
bool CFormatGuess::TestFormatTextAsn ( EMode ) protected

◆ TestFormatVcf()

◆ TestFormatWiggle()
bool CFormatGuess::TestFormatWiggle ( EMode ) protected

◆ TestFormatXml()

Definition at line 1278 of file format_guess.cpp.

References ArraySize(), NStr::eCase, NStr::eNocase, EnsureTestBuffer(), NStr::eTrunc_Begin, i, input(), m_iTestDataSize, m_pTestBuffer, NStr::StartsWith(), and NStr::TruncateSpacesInPlace().

Referenced by x_TestFormat().
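TestFormatXml() trims leading whitespace and then compares the start of the buffer against a list of prefixes, using both case-sensitive (NStr::eCase) and case-insensitive (NStr::eNocase) matching. The sketch below illustrates that split; `LooksLikeXml` and `HasPrefixNoCase` are hypothetical, and the list of known opening tags is illustrative only, not the toolkit's list.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive test that 's' contains 'prefix' starting at 'pos'.
static bool HasPrefixNoCase(const std::string& s, std::size_t pos, const std::string& prefix)
{
    if (pos > s.size() || s.size() - pos < prefix.size()) return false;
    for (std::size_t k = 0; k < prefix.size(); ++k)
        if (std::tolower(static_cast<unsigned char>(s[pos + k])) !=
            std::tolower(static_cast<unsigned char>(prefix[k])))
            return false;
    return true;
}

// Hypothetical XML sniffing: skip leading whitespace, accept an XML
// declaration regardless of case, otherwise accept only a short list of
// known opening tags compared case-sensitively (the list is illustrative).
bool LooksLikeXml(const std::string& sample)
{
    std::size_t i = 0;
    while (i < sample.size() && std::isspace(static_cast<unsigned char>(sample[i]))) ++i;
    if (HasPrefixNoCase(sample, i, "<?xml")) return true;
    static const char* const kKnownStarts[] = {"<!DOCTYPE", "<Seq-entry>", "<Bioseq-set>"};
    for (const char* start : kKnownStarts) {
        const std::string tag(start);
        if (sample.compare(i, tag.size(), tag) == 0) return true;
    }
    return false;
}
```

Matching specific opening tags, rather than any '<', keeps this test from claiming other formats whose first character happens to be an angle bracket.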

◆ TestFormatZip()

◆ TestFormatZstd()

◆ x_CheckJsonStart()

◆ x_CheckStripJsonNumbers()
bool CFormatGuess::x_CheckStripJsonNumbers ( string & testString ) const private

◆ x_CheckStripJsonPunctuation()
bool CFormatGuess::x_CheckStripJsonPunctuation ( string & testString ) const private

◆ x_FindJsonStringLimits()
void CFormatGuess::x_FindJsonStringLimits ( const string & testString, list< size_t > & limits ) const private

◆ x_FindNextJsonStringStop()

◆ x_IsBlankOrNumbers()

◆ x_IsNumber()

◆ x_IsTruncatedJsonKeyword()
bool CFormatGuess::x_IsTruncatedJsonKeyword ( const string & testString ) const private

◆ x_IsTruncatedJsonNumber()
bool CFormatGuess::x_IsTruncatedJsonNumber ( const string & testString ) const private

◆ x_LooksLikeCLUSTALConservedInfo()
bool CFormatGuess::x_LooksLikeCLUSTALConservedInfo ( const string & line ) const private

◆ x_StripJsonKeywords()
void CFormatGuess::x_StripJsonKeywords ( string & testString ) const private

◆ x_StripJsonPunctuation()
size_t CFormatGuess::x_StripJsonPunctuation ( string & testString ) const private

◆ x_StripJsonStrings()
void CFormatGuess::x_StripJsonStrings ( string & testString ) const private

◆ x_TestFormat()

Definition at line 513 of file format_guess.cpp.

References eAgp, eAlignment, eBam, eBed, eBed15, eBinaryASN, eBZip2, eDistanceMatrix, eFasta, eFiveColFeatureTable, eFlatFileEna, eFlatFileGenbank, eFlatFileSequence, eFlatFileUniProt, eGff2, eGff3, eGffAugustus, eGlimmer3, eGtf, eGvf, eGZip, eHgvs, eJSON, eLzo, eNewick, ePhrapAce, ePsl, eRmo, eSnpMarkers, eSra, eTable, eTaxplot, eTextASN, eUCSCRegion, eVcf, eWiggle, eXml, eZip, eZstd, format, CFormatGuess::CFormatHints::IsDisabled(), m_Hints, NCBI_THROW, NStr::NumericToString(), TestFormatAgp(), TestFormatAlignment(), TestFormatAugustus(), TestFormatBam(), TestFormatBed(), TestFormatBed15(), TestFormatBinaryAsn(), TestFormatBZip2(), TestFormatDistanceMatrix(), TestFormatFasta(), TestFormatFiveColFeatureTable(), TestFormatFlatFileEna(), TestFormatFlatFileGenbank(), TestFormatFlatFileSequence(), TestFormatFlatFileUniProt(), TestFormatGff2(), TestFormatGff3(), TestFormatGlimmer3(), TestFormatGtf(), TestFormatGvf(), TestFormatGZip(), TestFormatHgvs(), TestFormatJson(), TestFormatLzo(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatPsl(), TestFormatRepeatMasker(), TestFormatSnpMarkers(), TestFormatSra(), TestFormatTable(), TestFormatTaxplot(), TestFormatTextAsn(), TestFormatVcf(), TestFormatWiggle(), TestFormatXml(), TestFormatZip(), and TestFormatZstd().

Referenced by GuessFormat(), and TestFormat().

◆ x_TestInput()

◆ x_TestTableDelimiter()

◆ x_TryProcessCLUSTALSeqData()
bool CFormatGuess::x_TryProcessCLUSTALSeqData ( const string & line, string & id, size_t & seg_length ) const private

◆ m_bOwnsStream
bool CFormatGuess::m_bOwnsStream protected

◆ m_bSplitDone
bool CFormatGuess::m_bSplitDone protected

◆ m_bStatsAreValid
bool CFormatGuess::m_bStatsAreValid protected

◆ m_Hints

◆ m_iStatsCountAaChars
unsigned int CFormatGuess::m_iStatsCountAaChars protected

◆ m_iStatsCountAlNumChars
unsigned int CFormatGuess::m_iStatsCountAlNumChars protected

◆ m_iStatsCountBraces
unsigned int CFormatGuess::m_iStatsCountBraces protected

◆ m_iStatsCountData
unsigned int CFormatGuess::m_iStatsCountData protected

◆ m_iStatsCountDnaChars
unsigned int CFormatGuess::m_iStatsCountDnaChars protected

◆ m_iTestBufferSize
streamsize CFormatGuess::m_iTestBufferSize protected

◆ m_iTestDataSize
streamsize CFormatGuess::m_iTestDataSize protected

Definition at line 388 of file format_guess.hpp.

Referenced by EnsureSplitLines(), EnsureStats(), EnsureTestBuffer(), IsAsciiText(), TestFormatBinaryAsn(), TestFormatBZip2(), TestFormatCLUSTAL(), TestFormatFasta(), TestFormatGZip(), TestFormatHgvs(), TestFormatJson(), TestFormatLzo(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatSra(), TestFormatTextAsn(), TestFormatXml(), TestFormatZip(), TestFormatZstd(), and x_TestTableDelimiter().

◆ m_pTestBuffer
char* CFormatGuess::m_pTestBuffer protected

Definition at line 386 of file format_guess.hpp.

Referenced by EnsureSplitLines(), EnsureStats(), EnsureTestBuffer(), Initialize(), IsAsciiText(), TestFormatBinaryAsn(), TestFormatBZip2(), TestFormatCLUSTAL(), TestFormatFasta(), TestFormatGZip(), TestFormatHgvs(), TestFormatJson(), TestFormatLzo(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatSra(), TestFormatTextAsn(), TestFormatXml(), TestFormatZip(), TestFormatZstd(), and ~CFormatGuess().

◆ m_Stream

◆ m_TestLines

Definition at line 397 of file format_guess.hpp.

Referenced by EnsureSplitLines(), IsAllComment(), IsInputRepeatMaskerWithHeader(), IsInputRepeatMaskerWithoutHeader(), TestFormatAgp(), TestFormatAlignment(), TestFormatAugustus(), TestFormatBed(), TestFormatBed15(), TestFormatDistanceMatrix(), TestFormatFiveColFeatureTable(), TestFormatFlatFileEna(), TestFormatFlatFileGenbank(), TestFormatFlatFileSequence(), TestFormatFlatFileUniProt(), TestFormatGff2(), TestFormatGff3(), TestFormatGlimmer3(), TestFormatGtf(), TestFormatGvf(), TestFormatHgvs(), TestFormatNewick(), TestFormatPhrapAce(), TestFormatPsl(), TestFormatSnpMarkers(), TestFormatVcf(), TestFormatWiggle(), and x_TestTableDelimiter().

The documentation for this class was generated from the following files:

format_guess.hpp
format_guess.cpp

