CSeqDBIsam
#include <objtools/blast/seqdb_reader/impl/seqdbisam.hpp>
Manages one ISAM file, which translates PIGs, GIs, or accessions to OIDs. Translation in the other direction is done in the CSeqDBVol code. Files managed by this class include those with the extensions pni, pnd, ppi, ppd, psi, psd, nsi, nsd, nni, and nnd. Each instance of this object manages one pair of these files: an index file, whose extension ends in 'i', and a data file, whose extension ends in 'd'.
Definition at line 127 of file seqdbisam.hpp.
◆ TGiOid
Import the type representing one GI, OID association.
Definition at line 130 of file seqdbisam.hpp.
◆ TId
Type large enough to hold any numerical ID.
Definition at line 158 of file seqdbisam.hpp.
◆ TIndx
Type which is large enough to span the bytes of an ISAM file.
Definition at line 143 of file seqdbisam.hpp.
◆ TOid
This class works with OIDs relative to a specific volume.
Definition at line 146 of file seqdbisam.hpp.
◆ TTi
PIG identifiers: numeric indices over protein volumes.
GIs (Genomic IDs): the most common numerical identifier.
TTi: identifier type for trace databases.
Definition at line 155 of file seqdbisam.hpp.
◆ EErrorCode
Exit conditions occurring in this code.
Enumerator
eNotFound: The key was not found.
eNoError: Lookup was successful.
eBadVersion: The format version of the ISAM file is unsupported.
eBadType: The requested ISAM type did not match the file.
eWrongFile: The file was not found, or was the wrong length.
eInitFailed
Definition at line 489 of file seqdbisam.hpp.
◆ EIsamDbType
Types of database this class can access.
Enumerator
eNumeric: Numeric database with Key/Value pairs in the index file.
eNumericNoData: This type is not supported.
eString: String database type used here.
eStringDatabase: This type is not supported.
eStringBin: This type is not supported.
eNumericLongId
Definition at line 133 of file seqdbisam.hpp.
◆ CSeqDBIsam()
Constructor.
An ISAM file object corresponds to an index file and a data file, and converts identifiers (string, GI, or PIG) into OIDs relative to a particular database volume.
Definition at line 1102 of file seqdbisam.cpp.
References CSeqDBFileMemMap::Clear(), dbname(), DEFAULT_NISAM_SIZE, DEFAULT_SISAM_SIZE, eGiId, eHashId, eNoError, eNumeric, ePigId, eString, eStringId, eTiId, CSeqDBFileMemMap::Init(), m_DataFname, m_DataLease, m_IndexFname, m_IndexLease, m_Initialized, m_PageSize, m_Type, msg(), NCBI_THROW, x_FindIndexBounds(), x_InitSearch(), and x_MakeFilenames().
◆ ~CSeqDBIsam()
CSeqDBIsam::~CSeqDBIsam()
Destructor.
Releases all resources associated with this object.
Definition at line 1211 of file seqdbisam.cpp.
References UnLease().
◆ GetIdBounds() [1/2]
void CSeqDBIsam::GetIdBounds(Int8 & low_id, Int8 & high_id, int & count)
Get numeric bounds.
Fetch the lowest, highest, and total number of numeric keys in the database index. If the operation fails, zero will be returned for count.
Definition at line 1625 of file seqdbisam.cpp.
References count, CSeqDBIsam::SIsamKey::GetNumeric(), CSeqDBIsam::SIsamKey::IsSet(), m_FirstKey, m_Initialized, m_LastKey, and m_NumTerms.
Referenced by CSeqDBVol::GetGiBounds(), CSeqDBVol::GetPigBounds(), and CSeqDBVol::GetStringBounds().
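To illustrate the contract described above, the sketch below models the index keys as an ascending std::vector and reports the first key, the last key, and the key count, leaving count at zero on failure. NumericBounds and GetIdBoundsSketch are hypothetical names; judging by its references, the real method reads the cached m_FirstKey, m_LastKey, and m_NumTerms members rather than scanning keys.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type mirroring the three output parameters.
struct NumericBounds {
    std::int64_t low_id = 0;
    std::int64_t high_id = 0;
    int count = 0;  // stays 0 when the bounds cannot be determined
};

// sorted_keys is assumed to hold the index keys in ascending order,
// which is what makes front()/back() the lowest and highest keys.
NumericBounds GetIdBoundsSketch(const std::vector<std::int64_t>& sorted_keys,
                                bool initialized)
{
    NumericBounds b;
    if (!initialized || sorted_keys.empty()) {
        return b;  // failure is signalled by count == 0
    }
    b.low_id = sorted_keys.front();
    b.high_id = sorted_keys.back();
    b.count = static_cast<int>(sorted_keys.size());
    return b;
}
```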
◆ GetIdBounds() [2/2]
void CSeqDBIsam::GetIdBounds(string & low_id, string & high_id, int & count)
◆ HashToOids()
void CSeqDBIsam::HashToOids(unsigned hash, vector< TOid > & oids)
Sequence hash lookup.
This method tries to find sequences associated with a given sequence hash value. The provided value is numeric, but the ISAM file uses a string format because string searches can return multiple results per key, and there may be multiple OIDs for a given hash value due to identical sequences and hash collisions.
Definition at line 1667 of file seqdbisam.cpp.
References _ASSERT, eHashId, eNoError, eNotFound, ITERATE, key, m_IdentType, m_Initialized, NStr::UIntToString(), and x_StringSearch().
Referenced by CSeqDBVol::HashToOids().
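A minimal sketch of the hash lookup idea, assuming a std::multimap stands in for the string ISAM file: the hash is rendered as a decimal string (the role NStr::UIntToString() plays in the real code) and every OID stored under that key is returned. StringIsam and HashToOidsSketch are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory string ISAM: one key may map to several OIDs
// (identical sequences or hash collisions), so a multimap is used.
using StringIsam = std::multimap<std::string, int>;

std::vector<int> HashToOidsSketch(const StringIsam& isam, unsigned hash)
{
    std::vector<int> oids;
    // The numeric hash is searched as a string key.
    auto range = isam.equal_range(std::to_string(hash));
    for (auto it = range.first; it != range.second; ++it) {
        oids.push_back(it->second);
    }
    return oids;
}
```

Using a string key is what lets one hash value carry several OIDs, matching the motivation given in the description.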
◆ IdsToOids() [1/2]
Translate GIs and TIs to OIDs for the given ID list.
This method iterates over a vector of Gi/OID and/or Ti/OID pairs. For each pair where the OID is -1, the GI or TI will be looked up in the ISAM file, and (if found) the correct OID will be stored (otherwise the -1 will remain). This method will normally be called once for each volume.
Definition at line 1388 of file seqdbisam.cpp.
References eGiId, ePigId, eStringId, eTiId, m_IdentType, and NCBI_THROW.
Referenced by CSeqDBVol::IdsToOids().
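The translation loop can be sketched as follows, assuming one volume's ISAM contents are available as a vector of (ID, OID) pairs sorted by ID. Entries that already carry an OID are skipped, which is what makes one call per volume safe. The per-ID binary search is a choice made for this illustration, and the sketch does not model any conversion between volume-relative and database-wide OIDs; IdOid and IdsToOidsSketch are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model: (id, oid) pairs from one volume, sorted by id.
using IdOid = std::pair<std::int64_t, int>;

void IdsToOidsSketch(const std::vector<IdOid>& volume_isam,
                     std::vector<IdOid>& id_list)
{
    for (IdOid& entry : id_list) {
        if (entry.second != -1) {
            continue;  // already translated, e.g. by an earlier volume
        }
        auto it = std::lower_bound(
            volume_isam.begin(), volume_isam.end(), entry.first,
            [](const IdOid& e, std::int64_t id) { return e.first < id; });
        if (it != volume_isam.end() && it->first == entry.first) {
            entry.second = it->second;
        }
        // Otherwise the -1 remains: the ID is not in this volume.
    }
}
```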
◆ IdsToOids() [2/2]
Compute list of included OIDs based on a negative ID list.
This method iterates over a vector of GIs or TIs, along with the corresponding ISAM file for this volume. Each OID found in the ISAM file is marked in the negative ID list. If the OID has a GI or TI that is not mentioned in the negative ID list, the OID is marked as an 'included' OID (that OID will be searched). OIDs for IDs that are found in the negative ID list are marked as 'visible' OIDs. When this process is done for all volumes, the SeqDB object will use all OIDs that are either marked as 'included' or NOT marked as 'visible'. The 'visible' list is needed because otherwise iteration would skip OIDs that do not have GIs or TIs (whichever is being iterated). To use this method, this volume must have an ISAM file matching the negative ID list's identifier type, or an exception will be thrown.
Definition at line 1421 of file seqdbisam.cpp.
References _ASSERT, eGiId, eStringId, eTiId, CSeqDBNegativeList::GetNumGis(), CSeqDBNegativeList::GetNumSis(), CSeqDBNegativeList::GetNumTis(), CSeqDBNegativeList::InsureOrder(), m_IdentType, x_SearchNegativeMulti(), and x_SearchNegativeMultiSeq().
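The included/visible bookkeeping can be modelled with plain sets. This sketch assumes an OID is marked 'visible' when one of its IDs appears in the negative list and 'included' when one of its IDs does not; the final rule (included, or never visible) is taken from the description. NegativeListSketch, ProcessVolume, and IsSearched are hypothetical names, not the CSeqDBNegativeList API.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the negative-list bookkeeping.
struct NegativeListSketch {
    std::set<std::int64_t> excluded_ids;  // the negative ID list
    std::set<int> included;               // OIDs with a non-excluded ID
    std::set<int> visible;                // OIDs with an excluded ID
};

// Process one volume's (id, oid) pairs from its ISAM file.
void ProcessVolume(const std::vector<std::pair<std::int64_t, int>>& isam,
                   NegativeListSketch& neg)
{
    for (const auto& entry : isam) {
        if (neg.excluded_ids.count(entry.first) != 0) {
            neg.visible.insert(entry.second);   // seen, but this ID is excluded
        } else {
            neg.included.insert(entry.second);  // an ID survives: search the OID
        }
    }
}

// Final rule after all volumes: included, or never marked visible.
bool IsSearched(const NegativeListSketch& neg, int oid)
{
    return neg.included.count(oid) != 0 || neg.visible.count(oid) == 0;
}
```

With excluded IDs {100, 200}, an OID whose only ID is 100 is dropped, an OID carrying both 200 and 300 is kept, and an OID with no ID in the ISAM file is kept because it was never marked visible.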
◆ IdToOid()
◆ IndexExists()
bool CSeqDBIsam::IndexExists(const string & dbname, char prot_nucl, char file_ext_char) [static]
◆ PigToOid()
PIG translation.
A PIG identifier is translated to an OID. PIG identifiers are used exclusively for protein sequences. One PIG corresponds to exactly one sequence of amino acids, and vice versa. They are also stable: the sequence a PIG points to will never change.
Definition at line 203 of file seqdbisam.hpp.
References _ASSERT, ePigId, m_IdentType, and x_IdentToOid().
Referenced by CSeqDBVol::PigToOid(), and CSeqDBVol::x_StringToOids().
◆ SeqidToOid()
Seq-id translation.
A Seq-id identifier (serialized to a string) is translated into an OID. This routine will attempt to simplify the seqid so as to use the faster numeric lookup techniques whenever possible.
◆ StringToOids()
String translation.
A string ID is translated to one or more OIDs. String IDs are used by some groups which produce sequence data. In some cases, the string may correspond to more than one OID; for this reason, the OIDs are returned in a vector. The string provided is looked up in several ways. If it contains a pipe character ("|"), the data will be interpreted as a Seq-id. This routine can use faster lookup mechanisms if the simplification routines were able to recognize the identifier as one of several types that have numerical indices. The version_check flag is needed to support sparse indexing. If version_check is true, the string has a version, and the lookup fails, this method will try to remove the version and search again. On return from this method, version_check will be set to true if and only if the first search failed and the versionless search succeeded. CSeqDBVol::x_CheckVersions() can then be called to verify the OIDs; see that method for more information about this scenario.
Definition at line 1236 of file seqdbisam.cpp.
References _ASSERT, CSeq_id::AsFastaString(), eNoError, eNotFound, eStringId, CSeq_id::fParse_AnyLocal, CSeq_id::fParse_RawText, isdigit(), ITERATE, m_IdentType, m_Initialized, size, and x_StringSearch().
Referenced by CSeqDBVol::x_StringToOids().
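The version_check retry can be sketched as below, assuming a string ISAM modelled as a std::multimap and an accession version written as a trailing '.' followed by digits. The Seq-id parsing and numeric fast paths of the real method are omitted; LookupAll and StringToOidsSketch are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory string ISAM: one key may map to several OIDs.
using StringIsam = std::multimap<std::string, int>;

static std::vector<int> LookupAll(const StringIsam& isam, const std::string& key)
{
    std::vector<int> oids;
    auto range = isam.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
        oids.push_back(it->second);
    }
    return oids;
}

// On entry, version_check says whether a versionless retry is allowed.
// On return, it is true only if the first lookup failed and the retry
// without the ".N" suffix succeeded.
std::vector<int> StringToOidsSketch(const StringIsam& isam,
                                    const std::string& acc,
                                    bool& version_check)
{
    const bool may_retry = version_check;
    version_check = false;

    std::vector<int> oids = LookupAll(isam, acc);
    if (!oids.empty() || !may_retry) {
        return oids;
    }
    // Only strip a trailing ".<digits>" version.
    const std::string::size_type dot = acc.rfind('.');
    if (dot == std::string::npos || dot + 1 == acc.size()) {
        return oids;
    }
    for (std::string::size_type i = dot + 1; i < acc.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(acc[i]))) {
            return oids;
        }
    }
    oids = LookupAll(isam, acc.substr(0, dot));
    version_check = !oids.empty();
    return oids;
}
```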
◆ UnLease()
void CSeqDBIsam::UnLease()
◆ x_DiffChar()
Find the first character to differ in two strings.
This finds the index of the first character to differ in a meaningful way between two strings. One of the strings is a term that is passed in; the other is a range of memory represented by two pointers.
Definition at line 589 of file seqdbisam.cpp.
References ch1, ch2, ENDS_ISAM_KEY(), i, int, result, s_SeqDBIsam_NullifyEOLs(), and toupper().
Referenced by x_DiffCharLease(), x_ExtractAllData(), and x_ExtractPageData().
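A sketch of the kind of comparison x_DiffChar() performs, under two assumptions: matching is case-insensitive (the method references toupper()), and the stored key ends at a terminator byte. EndsKeySketch is a hypothetical stand-in for ENDS_ISAM_KEY() whose character set is a guess, and returning -1 for a complete match is a convention chosen for this sketch only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for ENDS_ISAM_KEY(); the terminator set is assumed.
static bool EndsKeySketch(char c)
{
    return c == '\0' || c == '\n' || c == '\r' || c == '\x02';
}

// Index of the first case-insensitive difference between `term` and the
// key stored in [begin, end), or -1 when the two keys match completely.
int DiffCharSketch(const std::string& term, const char* begin, const char* end)
{
    const std::size_t avail = static_cast<std::size_t>(end - begin);
    std::size_t i = 0;
    for (; i < term.size(); ++i) {
        if (i >= avail || EndsKeySketch(begin[i])) {
            return static_cast<int>(i);  // the stored key ended first
        }
        if (std::toupper(static_cast<unsigned char>(term[i])) !=
            std::toupper(static_cast<unsigned char>(begin[i]))) {
            return static_cast<int>(i);
        }
    }
    // The term is exhausted: it matches only if the stored key ends here too.
    if (i >= avail || EndsKeySketch(begin[i])) {
        return -1;
    }
    return static_cast<int>(i);
}
```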
◆ x_DiffCharLease()
Find the first character to differ in two strings.
This finds the index of the first character to differ in a meaningful way between two strings. One of the strings is a term that is passed in; the other is assumed to be located in the ISAM table, a lease to which is passed to this function.
Definition at line 516 of file seqdbisam.cpp.
References file_name, CSeqDBFileMemMap::GetFileDataPtr(), int, result, and x_DiffChar().
Referenced by x_DiffSample().
◆ x_DiffSample()
Find the first character to differ in two strings.
This finds the index of the first character to differ between two strings. The first string is provided, the second is one of the sample strings, indicated by the index of that sample value.
Definition at line 863 of file seqdbisam.cpp.
References CSeqDBFileMemMap::GetFileDataPtr(), m_IndexFileLength, m_IndexFname, m_IndexLease, m_KeySampleOffset, m_MaxLineSize, m_NumSamples, m_PageSize, MEMORY_ONLY_PAGE_SIZE, SeqDB_GetStdOrd(), and x_DiffCharLease().
Referenced by x_StringSearch().
◆ x_ExtractAllData()
void CSeqDBIsam::x_ExtractAllData(const string & term_in, TIndx sample_index, vector< TIndx > & indices_out, vector< string > & keys_out, vector< string > & data_out) [private]
Find matches in the given page of a string ISAM file.
This searches the area around a specific page of the data file to find all matches to term_in. The results are returned in vectors. This method may search multiple pages.
Definition at line 688 of file seqdbisam.cpp.
References m_NumSamples, m_PageSize, s_SeqDBIsam_NullifyEOLs(), x_DiffChar(), x_ExtractPageData(), and x_LoadPage().
Referenced by x_StringSearch().
◆ x_ExtractData()
void CSeqDBIsam::x_ExtractData(const char * key_start, const char * entry_end, vector< string > & key_out, vector< string > & data_out) [private]
Extract the data from a key-value pair in memory.
Given pointers to a location in mapped memory, and the end of the mapped data, this finds the key and data values for the object at that location.
Definition at line 793 of file seqdbisam.cpp.
References ISAM_DATA_CHAR, and s_SeqDBIsam_NullifyEOLs().
Referenced by x_ExtractPageData(), and x_FindIndexBounds().
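Extracting a key/data pair can be illustrated as splitting a record at a separator byte. kDataCharSketch is a hypothetical stand-in for ISAM_DATA_CHAR with an assumed value, and line-ending bytes are assumed to terminate the record; ExtractDataSketch is likewise a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical separator; the value of the real ISAM_DATA_CHAR is assumed.
const char kDataCharSketch = '\x02';

static bool IsEndOfRecord(char c)
{
    return c == '\n' || c == '\r' || c == '\0';
}

// Split the record starting at key_start (ending no later than entry_end)
// into a key and a data string, appending both to the outputs.
void ExtractDataSketch(const char* key_start, const char* entry_end,
                       std::vector<std::string>& key_out,
                       std::vector<std::string>& data_out)
{
    const char* p = key_start;
    while (p < entry_end && *p != kDataCharSketch && !IsEndOfRecord(*p)) {
        ++p;
    }
    key_out.emplace_back(key_start, p);

    std::string data;
    if (p < entry_end && *p == kDataCharSketch) {
        const char* data_start = ++p;  // skip the separator
        while (p < entry_end && !IsEndOfRecord(*p)) {
            ++p;
        }
        data.assign(data_start, p);
    }
    data_out.push_back(data);
}
```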
◆ x_ExtractPageData()
Find matches in the given memory area of a string ISAM file.
This searches the specified section of memory to find all matches to term_in. The results are returned in vectors.
Definition at line 634 of file seqdbisam.cpp.
References s_SeqDBIsam_NullifyEOLs(), x_DiffChar(), and x_ExtractData().
Referenced by x_ExtractAllData(), and x_StringSearch().
◆ x_FindIndexBounds()
void CSeqDBIsam::x_FindIndexBounds() [private]
Find the least and greatest keys in this ISAM file.
Definition at line 1461 of file seqdbisam.cpp.
References _ASSERT, eNumeric, m_FirstKey, m_LastKey, m_NumSamples, m_Type, s_SeqDBIsam_NullifyEOLs(), CSeqDBIsam::SIsamKey::SetNumeric(), CSeqDBIsam::SIsamKey::SetString(), x_ExtractData(), x_GetDataElement(), x_LoadPage(), x_Lower(), and x_MapDataPage().
Referenced by CSeqDBIsam().
◆ x_FindInNegativeList() [1/2]
Find ID in the negative GI list using PBS.
Use parabolic binary search to find the specified ID in the negative ID list. The 'index' value is the index to start the search at (this must refer to an index at or before the target data if the search is to succeed). Whether the search was successful or not, the index will be moved forward past any elements with values less than 'key'.
Definition at line 1428 of file seqdbisam.hpp.
References key, CSeqDBNegativeList::ListSize(), and x_GetId().
Referenced by x_SearchNegativeMulti(), and x_SearchNegativeMultiSeq().
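The forward-only search can be sketched as a galloping (exponential) search followed by a binary search; this is one reading of "parabolic binary search", not a transcription of the toolkit code. Because index only moves forward, successive lookups with increasing keys walk the sorted negative list once. FindInSortedListSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Advances index past every element < key and reports whether key is
// present at the new index. The list must be sorted ascending.
bool FindInSortedListSketch(const std::vector<std::int64_t>& list,
                            std::size_t& index, std::int64_t key)
{
    const std::size_t n = list.size();
    if (index >= n || list[index] >= key) {
        return index < n && list[index] == key;
    }
    // Gallop: invariant list[lo] < key; step size doubles each time.
    std::size_t lo = index;
    std::size_t step = 1;
    std::size_t hi = lo + step;
    while (hi < n && list[hi] < key) {
        lo = hi;
        step *= 2;
        hi = lo + step;
    }
    if (hi > n) {
        hi = n;
    }
    // Binary search in (lo, hi]: hi is n or the first element >= key.
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (list[mid] < key) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    index = hi;
    return hi < n && list[hi] == key;
}
```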
◆ x_FindInNegativeList() [2/2]
◆ x_GetDataElement() [1/2]
void CSeqDBIsam::x_GetDataElement(const void * dpage, int index, Int8 & key, int & data) [inline, private]
◆ x_GetDataElement() [2/2]
void CSeqDBIsam::x_GetDataElement(const void * dpage, int index, string & key, int & data) [inline, private]
◆ x_GetId() [1/2]
◆ x_GetId() [2/2]
◆ x_GetIndexKeyOffset()
Get the offset of the specified sample.
For string ISAM indices, the index file contains a table of offsets of the index file samples. This function gets the offset of the specified sample in the index file's table.
Definition at line 823 of file seqdbisam.cpp.
References CSeqDBFileMemMap::GetFileDataPtr(), m_IndexLease, and SeqDB_GetStdOrd().
Referenced by x_StringSearch().
◆ x_GetIndexString()
void CSeqDBIsam::x_GetIndexString(TIndx key_offset, int length, string & prefix, bool trim_to_null) [private]
Read a string from the index file.
Given an offset into the index file, and a maximum length, this function returns the bytes in a string object.
Definition at line 836 of file seqdbisam.cpp.
References CSeqDBFileMemMap::GetFileDataPtr(), i, m_IndexLease, and str().
Referenced by x_StringSearch().
◆ x_GetNumericData()
int CSeqDBIsam::x_GetNumericData(const void * p) [inline, private]
◆ x_GetNumericKey()
Uint8 CSeqDBIsam::x_GetNumericKey(const void * p) [inline, private]
◆ x_GetNumericSample()
Get a sample key value from a numeric index.
Given the index of a sample value, this code will get the key. If data values are stored in the index file, the corresponding data value will also be returned. The offset of the data block is computed and returned as well.
Definition at line 1315 of file seqdbisam.hpp.
References CSeqDBFileMemMap::GetFileDataPtr(), m_KeySampleOffset, m_TermSize, x_GetNumericData(), and x_GetNumericKey().
◆ x_GetPageNumElements()
Int4 CSeqDBIsam::x_GetPageNumElements(Int4 SampleNum, Int4 * Start) [private]
Determine the number of elements in the data page.
The number of elements is determined based on whether this is the last page and the configured page size.
Definition at line 123 of file seqdbisam.cpp.
References m_NumSamples, m_NumTerms, and m_PageSize.
Referenced by x_MapDataPage(), and x_SearchDataNumeric().
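The page-size rule can be written as a small free function. Its parameters mirror m_NumSamples, m_NumTerms, and m_PageSize; the assumption that page n starts at element n * page_size is made for this sketch, and PageNumElementsSketch is a hypothetical name.

```cpp
#include <cassert>

// Every page holds page_size elements except the last one, which holds
// whatever remains of num_terms. *start receives the first element index.
int PageNumElementsSketch(int sample_num, int num_samples, int num_terms,
                          int page_size, int* start)
{
    *start = sample_num * page_size;
    if (sample_num + 1 == num_samples) {
        return num_terms - *start;  // last, possibly partial, page
    }
    return page_size;
}
```

For example, 250 terms in pages of 100 give pages of 100, 100, and 50 elements.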
◆ x_IdentToOid()
Numeric identifier lookup.
Given a numeric identifier, this routine finds the OID.
Definition at line 1222 of file seqdbisam.cpp.
References eNoError, and x_NumericSearch().
Referenced by IdToOid(), and PigToOid().
◆ x_InitSearch()
Initialize the search object.
The first identifier search sets up the object by calling this function, which reads the metadata from the index file and sets all the fields needed for ISAM lookups.
Definition at line 59 of file seqdbisam.cpp.
References eBadType, eBadVersion, eNoError, eNumeric, eNumericLongId, eWrongFile, CSeqDBFileMemMap::GetFileDataPtr(), CSeqDBAtlas::GetFileSizeL(), ISAM_VERSION, m_Atlas, m_DataFileLength, m_DataFname, m_IdxOption, m_IndexFileLength, m_IndexFname, m_IndexLease, m_Initialized, m_KeySampleOffset, m_LongId, m_MaxLineSize, m_NumSamples, m_NumTerms, m_PageSize, m_TermSize, m_Type, MEMORY_ONLY_PAGE_SIZE, and SeqDB_GetStdOrd().
Referenced by CSeqDBIsam().
◆ x_LoadData() [1/3]
◆ x_LoadData() [2/3]
◆ x_LoadData() [3/3]
◆ x_LoadIndex() [1/3]
◆ x_LoadIndex() [2/3]
◆ x_LoadIndex() [3/3]
◆ x_LoadPage()
void CSeqDBIsam::x_LoadPage(TIndx SampleNum1, TIndx SampleNum2, const char ** beginp, const char ** endp) [private]
Map a page into memory.
Given two indices, this method maps into memory the area starting at the beginning of the first index and extending to the end of the other. (If the indices are equal, only one page would be mapped.)
Definition at line 899 of file seqdbisam.cpp.
References _ASSERT, CSeqDBFileMemMap::GetFileDataPtr(), m_DataFname, m_DataLease, m_IndexLease, m_KeySampleOffset, and SeqDB_GetStdOrd().
Referenced by x_ExtractAllData(), x_FindIndexBounds(), and x_StringSearch().
◆ x_LoadStringData()
void CSeqDBIsam::x_LoadStringData(const char * begin, string & key, int & data) [inline, private]
◆ x_Lower()
◆ x_MakeFilenames()
void CSeqDBIsam::x_MakeFilenames(const string & dbname, char prot_nucl, char file_ext_char, string & index_name, string & data_name) [static, private]
Make filenames for ISAM file.
Definition at line 1173 of file seqdbisam.cpp.
References dbname(), isalpha(), and NCBI_THROW.
Referenced by CSeqDBIsam(), and IndexExists().
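The extensions listed at the top of this page suggest a naming scheme of database name, a dot, a protein/nucleotide letter, an identifier-type letter, and a final 'i' or 'd'. The sketch below builds both names under that assumption; the validation performed by the real function (it references isalpha() and NCBI_THROW) is approximated with std::invalid_argument. MakeFilenamesSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

void MakeFilenamesSketch(const std::string& dbname, char prot_nucl,
                         char file_ext_char, std::string& index_name,
                         std::string& data_name)
{
    // Both extension letters are assumed to be alphabetic.
    if (dbname.empty() ||
        !std::isalpha(static_cast<unsigned char>(prot_nucl)) ||
        !std::isalpha(static_cast<unsigned char>(file_ext_char))) {
        throw std::invalid_argument("invalid ISAM file name components");
    }
    const std::string stem = dbname + '.' + prot_nucl + file_ext_char;
    index_name = stem + 'i';  // e.g. ".pni"
    data_name = stem + 'd';   // e.g. ".pnd"
}
```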
◆ x_MapDataPage()
void CSeqDBIsam::x_MapDataPage(int sample_index, int & start, int & num_elements, const void ** data_page_begin) [inline, private]
◆ x_NumericSearch()
Numeric identifier lookup.
Given a numeric identifier, this routine finds the OID.
Definition at line 498 of file seqdbisam.cpp.
References done, x_SearchDataNumeric(), and x_SearchIndexNumeric().
Referenced by x_IdentToOid().
◆ x_OutOfBounds() [1/2]
bool CSeqDBIsam::x_OutOfBounds(Int8 key) [private]
◆ x_OutOfBounds() [2/2]
◆ x_SearchDataNumeric()
Data file search.
Given a numeric identifier, this routine finds the OID in the data file.
Definition at line 421 of file seqdbisam.cpp.
References _ASSERT, eNoError, eNotFound, eNumericNoData, first(), CSeqDBFileMemMap::GetFileDataPtr(), last(), m_DataFname, m_DataLease, m_TermSize, m_Type, NULL, x_GetNumericData(), x_GetNumericKey(), and x_GetPageNumElements().
Referenced by x_NumericSearch().
◆ x_SearchIndexNumeric()
Index file search.
Given a numeric identifier, this routine finds the OID or the page in the data file where the OID can be found.
Definition at line 140 of file seqdbisam.cpp.
References _ASSERT, done, eInitFailed, eNoError, eNotFound, eNumericNoData, CSeqDBFileMemMap::GetFileDataPtr(), m_IndexFname, m_IndexLease, m_Initialized, m_KeySampleOffset, m_NumSamples, m_PageSize, m_TermSize, m_Type, NULL, x_GetNumericData(), x_GetNumericKey(), and x_OutOfBounds().
Referenced by x_NumericSearch().
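Numeric lookup proceeds in two stages: the index file's samples select a data page, and that page is then searched for the key. A minimal in-memory model, assuming each sample is the first key of its page and every page is sorted, looks like this. NumericIsamSketch and KeyOid are hypothetical; the real code reads fixed-size terms (m_TermSize) from memory-mapped files and may find the OID directly in the index file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct KeyOid {
    std::int64_t key;
    int oid;
};

struct NumericIsamSketch {
    std::vector<std::int64_t> samples;        // first key of each page
    std::vector<std::vector<KeyOid>> pages;   // sorted (key, oid) pairs

    // Returns true and sets oid when the key is present.
    bool Find(std::int64_t key, int& oid) const
    {
        if (samples.empty() || key < samples.front()) {
            return false;  // below the first key: out of bounds
        }
        // Index search: the last sample <= key selects the page.
        auto s = std::upper_bound(samples.begin(), samples.end(), key);
        const std::size_t page =
            static_cast<std::size_t>(s - samples.begin()) - 1;
        // Data search: binary search inside the selected page.
        const std::vector<KeyOid>& p = pages[page];
        auto it = std::lower_bound(
            p.begin(), p.end(), key,
            [](const KeyOid& e, std::int64_t k) { return e.key < k; });
        if (it == p.end() || it->key != key) {
            return false;
        }
        oid = it->oid;
        return true;
    }
};
```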
◆ x_SearchNegativeMulti()
Negative ID List Translation.
Given a Negative ID list, this routine turns on the bits for the OIDs found in the volume but not in the negated ID list.
Definition at line 219 of file seqdbisam.cpp.
References _ASSERT, CSeqDBNegativeList::AddIncludedOid(), CSeqDBNegativeList::AddVisibleOid(), eNumericNoData, CSeqDBNegativeList::GetNumGis(), CSeqDBNegativeList::GetNumTis(), i, m_Initialized, m_NumSamples, m_Type, NCBI_THROW, x_FindInNegativeList(), x_GetDataElement(), and x_MapDataPage().
Referenced by IdsToOids().
◆ x_SearchNegativeMultiSeq()
Definition at line 333 of file seqdbisam.cpp.
References CSeqDBNegativeList::AddIncludedOid(), CSeqDBNegativeList::AddVisibleOid(), i, CSeqDBNegativeList::ListSize(), m_DataLease, m_IndexLease, m_Initialized, m_NumSamples, m_NumTerms, m_PageSize, NCBI_THROW, s_IsSameAccession(), x_FindInNegativeList(), x_LoadData(), and x_LoadIndex().
Referenced by IdsToOids().
◆ x_SparseStringToOids()
Look up a string in a sparse table.
This does string lookup in a sparse string table. There is no support (code) for this since there are currently no examples of this kind of table to test against.
Definition at line 1378 of file seqdbisam.cpp.
References _TROUBLE.
◆ x_StringSearch()
String identifier lookup.
Given a string identifier, this routine finds the OID(s).
Definition at line 934 of file seqdbisam.cpp.
References NStr::CompareNocase(), eInitFailed, eNoError, eNotFound, CSeqDBFileMemMap::GetFileDataPtr(), int, m_IndexFileLength, m_IndexLease, m_Initialized, m_KeySampleOffset, m_MaxLineSize, m_NumSamples, m_PageSize, MEMORY_ONLY_PAGE_SIZE, tolower(), x_DiffSample(), x_ExtractAllData(), x_ExtractPageData(), x_GetIndexKeyOffset(), x_GetIndexString(), x_LoadPage(), and x_OutOfBounds().
Referenced by HashToOids(), and StringToOids().
◆ x_TestNumericSample()
Test a sample key value from a numeric index.
This method reads the key value of an index file sample element from a numeric index file. The calling code should ensure that the data is mapped in and that the file type is correct. The key value found will be compared to the search key. This method returns 0 for an exact match, -1 if the key is less than the sample, or 1 if the key is greater. If the match is exact, it also returns the data in data_out.
Definition at line 1284 of file seqdbisam.hpp.
References CSeqDBFileMemMap::GetFileDataPtr(), m_KeySampleOffset, m_TermSize, x_GetNumericData(), and x_GetNumericKey().
◆ x_TranslateGiList()
GiList Translation.
Given a GI list, this routine finds the OID for each ID in the list not already having a translation.
Definition at line 549 of file seqdbisam.hpp.
References CSeqDBGiList::eGi, CSeqDBGiList::GetKey(), CSeqDBGiList::GetSize(), CSeqDBGiList::InsureOrder(), m_DataLease, m_IndexLease, m_Initialized, m_NumSamples, m_NumTerms, m_PageSize, NCBI_THROW, T, x_LoadData(), and x_LoadIndex().
◆ m_Atlas
◆ m_DataFileLength
TIndx CSeqDBIsam::m_DataFileLength [private]
◆ m_DataFname
string CSeqDBIsam::m_DataFname [private]
◆ m_DataLease
◆ m_FileStart
char* CSeqDBIsam::m_FileStart [private]
◆ m_FirstKey
◆ m_FirstOffset
Int4 CSeqDBIsam::m_FirstOffset [private]
◆ m_IdentType
◆ m_IdxOption
Int4 CSeqDBIsam::m_IdxOption [private]
◆ m_IndexFileLength
TIndx CSeqDBIsam::m_IndexFileLength [private]
◆ m_IndexFname
string CSeqDBIsam::m_IndexFname [private]
◆ m_IndexLease
A persistent lease on the ISAM index file.
Definition at line 1186 of file seqdbisam.hpp.
Referenced by CSeqDBIsam(), UnLease(), x_DiffSample(), x_GetIndexKeyOffset(), x_GetIndexString(), x_InitSearch(), x_LoadPage(), x_SearchIndexNumeric(), x_SearchNegativeMultiSeq(), x_StringSearch(), and x_TranslateGiList().
◆ m_Initialized
bool CSeqDBIsam::m_Initialized [private]
◆ m_KeySampleOffset
TIndx CSeqDBIsam::m_KeySampleOffset [private]
◆ m_LastKey
◆ m_LastOffset
Int4 CSeqDBIsam::m_LastOffset [private]
◆ m_LongId
bool CSeqDBIsam::m_LongId [private]
◆ m_MaxLineSize
Int4 CSeqDBIsam::m_MaxLineSize [private]
◆ m_NumSamples
Int4 CSeqDBIsam::m_NumSamples [private]
Number of terms in ISAM index.
Definition at line 1212 of file seqdbisam.hpp.
Referenced by x_DiffSample(), x_ExtractAllData(), x_FindIndexBounds(), x_GetPageNumElements(), x_InitSearch(), x_LoadIndex(), x_SearchIndexNumeric(), x_SearchNegativeMulti(), x_SearchNegativeMultiSeq(), x_StringSearch(), and x_TranslateGiList().
◆ m_NumTerms
Int4 CSeqDBIsam::m_NumTerms [private]
◆ m_PageSize
Int4 CSeqDBIsam::m_PageSize [private]
◆ m_TermSize
int CSeqDBIsam::m_TermSize [private]
◆ m_TestNonUnique
bool CSeqDBIsam::m_TestNonUnique [private]
◆ m_Type
The documentation for this class was generated from the following files: seqdbisam.hpp, seqdbisam.cpp.