* Adds lovely initial concept for historical data doer
* Adds ability to save tasks. Adds config. Adds startStop to engine
* Has a database microservice without use of globals! Further infrastructure design. Adds readme
* Commentary to help design
* Adds migrations for database
* readme and adds database models
* Some modelling that doesn't work end of day
* Completes datahistoryjob sql.Begins datahistoryjobresult
* Adds datahistoryjob functions to retrieve job results. Adapts subsystem
* Adds process for upserting jobs and job results to the database
* Broken end of day weird sqlboiler crap
* Fixes issue with SQL generation.
* RPC generation and addition of basic upsert command
* Renames types
* Adds rpc functions
* quick commit before context switch. Exchanges aren't being populated
* Begin the tests!
* complete sql tests. stop failed jobs. CLI command creation
* Defines rpc commands
* Fleshes out RPC implementation
* Expands testing
* Expands testing, removes double remove
* Adds coverage of data history subsystem, expands errors and nil checks
* Minor logic improvement
* streamlines datahistory test setup
* End of day minor linting
* Lint, convert simplify, rpc expansion, type expansion, readme expansion
* Documentation update
* Renames for consistency
* Completes RPC server commands
* Fixes tests
* Speeds up testing by reducing unnecessary actions. Adds maxjobspercycle config
* Comments for everything
* Adds missing result string. checks interval supported. default start end cli
* Fixes ID problem. Improves binance trade fetch. job ranges are processed
* adds dbservice coverage. adds rpcserver coverage
* docs regen, uses dbcon interface, reverts binance, fixes races, toggle manager
* Speed up tests, remove bad global usage, fix uuid check
* Adds verbose. Updates docs. Fixes postgres
* Minor changes to logging and start stop
* Fixes postgres db tests, fixes postgres column typo
* Fixes old string typo, removes constraint, error parsing for nonreaders
* prevents dhm running when table doesn't exist. Adds prereq documentation
* Adds parallel, rmlines, err fix, comment fix, minor param fixes
* doc regen, common time range check and test updating
* Fixes job validation issues. Updates candle range checker.
* Ensures test cannot fail due to time.Now() shenanigans
* Fixes oopsie, adds documentation and a warn
* Fixes another time test, adjusts copy
* Drastically speeds up data history manager tests via function overrides
* Fixes summary bug and better logs
* Fixes local time test, fixes websocket tests
* removes defaults and comment, updates error messages, sets cli command args
* Fixes FTX trade processing
* Fixes issue where jobs got stuck if data wasn't returned but retrieval was successful
* Improves test speed. Simplifies trade verification SQL. Adds command help
* Fixes the oopsies
* Fixes use of query within transaction. Fixes trade err
* oopsie, not needed
* Adds missing data status. Properly ends job even when data is missing
* errors are more verbose and so have more words to describe them
* Doc regen for new status
* tiny test tinkering
* str := string("Removes .String()").String()
* Merge fixups
* Fixes a data race discovered during github actions
* Allows websocket test to pass consistently
* Fixes merge issue preventing datahistorymanager from starting via config
* Niterinos cmd defaults and explanations
* fixes default oopsie
* Fixes lack of nil protection
* Additional oopsie
* More detailed error for validating job exchange
GoCryptoTrader package Datahistory manager
This datahistory_manager package is part of the GoCryptoTrader codebase.
This is still in active development
You can track ideas, planned features and what's in progress on this Trello board: https://trello.com/b/ZAhMhpOy/gocryptotrader.
Join our Slack to discuss all things related to GoCryptoTrader!
Current Features for Datahistory manager
- The data history manager is an engine subsystem responsible for ensuring that the candle/trade history in the range you define is synchronised to your database
- It is a long running synchronisation task designed to not overwhelm resources and ensure that all data requested is accounted for and saved to the database
- The data history manager is disabled by default and requires a database connection to function
- It can be enabled either via a runtime param, config modification or via the RPC command `enablesubsystem`
- The data history manager accepts jobs from RPC commands
- A job is defined in the `Database tables` section below
- Jobs will be addressed by the data history manager at an interval defined in your config; this is detailed in the `Config parameters` table below
- Jobs will fetch data at sizes you request (which can cater to hardware limitations such as low RAM)
- Jobs are completed once all data has been fetched/attempted to be fetched in the time range
What are the prerequisites?
- Ensure you have a database setup, you can read about that here
- Ensure you have run dbmigrate under `/cmd/dbmigrate` via `dbmigrate -command=up`, you can read about that here
- Ensure you have seeded exchanges to the database via the application dbseed under `/cmd/dbseed`, you can read about it here
- Ensure you have the database setup and enabled in your config, this can also be seen here
- Data retrieval can only be made on exchanges that support it, see the readmes for candles and trades
- Read below on how to enable the data history manager and add data history jobs
What is a data history job?
A job is a set of parameters which will allow GoCryptoTrader to periodically retrieve historical data. Its purpose is to break up the process of retrieving large sets of data for multiple currencies and exchanges into more manageable chunks in a "set and forget" style. For a breakdown of what a job consists of and what each parameter does, please review the database tables and the cycle details below.
What happens during a data history cycle?
- Once the `checkInterval` ticker timer has finished, the data history manager will process all jobs considered `active`
- A job's start and end time is broken down into intervals defined by the `interval` variable of a job. A job beginning `2020-01-01` and ending `2020-01-02` with an interval of one hour will create 24 chunks to retrieve
- The number of intervals it will then request from an API is defined by the `RequestSizeLimit`. A `RequestSizeLimit` of 2 means that when processing a job, the data history manager will fetch 2 hours' worth of data per request
- When processing a job, the `RunBatchLimit` defines how many lots of `RequestSizeLimit` it will fetch. A `RunBatchLimit` of 3 means that when processing a job, the history manager will fetch 3 lots of 2-hour chunks from the API in a single run of a job
- If the data is successfully retrieved, that chunk is considered `complete` and saved to the database
- The `MaxRetryAttempts` defines how many times the data history manager will attempt to fetch a chunk of data before flagging it as `failed`
  - A chunk is only attempted once per processing cycle. If it fails, the next attempt occurs after the `checkInterval` has finished again
  - The errors for retrieval failures are stored in the database, allowing you to understand why a certain chunk of time is unavailable (eg exchange downtime or missing data)
- All results are saved to the database; the data history manager will analyse all results and ready jobs for the next round of processing
How do I add one?
- First ensure that the data history manager is enabled. You can do this via the config (see table `dataHistoryManager` under `Config parameters` below), via run time parameter (see table `Application run time parameters` below) or via the RPC command `enablesubsystem --subsystemname="data_history_manager"`
- The simplest way of adding a new data history job is via the GCTCLI under `/cmd/gctcli`
- Modify the following example command to your needs:

```
.\gctcli.exe datahistory upsertjob --nickname=binance-spot-bnb-btc-1h-candles --exchange=binance --asset=spot --pair=BNB-BTC --interval=3600 --start_date="2020-06-02 12:00:00" --end_date="2020-12-02 12:00:00" --request_size_limit=10 --data_type=0 --max_retry_attempts=3 --batch_size=3
```
Candle intervals and trade fetching
- A candle interval is required for a job, even when fetching trade data. This is to appropriately break down requests into time interval chunks. However, it is restricted to only a small range of times. This is to prevent fetching issues as fetching trades over a period of days or weeks will take a significant amount of time. When setting a job to fetch trades, the allowable range is less than 4 hours and greater than 10 minutes.
Application run time parameters
| Parameter | Description | Example |
|---|---|---|
| datahistorymanager | A boolean value which determines if the data history manager is enabled. Defaults to `false` | `-datahistorymanager=true` |
Config parameters
dataHistoryManager
| Config | Description | Example |
|---|---|---|
| enabled | If enabled, will run the data history manager on startup | `true` |
| checkInterval | A Golang `time.Duration` (in nanoseconds) defining how often to attempt to fetch all active jobs' data | `15000000000` |
| maxJobsPerCycle | Allows you to control how many jobs are processed after the `checkInterval` timer finishes. Useful if you have many jobs, but don't wish to constantly be retrieving data | `5` |
| verbose | Displays some extra logs to your logging output to help debug | `false` |
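Putting the table together, the relevant section of your config might look like the fragment below. The field names are taken directly from the table above, but verify the exact casing against your generated config file. Note that `checkInterval` is a Go `time.Duration` expressed in nanoseconds, so `15000000000` means 15 seconds.

```json
"dataHistoryManager": {
  "enabled": true,
  "checkInterval": 15000000000,
  "maxJobsPerCycle": 5,
  "verbose": false
}
```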
RPC commands
The below table is a summary of commands. For more details, view the commands in /cmd/gctcli or /gctrpc/rpc.swagger.json
| Command | Description |
|---|---|
| UpsertDataHistoryJob | Updates or Inserts a job to the manager and database |
| GetDataHistoryJobDetails | Returns a job's details via its nickname or ID. Can optionally return an array of all run results |
| GetActiveDataHistoryJobs | Will return all jobs that have an active status |
| DeleteJob | Will remove a job from processing. Data is preserved in the database for later reference |
| GetDataHistoryJobsBetween | Returns all jobs, of all status types between the dates provided |
| GetDataHistoryJobSummary | Will return an executive summary of the progress of your job by nickname |
Database tables
datahistoryjob
| Field | Description | Example |
|---|---|---|
| id | Unique ID of the job. Generated at creation | deadbeef-dead-beef-dead-beef13371337 |
| nickname | A custom name for the job that is unique for lookups | binance-xrp-doge-2017 |
| exchange_name_id | The ID of the exchange to fetch data from, as seeded via `/cmd/dbseed`. When creating a job, you only need to provide the exchange name | binance |
| asset | The asset type of the data to be fetched | spot |
| base | The currency pair base of the data to be fetched | xrp |
| quote | The currency pair quote of the data to be fetched | doge |
| start_time | When to begin fetching data | 2017-01-01T13:33:37Z |
| end_time | When to finish fetching data | 2018-01-01T13:33:37Z |
| interval | A Golang `time.Duration` representation (in nanoseconds) of the candle interval to use | 30000000000 |
| data_type | The data type to fetch. 0 is candles and 1 is trades | 0 |
| request_size | The number of candles to fetch, eg if 500, the data history manager will break up the request into the appropriate timeframe so that each data history run fetches 500 candles to save to the database | 500 |
| max_retries | The number of times the data history manager may attempt to fetch data for an interval period before moving on to the next period. This helps you determine whether the exchange is missing data for that period, or whether a single failure out of three simply means one request could not be completed | 3 |
| batch_count | The number of requests to make when processing a job | 3 |
| status | A numerical representation of the status. 0 is active, 1 is failed, 2 is complete, 3 is removed and 4 is missing data | 0 |
| created | The date the job was created | 2020-01-01T13:33:37Z |
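The numeric `status` values in the table above can be mapped to readable names with a small lookup; this is an illustrative sketch only, and the constant names here are assumptions rather than the codebase's actual identifiers.

```go
package main

import "fmt"

// Status constants matching the numeric values documented in the table above.
// The actual constant names used in the codebase may differ.
const (
	statusActive      int64 = 0
	statusFailed      int64 = 1
	statusComplete    int64 = 2
	statusRemoved     int64 = 3
	statusMissingData int64 = 4
)

// statusString converts a stored numeric status into its documented meaning.
func statusString(s int64) string {
	switch s {
	case statusActive:
		return "active"
	case statusFailed:
		return "failed"
	case statusComplete:
		return "complete"
	case statusRemoved:
		return "removed"
	case statusMissingData:
		return "missing data"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(statusString(0)) // active
	fmt.Println(statusString(4)) // missing data
}
```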
datahistoryjobresult
| Field | Description | Example |
|---|---|---|
| id | Unique ID of the job result | deadbeef-dead-beef-dead-beef13371337 |
| job_id | The job ID being referenced | deadbeef-dead-beef-dead-beef13371337 |
| result | If there is an error, it will be detailed here | exchange missing candle data for 2020-01-01 13:37Z |
| status | A numerical representation of the job result status. 1 is failed, 2 is complete and 4 is missing data | 2 |
| interval_start_time | The start date of the period fetched | 2020-01-01T13:33:37Z |
| interval_end_time | The end date of the period fetched | 2020-01-02T13:33:37Z |
| run_time | The time the job was run | 2020-01-03T13:33:37Z |
Please click GoDocs chevron above to view current GoDoc information for this package
Contribution
Please feel free to submit any pull requests or suggest any desired features to be added.
When submitting a PR, please abide by our coding guidelines:
- Code must adhere to the official Go formatting guidelines (i.e. uses gofmt).
- Code must be documented adhering to the official Go commentary guidelines.
- Code must adhere to our coding style.
- Pull requests need to be based on and opened against the `master` branch.
Donations
If this framework helped you in any way, or you would like to support the developers working on it, please donate Bitcoin to:
bc1qk0jareu4jytc0cfrhr5wgshsq8282awpavfahc