Changelog
Update VPS Host v2016
- Disabled the fwupd package for Linux images as a workaround for an Ubuntu 24 issue under VPS environments.
- Removed rm_init_failed error from watchdog.
- Set VGA type to none for Windows templates to improve GPU passthrough support.
- Fixed incorrect running status for VPS created with errors (e.g. GPU cold3 state).
- Added audio device for Windows VPS.
- Added automatic scandisk with disk error fixing before VPS resume.
- Fixed script arrays handling in listIdGet for VPS.
- Fixed resumed VPS instances being incorrectly treated as config-only changes, which caused resume issues under some conditions.
- Removed hv_ipi=on,hv_synic,hv_stimer from VPS params.
- Added aio=threads to scsi0 options.
Update Docker Host v1068
- Moved GPU purging for frozen jobs to occur before GPU reset.
- Improved detection of a non-responsive Docker daemon while verifying that the Docker instance is working.
- Minor fixes in the local logging system.
- Removed rm_init_failed error from watchdog logic.
- Added automatic copying of mount point contents to Persistent Storage when enabled and initially empty.
Instance
- VPS: added Windows 11 image
Instance
- Docker: added "Add Persistence Volume" option in Explore Apps template details
- VPS: added ability to change SSH key in resume process
Update Docker Host v1065
- Updated statistics to use the new Docker API version.
- Restored GPU reset functionality after removing an instance.
- Disabled secure connection for simplepodai and pytorch-cuda Docker templates.
- Disabled secure connection for FileManager.
Update VPS Host v2011
- Improved the VPS shutdown procedure timing.
- Restored qm stop in addition to qm shutdown as a workaround when ACPI shutdown fails on frozen VPS instances.
- VPS instances are now reconfigured when the instanceId changes.
- Disabled KVM errors for users and limited them to smdebug only.
- Added log rotation with cleanup of old atop, pveupload, and other obsolete files on the main disk.
- Disabled KSM (Kernel Samepage Merging).
- Added several improvements for Windows VPS instances.
- Improved VPS creation speed.
- Fixed CPU affinity assignment for VPS instances using HugePages.
- Reassigned cut off memory blocks between NUMA nodes for VPS instances.
- Simplified the HugePage CPU counting procedure.
- Added a "VPS Resuming" status message in the user console.
- Added statLeak for debugging memory and process leaks.
- Added forced AES support to CPU options.
- Reduced the wait time for MAC address verification in cloud-init.
- Updated statVmi to treat the internal-error status as running (possible blue screen) instead of undefined.
- VPS instances with internal-error status are now restarted automatically.
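The "qm shutdown, then qm stop" fallback mentioned in this release can be illustrated with a minimal sketch. This is not the host's actual script: the function name stop_vps, the 60-second default timeout, and the injectable run parameter are assumptions made for illustration. Only the qm shutdown and qm stop commands themselves come from the entry above.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (assumes `qm` is on PATH)."""
    return subprocess.run(list(cmd), check=False).returncode


def stop_vps(vmid: int, timeout_s: int = 60, run: Runner = run_command) -> str:
    """Try an ACPI shutdown first; fall back to a hard stop if it fails.

    `stop_vps` and its return values are hypothetical names for this sketch.
    """
    # `qm shutdown` asks the guest to power off and waits up to `timeout_s`.
    if run(["qm", "shutdown", str(vmid), "--timeout", str(timeout_s)]) == 0:
        return "shutdown"
    # A frozen guest may ignore ACPI, so force the VM off instead.
    if run(["qm", "stop", str(vmid)]) == 0:
        return "stopped"
    return "failed"
```

Passing the command runner as a parameter keeps the fallback logic testable on a machine without Proxmox installed.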
Instance
- Docker: added Share button in Template Details at Explore Apps section
- many fixes
Update Docker Host v1062
- Added a watchdog for ShellInABox.
- Temporarily disabled the OOM check.
- Added bond+ support in the firewall and fixed gateway detection under certain bond-related conditions.
- Fixed an issue where MFS did not detect disks outside a datacenter under specific conditions.
- Updated MFS to version 4.58.2.
- Stats now use 100 percent of system RAM value from the API instead of 95 percent.
- Fixed killing CF tunnels inside Docker.
- Fixed chat websocket screen management.
Update VPS Host v2010
- Improved ping statistics
- Increased debug file history retention
- Plugin messages are suppressed in VPS console
- Simplified VPS shutdown procedure with log suppression
- sysStat now sends 100 percent of the host's physical memory size to the API, leaving memory management to the API
- VPS is recreated from scratch (except disk data) when a stopped state is detected, instead of just being rerun
- Fixed NODELOCK looping issue caused by grabbing VPS config while status is still running
- Fixed NODELOCK grabbing, creating, and refreshing logic
- Preparation to support HugePages with full NUMA GPU/Memory/CPUs redistribution for maximum performance
Update Docker Host v1061
- Watchdog: added “BAR not claimed” error (code h46)
- Disabled instance management during host shutdown and reboot
- Increased debug log rotation threshold
- Watchdog now resends the error code if the same error reoccurs and unlists the host again
- Added periodic autofix for crashed docker instance SSHD configurations
- Increased sleep duration to 5 minutes in forced reboot/shutdown scripts to ensure standard reboot procedures complete
- Temporarily disabled GPU reset
- Added GPU purge functionality for frozen jobs
- Improved GPU error handling in watchdog by validating PCI IDs against the GPU list
- Fixed SSH restarting inside Docker instances after configuration correction
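Validating PCI IDs against the GPU list, as described for the watchdog above, could look roughly like the sketch below. The function name gpu_error_lines, the log line format, and the idea of passing the GPU list as plain PCI address strings are all assumptions for illustration; the real watchdog logic is not shown in this changelog.

```python
import re

# Matches a full PCI address such as 0000:65:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(
    r"\b[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]\b"
)


def gpu_error_lines(log_lines, gpu_addresses):
    """Keep only the log lines that mention a PCI address from the GPU list.

    Hypothetical helper: errors on other devices (NVMe, root ports, ...)
    are ignored so they are not reported as GPU errors.
    """
    known = {addr.lower() for addr in gpu_addresses}
    hits = []
    for line in log_lines:
        found = {match.lower() for match in PCI_ADDR.findall(line)}
        if found & known:
            hits.append(line)
    return hits
```

With this kind of filter, an error line is attributed to a GPU only when its PCI address actually belongs to one of the host's GPUs.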
Update VPS Host v2008
- Improved system process handling by resizing /run/lock
- Improved host reboot and shutdown procedures
- Improved disk performance on the host filesystem storage
- Enhanced GPU error handling in errParseDmesg.sh by validating PCI IDs against the GPU list
Update VPS Host v2007
- Improved and simplified VPS shutdown/stop operations
- Watchdog now resends the error to the dashboard when an error on the host reoccurs (and unlists the host again)
Update Docker Host v1059
- Watchdog now reports specific AER errors instead of a generic “hardware error” for easier troubleshooting.
- Upgraded the MFS storage system to improve speed and overall performance.
- Added automatic disk optimization with fstrim running regularly in the background.
Update VPS Host v2005
- Changed the "VPS Locked" message to be more human-readable
- statGpu: added new NV 3xxx definition
- Fixed removal of cloud-init and EFI disks on MFS
- Added fstrim in crontab
- VPS start changed virtio-scsi-pci to virtio-scsi-single
- VPS start added iothread=1 option for disk
- Fixed instances not resuming when status=unknown (source host offline?)
Instance
- API: removed itemsPerPage and page query parameters in Find instances (/instances/market/list) from docs as they have no effect on returned data
- Added Brazil to countries list in user invoice details
Instance
- Improved VPS disk speed from the previous 130MB/s to 2200MB/s by switching to a different network storage system. All newly created VPS instances will be deployed on the new storage. Old VPS instances will be migrated within a few weeks.
Update VPS Host v2004
- Added MooseFS support as default storage
- Watchdog changed "Hardware Error" to "AER" detection string
- Added support for diskResize on host side
Instance
- Added visual charts to display balance changes in the billing section for easier tracking.
- Restricted login and registration access from Russia, Belarus, and Singapore due to unsupported activities in these regions.
- Applied various minor fixes and optimizations across the platform to enhance stability and performance.
Update Docker Host v1057
- reverted GPU counting to the simplest lspci method due to issues with nvtool and nvidia-smi under overload
- added rriShowGpuReservations.sh script for debug purposes
- added MIG support
- statGpu gpuMemUtil now returns the maximum (100%) when the driver returns an invalid (not supported) value
- rebootTwice deprecated code purged
- added missing part of log debugs in statusHost
- rriStartPortMissflag is now removed after being sent to the API so that it does not block the host forever
- logRotate fixed checking if shared-disk rri contains mounted MFS resource
- improved GPU reset procedure (added killing of bound/frozen apps before reset)
- increased log retention periods to 30 days for various log files and archives
- smdebug added mc config edit internal editor
- statsGpu now reads the PCIe generation only once
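Counting GPUs from plain lspci output, as this release does instead of relying on nvtool or nvidia-smi, might be sketched as follows. The function names and the choice to count only "VGA compatible controller" and "3D controller" entries from NVIDIA are assumptions; the host's real script may filter differently.

```python
import subprocess

# Device classes under which lspci typically lists GPUs (an assumption here).
GPU_CLASSES = ("VGA compatible controller", "3D controller")


def count_nvidia_gpus(lspci_output: str) -> int:
    """Count NVIDIA GPU entries in the text printed by `lspci`."""
    count = 0
    for line in lspci_output.splitlines():
        # Each line looks like "<slot> <class>: <vendor and device>".
        _slot, _, rest = line.partition(" ")
        device_class, _, description = rest.partition(": ")
        if device_class in GPU_CLASSES and "NVIDIA" in description:
            count += 1
    return count


def read_gpu_count() -> int:
    """Run `lspci` on the host and count its NVIDIA GPUs."""
    result = subprocess.run(
        ["lspci"], capture_output=True, text=True, check=True
    )
    return count_nvidia_gpus(result.stdout)
```

Because lspci reads the PCI bus directly, this kind of count does not depend on the NVIDIA driver responding, which matters when the driver tools stall under heavy load.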
Update VPS Host v2003
- watchdogSystem added error handling h45 AER Hardware Error
- watchdogSystem disabled Reboot
- vmiManager fixed reading new assoc table of pveIds
- Prepared hosts for HugePages support, with the transparent option
- Hid the irrelevant "vmiStat empty" error message during the VPS creation process
- "VPS stopped. VPS paused." message changed to a single, simpler one: "VPS paused"
- vmiStart added condition to catch status=stopped with errorCreating flag
- vmiStart removing EFI components moved before importdisk step
- vmiStart/Remove fixed NODELOCK
- vmiStart/Remove check lock (backup?) before doing job
Update VPS Host v2002
- added smdebug command
- better VPS stats errors handling
- vmiRemove using stop instead of shutdown during remove VPS (not pausing)
- statusHost added more debug code
- added ERROR unlisting: device mapping invalid (IOMMU?)
- added ERROR unlisting: vfio BAR not claimed (reset GPU failed?)
- vmiStart improved removing old rbd cloud-init file
- statVmi fixed the check for whether the subprocess exists
- statSys static sysHddSharedSize=40TB
- vmiStart new method of verifying whether the VPS is running
- vmiStart added removing rbd stale or locked files
- vmiStart added removing old/prev resources before adding them again
- vmiStart no longer breaks (returns) if a step fails
- vmiListIdsGet improved to use a faster method
- vmiRemove added missing timeouts for two commands
- rriStart fixed typo in mfs mounting folder name
- rriRemove now unmounts both the msf and mfs named folders if present in the shared-disk directory
- watchdog added error h45: Hardware Error string detection
- removed deprecated ifSys variable in code
- updated process name updVersion in watchdog code
- disable reboot in watchdog
- added dmesg new method reading based on journalctl with marks
- enabling watchdog script
Update v1056
- Fixed abnormal GPU usage percentage values for GPUs not fully supported by the NVIDIA driver. The value is now forced to 100%
- Fixed docker GPU stats not displaying
- Added more debug for gpuStat and statusHost
- Ensured /shared-disk has proper permissions
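Forcing an abnormal GPU usage value to 100% can be illustrated with a small sketch. What counts as "abnormal" is an assumption here (non-numeric strings, NaN, or anything outside 0 to 100), and normalize_gpu_util is a hypothetical name, not the host's actual function.

```python
def normalize_gpu_util(raw) -> int:
    """Return a usage percentage in 0..100, or 100 for abnormal input.

    Assumption: "abnormal" means a value that is not a number
    (for example a "not supported" marker from the driver) or a number
    outside the 0..100 range.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return 100
    # `value != value` is true only for NaN.
    if value != value or value < 0 or value > 100:
        return 100
    return round(value)
```

For example, normalize_gpu_util("42") gives 42, while a driver marker such as "[N/A]" or an out-of-range reading such as 250 both give 100.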
Update v1055
- Improved network speed, disk speed, stats
- Improved hard drive speed settings
- Fixed the internal chat mechanism being doubled
- Fixed a stale rriMissGpuError flag that kept the host unlisted
- Changed error "h36: GPU missing during starting new instance" from permanent to "once". The user can now clear it and list the host again
- Improved GPU resetting when GPU not fully purged by previous instance
- Fixed reading gpuCount from the system by switching all methods to the common lspci command
- Fixed zero sysNetSpeedJSON file causing looped speedtest on hosts
- Switched mfsManager package installation to batch mode (non-interactive)
- Rclocal/rriStart clear GPU reservation error flag; improved include of config.txt file
- Removed mapGpu2Rri. Now rriStart reads realtime GPU usage/reservation from docker daemon
- Improved GPU resetting (now tries 3 times)
- Temporarily changed the rriStartGpuMiss error type to a warning
- Fixed and improved syntax and speed of some jq queries
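The "tries 3 times" behaviour of the GPU reset in this release can be sketched as a generic retry loop. The function name, the 2-second pause between attempts, and the reset_once callback are assumptions for illustration; the actual reset command used on hosts is not stated in this changelog.

```python
import time
from typing import Callable


def reset_with_retries(
    reset_once: Callable[[], bool],
    attempts: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `reset_once` until it succeeds or `attempts` is exhausted.

    `reset_once` is a hypothetical callback that performs one reset
    attempt and returns True on success.
    """
    for attempt in range(1, attempts + 1):
        if reset_once():
            return True
        # Pause between attempts, but not after the last one.
        if attempt < attempts:
            sleep(delay_s)
    return False
```

Retrying helps when a GPU is still being released by the previous instance and the first reset attempt arrives too early.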
Instance
- Persistent Volumes Pricing Update
- New Volumes: Starting immediately, persistent storage is priced at $0.03/GB per month.
- Existing Volumes: Will remain free ($0.00/GB per month) until September 27, 2025. After this date, the new pricing of $0.03/GB per month will apply.
- Volume Resizing: Any resizing action will trigger the new pricing instantly, regardless of the original creation date.
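The three pricing rules above can be worked through with a short sketch. The rate and the September 27, 2025 date come from the announcement; the function names, the existing_volume and resized flags, and the reading that the free period includes September 27 itself are assumptions made for illustration.

```python
from datetime import date

NEW_RATE = 0.03                  # USD per GB per month (from the announcement)
FREE_UNTIL = date(2025, 9, 27)   # last free day for existing volumes (assumed inclusive)


def monthly_rate(existing_volume: bool, resized: bool, on: date) -> float:
    """Return the per-GB monthly rate that applies to a volume on a given day."""
    # Only untouched existing volumes stay free, and only until FREE_UNTIL.
    if existing_volume and not resized and on <= FREE_UNTIL:
        return 0.0
    return NEW_RATE


def monthly_cost(size_gb: float, existing_volume: bool, resized: bool, on: date) -> float:
    """Return the monthly cost in USD, rounded to cents."""
    return round(size_gb * monthly_rate(existing_volume, resized, on), 2)
```

Under these assumptions, a new 100 GB volume costs $3.00 per month, an untouched existing 100 GB volume costs $0.00 on September 1, 2025, and the same existing volume costs $3.00 per month once it is resized or once the free period ends.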
Update v1050
- Fixed hosts being unlisted when the NVIDIA GPU count query briefly returned a lower number of GPUs
Instance
- Docker: Fixed empty ports data and other stat information under some conditions
Instance
- Docker: Fixed display of tags in Docker templates
- VPS: Fixed price calculations in instance details
- VPS: Improved resume process for enhanced usability and reduced confusion
- API: added VPS renting methods and reorganized menu structure to support separate usage paths
Instance
- New Policy: If your account balance drops below 0, your VPS servers will be paused and you will be billed for disk usage.
- Upcoming Change: In the future, instances may be deleted if the balance remains negative for an extended period.
Instance
- Fixed: Issue with auto-resume functionality when pausing a server using RTX PRO 6000 Blackwell GPU.
Instance
- Fixed a problem with the resume process that involved a different GPU type