import logging
import warnings
from collections.abc import Collection, Mapping
from copy import deepcopy
from typing import Any, Callable, Optional, overload, Union

import torch
import torch.nn as nn
from torch import optim
from torch.distributed._shard.sharded_tensor import ShardedTensor
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


__all__: list[str] = []

logger = logging.getLogger(__name__)


class _NamedOptimizer(optim.Optimizer):
    """
    ``_NamedOptimizer`` takes a dict of parameters and exposes ``state_dict`` by parameter key.

    We replace the original key (a number) in the optimizer with each parameter's
    fully qualified name (FQN). Users can initialize the optimizer the same way
    they initialize a regular PyTorch optimizer; the only difference is that they
    also need to pass in the FQN of each parameter.

    Args:
        named_parameters (Mapping[str, Union[torch.Tensor, ShardedTensor]]):
            Mapping from FQN to parameter.
        optimizer_class (optim.Optimizer):
            The class of optimizer to instantiate.
        param_groups (Collection[Mapping[str, Any]]):
            `param_groups` to pass to the optimizer if specified.
            The key of the inner map needs to be FQNs.
            Default: None
        module (nn.Module): the module whose parameters are updated
            by the optimizer.
        args: arguments to pass to the optimizer constructor.
        kwargs: arguments to pass to the optimizer constructor.

    Example::
        >>> # xdoctest: +SKIP("distributed")
        >>> from torch import optim
        >>> from torch.distributed.optim import _NamedOptimizer
        >>>
        >>> # Define the named optimizer.
        >>> m = Model(...)
        >>> named_optim = _NamedOptimizer(m.named_parameters(), optim.SGD)
        >>> # Forward pass + backward pass.
        >>> named_optim.step()
        >>> ...
        >>> # Calling state_dict on the named optimizer returns a FQN-keyed state_dict.
        >>> named_optim.state_dict()
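        >>> # Illustrative sketch (not part of the original docs): ``param_groups`` may
        >>> # also be passed explicitly; each group's "params" holds parameters of ``m``
        >>> # (the attribute names below are hypothetical for the toy ``Model``).
        >>> groups = [{"params": [m.weight], "lr": 0.1}, {"params": [m.bias], "lr": 0.01}]
        >>> named_optim = _NamedOptimizer(dict(m.named_parameters()), optim.SGD, param_groups=groups)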

    Warning: This API is still in development and subject to change.

    TODO: Add tutorial for _NamedOptimizer.
    TODO: Add documentation in the docstring for the public attributes
          like self.param_groups and self.named_parameters.
    """

    def __init__(
        self,
        named_parameters: Mapping[str, Union[torch.Tensor, ShardedTensor]],
        optimizer_class: optim.Optimizer,
        param_groups: Optional[Collection[Mapping[str, Any]]] = None,
        module: Optional[nn.Module] = None,
        *args,
        **kwargs,
    ) -> None:
        torch._C._log_api_usage_once("torch.distributed.optim._NamedOptimizer")
        self.param_groups = param_groups
        self._param_groups_check()
        self.named_parameters = dict(named_parameters)
        params_for_optimizer = (
            self.named_parameters.values() if param_groups is None else param_groups
        )
        self._optimizer = optimizer_class(params_for_optimizer, *args, **kwargs)
        self.module = module
        if param_groups is None:
            self.ordered_param_keys = list(self.named_parameters.keys())
        else:
            warnings.warn(
                "Since we pass in param_groups, we will use param_groups to "
                "initialize the optimizer, not all parameters of the module."
            )
            param_to_key = {param: key for key, param in named_parameters.items()}
            ordered_param_keys = []
            for group in param_groups:
                for param in group["params"]:
                    if param not in param_to_key:
                        raise ValueError(
                            f"Expect param name {param} found in param group but is missing."
                        )
                    ordered_param_keys.append(param_to_key[param])
            self.ordered_param_keys = ordered_param_keys
        # Keep self.param_groups in sync with the wrapped optimizer.
        self.param_groups = self._optimizer.param_groups

    def _param_groups_check(self):
        if self.param_groups is not None:
            for param_group in self.param_groups:
                assert isinstance(param_group, dict), "param group must be a dict"
                assert "params" in param_group, "param group must contain key params"
                params = param_group["params"]
                if isinstance(params, torch.Tensor):
                    params = [params]
                params = list(params)
                for param in params:
                    if not isinstance(param, torch.Tensor):
                        raise TypeError(
                            "optimizer can only optimize Tensors, "
                            "but one of the params is " + torch.typename(param)
                        )
                param_group["params"] = params

    def state_dict(self) -> dict[str, Any]:
        """
        Return the ``state_dict`` of the optimizer.

        Instead of using numbers to index parameters, we use each parameter's
        module fully qualified name (FQN) as the key.
        """
        state_dict = self._optimizer.state_dict()
        param_groups = state_dict["param_groups"]

        # Re-key the per-parameter state from positional indices to FQNs.
        ret_state = {
            self.ordered_param_keys[st_key]: state_val
            for st_key, state_val in state_dict["state"].items()
        }

        # Rewrite each param group so that "params" lists FQNs instead of indices.
        ret_groups = []
        for group in param_groups:
            param_keys = [self.ordered_param_keys[param] for param in group["params"]]
            ret_group = {"params": sorted(param_keys)}
            for k, v in group.items():
                if k != "params":
                    ret_group[k] = deepcopy(v)
            ret_groups.append(ret_group)

        return self._post_state_dict({"state": ret_state, "param_groups": ret_groups})

    @overload
    def step(self, closure: None = ...) -> None: ...

    @overload
    def step(self, closure: Callable[[], float]) -> float: ...

    def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
        """
        Perform a single optimization step.

        This will call :meth:`torch.optim.Optimizer.step` on the wrapped
        optimizer.
        """
        return self._optimizer.step(closure=closure)

    @property
    def state(self) -> Mapping[torch.Tensor, Any]:
        return self._optimizer.state

    def load_state_dict(self, state_dict: Mapping[str, Any]) -> None:
        """
        Define the default behavior to load a state_dict for ``_NamedOptimizer``.

        Sample Code
        ```
            my_model = MyModule()
            optimizer = _NamedOptimizer(my_model.named_parameters(), Adagrad)
            ...

            optim_state_dict = optimizer.state_dict()
            ...
            ...

            optimizer.load_state_dict(optim_state_dict)
            ...
        ```
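
        If the wrapped optimizer has not materialized its state yet (lazy init),
        one illustrative pattern, reusing ``optimizer`` and ``optim_state_dict``
        from the sample above, is to call ``init_state`` before loading:
        ```
            optimizer.init_state()
            optimizer.load_state_dict(optim_state_dict)
        ```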
        Args:
            state_dict (Dict[str, Any]): A ``state_dict`` to load into the optimizer.
                Note that this state dict update is performed in place.

        .. note:: PyTorch uses lazy init to initialize the optim states, so it is
            possible that there is no optim state when a user calls
            ``load_state_dict``. For ``_NamedOptimizer`` we are stricter: users can
            only call ``load_state_dict`` after the state is initialized. By doing
            this, we can validate the optim ``state_dict`` to be loaded.
        """
        new_state_dict = self._optimizer.state_dict()
        state_dict = self._pre_load_state_dict(state_dict)
        state = state_dict["state"]
        new_state = new_state_dict["state"]
        if len(new_state) == 0:
            raise ValueError(
                "Expects the optim to be initialized before load but found not initialized."
            )

        for idx, param_key in enumerate(self.ordered_param_keys):
            # Not all parameters need to appear in the loaded state_dict
            # (e.g. with conditional training), so skip missing keys.
            if param_key not in state.keys():
                continue
            if len(state[param_key]) != len(new_state[idx]):
                raise ValueError(
                    f"Expects equal length as {len(new_state[idx])} for parameter "
                    f"{param_key} but found: {len(state[param_key])}"
                )
            # Iterate through all optimizer states for this parameter.
            for state_key, state_val in new_state[idx].items():
                if state_key not in state[param_key]:
                    raise ValueError(
                        f"Expects state {state_key} for parameter {param_key} but not found."
                    )

                src_state_val = state[param_key][state_key]
                if isinstance(state_val, ShardedTensor):
                    assert isinstance(src_state_val, ShardedTensor)
                    num_shards = len(state_val.local_shards())
                    num_new_shards = len(src_state_val.local_shards())
                    if num_shards != num_new_shards:
                        raise ValueError(
                            f"Expects equal number of shards as {num_new_shards} "
                            f"but found {num_shards} for {param_key}/{state_key}"
                        )
                    for shard, src_shard in zip(
                        state_val.local_shards(), src_state_val.local_shards()
                    ):
                        shard.tensor.detach().copy_(src_shard.tensor)
                elif isinstance(state_val, torch.Tensor):
                    assert isinstance(src_state_val, torch.Tensor)
                    state_val.detach().copy_(src_state_val)
                else:
                    new_state[idx][state_key] = deepcopy(src_state_val)

        # Load the param_groups of the state_dict, matching groups by their FQN key.
        src_param_groups = state_dict["param_groups"]
        new_param_groups = new_state_dict["param_groups"]

        src_group_map = {}
        for group in src_param_groups:
            param_keys = list(group["params"])
            src_group_map[_gen_param_group_key(param_keys)] = group
        new_group_map = {}
        for new_group in new_param_groups:
            param_keys = []
            for param_key in new_group["params"]:
                param_keys.append(self.ordered_param_keys[param_key])
            new_group_map[_gen_param_group_key(param_keys)] = new_group
        for group_key, new_group in new_group_map.items():
            # Not all groups need to appear in the loaded state_dict when not all
            # parameters are used in the optimizer, so skip missing groups.
            if group_key not in src_group_map:
                continue
            src_group = src_group_map[group_key]
            if len(src_group) != len(new_group):
                raise ValueError(
                    f"Expects equal param_group size as {len(new_group)} "
                    f"for group {group_key} but found {len(src_group)}."
                )
            for k in src_group:
                if k not in new_group:
                    raise ValueError(
                        f"Expects group key {k} to be in group {group_key} "
                        f"in `state_dict` but is missing."
                    )
                if k != "params":
                    new_group[k] = deepcopy(src_group[k])

        self._optimizer.load_state_dict(new_state_dict)

    def add_param_group(self, param_group: Mapping[str, Any]) -> None:
        """
        Add a param group to the :class:`_NamedOptimizer`'s `param_groups`.

        Warning: This API is still in development and subject to change.
        """
        assert isinstance(param_group, dict), "param group must be a dict"

        params = param_group["params"]
        if isinstance(params, torch.Tensor):
            param_group["params"] = [params]
        else:
            param_group["params"] = list(params)

        param_to_key = {param: key for key, param in self.named_parameters.items()}
        for param in param_group["params"]:
            if param not in param_to_key:
                raise ValueError("some parameters are not in the module")
            self.ordered_param_keys.append(param_to_key[param])

        self._optimizer.add_param_group(param_group)
        # Keep self.param_groups in sync with the wrapped optimizer.
        self.param_groups = self._optimizer.param_groups

    def init_state(self) -> None:
        """
        Run a dummy optimizer step to initialize optimizer state, since most optimizers use lazy init.

        This allows doing in-place loading of optimizer state from a checkpoint.
        """
        for param in self.named_parameters.values():
            if param.requires_grad:
                t = torch.zeros_like(param)
                param.grad = torch.autograd.Variable(t)
        # Calling ``step`` creates the initial optimizer states.
        self.step(closure=None)

    def _pre_load_state_dict(self, state_dict) -> dict[str, Any]:
        # If the module is FSDP-wrapped, let FSDP convert the optim state_dict for loading.
        if isinstance(self.module, FSDP):
            return FSDP.optim_state_dict_to_load(
                self.module, self._optimizer, state_dict, is_named_optimizer=True
            )
        return state_dict

    def _post_state_dict(self, state_dict) -> dict[str, Any]:
        # If the module is FSDP-wrapped, let FSDP post-process the optim state_dict.
        if isinstance(self.module, FSDP):
            FSDP.optim_state_dict(self.module, self._optimizer, state_dict)
        return state_dict


def _gen_param_group_key(param_keys: list[str]) -> str:
    """Concatenate all param keys as a unique identifier for one param group."""
    return "/".join(sorted(param_keys))